Data quality has always been one of the foremost challenges in the data management industry. There are many data quality dimensions, and each company can set its own priorities and its own definitions of them. How many data quality dimensions are there? From what I have researched, there are many, but if we go by the EDM Council's definition there are seven.
As per the EDM Council, the seven data quality dimensions are listed below, each with a one-line definition. More details are available on the EDM Council website mentioned in the references.
- Accuracy
measures the precision of the data against authoritative sources, documents, and business rules.
- Completeness
measures the existence of required attributes in the population of data records.
- Conformity
measures whether data is aligned to an internal, external, or industry standard.
- Consistency
assures that data values, formats, and definitions in one data population agree with those in another population.
- Coverage
measures the breadth, depth, and availability of data that exists but is missing from a data provider.
- Timeliness
measures how well content represents current market and business conditions, and whether the content is available when it is actually needed. For example, content that becomes available only after reports have been submitted to the Fed does not help.
- Uniqueness
measures whether attributes or records are free of duplication.
So far so good, but how exactly do we measure data quality against these dimensions? How do we calculate the score?
The answer is not simple; one size does not fit all. You have to look carefully at your needs and devise a methodology for arriving at a data quality score for your organization or enterprise.
We have come up with our own methods and formula. I will not go into the nitty-gritty details, but we follow a weighted approach in which each dimension is assigned a weight. The weights we use for each dimension are given below:
| # | DQ Dimension | Weight |
|---|---|---|
| 1 | Accuracy | 20 |
| 2 | Completeness | 15 |
| 3 | Conformity | 15 |
| 4 | Consistency | 10 |
| 5 | Coverage | 5 |
| 6 | Timeliness | 20 |
| 7 | Uniqueness | 15 |
| | Overall | 100 |
From our point of view, accuracy and timeliness are the most important data quality dimensions, and coverage carries the least weight. Coverage becomes important when we are looking to consolidate data providers so that a single provider meets all our needs.
Again, there is no rule of thumb for this; you can come up with your own methods and assign your own weights to the individual dimensions. A data element can be 100% accurate when measured for accuracy and still fail on any other dimension. Data quality should therefore be measured for each dimension separately, and at the same time the measurement should give a clear picture of the overall data quality.
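To make the weighted approach a little more concrete, here is a minimal sketch of how per-dimension scores could be rolled up into one overall score. It is not our actual formula: it assumes each dimension has already been scored on a 0 to 100 scale and simply takes a weighted average using the weights from the table. The function name `overall_score` and the sample scores are hypothetical and only serve as an illustration.

```python
# Weights taken from the table above (they sum to 100).
WEIGHTS = {
    "Accuracy": 20,
    "Completeness": 15,
    "Conformity": 15,
    "Consistency": 10,
    "Coverage": 5,
    "Timeliness": 20,
    "Uniqueness": 15,
}


def overall_score(dimension_scores, weights=WEIGHTS):
    """Return the weighted average of per-dimension scores (each 0-100).

    Assumption: a plain weighted average; a real methodology may differ.
    """
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[d] * dimension_scores[d] for d in weights)
    return weighted_sum / total_weight


# Hypothetical scores for one data element: fully accurate, but stale.
scores = {
    "Accuracy": 100,
    "Completeness": 90,
    "Conformity": 80,
    "Consistency": 100,
    "Coverage": 100,
    "Timeliness": 40,
    "Uniqueness": 100,
}

for dimension, score in scores.items():
    print(f"{dimension}: {score}")
print("Overall:", overall_score(scores))  # prints "Overall: 83.5"
```

Reporting the per-dimension scores next to the overall figure keeps the weak spot visible: in this made-up case the element is 100% accurate, yet the poor timeliness score pulls the overall score down to 83.5.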
With our years of experience in the industry serving various financial clients, we understand how far good data quality can go. We consider multiple factors when measuring data quality, such as: the data quality decay rate; which stage of its life cycle the data is in; when it was last modified and, if that was a year ago, whether it still holds true; whether it is a Critical Data Element or a Non-Critical Data Element; whether it is available when requested, or how long it takes to become available after it is requested; and the dependencies involved in getting the data from source to target.