Data Quality Measurements

November 5, 2020 | Apnav Agrawal
Data quality has long been one of the leading challenges in the data management industry. There are many data quality dimensions, and each company can set its own priorities and its own definitions for them. How many data quality dimensions are there? From what I have researched, many, but if we go by the EDM Council’s definition there are 7.
What are the 7 Data Quality Dimensions?
As per the EDM Council, the 7 data quality dimensions are listed below, each with a one-line definition. More details are available on the EDM Council website mentioned in the references.
  • Accuracy

    measures the precision of the data against authoritative sources, documents, and business rules.

  • Completeness

    measures the existence of required attributes in the population of the data records.

  • Conformity

    measures whether data is aligned to an internal, external, or industry standard.

  • Consistency

    assures that data values, formats, and definitions in one data population agree with those in another population.

  • Coverage

    measures the breadth, depth, and availability of data that exists but is missing from a data provider.

  • Timeliness

    measures how well content represents current market and business conditions, and whether the content is available when it is truly needed; for example, content that becomes available after reports have been submitted to the Fed does not help.

  • Uniqueness

    measures whether attributes or records are free of duplication.
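
To make two of these definitions concrete, here is a minimal sketch of how completeness and uniqueness could be turned into a score between 0 and 1. It is an illustration only, not the EDM Council's or our production method; the function names, the sample records, and the choice to treat `None` and empty strings as missing are all assumptions.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], required: list[str]) -> float:
    """Share of required attribute slots that are populated.

    Assumption: a value of None or an empty string counts as missing.
    """
    total = len(records) * len(required)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    filled = sum(
        1
        for record in records
        for attr in required
        if record.get(attr) not in (None, "")
    )
    return filled / total


def uniqueness(records: list[dict[str, Any]], key: list[str]) -> float:
    """Share of records that remain after collapsing duplicates on the key."""
    if not records:
        return 1.0
    distinct = {tuple(record.get(attr) for attr in key) for record in records}
    return len(distinct) / len(records)


# Hypothetical sample data, invented for this example.
records = [
    {"lei": "A1", "name": "Acme Ltd", "country": "GB"},
    {"lei": "B2", "name": "", "country": "US"},
    {"lei": "A1", "name": "Acme Ltd", "country": "GB"},
    {"lei": "C3", "name": "Gamma SA", "country": None},
]

completeness_score = completeness(records, ["lei", "name", "country"])
uniqueness_score = uniqueness(records, ["lei"])
```

With this sample, 10 of the 12 required slots are filled and 3 of the 4 records have a distinct `lei`, so the two scores come out at roughly 0.83 and 0.75. The other dimensions usually need a reference source or a business rule to score against, so they are harder to show generically.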

How to measure Data Quality against these Dimensions?
So far so good, but how exactly do we measure data quality against these dimensions? How do we calculate the score?
The answer is not simple; one size does not fit all. You have to look carefully at your needs and devise a methodology for calculating a data quality score for your organization or enterprise.
We have come up with our own methods and formulas. I will not go into the nitty-gritty, but we follow a weighted approach in which each dimension is assigned its own weight. The weights we use for each dimension are given below:
#    DQ Dimension    Weight
From our point of view, accuracy and timeliness are the most important data quality dimensions, and coverage carries the least weight. Coverage becomes important when we are looking to consolidate data providers onto a single provider that solves all the problems.
Again, there is no rule of thumb for this; you can come up with your own methods and assign weights to the individual dimensions. A data element can be 100% accurate when measured for accuracy yet fail on any other dimension. Data quality should therefore be measured for each dimension, and at the same time the result should give a clear picture of overall data quality.
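A weighted approach of this kind can be sketched in a few lines. The weights below are hypothetical and are not the ones we actually use; they only follow the ordering described above, with accuracy and timeliness highest and coverage lowest. The per-dimension scores are invented sample values on a 0 to 1 scale.

```python
# Hypothetical weights for illustration only; NOT the weights used in practice.
WEIGHTS = {
    "accuracy": 0.25,
    "timeliness": 0.25,
    "completeness": 0.15,
    "conformity": 0.10,
    "consistency": 0.10,
    "uniqueness": 0.10,
    "coverage": 0.05,
}


def overall_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-dimension scores, each on a 0 to 1 scale."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Dividing by the total keeps the result in 0..1 even if weights do not sum to 1.
    return sum(weights[dim] * scores[dim] for dim in weights) / total_weight


# Invented sample: fully accurate, but weak on timeliness and coverage.
scores = {
    "accuracy": 1.00,
    "timeliness": 0.60,
    "completeness": 0.90,
    "conformity": 0.95,
    "consistency": 0.85,
    "uniqueness": 1.00,
    "coverage": 0.50,
}

overall = overall_score(scores)
```

Here the overall score is about 0.84 even though accuracy is a perfect 1.00, which is why it helps to report the per-dimension scores alongside the single overall number.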
With our years of experience in the industry serving various financial clients, we understand how far good data quality can go. We consider multiple factors when measuring data quality, such as: the data quality decay rate; which stage of its life cycle the data is in; when it was last modified and, if that was a year ago, whether it still holds true; whether it is a Critical Data Element or a Non-Critical Data Element; whether it is available when requested, or how long it takes for a data element to become available after it is requested; and what the dependencies are in getting the data from source to target.
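As one possible way to picture a decay rate, the sketch below discounts a score by how long ago a data element was last verified. The exponential half-life model, the function name, and the 365-day half-life are assumptions made for this example, not a description of our actual method.

```python
from datetime import date


def freshness(last_verified: date, as_of: date, half_life_days: float) -> float:
    """Exponential decay factor: 1.0 on the verification date, 0.5 after one half-life.

    Assumption: confidence in a value halves every `half_life_days` days.
    """
    age_days = max((as_of - last_verified).days, 0)
    return 0.5 ** (age_days / half_life_days)


# Hypothetical element last verified exactly 365 days before the assessment date.
factor = freshness(date(2020, 1, 1), date(2020, 12, 31), half_life_days=365)
```

A Critical Data Element could be given a shorter half-life than a Non-Critical one, so that stale critical data is penalized sooner.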
References: Data Quality Dimensions, EDM Council