Data quality management needs to be an integral part of the design and planning of your corporate performance management (CPM), business intelligence (BI) or data warehouse project — as I discussed in my previous column, “Data quality management: Follow the doctor’s orders.”
You need to gather the business requirements and priorities for data quality; determine the appropriate business logic to handle the various data quality conditions you encounter; incorporate data quality processing throughout your data lifecycle from data sourcing through information consumption; and regularly report on the data quality levels through dashboards.
The following five steps will help you weave data quality management into your enterprise data integration efforts.
1. Set a baseline for data quality management
First, determine and establish the baseline data quality level in your data source systems. Use data profiling software to analyze source systems for completeness and accuracy. Trying to perform data profiling manually, by coding queries on the source systems and comparing the answers to expected results, is laborious and rarely thorough. The good news is that you can find excellent data profiling software. The even better news is this functionality is increasingly being bundled with data integration software.
2. Verify the data — and don’t pass the buck!
Second, verify that the data extracted from source systems is the same data that was imported into your data warehouse. This seems obvious, but too often source system data is not as correct as people assume.
In cases like these, IT sometimes assumes it is the source system’s problem. But if you move the data into your CPM or BI solution, then you take responsibility for its quality. You can’t pass the buck. Too often, IT assumes their duty is simply to compare record count and sum checks of the loaded records with the source system to make sure they match. But unless you know — beyond a shadow of a doubt — the data is already correct and consistent, this minimalist quality check is not enough.
Remember the data profiling you did on your source systems? You need to do the same for your data warehouse after it has been loaded with data. This post-load audit is really essential to validate the data loading process.
3. Clean up the data
After verifying the data, it’s time to perform the data cleansing processes required by your business and industry.
For example, in businesses that sell to consumers, name and address matching is an important data cleansing function. Specific industries, such as healthcare and finance, will have their own best practices for data quality management.
To perform data cleansing, you’ll likely need to use specialized data quality software packages. Some of these can even be invoked during your enterprise data integration work. That’s because many data integration packages now includedata cleansing capabilities.
4. Make the data consistent
Now you need to make sure the data is consistent, so it can safely be used as reliable business information across an enterprise. Although individual departments or processes may have their own customer or product lists, the enterprise needs a single view of customers and products.
In data modeling terms, this process is called conforming dimensions. The enterprise applications market has bundled these processes as master data management (MDM) and customer data integration (CDI).
Establishing a single customer or product list may sound easy from a conceptual and technical perspective. However, the difficult roadblocks are the business and political issues that you may encounter when trying to obtain definition, agreement and responsibility for these lists. IT has to work collaboratively with the business to be successful in this area.
5. Transform data into information for business users
Finally, in order to create business information and enable business analytics, data needs to be filtered, transformed, enriched and aggregated. This last step transforms raw data into useful business information.
But beware — by this time, the IT group has often put all its energy into the enterprise data warehouse and assumes its job is done. Not so fast. Business users typically need filtering, summarization and aggregation in order to make data into useful information that they can use for business decision making. When left to their own devices, business users often create data shadow systems, which can be a headache for an enterprise.
These five steps are critical for CPM and BI projects. Transforming data into business information is the cornerstone of these kinds of projects. And ultimately, these systems are only as good as the quality of the information that resides in them.