(This is part of our ongoing Series of Unfortunate Data Warehousing and Business Intelligence Events.)
If data warehouses could talk, they might say “Don’t shoot me, I am just the messenger!”
People sometimes think the data warehouse (DW) has caused a problem with data quality or inconsistency, when in fact the problem started in the enterprise’s source systems. But because the DW may be the only place where business people can view enterprise data, it becomes the messenger delivering the bad news, and an easy target for blame.
Although the DW program can’t be blamed for creating the data quality problems or inconsistency, you can blame it for not identifying and addressing the problems. Too many people in DW programs believe the following myths:
- The DW does not create or alter data, but passes on what is in the systems-of-record (SORs). A corollary is that any data quality or inconsistency problems need to be resolved by the SORs.
- The data is fine and does not have any significant quality issues. (SOR owners, IT, or business people may state this.)
Don’t fall into these traps, and don’t assume anything about the state of the data. Here is where data quality and inconsistency problems lurk:
- Data quality problems within SOR applications may be “masked” by corrections made in reports or spreadsheets created from this data. The people who told you the data is fine might not even be aware of these “adjustments.”
- Data does not age well. Even if data quality is fine now, there’s always a chance that you’ll find problems or inconsistencies in the historical data. These problems can also surface when applications such as predictive analytics need to use historical data.
- Data may be fine within each SOR application but very inconsistent across applications. Many companies have master data inconsistency problems with product, customer, and other dimensions that will not be apparent until the data is loaded into the enterprise DW.
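To make the cross-application problem concrete, here is a minimal Python sketch of a master data consistency check between two SORs. It assumes both systems share a customer ID, which real SORs often don’t; the system names (billing, CRM), the record layout, and the functions `normalize` and `compare_sors` are hypothetical, for illustration only. It reports keys that exist in only one system and attributes whose values disagree after trimming and case-folding.

```python
from typing import Dict, List, Tuple

# Hypothetical customer master records, keyed by a customer ID that we
# ASSUME both systems-of-record share. Real SORs rarely line up this neatly.
Record = Dict[str, str]


def normalize(value: str) -> str:
    """Collapse whitespace and case-fold so trivial formatting differences are not flagged."""
    return " ".join(value.split()).casefold()


def compare_sors(
    sor_a: Dict[str, Record], sor_b: Dict[str, Record], attributes: List[str]
) -> Tuple[List[str], List[str], List[Tuple[str, str, str, str]]]:
    """Return keys only in A, keys only in B, and (key, attribute, a_value, b_value) mismatches."""
    only_a = sorted(set(sor_a) - set(sor_b))
    only_b = sorted(set(sor_b) - set(sor_a))
    mismatches = []
    for key in sorted(set(sor_a) & set(sor_b)):
        for attr in attributes:
            a_val = sor_a[key].get(attr, "")
            b_val = sor_b[key].get(attr, "")
            if normalize(a_val) != normalize(b_val):
                mismatches.append((key, attr, a_val, b_val))
    return only_a, only_b, mismatches


if __name__ == "__main__":
    # Hypothetical sample extracts from two SORs.
    billing = {
        "C001": {"name": "Acme Corp", "country": "US"},
        "C002": {"name": "Globex", "country": "DE"},
        "C003": {"name": "Initech", "country": "US"},
    }
    crm = {
        "C001": {"name": "ACME Corp ", "country": "US"},
        "C002": {"name": "Globex GmbH", "country": "Germany"},
        "C004": {"name": "Umbrella", "country": "UK"},
    }
    only_a, only_b, diffs = compare_sors(billing, crm, ["name", "country"])
    print("Only in billing:", only_a)
    print("Only in CRM:", only_b)
    for diff in diffs:
        print("Mismatch:", diff)
```

With this sample data, C003 and C004 each appear in only one system, and C002 disagrees on both name (“Globex” vs. “Globex GmbH”) and country (“DE” vs. “Germany”), while “Acme Corp” vs. “ACME Corp ” passes thanks to the normalization. Each SOR looks fine on its own; the problems only show up when the data is brought together. Matching records without a shared key usually takes more than this, such as fuzzy matching or a master data management process.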
The unfortunate events: data quality and inconsistency problems become evident in the enterprise DW, and the DW gets blamed. Even if the DW program can prove the problems reside in the SORs, it will still be blamed for being surprised and for not dealing with them proactively. In the worst case, the DW program’s credibility takes a blow from which it cannot recover.
What should be done?
Never assume that data quality or inconsistency problems don’t exist, or that the DW program can ignore them. Instead, take these steps:
- Obtain data quality and consistency estimates and assumed metrics as part of a Service Level Agreement (SLA) when gathering the business and data requirements from the business and SOR application owners.
- Perform data profiling and source systems analysis to determine the current state of data quality and consistency within and across SORs.
- Create a gap analysis between the current state and the desired state (i.e., the data quality metrics in the SLA).
- Propose the data architecture and data integration tasks needed to bridge that gap. The proposal should include the timeline, tasks, resources, and costs to implement and maintain an ongoing set of data quality processes.
- If the effort or costs are too high, negotiate with the business and SOR application owners to lower the metrics in the SLA.
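The profiling and gap-analysis steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: it assumes the SLA expresses data quality as two per-column thresholds, completeness (share of rows with a value) and validity (share of populated values passing a rule), and the `SlaTarget` class, the validation rules, and the sample rows are all hypothetical. Real profiling covers many more dimensions, such as uniqueness, timeliness, and cross-SOR consistency.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


@dataclass
class SlaTarget:
    """Hypothetical SLA target for one column."""
    column: str
    min_completeness: float  # required fraction of rows with a non-empty value
    min_validity: float      # required fraction of populated values passing the rule
    rule: Callable[[str], bool]


def profile_column(rows: List[Row], column: str, rule: Callable[[str], bool]) -> Dict[str, float]:
    """Current state: completeness over all rows, validity over populated rows only."""
    values = [r.get(column) for r in rows]
    populated = [v for v in values if v is not None and v.strip() != ""]
    completeness = len(populated) / len(values) if values else 0.0
    validity = sum(1 for v in populated if rule(v)) / len(populated) if populated else 0.0
    return {"completeness": completeness, "validity": validity}


def gap_analysis(rows: List[Row], targets: List[SlaTarget]) -> List[Dict[str, object]]:
    """Compare the profiled current state against SLA targets; return only the gaps."""
    gaps = []
    for t in targets:
        current = profile_column(rows, t.column, t.rule)
        for metric, target in (("completeness", t.min_completeness), ("validity", t.min_validity)):
            if current[metric] < target:
                gaps.append({"column": t.column, "metric": metric,
                             "current": round(current[metric], 3), "target": target})
    return gaps


if __name__ == "__main__":
    # Simple illustrative email pattern, not a full RFC 5322 validator.
    email_re = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    rows = [
        {"customer_id": "C001", "email": "a@example.com"},
        {"customer_id": "C002", "email": "not-an-email"},
        {"customer_id": "C003", "email": None},
        {"customer_id": "C004", "email": "d@example.com"},
    ]
    targets = [
        SlaTarget("customer_id", 1.0, 1.0, lambda v: bool(re.fullmatch(r"C\d{3}", v))),
        SlaTarget("email", 0.95, 0.98, lambda v: bool(email_re.match(v))),
    ]
    for gap in gap_analysis(rows, targets):
        print(gap)
```

With these sample rows, `customer_id` meets its targets, while `email` shows two gaps: completeness of 0.75 against a target of 0.95, and validity of 0.667 against 0.98. Measuring validity only over populated values keeps a missing value from being counted as both incomplete and invalid. A gap list like this is the kind of input the proposal and SLA negotiation steps would work from.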
You can’t fix a problem unless you identify it and admit that it exists. Data quality and inconsistency problems are fairly common, so don’t be surprised when they turn up, and don’t let the DW program lose credibility by being blindsided.
This is a great article, as is the series. Congratulations and many thanks. I have enjoyed reading it and have learned a great deal.
All this year I have been in the middle of a couple of large DW/BI projects.
Your article prompts me to think more about this as I face these issues and write some posts on the subject myself. In one this week, I cited your post as a source, with acknowledgement to you for any new readership it brings you.
In it, I made this statement:
“The Pareto principle of spending 20% of the effort to get 80% of the value does not work in business intelligence projects.
There, you need to get 100% of the data right and cleaned up before it can be mapped and reported consistently.”
What do you think?
I look forward to the next item in your series.