It’s been 20 years since the term “data warehousing” was first coined. Since then, it has evolved into a mainstream activity for Fortune 1000 companies, with many undertaking one or more data warehousing projects over the last decade. But, even after 20 years in the marketplace, it’s evident that data warehousing is still misunderstood.
The goal of data warehousing is to create “one version of the truth.” In pursuing that goal, however, many companies create a proliferation of data silos instead. Here are some classic examples:
Just because a company’s data warehousing efforts have resulted in more data silos doesn’t mean that data warehousing is a poor choice. What it might mean is that people simply do not understand or cannot get company-wide buy-in about how to do data warehousing right.
Why is there such a large gap between the goal of data warehousing and the reality of what a company has actually implemented? Quite often, it’s confusion.
People often confuse a “data warehouse” with “data warehousing.” Data warehousing encompasses a complete architecture and process; it’s not just having a single data warehouse. Data warehousing is the transformation of data into information, thereby enabling the business to examine its operations and performance. This task is accomplished by staging and transforming data from data sources so that the business can access and analyze the resulting information. The data stores may be persistent (stored on disk) or transient (using disk or memory). In addition, the workflow usually involves multiple data stores to support the staging and transformation of data into information: operational data stores, data warehouses, data marts, online analytical processing (OLAP) cubes, flat files (a comma-separated values extract, for example), XML data and even spreadsheets.
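The flow described above, from a source extract through staging and a warehouse table to a data mart, can be sketched in a few lines. This is a minimal illustration rather than a reference design: the table names (stg_orders, dw_orders, mart_sales_by_region), the sample extract and the cleansing rules are all assumptions made up for the example, and an in-memory SQLite database stands in for what would normally be several separate data stores.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract (CSV) from an operational system.
RAW_EXTRACT = """order_id,region,amount
1,east,100.50
2,west,20.00
3,east,
4,West,79.50
"""


def run_pipeline(raw_csv: str) -> list[tuple[str, float]]:
    """Stage, transform and summarize a CSV extract; return sales by region."""
    # ":memory:" makes this a transient store; a file path would make it persistent.
    conn = sqlite3.connect(":memory:")

    # 1. Staging: land the extract unchanged, with every column as text.
    conn.execute("CREATE TABLE stg_orders (order_id TEXT, region TEXT, amount TEXT)")
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    conn.executemany(
        "INSERT INTO stg_orders VALUES (:order_id, :region, :amount)", rows
    )

    # 2. Warehouse: typed, cleansed rows (assumed rules: normalize the region
    #    spelling and skip rows that have no amount).
    conn.execute(
        "CREATE TABLE dw_orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.execute(
        """INSERT INTO dw_orders
           SELECT CAST(order_id AS INTEGER), LOWER(TRIM(region)), CAST(amount AS REAL)
           FROM stg_orders
           WHERE TRIM(amount) <> ''"""
    )

    # 3. Data mart: a subject-specific summary built from the warehouse table.
    conn.execute(
        """CREATE TABLE mart_sales_by_region AS
           SELECT region, SUM(amount) AS total FROM dw_orders GROUP BY region"""
    )
    result = conn.execute(
        "SELECT region, total FROM mart_sales_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for region, total in run_pipeline(RAW_EXTRACT):
        print(region, total)
```

Even in this toy version, the report at the end depends on three data stores and two transformation steps, not on a single database.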
Any time you get data from the operational systems to perform reporting and analysis, you are performing a data warehousing process. In the old days, it was called decision support; now the term is business intelligence (BI). Data warehousing is what lies beneath the graphs and pivots presented by BI. In fact, BI is merely the presentation layer of the data warehousing architecture.
Too often, data warehousing is associated merely with a data warehouse rather than with the entire architecture and process. The problem is that when you narrow your focus to a single database, you lose the entire context of the staging of data. Data quality, consistency and integrity (not to mention the ability to audit the data trail) are achieved only when the entire data staging (or data warehousing) architecture is considered.
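One way to picture auditing the data trail across the whole architecture is to reconcile row counts from stage to stage. The sketch below is only an illustration under stated assumptions: the StageAudit record and the unreconciled function are hypothetical names, and the check applies only to row-preserving stages (an aggregating step such as a data mart summary would need a different reconciliation, for example on totals).

```python
from dataclasses import dataclass


@dataclass
class StageAudit:
    """Hypothetical audit record written by each stage of the pipeline."""
    stage: str
    rows_in: int
    rows_out: int
    rows_rejected: int


def unreconciled(trail: list[StageAudit]) -> list[str]:
    """Return a description of every point where rows are unaccounted for."""
    problems = []
    previous_out = None
    for step in trail:
        # Within a stage, every input row must be either passed on or rejected.
        if step.rows_in != step.rows_out + step.rows_rejected:
            problems.append(f"{step.stage}: rows lost within stage")
        # Between stages, one stage's output must be the next stage's input.
        if previous_out is not None and step.rows_in != previous_out:
            problems.append(f"{step.stage}: input does not match previous stage output")
        previous_out = step.rows_out
    return problems


if __name__ == "__main__":
    trail = [
        StageAudit("staging", rows_in=4, rows_out=4, rows_rejected=0),
        StageAudit("warehouse", rows_in=4, rows_out=3, rows_rejected=1),
    ]
    print(unreconciled(trail))
```

A check like this can only be written when every stage is in view; looking at the warehouse table alone would not reveal rows that went missing before they reached it.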
A data warehousing program is much more than a data warehouse. With a single, narrow DW focus, separate efforts for ERP, CPM and BI each recreate the DW architecture and create their own data silos. Ideally, an overall architectural view would let all these efforts leverage each other’s work and reuse tools, code, processes, data and standards. A company would then be able not only to implement these systems more economically, with a higher ROI and lower overall operating and maintenance costs, but also to strive toward the “single version of the truth.”