People involved in business intelligence (BI) and data warehousing (DW) projects are very familiar with terms such as facts, dimensions, attributes, surrogate keys and slowly changing dimensions. But knowing the terms is not enough. Increasingly, there appears to be a significant disconnect between how BI or DW systems are actually built versus the best practices described in books and articles. Clients point out that the concepts in some books and articles are esoteric because they are not connected to real-life situations. They teach the how but not the why. Two best practices that people think they are implementing but often fall short of are dimensional modeling and the hub-and-spoke architecture.
When you don’t know why you should design a DW a certain way, it’s easy to make the common mistake of designing DW and BI systems to look like your source systems.
Granted, the target systems (DW, data marts and cubes) contain the data obtained from the source systems, so it does make sense that the content is similar. Similar content is not the problem; a similar physical design, however, is. Rather than applying the best practice design techniques they’ve learned to support DW and BI, people copy the underlying enterprise applications’ designs to the DW. This process propagates the limitations of the enterprise applications for reporting and analysis without taking advantage of DW best practices such as dimensional models or the hub-and-spoke architecture. History keeps repeating itself, resulting in frustrated businesspeople who aren’t getting the information they need. Meanwhile, IT is wondering why BI is not yet pervasive.
Enterprise applications, such as enterprise resource planning (ERP), customer relationship management (CRM) and supply chain management (SCM), are built for data capture and processing transactions rather than for reporting and analytics. When you bring across your transaction system’s design into your DW, you significantly constrain the use and performance of your DW. The top enterprise application vendors must agree, as they have developed DW and BI systems beyond their underlying applications to enable reporting and analytics. They have done this because of the need for the new data design, i.e., DW, data marts and cubes, to support analytics.
Reporting and analytics require two data areas: hub and spoke. Most people understand the need for and benefits of building a data warehouse – the hub. You gather all your source data, cleanse it and make it consistent. You store your historical data once in the data warehouse and then distribute that data many times throughout your enterprise. “Create once and use many times” is the mantra you should follow in creating applications, data and services. It is efficient, and it is the most productive approach to support reporting and analysis. Most enterprises accept this as a best practice.
But it is equally important to create data marts or cubes from the data warehouse to enable reporting and analysis. These are the spokes. The benefit is that it is more efficient to create data marts oriented toward a business process or a group of businesspeople than to continually reinvent the wheel every time you create a report. Once again, I am talking about “create once and use many times.”
Although most people have bought into the DW benefits, they aren’t yet convinced that they need data marts or cubes. They carefully plan the DW, but they continually create downstream data stores in an ad hoc, unplanned manner. Rather than determining what should be shared amongst a group of businesspeople or reports, the current approach used by many enterprises is to create reports that do it all. This results in complex reports that are harder to design and maintain. In addition, the one-off approach to reporting often results in the business spending a lot of time reconciling data between approaches.
Part of the reason people don’t build data marts is simply because they do not fully understand the architectures that support BI/DW and performance management. They especially do not understand the whys. I often get comments from customers and students that the books they read are great at explaining concepts but fall short on why to follow these best practices.
The real reason for using dimensional models and creating data marts and cubes is simple: it is the most productive approach toward implementing reporting and analytics for the business. Failing to build the spokes (data marts and cubes) means shifting a huge burden onto the BI and performance management application layers. Remember, you want to build a DW solution once so you can use it many times. Build common data filtering, transformations and aggregations into the data mart or cubes, and you will not have to repeatedly build them into each BI report. Ignoring the data marts or cubes means you significantly increase the total cost of ownership for developing and maintaining BI applications. But the more serious problem with shifting the data burden to BI reports is the inconsistencies that usually crop up when reports overlap each other. This causes the business to lose faith in the data and to spend too much of its time reconciling the data.
When working on DW, BI and performance management projects, you need to know the how and why of the things you have learned and the best practices in the industry today. You may always follow the advice you read, but it is good to know why so you can make an informed decision.