As a coach of my sons’ soccer teams, I’ve noticed something that occasionally holds back the ambitious players. Rather than focusing on basics, such as learning good ball control, they are seduced by the lure of scoring. But without good basic skills, they don’t have the ability to score.
The same thing happens in business. As a consultant, my job is to guide clients through all the stages of a business intelligence (BI)/data warehouse (DW) project. Ideally they avoid the common mistakes of a neophyte, but some businesses don’t have the patience to learn. They want to jump right into the game and end up learning the expensive way: by trial and error.
Three of the common mistakes beginners make in architecting data include:
BI/DW data architecture design, specifically DWs and data marts, should not be designed like the enterprise applications used as the source systems. But people design what they know.
That may be a quick way to design the DW, but it puts the emphasis on operational processing rather than BI. This can make the downstream data marts or reports add complex and time-consuming processing in order for the data to be consumable. And if there is only a DW, it is harder to build and deploy BI applications.
Many longstanding DWs have followed this approach. If you hear that the DW is too complex, makes it difficult to create BI queries and processes queries slowly, then the BI/DW team may have based its design on the enterprise application. In their defense, they may not have even realized it.
This approach was not designed from the business user’s BI and performance management requirements but rather from the transactional processing requirements of the source systems. It doesn’t take into account many of the best practices and methods that can be used to provide a truly significant business ROI.
Enterprise application vendors generally fall into this design trap when they initially build their DW, BI or corporate performance management (CPM) solutions because it reflects their frame of reference, but it can be just as damaging to your BI and CPM solutions.
The team designing the DW may fall into the “religious” war between having everything in third normal form (3NF) or in a dimensional model. As with the first challenge, the basic concern is that the design needs to be created based on business needs rather than an esoteric data modeling concept. Most DW environments need data stored in both 3NF and dimensional models, not either/or. The impact shifts data integration logic to the BI application, thus increasing time and costs to develop the BI solution.
The flip side is that the DW team decides everything should be stored in a dimensional model and not in a normalized manner. Although this is great for BI reports and analysis – hence, perfect for a data mart – if your DW needs to support history, has changing dimensions and you need to implement data integrity and quality, then the normalized form is the best practice under these conditions.
This mixed environment is really a DW utilizing a normalized form to store data historically and manage change data capture (CDC) with a data mart in a dimensional model to enable BI reporting and analysis. Nowadays these two approaches can be implemented within the same database using different logical areas, such as a schema in an Oracle environment.
I love it when someone tells me that they don’t need a data mart or summary tables, relying on their DW tables and the associated database technology to provide all the performance they need. They wonder why they need to store the data again when they bought the best database and a terrific hardware platform with a lot of memory and fast disk arrays.
The first thing I do is check out the BI reports and analysis. The data has to be summarized and aggregated to start most analysis. If it is not done in the DW/data mart, then the BI report code is going to do it, making it much more costly to develop and maintain.
In addition, you’ll also see some of these reports creating temporary summary tables to improve performance. Instead of doing it once for everyone in a data mart, each BI report does it over again. Not only is this time-consuming, but also, every time someone different does it, the more likely it is that the numbers are going to differ.
Even if there are no summary tables, there may be BI cubes or, even worse, data shadow systems built by the business users to make up for this shortcoming in the data architecture.
A little patience goes a long way when a business is designing its data architecture. In future columns I’ll discuss some of the pitfalls in data integration, BI, project management, production and ongoing support.