Reinventing the BI Solution You Already Have – A Series of Unfortunate Data Warehousing/Business Intelligence Events #1


(This is part of our ongoing Series of Unfortunate Data Warehousing and Business Intelligence Events. Click for the complete series, so far.)

A fundamental flaw of many business intelligence solutions is recreating what the company is already using for reporting and analysis. This takes one of two paths:

1)    The data warehouse is built using essentially the source systems’ data model. It may be “cleaned up” with new names and use only a subset of the source data, but it is really just a retread of what you already have.

It does shift your reporting from the source systems to a DW, but you have not taken advantage of the advanced dimensional modeling techniques that have evolved to provide superior analytic performance. An entity-relationship (ER) model or third normal form (3NF) is indeed best practice for transactional systems, but not for business intelligence or data integration. IT knows 3NF and hence figures that is what they should do; many experienced practitioners started off using 3NF and continue to do so.

The cost is longer development times and more labor-intensive maintenance. It also performs more slowly than a best-practice design, so many companies compensate by buying more infrastructure such as CPUs, memory, storage and network bandwidth. If you sell or resell hardware then using this design is fine, but if you are a consumer of BI solutions you should try another way.

2)    At the other end of the spectrum from 3NF is recreating your current reporting solutions, often data shadow systems or spreadmarts, that basically flatten out the data. It is easy to see why people recreate the spreadsheets that business people are using for reporting, but it leads to inflexible reports, and more and more reports must be built every time the business changes or expands its reporting requirements.

The fundamental concept behind dimensional modeling and OLAP (online analytical processing) design was to provide business people with flexibility in their reporting and analysis. This is how a company can enable business self-service reporting rather than have a large group of BI developers designing, building and maintaining dozens or hundreds of custom reports.
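
To make that flexibility concrete, here is a minimal sketch of a dimensional (star schema) design, using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the table names (fact_sales, dim_product, dim_date), the sample rows, and the sales_by helper are hypothetical and not taken from any particular implementation. The point is only that one fact table joined to its dimensions can answer many different questions without a new custom report for each one.

```python
import sqlite3

# Hypothetical star schema: one fact table plus two dimension tables.
SCHEMA = """
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL);
"""

# Dimension attributes a business user may slice by (illustrative whitelist,
# which also keeps arbitrary text out of the generated SQL).
ATTRIBUTES = {
    "product_name": "p.product_name",
    "category": "p.category",
    "year": "d.year",
    "quarter": "d.quarter",
}


def build_sample_warehouse():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20100101, 2010, "Q1"), (20100401, 2010, "Q2")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 20100101, 100.0), (2, 20100101, 50.0),
         (3, 20100401, 20.0), (1, 20100401, 30.0)],
    )
    return conn


def sales_by(conn, *attributes):
    """Total sales amount grouped by any combination of dimension attributes."""
    if not attributes:
        raise ValueError("pick at least one attribute to group by")
    # Unknown attribute names raise KeyError instead of reaching the database.
    columns = ", ".join(ATTRIBUTES[name] for name in attributes)
    sql = f"""
        SELECT {columns}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY {columns}
        ORDER BY {columns}
    """
    return conn.execute(sql).fetchall()
```

With the sample data above, sales_by(conn, "category"), sales_by(conn, "quarter") and sales_by(conn, "category", "quarter") all run against the same three tables. In a flattened, report-shaped table, each of those views would more likely be its own extract or spreadsheet.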

Just as with 3NF, the “flat world” approach to data mart design results in a much higher TCO and the huge queue of report development one sees at many BI implementations. Most assume that the queue and the costs come with the territory, but it does not have to be that way.

I will follow up with more unfortunate events I have observed. Feel free to e-mail with the unfortunate events you have seen.

6 Comments

  1. Laura says:

    Very good and accurate post.
    I would mention, though, that data warehouses are extremely difficult to develop/maintain if they rely on traditional RDBMS platforms such as SQL Server, Oracle and MySQL. Not as painful when relying on column-based database technology.
    Unlike in the past, there are many companies doing what you would call “real BI” (=not Excel) without a de facto data warehouse.
    You may want to check out companies like Sisense (http://www.sisense.com) and Greenplum (http://www.greenplum.com).
    I would also mention that there are alternatives to OLAP for achieving multi-dimensional analysis. 🙂

  2. doug says:

    A problem (similar to case #1 above) is to base the warehouse data model on one operational system in an environment with multiple sources. I work with a DW that is fed by four different systems for the same business function. Instead of deciding what was wanted as a target and working back, one system was selected and pushed forward. The DW works reasonably well for that system. I keep pain-killers handy for when the other three need maintenance.

  3. Well, I don’t know what’s going on, but it’s not a good way to do this. In my opinion, we have to look at this issue again.

  4. Rick Sherman says:

    Doug,
    The pain killers will probably be needed at some point. Selecting one source system as the target schema was often used when people were building ODSs (operational data stores). It is certainly a quicker way to build a “DW” but generally lacks the robustness needed for changes that may occur in the multiple SORs (systems of record) and the inevitable changes in organization, products/services and other dimensions.
    In addition, this approach may cause more downstream work to support truly enterprise-wide analytics.
    An ODS is often a first-generation approach.
    Rick

  5. Rick Sherman says:

    Laura,
    Thanks for the feedback. Certainly columnar databases and other OLAP alternatives have come a long way in being able to handle the scale that many Enterprise DWs require.
    I generally position the non-relational alternatives for use in data marts or as a single enterprise data mart. These products were built for analytics and do a great job there.
    Generally we recommend two layers (schemas/databases) for an enterprise: a DW for data integration & distribution and a Data Mart(s) layer for analytics & reporting. A one-size-fits-all approach generally reduces either the flexibility of the DW to absorb change and present an enterprise view, or the ability of the DM to provide business-specific or process-specific metrics.
    Rick
