Conventional wisdom states that you should build a data warehouse (DW) and then populate data marts from it. This data architecture is called a hub and spoke. The central premise of the hub and spoke is that the heavy-duty data integration activities are completed as you move the data from source systems into the DW. Once these data integration tasks are done, you can propagate or “franchise” the data from the DW into the data marts. Data franchising takes the DW data and packages it via filtering, aggregation and transformations into data usable by business intelligence (BI) tools and understandable to business users (see Figure 1). This approach works extremely well, but lately I am concerned because it appears this approach is being abandoned. I think this is happening because people have forgotten its fundamental aspects and why it works better than the alternatives. Due to advances in databases, storage, network and server capabilities, it is often more effective to deploy the hub and spoke all on the same database or instance rather than store the data mart “spokes” on separate servers and databases.
Figure 1: Hub and Spoke
Physically storing all the data on the same database instance works well, but often when companies deploy this architecture, they forget why they split the data into a DW and data marts. What often happens is the DW and data mart-oriented tables get mixed together or, worse, the data marts tables or cubes do not get built at all. Instead, people start trying to use the DW tables for all reporting and analysis needs. This gets us back to the old-school idea of a central DW trying to be all things for all purposes and generally getting none of it right.
Data warehousing has evolved with a DW becoming the data integration and data distribution hub feeding all the downstream feeds (spokes) to support BI applications. The BI applications need the data they use to be recast from their DW schema into a dimensional model, OLAP cube or a denormalized structure. In addition, these downstream feeds from the hub into the spokes or data marts also apply filtering, aggregation and business transformations – I’ll refer to this as data franchising, which a business group or business process might need to be done across many reports or analytical applications. It’s far more efficient to perform this data franchising once for a business group or a business process. Not only does this reduce the time to develop and maintain individual reports, but it also results in a significant reduction in errors and report-processing time because the franchising is prebuilt into the data marts.
People are overusing materialized views instead of creating the data marts of the hub-and-spoke model. Yes, it’s tempting to skip that extract, transform and load (ETL) step and just let the database create the equivalent of the data mart via materialized views, but it’s a simplistic approach and it doesn’t work. Too often, developers do not really understand the data transformation needs of the business and assume that what they can do in materialized views is sufficient. When they oversimplify this way, they pay the price of a much more expansive and expensive reporting and BI tool development cycle. That’s because the BI reports have to implement the transformations that the data franchising processes would have done. And they have to do it every time a report is developed. It also adds to maintenance costs. When developers do try to implement the data franchising process in the materialized views, it takes too long to create them. Just because you can do something technically doesn’t mean that it’s what your business needs. Lastly, I’m concerned about lost opportunities. The data marts franchised from the DW start the process of packaging data for business consumption. But why end there? Why not extend this approach to have the data marts become hubs for creating OLAP cubes or submarts for providing performance management, reporting and business analytics?
Figure 2: Hub and Spoke Extended
Further extending the hub-and-spoke model (Figure 2) can be an effective method to shift the filtering, aggregation and business transformations that occur in BI reports into the data franchising or ETL process. Reusing transformations, as with hub and spoke, is more efficient and reduces errors. Hub and spoke still has a lot to offer, so let’s not allow it to die out yet.