If you’re trying to architect or build a cost-effective, productive business intelligence (BI) program, it’s important that you thoroughly understand the data franchising component of the data integration framework (DIF) information architecture. (For a quick DIF recap, see my column, “Understanding the DIF Information Architecture,” in the December 2003 issue of DM Review.
As shown in Figure 1, data franchising takes place after data preparation – after data integration has been done to pull data from source systems into a data warehouse (DW) or an operational data store (ODS).
You already know that your DW is all about storing enterprise-wide data, both current and historical, in a manner that is efficient for IT to define, manage and control. The purpose of data franchising is to “package” data so businesspeople can understand it and use it. Granted, this creates data that is redundant with what’s in the DW, but it is a controlled redundancy that is created from the DW. It’s often called a dependent data mart.
Data packaging can help your BI tools access the data used for reporting and analysis more effectively and efficiently. Although your BI tool may be able to point at almost any data source and query it, the reality is that the data needs to be packaged appropriately. Otherwise, the logic needed in a BI report, for example, is too complex to widely deploy or maintain. In addition, if you do not package the data in a data mart, your BI reports and online analysis can suffer from slow performance because the complex logic has to be performed countless times in each report rather than once when you first stage the data mart.
Often, the data is franchised from a DW or ODS into data marts or “cubes.”1Don’t jump to the conclusion that this is only useful if you have a hub-and-spoke data architecture with separate physical DWs and data marts. The DIF architecture, with regards to ODSs, DWs and data marts, is a logical representation of the actual physical data storage. The ODS, DW and data marts may actually all be colocated in one physical database with each logical construct presented by a set of tables. In fact, materialized views may be used to instantiate the data marts. The data mart cubes are generally physically separate; but depending on the type of OLAP product used, they too may be virtual constructs.
Data franchising involves data filtering, reorganizing, transforming, summarizing, aggregating and storing data generally extracted from a DW or ODS. Sometimes, external data sources are introduced either to increase dimensionality or to add attributes for analysis that may not have been in the original source systems. For example, a company might purchase or use external data for customer demographics or credit scoring.
Of all the DIF components, data franchising is the most likely to be driven by tools — in this case, BI tools that you deploy in your DIF architecture. To avoid repetitive queries and complex transformations or aggregations built into each BI report or executed with each query generated from BI analysis, it often makes sense to stage data for further analysis. When data is not staged (see the example to follow), you have to repeat the same steps for every report or analysis. Generally, business groups need the same data analysis over and over again, so it’s a lot more efficient to just stage it once. Business users can refine the data in their reports or BI analysis after it has been staged into a data mart or cube.
There are two problems with doing complex transformations in a BI report. First, it becomes expensive to maintain. Second, and more important, the reports get out of sync with one another and have conflicting numbers. As soon as the business users see inconsistent numbers in their reports, they’ll question their integrity. Then, someone has to do a costly reconciliation to determine which numbers are right.
Tool-specific data storage depends on the type of data schema with which the tool works. To take advantage of the slice-and-dice features and impressive speed of some OLAP tools, you may need a physical cube built in their database-specific format. BI tools often need a specific type of schema, such as a star or snowflake, to perform analysis. Also, some BI tools perform better with pre-built aggregations. Data franchising enables BI tools to perform reporting and analysis functions the way they were meant to by incorporating their needs into the data staging and formatting processes.
To illustrate, let’s look at a simplified example that includes only one of the many dimensions that business users need. Consider a company with global sales data that wants to franchise a U.S. sales-by-product-line data mart. The data franchising process would begin by filtering only U.S. sales records and associating them with product information. This would reorganize the data so that sales transactions (facts) would be coupled (denormalized) with product-related information (dimensions). There may be certain business rules (transactions) that would associate sales transactions with specific business units, product categories or some other grouping. Data may be stored in an aggregated or summarized state to improve performance, or that might be done later when a report or analysis is generated.
Many times, data franchising processes are repeated. For example, the U.S. sales data mart may be generated and then various cubes created for certain vice presidents, product line managers and marketing analysts. In the past, analysts did much of this data franchising by pulling data into a spreadsheet and transforming, aggregating and summarizing it manually or by creating macro filters. Currently, it is possible to automate, document and manage most of the data franchising used in a corporation. The more data is automated and managed, the more information quality and consistency is improved. Business-user productivity improves dramatically when work is shifted from data gathering and messaging to information analysis and action.
Data franchising is the aggregation, summarization and formularization of data for use with BI tools. This process overlaps with data preparation with regard to extracting the data from a data source and reformulating it into a data mart. However, the data franchising component does not need to perform many of the more sophisticated data integration functions of data preparation processes.
You may find yourself with a target database requiring special processing that a general-purpose ETL tool cannot provide. Examples are multidimensional online analytical (MOLAP) BI tools that store their data in cubes and relational online analytical (ROLAP) BI tools that store their data in a relational database but need it set up a specific way in order to work. Often, an OLAP BI tool includes ETL tools or utilities that move data into its databases so that its tool can access them. In other cases, you may need to create custom code.
Data franchising is like the poor stepchild of data preparation. People focus on their multiterabyte size data warehouse and assume that data integration is complete. However, data is not transformed into information until the business user obtains it, examines it and acts upon it. The data supply chain begins with your source systems and ends with your business users. It’s with your users that data preparation plays such a crucial role – determining if the information is truly useful to the business.