In a previous article, I discussed the importance of adhering to product and technology standards to increase both the ROI of your IT investments and the value of your IT organization. In this issue, I continue the architecture discussion to include the architecture of your entire warehouse project.
Architecture standards form the blueprint of what you are going to build and how you are going to build it. They are not limited to the data warehouse itself. These blueprints help determine the people, skills and resources needed. They also help in planning the project tasks, timeline, costs and risks.
Just as you need a blueprint to build a house, you need a blueprint to build data warehousing (DW) and business intelligence (BI) solutions. This framework needs sub-architectures for data, information, technology and products.
You should have a data warehouse for storing enterprise-wide data, including historical, detailed transactional data and dimensions or reference data such as customers, products and employees. There will also be data marts, built from the data warehouse, that contain domain-specific data oriented toward business users in a specific business function, division, geography, product line, et cetera. The DW is generally normalized, while the data marts are in a star (or snowflake) schema or other format linked to the BI tools being used.
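To make the distinction concrete, here is a minimal sketch, using pandas and entirely hypothetical table and column names, of how normalized warehouse tables might be joined into a denormalized, mart-style fact table:

```python
import pandas as pd

# Hypothetical normalized warehouse tables (names and data are illustrative only).
sales = pd.DataFrame({
    "sale_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "product_id": [100, 101, 100],
    "sale_date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-02-01"]),
    "amount": [250.0, 125.0, 300.0],
})
customers = pd.DataFrame({"customer_id": [10, 11], "region": ["East", "West"]})
products = pd.DataFrame({"product_id": [100, 101], "category": ["Widgets", "Gadgets"]})

# A mart-style fact table: the normalized transactions joined to the
# dimension attributes the business users care about.
fact_sales = (
    sales
    .merge(customers, on="customer_id")
    .merge(products, on="product_id")
)
print(fact_sales)
```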
It is important to note that the data warehouse and data marts may actually be schemas in one database instance, separate databases or even distributed networked databases. The hub-and-spoke architecture is a logical design, while the actual physical implementation should be governed by your data warehouse architecture standards and procedures.
Further extension of the hub-and-spoke best practice would involve creating additional spokes from the original spokes (like branches on a tree) to accommodate the use of cubes associated with online analytical processing (OLAP) tools or pivot tables with spreadsheets. The data should flow from the DW to the data marts to the cubes, with the IT group being able to show the workflow and lineage of the data. This is how you obtain consistent numbers across your business user reports or analysis, provide transparency and traceability for the business, and improve the IT group’s productivity.
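As an illustration of that flow, the sketch below (again with hypothetical data, using pandas) derives a simple "cube" as a pivot of mart-level facts; because each layer is derived from the one before it, the numbers in the pivot can be traced back upstream:

```python
import pandas as pd

# Hypothetical mart-level fact table (illustrative data only); in practice this
# would be produced from the data warehouse by a governed, documented process.
fact_sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "category": ["Widgets", "Widgets", "Gadgets", "Gadgets"],
    "month": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "amount": [250.0, 125.0, 300.0, 175.0],
})

# The "cube" layer: a pivot of the mart data, aggregating amount by region and month.
cube = fact_sales.pivot_table(
    index="region", columns="month", values="amount", aggfunc="sum", fill_value=0
)
print(cube)
```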
The use of an extract, transform and load (ETL) tool provides several significant benefits. First, it will be your point of documentation and version control for your data sourcing and data transformation processes. Very few IT groups document their custom-coded extracts, and even fewer maintain that documentation. The current major ETL tools on the market will be able to generate your documentation.
Second, workflow, with its dependencies and error or exception handling, is built into major ETL tools. If you “roll your own” ETL workflow, you will have to buy or build tools to define the workflow, schedule and manage the jobs, handle errors and exceptions, and control code management. “Roll-your-own” solutions typically fall short of recreating what is built into robust ETL tools.
Finally, it’s simply more productive to generate and maintain extracts with ETL tools than it is with custom coding. Just make sure that your people are trained and experienced in the ETL tool and that the scope of the project justifies the investment.
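To see how much of that plumbing you would otherwise own, here is a minimal, hypothetical “roll-your-own” pipeline sketch in Python; every piece of sequencing, logging and exception handling in it is something you must write, test, document and maintain yourself, whereas a mature ETL tool supplies it out of the box:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

# Hypothetical step functions; real extracts, transforms and loads would go here.
def extract():
    log.info("extracting source rows")
    return [{"id": 1, "amount": "250.0"}, {"id": 2, "amount": "bad-value"}]

def transform(rows):
    good, rejected = [], []
    for row in rows:
        try:
            good.append({"id": row["id"], "amount": float(row["amount"])})
        except ValueError:
            rejected.append(row)  # exception handling you must build yourself
    return good, rejected

def load(rows):
    log.info("loading %d rows", len(rows))

# The "workflow": explicit ordering, dependencies and error handling.
rows = extract()
good, rejected = transform(rows)
if rejected:
    log.warning("rejected %d rows", len(rejected))
load(good)
```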
The data modeling tool is another point of documentation and version control. In addition, data modeling tools have interfaces that feed your data models (schemas) into the ETL tool, database and BI tools, thus improving productivity, increasing consistency and reducing errors or misinterpretations between members of the team working with different tools.
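As a rough illustration of the model-driven idea (not any particular tool's interface), the sketch below generates DDL from a single, hypothetical model definition, so downstream tools can work from one shared definition rather than hand-typed schemas:

```python
# A single model definition (all names here are hypothetical) drives the DDL,
# so the ETL, database and BI layers all work from the same definition.
customer_dim = {
    "table": "dim_customer",
    "columns": [
        ("customer_id", "INTEGER"),
        ("customer_name", "VARCHAR(100)"),
        ("region", "VARCHAR(50)"),
    ],
}

def to_ddl(model):
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in model["columns"])
    return f"CREATE TABLE {model['table']} (\n  {cols}\n);"

print(to_ddl(customer_dim))
```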
Your BI and DW environments will only be as solid as their architecture. A poorly designed foundation simply encourages the creation of more information silos.