In an earlier column, I provided an overview of the five data integration framework (DIF) components. This month, I will examine the first component –information architecture. Future columns will discuss the processes, standards, tools, and resources and skills.
Does information architecture make you think of years of effort without accomplishing anything useful? Nothing could be farther from the truth. You need blueprints for building your house, and you need a blueprint for creating an information asset.
A solid information architecture can transform scattered data into information that your business uses to operate and plan for the future. Data is gathered, transformed using business rules and technical conversions, staged in databases and made available to business users to report and analyze. Data flows from creation through transformation to information, just as materials flow from the supplier (enterprise resource planning systems, transaction systems), to factory (data integration), to warehouses (data warehouses) and finally to retail stores (data marts and business analytic applications).
The information architecture depicted in Figure 1 illustrates the processes that transform data into useful information as it progresses from data sources to the business intelligence (BI) tools where it is used in reports by business users. It is a conventional hub-and-spoke architecture, simplified to show a single data warehouse and multiple data marts. Subsequent columns will examine more complex information architectures that include cubes, sub-marts, federated data marts, federated data warehouses, operational data stores (ODSs), closed-loop systems and analytic applications.
Figure 1: DIF Information Architecture
Traditionally, data sources were just transaction systems, legacy applications and enterprise resource planning (ERP) systems. Now, data sources can include front-office systems such as customer relationship management (CRM) and Web analytics, business analytic applications, data from enterprise partners such as suppliers and customers, and forecasting and budgeting systems. Although the proliferation of different data sources makes things seem more complicated, it doesn’t affect the underlying structure of the architecture. It’s important that all these different sources of data be included in the architecture to prevent disparate silos of information.
Data preparation includes gathering, reformatting, consolidating, transforming, cleansing and storing data – both in staging areas and in the data warehouse. During this step, unforeseen data quality problems can suddenly arise. If you’re using business analytics, your data preparation process may need to handle additional dimensions, measures or metrics.
Data preparation includes three key steps: a source system data conversion, data cleansing assurance and a business transformation.
First, the source system conversions convert the source system data into the data warehouse target format. It may be necessary to consolidate multiple data sources to provide single, consistent definitions. The consolidation involves validation by querying dimensions or reference files to determine how to conform the data and provide referential integrity (in database terms).
The next step, data cleansing, analyzes the data beyond the record-by-record checking. Name-and-address cleansing and householding are two examples.
Finally, business transformations turn data into business information. They place data in the context of a business state such as a type of inventory, customer order or medical procedure. Transformation might also attach dimensional context (foreign keys) to data such as a region, business division, or product or service grouping.
Data franchising is the process of reconstructing data into information for reporting and analysis with business intelligence (BI) tools. Often, data is franchised from a data warehouse into data marts or cubes. The data is filtered, reorganized, transformed, summarized/aggregated and stored.
For example, suppose you have global sales data and need to franchise a U.S. sales-by-product-line data mart. Data franchising would filter only U.S. sales records and associate them with product information. This would reorganize the data in order to couple sales transactions with product-related information.
Information access and analytics are the processes that retrieve the data stored in the information architecture and deliver it to business users in the form of information products. These can take the form of reports, spreadsheets, alerts, graphics, analytic applications and slice-and- dice cubes. BI tools, spreadsheets and analytic applications, sometimes through portals, enable these processes.
Data and meta data management are processes behind the scenes that pass data and meta data between the other processes. It’s important to follow standards and document procedures during this phase to ensure that it is easy for IT and business users to maintain and understand the data.
Next month we’ll address the next component of the DIF – the processes used to gather, consolidate, transform, cleanse and aggregate data and meta data. In the meantime, please feel free to e-mail me with your questions and feedback at firstname.lastname@example.org.