Data Integration Depends on Vigilant Data Modeling

Published in Information Management

Key components of the data integration framework (DIF) are the software tools used to create, deploy and maintain it. This month, we discuss tools used for data modeling. In subsequent months, we’ll discuss tools used for data profiling, data preparation, ETL (extract, transform and load), data franchising, meta data interchange and data management.

The analogy most often used when discussing the need for a data modeling tool is that you would not build a house without a blueprint, so why would you build an application or database without one? Data models become your blueprints for developing your data warehouse (DW), data marts (DMs), business intelligence (BI) application or corporate performance management (CPM) system. You need these blueprints to give you a clear picture of the business users’ requirements and processes. They also show business users that you understand what they asked for. You need these blueprints to design your databases, ETL processes and all the business logic (i.e., application code) used in your reporting and analytic applications. Without the blueprints, your application and database are destined to become the next lame duck legacy system that business users and project team members want to replace.

Let’s Be Honest

Everyone starts by creating a data model. They may use a data modeling tool, a graphics-oriented tool or just sketch it out on paper. Whatever method they use, I ask these questions: What did you model? Was the model used throughout your processes? Is the data model part of an ongoing effort?

Best practice is to use data modeling tools throughout your entire data integration project. From a data perspective, that means using the tools with your source systems, through your DW and DMs, and finally with your reporting and analytic applications – that includes anywhere that you store the data. Many efforts model the DW and DM, but fail to model any staging areas or the final destination of the data that the business user consumes. How can you follow the various states of the data if you have modeled only a couple of the pit stops it makes along the way?
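
To make those pit stops concrete, here is a minimal sketch in Python of the kind of inventory check a team might run; the store names and model file names are hypothetical, not taken from any particular tool:

    # Hypothetical inventory of every place the data lands on its way to the
    # business user; all names are illustrative only.
    DATA_STORES = ["crm_source", "staging", "warehouse", "finance_mart", "bi_cube"]

    # Data store -> the data model that documents it (None = no model exists).
    MODELED = {
        "crm_source": "crm_source.ldm",
        "staging": None,             # staging area never modeled (a common gap)
        "warehouse": "edw.ldm",
        "finance_mart": "finance.ldm",
        "bi_cube": None,             # the final destination the user consumes
    }

    unmodeled = [store for store in DATA_STORES if MODELED.get(store) is None]
    print("Pit stops without a data model:", unmodeled)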

Similar to tracing the data storage, your data models need to be tied to the transformation processing that occurs on the data as it travels from source systems to the business users who consume it. This implies that your data models are tied to both the ETL and BI applications that you have created. You need this workflow of the data to determine how, when and ultimately why the data was transformed for the business users.
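
One way to tie the models to that workflow is to record, for each target element, where it came from and what was done to it along the way. Here is a minimal sketch, assuming a simple in-memory lineage record; all system, table and column names are hypothetical:

    from dataclasses import dataclass

    @dataclass
    class LineageStep:
        """One hop in the data's journey: source, transformation, target."""
        source: str          # e.g., "crm_source.customer.cust_nm"
        transformation: str  # the ETL rule applied, in plain terms
        target: str          # e.g., "warehouse.dim_customer.customer_name"

    # Illustrative lineage for a single column, source system to warehouse.
    lineage = [
        LineageStep("crm_source.customer.cust_nm",
                    "trim whitespace, title-case",
                    "staging.customer.cust_name"),
        LineageStep("staging.customer.cust_name",
                    "deduplicate on customer key",
                    "warehouse.dim_customer.customer_name"),
    ]

    # Answer "how, when and why did this field get here?" by walking the steps.
    for step in lineage:
        print(f"{step.source} -> [{step.transformation}] -> {step.target}")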

Finally, after answering these questions honestly, ask yourself if your data models have become shelfware. Once you created the logical data models, were they updated on an ongoing basis during development and after deployment as requirements changed or knowledge increased? Did you use the logical data model as the basis for the physical data models? Have all database changes been reflected in your data models?

Use Your Data Models to Improve Communications

Data models facilitate communications with two groups of people. First, you use your data models to facilitate communications between your business users and development team regarding their business requirements. Logical data models and associated data definitions and business rules help business users communicate and validate their requirements in the context of the data. The data model provides a feedback mechanism that helps verify business subject areas, data definitions, business rules and information products, such as reports and analytics that business users require. A word of advice: business users do not need to understand the details or semantics of data modeling. You can present the models, definitions and rules without confusing them with the nomenclature of data modeling.

Data models also help foster communications with the database administrators (DBAs). DBAs are often in an infrastructure group that works with, but is not formally part of, the development group. DBAs need the development team to provide them with the specifications to create databases; converting the logical data model to the physical data model is the process that creates those specifications. Use a data modeling tool that generates the initial physical data model, helps the DBAs in their tasks, and still allows them the flexibility to apply their expertise and modify the result. Establish two key best practices: the DBAs must document the physical data model (schema), and they must update it when they make any changes.
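
As a rough illustration of that hand-off, the sketch below derives initial DDL from a simple logical-model structure. The entity, attribute and type names are hypothetical, and a real modeling tool gives the DBA far more to adjust (storage parameters, indexes, constraints):

    # Hypothetical logical model: entity -> {attribute: logical type}.
    logical_model = {
        "Customer": {
            "customer_id": "integer",
            "customer_name": "string",
            "signup_date": "date",
        },
    }

    # Crude logical-to-physical type mapping; a DBA would refine this.
    TYPE_MAP = {"integer": "INTEGER", "string": "VARCHAR(255)", "date": "DATE"}

    def to_ddl(model):
        """Generate an initial CREATE TABLE statement for each entity."""
        for entity, attrs in model.items():
            cols = ",\n  ".join(f"{name} {TYPE_MAP[t]}" for name, t in attrs.items())
            yield f"CREATE TABLE {entity.lower()} (\n  {cols}\n);"

    for statement in to_ddl(logical_model):
        print(statement)  # the DBA reviews and modifies before running it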

Data Modeling Tool Functionality

The key functionality needed to enable the best practices you should adopt includes:

  • Support for conceptual, logical and physical data models
  • Connectivity to a variety of data sources to examine their schemas
  • Reverse-engineering support for creating data models when none exist (a minimal sketch follows this list)
  • Creation of a data definition language (DDL) to instantiate your physical data models
  • Interfaces with database, ETL and BI tools to exchange meta data
  • Ability to generate reports, on the Web and in PDF format, for business users, developers and DBAs
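
To show what reverse engineering amounts to, here is a minimal sketch using Python’s built-in sqlite3 module: it builds a throwaway database, standing in for an existing system that has no model, and reads the schema back into a simple model structure. Commercial modeling tools do far more, but the principle is the same:

    import sqlite3

    # A throwaway in-memory database standing in for an unmodeled legacy system.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, "
                 "customer_name TEXT, signup_date TEXT)")

    # Reverse-engineer: read the schema back into a simple model structure.
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        model[table] = {name: col_type for _, name, col_type, *_ in cols}

    print(model)  # {'customer': {'customer_id': 'INTEGER', ...}}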

The features that facilitate team development and foster an ongoing effort include:

  • Version control to track and document changes (see the sketch after this list)
  • Model management for combining multiple models, or for subsetting (and later reincorporating) a model so individuals can work along parallel paths
  • Multiple-user support along with appropriate security
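
As a rough illustration of version control for models, the sketch below fingerprints each model revision so that any change is detectable; the dates, change notes and model text are hypothetical, and a real tool would also capture who made each change:

    import hashlib

    def model_fingerprint(model_text: str) -> str:
        """Content hash that changes whenever the model changes."""
        return hashlib.sha256(model_text.encode("utf-8")).hexdigest()[:12]

    # Two successive versions of the same (hypothetical) logical model.
    v1 = "Customer(customer_id, customer_name)"
    v2 = "Customer(customer_id, customer_name, signup_date)"

    history = [
        ("2005-01-10", model_fingerprint(v1), "initial model"),
        ("2005-02-02", model_fingerprint(v2), "added signup_date for new report"),
    ]

    for date, fingerprint, note in history:
        print(date, fingerprint, note)  # what changed, when and why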

Finally, if an enterprise is performing business process modeling, the tools used to support those efforts need to interconnect with its data modeling tools.

Best practices for application development are to use source code management tools and processes along with release management. That only works effectively if you also manage the data models along with the application code. Too often, they are not managed together, resulting in errors when one side changes. In DW, BI and corporate performance management projects, your data modeling, database, ETL and BI tools need to exchange meta data and support versioning of both the code and associated data. A release is a snapshot of your application code and data with accompanying documentation.
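
One lightweight way to make that snapshot concrete is a release manifest that pins the code, the data models and the documentation together. This is only a sketch; the version identifiers and file names are hypothetical:

    import json

    # Hypothetical release manifest: everything needed to reproduce a release.
    release = {
        "release": "2005.02",
        "application_code": {"etl_jobs": "r1432", "bi_reports": "r1430"},
        "data_models": {"edw.ldm": "a1b2c3d4", "finance.ldm": "e5f6a7b8"},
        "database_schema": "warehouse_ddl_2005_02.sql",
        "documentation": ["release_notes_2005_02.pdf"],
    }

    print(json.dumps(release, indent=2))  # one snapshot: code, data models, docs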

The key to success is managing your data models just as you do your application source code. You need the data models documented and versioned to understand what was used, when it was used and why it was used. With today’s regulatory environment, it is more important than ever to know where your data is, how it got there and what the impact of any changes will be. Best practices call for treating data models as assets and managing them as such. That is why it is called data management.
