The thought of meta data management doesn’t exactly inspire whoops of joy from the average IT manager. However, like flossing, it’s one of those things you’ve just got to do – and for good reason.
Done right, meta data management makes it easier to manage and understand data – an especially good thing in this age of Sarbanes-Oxley. It improves data quality and makes it auditable. So why hasn’t everyone done it?
Let’s divide the world into three camps regarding meta data: those who ignore it, the zealots and the clueless. Plenty of companies have devoted significant time, resources and money to meta data management with the zeal to make it successful. However, for the clueless, which is most companies, meta data management needs to start small, produce value and then expand. For these firms, where do you start?
The classic definition of meta data is data about data. Meta data is the description of the data as it is created, transformed, stored, accessed and consumed in the data integration framework. Businesspeople need to know what the data represents – where it came from, how it was transformed and what it means. IT people need to know what happened to the data from the point of capture through its consumption by the business in their reports and analysis.
There are two types of meta data: technical and business. This distinction has been poorly understood and has caused much confusion, especially with IT and software vendors. Technical meta data is the description of data as it is processed by software tools. Databases, for example, need to define columns (format, size, etc.), tables and indexes; extract, transform and load (ETL) tools need to define fields, mappings between source and targets, transformations and workflows; and business intelligence (BI) tools need to describe fields and reports. All of this meta data is used to enable the software tools (not people) to understand and process data.
Business meta data, in contrast, is the description of information from the business perspective (e.g., the business context of the inventory turns, weekly sales or budget variance reports). Some of the meta data that describes a report is technical – such as field size and type. However, most of the meta data the businessperson cares about is not used by the software tools. BI tools have implemented “semantic layers” where text can be associated with fields to allow the input of business descriptions. While helpful, this is not nearly extensive enough to encapsulate the full business meta data needed.
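To make the distinction concrete, here is a minimal sketch in Python that keeps the two kinds of meta data for one data element side by side. The class names, field names and sample values are all assumptions made up for illustration; they do not come from any particular database, ETL or BI product.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """What a software tool needs in order to process a field."""
    table: str
    column: str
    data_type: str
    size: int


@dataclass
class BusinessMetadata:
    """What a businessperson needs in order to interpret the same field."""
    business_name: str
    definition: str
    owner: str
    source_system: str


@dataclass
class DataElement:
    technical: TechnicalMetadata
    business: BusinessMetadata


# Hypothetical element; every value below is invented for illustration.
weekly_sales = DataElement(
    technical=TechnicalMetadata("sales_fact", "wk_sls_amt", "DECIMAL", 12),
    business=BusinessMetadata(
        business_name="Weekly sales",
        definition="Net sales invoiced during the fiscal week, after returns.",
        owner="Sales operations",
        source_system="Order entry",
    ),
)


def describe(element: DataElement) -> str:
    """Render the business view of an element alongside its physical location."""
    t, b = element.technical, element.business
    return f"{b.business_name} ({t.table}.{t.column}): {b.definition}"


print(describe(weekly_sales))
```

The software tools would only ever read the first structure. The second is the part that usually has to be gathered by talking to people.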
Managing meta data can be a dirty job, but a few guidelines can smooth out the process.
First, manage the scope. If you try to “boil the ocean” you will get burned. Unless budget is not an issue, constrain your goals in the beginning. This will keep you from spending too much time and money, wasting resources and delaying the project. You will probably only get one chance, so start small.
Second, address the cultural and political issues. Know who owns the data, who has time to define it and if management really is committed to the project. It is nice to get everyone excited about meta data management, but is it on someone’s priority list? Is it considered someone’s real job?
Third, establish standards and processes that ingrain meta data management into business and IT people’s jobs. Standards and processes need to be part of what we do, no excuses. The incentive of Sarbanes-Oxley and other regulations should spur management to back a reasonable meta data initiative.
Finally, don’t count on technology to be a silver bullet for meta data management. It’s dirty, it’s ugly and you typically need to piece together a solution. In addition, a lot of the work involves talking with businesspeople about what the data means and finding where the meta data is. Don’t be scared away. Keep the scope small enough that your technology can handle it without significant custom software development.
There are several levels of meta data that can be managed:
Data element descriptions: Documenting data element descriptions should be part of the requirements gathering of any BI or data warehouse (DW) project. As such, you have already done the heavy lifting for this type of meta data. This is where most meta data initiatives start, and typically a data dictionary is created. Although the data elements are initially documented in the project’s business requirements document, they are usually stored and queried from a data modeling tool or BI tool accessing a customized data dictionary stored in a relational database. More sophisticated projects may have purchased and deployed a meta data repository.
ETL source-to-target mapping: If you have used an ETL tool rather than custom code, then this should be an easy step. Your ETL tool should have a catalog that can be used for documentation and for examining the ETL workflow. Data lineage, at least within the context of your ETL code, can be examined and queried to determine both how a data element was transformed and what the impact of a change would be on that data element or workflow. This meta data, although of interest to the business, is used mainly by IT and improves IT productivity.
BI query cataloging: This elementary meta data management step includes BI query cataloging of reports, pre-built queries and data element definitions. Most of today’s top tier BI products come with BI catalogs that store information and allow queries from BI query tools. In addition, portals are also available within these BI product suites that allow grouping of reports, queries and data elements that business users can access.
The previous three levels of meta data management are relatively simple, and there is technology available to help you with them. Now you need to worry about documentation – both creating and maintaining it and using the available features of your ETL and BI tools to manage and promote meta data.
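As a rough illustration of the first two levels, the sketch below keeps a small data dictionary and a set of source-to-target mappings in a relational database (SQLite, in memory), then answers the two lineage questions mentioned above: how an element was derived, and what a change would affect. The table names, element names and transformations are hypothetical, and a real ETL tool would expose this through its own catalog rather than through tables like these.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_element (
    name        TEXT PRIMARY KEY,  -- physical name, e.g. table.column
    description TEXT NOT NULL      -- business definition from requirements
);
CREATE TABLE etl_mapping (
    source         TEXT NOT NULL,
    target         TEXT NOT NULL,
    transformation TEXT NOT NULL
);
""")

# Hypothetical dictionary entries and mappings, invented for illustration.
conn.executemany("INSERT INTO data_element VALUES (?, ?)", [
    ("orders.amount", "Order line amount in local currency"),
    ("stage.order_amount_usd", "Order amount converted to US dollars"),
    ("dw.weekly_sales", "Net sales per fiscal week in US dollars"),
])
conn.executemany("INSERT INTO etl_mapping VALUES (?, ?, ?)", [
    ("orders.amount", "stage.order_amount_usd", "convert to USD"),
    ("stage.order_amount_usd", "dw.weekly_sales", "sum by fiscal week"),
])


def definition(conn, element):
    """Look up the business definition of an element in the dictionary."""
    row = conn.execute(
        "SELECT description FROM data_element WHERE name = ?", (element,)
    ).fetchone()
    return row[0] if row else None


def upstream_of(conn, element):
    """Return (source, transformation) steps feeding `element`, nearest first."""
    steps, seen, frontier = [], {element}, [element]
    while frontier:
        current = frontier.pop(0)
        rows = conn.execute(
            "SELECT source, transformation FROM etl_mapping WHERE target = ?",
            (current,),
        ).fetchall()
        for source, transformation in rows:
            steps.append((source, transformation))
            if source not in seen:  # guard against cyclic mappings
                seen.add(source)
                frontier.append(source)
    return steps


def impacted_by(conn, element):
    """Return every downstream element a change to `element` could affect."""
    seen, frontier = set(), [element]
    while frontier:
        current = frontier.pop()
        rows = conn.execute(
            "SELECT target FROM etl_mapping WHERE source = ?", (current,)
        ).fetchall()
        for (target,) in rows:
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return sorted(seen)


print(definition(conn, "dw.weekly_sales"))
print(upstream_of(conn, "dw.weekly_sales"))
print(impacted_by(conn, "orders.amount"))
```

Keeping both tables in one store is a design choice for the sketch: it lets a single query tool serve the business question (what does this field mean?) and the IT question (what breaks if I change it?).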
The next three steps are where meta data initiatives need to invest more time, resources and money to achieve results. The business benefits can be significant, but there are risks.
DW/BI complete data life cycle: Tracking the complete data life cycle within a DW environment is a lofty objective, but if you constrain its scope, it will yield significant business benefits. If you have succeeded in the initial three levels, you should at least examine stepping up to this level. We’ll discuss the constraints in the following section.
Business process mapping and enterprise-wide data and process controls: The last two levels go well beyond your DW programs and should not be attempted by the fainthearted. There are successes in those areas, but if you have that level of commitment and funding, you do not need to read this column.
The major approaches to meta data integration encompass:
You will be pleased to know (if you are from the misery-loves-company camp) that Gartner estimates that “at least 70 percent of organizations live with meta data descriptions in multiple technologies. Most have made little effort to coordinate the descriptions …” [1] Further, the remaining 30 percent are split between acquiring a meta data repository and building their own.
Although most project teams have opted to sit on the sidelines when it comes to meta data integration, it is about time for many of them to add meta data integration capabilities. DW and BI vendors have improved bridging and expanded their own meta data repositories. The key tools used in DW projects are data modeling, ETL, database and BI tools. The major vendors generally support meta data exchange between their product and the other categories of tools, which means you can transfer meta data from your data modeling tool to your database, then to your ETL tool and finally to your BI tool.
It boosts productivity and data consistency when meta data is exchanged between these tools rather than being manually (and redundantly) input into each category of tool. If each tool is treated separately, the chances for errors and inconsistencies increase.
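The idea of defining meta data once and passing it along can be sketched in a few lines of Python. This is only an illustration of the principle, not how any vendor’s bridge works: the column list, the function names `to_ddl` and `to_bi_catalog`, and the output formats are all assumptions.

```python
# One shared definition per column (hypothetical names and values).
COLUMNS = [
    {"name": "fiscal_week", "type": "INTEGER", "label": "Fiscal week",
     "description": "Fiscal week number, 1 to 53."},
    {"name": "net_sales", "type": "DECIMAL(12,2)", "label": "Net sales",
     "description": "Sales invoiced in the week, after returns."},
]


def to_ddl(table, columns):
    """Render the technical meta data a database needs as CREATE TABLE DDL."""
    cols = ",\n".join(f"    {c['name']} {c['type']}" for c in columns)
    return f"CREATE TABLE {table} (\n{cols}\n);"


def to_bi_catalog(table, columns):
    """Render the business descriptions a BI semantic layer could display."""
    return {
        f"{table}.{c['name']}": {
            "label": c["label"],
            "description": c["description"],
        }
        for c in columns
    }


print(to_ddl("weekly_sales", COLUMNS))
print(to_bi_catalog("weekly_sales", COLUMNS))
```

Because both outputs are generated from the same list, a renamed column or a corrected description changes in one place and cannot drift between the database and the BI catalog.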
In addition, some of the ETL vendors have expanded their offerings to include data integration suites with meta data integration. In essence, you can use the ETL tool as your meta data hub. This is especially appealing when the data-integration vendor has extended their repository to many types of meta data along with data lineage, audit and management capabilities.
Another approach is to standardize on one of the major BI vendors that has expanded their product suite to include ETL tools that exchange meta data with their BI products. With the addition of the meta data exchange between the ETL and BI tools, you are able to track data lineage as the data flows through the ETL and BI tools. In addition, the meta data exchange often enables you to populate the BI catalog with your ETL meta data, thus ensuring consistency and improving productivity. This results in a much more robust meta data management capability.
This is not to suggest that either of the previously mentioned alternatives would match the capability of an enterprise-wide repository or a custom-built meta data management environment. However, without a significant management commitment or resources, the suggestions herein are a more pragmatic approach to gaining a significant ROI and success from your meta data management initiatives.
Ignoring meta data management should not be an option in today’s business climate. Companies need consistent and accurate data that they understand and can verify. Meta data management processes and tools are readily available. What’s needed is the desire and commitment, along with a reasonable scope and measurable goals.
Reference: