Metadata – The Missing Link

August 13, 2003

Originally published by DBMS Magazine (Intelligent Enterprise/Information Week)

Enterprises are devoting tremendous resources to building data warehouses and data marts. This type of decision-support activity is now part of the IT mainstream. Product vendors, systems integrators, and consultants are mobilized to help IT in its efforts. But often, after much hard work and investment, business users are disappointed with the results. Did the IT groups, vendors, and consultants miss something?

Database query tools have proliferated over the past few years. There have been more than 100 of these tools in the marketplace at various times. Despite allowing business users access to virtually any database that IT can build, these tools have not gained the widespread usage that spreadsheets or word processors enjoy. How do business users locate information with these tools? How do they know what the data represents? How do they get the information they need? Without being able to answer these questions, business users cannot make effective use of these tools or the data warehouse.

If You Build It, They Will Come. . .

Expectations for data warehouse projects are established by an initial enthusiastic group of business users. This is often reinforced by a successful pilot project with these same users, who raise expectations even further. These business users are generally the innovators and early adopters of technology. Geoffrey Moore offers an excellent examination of the Technology Adoption Life Cycle in his two books, Crossing the Chasm and Inside the Tornado (both published by HarperBusiness in 1995). These business users enjoy exploiting new technology as part of their jobs, hoping it will give them an edge in their business. Most business users, however, are more pragmatic and will use new technology only if it has been proven to make their jobs easier. Furthermore, the new technology must not require a significant investment of time on their part. These business users do not assume that it’s always necessary to use the latest technology. They reap the technology and information that have been harvested for them by the early adopters.

It is very common for IT to follow the philosophy of “If we build it, they will come.” This philosophy is reinforced by the “data explorers” who are self-sufficient with new technology and eager to find new information assets. Data explorers are users of the various query and OLAP tools who enjoy exploiting new technology in their jobs. They delight at the success of finding new pieces of information while using these new tools. Data explorers have a disproportionate influence on all parties building data warehouses. They create the false expectation that business users will leap at data warehouses and find new, exciting information jewels previously locked in data basements (legacy applications to which business users could not or would not gain access). Typical businesspeople need some help and support in that endeavor. They will not invest the time in the new technology just for the joy of using it.

Data Rich, Information Poor

Initially the Internet was used by a small number of technical people. The World Wide Web and Internet browsers expanded its use significantly. However, as the content expanded exponentially, search engines such as AltaVista and Yahoo were needed to help people find information. But even the search engines were not enough, because inquiries returned thousands of choices. A skeptic once said that the Internet contains all the information you will ever need to know but cannot find. Millions of people, however, still use AOL and CompuServe, because these services organize the information in a more useful way. Recently, PointCast and others have incorporated push technology to broadcast information to users. Users select data published from various information channels, which are organized by content. PointCast will then “push” any updated information from those channels to users as requested. Both approaches, regardless of their underlying technologies, are successful because they offer an organized information catalog for users to browse and select information from.

The Missing Link

The Business Information Directory (BID) is the missing link needed to open up data warehouses to the business community. It is the catalog of information that is available for decision support throughout the enterprise. This information includes data warehouses, data marts, OLAP, data mines, workgroup applications, and personal analytical databases (spreadsheets).

The cornerstone of the BID is the “M” word: metadata. IT personnel cringe and business users’ eyes glaze over when metadata is mentioned. Metadata, however, is a means to an end: an enabler of the desired goal of making decision-support data accessible to the business community throughout an enterprise. The two usual approaches to metadata are at opposite ends of the spectrum: It is either ignored or praised with zealous fervor. If ignored, metadata will proliferate with every tool brought into the data warehouse environment. If approached as a “religion,” it will focus IT on the wrong issues. The balanced approach is to treat it as a resource to be harnessed in successful decision-support environments.

Metadata is data about data. There are two categories of metadata: technical and business. Technical metadata is the description of the data needed by various tools to store, manipulate, or move data. These tools include relational databases, application development tools, database query tools, data modeling tools, data extraction tools, online analytical processing (OLAP) tools, and data mining tools. Business metadata is the description of the data needed by business users to understand the business context and meaning of the data. Technical metadata has spread like wildfire across the enterprise as more tools and types of tools are used to build decision-support systems (DSSs). Business metadata is contained in the business requirements and specifications for DSSs. It is often only online in the Word documents used in designing these systems. After it is used in the design phase, the business metadata is generally “shelfware” (collecting dust in three-ring binders on the business analyst’s shelf).
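To make the two categories concrete, here is a minimal sketch of a single data element described both ways. The class names, field names, and sample values (SLS_FCT, NET_AMT, and so on) are hypothetical illustrations, not taken from any product discussed in this article.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """What tools need in order to store, manipulate, or move the data."""
    table: str
    column: str
    data_type: str
    source_system: str


@dataclass
class BusinessMetadata:
    """What a business user needs in order to interpret the data."""
    business_name: str
    definition: str
    owner: str


@dataclass
class CatalogEntry:
    """One data element described from both points of view."""
    technical: TechnicalMetadata
    business: BusinessMetadata


def describe(entry: CatalogEntry) -> str:
    """Render the entry in business terms, hiding the physical names."""
    b = entry.business
    return f"{b.business_name}: {b.definition} (owner: {b.owner})"


# Hypothetical sample element.
entry = CatalogEntry(
    technical=TechnicalMetadata("SLS_FCT", "NET_AMT", "DECIMAL(12,2)", "order entry"),
    business=BusinessMetadata(
        "Net sales", "Invoiced amount minus returns and discounts", "Finance"
    ),
)

print(describe(entry))
```

The point of keeping both halves in one record is that the business description stays attached to the physical element instead of ending up as shelfware.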

Business Information Directory Functionality

The Business Information Directory supports three main functions. First, the BID enables information discovery. The business user needs to find out what information is available. Data is worthless if the user does not know it is there. In fact, as the amount of available data grows (measured by the number of data subjects, facts, and dimensions), business users’ ability to find what they need generally decreases. Decision-support information is located in databases and directories across the enterprise. The BID should be the equivalent of Yahoo or AOL in giving the business user a friendly, effective way to find out what information is available.

Second, the BID promotes business understanding. Just knowing that data exists is not enough. What the data represents is crucial to business users. They need to determine if the information is pertinent to them and how to interpret it. Terms such as sales and profit can mean vastly different things to various business groups within an enterprise. Business users need to understand the context of the data in order to use it properly.

Finally, once the business users know the data exists, they want it. They may want to access it now, or they may want it delivered to their desktop on a regular basis. The latter would be necessary for them to perform repetitive tasks such as weekly or monthly reports. Business users, accustomed to double-clicking on links on a Web page, want similar functionality in their decision-support systems.
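The first two functions, discovery and understanding, can be sketched as a keyword search over catalog entries that carry a business definition. This is only an illustration under assumed names: the Entry fields, the sample entries, and the discover and explain helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    location: str      # where the data lives (warehouse, mart, spreadsheet, ...)
    owner_group: str   # business group whose definition this is
    definition: str


# Hypothetical catalog: the same term can mean different things to different groups.
CATALOG = [
    Entry("Sales", "enterprise data warehouse", "Finance",
          "Invoiced revenue net of returns"),
    Entry("Sales", "marketing data mart", "Marketing",
          "Booked orders, including unshipped"),
    Entry("Profit", "enterprise data warehouse", "Finance",
          "Net sales minus cost of goods sold"),
]


def discover(keyword: str) -> list[Entry]:
    """Information discovery: case-insensitive match on name or definition."""
    k = keyword.lower()
    return [e for e in CATALOG if k in e.name.lower() or k in e.definition.lower()]


def explain(e: Entry) -> str:
    """Business understanding: show whose definition applies and where the data is."""
    return f"{e.name} ({e.owner_group}, {e.location}): {e.definition}"


for hit in discover("sales"):
    print(explain(hit))
```

A search for "sales" returns both groups' definitions side by side, which is exactly the context a business user needs before deciding which figure to use.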

Recently, some of the more sophisticated query tools have been built as Managed Query Environments (MQEs). This is an attempt to make the query tools more business-user friendly by using business terminology in developing the queries. An MQE accomplishes this through a semantic layer (metadata) that hides the physical names of tables and columns behind views and synonyms expressed in business terms. This can be viewed as a limited information catalog. MQEs are a great enhancement over the earlier generations of query tools, which presented physical table and column names to end users, and MQE support should be a selection criterion when query tools are being evaluated. But their semantic layer, or information catalog, is too limited to extend across the data warehouses, data marts, and other sources that need to be covered.
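A semantic layer of this kind can be pictured as a lookup from business terms to physical names. The sketch below is a simplified illustration, not how any particular MQE product works; the table and column names and the resolve and build_query helpers are assumptions, and the sketch deliberately handles single-table queries only.

```python
# Hypothetical semantic layer: business term -> (physical table, physical column).
SEMANTIC_LAYER = {
    "Customer Name": ("CUST_DIM", "CUST_NM"),
    "Region": ("CUST_DIM", "RGN_CD"),
    "Net Sales": ("SLS_FCT", "NET_AMT"),
}


def resolve(term: str) -> str:
    """Translate one business term into an aliased physical column reference."""
    table, column = SEMANTIC_LAYER[term]
    return f'{table}.{column} AS "{term}"'


def build_query(terms: list[str]) -> str:
    """Build a SELECT statement from business terms.

    Joins are out of scope for this sketch, so all terms must
    resolve to the same physical table.
    """
    tables = {SEMANTIC_LAYER[t][0] for t in terms}
    if len(tables) != 1:
        raise ValueError("this sketch handles single-table queries only")
    select_list = ", ".join(resolve(t) for t in terms)
    return f"SELECT {select_list} FROM {tables.pop()}"


print(build_query(["Customer Name", "Region"]))
```

The business user picks "Customer Name" and "Region"; only the tool ever sees CUST_DIM.CUST_NM. The limitation noted above also shows here: the mapping covers one tool's view of one database, not the full set of decision-support sources.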

The Users of a Business Information Directory

The potential customers for the BID are business users and members of the IT groups building and operating the data warehouse. The former includes both data explorers and data farmers. Data farmers, however, are not interested in using the query tools just for the sake of using new technology. As experienced spreadsheet and word-processing users, they use these technologies as tools in their jobs. They harvest the data that the data explorers find and turn it into information using spreadsheets to analyze this data. Data explorers and IT personnel can find and access data within the data warehouse with various database access and OLAP tools. They accomplish this by spending the time looking for the data. However, data farmers cannot exploit these tools effectively because of the time requirements.

The BID’s initial targets are the data farmers of the business community. They need an information catalog in which they can search for information, understand it, and get it. It is important to note, however, that if a BID were available to the data explorers and IT personnel, they, too, would benefit because they could exploit the data warehouse more effectively. Data explorers and IT personnel, however, may not perceive the need for a BID because they think they already have tools to access the data warehouse.

The target market shapes what functionality the BID offers, which in turn determines what is stored in its information catalog. Vendors, consultants, and IT all have the data explorers in mind when considering the need for or designing BIDs. In fact, from the data explorers’ point of view, an information catalog may not be as critical because they are willing to search for information on their own. However, as previously noted, data explorers would benefit significantly from a BID.

The BID serves two purposes for the data farmer. First, it acts as the librarian who researches what information is available and pertinent for the business user. Second, it is a mail-order catalog from which business users can order the information to arrive when they need it. This latter purpose is similar to PointCast in that business users want the information delivered to their desktops to use in their work.

BID Components

The BID has four components: the Information Navigator, the Information Catalog, the Administrator, and the Information Delivery Agent. Most products include the first three components, but not all products currently implement an Information Delivery Agent.

The Information Navigator is the business user interface. It provides the navigation, understanding, and access functionality for the BID. It interacts with the other BID components and invokes the various tools the business user needs to access and manipulate information. This is the business user’s view into data warehouses, data marts, workgroup databases, and personal databases.

The Information Catalog is the brains of the BID. It stores the metadata needed to provide BID functionality. Various import and export facilities as well as APIs are used to move metadata between different metadata sources and the BID.

The Administrator is a superset of the Information Navigator that IT uses for BID administration. Its functions include maintaining the Information Catalog, managing business users’ access capabilities, maintaining security, and updating metadata not handled by the import and export capabilities.

The Information Delivery Agent moves the information requested by business users to their desktop or workgroup applications. This is equivalent to a push model in which the business user requests information to be delivered and it is published onto the user’s desktop.
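The push model behind the Information Delivery Agent can be sketched as a simple subscription mechanism. Everything here is a hypothetical illustration: the class and method names are invented, and a desktop is modeled as a plain Python list standing in for whatever destination a real product would deliver to.

```python
from collections import defaultdict


class InformationCatalog:
    """Holds the current content of each named information object."""

    def __init__(self):
        self._objects = {}  # object name -> current content

    def publish(self, name, content):
        self._objects[name] = content

    def get(self, name):
        return self._objects[name]


class InformationDeliveryAgent:
    """Pushes requested information objects to subscribers' desktops."""

    def __init__(self, catalog):
        self._catalog = catalog
        self._subscribers = defaultdict(list)  # object name -> list of desktops

    def subscribe(self, name, desktop):
        """Register a business user's request for an information object."""
        self._subscribers[name].append(desktop)

    def deliver(self, name):
        """Push the current content to every subscriber; return the delivery count."""
        content = self._catalog.get(name)
        for desktop in self._subscribers[name]:
            desktop.append((name, content))
        return len(self._subscribers[name])


catalog = InformationCatalog()
agent = InformationDeliveryAgent(catalog)

analyst_desktop = []  # stand-in for a user's desktop or workgroup application
agent.subscribe("weekly sales report", analyst_desktop)

catalog.publish("weekly sales report", "week 1 figures")
agent.deliver("weekly sales report")
print(analyst_desktop)
```

In a real deployment the deliver call would be triggered by a schedule or an event such as a completed warehouse load; here it is invoked by hand to keep the sketch short.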

Market Observations

The Business Information Directory market is very immature. In the terms of Crossing the Chasm, we are currently in the Innovators and Early Adopters stages of the Technology Adoption Life Cycle, and we have been in these stages for a few years. Only a handful of products are on the market today, and they are very new. Many innovators and early adopters built their own BIDs, which greatly enhanced their data warehouse efforts. Several products on the market are the result of internal IT projects or consulting engagements whose owners are trying to transform these efforts into commercial products.

It is also a poorly understood market. Most vendors do not understand what the business users’ needs really are. Vendors usually work with IT groups and therefore view the need for a BID through IT’s eyes, which leads to a belief that users simply want access to databases. But this functionality is just the means to an end. The real objective is information access, which means finding and understanding the information in its business context, not the way a database administrator would find it. Like the vendors, IT does not fully appreciate the extent of the problems and needs. Most IT people are too busy to deal with metadata. Because of the ever-increasing pressures to deliver projects quickly, items that do not have a perceived immediate impact, such as metadata, are postponed. And those IT groups that do not postpone dealing with metadata are frustrated by vendor solutions that are, at best, partial solutions addressing a limited set of metadata sources.

BIDs are also very diverse in nature. Most BIDs were created during specific customer engagements or as add-ons or extensions to existing product lines. The products from Prism Solutions Inc., Platinum Technology Inc., IBM Corp., Logic Works Inc., and Virtual Integration Technology Inc. were all initially built under these circumstances. As such, they address the particular metadata integration needs encountered for that specific engagement or product line. The resulting BIDs need to be expanded to meet the wide variety of environments encountered in the general marketplace. In addition, the engagements in which the BIDs were created were consulting or specific IT projects, with a lot of personal attention paid to tailoring them to be successful. With the move to a commercial product, the extensive consultative support is eliminated, and implementation success is greatly diminished.

The Market

The products available in the market that I will discuss are:

  • Prism Warehouse Directory
  • Platinum Data Shopper
  • IBM DataGuide
  • Logic Works Universal Directory
  • Virtual Integration Technology deliveryMANAGER

Prism Warehouse Directory

The BID with the most market visibility is the Prism Warehouse Directory (PWD). Prism, founded by Bill Inmon, helped define and expand the data warehouse market. The company’s main product is Prism Warehouse Executive (PWE), the successor to Prism Warehouse Manager, which addressed building data warehouses by extracting, transforming, and loading data from legacy systems. This process involved mapping source and target systems, with code generated to perform the extraction, transformation, and loading. Because all of the metadata supporting these operations was entered into the tool’s data store, metadata documentation and management were provided as well.

The Prism Warehouse Directory was a natural extension of the Prism Warehouse Executive — a great deal of the technical metadata for the BID was already available. The initial releases of the PWD were geared toward IT and data explorers and oriented toward the physical aspects of storage and transformation between sources, which was the purpose of the PWE. At that time, the BID was a totally passive catalog; users found references to the information they desired, wrote down where it was located, and then went into other tools to access the data.

This BID has progressed significantly since its inception. Prism has partnered with several vendors to create import and/or export capabilities with repository, CASE, data modeling, and MQE tools. This greatly expands the metadata available in the information catalog. In addition, Prism has added the capability to launch applications once information is located. This moves the BID from a passive to an active catalog. Prism Warehouse Directory Web Access allows Web access to the BID and expands access to data by enabling users to build and launch queries to databases.

The Prism Warehouse Directory has been installed by approximately 100 companies. It has three components: Directory Builder (administrative tool), Directory Navigator (end-user tool), and the Information Directory. It can be purchased standalone at $50,000 with five Navigator seats or bundled with the Prism Warehouse Executive. Almost all purchases of PWD are bundled with PWE.

Although it has made great strides in expanding its audience, PWD is still centered around the sourcing of data into data warehouses or data marts. This is a key application of metadata, but it is still technically oriented and will appeal to IT and data explorers. If you are already a PWE customer, it is natural to utilize PWD. If you are not using PWE, you should evaluate other options.

Platinum Data Shopper

In my view, Platinum Technology’s Data Shopper has the largest market share of the commercial BIDs. This product was acquired through Platinum’s purchase of RelTech in 1995. Data Shopper uses the Platinum Repository (an integration of the repositories from RelTech and BrownStone Solutions) as its information catalog. Most of the installed base, which is approximately 300 sites for the Platinum Repository, has purchased Data Shopper.

The metaphor used is that of file cabinets and folders. Information content is organized into “file cabinets,” which are logically business subjects or topics. These are further divided into business categories. Business rules, logic entities, data structures, data elements, and data usage tabs are also provided.

Data Shopper is marketed as a tool for business users to browse and understand what is contained in a data warehouse (via a repository). Business users can find information that they might not have otherwise known existed. They can identify, understand, and locate objects such as database tables and columns, queries, reports, spreadsheets, Word documents, application programs, and other information stored in the repository.

Data Shopper lists for $500 per seat, with volume discounts applying. However, Platinum Repository is required for the information catalog. The MVS version will easily sell for more than $100,000, and the Open Edition will approach $100,000 when loaded with various options. So the cost of admission is more than $100,000, plus a commitment to using Platinum Repository. The merits of repositories in general and Platinum’s in particular are beyond the scope of this article. If you have the Platinum Repository, you should implement Data Shopper. If not, then first consider whether you should purchase Platinum Repository on its own merits.

IBM DataGuide

IBM’s DataGuide is sometimes lost among the company’s large number of product offerings. DataGuide, sold both on its own and bundled with IBM Visual Warehouse, was limited initially to “IBM shops,” with its first offering being OS/2 only and requiring DB2/2. It has now been released on Windows 95/NT and should offer Web-based access in the future.

DataGuide provides business users with an information catalog containing metadata about both structured (databases) and unstructured (files) data. This data is treated as an information object and can be grouped together in a variety of ways. The information catalog is extensible, with the capability to add different types of objects. Import and exports are achieved through published APIs or through a published command language interface. Initially, the only metadata exchange occurred among DB2 family products, but partnerships with market-leading OLAP and MQE vendors have expanded this capability.

DataGuide consists of three tools: DataGuide User, DataGuide Administrator, and Information Catalog. The User interface presents a tree structure of objects that the business user expands to get the contents of folders or more details. Business metadata and help are available on each object. Once information has been found, the business user can launch an application to access that information.
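A tree of information objects like the one the User interface presents can be modeled with a small recursive structure. This is a generic sketch, not DataGuide’s actual object model or API; the Node class, the render function, and the sample folders are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A folder or information object in the catalog tree."""
    name: str
    description: str = ""          # business metadata shown for the object
    children: list["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Return the fully expanded tree as indented lines, depth first."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical catalog contents.
catalog = Node("Catalog", children=[
    Node("Sales", children=[
        Node("Weekly sales by region", "Invoiced sales summed by region and week"),
    ]),
    Node("Customers"),
])

print("\n".join(render(catalog)))
```

An interactive tool would expand one level at a time as the user clicks; the sketch expands everything at once to keep it short.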

DataGuide has been installed at approximately 100 companies. It costs $209 for the User tool and $1,149 for the Administrator tool; volume discounts apply. In addition, a version of DB2 on NT, OS/2, or MVS must be purchased for the Information Catalog. This is the lowest-cost tool examined in this article, but low cost does not equate to usefulness or functionality. The only prerequisite that may hinder its implementation is the use of DB2/NT or DB2/2 for its Information Catalog. It would be more robust if the other major relational databases were also offered. But the cost of DB2/x is low and its use is limited (note: the data warehouse can be in any relational database; only the Information Catalog needs to be in DB2/x), so this should not be a criterion for rejecting this BID. It is well worth the cost to explore this BID as a starting point for implementing BID functionality.
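As a rough worked example of the list prices quoted above, the following sketch totals the DataGuide tool licenses for a pilot group. It assumes list prices with no volume discount and leaves out the DB2 license for the Information Catalog, whose price is not given here; the function name and the ten-user pilot size are illustrative assumptions.

```python
USER_TOOL_PRICE = 209      # list price per DataGuide User tool, from the text
ADMIN_TOOL_PRICE = 1_149   # list price per DataGuide Administrator tool, from the text


def dataguide_list_cost(users: int, admins: int = 1) -> int:
    """Total list price for the tools only.

    Assumptions: no volume discount, and the required DB2 license
    for the Information Catalog is not included.
    """
    return users * USER_TOOL_PRICE + admins * ADMIN_TOOL_PRICE


# Hypothetical pilot: ten business users and one administrator.
pilot_cost = dataguide_list_cost(users=10)
print(pilot_cost)
```

For ten users and one administrator this comes to 10 × $209 + $1,149 = $3,239 before the DB2 license, which puts the tools well below the entry prices quoted for the other products in this article.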

Logic Works Universal Directory

The Universal Directory was announced on April 1, 1997. Logic Works understands metadata for building databases, given its successful track record with the ERwin data modeling tool. This BID evolved from the idea of using the models generated during the design phase of your data warehouse or data mart as the base of metadata management. This metadata would then be expanded to incorporate more full-featured capabilities.

Universal Directory uses a three-tier architecture with the following components: Universal Explorer (business user interface), Directory Administrator (administration tool), Data Server (manages flow of data between clients and information directory), License Server (manages concurrent use of client tools), and the Information Directory (stored in Microsoft SQL Server, Sybase SQL Server, or Oracle). ModelMart, which handles the model management database (stored with the Information Directory), is also required. Other optional products that integrate with these tools are ERwin/Open and ERwin/Navigator (used for viewing and editing data models, including star schemas), Micro Focus Revolve (used for scanning legacy data), and Sterling CLEAR:Access (query tool used to access a data warehouse). Clients work on Windows 95 or Windows NT while the servers work on Windows NT.

Universal Directory sells for $30,000 for 10 Navigators, one Administrator, and one ModelMart. The company had at least a half dozen purchases as the product was formally announced. The product is very new and does not have extensive metadata import and export capabilities. Logic Works’ approach does favor IT and data explorers, especially those familiar with data modeling. However, the company has included BID capabilities to attract the data farmer. This is definitely a tool to watch and evaluate as it matures.

Virtual Integration Technology (VIT) deliveryMANAGER

Virtual Integration Technology’s deliveryMANAGER is a BID concentrating on distributing data from a variety of decision-support systems (data warehouses, data marts, and so on) and file servers. This BID enables business users to find information, place orders for that information, and have it delivered to their desktops.

The VIT deliveryMANAGER components are deliveryAGENT, metaWAREHOUSE, and deliveryADMIN.

The deliveryAGENT is the Web browser or Windows user interface to the information directory. Both structured and unstructured data can be cataloged and delivered. Information is arranged as information objects called collections. Business users search for information by subject and topics of interest; they can also obtain relevant business metadata. Business users can subscribe to this information and have it delivered to their desktops, file servers, email, or Web servers. Data delivery can be based on time or events.

The metaWAREHOUSE is the information catalog (currently stored in Oracle) that integrates technical and business metadata. Both structured and unstructured data can be cataloged.

The deliveryADMIN is the administrative tool used to manage the information directory. It handles user security, registration of all information objects, the building of collections, and monitoring information usage. This is implemented on Unix and Windows NT.

The VIT deliveryMANAGER costs $50,000. VIT is a consulting firm that is transforming itself into a product company. It has obtained venture financing but funded initial product development through consulting engagements. deliveryMANAGER has approximately 10 installations. It is the only BID mentioned that has implemented an information delivery capability in addition to the information discovery and understanding functions. It is based on a well-engineered technical architecture, and the company gained hands-on implementation experience while developing it. It is well worth evaluating, with the biggest qualification being the risk level associated with a startup.

Recommendations

With all of the resources being used in building data warehouses and data marts, it is imperative to make the results of these projects usable by business users. Without this usage, these projects will fail to meet user expectations. Implementing a Business Information Directory produces the significant benefit of making the information visible, understandable, and available. In short, it can be the difference between success and failure.

Data warehouse and data mart projects need to incorporate metadata management and BIDs as part of their objectives. Even with the immature state of the market, the currently available products offer advantages over ignoring these issues and capabilities. Many of the early data warehouse projects built their own BIDs, which is still a viable alternative. However, many IT shops today do not have the resources or time to implement their own custom-built solutions.
