What do you call that thing that’s holding your data? Don’t know? Then it must be an operational data store (ODS).
No term has been muddied and misused more in the data management field than ODS. We have consensus definitions for data warehouses and data marts; however, ODS seems to mean anything that isn’t a data warehouse or data mart. Whereas people understand how to architect, design and build data warehouses and data marts, they are stumped when it comes to an ODS.
In fact, building an ODS is like the lawless Wild West of data management because there is no standard architecture and design. Ask someone what an ODS is and you’re likely to hear, “We will know it when we see it.” Regardless of whether there is a clear definition of an ODS, IT departments have been very busy building them. According to a survey conducted in 2004 by TDWI, the average organization has four and a half ODS systems.
Typically, from a technical perspective, I would start this column by defining what something is and then examine its uses. There has been excellent published work that defines what an ODS is, but to date, it has fallen short because it deals mainly with explaining what has been built rather than why it was built. As a result, people tend to define an ODS by its symptoms rather than the business reasons for building it. Instead of following that approach, I am going to use the methodology of developing a solution for a client, which is: gather the business requirements and then define and design the architecture based on the needs. In this manner, we will first identify the causes and business reasons for an ODS and create the appropriate design. In the second part of this column, I will compare the current conventional wisdom regarding ODS definitions and architectures against the actual business uses for an ODS.
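The business purposes for building an ODS include: integrated reporting across multiple applications; serving as a system of record for categories of data such as customers and products; supporting specific business processes as a data hub; and handling data volumes or update frequencies that the data warehouse cannot.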
We’ll examine these four purposes of an ODS in greater detail.
ODS systems got their start by supporting reporting requirements across multiple legacy applications. They helped fill a gap that existed in fulfilling data integration across these multiple applications because it was not within the scope of the application vendors or the data warehouse groups. The vendors were too busy expanding or acquiring new functionality and modules to focus on integration and reporting across them. Data warehouse groups tended to draw a line between management reporting (what they did) and operational reporting (what they did not do). Data warehouse teams were occupied with building and maintaining their existing projects and did not take on the burden of operational reporting. In addition, updating the data more often than once a day, or in real time, was more than many of them could handle. Thus, the group maintaining the applications managed integrated reporting activities. As a result, the worlds of ODS and data warehousing/business intelligence (DW/BI) diverged into different, overlapping and counterproductive approaches.
Integrated reporting is currently being offered by both ERP and business intelligence vendors. ERP vendors have built ODSs, data warehouses and data marts into their products to meet their customers’ ever-growing demands for reporting. A pre-built ODS is a way for these vendors to integrate across their own modules, as well as their competitors’ products. These ODS systems are architected as a part of an overall data integration and reporting solution offered by the ERP vendor. The solutions work well and can be very cost-effective, helping avoid the need to build custom systems.
BI vendors have developed corporate performance management (CPM) solutions that support both operational and management processes. These solutions are often sold at the executive level to enable scorecards and other management metrics reporting, but they frequently also provide invaluable support for operational processes and associated reporting. With an underlying ODS enabling near real-time updates, the CPM solutions again provide an alternative to custom-built solutions on top of enterprise applications.
When used as a system of record for categories of data, an ODS collects, integrates and distributes current information and provides an enterprise view of it. Common examples of these categories of data are customers and products. Business groups, such as divisions or subsidiaries, may be responsible for managing a subset of the data subjects, but no single group is responsible for managing the overall customer or product lists.
Data categories such as customers or products are often referred to by different names depending on your background. Data modelers and people involved in data warehousing will refer to this data as dimensions. Application-oriented people, especially those who have been involved in legacy applications, call this reference data. Finally, ERP vendors have adopted the term master data.
Regardless of what you call it, this data is generally physically scattered across multiple applications or source systems. For accurate reporting and analysis, an enterprise needs to make sure this data is consistent. For those in the data warehousing arena, making this data consistent is referred to as conforming dimensions. Historically, the DW group received de facto responsibility for this work, which involved creating and maintaining the consistent data with update capabilities enabled. Many people, including those in DW groups, felt that the DW was not the appropriate application to provide this functionality. This is a great use of an ODS, where the ODS becomes the system of record (SOR) for this consistent data or conformed dimensions. ERP vendors have recognized this need and are starting to offer master data management across their application modules through their own pre-built ODS solutions. When an enterprise’s needs extend beyond one ERP vendor’s applications, a custom ODS will be required.
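To make the idea of conforming this data concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any vendor's product: the source system names, the `SourceRecord` fields, the use of a tax ID as the matching key and the `CUST-` identifier format are all hypothetical. Real matching rules are usually far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    source: str    # hypothetical source system name, e.g. "erp" or "crm"
    local_id: str  # the key that source system uses for the customer
    name: str
    tax_id: str    # assumed shared natural key used for matching


def conform_customers(records):
    """Group source records by natural key; return one master row per customer."""
    masters = {}
    for rec in records:
        key = rec.tax_id.strip().upper()
        if key not in masters:
            masters[key] = {
                "master_id": f"CUST-{len(masters) + 1:04d}",
                "name": rec.name.strip(),  # first-seen name wins (simplifying assumption)
                "source_keys": {},
            }
        # Keep a cross-reference back to each application's own key.
        masters[key]["source_keys"][rec.source] = rec.local_id
    return list(masters.values())


records = [
    SourceRecord("erp", "10045", "Acme Corp.", "12-3456789"),
    SourceRecord("crm", "A-77", "ACME Corporation", "12-3456789 "),
    SourceRecord("crm", "B-12", "Globex", "98-7654321"),
]
conformed = conform_customers(records)
```

The two Acme records collapse into one master row that still remembers each application's local key, which is what allows the ODS to act as the single place where the enterprise view of a customer is maintained.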
One of the most prominent business purposes for an ODS is to provide data integration, data distribution and reporting to support specific business processes and associated data. The two most common business areas are customer data integration, and budgeting and forecasting. It is important to note that both of these involve integrating front-office applications rather than the back-office applications that are the mainstay of data warehousing. With the front-office applications, many people need up-to-the-minute updates on information that is relevant to their business function. With customer-facing applications, such as a call center, the businesspeople need data on all aspects of their customers, but that data is spread across many applications.
The ODS provides the ability to act as a data hub to collect, update and re-distribute the data to people and the applications that they use. The data hub (or ODS) may be connected to various data warehouses and data marts, using them both as sources and targets (for data updates) in various business processes. This is similar to how budgeting and forecasting can be serviced by an ODS. The budgeting applications will use the ODS as the initial source of information, update the data through these applications, and then use the ODS to distribute these updates to the DW. In all cases, the ODS is used for the immediate reporting needs of its constituents.
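The collect, update and re-distribute cycle described above can be sketched in a few lines of Python. This is only a toy model under assumptions of my own: the `DataHub` class, its method names, the key format and the source labels are hypothetical, and a real ODS would sit on a database with proper integration tooling rather than an in-memory dictionary.

```python
class DataHub:
    """Toy ODS-style hub: holds current values and pushes changes to subscribers."""

    def __init__(self):
        self.current = {}      # latest value per key (an ODS holds current data)
        self.subscribers = []  # callables that receive every change

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def upsert(self, key, values, source):
        # Merge the incoming fields over whatever the hub already holds.
        merged = {**self.current.get(key, {}), **values}
        self.current[key] = merged
        # Re-distribute a copy of the updated row to every downstream target.
        for notify in self.subscribers:
            notify(key, dict(merged), source)
        return merged


hub = DataHub()
dw_feed = []  # stands in for the updates that would be sent on to the DW
hub.subscribe(lambda key, row, source: dw_feed.append((key, row, source)))

# Initial figure collected from the DW, then revised through a budgeting application.
hub.upsert("FY-budget:sales", {"amount": 100000}, source="dw")
hub.upsert("FY-budget:sales", {"amount": 120000, "approved": True}, source="budgeting_app")
```

After the second call, the hub holds the revised budget for immediate reporting, and `dw_feed` contains both changes in order, ready to be applied downstream.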
A very common reason for creating an ODS is that the DW cannot handle either the data volumes or update frequencies required to support the operational reporting demands. The DW may have originally been designed to update data daily. It doesn’t have the hardware capacity to expand, or if it did, the performance would degrade. On the other hand, even if the capacity exists, handling operational data may be outside the charter of the DW group. They have been successful by sticking to their charter and feel that the application group should continue to be responsible.
The argument that the DW cannot support real-time updates and large data volumes is not a valid reason to create an ODS. If, as discussed earlier, there’s a need to provide operational reporting across disparate applications, then yes, an ODS is needed. Fortunately, enterprise application and BI vendors offer cost-effective off-the-shelf solutions that handle this. However, if the sole reason to create an ODS is that the DW cannot support the data volumes and update frequency, then think again. All you are accomplishing by creating an ODS is shifting that data to another location. The data must still be stored and updated; therefore, if the ODS doesn’t have a business purpose (as we discussed earlier), the data should go into the DW instead.
My next column will examine the conventional wisdom regarding what an ODS is and what architecture you should use in building it. We will compare these definitions and architectures with the intended business purposes of an ODS to perform a gap analysis of what really fits and what does not. We will create a definition and architecture based on business solutions (cause) and not on symptoms (what the current ODS can do).