IT groups can draw on many emerging data integration strategies and technologies to create the information their business needs. When examining these, it is useful to split the data integration landscape into two camps: enterprise IT and departmental data.
The enterprise IT camp
The enterprise IT group is responsible for the enterprise data warehouse (EDW) and any data migration or consolidation efforts across their enterprise applications. The data integration work for this camp includes data sourcing, profiling and cleansing using extract, transform and load (ETL) and other specialized tools. The primary purpose of these data integration processes is to take data from many source systems and perform various processes to create consistent, comprehensive, current and correct data enabling business reporting and analysis. The sourcing efforts are the “heavy lifting” tasks of data integration.
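To make the terms concrete, here is a minimal sketch of what profiling and cleansing mean at the smallest possible scale. It is an illustration only, not a substitute for an ETL tool: the two source systems, the field names and the "first source wins" rule are all hypothetical assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical rows from two source systems; names and values are illustrative only.
crm_rows = [
    {"customer_id": "001", "name": " Acme Corp ", "country": "us"},
    {"customer_id": "002", "name": "Globex", "country": None},
]
billing_rows = [
    {"customer_id": "001", "name": "ACME CORP", "country": "US"},
    {"customer_id": "003", "name": "Initech", "country": "ca"},
]


def profile(rows):
    """Profiling: count missing (None or blank) values per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                missing[column] += 1
    return dict(missing)


def cleanse(row):
    """Cleansing: trim whitespace and standardize case so that
    records from different sources become consistent."""
    name = (row["name"] or "").strip().title()
    country = (row["country"] or "UNKNOWN").strip().upper()
    return {"customer_id": row["customer_id"], "name": name, "country": country}


def integrate(*sources):
    """Cleanse every row and keep one record per customer_id.
    Assumption for this sketch: the first source listed wins."""
    merged = {}
    for rows in sources:
        for row in rows:
            clean = cleanse(row)
            merged.setdefault(clean["customer_id"], clean)
    return list(merged.values())


print(profile(crm_rows + billing_rows))
customers = integrate(crm_rows, billing_rows)
for customer in customers:
    print(customer)
```

Even this toy version has to decide how to treat missing values, which spelling of a name is correct and which source to trust. Those decisions multiply quickly as sources are added, which is the "heavy lifting" described above.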
The best practice for enterprise data integration is to use a high-end ETL tool. These tools have been popular for many years. Historically, enterprise IT groups had to select multiple “best-of-breed” tools to support non-traditional ETL tasks such as data profiling and cleansing. In addition, ETL was limited to batch-driven processing, so for real-time or messaging processes, the IT group had to buy yet another tool.
Many high-end ETL tool vendors then started expanding their products to include these additional processing capabilities, and their ETL tools gradually evolved into data integration suites. Gartner Research and Forrester Research have rated the suites from Informatica and IBM (through its acquisition of Ascential Software) as the top data integration suites for many years. These suites are built to handle the complex and extensive tasks that the enterprise IT group encounters.
Although this data integration trend has been evolving for a while, too many integration initiatives have been developed and implemented as data and technology silos. Enterprise IT groups continue to look at data integration as a series of discrete processes and best-of-breed products. Integration competency centers (ICCs) are a non-technology trend that is starting to fix this problem in some organizations. ICCs enable the IT group to view data integration from an enterprise standpoint and build out a data framework that eliminates silos.
The departmental data camp
Even though enterprise IT groups for large companies have adopted ETL technology for data integration, the departmental data camp continues to create hand-coded applications for their data integration tasks. This camp consists of small to medium-sized businesses and even departments within enterprises that are using the data integration suites elsewhere.
It may seem strange that these two diverse groups are in the same camp, but when it comes to data integration, there are more similarities than one would envision. First, and most important, gathering and transforming departmental data for small to medium-sized businesses is much simpler than it is for large enterprise groups. Compared to building the enterprise data warehouse, there are fewer sources and less cleansing required. This means that much of the functionality that has been built into the high-end tools is simply not needed in the departmental setting.
Second, departmental groups don’t have the large teams of dedicated ETL resources that enterprise groups have. In fact, they might not have any dedicated resources, because staff members have to split their time among other projects or roles. This means that the high-end tools are probably too overwhelming for these groups to master.
The result has been that the departmental groups have shied away from high-end tools, claiming that they’re too costly, complex and difficult for them. From their perspective, they’re right. These groups also don’t see the value of ETL tools that are lower-priced or come bundled with other tools.
The great news is that these same “underpowered” ETL tools have evolved in recent years to be very powerful and robust. Microsoft SQL Server 2008 Integration Services (SSIS) and Oracle Data Integrator (ODI), for example, have greatly expanded their capabilities by building in many of the common tasks that are required in BI and DW. In addition, there is open source ETL software from Pentaho and JasperSoft/Talend that offers more capability than many realize.
These current data integration trends hammer home the argument that departmental data integration should be handled by an ETL tool rather than by hand-coding. It’s more productive, and the total cost of ownership (TCO) is lower.
There is a significant movement for business groups currently managing departmental data with spreadmarts to replace them with data integration based on ETL tools. Businesses need reporting and analytics for decision making, and they understand that data is the key.
Taking the next step with new data integration strategies
The recent trends in data integration are exciting. If you are in the enterprise camp, create an ICC and start to leverage the complete suite of data integration capabilities that your vendors have been adding. If you are in the departmental data camp, look to incorporate ETL tools to replace hand-coded data marts and spreadmarts (or data shadow systems). In both cases, your TCO and responsiveness to the business will improve remarkably.
This is a two-part article on data integration: