Data Integration versus ETL

In my previous post I discussed the Gartner Research “Magic Quadrant for Data Integration Tools, 2006” and the fact that Gartner has decided to supplant its “Magic Quadrant for Extraction, Transformation and Loading, 1H05” because “Discrete markets for single-purpose data integration tools, such as extraction, transformation and loading (ETL), have given way to a broader market for data integration tools that address a range of data delivery styles.”

I am a devout advocate of taking a holistic approach toward
data integration and using data-integration suites. However, market conditions
and pragmatic considerations have given me a split personality on this subject.

My decades-long involvement (am I that old?!) in data-integration
efforts and my architect/engineering background resoundingly favor data
integration rather than “simply” ETL tools. My other personality is my
consulting side, which holds that it’s better to improve your customers’
business solutions than to tell them that one size fits all and their only option is
to buy into data-integration nirvana. The result of the “take it or leave it”
approach is that they sometimes “leave it” and continue to do what they are
doing, i.e., custom coding.

Maintaining the status quo means that their business solutions
will continue to have:

  • Questionable data quality, consistency and integrity
  • The inability to audit where they got the data and how they transformed it (what financial transparency and Sarbanes-Oxley?)
  • Business people spending more of their time gathering and massaging data, rather than analyzing the data and making informed business decisions

Although it’s time to look at data-integration tools, I
think it’s premature to abandon ETL products. First, the data-integration tools
market is in an “early stage”. That market assessment is great for the early
adopters, but many enterprises may not be ready for this stage yet.

Second, and most importantly, although ETL is widespread, it
is not pervasive. With thousands of ETL licenses sold, how can I make that
claim? Easy. The two most widely used approaches to data integration are:

  • Custom coding, using SQL scripts, stored procedures or SQL APIs from various programming languages
  • A combination of loosely configured steps using Microsoft Access and Microsoft Excel. Sometimes SAS or a flat file extract created from a BI tool is used to provide the initial extracts for the Microsoft tools to use. This is a Data Shadow System.
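
To make the first approach concrete, here is a minimal sketch of what a hand-coded integration step often looks like, written in Python with the standard sqlite3 module. The table names (src_orders, rpt_sales_by_region, etl_audit) and the functions load_sales_by_region and demo are hypothetical, invented purely for illustration. The small audit table is a simplified stand-in for the record of source and transformation rule that custom scripts tend to leave out.

```python
import sqlite3


def load_sales_by_region(conn: sqlite3.Connection) -> int:
    """Hand-coded extract-transform-load step (hypothetical tables)."""
    cur = conn.cursor()
    # Target reporting table and a minimal audit table.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS rpt_sales_by_region (region TEXT, total REAL)"
    )
    cur.execute(
        "CREATE TABLE IF NOT EXISTS etl_audit "
        "(step TEXT, source TEXT, target TEXT, rule TEXT, rows_written INTEGER)"
    )
    # Full refresh: clear the target, then extract, clean and aggregate in SQL.
    cur.execute("DELETE FROM rpt_sales_by_region")
    rule = "UPPER(TRIM(region)); SUM(amount) grouped by cleaned region"
    cur.execute(
        "INSERT INTO rpt_sales_by_region (region, total) "
        "SELECT UPPER(TRIM(region)), SUM(amount) "
        "FROM src_orders GROUP BY UPPER(TRIM(region))"
    )
    rows_written = cur.execute(
        "SELECT COUNT(*) FROM rpt_sales_by_region"
    ).fetchone()[0]
    # Record where the data came from and how it was transformed.
    cur.execute(
        "INSERT INTO etl_audit VALUES (?, ?, ?, ?, ?)",
        ("load_sales_by_region", "src_orders", "rpt_sales_by_region",
         rule, rows_written),
    )
    conn.commit()
    return rows_written


def demo() -> sqlite3.Connection:
    """Build a tiny in-memory source table and run the load once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)",
        [(1, "east ", 100.0), (2, "East", 50.0), (3, "west", 25.0)],
    )
    load_sales_by_region(conn)
    return conn
```

Calling demo() and querying rpt_sales_by_region shows the inconsistently entered regions merged into clean totals, while etl_audit holds one row describing the step. In practice, many custom scripts stop before that last insert, which is where the audit gap described earlier comes from.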

Both of these approaches are used extensively in the SMB
(small and medium-sized business) market and in the large enterprise market. In the
latter, enterprises use ETL tools extensively to create one or more data
warehouses. However, if you look not too far under the covers, you see that
many groups within those large enterprises are using custom coding to feed many
data marts, reporting databases and cubes. In addition, the business is
building numerous data shadow systems for reporting, analysis and performance
management. It is not unusual for dozens of these systems to be built within a
large enterprise.

The architect in me says that I should advocate expanding
the use of the ETL or data-integration tools in these large enterprises to
replace all the custom coding and data shadow systems. There are many
advantages to this approach, and it is the first alternative that I suggest.
BUT, as with most business decisions, a business person should have options. A
one-size-fits-all or a take-it-or-leave-it approach does not allow a
business or IT person to examine alternatives based on their own considerations.

Why aren’t large enterprises using the data-integration or
ETL tools they already have in place? The two principal inhibitors to using these tools are resource constraints (costs, people,
skills, time) and not being ready for them yet. Why aren’t they ready? Two possible
reasons are that their needs are far simpler than what the sophisticated
tools provide, or that they are not experienced enough to recognize the need.

One of the things I learned from software start-ups is that
you have to sell against software competitors and substitute activities. In other words, you may find yourself
competing against your customer. Enterprises could use ETL tools or they could
substitute custom coding or data shadow systems built with Microsoft Office
products. Other industry experts might not recognize that as a choice, but it’s
a choice that I have seen customers make dozens of times in enterprises, even
when they already have successful ETL projects.

In addition, the customer can always choose to do nothing. Despite
how wonderful our product is and how persuasive our Microsoft PowerPoint slides
are, customers can always decide they
are going to do something else with their money, resources and time.

For the most part, I advocate a holistic approach to data
integration. But when the customer is
heavily engaged in custom coding and creating data shadow systems, I have to
consider whether they need to “walk before
they can run”.

This may not be the best-in-class
choice, but it certainly is the best-for-the-customer choice.


1 Comment

  1. Robert says:

    I read the external article on DATA SHADOW SYSTEMS, and I think the criticism is so very one-sided. These so-called data shadow systems will never ever go away and for good reason.
    1. It is up to the individual worker or small group to figure out what data they need in what format, in order to properly do the analysis that is their job.
    2. Many workers will find, even if they were there when this multi-million dollar warehouse was being designed, that their advice on what data is needed will be ignored. This is because of a very ugly aspect of human nature that will never ever change. Human beings evaluate advice, not always based upon the quality of the advice, but on the quality of the person giving the advice, i.e. the person’s perceived social status. Now the software company trying to close the multimillion dollar warehouse deal will try to make sure the salespeople will have very high social ranking, nice tie, prestige, and all that. Many of the in-house people who do a lot of grunt work with the data, they will give advice that will be ignored because they have lower social status. No manager would admit to making decisions like that. But this is real human nature.
    You’re missing the point with your criticism of data shadow systems. Those folks in the trenches will do what they have to do in order to get their job done.
