Data Integration: Hand-coding Using ETL Tools

Discount on my Online DW and BI Data Modeling Course
November 2, 2009
Data Integration: Hand-coding Using ETL Tools Part 2
November 6, 2009
Show all

Data Integration: Hand-coding Using ETL Tools

Hand-stop We are creatures of habit. It’s not easy to stop doing something the way we’ve always done it. Especially when we think we are right (but actually we’re not). Let’s explain.

I have discussed (some might say preached) in many posts, articles, webinars, podcasts, classes and client discussions that for any recurring data integration tasks IT should use an Extract, Transform and Load (ETL)  tool.

This certainly has been the best practice for enterprise data warehousing projects in the Fortune 1000. This is where I got my early experience in data integration and got to use the ETL tools that annually rank in Gartner’s Upper Magic Quadrant and Forrester’s Top Wave. These ETL tools enabled IT groups and SI (system integrator) project teams to tackle data integration challenges too complex and extensive for hand-coding. 

However, while the enterprise data warehousing  projects were being developed with enterprise class ETL tools, most Fortune 1000 departmental projects and small to medium business (SMB) companies were hand-coding their data-integration processes.

IT groups choose to hand-code because for quite a while the enterprise class ETL tools they heard of were too expensive for their budgets.  In addition, these tools required dedicated, trained developers. Life in a small IT group means doing multiple tasks and never really having time to get training.  The result is millions of lines of hand-coded ETL in enterprises today —  most of which is not documented and took much longer to develop than it would have with an ETL tool.

Times are Changing

But times have changed. There are now very robust, affordable ETL tools that can easily handle any departmental or SMB data integration projects (and might even be able to handle some enterprise data warehouses too).

In fact, two classes of ETL tools are free (or almost so). First, database vendors have “bundled” ETL tools with their databases. Although initially these tools were very simplistic they have expanded over the years have quite robust functionality. Second, open source software (OSS) ETL tools have also emerged and are capable of handling many departmental and SMB needs.

The emergence of these tools has broken through the pricing barrier with ETL tools that are not only quite capable,  but not difficult for a small IT staff to learn. I have been advocating these tools in the departmental and SMB data integration market for years. The good news is these tools are picking up converts but the bad news is IT may not be using these tools as well as they might.


Stay tuned for my next post, where I continue this discussion with Hand-Coding: What Went Wrong and How to Avoid Repeating History’s Mistakes

4 Comments

  1. Justin says:

    I’ve seen eaxctly those problems. I then came across Talend first as an open source tool but then we decieded to pay for the additional functionality in the commercial version – its as good as the others in Gartners magic quadrant but at a fraction of the cost. Take a look and you won’t be disappointed.

  2. Rick Sherman says:

    Justin,
    Talend is an attractive open source ETL option. I haven’t used it at clients but have used it in workshops when I teach at an engineering university’s Master Degree program. It wouldn’t win a functionality bake-off with the upper quadrant tools BUT not every integration project needs all those functions anyways. It may not be “best” overall but it may be “best fit” and the price is right.
    Too many people though decide that open source is free and do not budget any time to learn wither the tool or the fundamentals & best practices for ETL (or data integration.)
    Thanks for your good observations,
    rick

  3. Rick, there shouldn’t be anyone disagreeing with your argument against hand-coding these days as there is a growing number of very capable ETL tools on the market that are either free or highly affordable. I’d like to point out, however, that you should have included expressor in your analysis of classes of ETL tools that are capable of handling many, if not all departmental and SMB needs. Although we are not a “free” tool, we are very much targeting departmental and SMB opportunities because there are many accounts that need more power and functionality than what the freeware or open source software tools can offer.
    For example, we find that a sizable percentage of Microsoft SSIS users (30 – 40% in our estimate) are running into performance and/or data transformation complexity and/or data connectivity issues. Of course, the Gartner upper-quadrant ETL tools could solve these problems, but we believe expressor is the only solution on the market that’s both affordable and matches the upper magic quadrant performance, data transformation complexity, and connectivity capabilities of the likes of Informatica, DataStage, and Ab Initio.
    A case in point, one of our new customers is American Tower, who is replacing SSIS with expressor for exactly the reasons I mentioned earlier. Last week we also exhibited at the PASS Summit 2009 and got a very warm reception from many SSIS users who are facing a similar dilemma, which they often expressed as “Do I continue with SSIS because it’s free? I can’t afford the 250,000 dollars or more price tag of any of the upper quadrant tools.” And as you pointed out, none of them wants to go back to hand coding. Which explains why they were so enthusiastic about expressor – the high-performance ETL alternative that’s up to 80% less expensive than the traditional high-end ETL vendors.

  4. Rick Sherman says:

    Michael,
    Thanks for the feedback and I agree that expressor is a great option people should consider.
    The “bundled” tools, especially SSIS, are better than hand-coding but often run into limitations. They often come with a limited set of pre-built functionality (the developers have to augment the gaps with SQL code), performance is highly dependent on database tuning and they often do not scale.
    I’ll keep spreading the word that IT should embrace ETL tools and learn data integration processes. I’ll also try to position ETL tools in respect to what they can do, their ease of use (to learn and to develop with), their TCO (total cost of ownership)and their fit for the task & organization at hand.
    Thanks,
    Rick

Leave a Reply

Your email address will not be published. Required fields are marked *