When I teach data integration concepts in my classes at various corporations or to my students at Northeastern University, one of the topics I address is the differences between engine-based and database-based data integration tools.
You may be more familiar with their other names:
Inevitably, someone asks me which one is best. I have answer on two levels. The first answer is that neither is the best under all circumstances but depends entirely on the data integration scenario that is being addressed. Evolving industry trends are causing enterprises to often need both alternatives in order to best meet their full range of data integration needs.
The second part of the answer is that up until recently you could not compare ETL versus ETL purely on their own merits because the data integration tools that typically performed either approach tended to operate on two ends of the spectrum.
Historically the best of breed, as measured by Gartner Research and Forrester Research, provided engine-based or ETL approach. That meant that the two opposing camps on this issue were not just debating the technical merits of these alternatives but also the underlying data integration tools that provided this functionality. Fortunately this has constraint has recently changed.
Let’s quickly look at history. The first generation of data integration tools was code generators. At the same time, we were just beginning to adopt relational databases. Both that generation of data integration tools and relational databases had limited functionality.
Engine-based ETL tools were developed to address enterprises’ unmet data integration needs. The top-tier ETL tools have evolved into data integration suites incorporating much more than simply ETL processing.
But something happened along the way: relational databases and they way they were used grew more sophisticated. Nowadays databases can perform many integration tasks very effectively, especially if the data continues to reside in the same database instance or server.
That brings us to the second thing that changed. Data is now more apt to stay in the same database, even after it has been integrated into a data warehouse. Formerly, data was moved into data marts or cubes in separate databases.
This is the perfect condition to consider whether you should use ELT versus an ETL approach.
Integrating the data into the data warehouse involves the heavy lifting such as data cleansing, conforming dimensional data and handling slowly changing dimensions, for example. On the other hand, moving it into data marts or cubes involves the more lightweight tasks of filtering, aggregating and applying business transformations.
ETL is most often the choice for the heavy-lifting requirements of the former. ELT might be the better choice for the lightweight tasks of the latter.