In my discussions with clients, prospects, and students, and when networking with folks at seminars, I am always asked for my opinion or recommendations on data integration and ETL (extract, transform and load) products. People always like to talk products, and much of the industry literature is centered on tools.
I’m happy to discuss products, but every once in a while someone asks me a more insightful question, which is what happened this week. That person asked what the main shortcomings or stumbling blocks are that companies encounter when implementing data integration.
Great question. I discuss this when I am working with clients and teaching courses, but hardly anyone asks it or steers the conversation in that direction.
My answer is simple: it’s not the tool, but how you use it, that determines success. Although you do have to know the mechanics of the tool, that is not the critical success factor. What really matters is the mechanics of data integration.
Many people don't understand data integration processes or the frameworks that products provide to implement those processes. And it’s not just data integration newbies who have this problem; experienced veterans struggle with it too.
Most data integration architects, designers and developers started out in ETL by writing SQL scripts or hand-coding in something like Java with JDBC. Then they try to replicate what they did in that manual code inside the data integration tool. This is probably the worst way to use a data integration product! You get little benefit from the framework, processing is not optimized (and may be terrible), and, worse, developers get frustrated because they feel they could have coded it faster by hand.
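To make the pattern concrete, here is a minimal sketch of the kind of hand-coded SQL ETL script I mean. The table and column names (stg_orders, dw_orders, fx_rates and so on) are hypothetical, not from any particular client or product, and the syntax may need adapting to your database.

```sql
-- Hypothetical hand-coded ETL: truncate-and-reload with inline transformations.
-- Everything a data integration tool would normally handle around this
-- (logging, rejected rows, restartability, monitoring) has to be rebuilt by hand.
TRUNCATE TABLE dw_orders;

INSERT INTO dw_orders (order_id, customer_id, order_date, amount_usd, status)
SELECT
    s.order_id,
    s.customer_id,
    CAST(s.order_date AS DATE),
    s.amount * COALESCE(fx.rate_to_usd, 1.0),          -- currency conversion
    CASE WHEN s.status IN ('S', 'SHIPPED')   THEN 'SHIPPED'
         WHEN s.status IN ('C', 'CANCELLED') THEN 'CANCELLED'
         ELSE 'OPEN'
    END                                                 -- code standardization
FROM stg_orders s
LEFT JOIN fx_rates fx
       ON fx.currency_code = s.currency_code
      AND fx.rate_date     = CAST(s.order_date AS DATE);
```

Recreating scripts like this step for step inside a data integration tool is exactly the trap: the tool becomes an expensive SQL editor.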
Welcome to the world of frustrating data integration projects, where people either assume these products are not useful or conclude that the particular product they used must not be very good.
Almost all data integration products provide data imports and exports; data flows and workflows; data transformations; error handling; monitoring; performance tuning; and many processes that have evolved as best practices, such as slowly changing dimensions (SCD), change data capture (CDC), and hierarchy management. All of these pre-built capabilities mean that data integration development does not have to reinvent the wheel, but can leverage industry best practices to deliver world-class integration. Instead, many data integration developers spend their time creating the equivalent of manually coded imports, extracts and transforms, and never have the time to implement the best practices that would best serve their business.
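As an illustration of why those pre-built capabilities matter, here is a rough sketch of what a developer ends up hand-writing to maintain a Type 2 slowly changing dimension without the tool's built-in SCD support. The customer_dim and stg_customer names and columns are hypothetical, and the SQL may need adjusting for your database.

```sql
-- Hypothetical hand-rolled Type 2 SCD load: expire changed rows, insert new versions.
-- A tool's SCD component generates this logic (plus surrogate keys, auditing and
-- error handling) from a few property settings.

-- Step 1: close out current dimension rows whose source attributes have changed.
UPDATE customer_dim
   SET effective_to = CURRENT_DATE,
       is_current   = 'N'
 WHERE is_current = 'Y'
   AND EXISTS (SELECT 1
                 FROM stg_customer s
                WHERE s.customer_id = customer_dim.customer_id
                  AND (s.customer_name <> customer_dim.customer_name
                       OR s.region     <> customer_dim.region));

-- Step 2: insert a new current version for new customers and for the rows expired above.
INSERT INTO customer_dim (customer_id, customer_name, region,
                          effective_from, effective_to, is_current)
SELECT s.customer_id, s.customer_name, s.region,
       CURRENT_DATE, DATE '9999-12-31', 'Y'
  FROM stg_customer s
 WHERE NOT EXISTS (SELECT 1
                     FROM customer_dim d
                    WHERE d.customer_id = s.customer_id
                      AND d.is_current  = 'Y');
```

Writing, testing and maintaining this for every dimension is the "reinventing the wheel" I am talking about; the same effort spent learning the tool's SCD framework pays for itself many times over.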
Any successful, productive, robust data integration effort needs people who understand the necessary processes and can implement best practices. Getting the tool and having the people who know how to use the tool is only the beginning. You will get nowhere fast until you make sure you have people who understand data integration processes.
3 Comments
Nice post. But using the right tool still counts — a lot!
Michael
http://www.expressor-software.com
Agree, agree, agree. In summary: it is not the wand, then, it is the magician that is important.
Good work can be done with poor tools, but it will be an unnecessary struggle. Put the right tools in the hands of the right person, someone who understands the data, the business and the challenge and is experienced and/or motivated, and it all comes together.
And the appropriate tools: not too big, not too small, not too numerous nor too few, not too complex, not too simple.
Great post! Both comments before are correct: using the right tool definitely helps with reaching goals, but the person using the tool needs the drive to learn how to use it correctly and get the most out of it! Business intelligence software has many perks for the user. It might just look like a bunch of charts, but in reality it shows a lot, and when it is used correctly, with some training, the person may come to learn that data integration is not just about the data but about the user and the software.