Data lineage – where did that data come from? (data auditability and traceability)



The word “pedigree” brings up visions of race horses and show dogs. But it should also make you think about your data. Your data has a pedigree, or lineage, too. Just as with fine animals, it shows where it came from – and probably what can be expected of it in the future, too.

When you’re working with financial data you need to know where it came from – its source systems, what systems processed it, how it was manipulated, and how it was changed. If the CFO asks you to substantiate a certain number, you sure want to know where it came from!

A former colleague of mine experienced first-hand one of the pitfalls of not knowing the lineage of a client’s data. The project was to replace a spreadsheet-based (data shadow system) budgeting, planning and forecasting system with a performance management solution at a multibillion-dollar company. He spent 10 hours in a client meeting – one of those painful marathon meetings – where they were discussing the design of the new system and poring over the data in the client’s spreadsheets to understand how the data was transformed and manipulated.

After the meeting had run through not only lunch but also dinner, a senior manager from the client said “Hey, wait a minute. This isn’t even the right set of spreadsheets!” Everything that the dozen people at the meeting – half from the client’s finance staff and the other half highly paid consultants – had done over the previous 10 hours was a total waste of time.

The sad truth is that my colleagues and I see many clients who honestly have no idea how their data got into their reports or Microsoft PowerPoint slides. They know what enterprise application the data originated in and can get their IT staff to document how the data was loaded into their data warehouse. But then it gets fuzzy!

How many enterprises can then trace and document what happens to data as it progresses through several potential stops in data marts, cubes, Microsoft Access databases and Microsoft Excel spreadsheets? How do they know what transformations and manipulations happened to that data? When everyone is using their own spreadsheets (data shadow systems), no one knows which version is right.

I see too many enterprises that master only the first steps of the data journey – enterprise applications and data warehouses. After that it’s a “black box.” Try explaining that to your CFO, your stockholders or government regulators. I’m sure Enron and Tyco had enterprise applications and data warehouses, but somehow their reports did not show reality. Those are extreme examples of unethical behavior (and the people responsible are a small minority of business people), but even without bad intentions your enterprise should know where its information on sales, expenses, customers and employees, for example, has come from and how it has been transformed.

Enterprise Data Management (EDM) is not an esoteric topic anymore. Data needs to be managed from the point when it enters an enterprise until it is consumed in a report or analysis. An enterprise needs to adopt a holistic, enterprise-wide data-management program to enable data lineage and audits.
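To make the idea of lineage concrete, here is a minimal sketch in Python of a number that carries its own audit trail as it moves from system to system. Everything in it is an assumption for illustration only: the class names (`LineageStep`, `TracedValue`), the systems listed, and the sample figures are hypothetical and are not taken from any client or product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class LineageStep:
    """One stop on the data's journey (hypothetical structure)."""
    system: str          # e.g. "ERP", "data warehouse", "Excel workbook"
    transformation: str  # plain-language description of what was done


@dataclass
class TracedValue:
    """A reported number together with the history of how it was produced."""
    name: str
    value: float
    lineage: List[LineageStep] = field(default_factory=list)

    def apply(self, system: str, transformation: str,
              func: Callable[[float], float]) -> "TracedValue":
        """Return a new value with the transformation applied and recorded."""
        step = LineageStep(system, transformation)
        return TracedValue(self.name, func(self.value), self.lineage + [step])

    def audit_trail(self) -> str:
        """Render the lineage as one numbered line per stop, in order."""
        return "\n".join(
            f"{i}. {s.system}: {s.transformation}"
            for i, s in enumerate(self.lineage, start=1)
        )


# Illustrative figures only: a sales total that passes through three systems.
sales = TracedValue("q1_sales", 1000.0,
                    [LineageStep("ERP", "extracted raw invoice total")])
sales = sales.apply("data warehouse", "converted to thousands",
                    lambda v: v / 1000)
sales = sales.apply("Excel workbook", "applied 10% manual uplift",
                    lambda v: v * 1.1)
print(sales.audit_trail())
```

The point of the sketch is that every transformation, including the manual one made in a spreadsheet, leaves a record, so the trail can be read back in order when someone asks where a number came from. A real EDM program would capture this in metadata across tools rather than in a single script.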

EDM is no longer just a source of competitive advantage; it is a business and financial necessity.

1 Comment

  1. Absolutely right: EDM is now a must-have rather than a nice-to-have, especially when dealing with financial data.
