Predictive analytics and data visualization are data-hungry processes. An enterprise data warehouse is often a critical part of the data backbone that these processes crave. IT’s data integration, data cleansing, master data management (MDM) and data governance feed these processes just as they do BI reports, dashboards and other business analytics efforts.
In many enterprises, there is a battle between data completeness and data quality. The two opposing sides are the data scientists (on the business side) and the IT group.
Data scientists are avid IT customers, but the relationship between the two groups often becomes strained. Unlike the business users associated with other BI projects, data scientists and power users commonly use data that is incomplete or has quality issues. This alarms IT and may trigger a hostile reaction. After all, the IT group takes its role as the enterprise data custodian very seriously; it spends a lot of time integrating and cleaning the data. The downside is that the business group may see IT more as a data czar than a custodian. To the business, IT may appear to believe that it owns the data and that it is its duty (or right) to dole it out, helping the enterprise by keeping “bad” data from the business.
The world is filled with bad data, and Big Data exacerbates this because, well, it’s big! Data scientists use that data because it has business value and it is the best they can get.
Another key difference between data scientists and typical BI users is that the data scientist is a data-savvy user. In fact, a data scientist may understand the data and its limitations better than IT does. The data scientist needs to understand the state of the data, determine the impact of its gaps and errors, and then possibly adjust the models to compensate for the data quality.
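As an illustration of what "understanding the state of the data" can look like in practice, here is a minimal sketch in Python that measures how complete each column of a data set is. The function names, the list of values treated as missing, and the 0.8 threshold are all assumptions made for this example; a real project would choose them to suit its own data.

```python
from typing import Any

# Assumption: these values count as "missing" for this example.
MISSING = (None, "", "NA", "N/A")


def completeness(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Return, per column, the fraction of rows that hold a usable value."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    report = {}
    for col in columns:
        # A column absent from a row counts as missing (row.get gives None).
        present = sum(1 for row in rows if row.get(col) not in MISSING)
        report[col] = present / total if total else 0.0
    return report


def usable_columns(rows: list[dict[str, Any]], threshold: float = 0.8) -> list[str]:
    """List the columns complete enough to feed a model (hypothetical cutoff)."""
    return [col for col, frac in completeness(rows).items() if frac >= threshold]
```

A report like this does not decide anything by itself. It gives the data scientist a basis for choosing whether to drop a column, impute it, or adjust the model around it.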
Once the necessary data is located and evaluated, work often needs to be done to turn it into a clean, consistent and comprehensive set of information that is ready to be analyzed. That process may be minimized if you leverage an enterprise data warehouse as the primary data source. But if you augment the warehoused information with external and unstructured data, it will add to the data integration and cleansing work you need to do.
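To make the extra integration work concrete, the following Python sketch augments warehouse rows with records from an external source whose keys are formatted inconsistently. The field name "customer", the helper names, and the normalization rule are hypothetical, chosen only for illustration; real matching usually needs more than whitespace and case cleanup.

```python
from typing import Any


def normalize_key(raw: str) -> str:
    """Lowercase the key and collapse whitespace so near-identical keys match."""
    return " ".join(raw.strip().lower().split())


def augment(
    warehouse: list[dict[str, Any]],
    external: list[dict[str, Any]],
    key: str = "customer",
) -> list[dict[str, Any]]:
    """Add external fields to warehouse rows, matching on a normalized key."""
    lookup = {normalize_key(row[key]): row for row in external}
    merged = []
    for row in warehouse:
        extra = lookup.get(normalize_key(row[key]), {})
        # Warehouse values win on conflict, since that data is already cleansed.
        combined = {**extra, **row}
        # Record whether a match was found so gaps stay visible downstream.
        combined["matched_external"] = bool(extra)
        merged.append(combined)
    return merged
```

Keeping the matched_external flag in the output is a deliberate choice in this sketch: it lets the analyst see how much of the warehoused data was actually enriched before relying on the new fields.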
One of the critical factors for long-term success in predictive modeling for an enterprise is a healthy relationship between IT and data scientists. IT needs to continue to expand the data backbone and act as data custodian. IT also needs to recognize that its roles and responsibilities toward the typical BI user differ from those toward the data scientist. In addition, IT can improve data scientist productivity by establishing analytical hubs (see Why you need an analytical hub). Data scientists, on the other hand, need to work more closely with IT, leveraging the data that IT looks after and keeping IT up to date on their efforts.