If your enterprise is considering Hadoop to help handle your big data workload, you’re in good company. Many organizations are exploring the use of this software to manage large amounts of data and more complex queries.
Hadoop has grown in popularity as big data has spurred the evolution of platforms capable of handling its heavy demands; tools have matured from simple tool sets into complete platforms. Large systems integrators and tech titans such as IBM have embraced the open source movement, making Hadoop more available and accessible to customers deciding how to implement big data in their enterprises.
Benefits of Using Hadoop
A few of the benefits of using Hadoop in your project include:
Shifting Roles
Traditionally, IT was completely responsible for all development. And if the business didn’t communicate its needs clearly to IT, or IT didn’t listen carefully, what IT developed was not always what the business really needed or wanted. But with Hadoop, businesspeople – those who truly understand the business’s goals and how its data can be used – are becoming far more involved in the process.
As big data platform capabilities expand, the emphasis on programming has decreased. As applications become more automated, less technical skill is needed to use them. The process mirrors what happened with data warehousing and business intelligence (BI), where coding knowledge was necessary at first, until vendors developed data integration and BI tools that were much simpler and more intuitive to use.
Frequently, the businesspeople tasked with working with Hadoop are called data scientists. This is an often-misunderstood job title. Some perceive the data scientist as a coder, but in fact the role’s main focus is the business: someone who really understands the business, its goals and what it needs to do with its data.
In the past, working with tools meant working with code. Now, the expertise data scientists must have lies more in developing predictive models and econometrics. A data model may have more than 1,000 variables, so being able to predict customer behavior matters more than programming skill. In fact, for this role, a psychology major is going to have a better background than a computer science major.
The role of the business analyst is shifting as well. While the role used to require skills such as developing statistical programs, business analysts can now use data visualization and data discovery tools to interpret their data and make decisions based on it.
The Data Difference
One common problem in big data initiatives arises when people don’t understand that data comes in many varieties and that there are many different ways to manage it – one size does not fit all.
Previously, people thought that data warehouses (DW) and relational databases could handle all of their needs. But then we started seeing unstructured data (like the content of emails and tweets), which didn’t fit the relational structure or lend itself to a SQL query, no matter how complex. Now, people are deciding that everything needs to go into a Hadoop platform (or Hive or another comparable platform). The reality, however, is that data comes in all shapes and sizes. Some you need in real time; some you can wait for. Think of data in terms of the three Vs: different volumes, varieties and velocities. With this in mind, you can see that some data belongs in traditional databases, while other data is better suited to Hadoop platforms.
It is not uncommon for an organization to forget the lessons of the past and repeat its mistakes. Two lessons in particular are being forgotten in initial big data efforts. First, when organizations adopt new data technologies, they tend to build yet another data silo because it seems to be the easiest way to get things done quickly. But data silos hinder an organization’s ability to examine data from a variety of sources, which makes long-term business value from a big data initiative even more elusive.
Second, during the initial DW wave, organizations kept trying to design data capture systems (e.g., enterprise applications) the same way they designed the DW for analytical purposes. Over time, it became best practice to design data structures differently for data capture and for data analysis. Many big data initiatives are revisiting the past and trying to combine capture and analytical design – resulting in costly misuse of the new big data technologies.
Your Big Data Plan
When big data is part of your plan, remember three things: