Considerations for Implementing Hadoop for Your Big Data Project



If your enterprise is considering Hadoop to help handle your big data workload, you’re in good company. Many organizations are exploring this software to manage large volumes of data and more complex queries.

Hadoop has grown in popularity as big data has spurred the evolution of platforms capable of handling its demands. Tools have evolved from simple tool sets into complete platforms, and large systems integrators and tech titans such as IBM have embraced the open source movement, making Hadoop more available and accessible to customers deciding how to implement big data in their enterprises.

Benefits of Using Hadoop

A few of the benefits of using Hadoop in your project include:

  • Your choice of the cloud or on-premises. Because big data projects usually begin from scratch (rather than building on legacy applications), enterprises can decide from the start whether to deploy on premises or in the cloud. Either way, their people must ramp up on the technology, developing the skills needed to deploy the infrastructure for their big data project. Cloud deployments, however, let teams get into the space more quickly and start extracting business value sooner.
  • Advanced tools. As with most new technologies, the first wave of implementations tends to become a standalone data or application silo; this has been the case with some of the initial big data implementations. Many vendors, both open source projects and larger commercial vendors, have developed data integration tools and suites that work with Hadoop, Hive and related projects. These tools let people combine big data with structured data and get even more value out of both.
    Advanced analytics tools have also evolved to take advantage of Hadoop. People have used tools such as data visualization for years, but many of those tools can now be extended to Hadoop, further expanding the business value people can ultimately get from their big data. Textual analytics, data mining and rich data visualization often give businesspeople insights that would have eluded them without big data.
  • Efficiency. Who can refuse an offer to do more with less? Certainly not enterprises deciding how to manage their big data. Hadoop offers increased capabilities yet requires less programming than traditional platforms. Big data has often meant engaging in “big programming”; Hadoop reduces that need, which lowers an enterprise’s resource costs while speeding up its time to market.
  • Services. As with any new technology wave, there is often a severe shortage of skilled resources to successfully implement an organization’s first big data projects. Large systems integrators provide a variety of big data services to jump-start an organization’s efforts. In addition, many emerging best practices are being built into the big data platforms now on the market.
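The “less programming” point above is easiest to see against the classic word-count example. Below is a minimal, purely illustrative sketch of the map/reduce pattern in plain Python; a real Hadoop job distributes these two phases across a cluster, and tools such as Hive generate this kind of logic from a declarative query so developers no longer write it by hand. The function names and sample documents are invented for illustration.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data needs big tools", "hadoop handles big data"]
print(reduce_phase(map_phase(docs)))  # 'big' is counted 3 times, 'data' twice
```

In Hive, the same result is a one-line `SELECT ... GROUP BY` query, which is precisely the reduction in hand coding the bullet describes.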

Shifting Roles

Traditionally, IT was completely responsible for all development. And if the business didn’t communicate its needs clearly to IT, or IT didn’t listen carefully, what they developed was not always what the business really needed or wanted. But with Hadoop, businesspeople – those who truly understand what their goals are and how the data can be used – are becoming far more involved in the process.

Because big data platform capabilities are expanding, the emphasis on programming has decreased. As applications become more automated, less technical skill is needed to use them. The process mirrors what happened with data warehousing and business intelligence (BI): coding knowledge was necessary at first, but then vendors developed data integration and BI tools that were much simpler and more intuitive to use.

Frequently, the businesspeople tasked with working with Hadoop are called data scientists. The title is often misunderstood: some perceive the data scientist as a coder, but in fact the role’s main focus is the business. A data scientist is someone who truly understands the business, its goals and what it needs to do with its data.

In the past, working with tools meant working with code. Now, the expertise data scientists must have lies more in developing predictive models and econometrics. The data models may have over 1,000 variables, so being able to predict customer behavior matters more than programming. In fact, for this role, a psychology major is going to have a better background than a computer science major.

The role of the business analyst is shifting as well. Where the role once required skills such as writing statistical programs, business analysts can now use data visualization and data discovery tools to interpret their data and make decisions based on it.

The Data Difference

One of the problems in a big data initiative happens when people don’t understand that there is a variety of data and many different ways to manage that data – one size does not fit all.

Previously, people thought that data warehouses (DWs) and relational databases could handle all of their needs. But then we started seeing unstructured data (such as the content of emails and tweets), which didn’t fit the relational structure or lend itself to a SQL query, no matter how complex. Now people are deciding that everything needs to go into a Hadoop platform (or Hive or another comparable platform). The reality, however, is that data comes in all shapes and sizes: some you need in real time, some you can wait for. Think of data in terms of the three Vs: volume, variety and velocity. With this in mind, you can see that some data belongs in traditional databases, while other data is better suited to Hadoop platforms.
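The three-Vs triage above can be sketched as a simple decision rule. The helper below is hypothetical and its thresholds are illustrative only, not prescriptive; the point is that placement follows from the data’s characteristics, not from a single default platform.

```python
# Illustrative "three Vs" triage: route a dataset to a traditional
# relational store or a Hadoop-style platform based on its volume,
# variety and velocity. Thresholds are invented for this sketch.

def suggest_platform(volume_tb, variety, velocity):
    """
    volume_tb: approximate data volume in terabytes
    variety:   "structured" or "unstructured"
    velocity:  "batch" or "streaming"
    """
    if variety == "unstructured":
        return "hadoop"      # no relational structure to map onto
    if volume_tb > 10:
        return "hadoop"      # scale-out storage and processing
    if velocity == "streaming":
        return "hadoop"      # high-velocity feeds strain a traditional DW
    return "relational"      # structured, modest, batch-loaded data

print(suggest_platform(0.5, "structured", "batch"))   # relational
print(suggest_platform(50, "structured", "batch"))    # hadoop
print(suggest_platform(1, "unstructured", "batch"))   # hadoop
```

A real decision would weigh cost, latency requirements and existing skills as well; the sketch only captures the article’s point that one size does not fit all.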

It is not uncommon for an organization to forget the lessons of the past and repeat its sins. Two such lessons are being forgotten in initial big data efforts. First, when organizations adopt new data technologies they tend to build a new data silo, because that seems the easiest way to get things done quickly. But data silos hinder an organization’s ability to examine data from a variety of sources, which makes long-term business value from a big data initiative even more elusive.

Second, during the initial DW wave, organizations kept trying to design data capture systems (e.g., enterprise applications) the same way they designed the DW for analytical purposes. It became best practice to design data structures differently for data capture and for data analysis. Many big data initiatives are revisiting the past and trying to combine capture and analytical design, resulting in costly misuse of the new big data technologies.

Your Big Data Plan

When big data is part of your plan, remember three things:

  • It’s about the business. The whole point of big data in an enterprise is to improve the business. Yet more than half the time spent on big data (according to a Forrester study I recently read) goes to data analysis, with very little spent on planning for what the business actually needs. Your most important big data activity is to figure out what your business needs, then go from there.
  • Make the analytics advanced. You’re going to need a lot more than text analytics. Include advanced analytics technologies for developing predictive models, since the business will want to use big data to figure out what is going to happen, not just what already happened.
  • Data awareness and cultural shift. There is a profound cultural shift happening in analytics today, as businesspeople become the leaders in understanding what data to use, and how, to grow and improve their businesses. IT will shift into a supporting role: creating the data backbone for businesspeople to use in their analysis, rather than creating the analysis for them in the form of prebuilt reports and dashboards.
