Enterprises today are faced with a flood of data that they could never have imagined a decade ago. Yesterday’s CIO thought cell phones were just for making phone calls. Today’s CIO knows they are powerful hand-held computers that generate massive amounts of data, from taking photos and composing texts, tweets and emails, to scanning QR codes.
Much of the data enterprises are gathering today is unstructured, and that can be a game-changer. Yesterday’s enterprise, with its transactional systems and relational databases, was well poised to collect and store structured data such as sales figures, prices and addresses. But today’s enterprise is gathering more unstructured data from sources such as emails, social media, patient records and legal documents.
The Three Vs of Data
Unstructured data ramps up the three Vs of data: volume, velocity and variety. The three Vs are not limited to big data, although big data does put them on steroids. The volume of data, according to IBM’s report “Bringing Big Data to the Enterprise,” is increasing so fast that 90% of the data in the world today was created in the last two years alone. Think of all the data in just one day’s worth of tweets.
The velocity increases as data becomes more time-sensitive. There is greater pressure to decrease the amount of time between data capture and analysis. We now depend on the speed of some of this data. It’s extremely helpful to get an immediate notification from your bank, for example, when a fraudulent transaction is detected, enabling you to cancel your credit card immediately.
And the variety changes all the time. Not only does the data come from many different sources, but when it is unstructured, such as audio or video, or semi-structured, such as XML and RSS feeds, it has to be handled differently. While relational databases might be able to take advantage of faster CPUs, more storage and distributed processing to handle high volumes and the need for faster access to data, the variety (namely, unstructured data) is where there can be a problem.
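To make the difference in handling concrete, here is a minimal Python sketch. It assumes a made-up sales extract and a made-up RSS feed; the field names, sample values and the functions read_sales and read_feed are hypothetical and exist only for illustration. The structured rows map straight onto known columns, while the semi-structured feed has to be read field by field because any tag may be missing.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same columns, known in advance (sample data).
SALES_CSV = "sku,price,city\nA100,19.99,Boston\nB200,5.50,Denver\n"

# Semi-structured: items share tags, but any tag may be absent (sample data).
RSS_FEED = """<rss><channel>
  <item><title>Launch day</title><link>http://example.com/1</link></item>
  <item><title>No link here</title></item>
</channel></rss>"""


def read_sales(text):
    """Fixed schema: each row maps directly onto named, typed columns."""
    return [
        {"sku": row["sku"], "price": float(row["price"]), "city": row["city"]}
        for row in csv.DictReader(io.StringIO(text))
    ]


def read_feed(text):
    """Flexible schema: look up each field and tolerate its absence."""
    items = []
    for item in ET.fromstring(text).iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),  # None when the tag is missing
        })
    return items


sales = read_sales(SALES_CSV)
feed = read_feed(RSS_FEED)
```

The second feed item has no link, so its "link" value comes back as None. A relational table would need that decision (nullable column, default value, or rejected row) made up front, which is part of why variety is harder to absorb than volume.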
Analytics Grow More Complicated and More Necessary
With this flood of data, both structured and unstructured, comes a flood of analytics. Sure, enterprises are adept at gathering all sorts of data about their customers, prospects, internal business processes, suppliers, partners and competitors. Capturing data, however, is just the beginning. Many enterprises have become overwhelmed by the information deluge and either cannot effectively analyze it or cannot get information that is current enough to act upon.
Adding to the challenge, many more people in an organization now need the information that comes from all this data. This need extends from the top to the bottom of the organizational hierarchy and is pervasive across business groups.
Businesses should not underestimate the importance of their analytics initiatives. While enterprises still need leaders and decision makers with intuition, they depend on data to validate their intuition. In this sense, data becomes a strategic guide that helps executives see patterns they might not otherwise notice. The Bain study “Big Data: The Organizational Challenge”[1] found that enterprises with the most advanced analytics capabilities outperformed competitors by wide margins.
If the decision makers of an enterprise didn’t realize before that their information is a critical corporate asset, they certainly must realize it now.
A Strategy for the New Data
So, how does an enterprise approach the task of analyzing all this important but hard-to-manage data, especially if its current workhorse, the relational database, is not up to the task? Handling the new data requires some new thinking.
The leading contenders are Hadoop and the dozens of NoSQL entrants.
Hadoop and its ecosystem (MapReduce, Hive, Pig and others) have garnered much of the Big Data hype and registered many impressive implementations. But at its core, Hadoop is a file system, one that is inexpensive, highly scalable, distributed and fault tolerant, and not a database. With its pedigree in Internet search, it is a terrific match for many types of Big Data such as social media and web analytics. But there are limitations and costs associated with it: it is batch-oriented, needs lots of manual coding and requires those other programs in its ecosystem if you need analytics (and you will). The price of cheap hardware and software is labor-intensive work and time-consuming systems integration.
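To give a feel for what “batch-oriented” and “manual coding” mean in practice, here is a minimal single-process Python sketch of the MapReduce programming model, counting words across a handful of tweets. It is an imitation of the pattern, not Hadoop’s actual API: the function names (map_phase, shuffle, reduce_phase, run_batch_job) and the sample tweets are assumptions made for illustration, and a real job would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def run_batch_job(records):
    """Run the whole pipeline over the full input; no result until it ends."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample input standing in for a day's worth of tweets.
tweets = ["big data is big", "data moves fast"]
counts = run_batch_job(tweets)
```

Even a question as simple as a word count has to be written as explicit map and reduce steps, and the answer is available only after the entire batch has been processed. That is the trade-off the paragraph above describes.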
NoSQL (Not Only SQL) databases are the key alternative to Hadoop. They offer database capabilities but are built to handle a variety of data types and to be highly scalable. There are several architectural approaches used in NoSQL: wide-column stores, document stores, key-value stores and graph databases. With so many entrants and variations in architecture, it is sometimes difficult to compare them.
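One way to see how the four approaches differ is to look at the shape of the data each one stores. The sketch below models each shape with plain Python dictionaries. It is a conceptual illustration only, not the API of any particular NoSQL product, and every key, field name and value in it is hypothetical.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"session:42": "user=alice;cart=3"}

# Document: a self-describing, nested record; fields can vary per document.
document = {
    "patient:7": {
        "name": "Alice",
        "visits": [{"date": "2013-01-05", "notes": "routine checkup"}],
    }
}

# Wide column: row key -> column family -> column name -> value.
# Different rows may carry entirely different columns.
wide_column = {
    "user:alice": {
        "profile": {"city": "Boston"},
        "activity": {"2013-01-05": "login", "2013-01-06": "purchase"},
    }
}

# Graph: nodes plus named relationships between them.
graph = {
    "alice": {"follows": ["bob"]},
    "bob": {"follows": ["carol"]},
    "carol": {"follows": []},
}


def two_hops(graph, start):
    """Return the people followed by the people that `start` follows."""
    direct = graph[start]["follows"]
    return sorted(
        {second for first in direct
         for second in graph[first]["follows"] if second != start}
    )


suggestions = two_hops(graph, "alice")
```

The relationship query at the end is natural to express against the graph shape and awkward against the other three, which is the kind of fit an enterprise should weigh when comparing entrants.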
When examining NoSQL database alternatives, you should base your selection on the capabilities your enterprise needs to support both data capture and analytics.
Unstructured data certainly presents many challenges, but it’s the wave of the future and is often full of valuable business insights. It behooves an enterprise to develop a comprehensive strategy to manage and analyze it.
References:
1. Bain & Company, “Big Data: The Organizational Challenge,” http://www.bain.com/Images/BAIN_BRIEF_Big_Data_The_organizational_challenge.pdf