“Big data” is the buzz-phrase that’s being spouted by those who want to sound tech-forward, but what does it really mean? If you think it’s only about millions of records, Hadoop clusters and indecipherable algorithms performing unpronounceable functions, you’re way off.
It’s true that volume is one of the defining characteristics of big data. The amount of information available to organisations is far greater than ever before, and it’s certainly too large to be managed by the conventional databases that most businesses have relied on for decades.
But that doesn’t mean it’s all about volume. Data is the new oil: in its raw form it has limited value, but once refined it is enormously valuable.
Data is not uniform; it comes in many varieties. For it to be useful to you, you need to know what kind of data will help you answer the questions you’re asking. If you don’t know what those questions are, you mightn’t be ready to enter the world of big data at all. It’s true that “data finds data”, and that it can deliver unexpected insights and answer questions you didn’t know existed. But that kind of discovery comes with maturity in big data. Start with a specific problem in mind and work your way up the curve. If you set out without clear initial questions, you’re setting out on a very difficult journey.
Consider, for instance, the big data solutions currently being deployed by electricity utilities around the world to conserve power during peak-load periods. They collate data on weather patterns and forecasts, the locations of essential services, the types of appliances running in households, and the power use of residential and business customers. When the power grid is nearing its maximum capacity, they combine live streaming data, historical data and predictive analytics to introduce short rolling brownouts that cut power only to “background” appliances such as air-conditioners, and only for a few minutes at a time. Consumers notice little or nothing, experiencing at most a three-minute interruption, while the company keeps the grid from overloading and causing significant blackouts and disruptions that could last for hours.
This wasn’t the result of randomly matching different types of data. It proceeded from a basic business problem: how do we prevent outages by conserving peak-load power during high usage periods? This specific question determined the variety of data included in the solution. There were myriad other data categories available – geo-spatial smartphone data, customer service interactions, social media posts – that were excluded, because they weren’t pertinent to the business problem the solution was designed for.
The bigness of big data is not simply about data volume. That’s not “big data”, that’s “more data”. The real size factor is the immensity of the questions that can be asked and answered with big data solutions. Some of these will require trillions of gigabytes of data, but most will not. It’s important to know what variety of data, in what specific volumes, you will need to build your solution.
Tristan Sternson is CEO at Infoready.