Insight: My, what big data you have!

‘Big data’ is already common parlance, but ask yourself: would you rather deal with a guy who has a relatively small data set but really knows what to do with it, or someone who’s got a massive set but no clue where to start? Let’s unpack the thought… Brandon de Kock does just that.

‘Big data’ is hardly new – the term was coined back in the late ’90s by Nasa scientists who were facing an ever-growing stream of 0s and 1s from satellites and other sources, and wondering how and where to deal with all of them. And here in 2015, there’s still confusion about how to describe it accurately. Some go the ‘data sets so massive and complex that they’re hard to process’ route; others opt for more observable, practical factors, such as the fact that most big data seems to be ‘real time’ in nature – or at least ongoing – a bit like Google’s live analytics dashboard.

But, and it’s a really big BUT, irrespective of your definition there’s one undeniable fact that many commentators seem to forget: it’s just data. Its usefulness depends on the same principle as ever. Without great understanding and analysis, a terrible thing happens to data: nothing! The reason for the escalation in awareness is simply that everything you do these days leaves a digital footprint – and the business of big data is to collect all of those footprints and sell them to the highest bidder.

So everyone from financial institutions to greengrocers suddenly has access to more numbers than they can crunch. But even if they know the right questions to ask, the challenge of storing, accessing, analysing and presenting this tsunami of facts is no different from the one faced in the days when the ‘Margie and her clipboard’ brigade stood outside supermarkets.

The promise of big data is quite simple: given enough numbers to crunch, almost any area of human behaviour can be described instantly and dynamically, and becomes entirely predictable. Ipso facto, we have entered an era of statistical correlation that promises to deliver every possible answer to every possible question.

Super. But where’s the ‘but’? And is it a big ‘BUT’? Well, let’s simplify things with ‘the big four’. Firstly, when big numbers go wrong, they can go wrong in spectacular fashion. For example, it’s a verifiable fact that internet usage in Portugal has doubled in the past decade. And it’s also a verifiable fact that there has been a 50% reduction in the recorded cases of drug abuse in Portugal over the same period. Now, to a computer trying to find patterns in numbers, this would look like an extraordinary correlation, but the computer is missing a crucial piece of information: in 2001, in a groundbreaking legislative experiment, Portugal decriminalised the possession of all drugs for personal use. You don’t need to be Stephen Hawking to work out that there’s bugger all connection between internet usage and drug abuse in Portugal, but if your brain were made of silicon, you’d never know the pieces were from two different puzzles.

More worrying still, modern computers can generate a virtually unlimited number of correlations in no time at all: far more than any human team could generate or (and here’s the challenge) deal with. So for every 10 000 patterns a computer spits out, there may be only one that’s actually interesting – or useful. It’s a crucial insight: big data is awesome and powerful, but not in isolation. In the insight business, much like the media business, the mantra has changed: context is the new king.

Secondly, it’s dangerous to take the validity of big data for granted. Just as merging conventional data sets is a tricky business, you’d be making a huge mistake to assume that patterns identified in data collected at one point in time will hold for data collected at another. But the non-stop nature of big data collection often ignores this and simply merges everything, ad infinitum.

Thirdly, there’s the danger of snowballs. Seamless, streaming digital data builds and builds, like a child’s ever-growing Lego collection. But if a glitch at the beginning of that process isn’t picked up, you get a self-reinforcing feedback loop. I’ve always referred to it as a ‘perpetuated digital myth’ – you know, where something entirely false is circulated so much that it ends up ranking first on Google – and therefore becomes ‘common wisdom’.

And then there’s dirty data. Misspellings, abbreviations, glitches, slang and ambiguity make for a pretty noisy environment in a binary brain. And it’s exacerbated by the fact that language itself is a moveable feast. Twenty-five years ago, for example, big data would have taken a vacuum cleaner that ‘sucked’ to be a good thing.

So if there’s one caveat for the modern age of ‘free and easy’ data, it’s this: all data, big or small, is a tool, not a solution. Put differently, there has never been a more effective way of revealing human behaviour than big data, but it fails miserably to identify the reasons behind that behaviour. For consumers, it is at best extraordinary in how convenient it can make life and, at worst, terrifying. The 25 billion devices that make up the Internet of Things bring with them unique possibilities for a data-rich future, but you’d be a fool to simply sit by and watch the numbers cascade down like the opening sequence of The Matrix.

Anyone can collect sea sand; you only have to know where the beach is. But turning it into a piece of useful, beautiful, practical glassware? Well, that takes special skills. Oh, and if you are collecting sand on a beach and see a guy wearing a T-shirt that says ‘Less data, more insight!’, feel free to greet me!

Brandon de Kock is creative director for WhyFive Insights (@WhyFiveSA), the Cape Town-based research agency behind BrandMapp – the country’s most in-depth view of economically active South Africans.

IMAGE: Wikipedia Big Data