Big Data is one of those buzz terms that is starting to get annoying. Can we please stop using it to describe everything? That said, I must admit that I do think there are three fundamental shifts happening in the analytics space, and the transition started several years ago.
The first major shift is sampling, or should I say the lack of it. Analytics has always been closely tied to statistics. Although I would argue that the two are not synonymous, they are very similar. One of the core principles in statistics is the idea of probability and sampling. By looking at a smaller sample, I can make inferences about a larger, theoretical population. Well, the first shift in “Big Data” is that I now have visibility into the entire population, so I don’t have to sample. I believe that this has created two opportunities in terms of analytics: mass personalization and analysis of the outliers. Mass personalization is basically the idea of individual personalization, but on a mass scale. In order to personalize to the individual, I must have all the records, and I cannot sample to do this. As for outliers, we used to throw them out because they skewed our analysis. Today, we want to find the outliers, because they can represent the rare events that we are looking for. You’d never sample to identify rare events, because a sample will almost certainly miss them!
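To make the point about rare events concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the record layout, the 10,000 “rare” amount, the threshold, and the helper names `build_population` and `find_rare_events` are all assumptions, not anything from a real system. It plants three rare records in a population of 100,000 and compares a full scan with a 1% random sample.

```python
import random


def build_population(n, rare_ids, seed=0):
    """Build n hypothetical records; ids in rare_ids get an extreme amount."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        # Ordinary records fall between 10 and 100; rare ones are far outside.
        amount = 10_000.0 if i in rare_ids else rng.uniform(10, 100)
        records.append({"id": i, "amount": amount})
    return records


def find_rare_events(records, threshold):
    """Return every record whose amount exceeds the threshold."""
    return [r for r in records if r["amount"] > threshold]


rare_ids = {137, 48_210, 91_555}  # assumed positions of the rare events
population = build_population(100_000, rare_ids)

# Full scan: every rare record is visible because nothing was left out.
found_in_population = find_rare_events(population, threshold=1_000)

# 1% sample: each rare record has only a 1-in-100 chance of being drawn,
# so on average the sample holds 0.03 of the 3 rare records.
sample = random.Random(1).sample(population, k=1_000)
found_in_sample = find_rare_events(sample, threshold=1_000)

print("rare events found in full population:", len(found_in_population))
print("rare events found in 1% sample:", len(found_in_sample))
```

The sample is still fine for estimating an average, which is what sampling was designed for. It is the hunt for individual records, whether for personalization or for outliers, that needs the full population.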
The second major shift is the movement from batch to inline analytics. When I started in the field almost 20 years ago, we operated in batch, both in performing the analytics and in implementing it (scoring). Although batch analytics is not a thing of the past, algorithms and scoring increasingly run in real time and on demand. Technology has driven this change, and analytics needs to keep up as we continue to embrace a digital world.
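The difference between batch and inline scoring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy linear model, the `weights`, the field names, and the `score_batch` and `InlineScorer` names are all hypothetical. The same scoring logic is used in both modes; what changes is when it runs.

```python
def score(record, weights):
    """Toy linear score: weighted sum of the record's numeric fields."""
    return sum(w * record.get(field, 0.0) for field, w in weights.items())


def score_batch(records, weights):
    """Batch mode: score a whole file of records in one scheduled pass."""
    return {r["id"]: score(r, weights) for r in records}


class InlineScorer:
    """Inline mode: score one record at the moment its event arrives."""

    def __init__(self, weights):
        self.weights = weights

    def on_event(self, record):
        # Called per interaction (for example a page view), not per nightly run.
        return score(record, self.weights)


weights = {"visits": 0.5, "purchases": 2.0}  # assumed model coefficients
customers = [
    {"id": "a", "visits": 4, "purchases": 1},
    {"id": "b", "visits": 10, "purchases": 0},
]

batch_scores = score_batch(customers, weights)

scorer = InlineScorer(weights)
inline_score = scorer.on_event({"id": "a", "visits": 4, "purchases": 1})

print(batch_scores)
print(inline_score)
```

Keeping one `score` function behind both entry points is a deliberate choice in this sketch: the model does not change between the two modes, only the moment at which it is applied.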
The last major shift in analytics is the idea of working with data that was never before considered data. One of the major changes with Big Data is dealing with unstructured data. Analysts don’t really like unstructured data. What’s the first thing we do when we get a new dataset? We clean up and organize the data. We don’t like unstructured data because we think it’s messy or dirty, and we have to clean it up before we can use it. Well, welcome to the world of Big Data, where everything is “messy” in that it does not have structure. The entire practice of text analytics was created because of the need to deal with this new, messy data. Ten years ago, a phone call wasn’t really considered “data.” But today, we can transcribe that conversation into text and analyze it without a human listening to every single call. As things continue to get more “digitized,” analytics will need to keep evolving to determine whether there is meaning and value in these new forms of data.
Even with these changes, I honestly don’t think analytics is being revolutionized in the Big Data era; it’s simply evolving. If I were to describe Big Data and how we need to think about it, analytically speaking, I’d summarize it as a simple three-step process: Access, Evaluate and Integrate. First, we need to get access to the data. Then, we need to figure out how to evaluate the data to find meaning and, even more importantly, business value. Lastly, we need a way to act on it by integrating and implementing our findings. Merkle has been using this paradigm of Access, Evaluate and Integrate for 10 years in how we think about data, and I don’t see this changing, no matter how “Big” the data is.