
What’s Wrong with Big Data?

The biggest problem with big data today is not the data; it's us, the analysts. The brutal truth about big data is that despite the fields of server farms processing and storing ginormo-bytes of data, today's technology stacks are designed for operational purposes, and today's (or yesterday's) analyst isn't really prepared to deal with them. I was recently sitting in a presentation on MongoDB, a NoSQL-based "big data" platform. We were talking about how easy it is to put data in: no complicated data statements or field-width settings, it just sucks up the data and it's in. So simple. But then ask it for a simple count by state of the data we just loaded, and we need several JavaScript-based map-reduce commands with some kind of merge step just to get some counts.
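To make the contrast concrete, here is a minimal toy sketch (in Python, not MongoDB's own shell) of the map-reduce pattern that such platforms required at the time for even a simple count: a mapper emits (state, 1) pairs and a reducer merges them into totals. The field name "state" and the sample records are hypothetical, purely for illustration.

```python
from collections import defaultdict

# Hypothetical documents as they might land in a schemaless store.
records = [
    {"state": "MD", "name": "a"},
    {"state": "VA", "name": "b"},
    {"state": "MD", "name": "c"},
]

def map_phase(docs):
    # Emit one (key, value) pair per document.
    for doc in docs:
        yield doc["state"], 1

def reduce_phase(pairs):
    # Merge all values per key -- the "some kind of merge" step.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

counts = reduce_phase(map_phase(records))
print(counts)  # {'MD': 2, 'VA': 1}
```

Compare that with what the traditional analyst expects to write: a one-line `SELECT state, COUNT(*) FROM customers GROUP BY state`. That gap between what the platform demands and what the analyst knows is exactly the point.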

This example illustrates what I'm talking about. The people who really know today's big data platforms are generally technology-oriented, while today's analytic professionals are mostly traditional statisticians (who know SAS and SPSS). There is a real gap in big data created by the shortage of true data scientists – those who can marry the statistical prowess of the traditional analyst with the programming wizardry of the computer programmer. Now, it's not that these people don't exist, but over the next two years I believe there IS and WILL continue to be a shortage of this unique resource.

Given that, I don’t think this is a long-term problem. Tools will catch up. SAS already has PROC HADOOP in its arsenal, and other tools will follow. Open-source R packages are being developed to be "big data" ready, and providers like Revolution Analytics are offering commercially viable solutions in R. These products will continue to evolve. The analytical community will catch up, too. At Merkle, for example, we are investing in recruiting and building the talent pool of the new data scientist. But it's not that simple. Where do you go to find people who know statistics and are comp-sci wizards? Are they in the statistics programs at major universities? The computer science departments? The math programs? These are challenges we are all going to face as we continue to try to understand what big data really means and how we'll need to adapt and evolve as analytic professionals.
