Big Data is racking up a lot of buzz these days. Is it a game changer for the analytics field? Are we ready for it? The impact is much easier to understand when we look at the anatomy of Big Data.
Big Data refers to the huge quantities of data captured amid the information explosion; people often cite the amounts captured by companies such as Amazon and Google. Quantity comes in two dimensions, each with its own challenges: the number of instances (rows) and the number of attributes (columns). Analytics professionals are not afraid of data; in fact, we are data junkies. Data is to an analyst what ingredients are to a chef: we worry far more about a lack of data than we ever do about having too much.
So how does the analytics community deal with these two challenges? Start with the easier one. Analytics professionals have well-known methodologies for dealing with a large number of rows; one of the most common and effective (and easiest) is sampling. Resolving the challenges posed by a large number of attributes, on the other hand, is a much more involved process. These attributes represent new information we have never had access to before: the thousands of offline demographic attributes now available at massive scale, and the abundance of online digital-footprint data (social media, display, SEO, etc.). More extreme cases are genome-related data, where there are many more attributes than instances. Traditional statistical-textbook methods generally come up short here. Fortunately, with the wide adoption of predictive analytics across many industries, practitioners in the commercial space have developed a number of enhancements to textbook methods, and we can now handle thousands of attributes with ease and confidence. In short, analytics practitioners are already geared up for the challenges of Big Data.
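To make the sampling idea concrete: a simple random sample of a huge table lets an analyst estimate population statistics from a tiny fraction of the rows. The sketch below is illustrative only; the data, the `sample_rows` helper, and the sample size are all hypothetical choices, not a prescription.

```python
import random

def sample_rows(rows, n, seed=42):
    """Draw a simple random sample of n rows (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(rows, n)

# Simulate a "Big Data" table: one million instances of (id, spend),
# where spend cycles 0..99 so the true mean spend is exactly 49.5.
population = [(i, i % 100) for i in range(1_000_000)]

# A 1% sample is enough to estimate the mean spend closely.
sample = sample_rows(population, 10_000)
avg_spend = sum(spend for _, spend in sample) / len(sample)
```

The point of the sketch is that the sample mean lands within a fraction of a unit of the true mean of 49.5, which is why sampling is the workhorse answer to the "too many rows" problem.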
Big Data opens whole new worlds for analysts helping to inform complex marketing plans. It not only feeds analytics practitioners the right ingredients for more robust analysis; it also broadens the frontier of research capabilities. To give just one example, because we now have plenty of instances, practitioners can build many models concurrently and use polling (voting) techniques to combine them into even stronger and more effective models.
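The "polling" idea can be sketched as a toy ensemble: fit several very simple models on bootstrap samples of the data, then let them vote on each prediction. Everything below is an illustrative sketch under assumed toy data, not a production technique; the threshold "stump" model and the 10% label noise are my own choices for demonstration.

```python
import random
from collections import Counter

def train_stump(data, rng):
    """Fit a one-feature threshold rule on a bootstrap sample (toy model)."""
    boot = [rng.choice(data) for _ in data]
    best_t, best_acc = 0.0, 0.0
    # Try each observed feature value as a candidate threshold.
    for t in [x for x, _ in boot]:
        acc = sum((x > t) == y for x, y in boot) / len(boot)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def predict_by_poll(stumps, x):
    """Majority vote ('polling') across the ensemble of stumps."""
    votes = Counter(x > t for t in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
# Toy data: the true label is (feature > 0.5), flipped 10% of the time.
data = [(rng.random(), None) for _ in range(500)]
data = [(x, (x > 0.5) != (rng.random() < 0.1)) for x, _ in data]

# Train 25 stumps concurrently-in-spirit and poll them.
stumps = [train_stump(data, rng) for _ in range(25)]
acc = sum(predict_by_poll(stumps, x) == y for x, y in data) / len(data)
```

Each individual stump is noisy, but the vote of 25 of them recovers the underlying rule well; that smoothing effect is exactly why an abundance of instances makes polled ensembles attractive.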