Big Data = Big Data Quality Challenges

A recent webinar was promoted with the following information: “94% of executives polled suspect that their customer and prospect data is inaccurate in some way. In fact, respondents think that, on average, as much as 17% of their data might be inaccurate.” Numbers and statistics, in general, can be problematic. When the numbers also rest on survey responses from “executives polled,” one is reminded of the quote variously credited to Twain or Disraeli about “Lies, Damned Lies, and Statistics.” But there are some meaningful nuggets in those statistics.

First: “94% of executives polled suspect that their customer and prospect data is inaccurate in some way.” What’s meaningful in this statement? If 94% of execs think their data is inaccurate in some way, I wonder whether this means 6% believe their data is completely accurate. If so, these 6% are either incredibly optimistic or willfully deluded. Between keying and other human errors, ongoing data changes (approximately 4 million births, 2 million deaths, 2 million marriages, 40 million moves, and so on), and disagreement between transactional systems or silos, there is not a company anywhere whose data is completely accurate.

Second: “As much as 17% of their data might be inaccurate.” This one feels closer to reality when considering the many potential types of error. Based on the many customer marketing databases I’ve worked with over the years, this number may be conservative. If we consider a hypothetical structured row/column dataset with 10 million customers and 50 fields each (500 million fields in all), 17% would imply 85 million erroneous fields. That’s a lot of error.
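The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: the function name and parameters are made up for illustration, and it assumes the 17% applies at the level of individual fields, as in the hypothetical dataset above.

```python
def implied_erroneous_fields(customers: int, fields_per_customer: int, error_pct: int) -> int:
    """Return how many individual fields would be wrong at a given error percentage.

    Assumes the percentage applies to fields, not to whole customer rows.
    Integer math is used so the result is exact.
    """
    total_fields = customers * fields_per_customer
    return total_fields * error_pct // 100


# Hypothetical dataset from the text: 10 million customers, 50 fields each, 17% error.
errors = implied_erroneous_fields(10_000_000, 50, 17)
print(f"{errors:,} erroneous fields")
```

Changing the inputs makes it easy to see how quickly the error count grows with either the number of records or the number of fields kept per record.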

Which brings me to this central point: the exploding volume, velocity, variety, and variability of Big Data are forcing marketers (and data and technology providers) to make big improvements in data quality as a part of their drive for effective engagement. Put plainly, brands cannot meaningfully engage with customers, and certainly cannot do so cost-effectively, using flawed data. Investment in data quality isn’t new with the emergence of Big Data: it’s simply that the complexity of what’s needed has increased. We’ve migrated from basic name and address hygiene for contact data, and getting descriptive data from a handful of providers (and hoping it was right), to advanced integration and enhancement routines so that we can accurately recognize, intelligently analyze, and meaningfully interact with individuals across media and channel touchpoints. With marketing data flowing in from, and back out through, display ads, search, mail, email, social media and even, increasingly, TV and radio, data quality is central to:

  • Accurately reconciling contact or identity information such as names, addresses, social IDs, email addresses, mobile numbers, or IP addresses
  • Associating it with descriptive information such as demographics and psychographics
  • Tying it to transactional or behavioral information including shopping and buying behavior, marketing response behavior, and even unstructured comments or intentions from user-generated content
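
As a rough illustration of those three steps, here is a minimal Python sketch that reconciles contact records on a normalized email address, then attaches descriptive and transactional data to the resulting profile. Everything in it is an assumption made for the example: the record layouts, the field names, and the choice of email as the only match key. Real identity resolution typically has to match across names, addresses, and other identifiers as well.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def build_profiles(contacts, demographics, transactions):
    """Group hypothetical records into one profile per normalized email."""
    profiles = {}

    # Step 1: reconcile contact/identity records.
    for c in contacts:
        key = normalize_email(c["email"])
        profile = profiles.setdefault(
            key, {"names": set(), "demographics": {}, "transactions": []}
        )
        profile["names"].add(c["name"].strip())

    # Step 2: associate descriptive information with known identities.
    for d in demographics:
        key = normalize_email(d["email"])
        if key in profiles:
            profiles[key]["demographics"].update(d["attributes"])

    # Step 3: tie in transactional or behavioral information.
    for t in transactions:
        key = normalize_email(t["email"])
        if key in profiles:
            profiles[key]["transactions"].append(t["item"])

    return profiles


# Made-up sample data: the same person entered twice with different casing.
contacts = [
    {"name": "Pat Lee", "email": "Pat.Lee@example.com "},
    {"name": "Patricia Lee", "email": "pat.lee@example.com"},
]
demographics = [{"email": "PAT.LEE@EXAMPLE.COM", "attributes": {"age_band": "35-44"}}]
transactions = [{"email": "pat.lee@example.com", "item": "running shoes"}]

profiles = build_profiles(contacts, demographics, transactions)
print(len(profiles))
```

Even in this toy case, skipping the normalization step would split one customer into separate profiles and leave the demographic record unmatched, which is the kind of quiet error that undermines everything built on top.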

There are a lot of companies promising glitzy, glowing things in “sexy” areas of Big Data and marketing right now. But getting the data right, for contact and connection information, for analytics, and for meaningful engagement, is still a core foundational need that can’t be ignored.
