In our recent post, How to Protect Your Survey from Disengaged Responders and Ensure Quality Data, we discussed tips that help identify unengaged responders through straight-lining, timing, and attention checks, and ways to promote thoughtful responses. Even after you’ve taken the necessary steps to collect the best possible responses in your survey, some poor responses may still have made their way into your data, and you will need to decide what to do with them.
This is a particular problem in motivation-based research using panel surveys, so let’s use an example study on consumer banking choices in which we’ve encountered disengaged responses. Say we’ve identified a handful of careless responses that could tarnish our data. Once we’ve decided which responses to flag as disengaged, two common approaches are to 1) eliminate the unreliable responses completely, or 2) include them in the analysis anyway. Both approaches can bias the results, especially when a high proportion of responses is flagged.
Bias occurs when survey data does not accurately represent the population of interest, which ultimately undermines the validity of the results. In the case of data quality, the fully engaged responses may not be truly representative of the population as a whole. Instead, they represent only the part of your population that is good at taking surveys. This data quality bias is similar to the non-response bias often present in other types of surveys.
During the analysis, a better approach is to weight the fully engaged responses to counteract the bias that results from eliminating the disengaged responses. One possible method to achieve this is to:
- Match responses to a third-party database such as Merkle DataSource. This also allows for reliable demographic data for all responses.
- Create a model using only the quality responses with the third-party data as the independent variables.
- Apply the model built on the quality responses to score the flagged responses and estimate how much bias has occurred.
- Assign weights to the reliable data to counteract the bias created by eliminating the disengaged responses.
- Proceed with analysis using weighted reliable data.
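As a rough illustration of the steps above, here is a minimal Python sketch. It makes several assumptions that are not part of the original study: each response has already been matched to a single third-party demographic cell, the "model" is simply the share of each motivational segment within a cell (a stand-in for whatever predictive model you would actually build), and the cell labels, segment names, and counts are invented for the example.

```python
from collections import Counter, defaultdict

def fit_cell_model(quality):
    """Estimate P(segment | cell) from quality responses only.
    `quality` is a list of (cell, segment) pairs."""
    counts = defaultdict(Counter)
    for cell, segment in quality:
        counts[cell][segment] += 1
    return {
        cell: {seg: n / sum(c.values()) for seg, n in c.items()}
        for cell, c in counts.items()
    }

def score_flagged(model, flagged_cells):
    """Expected segment counts among flagged respondents, given their cells.
    Raises KeyError if a flagged cell never appears in the quality data."""
    expected = Counter()
    for cell in flagged_cells:
        for seg, p in model[cell].items():
            expected[seg] += p
    return expected

def segment_weights(quality, expected_flagged):
    """Weight per segment so the quality data matches the combined
    (observed quality + modeled flagged) segment distribution."""
    observed = Counter(seg for _, seg in quality)
    n_quality = len(quality)
    n_total = n_quality + sum(expected_flagged.values())
    weights = {}
    for seg, n in observed.items():
        target_share = (n + expected_flagged.get(seg, 0.0)) / n_total
        weights[seg] = target_share / (n / n_quality)
    return weights

# Hypothetical data: 8 quality responses in two demographic cells, A and B.
quality = (
    [("A", "convenience")] * 2 + [("A", "rates")] * 2
    + [("B", "convenience")] * 1 + [("B", "rates")] * 3
)
flagged_cells = ["A"] * 4  # flagged responses have cell data but no usable answers

model = fit_cell_model(quality)
expected_flagged = score_flagged(model, flagged_cells)
weights = segment_weights(quality, expected_flagged)
print(weights)
```

In this toy data the flagged respondents all fall in cell A, where convenience is more common, so convenience responses in the quality data are weighted up (about 1.11) and rate-driven responses are weighted down (about 0.93).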
With this approach, there are a few items to keep in mind:
- When creating a model, the dependent variable may come from a survey question or a set of questions of particular interest for the study.
- This approach depends on several factors, such as the ability to match to a third-party database and the ability to create a good model, either of which may not hold true.
In our example study, consider a segment of the survey in which customers are driven by convenience when selecting a bank. Most of the disengaged responders who gave us unreliable survey data, however, may also be driven by convenience: they do not find a cognitively demanding survey very convenient to take, and are therefore flagged as disengaged. So is the convenience segment actually larger than the unweighted quality data suggests? By building a solid predictive model on the good data and scoring the flagged data accordingly, we would be able to test this hypothesis.
Consider the differences in the motivational distributions between the two samples below. In this case, we surveyed 900 respondents and flagged one third (300) of the responses as disengaged.
By using the modeled distribution of the flagged data to weight the quality data, we can obtain the less biased, weighted distribution below.
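To make the weighting arithmetic concrete, here is a small Python sketch using the 600 quality and 300 flagged responses from the example. The segment names other than convenience and all of the per-segment counts are hypothetical, chosen only to show the calculation; they are not the study’s actual figures.

```python
# Hypothetical segment counts; only the totals (600 and 300) come from the example.
quality_counts = {"convenience": 180, "rates": 240, "service": 180}  # observed, 600 total
flagged_modeled = {"convenience": 180, "rates": 60, "service": 60}   # model-scored, 300 total

def weights_from_counts(quality, flagged):
    """Weight for each segment = combined share / share in the quality data."""
    n_quality = sum(quality.values())
    n_total = n_quality + sum(flagged.values())
    return {
        seg: ((quality[seg] + flagged.get(seg, 0)) / n_total) / (quality[seg] / n_quality)
        for seg in quality
    }

seg_weights = weights_from_counts(quality_counts, flagged_modeled)

# Weighted share of each segment within the 600 quality responses.
n_quality = sum(quality_counts.values())
weighted_share = {
    seg: seg_weights[seg] * quality_counts[seg] / n_quality for seg in quality_counts
}

for seg in quality_counts:
    print(seg, round(seg_weights[seg], 3), round(weighted_share[seg], 3))
```

With these made-up numbers, the convenience share rises from 30% in the unweighted quality data to 40% after weighting, while the other two segments are weighted down.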
We can now see that customers driven by convenience are underrepresented in the quality data and are now better represented using the weighted data. Overall, we have a more reliable distribution of customers and the factors that drive them when choosing a bank. This is one way to reduce bias from panel surveys. It’s just as important to understand poor responses as it is to identify them. How disengaged responses relate to your research ultimately determines the quality of the data you rely on.
What are some ideas that you have to overcome data quality bias?