Among the many criticisms we hear of online panels is the charge that we have no idea whether respondents are who they say they are and that the incentive-driven nature of panels encourages people to pretend to be someone they are not. A number of companies have introduced products to clean up online samples so that we can know with some confidence that our online respondents are real. Not surprisingly, each of these companies uses a different approach and, as research presented by Melanie Courtright and Chuck Miller of DMS at the CASRO Online Conference back in March demonstrated, those approaches can produce different results.
Melanie and Chuck started with an online sample of 7,200 people, split roughly 60/40 between their own panel and third-party sources. They balanced the sample on the front end by age, gender, income, and ethnicity. At the outset they asked each respondent for his or her name, mailing address, and date of birth, and submitted those data to four companies offering validation services. They then administered a questionnaire containing a number of demographic, lifestyle, attitudinal, and behavioral questions. All respondents completed the survey, even those the validation services could not confirm as real. The results should trouble all of us (a short tallying sketch follows the list):
- The percentage of respondents validated ranged from 78.4% to 87.4% across the four providers, a spread of nine points.
- Barely half of 18-to-24-year-olds were validated overall, and for two of the providers the rate was closer to a third.
- Hispanics and Asians validated at significantly lower rates than whites and African-Americans.
- While the gender pattern did not differ significantly across providers, males overall were almost twice as likely as females to fail the validation check.
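To make the cross-provider comparison concrete, here is a minimal sketch, in Python/pandas, of the kind of tally involved. All column names and values are invented for illustration; the study's respondent-level data are not public.

```python
# Hypothetical sketch: validation rates by provider and by subgroup.
# One row per respondent, one 0/1 flag per validation provider.
import pandas as pd

df = pd.DataFrame({
    "age_group":  ["18-24", "18-24", "25-34", "35-44", "25-34", "45+"],
    "provider_a": [0, 1, 1, 1, 1, 1],
    "provider_b": [0, 0, 1, 1, 0, 1],
    "provider_c": [1, 0, 1, 1, 1, 1],
    "provider_d": [0, 1, 1, 0, 1, 1],
})
providers = ["provider_a", "provider_b", "provider_c", "provider_d"]

# Overall validation rate per provider, in percent.
print(df[providers].mean().mul(100).round(1))

# Rates within each age group show whether a provider disproportionately
# fails the hard-to-validate (here, the youngest respondents).
print(df.groupby("age_group")[providers].mean().mul(100).round(1))
```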
These results are similar to those from proprietary research we did for a client in 2010, where we also found that respondents with lower incomes and less education were more likely to be flagged as invalid.
Melanie and Chuck also compared the substantive survey findings of validated and non-validated respondents and found other important differences there as well (one way to test such gaps is sketched after the list). For example:
- Validated respondents reported lower rates of iPhone and smartphone ownership than respondents who did not validate.
- Validated respondents also reported being more careful and thoughtful shoppers than the non-validated group.
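A simple way to test a gap like the smartphone one is a two-proportion z-test. Here is a minimal sketch using statsmodels; all counts below are invented for illustration.

```python
# Hypothetical sketch: is the validated/non-validated gap in smartphone
# ownership statistically significant? All counts below are invented.
from statsmodels.stats.proportion import proportions_ztest

owners = [412, 187]   # smartphone owners: [validated, non-validated]
totals = [1500, 450]  # group sizes:       [validated, non-validated]

stat, pval = proportions_ztest(count=owners, nobs=totals)
print(f"validated: {owners[0] / totals[0]:.1%}, "
      f"non-validated: {owners[1] / totals[1]:.1%}")
print(f"two-proportion z = {stat:.2f}, p = {pval:.4f}")
```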
I find all of this worrisome. One of the findings from the ARF ORQC's study of 17 online panels was that online panels are not interchangeable: the answer you get to any given research question may well depend on the panel you choose to work with. This research from DMS suggests that validation rates also vary with the service your panel provider chooses, potentially adding yet another layer of bias. The apparently strong relationship between age, ethnicity, education, and income on the one hand and the likelihood of validating on the other is more troubling still. Are we running the risk of excluding the very people who are toughest to reach (young people, ethnic minorities, the less well-educated) simply because they don't show up in "the system," don't own credit cards, or don't have mortgages? In the name of making things better, are we actually making them worse?
Comments
Hi Reg – just checking, was the subject sample in this study all opt-in? What are your thoughts on potential applicability to online panels recruited using probability methods? Are you aware of comparable validation studies of probability-based online panels?
Thanks and good seeing you at AAPOR.
Hi Mike — I would expect that probability-based methods would not have this problem because the panel builder samples respondents from lists (e.g., telephone numbers, addresses) and contacts them via that channel. The panelist is therefore known to exist at that telephone number or address, and the chances of that person getting into the panel a second time are small enough to ignore.
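To illustrate the deduplication part of that argument, here is a minimal sketch of collapsing a frame on the sampled contact point; the normalization rules and records are invented.

```python
# Hypothetical sketch: deduplicate a panel frame by normalized mailing address
# so the same household cannot enter the panel twice under a variant spelling.
import re

def normalize_address(addr: str) -> str:
    """Collapse case, punctuation, and whitespace so near-identical
    addresses compare equal."""
    addr = addr.lower()
    addr = re.sub(r"[^\w\s]", "", addr)       # drop punctuation
    return re.sub(r"\s+", " ", addr).strip()  # collapse whitespace

panelists = [
    {"id": 1, "address": "123 Main St., Apt 4"},
    {"id": 2, "address": "123 main st apt 4"},  # same household, restyled
    {"id": 3, "address": "77 Oak Ave"},
]

seen, unique = set(), []
for p in panelists:
    key = normalize_address(p["address"])
    if key not in seen:
        seen.add(key)
        unique.append(p)

print([p["id"] for p in unique])  # -> [1, 3]
```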
This seems like an interesting article. It strikes me that younger folks and minorities are harder to validate because they have smaller financial footprints, and I think it is a leap to equate ability to validate with sample quality.
In practical terms, doesn’t this argue for blending multiple panels per survey, and controlling that blend across surveys? Obviously this does nothing to solve the basic quality problems, but at least it cancels out the artifact you could introduce by throwing entire projects at a single panel provider and hoping that has no effect on responses.
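For concreteness, here is a minimal sketch of what holding a blend constant across surveys might look like operationally; the provider names and ratios are invented.

```python
# Hypothetical sketch: allocate each survey's target completes across panel
# providers in fixed proportions, so the blend stays constant study to study.

BLEND = {"panel_a": 0.50, "panel_b": 0.30, "panel_c": 0.20}

def allocate(target_completes: int, blend: dict[str, float]) -> dict[str, int]:
    """Split a target number of completes across providers, assigning any
    rounding remainder to the largest provider."""
    alloc = {name: int(target_completes * share) for name, share in blend.items()}
    alloc[max(blend, key=blend.get)] += target_completes - sum(alloc.values())
    return alloc

print(allocate(1200, BLEND))  # -> {'panel_a': 600, 'panel_b': 360, 'panel_c': 240}
```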