Over the last few weeks I have been in a few settings where people took to comparing survey results from a Web panel (or two) with results from phone or in-person studies that used a probability sample. The results were not all that comparable, and the general reaction more or less came down to, "How can this be?" There then followed a search for reasons, which mostly settled on either a bad panel or bad panelists.
Should we be surprised when nonprobability samples yield different results than probability samples? Or surprised when they are the same?
Now it's always seemed to me that this gives short shrift to Sir Isaac's Philosophiæ Naturalis Principia Mathematica, but putting that aside, the thing about gravity is that it always works, and in very predictable and mathematically precise ways. Alas, online does not, and there is evidence aplenty to demonstrate it.
This may seem to be another of those traditionalist rants against online, but it's not. It's more like a plea to recognize that online is different, and that there is no inherent reason why it should produce the same results as probability-based methods. When it doesn't, the search for an explanation ought to start with the fact that the sample frame (i.e., the panel, or all panels, for that matter) is biased and that no amount of demographic balancing or weighting is sufficient to fix that. We have come to simplify the problem by assuming that if we can get the demos right, we have a good sample. But the heart of the problem is not the demos; it's the cluster of attitudinal and behavioral characteristics that cause some people to go online while others do not, and cause some people to join panels while the vast majority of others do not. At this point we don't understand those differences well enough to measure them, and so dealing with that bias is a hit-and-miss game.
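A quick way to see the point is a small simulation. The sketch below is purely illustrative and not from the original piece: it invents a population in which an unmeasured attitude (here called "engagement") drives both the survey outcome and the chance of joining an online panel. Weighting the panel back to the population's age distribution fixes the demos but not the estimate, while a straight probability sample lands near the true value. All variable names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # illustrative population size

# Demographic: age group 0 = under 45, 1 = 45 and over
age = rng.integers(0, 2, N)

# Unmeasured attitude: latent online "engagement", a bit higher among younger people
engagement = rng.normal(loc=0.5 - 0.4 * age, scale=1.0)

# Survey outcome of interest, correlated with the latent attitude
y = 10 + 2 * engagement + rng.normal(0, 1, N)

# Probability sample: every person equally likely to be selected
prob_idx = rng.choice(N, size=2000, replace=False)

# Nonprobability panel: joining depends on the unmeasured attitude, not on age alone
p_join = 1 / (1 + np.exp(-(engagement - 1.5)))
joiners = np.where(rng.random(N) < p_join)[0]
panel_idx = rng.choice(joiners, size=min(2000, len(joiners)), replace=False)

# Demographic balancing: weight the panel back to the population age distribution
pop_share = np.bincount(age, minlength=2) / N
panel_share = np.bincount(age[panel_idx], minlength=2) / len(panel_idx)
w = (pop_share / panel_share)[age[panel_idx]]

print("population mean:          ", round(y.mean(), 2))
print("probability sample mean:  ", round(y[prob_idx].mean(), 2))
print("panel mean (unweighted):  ", round(y[panel_idx].mean(), 2))
print("panel mean (age-weighted):", round(np.average(y[panel_idx], weights=w), 2))
```

The weighted panel estimate stays biased because, within every age group, the people who joined the panel still differ on the attitude that drives the outcome; the weights can only repair the characteristics we measured.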
At least until another Isaac Newton comes along.