There you go again!

One hears lots of silly things said at MR conferences, and one of the silliest and most oft-repeated refrains is that you can't do surveys with probability samples any more. There are even those who say that you never could. As often as I get the chance I point out that that's total nonsense. Lots of very serious organizations draw high-quality probability samples all the time and get very good results. The prime example here in the US is the Current Population Survey, the government survey used as the basis for calculating the unemployment rate each month. Pretty much everything that comes from Pew is based on probability sample surveys, as are many of those political polls that we follow so breathlessly every four years.

The concept of a probability sample is very straightforward. The standard definition is a sample for which all members of the frame population have a known, nonzero chance of selection. Unless you have a complex stratified or multi-stage design it's a pretty simple concept. As long as you have a full list of the population of interest to draw on and everyone on it has a known, nonzero chance of being selected, the resulting sample can be said to represent the population of interest. But there are some serious challenges in current practice.
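To make the definition concrete, here is a minimal sketch in Python of the simplest case, a simple random sample drawn without replacement from a complete frame. The frame, the trait being measured and the function names are all made up for illustration. The point is only that every unit's chance of selection is known (n/N) and nonzero, which is what lets us weight the sample back up to the frame.

```python
import random


def draw_srs(frame, n, seed=None):
    """Draw a simple random sample without replacement.

    Every unit on the frame has the same known inclusion
    probability, n / N, where N is the size of the frame.
    """
    rng = random.Random(seed)
    sample = rng.sample(frame, n)
    inclusion_prob = n / len(frame)
    return sample, inclusion_prob


def weighted_mean(values, inclusion_probs):
    """Estimate a frame mean, weighting each unit by 1 / (inclusion probability)."""
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical frame: 10,000 unit IDs, 30% of which have some trait of interest.
frame = list(range(10_000))
has_trait = {unit: (unit % 10 < 3) for unit in frame}

sample, p = draw_srs(frame, n=500, seed=42)
observed = [1.0 if has_trait[unit] else 0.0 for unit in sample]
estimate = weighted_mean(observed, [p] * len(sample))

print(f"inclusion probability: {p}")
print(f"estimated share with trait: {estimate:.3f}")
```

With equal probabilities the weights cancel and the estimate is just the sample proportion. The weighting only starts to matter in the stratified or multi-stage designs mentioned above, where selection probabilities differ from unit to unit but are still known.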

The first is assembling a frame that includes the entire population you want to study. For example, because of the rise of cell-phone-only households, the landline frame that used to contain the phone numbers of well over 90% of US households no longer does. So it has become standard practice to augment the landline frame with a cell phone frame to ensure full coverage of the population. Clients often can supply customer lists that do a good job of covering their full customer base, and we can draw good samples from them as well. Online panels are problematic because they use the panel as the frame, and it contains only a very small fraction of the total population.
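The coverage problem can be put in terms of the definition above. The sketch below assumes, purely for illustration, that independent samples are drawn from the landline and cell frames and that we know each household's chance of selection in each. The sampling rates are made up. A household that appears on neither frame, or only on a frame you don't sample, has a zero chance of selection, and that is exactly what breaks a probability sample.

```python
def combined_inclusion_prob(p_landline, p_cell):
    """Chance of being selected in at least one of two independent samples.

    Pass 0.0 for a frame on which the household does not appear.
    """
    return 1.0 - (1.0 - p_landline) * (1.0 - p_cell)


# Hypothetical sampling rates: 1% of landline numbers, 2% of cell numbers.
landline_only = combined_inclusion_prob(0.01, 0.0)
cell_only = combined_inclusion_prob(0.0, 0.02)
dual_user = combined_inclusion_prob(0.01, 0.02)

# With a landline frame alone, a cell-phone-only household can never be selected.
cell_only_without_cell_frame = combined_inclusion_prob(0.0, 0.0)

print(landline_only, cell_only, dual_user, cell_only_without_cell_frame)
```

Households reachable through both frames have a higher chance of selection than the others, so a dual-frame design also has to account for that overlap when weighting.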

The second major challenge is declining cooperation. While there are studies that show even surveys with alarmingly low response rates can produce accurate estimates, low response rates make everyone nervous, raise doubts about representivity and call results into question. The Current Population Survey gets 90% plus and so we trust the unemployment rate, but that kind of response is very unusual.

There are other challenges as well but I think it's the deterioration of the landline frame and very low response rates that cause some people to think that probability sampling is no longer possible. Anyone willing to spend the time and the money will get very accurate estimates from a probability sample, better than anything they'll get with an online panel or other convenience samples.

As I have written numerous times on this blog, the lure of online has always been that it's fast and cheap, not that it's better. And depending on how the results are to be used the method can be just fine, fit for purpose. But sometimes the problem requires representivity and when it does probability sampling is still the best way to get it.


Comments

4 responses to “There you go again!”

  1. I could not agree more with you, Reg. Moreover, even if good probability samples are getting harder to obtain or are not achievable in social media or mobile research, it does not imply that we (researchers) should give up on trying to get the best sample possible for a study. For example, probability samples have never been possible for mall intercepts. This didn’t mean that you would interview the first 100 patrons coming through the door on Monday morning with no sampling strategy.
    It’s always been important to understand the limitations of one’s sample and to try to unveil the effect those might have on the results; much more important than reporting a margin of sampling error. This fundamental research principle has not changed even if the emergence of new technologies offers different ways of collecting data.

  2. Of course I can’t disagree with your points Reg, but I think the salient point is “Anyone willing to spend the time and the money …”. Government and social research organizations often have the budgets to conduct that type of research whereas commercial research is often restricted on both time and money.
    Moreover, I believe that the shift to emotional measurement/behavioral analytics, ethnographic models, and social listening is clearly indicative of businesses simply not finding probability samples necessary for their business.
    The bulk of commercial survey research conducted today is based on convenience samples and we use a variety of techniques to try to compensate for that.
    I agree that the science of sampling is important and has its place, as do many research methods that have served us well in the past such as CATI, but increasingly they seem to be irrelevant (or at least of marginal importance) for most commercial research.
    Also, in an age when Twitter and Facebook can be used to exactly duplicate results of probability sample polls, that does indicate that perhaps we’re being too dismissive of convenience samples, doesn’t it?

  3. @lennyism — I was more or less with you right down to that last paragraph. It just ain’t so. Less than a handful of one-off studies and people want to quickly make the leap of faith that because it worked once in this domain it will always work in all domains. Last year someone showed that Twitter can predict which new TV shows will be hits. This year it didn’t. The problem with this line of thinking is that there are no theoretical underpinnings. It just works! Where have we heard that before?

  4. I understand your point Reg and agree we need validation and consistent corroboration of these techniques, but I’ve been thinking for a while now that maybe there is something else at work here that is more difficult to quantify related to influence or herd mentality. I can’t put my finger on it, but where there is smoke there is fire… Perhaps it is just the law of large numbers at play, but undeniably in some circumstances it does work.