Shirking on the Web?

Here is an interesting question from this morning’s email:

"Are we able to see how long a particular respondent took to complete a web survey? My client’s concern is people who go through surveys in a couple of minutes, just pressing random buttons to get to the end as fast as they can – and being able to remove them from the data on an ongoing basis throughout fielding."

For starters, we can calculate the elapsed time of an individual survey from start to finish, but there is no straightforward way to identify surveys that may have been completed in multiple sessions. We can control for that somewhat by calculating elapsed time only for surveys that are started and completed on the same day, but even then there is no straightforward way to tell whether an R (respondent) was away from his or her PC for extended periods during the interview without going to the substantial effort of timing every single question. The speed of the R’s Internet connection also can create significant variability in completion time. So this measure is inherently dirty, and there is no good way to establish a standard from which you might reasonably assume (and it would be an assumption) that an R has not taken the survey task seriously.
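To make the same-day restriction concrete, here is a minimal sketch in Python. It assumes each survey record carries a start and a completion timestamp; the function name and the example dates are made up for illustration and do not describe any particular survey system.

```python
from datetime import datetime
from typing import Optional


def elapsed_minutes(started: datetime, completed: datetime) -> Optional[float]:
    """Return elapsed minutes, or None if the survey spans more than one calendar day."""
    if started.date() != completed.date():
        # Likely a multi-session survey, so the elapsed time is not meaningful.
        return None
    return (completed - started).total_seconds() / 60.0


# Hypothetical timestamps: one same-day survey and one that crosses midnight.
same_day = elapsed_minutes(datetime(2006, 3, 1, 9, 0), datetime(2006, 3, 1, 9, 12))
overnight = elapsed_minutes(datetime(2006, 3, 1, 23, 50), datetime(2006, 3, 2, 0, 5))
```

Note that the survey crossing midnight is excluded even though it took only 15 minutes, and a same-day survey with a long break in the middle still gets counted. That is exactly why the measure stays dirty.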

But this might not be the right measure. Survey methodologists have long recognized that Rs sometimes do not put their full cognitive energies into answering every single survey question. This behavior has been dubbed "satisficing." (Europeans often use the much more appealing term "shirking.") Sometimes it can happen for an entire survey, but the more likely pattern is that this behavior sets in as the survey goes on and the novelty wears off. There are multiple ways that methodologists measure satisficing, and two favorites are levels of item nonresponse (i.e., frequency of choosing non-substantive answers like Don’t Know or Refused) and levels of differentiation in a series of contiguous questions with the same answer categories, such as a long series of 0-10 satisfaction questions. On the phone, measures of differentiation show the degree to which an R gets into a pattern and just keeps giving the same or nearly the same answer for questions throughout the series. On the Web we typically present these questions in a grid to save space and time, so we want to measure the degree to which Rs "straightline" through grids, that is, just click down the column and select the same answer for every question in the grid. One can score individual surveys on these kinds of measures and then set cutoffs beyond which surveys are deleted. We already do some of this with measures of item nonresponse at the analysis stage for most of our analytic products, but we have tended not to do it during data collection. Perhaps we should.
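As a rough illustration of scoring straightlining, the sketch below computes the share of items in a grid that carry the single most common answer. This is only one simple way to express non-differentiation, chosen here for brevity; the function name and the example answers are assumptions, not an established standard.

```python
from collections import Counter
from typing import Hashable, Sequence


def straightline_share(grid_answers: Sequence[Hashable]) -> float:
    """Share of grid items given the most common answer (1.0 means pure straightlining)."""
    if not grid_answers:
        return 0.0
    most_common_count = Counter(grid_answers).most_common(1)[0][1]
    return most_common_count / len(grid_answers)


# Hypothetical answers to a grid of 0-10 satisfaction questions.
straight = straightline_share([8, 8, 8, 8, 8])
varied = straightline_share([8, 3, 8, 10, 5, 8])
```

A score of 1.0 means the R picked the same answer for every item. Where to set the cutoff, and how long a grid has to be before the score means much, remains a judgment call.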

The larger question may be whether there is any reason to worry more about this with Web surveys than with telephone or other modes. The answer there probably is, "Maybe." While we tend to see higher rates of item nonresponse in Web surveys than in phone surveys (more use of non-substantive answer codes), that probably reflects our tendency to offer these codes to Rs in Web surveys but to treat them as volunteered (VOL) responses in phone surveys, rather than shirking by Rs. In other words, phone survey Rs aren’t told that Don’t Know is an acceptable answer. So you can reduce item nonresponse on the Web by not offering the non-substantive code, the easy way out.

On the issue of non-differentiation or "straightlining" in grids, the literature has it both ways. Two of the best scientific comparisons of phone and Web (Chang and Krosnick, 2003; Fricker et al., 2005) differ on the issue, with the former finding less non-differentiation on the Web and the latter finding more. Some internal work by Howard Speizer and Wyndy Wiitala has found a slight tendency for there to be more non-differentiation among like items on the phone than on the Web. All of this leads me to conclude that this is not a major issue. There may be measurable shirking on some studies, but it is not likely to be great or to have a major impact on your data.

So my answer to this client would be: You probably don’t need to worry about this. But if you do, the easiest way to measure it probably is to score each completed survey on its overall item nonresponse and to drop cases whose nonresponse exceeds some reasonable cutoff.
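For what that screening might look like in practice, here is a minimal Python sketch. The answer codes "DK" and "REF", the 0.5 cutoff, and the function names are all hypothetical, picked only to show the mechanics; a real cutoff would have to be chosen study by study.

```python
from typing import Dict, List, Sequence

NON_SUBSTANTIVE = {"DK", "REF"}  # hypothetical codes for Don't Know and Refused
MAX_NONRESPONSE = 0.5            # illustrative cutoff, not a recommended standard


def nonresponse_rate(answers: Sequence[object]) -> float:
    """Fraction of a survey's answers that are non-substantive."""
    if not answers:
        return 0.0
    return sum(1 for a in answers if a in NON_SUBSTANTIVE) / len(answers)


def keep_cases(surveys: Dict[str, Sequence[object]],
               max_rate: float = MAX_NONRESPONSE) -> List[str]:
    """Return the IDs of completed surveys whose nonresponse rate is within the cutoff."""
    return [case_id for case_id, answers in surveys.items()
            if nonresponse_rate(answers) <= max_rate]


# Hypothetical completed surveys keyed by case ID.
surveys = {"r1": [7, 8, "DK", 9], "r2": ["DK", "REF", "DK", 4]}
kept = keep_cases(surveys)
```

Run during fielding, a check like this would let dropped cases be replaced while the survey is still open rather than discovered at the analysis stage.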