I am in Chicago at a mini-conference called The Road to the Client Congress. The conference has been organized by Bob Lederer who, among other things, publishes an MR trade rag called The Research Business Report.
This all started back in the fall of 2006 when Bob managed to tap into what he perceives as a deep-seated angst among clients about data quality, especially data collected from online panels. So he has been hosting events ever since that keep talking about this issue, sometimes with "vendors" (as he likes to refer to research companies), sometimes just with clients, and sometimes with both. I don’t know if the data quality problem is as serious as Bob thinks it is, but he has certainly generated some heat on the issue.
This one is mostly "vendors," and our job here seems to be to help Bob set the agenda for the next client-only conference. It’s a collection of about 30 or so of the usual suspects from panel companies and "vendors." There also is a token panel of four clients who periodically get put on the spot to state the client perspective on something.
In the run-up we all got some homework that consisted of submitting to Bob our input in five areas:
1. Define "Data Quality" for this meeting in one sentence
2. The key non-metric data quality issues/questions every client researcher should ask of research agencies and online panel providers
3. Metrics that truly indicate data quality and should be part of every research project report
4. Implementable mechanical guidelines (e.g., traps for speedsters, straightliners, etc.)
5. What should be transparent to clients in online panel research?
We spent most of yesterday going over items one through three. The feeling I had throughout was that we were rewriting The ESOMAR Guide to Conducting Research on the Internet’s 25 questions to help research buyers of online sample, and doing a poor job of it at that. By the end of the day I felt that we had had a lot of fun arguments but had not moved the issue of panel data quality forward as much as an inch. There has been a lot of good work on this topic, of which the ESOMAR 25 and the soon-to-be-finalized ISO standard on access panels are two good examples. But these are largely ignored here in the US. Whether that’s some sort of xenophobia or a not-invented-here syndrome I can’t say, but I think the failure to recognize the good work being done on quality problems around the globe is one of the reasons why the US MR community just seems to be talking in circles on the issue of data quality. I can think of at least four US-based trade associations or professional groups who have developed online standards, but their impact on data quality has been nil. We are at least a year and maybe two years behind the state of the debate in Europe, and I don’t see us doing anything to close that gap.
Worse yet, it doesn’t feel to me like we have even found a good way to frame the problem. The Europeans have generally taken the approach that we don’t know enough to set serious standards, so let’s instead create transparency so that buyers can impose their own definitions of quality. The ISO standard, for example, creates a common vocabulary and sets a very low bar in terms of requirements, but mostly it specifies the kinds of procedures and documentation a panel company must be prepared to disclose when a client asks. In other words, it creates transparency. The ESOMAR 25 questions tell you the questions you should ask, but they don’t try to tell you what the answers should be. That’s up to you. And that’s how it should be. Partly that’s because we’ve not seen enough research to know for sure what a lot of the answers should be, and partly that’s because the "right" answers in most instances may depend on the research problem that you are trying to solve.
Which gets me to the definition of data quality that I sent to Bob:
Data quality is the degree of fit among the method of data collection, the effectiveness of its execution, and the business problem under study.
Please make a note of it.
2 responses to “Reinventing the Wheel in Chicago”
I also like wiki’s definitions…
“The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.”
“refers to the degree of excellence exhibited by the data in relation to the portrayal of the actual phenomena.”
Ironically, you can also argue that some of the data we collect from "vendors" truly are a phenomenon!
I think that’s a pretty good definition as well. I think the real key is the "appropriate for a specific use." The "fit for purpose" piece is critical.