Master the Art of Proper Data Quality in Research
Data quality in research is vital to gathering reliable insights. Without accurate data, how can we make the right decisions? When data quality falls short, we risk making the wrong decision because it was based on incorrect or even fraudulent data. That can lead to billions of dollars spent on the wrong decisions.
“When we want to understand our customers, we have to have good data quality,” said Zoe Dowling, senior insights leader at Microsoft, on an episode of the market research podcast “Reel Talk: The Customer Insights Show.”
In this article, I cover the following:
- The spectrum of data quality in market research
- Why does fraud in research happen?
- Strategies to ensure data quality in research
The spectrum of data quality in research
Lisa Wilding-Brown, CEO at InnovateMR, explained that data quality and the lack thereof happen on a spectrum.
“On one end, you have these nefarious users who are doing this at scale and are trying to exploit vulnerabilities of research and panel companies,” she said on “Reel Talk.” “In the middle of the spectrum, you might have real human beings recruited by over-incentivized websites. And then, on the other end, you have innocent issues that pop up.”
So the spectrum runs from nefarious attempts, to misguided intentions, to unintentional mistakes – a useful distinction, because it lets us address each specific type of data quality issue.
“Cyber fraud is not going away,” Lisa said. “If you measure the GDP output of cyber fraud, it would be the third largest country after the United States and China. So that just puts into perspective the global damage these fraudsters can deploy.”
Eric Santos, vice president of sales at conversational insights leader Voxpopme, breaks it down into four types of low-quality samples:
- Professional participants
- Bad actors
- Bots
- Frustrated participants
It’s essential to understand the problem because it’s hard to fix something you don’t understand or aren’t fully aware of, said Vignesh Krishnan, CEO of Research Defender. Once you’re aware of it, the next step is to understand why it’s happening.
“One thing that surprises me is when people get surprised when they see so much fraud,” Vignesh said. “And I don’t think we should be anymore.”
But sometimes data quality isn’t a fraud issue at all – it’s a design issue, said Steven Snell, principal survey methodologist at Goldman Sachs, on “Reel Talk.”
“Most surveys should be done in nine minutes,” he said. “And if you are going over 10-12 minutes, that should be the exception to the rule. Otherwise, we deserve every black eye. They are really not engaged.”
Why does fraud in research happen?
As is often the case, it comes back to money, said Vignesh.
When fraudsters can scale survey fraud by increasing the volume and automating it, they can make real money.
“With fraud, the question is, how do you control it so it doesn’t affect the larger ecosystem?” Vignesh said.
He explained why fraud in research has increased:
- Years ago, almost everything was done manually. For example, people would email about a survey, somebody would build it, and then it would be launched.
- The systems are more dynamic today, and surveys are launched quickly.
“In a more dynamic ecosystem today, you have far more areas where fraud can happen,” he said. “It’s dynamic, and nobody is manning the door, basically.”
Vignesh said fraud in research happens in various research methods, including quantitative surveys, qual studies, and even in-home product testing.
Read next: What to do with incomplete survey responses?
Of course, fraud in research creates several issues:
- Data cleaning becomes a time-consuming priority – for example, when a third of the open-ended answers don’t make sense, somebody has to spend time going through them.
- Financial losses
- Wrong business decisions based on fraudulent responses
- Once fraud is discovered, it can cast doubt on other research results – past, present, and even future ones.
Strategies to ensure data quality in research
No matter what issues the fraud is causing, it must be addressed.
“If you lose trust, somebody might say, ‘why do we need to do research at all?’” Vignesh said. “That’s an even bigger problem because that question can undermine future projects.”
Straight-up fraud in research – the one extreme of the spectrum – happens especially during economic downturns and when technology lets bad actors cheat the system far more easily than was possible years ago.
“There’s a lot of strategies you can implement to get ahead of it,” said Lisa. “There’s a balance between technological and methodological strategies to deploy.”
She said that technical strategies can include digital fingerprinting, while methodological ones involve how questions are asked and how people respond.
In-survey strategies
Lisa said fighting fraud in research and getting good quality responses requires good survey design from the start.
In quant surveys, survey bots can be a real problem, but several strategies can weed them out, including red herring questions that check whether people are paying attention and that are hard for bots to answer.
Read next: How to avoid survey bots
Use red herring questions to ensure that the right people are answering your survey questions.
“They can be used to test domain expertise,” Lisa said. “Let’s say you are going after IT decision-makers. Create questions that are designed to test their domain expertise.”
In your questions, include both real brand names and non-existent ones to weed out respondents who aren’t who they claim to be and to catch bots.
“If they are aware of those fake brands, you know they aren’t in it for the right reasons,” Lisa said.
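To make this concrete, here is a minimal sketch in Python of how fake-brand and attention-check flags might be scored automatically. The brand names, question ID, and response format are hypothetical, purely for illustration – not from any specific survey platform.

```python
# Hypothetical flags for fake-brand awareness and failed attention checks.
# Brand names, question IDs, and the response format are illustrative only.

FAKE_BRANDS = {"Novexia Cloud", "Quantrell Systems"}   # brands that don't exist
ATTENTION_CHECK = {"question_id": "q12", "expected": "Strongly disagree"}

def flag_response(response: dict) -> list[str]:
    """Return a list of quality flags for a single survey response."""
    flags = []

    # Respondent claims awareness of a brand we invented.
    claimed = set(response.get("brands_aware_of", []))
    if claimed & FAKE_BRANDS:
        flags.append("claims_awareness_of_fake_brand")

    # Respondent missed the red herring / attention-check item.
    answer = response.get(ATTENTION_CHECK["question_id"])
    if answer != ATTENTION_CHECK["expected"]:
        flags.append("failed_attention_check")

    return flags

# Example usage
respondent = {
    "brands_aware_of": ["Cisco", "Novexia Cloud"],
    "q12": "Agree",
}
print(flag_response(respondent))
# ['claims_awareness_of_fake_brand', 'failed_attention_check']
```

Flags like these are typically treated as signals that prompt review rather than automatic removals.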
Before respondents even get into the survey, ask good screener questions so the right people reach it.
Prevention
But survey fraud can happen anywhere, which is why it’s so important to stay ahead of it and actively work to prevent it.
For example, in the Voxpopme video survey platform, panel manager Matthew Handegaard and the team have implemented several strategies to prevent fraud in research and ensure data quality.
“Respondents are required to show their face, which helps identify and remove duplicates,” Matthew said. “We also geofence, so anyone registered is inside the country. Then at the response level, we reject bad responses and repeat bad respondents are deprioritized, which could lead to removal.”
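For illustration only, here is a rough sketch of how geofencing and duplicate screening could work in principle. This is not Voxpopme’s implementation; the allowed-country set, the identity signature (for example, a hash of a face embedding or device ID), and the in-memory registry are assumptions.

```python
# Illustrative geofence and duplicate checks for incoming respondents.
# Field names and the registry are hypothetical, not any vendor's actual system.

ALLOWED_COUNTRIES = {"US"}          # geofence: only in-country registrations
seen_signatures: set[str] = set()   # e.g. hashes of a face embedding or device ID

def admit(respondent: dict) -> tuple[bool, str]:
    """Decide whether to admit a respondent, returning (admitted, reason)."""
    if respondent["country_code"] not in ALLOWED_COUNTRIES:
        return False, "outside geofence"

    signature = respondent["identity_signature"]   # assumed to be precomputed
    if signature in seen_signatures:
        return False, "duplicate of an existing respondent"

    seen_signatures.add(signature)
    return True, "admitted"

print(admit({"country_code": "US", "identity_signature": "abc123"}))  # admitted
print(admit({"country_code": "US", "identity_signature": "abc123"}))  # duplicate
print(admit({"country_code": "BR", "identity_signature": "def456"}))  # geofenced
```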
Read next: How to ask inclusive demographic questions in your market research
Scrutinize responses
Vignesh said this already happens in the industry, but it’s good to mention that some data quality issues – including fraud – can be caught by reviewing the responses. In addition, some types of responses can easily be dismissed as useless when they don’t even answer the question.
Partnerships
Relationships matter, and that certainly holds true for ensuring data quality in research. Make sure you understand what your suppliers do to deliver a high-quality sample, Lisa said.
Consider: Use ESOMAR’s questions for users and buyers of online samples
“Look at how they recruit and incentivize people,” Lisa said.
On-staff expertise
The tactics and technologies of bad actors committing fraud in research change constantly. That’s why it’s so important to have somebody on staff who focuses on the issue, Lisa said.
“Make sure that all these mitigation tactics are deployed,” she said.
To deploy those tactics, somebody needs to understand fraudsters’ techniques and how they evolve. Take, for example, the device farms discussed in this interview with a former fraudster.
Consider getting outside perspectives. That could include studying former fraudsters, talking to somebody new to the market research industry about a problem, and participating in formal mentoring.
“There’s just so many formalized mentorship programs,” Lisa said. “Get involved, and you’ll get so much benefit from it, and companies will see that benefit, too. So everyone wins.”
Read next: Our checklist from experts: Building a team the right way
Technology to scrutinize responses
Use technology solutions – like Natural Language Processing – to review the responses. Do they make sense? Are they in line with the question?
“Throwing manhours at the problem needs to be avoided,” Vignesh said.
“In a world where everything is getting automated, this is a huge opportunity,” added Jenn Mancusi, CRO at Voxpopme.
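As a hedged illustration of what automated response review could look like, the sketch below uses simple heuristics – answer length, gibberish patterns, keyword overlap with the question, and verbatim repetition – rather than a full NLP model. The thresholds and field names are assumptions, not a production rule set.

```python
import re
from collections import Counter

def review_open_end(question: str, answer: str, seen: Counter) -> list[str]:
    """Flag an open-ended answer that looks low quality or off-topic."""
    flags = []
    text = answer.strip().lower()

    # Too short to carry meaning.
    if len(text.split()) < 3:
        flags.append("too_short")

    # Gibberish: long runs of consonants or the same character repeated.
    if re.search(r"[bcdfghjklmnpqrstvwxyz]{6,}", text) or re.search(r"(.)\1{4,}", text):
        flags.append("looks_like_gibberish")

    # Off-topic: shares no meaningful words with the question.
    q_words = {w for w in re.findall(r"[a-z]+", question.lower()) if len(w) > 3}
    a_words = set(re.findall(r"[a-z]+", text))
    if q_words and not (q_words & a_words):
        flags.append("no_overlap_with_question")

    # Copy-paste: the same answer appearing over and over across respondents.
    seen[text] += 1
    if seen[text] > 3:
        flags.append("repeated_verbatim_answer")

    return flags

seen_answers: Counter = Counter()
print(review_open_end(
    "What do you like most about your current streaming service?",
    "sdfghjkl qwrtpsdfg zxcvbnm", seen_answers))
# ['looks_like_gibberish', 'no_overlap_with_question']
```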
Building a community
Eric said building a community of the right respondents is one way to manage data quality control in research. In a community, the participants have opted in for ongoing research, and a theme typically connects them.
Review how respondents can log in.
Vignesh said that fraudsters often use Virtual Private Networks (VPNs) to log in. So they might be in one country but, through a VPN, make it look like they are in another.
“Most of this fraud comes from places where the $1, $2, $3 payout for surveys goes a long way,” Vignesh said.
The hardest part here is to gauge intent, he said. For example, there might be a good reason somebody is using a VPN and not trying to commit fraud.
“The question now is, ‘are you going to trust that person or not?’” Vignesh said. “On one hand, you can give reasons why you should trust that person, but on the other hand, it’s one of the most commonly used methods to hide your origination.”
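One way to act on that ambiguity is to treat VPN use as one risk signal among several instead of automatic proof of fraud. The sketch below assumes you already have an IP-intelligence lookup that returns the IP’s country and a VPN flag; the weights and review threshold are made up for illustration.

```python
# Treat VPN use and country mismatches as risk signals, not automatic rejections.
# `ip_country` and `on_vpn` are assumed to come from an external IP-intelligence
# lookup you already use; the weights and threshold below are illustrative.

def login_risk(claimed_country: str, ip_country: str, on_vpn: bool) -> dict:
    score = 0
    reasons = []

    if on_vpn:
        score += 1                       # VPN alone is only a weak signal
        reasons.append("vpn_detected")

    if claimed_country != ip_country:
        score += 2                       # claimed location disagrees with the IP
        reasons.append("country_mismatch")

    return {"score": score, "reasons": reasons, "review": score >= 2}

print(login_risk("US", "US", on_vpn=True))
# {'score': 1, 'reasons': ['vpn_detected'], 'review': False}
print(login_risk("US", "VN", on_vpn=True))
# {'score': 3, 'reasons': ['vpn_detected', 'country_mismatch'], 'review': True}
```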
Consider all the options.
Keep an eye on emerging options, and don’t put all your eggs in one basket.
“We need to use many different tools and strategies to catch these individuals,” Lisa said.
Collect the right amount of personal information.
Collecting the correct information, and the right amount of it, can help us verify who participates in our surveys. But, of course, there are also consumer trust and privacy concerns that need to be honored here.
“There’s a halo effect that can happen when other industries not even connected to us do bad things or report a data breach,” Lisa said. “That can erode the trust of our consumers in survey research. That can have a ripple effect on our space.”
Some of that can be overcome through good communication, said Jenn.
“How are we communicating that this is what we are doing, what you are opting in for, and what we are going to use your data for, and that there’s no personally identifiable information attached to it,” she said.
Combine methods and track behavior over time.
Vignesh said fighting fraud can be more successful when methods are combined. For example, you can ask somebody if they are on a VPN, and if they say no but you know they are on a VPN, something is amiss.
Then look at what people are doing over time.
“You can’t really improve something you don’t measure,” Vignesh said, stressing the importance of observing respondent behavior over time. “But you can improve what you can measure.”
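A rough sketch of what combining methods and measuring over time could look like: cross-check the self-reported VPN answer against detection, and keep a running history of quality flags per respondent ID so patterns accumulate across studies. The storage, IDs, and threshold here are assumptions for illustration.

```python
from collections import defaultdict

# Running history of quality flags per respondent ID across studies.
# In practice this would live in a database; a dict is enough for the sketch.
history: dict[str, list[str]] = defaultdict(list)

def record_study(respondent_id: str, says_on_vpn: bool, vpn_detected: bool,
                 other_flags: list[str]) -> None:
    """Cross-check self-reported VPN use against detection and log all flags."""
    flags = list(other_flags)
    if vpn_detected and not says_on_vpn:
        flags.append("denied_vpn_but_detected")   # the mismatch described above
    history[respondent_id].extend(flags)

def needs_review(respondent_id: str, threshold: int = 3) -> bool:
    """Flag respondents whose issues keep accumulating across studies."""
    return len(history[respondent_id]) >= threshold

record_study("r-001", says_on_vpn=False, vpn_detected=True, other_flags=[])
record_study("r-001", says_on_vpn=False, vpn_detected=True,
             other_flags=["failed_attention_check"])
print(needs_review("r-001"))   # True: three flags across two studies
```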
Is it really possible to stop fraud in research? Vignesh said that’s unlikely, but we can make it harder and more expensive for those trying to commit fraud.
“You have to control and mitigate it, and when it does happen, make sure it’s minimal,” Vignesh said. “There are many levers to pull but make it more expensive for these fraudsters. Make them stick around for a longer time – when they stick around longer, you can see more bad behaviors.”
Final thoughts on data quality in research
Ensuring data quality in our research is crucial. Whether the quality is threatened by fraud, the wrong incentives, or simple mistakes, it’s a problem that we as an industry need to keep working to fix. And if we can’t eliminate it, let’s minimize the impact.
Listen to our market research podcast.