How to Get Better Survey Data More Efficiently

Mollie J. Cohen; Zach Warner

doi:10.1017/pan.2020.20

Published online by Cambridge University Press: 11 November 2020

Mollie J. Cohen: Assistant Professor, University of Georgia, Athens, GA 30602, USA. Email: [email protected], URL: http://www.molliecohen.com
Zach Warner*: Postdoctoral Research Fellow, Cardiff University, Cardiff CF10 3AT, UK. Email: [email protected], URL: http://www.zachwarner.net
*Corresponding author: Zach Warner

Abstract

A key challenge facing many large, in-person public opinion surveys is ensuring that enumerators follow fieldwork protocols. Implementing “quality control” processes can improve data quality and help ensure the representativeness of the final sample. Yet while public opinion researchers have demonstrated the utility of quality control procedures such as audio capture and geo-tracking, there is little research assessing the relative merits of such tools. In this paper, we present new evidence on this question using data from the 2016/17 wave of the AmericasBarometer study. Results from a large classification task demonstrate that a small set of automated and human-coded variables, available across popular survey platforms, can recover the final sample of interviews that results when a full suite of quality control procedures is implemented. Taken as a whole, our results indicate that implementing and automating just a few of the many quality control procedures available can streamline survey researchers’ quality control processes while substantially improving the quality of their data.

Keywords

survey design, measurement error, machine learning

Type: Article
Information: Political Analysis, Volume 29, Issue 2, April 2021, pp. 121–138

DOI: https://doi.org/10.1017/pan.2020.20
Copyright: © The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Jeff Gill

Cohen and Warner supplementary material

Cohen and Warner supplementary material 1 (PDF, 531.2 KB)

Cohen and Warner supplementary material 2 (File, 43 KB)

Cohen and Warner Dataset

https://doi.org/10.7910/DVN/SV9B3E
