
IRT Models for Expert-Coded Panel Data

Published online by Cambridge University Press:  03 September 2018

Kyle L. Marquardt*
Affiliation:
V-Dem Institute, Department of Political Science, University of Gothenburg, Gothenburg, Sweden. Email: [email protected]
Daniel Pemstein
Affiliation:
Department of Criminal Justice and Political Science, North Dakota State University, Fargo, ND 58105, USA. Email: [email protected]

Abstract

Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. We investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both data from the V-Dem data set and ecologically motivated simulated data. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data measuring latent concepts.
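To make the class of models the abstract describes concrete, the sketch below gives a minimal ordinal IRT specification written in Stan (the software cited in the references). Each expert receives a discrimination parameter capturing reliability and an expert-specific threshold vector capturing differential item functioning (DIF). This is an illustrative sketch under assumed priors and variable names (z, beta, gamma), not the authors' exact V-Dem measurement model.

    // Minimal sketch (not the authors' exact specification) of an ordinal IRT
    // model for expert-coded panel data: each expert r has a discrimination
    // parameter beta[r] capturing reliability and an expert-specific threshold
    // vector gamma[r] allowing for differential item functioning (DIF).
    data {
      int<lower=1> N;                  // number of expert ratings
      int<lower=2> K;                  // number of ordinal categories
      int<lower=1> R;                  // number of experts
      int<lower=1> CY;                 // number of country-years
      int<lower=1, upper=R> rater[N];  // expert providing rating n
      int<lower=1, upper=CY> cy[N];    // country-year being rated
      int<lower=1, upper=K> y[N];      // observed ordinal rating
    }
    parameters {
      vector[CY] z;                    // latent country-year trait
      vector<lower=0>[R] beta;         // expert reliability (discrimination)
      ordered[K - 1] gamma[R];         // expert-specific thresholds (DIF)
    }
    model {
      z ~ normal(0, 1);                // fixes the scale of the latent trait
      beta ~ lognormal(0, 0.5);        // assumed prior: experts broadly reliable
      for (r in 1:R)
        gamma[r] ~ normal(0, 2);       // assumed weakly informative DIF prior
      for (n in 1:N)
        y[n] ~ ordered_logistic(beta[rater[n]] * z[cy[n]], gamma[rater[n]]);
    }

Under these assumptions, point estimates of the latent concept come from the posterior of z for each country-year, playing the role that the simple average across experts plays in standard practice.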

Type
Articles
Copyright
Copyright © The Author(s) 2018. Published by Cambridge University Press on behalf of the Society for Political Methodology. 


Footnotes

Authors’ note: Earlier drafts were presented at the 2016 MPSA Annual Convention, the 2016 IPSA World Convention and the 2016 V-Dem Latent Variable Modeling Week Conference. We thank Chris Fariss, Juraj Medzihorsky, Pippa Norris, Jon Polk, Shawn Treier, Carolien van Ham and Laron Williams for their comments on earlier drafts of this paper, as well as V-Dem Project members for their suggestions and assistance. We are also grateful to the editor and two anonymous reviewers for their detailed suggestions. This material is based upon work supported by the National Science Foundation under Grant No. SES-1423944 (PI: Daniel Pemstein); the Riksbankens Jubileumsfond, Grant M13-0559:1 (PI: Staffan I. Lindberg); the Swedish Research Council, 2013.0166 (PIs: Staffan I. Lindberg and Jan Teorell); the Knut and Alice Wallenberg Foundation (PI: Staffan I. Lindberg); the University of Gothenburg, Grant E 2013/43; and internal grants from the Vice-Chancellor’s office, the Dean of the College of Social Sciences, and the Department of Political Science at the University of Gothenburg. We performed simulations and other computational tasks using resources provided by the Notre Dame Center for Research Computing (CRC) through the High Performance Computing section and the Swedish National Infrastructure for Computing (SNIC) at the National Supercomputer Centre in Sweden (SNIC 2016/1-382, 2017/1-407 and 2017/1-68). We specifically acknowledge the assistance of In-Saeng Suh at CRC and Johan Raber at SNIC in facilitating our use of their respective systems. Replication materials are available in Marquardt and Pemstein (2018).

Contributing Editor: R. Michael Alvarez

References

Aldrich, John H., and McKelvey, Richard D. 1977. A method of scaling with applications to the 1968 and 1972 Presidential elections. American Political Science Review 71(1):111–130.
Bakker, R., de Vries, C., Edwards, E., Hooghe, L., Jolly, S., Marks, G., Polk, J., Rovny, J., Steenbergen, M., and Vachudova, M. A. 2012. Measuring party positions in Europe: The Chapel Hill expert survey trend file, 1999–2010. Party Politics 21(1):143–152.
Bakker, Ryan, Jolly, Seth, Polk, Jonathan, and Poole, Keith. 2014. The European common space: Extending the use of anchoring vignettes. The Journal of Politics 76(4):1089–1101.
Boyer, K. K., and Verma, R. 2000. Multiple raters in survey-based operations management research: A review and tutorial. Production and Operations Management 9(2):128–140.
Brady, Henry E. 1985. The perils of survey research: Inter-personally incomparable responses. Political Methodology 11(3/4):269–291.
Buttice, Matthew K., and Stone, Walter J. 2012. Candidates matter: Policy and quality differences in congressional elections. Journal of Politics 74(3):870–887.
Clinton, Joshua D., and Lewis, David E. 2008. Expert opinion, agency characteristics, and agency preferences. Political Analysis 16(1):3–20.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Teorell, Jan, Pemstein, Daniel, Tzelgov, Eitan, Wang, Yi-ting, Glynn, Adam, Altman, David, Bernhard, Michael, Steven Fish, M., Hicken, Allen, McMann, Kelly, Paxton, Pamela, Reif, Megan, Skaaning, Svend-Erik, and Staton, Jeffrey. 2014. V-Dem: A new way to measure democracy. Journal of Democracy 25(3):159–169.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Altman, David, Bernhard, Michael, Steven Fish, M., Glynn, Adam, Hicken, Allen, Knutsen, Carl Henrik, McMann, Kelly, Paxton, Pamela, Pemstein, Daniel, Staton, Jeffrey, Zimmerman, Brigitte, Andersson, Frida, Mechkova, Valeriya, and Miri, Farhad. 2016. Varieties of democracy codebook v6. Technical report. Varieties of Democracy Project: Project Documentation Paper Series.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Altman, David, Bernhard, Michael, Steven Fish, M., Glynn, Adam, Hicken, Allen, Knutsen, Carl Henrik, Marquardt, Kyle L., McMann, Kelly, Miri, Farhad, Paxton, Pamela, Pemstein, Daniel, Staton, Jeffrey, Tzelgov, Eitan, Wang, Yi-ting, and Zimmerman, Brigitte. 2016. V-Dem Dataset v6.2. Technical report. Varieties of Democracy Project. https://ssrn.com/abstract=2968289.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Andersson, Frida, Marquardt, Kyle L., Mechkova, Valeriya, Miri, Farhad, Pemstein, Daniel, Pernes, Josefine, Stepanova, Natalia, Tzelgov, Eitan, and Wang, Yi-Ting. 2016. Varieties of Democracy Methodology v5. Technical report. Varieties of Democracy Project: Project Documentation Paper Series.
Hare, Christopher, Armstrong, David A., Bakker, Ryan, Carroll, Royce, and Poole, Keith T. 2015. Using Bayesian Aldrich-McKelvey Scaling to study citizens’ ideological preferences and perceptions. American Journal of Political Science 59(3):759–774.
Johnson, Valen E., and Albert, James H. 1999. Ordinal Data Modeling. New York: Springer.
Jones, Bradford S., and Norrander, Barbara. 1996. The reliability of aggregated public opinion measures. American Journal of Political Science 40(1):295–309.
King, Gary, Murray, Christopher J. L., Salomon, Joshua A., and Tandon, Ajay. 2004. Enhancing the validity and cross-cultural comparability of measurement in survey research. The American Political Science Review 98(1):191–207.
King, Gary, and Wand, Jonathan. 2007. Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis 15(1):46–66.
König, T., Marbach, M., and Osnabrugge, M. 2013. Estimating party positions across countries and time: A dynamic latent variable model for manifesto data. Political Analysis 21(4):468–491.
Kozlowski, Steve W., and Hattrup, Keith. 1992. A disagreement about within-group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology 77(2):161–167.
Lebreton, J. M., and Senter, J. L. 2007. Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods 11(4):815–852.
Lindstädt, Rene, Proksch, Sven-Oliver, and Slapin, Jonathan B. 2016. When experts disagree: Response aggregation and its consequences in expert surveys.
Maestas, Cherie D., Buttice, Matthew K., and Stone, Walter J. 2014. Extracting wisdom from experts and small crowds: Strategies for improving informant-based measures of political concepts. Political Analysis 22(3):354–373.
Marquardt, Kyle, and Pemstein, Daniel. 2018. Replication Data for: IRT models for expert-coded panel data. https://doi.org/10.7910/DVN/KGP01E, Harvard Dataverse, V1.
Norris, Pippa, Frank, Richard W., and Martínez i Coma, Ferran. 2013. Assessing the quality of elections. Journal of Democracy 24(4):124–135.
Pemstein, Daniel, Seim, Brigitte, and Lindberg, Staffan I. 2016. Anchoring vignettes and item response theory in cross-national expert surveys.
Pemstein, Daniel, Tzelgov, Eitan, and Wang, Yi-ting. 2015. Evaluating and improving item response theory models for cross-national expert surveys. Varieties of Democracy Institute Working Paper 1(March):1–53.
Pemstein, Daniel, Marquardt, Kyle L., Tzelgov, Eitan, Wang, Yi-ting, and Miri, Farhad. 2015. The V-Dem measurement model: Latent variable analysis for cross-national and cross-temporal expert-coded data. Varieties of Democracy Institute Working Paper 21.
Ramey, Adam. 2016. Vox populi, vox dei? Crowdsourced ideal point estimation. The Journal of Politics 78(1):281–295.
Stan Development Team. 2015. Stan: A C++ Library for Probability and Sampling, Version 2.9.0. http://mc-stan.org/.
Teorell, Jan, Dahlström, Carl, and Dahlberg, Stefan. 2011. The QoG expert survey dataset. Technical report. University of Gothenburg: The Quality of Government Institute. http://www.qog.pol.gu.se.
Treier, Shawn, and Jackman, Simon. 2008. Democracy as a latent variable. American Journal of Political Science 52(1):201–217.
Van Bruggen, Gerrit H., Lilien, Gary L., and Kacker, Manish. 2002. Informants in organizational marketing research: Why use multiple informants and how to aggregate responses. Journal of Marketing Research 39(4):469–478.
von Davier, Matthias, Shin, Hyo-Jeong, Khorramdel, Lale, and Stankov, Lazar. 2017. The effects of vignette scoring on reliability and validity of self-reports. Applied Psychological Measurement 42(4):291–306.
Supplementary material

Marquardt and Pemstein supplementary material: Online Appendix (File, 695.8 KB)