
Ignoramus, Ignorabimus? On Uncertainty in Ecological Inference

Published online by Cambridge University Press: 08 December 2007

Martin Elff*
Affiliation:
Faculty of Social Sciences, University of Mannheim, A5, 6, 68131 Mannheim, Germany
Thomas Gschwend
Affiliation:
Center for Doctoral Studies in Social and Behavioral Sciences, University of Mannheim, D7, 27, 68131 Mannheim, Germany, e-mail: [email protected]
Ron J. Johnston
Affiliation:
School of Geographical Sciences, University of Bristol, Bristol BS8 1SS, UK, e-mail: [email protected]
*
e-mail: [email protected] (corresponding author)

Abstract

Models of ecological inference (EI) have to rely on crucial assumptions about the individual-level data-generating process. These assumptions cannot be tested because the individual-level data are unavailable. If the unknown data violate these assumptions, estimates and predictions may be seriously biased. The amount of bias, however, cannot be assessed without information that is unavailable in typical applications of EI. We therefore construct a model, built on the Principle of Maximum Entropy, that at least approximately accounts for the additional nonsampling error that may result from possible bias incurred by an EI procedure. In a systematic simulation experiment, we examine the performance of prediction intervals based on this second-stage Maximum Entropy model. The results of this simulation study suggest that these prediction intervals are at least approximately correct if all possible configurations of the unknown data are taken into account. Finally, we apply our method to a real-world example in which the true values are known, so that its performance can be assessed directly: the prediction of district-level percentages of split-ticket voting in the 1996 General Election of New Zealand. In 95.5% of the New Zealand voting districts, the actual percentage of split-ticket votes lies inside the 95% prediction intervals constructed by our method.
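Some of the ingredients named above can be illustrated on a single hypothetical district. This is a minimal sketch and not the authors' second-stage Maximum Entropy model, which is described in the paper and its appendix. It assumes that the same parties appear in the same order on the party-list vote and the electorate (candidate) vote, so that the diagonal cells of the joint table are straight-ticket votes. It then shows three things:

- maximizing entropy subject only to the two known vote-share margins gives the independence table p_ij = r_i c_j;
- the smallest split-ticket share that any configuration consistent with those margins allows;
- how the empirical coverage of prediction intervals, such as the 95.5% figure reported above, is computed.

All function names and numbers are hypothetical.

```python
import math


def maxent_table(row_shares, col_shares):
    """Maximum-entropy joint distribution given only the two margins.

    Maximizing -sum_ij p_ij * log(p_ij) subject to fixed row sums r_i and
    column sums c_j gives log(p_ij) = -1 - lambda_i - mu_j, so p_ij
    factorizes; matching the margins yields p_ij = r_i * c_j (independence).
    """
    if not (math.isclose(sum(row_shares), 1.0)
            and math.isclose(sum(col_shares), 1.0)):
        raise ValueError("each margin must sum to 1")
    return [[r * c for c in col_shares] for r in row_shares]


def split_share_maxent(list_shares, candidate_shares):
    """Split-ticket share (off-diagonal mass) of the max-entropy table.

    Assumes the same parties, in the same order, on both votes, so the
    diagonal cells are the straight-ticket votes.
    """
    if len(list_shares) != len(candidate_shares):
        raise ValueError("both votes must list the same parties")
    table = maxent_table(list_shares, candidate_shares)
    return 1.0 - sum(table[i][i] for i in range(len(table)))


def split_share_lower_bound(list_shares, candidate_shares):
    """Smallest split-ticket share consistent with the margins alone.

    Each straight-ticket cell is at most min(r_i, c_i), and all these
    maxima can be attained at once, so split >= 1 - sum_i min(r_i, c_i).
    """
    if len(list_shares) != len(candidate_shares):
        raise ValueError("both votes must list the same parties")
    return 1.0 - sum(min(r, c) for r, c in zip(list_shares, candidate_shares))


def empirical_coverage(lower, upper, truth):
    """Fraction of units whose true value lies inside [lower, upper]."""
    if not truth or not (len(lower) == len(upper) == len(truth)):
        raise ValueError("need equally long, non-empty sequences")
    hits = sum(lo <= t <= hi for lo, hi, t in zip(lower, upper, truth))
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical district: list-vote and candidate-vote shares of 3 parties.
    list_shares = [0.5, 0.3, 0.2]
    candidate_shares = [0.6, 0.3, 0.1]
    print(split_share_maxent(list_shares, candidate_shares))
    print(split_share_lower_bound(list_shares, candidate_shares))
```

Take list shares (0.5, 0.3, 0.2) and candidate shares (0.6, 0.3, 0.1). The maximum-entropy table implies a split-ticket share of 1 − (0.30 + 0.09 + 0.02) = 0.59. The margins alone only guarantee a split share of at least 1 − (0.5 + 0.3 + 0.1) = 0.1. The width of that range is one way to see how much a point estimate depends on assumptions about the unobserved cells.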

Type
Research Article
Copyright
Copyright © The Author 2007. Published by Oxford University Press on behalf of the Society for Political Methodology 


Footnotes

Authors' note: We thank three anonymous reviewers for helpful comments and suggestions on earlier versions of this paper. An appendix with technical background on our proposed method, together with data, R code, and C code to replicate the analyses presented in this paper, is available from the Political Analysis Web site. Later versions of the code will be bundled into an R package and made publicly available on CRAN (http://cran.r-project.org) and on the corresponding author's Web site.
