Published online by Cambridge University Press: 08 December 2007
Models of ecological inference (EI) have to rely on crucial assumptions about the individual-level data-generating process, which cannot be tested because of the unavailability of these data. However, these assumptions may be violated by the unknown data and this may lead to serious bias of estimates and predictions. The amount of bias, however, cannot be assessed without information that is unavailable in typical applications of EI. We therefore construct a model that at least approximately accounts for the additional, nonsampling error that may result from possible bias incurred by an EI procedure, a model that builds on the Principle of Maximum Entropy. By means of a systematic simulation experiment, we examine the performance of prediction intervals based on this second-stage Maximum Entropy model. The results of this simulation study suggest that these prediction intervals are at least approximately correct if all possible configurations of the unknown data are taken into account. Finally, we apply our method to a real-world example, where we actually know the true values and are able to assess the performance of our method: the prediction of district-level percentages of split-ticket voting in the 1996 General Election of New Zealand. It turns out that in 95.5% of the New Zealand voting districts, the actual percentage of split-ticket votes lies inside the 95% prediction intervals constructed by our method.
Authors' note: We thank three anonymous reviewers for helpful comments and suggestions on earlier versions of this paper. An appendix giving some technical background information concerning our proposed method, as well as data, R code, and C code to replicate analyses presented in this paper are available from the Political Analysis Web site. Later versions of the code will be packaged into an R library and made publicly available on CRAN (http://cran.r-project.org) and on the corresponding author's Web site.