Published online by Cambridge University Press: 04 January 2017
Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names that requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.
Author's note: The author is grateful for comments from Andy Eggers, Arthur Spirling, Rachel Gould, Ben Ansell, Bernard Grofman, Gary King, Ken Benoit, Dominik Hangartner, Geoffrey Evans, and Lucy Barnes. Two anonymous reviewers provided excellent comments that resulted in a significantly improved manuscript. The author gratefully acknowledges his time at Nuffield College, Oxford University, as Postdoctoral Prize Research Fellow, during which much of this work was written. Computation for the research was carried out on the High Performance Computing resources at New York University–Abu Dhabi, with the enthusiastic support of Muataz Al-Barwani and Benoit Marchand. The replication archive for this article is available at the Political Analysis Dataverse as Harris (2014). Supplementary materials for this article are available on the Political Analysis Web site.