Online surveys of public opinion took off at the turn of the century (Couper Reference Couper2000). They are less expensive and faster to administer than other surveys. However, probability-based sampling is challenging online so most online studies use nonprobability samples (Cornesse et al. Reference Cornesse, Blom, Dutwin, Krosnick, De Leeuw and Legleye2020). This divergence from the gold standard of probabilistic sampling spurred debates about the ability of non-probabilistic approaches to yield samples of comparable quality (Cornesse et al. Reference Cornesse, Blom, Dutwin, Krosnick, De Leeuw and Legleye2020; Mercer et al. Reference Mercer, Kreuter, Keeter and Stuart2017; Revilla et al. Reference Revilla, Saris, Loewe and Ochoa2015; Yeager et al. Reference Yeager, Krosnick, Chang, Javitz, Levendusky, Simpser and Wang2011).
However, these debates focus on the United States and a few Western European countries, where several probability-based online panels are available. In less-affluent parts of the world, including Latin America, researchers can select among several experienced firms, but none offers probability-based samples. In these contexts, researchers seeking to draw a national sample for an online survey have little choice but to contract that work out to a commercial firm, which draws a sample from its pool of panelists. Given market incentives, opaqueness is the norm, with respect both to the methods used to recruit panelists and to the methods used to draw a sample from that pool.
The fact that commercial firms share little information about their methods raises questions about how reliable non-probabilistic online samples are in developing contexts such as Latin America. Yet, scholars of the region increasingly rely on these samples to test their theories. In the past four years alone, five scholarly articles that rely on online samples in Latin America have been published in the discipline’s top general and public-opinion journals, and another 10 have appeared in regional journals.Footnote 1 Many more are found in working papers, particularly in the wake of the COVID-19 pandemic, which limited possibilities for face-to-face surveys for some time. Although more and more political surveys are being fielded online in Latin America, we know little about their ability to produce unbiased samples.
As a result, researchers also have little information about the effectiveness of efforts to address sample bias in online surveys. One common approach is post-stratification, in which researchers assign more weight to certain types of respondents to address demographic imbalances (Kalton and Flores-Cervantes Reference Kalton and Flores-Cervantes2003). Among researchers working with nonprobability online samples in the United States, an innovative approach is sample matching (Rivers Reference Rivers2011), which selects respondents from a non-probabilistic pool of panelists by matching them on demographic variables to population microdata. However, we know of no studies of these approaches in less-developed countries. If online samples in contexts such as Latin America are biased, could these approaches help researchers to partially mitigate the problem?
This article is a study of both the biases in online samples in Latin America and whether these common approaches might reduce them. We fielded nine online surveys in 2020 and 2021 through a reputable regional provider in Argentina, Brazil, Chile, Colombia, Mexico, and Peru—among the countries most widely studied in the region.Footnote 2 In each survey, we included benchmark questions from reliable national household surveys or recent censuses to use as a basis for comparing our samples to the population.
We then examined whether common approaches could improve the quality of these online samples. We constructed post-stratification weights and evaluated whether using them improves representativeness. In a subset of the countries—Argentina, Brazil, and Mexico—we also implemented a quasi-experiment to compare the provider’s standard sampling approach against sample matching. By design, half of the initial sample was drawn using a sample-matching approach; the other half was drawn using the firm’s standard approach. This allows us to determine whether sample matching might reduce some of the biases in a typical online sample from the region.
We find that online samples in the region exhibit high levels of bias, with errors of slightly less than 9 percentage points on average, but sometimes as high as 15 percentage points. In general, online samples in Latin America overrepresent the more-affluent portions of the national population. We also find that post-stratification does little to improve sample quality; sample matching outperforms the provider’s standard approach, but the gains in representativeness are substantively small.
There is no silver bullet for making non-probabilistic online surveys in developing contexts such as Latin America unbiased. Although providers like the one with which we worked regularly describe their samples as “nationally representative,” in practice, they are far from representative of the national population on crucial demographic dimensions. This is partly because unequal access to the Internet in developing contexts means that providers are unlikely to have enough panelists in lower socioeconomic categories to draw nationally representative online samples. However, it also is partly because even major providers expend little effort in recruiting and maintaining panelists from lower socioeconomic groups, even in contexts where Internet access is comparatively high.Footnote 3 We worked with one of the largest providers in the region and in countries with relatively high levels of Internet penetration, and still the samples came up short.
Of course, there are instances in which researchers do not need representative samples to draw valid inferences, which we discuss in the conclusion. However, until online survey providers substantially expand their panels and improve their in-house sampling methods, online samples in even the wealthiest countries in Latin America cannot be described as “nationally representative.” Researchers who want to draw conclusions about the attitudes or behaviors of the public as a whole must continue to rely on probabilistic, offline sampling (Logan et al. Reference Logan, Parás, Robbins and Zechmeister2020).
ARE ONLINE SAMPLES UNBIASED?
How well do typical online surveys in Latin America produce samples that reflect national populations? To answer this question, we fielded nine online studies in six Latin American countries through a reputable online survey provider that maintains panels across much of the region.Footnote 4 Table 1 reports the size of each study’s sample and the dates of the fieldwork. The 2020 studies included approximately 100 items, with a median duration of 26 to 29 minutes; the 2021 studies included approximately 75 questions, with a median duration of 19 minutes (Castorena et al. Reference Castorena, Lupu, Schade and Zechmeister2022).Footnote 5
We cannot detail the provider’s sampling approach because the firm would not supply this information, a situation that is typical of commercial online survey providers (e.g., Baker et al. Reference Baker, Blumberg, Brick, Couper, Courtright and Michael Dennis2010).Footnote 6 To the best of our knowledge, firms operating in Latin America draw samples using stratification and/or quotas combined with algorithms to optimize response rates and assign respondents to multiple ongoing studies. We conducted a review of provider websites drawn from the directory of ESOMAR, a global trade organization for public opinion and market research. Reviewing 43 panel providers that operate in Latin America, we found that only six had websites offering information about sampling. Even among these six, the information was limited. What we know about our firm’s sampling approach is consistent with what a typical researcher contracting such a firm would know—namely, very little.
We use two benchmarks to evaluate the representativeness of the realized samples. First, we compare the samples to the national population based on the most recent census from the Integrated Public Use Microdata Series (IPUMS) (see online appendix table A2 for the years). Second, in the three 2020 studies, we included in our questionnaire a number of benchmark questions that are not available in the census but that capture useful demographic characteristics, tend to change slowly, and are included in administrative face-to-face studies in each country (see online appendix table A3).
Following Yeager et al. (Reference Yeager, Krosnick, Chang, Javitz, Levendusky, Simpser and Wang2011), we compare the realized samples to the population on each variable by calculating the mean absolute error (MAE)—that is, the average, across variables, of the absolute difference between the proportion of the population falling into the modal response category and the proportion of the sample falling into that same category.Footnote 7 For example, one benchmark question in Brazil is the number of rooms in the house, for which five is the modal response in the benchmark data, representing 28.2% of respondents. In our sample, the percentage who chose five was 25.1%, an absolute error of 3.1 percentage points. We then average across all of the available variables to arrive at a single measure for each study.
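A minimal sketch of this calculation in Python follows; it is not the authors' code, and the variable names, categories, and proportions are hypothetical stand-ins for the benchmark data described above.

```python
# Sketch of the MAE measure described above (not the authors' code).
import pandas as pd

def mean_absolute_error(sample: pd.DataFrame, benchmarks: dict) -> float:
    """Average, across variables, of the absolute gap (in percentage points)
    between the population share of the modal category and the sample share
    of that same category."""
    errors = []
    for variable, (modal_category, population_share) in benchmarks.items():
        sample_share = (sample[variable] == modal_category).mean() * 100
        errors.append(abs(population_share - sample_share))
    return sum(errors) / len(errors)

# Hypothetical benchmarks: variable -> (modal category, population share in %).
benchmarks = {"rooms_in_house": (5, 28.2), "household_size": (4, 31.5)}

# Hypothetical realized sample.
sample = pd.DataFrame({
    "rooms_in_house": [5, 4, 5, 3, 6, 5, 2, 5],
    "household_size": [4, 4, 3, 5, 4, 2, 4, 6],
})

print(round(mean_absolute_error(sample, benchmarks), 2))
```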
Figure 1 summarizes these errors for each study, with the left-hand panel focused on the benchmark questions in our three 2020 studies and the right-hand panel focused on census variables for all nine studies. When pooling across the three studies that included benchmark questions, online samples yield an MAE of 8.72. Comparing the samples to census variables across all nine studies, we obtain a pooled MAE of 8.75. This represents nearly twice the MAE reported in comparisons between nonprobability online samples and probability online and telephone surveys in the United States (Yeager et al. Reference Yeager, Krosnick, Chang, Javitz, Levendusky, Simpser and Wang2011). It also is almost twice the unusually large average polling error in the run-up to the 2020 US elections (American Association for Public Opinion Research 2021). Our findings suggest that we should be skeptical about the ability of nonprobability online samples in Latin America to produce representative samples of the national population.
CAN RESEARCHERS IMPROVE ONLINE SAMPLES?
Can we improve these nonprobability samples by post-stratifying the realized sample or using sample matching to recruit panelists into our studies? To answer this question, we first used raking to construct weights for each sample based on gender, age, education, and region. We then compared the MAEs from the weighted sample to the MAEs from the unweighted sample, as shown in figure 2. For both the benchmark questions (p = 0.522) and the census variables (p = 0.838), post-stratification had no statistically significant effect on the errors when we pooled across studies. Although weights improved our samples in some cases, they actually introduced additional errors in others. Post-stratification does not appear to be a reliable solution for improving online samples.
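For readers unfamiliar with raking, the sketch below shows the basic logic of iterative proportional fitting in Python: weights are repeatedly rescaled until the weighted sample margins match the population margins on each adjustment variable. It is not the authors' implementation, and the margins and data are hypothetical.

```python
# Sketch of raking (iterative proportional fitting), not the authors' code.
import pandas as pd

def rake(df: pd.DataFrame, targets: dict, max_iter: int = 50, tol: float = 1e-6) -> pd.Series:
    """Return post-stratification weights whose weighted margins match `targets`.

    `targets` maps each variable to a dict of {category: population proportion}.
    """
    weights = pd.Series(1.0, index=df.index)
    for _ in range(max_iter):
        max_change = 0.0
        for variable, margins in targets.items():
            for category, target_share in margins.items():
                mask = df[variable] == category
                current_share = weights[mask].sum() / weights.sum()
                if current_share > 0:
                    factor = target_share / current_share
                    weights[mask] *= factor
                    max_change = max(max_change, abs(factor - 1))
        if max_change < tol:
            break
    # Normalize so weights average to 1.
    return weights / weights.mean()

# Hypothetical sample and population margins, for illustration only.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M", "M", "F", "M"],
    "education": ["university", "university", "secondary", "university",
                  "secondary", "primary", "secondary", "university"],
})
targets = {
    "gender": {"F": 0.51, "M": 0.49},
    "education": {"primary": 0.40, "secondary": 0.35, "university": 0.25},
}
df["weight"] = rake(df, targets)
```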
Another option for researchers to mitigate the problem of online sample representativeness is to adopt a better approach for sampling respondents from the panel than the one that providers use. Proponents of sample matching argue that it can reduce sample bias (Rivers Reference Rivers2011), but there have been few systematic comparisons between sample matching and other approaches (Baker et al. Reference Baker, Blumberg, Brick, Couper, Courtright and Michael Dennis2010)—none of them in developing contexts.Footnote 8
In three of our nine studies, we implemented a quasi-experiment to compare the sampling approach used by the survey provider—which, given the limited information they provide, we refer to as the black-box approach—with a sample-matching approach designed and executed by us. The firm provided us with full background information on all of its registered panelists, enabling us to carry out the matching process and to sample from the panel ourselves.Footnote 9
Whereas the survey provider implemented its standard sampling approach on its own, we implemented the sample-matching approach ourselves. This method begins with drawing a sample from the target population (in our case, the adult population in the census), which serves as the basis for generating a matched sample from the pool of panelists. We used the census microdata from IPUMS as the reference population because they are representative extracts of national census microdata, easily accessible online, and relatively similar across countries.Footnote 10 From these datasets, we selected demographic variables that corresponded to the information that the survey firm provided about its panelists.
For each country, we drew one stratified random sample of the target size from the reference population. These samples were stratified by region and urban/rural residence based on census classifications. We then drew a sample of panelists based on their closeness to observations in the target sample, with “closeness” defined as being as similar as possible on the demographic variables common to both the provider data and the census microdata (the variables are listed in online appendix table A5).Footnote 11 To calculate closeness, we used a nearest-neighbor matching algorithm, requiring exact matches on gender, age decile, and whether an individual lives in the capital region.
We then invited selected panelists to participate. Following the initial invitation to participate, a maximum of two reminders were sent during the span of one week. If an invited panelist did not participate in that time, they were replaced by finding the next closest panelist to the target observation. Thus, the target sample remained constant throughout the sampling procedure, and the matched sample evolved as nonrespondents were replaced.
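The sketch below illustrates the core of this procedure in Python; it is not the authors' code, and the panel variables, the distance metric (a simple sum of absolute differences), and the data are illustrative assumptions. It forces exact matches on gender, age decile, and capital-region residence, ranks the remaining candidates by closeness on other shared demographics, and replaces nonrespondents with the next-closest available panelist.

```python
# Sketch of nearest-neighbor sample matching with nonrespondent replacement
# (not the authors' code; variables and distance metric are assumptions).
import numpy as np
import pandas as pd

EXACT = ["gender", "age_decile", "capital_region"]   # forced (exact) matches
DISTANCE = ["education_level", "household_size"]      # hypothetical numeric covariates

def nearest_panelists(target_row: pd.Series, panel: pd.DataFrame, used: set) -> list:
    """Panelist IDs ordered from closest to farthest, respecting exact constraints."""
    candidates = panel
    for var in EXACT:
        candidates = candidates[candidates[var] == target_row[var]]
    candidates = candidates[~candidates["panelist_id"].isin(used)]
    if candidates.empty:
        return []
    gaps = np.abs(candidates[DISTANCE].to_numpy(dtype=float)
                  - target_row[DISTANCE].to_numpy(dtype=float)).sum(axis=1)
    return list(candidates["panelist_id"].iloc[np.argsort(gaps)])

def draw_matched_sample(target: pd.DataFrame, panel: pd.DataFrame, responded) -> dict:
    """Match each target observation to a panelist, replacing nonrespondents
    with the next-closest available panelist."""
    used, matches = set(), {}
    for idx, target_row in target.iterrows():
        for panelist_id in nearest_panelists(target_row, panel, used):
            used.add(panelist_id)           # each panelist is invited at most once
            if responded(panelist_id):      # stand-in for invitation + reminders
                matches[idx] = panelist_id
                break
    return matches

# Example with tiny hypothetical data: one target observation, three panelists.
panel = pd.DataFrame({
    "panelist_id": [1, 2, 3],
    "gender": ["F", "F", "M"],
    "age_decile": [3, 3, 3],
    "capital_region": [1, 1, 0],
    "education_level": [4, 2, 3],
    "household_size": [3, 5, 4],
})
target = pd.DataFrame([{"gender": "F", "age_decile": 3, "capital_region": 1,
                        "education_level": 3, "household_size": 4}])
print(draw_matched_sample(target, panel, responded=lambda pid: True))
```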
We implemented the two sampling approaches sequentially, beginning with sample matching and concluding with the firm’s approach. Although the target with both methods was 1,200 respondents, the matching approach yielded smaller samples because nonresponse rates were high for panelists matched to some observations in the target sample. We decided to complete fieldwork rather than continue to invite replacements—a challenge we discuss next.Footnote 12
As before, we compare the two realized samples by calculating the MAE across the available benchmark questions and census variables. Figure 3 summarizes these errors for each study. For the benchmark questions, the MAE of the matched sample is consistently smaller than that of the black-box sample, meaning that sample matching outperformed the provider’s proprietary method. Pooling all three studies, the matched-sample MAE of 7.19 percentage points is significantly lower than the black-box sample MAE of 8.72 percentage points (p = 0.049). We observe more muted effects with regard to the census variables. The pooled comparison produces an MAE of 5.36 percentage points for the black-box sample and a statistically indistinguishable MAE of 5.22 for the matched sample (p = 0.854). Taken together, sample matching seems to perform as well as or better than the black-box approach at producing online samples representative of the national population. These improvements are real but also substantively small: 1 to 2 percentage points.
Those small gains also come at considerable cost. Given that providers in the region do not have the capacity to implement sample matching themselves, the only option for researchers is the type of arrangement that we made. This means that researchers must run the matching algorithm, generate a sample of panelists, and provide that sample to the firm. Because some panelists fail to respond, they must be replaced with the next-closest panelist, entailing many iterations between the researchers and the firm. Whereas the provider’s approach completed fieldwork within one to two weeks, fieldwork for the sample matching approach took approximately six weeks—and there were remaining observations in the target sample that we could not complete due to persistent nonresponse. Until survey firms operating in the region develop capacity and provide seamless in-house sample matching, implementing it will imply substantial burdens and longer fieldwork duration. These costs must be balanced against the gains in methodological transparency and sample quality.
Moreover, even with these gains, online samples in Latin America continue to yield high error rates. One reason is coverage error (Couper et al. Reference Couper, Kapteyn, Schonlau and Winter2007). A substantial portion of the Latin American population is not online: in the average country in the region in 2020, only 66% reported using the Internet (see online appendix table A1). Because those without Internet access cannot become online survey panelists, this creates substantial coverage error that no sampling approach or post-hoc adjustment can fix. Even panels in countries with comparatively high levels of Internet use (e.g., Argentina and Chile) might have coverage error if panel providers do not expressly invest in recruiting and maintaining panelists from lower socioeconomic groups.
Consider, for instance, the distribution of our provider’s panels with regard to education. Figure 4 plots the distribution on education for panelists and the census population in the six countries we study. It is easy to see that the distribution of panelists is skewed toward higher levels of education (panel sizes are listed in online appendix table A2). In theory, researchers could compensate for this skew by specifically selecting panelists from lower-education categories or by giving those in the sample more weight. However, the problem with these panels is that they simply do not have a sufficient pool of panelists with lower levels of education. In some cases, they have no panelists at all in the lowest educational category. This coverage error and the reality of differential nonresponse mean that online samples consistently underrepresent these groups.
DISCUSSION
Fielding surveys online is less expensive and faster than fielding them by telephone or face to face. However, gold-standard probabilistic sampling online is largely unavailable in developing countries, so researchers rely on nonprobability sampling approaches for online surveys in these contexts. Although more and more researchers are relying on these online samples, we know little about how they perform in practice.
Our analysis of nine studies in Latin America is sobering. Online samples from six countries in the region consistently overrepresent the more-affluent portions of the population, producing large errors on average. Furthermore, these results are from some of the wealthier countries in the region, with higher rates of Internet penetration than elsewhere (see online appendix table A1). We might well expect these errors to be even larger in less-affluent countries.
Standard tools for addressing these biases also yield little improvement. Post-stratification did not consistently improve our samples and, although sample matching outperformed the provider’s realized samples, these improvements were small—and required considerable effort to implement. If providers develop the in-house capacity to implement sample matching themselves at no additional cost, then our results clearly indicate that this would be the preferred approach. Nevertheless, there are no silver bullets to mitigate the biases in online samples in developing contexts such as Latin America.
Of course, we cannot know exactly how the provider we contracted drew its samples. Other survey providers may use somewhat different methods, and it even is possible that the provider used slightly different methods across our studies. However, the provider we contracted is well regarded in the region and already widely used by researchers. Future studies should consider other sampling approaches and other providers, but we are skeptical that they will yield substantially different results. Even some of the largest panels available in the region cannot overcome the fact that, in less-affluent countries, online panels do not have sufficient coverage in lower socioeconomic categories. These populations have less access to the Internet, making them difficult for online surveys to reach. This remains a crucial challenge for researchers who want to capitalize on the efficiency and economy of online surveys but who also want to draw inferences about national populations in less-affluent countries.Footnote 13
Nevertheless, there may be good reasons for researchers to use online samples in developing contexts such as Latin America. Of course, they are less expensive and faster to collect than telephone and face-to-face samples, and they offer opportunities for visualizing survey questions that may have advantages over aural modes (e.g., when implementing conjoint experiments). Like convenience samples, online samples can be useful to researchers who are conducting experiments (see Mullinix et al. Reference Mullinix, Leeper, Druckman and Freese2015; Samuels and Zucco Reference Samuels and Zucco2014), provided that their conclusions acknowledge the sample’s socioeconomic skew. Online surveys also can be useful for piloting studies or testing question wording in advance of a survey that uses a probability sample. They also have advantages for collecting short-term panel data (e.g., over the course of an election campaign) because responses can be collected quickly. Finally, it may be possible to reduce a study’s costs by mixing online surveys with probability samples of the underrepresented socioeconomic groups.
Nevertheless, researchers should be explicit about these design choices and about the nature of the samples they use; too often they refer to online samples in Latin America simply as “nationally representative.” For now, research in developing contexts aimed at capturing the opinion of the national population with a single mode still requires probability samples drawn via telephone or face to face.
ACKNOWLEDGMENTS
For their comments and advice, we are grateful to participants in the LAPOP Lab workshop. We thank Meg Frost, Martín Gou, and Emily Noh for excellent research assistance.
DATA AVAILABILITY STATEMENT
Research documentation and data that support the findings of this study are openly available at the PS: Political Science & Politics Harvard Dataverse at https://doi.org/10.7910/DVN/MPYI5D.
SUPPLEMENTARY MATERIALS
To view supplementary material for this article, please visit http://doi.org/10.1017/S1049096522001287.
CONFLICTS OF INTEREST
The authors declare that there are no ethical issues or conflicts of interest in this research.