INTRODUCTION
The internet provides individuals with an essentially limitless amount of information and a high degree of choice in which elements of that information to consume. This has prompted concerns over whether the internet threatens societies’ abilities to establish common bases of reliable information and, by extension, the sustainability of democracy in the twenty-first century. One side of this debate argues that the internet fuels political polarization, as users may choose to avoid conflicting viewpoints (Sunstein Reference Sunstein2002; Reference Sunstein2017) or may be algorithmically steered toward politically favorable content (Pariser Reference Pariser2011). Conversely, an extensive empirical literature on news consumption shows that incidental exposure to politically diverse sources is extremely common (Bakshy, Messing, and Adamic Reference Bakshy, Messing and Adamic2015; Gentzkow and Shapiro Reference Gentzkow and Shapiro2011; Guess Reference Guess2021; Messing and Westwood Reference Messing and Westwood2014), and therefore suggests that concerns over “echo chambers” or “filter bubbles” are overblown.
We argue that networked curation processes lead information consumption on social media in particular to be more politically homogeneous than this empirical literature has thus far suggested. However, this is more a reflection of democracy than a threat to democracy—a product of individuals engaging with information, and each other, on their own terms—highlighting trade-offs between cross-cutting exposure and active participation (Kreiss and McGregor Reference Kreiss and McGregor2023; Mutz Reference Mutz2006; Stroud Reference Stroud2011). Users on social media platforms curate the information they share with others and simultaneously receive curated streams of information tailored to their interests (Davis Reference Davis2017). This involves “unbundling” discrete pieces of information from their parent sources and re-bundling them into user-level streams of content—transforming a hierarchical distribution of information (from sources to consumers) into a networked distribution of information (from users to users).
One implication of this process, which we develop in this article, is that source-level estimates of audience partisanship may mistake heterogeneity for moderation. Users choose other users to follow based on their tendencies to share useful or otherwise appealing information (Barberá Reference Barberá2015), and those accounts will in turn selectively share information from a given source with their network ties (such as followers, friends, or group members) based on the extent to which that content serves social as well as informational functions (Epstein et al. Reference Epstein, Sirlin, Pennycook and Rand2023; Marwick and boyd Reference Marwick and boyd2011). That is, users share information not only to inform others, but also to perform their identities, advance their interests, and generate social returns (such as likes, retweets, or followers). Since individual stories are subject to these networked curation processes, source-level estimates do not reflect cases where individual stories from a given source are useful for different partisan audiences.
We test this implication of networked curation by comparing common measures of audience partisanship at the source and story levels. Virtually all the literature in this area aggregates partisan consumption to the level of the domain, or source, rather than examining the partisan audiences of individual stories (Eady et al. Reference Eady, Nagler, Guess, Zilinsky and Tucker2019; Guess Reference Guess2021; Peterson, Goel, and Iyengar Reference Peterson, Goel and Iyengar2021; Robertson et al. Reference Robertson, Jiang, Joseph, Friedland and Lazer2018). Bakshy, Messing, and Adamic (Reference Bakshy, Messing and Adamic2015) and González-Bailón et al. (Reference González-Bailón, Lazer, Barberá, Zhang, Allcott, Brown and Crespo-Tenorio2023) are notable exceptions, discussed below. Source-level aggregation implicitly assumes that every story from a given source is drawn from a consistent distribution of partisan appeal that attracts a stable ratio of Democratic to Republican users. By contrast, we find evidence of partisan curation bubbles, defined as sets of users who share and consume content with consistent appeal from a variety of sources.Footnote 1 When users in curation bubbles are able to identify and circulate congenial information from a variety of sources, individual stories may reach audiences atypical of the sources that produced them—introducing heterogeneity into the source’s aggregate audience that doesn’t necessarily reflect heterogeneity in the partisan valence of the information users consume.
We use two large-scale datasets to test for the presence and extent of partisan curation bubbles. First, we analyze sharing patterns on Twitter in 2017 and 2018 using a panel of over 1.6 million user accounts linked to a commercial voter file. We then examine sharing and exposure patterns on Facebook between 2017 and 2021 using data made available through Social Science One (King and Persily Reference King and Persily2020). We consistently find evidence of partisan curation bubbles. The fact that we find substantively similar results in all three analyses (of sharing on Twitter and on Facebook, and of exposure on Facebook) suggests a robust pattern.
CURATION ON SOCIAL MEDIA
The online information ecosystem in the early twenty-first century is characterized by unbundling and abundance. An individual’s news consumption near the end of the twentieth century would typically be clustered in a small number of sources offering packages of information. As one could not read a story in a newspaper without buying at least a single copy of the whole newspaper, information search was largely a search for preferred packages, or sources, from which to habitually consume a variety of information. This could take the form of a subscription to a newspaper that covered news, opinion, sports, and culture—or an opinion magazine that offered a particular editorial direction. Contemporary information consumption presents a fundamentally different proposition as it is largely unbundled at the story level, such that it is practical for individuals to consume information à la carte from a wide range of sources. The task of information search is now less about identifying the most desirable sources and is instead about identifying content of interest from a functionally infinite set of options.
The internet offers individuals several different strategies to manage the task of wading through an ocean of information to identify what they want to see. Centralized aggregators such as search engines and news portals (Fischer, Jaidka, and Lelkes Reference Fischer, Jaidka and Lelkes2020; Robertson et al. Reference Robertson, Green, Ruck, Ognyanova, Wilson and Lazer2023) are perhaps the most obvious and widely used, allowing individuals to input queries ranging from general (“political news”) to specific (“2024 Nevada caucus results”) and receive relevant information in return. Here, we focus on a different, commonly used setting: feed-based social media, in which users follow accounts and posts from these accounts are aggregated into a flow of content. We take curation to be the processes through which people are matched with content that appeals to them. We consider curation to encompass both platform architecture, such as a ranking algorithm, and user choice within that architecture. Following Davis (Reference Davis2017), this includes both consumptive curation, or users’ selection of accounts from which to receive information, and productive curation, or users’ choice of what to share with others. Importantly, consumptive curation effectively delegates the search for relevant information to others. Rather than actively searching for specific information, and rather than choosing news sources to habitually consume information, users choose other users from whom to habitually consume information and then scroll through whatever those users choose to post.
Online curation is analogous to prior accounts of the “two-step flow” of information from radio and print media to opinion leaders, and from opinion leaders to ordinary citizens (Katz and Lazarsfeld Reference Katz and Lazarsfeld1955; Lazarsfeld, Berelson, and Gaudet Reference Lazarsfeld, Berelson and Gaudet1948). However, on social media, this has the potential to happen with more structure and on a far larger scale. Rather than opinion leaders (perhaps haphazardly) recounting news they read earlier in the day to another individual, users on social media can immediately and directly share news with hundreds or thousands of other users at a time. Moreover, opinion-leading relationships as envisioned by the Columbia school are largely formed incidentally, as a consequence of proximity within one’s local community (e.g., Lazer et al. Reference Lazer, Rubineau, Chetkovich, Katz and Neblo2010; Minozzi et al. Reference Minozzi, Song, Lazer, Neblo and Ognyanova2020). By contrast, social media allows users much more choice in who they form ties with and why, potentially including the provision of information.
These affordances of social media renewed long-standing concerns over how much choice in information consumption is too much. The ability to pick and choose individual accounts from which to receive political information carries the potential for users to select into politically homogeneous “echo chambers” (Sunstein Reference Sunstein2002; Reference Sunstein2017). The increased reliance on platforms that algorithmically filter, sort, and recommend content prompts parallel concerns over “filter bubbles” (Pariser Reference Pariser2011; Ribeiro et al. Reference Ribeiro, Ottoni, West, Almeida and Meira2020) in which consuming partisan information begets exposure to more partisan information. These related concerns involve the same outcome: politically homogeneous information diets that, in theory, frustrate democratic societies’ abilities to make collective decisions using common bases of reliable information.
Empirical research regarding the extent to which these potentially undesirable outcomes manifest is mixed (Barberá Reference Barberá, Persily and Tucker2020; Dahlgren Reference Dahlgren2021; Prior Reference Prior2013). This is in part because individuals’ tendencies to engage in selective exposure within their information environments are not as straightforward as early theories regarding the concept predict—in line with early skepticism (Freedman and Sears Reference Freedman and Sears1965; Reference Freedman and Sears1967). While some individuals do select pro-attitudinal sources (Stroud Reference Stroud2011), this does not necessarily mean that they are actively avoiding counter-attitudinal information. Indeed, individuals are especially likely to seek (and subsequently share) pro-attitudinal information when they are exposed to counter-attitudinal information (Garrett Reference Garrett2009; Weeks et al. Reference Weeks, Lane, Kim, Lee and Kwak2017). This dynamic is less obviously concerning, and can take place in the context of healthy deliberative exchange. In addition, people often rely on heuristics other than partisanship when deciding which information to consume, such as topical relevance (Kobayashi and Ikeda Reference Kobayashi and Ikeda2009; Mummolo Reference Mummolo2016) or social endorsements (Messing and Westwood Reference Messing and Westwood2014). As a result, partisan segregation in aggregate news consumption online is typically found to be relatively low (Flaxman, Goel, and Rao Reference Flaxman, Goel and Rao2016; Gentzkow and Shapiro Reference Gentzkow and Shapiro2010; Guess Reference Guess2021).
This finding initially extended to social media. Early research on Facebook showed that since friendship ties formed for a variety of reasons—many of which were incidental to politics—Facebook users were frequently exposed to politically distant sources (Bakshy, Messing, and Adamic Reference Bakshy, Messing and Adamic2015; Bakshy et al. Reference Bakshy, Rosenn, Marlow and Adamic2012). However, when extending this analysis on Facebook to Pages and Groups, which form for more specific reasons, González-Bailón et al. (Reference González-Bailón, Lazer, Barberá, Zhang, Allcott, Brown and Crespo-Tenorio2023) find stronger evidence of political segregation in information consumption—consistent with other work finding evidence of political homophily on social media (Conover et al. Reference Conover, Ratkiewics, Francisco, Goncalves, Menczer and Flammini2021; Reference Conover, Goncalves, Flammini and Menczer2012).Footnote 2 Put simply, one may be friends with a politically distant acquaintance or relative on Facebook in spite of their politics but follow a Page because of its politics, which will have consequences for the diversity of information to which one is exposed. Moreover, pro-attitudinal information spreads more quickly, is consumed more frequently, and is received more approvingly within political communities on social media sites than counter-attitudinal information (Garz, Sörensen, and Stone Reference Garz, Sörensen and Stone2020; Halberstam and Knight Reference Halberstam and Knight2016). This imbalance is likely attributable to the political information users choose to share on social media sites. Sharing information with one’s followers is inherently more public than consuming it oneself, and can be used to signal (or, from the opposite perspective, infer) political identities and commitments (Marwick and boyd Reference Marwick and boyd2011; Settle Reference Settle2018). In the rare instances in which users share political information from opposing partisans, it is often accompanied by negative comments that indicate disagreement (Cinelli et al. Reference Cinelli, Morales, Galeazzi, Quattrociocchi and Starnini2021; Wojcieszak et al. Reference Wojcieszak, Casas, Yu, Nagler and Tucker2022).
The Facebook Page, in the above example, is acting as a curator—an account that shares or reshares content. An account that posts a link to a story in the New York Times is identifying that content as worthy of attention. Consumers are accounts on social media that are exposed to content. Users have the ability to act as both a consumer and a curator, though in practice the vast majority of productive curation is done by a small number of users (Grinberg et al. Reference Grinberg, Joseph, Friedland, Swire-Thompson and Lazer2019; Hughes et al. Reference Hughes, McCabe, Hobbs, Remy, Shah and Lazer2021; Wojcik and Hughes Reference Wojcik and Hughes2019) who are more politically active offline (and exhibit more partisan extremity) than users who do not post about politics themselves (Hughes Reference Hughes2019). These curators, in turn, take an active role in purposively identifying individual stories to share and deciding how to frame those stories for their followers (Billard Reference Billard2021; Park and Kaye Reference Park and Kaye2018). Importantly, curators do not necessarily share information solely for information’s sake—the act of sharing specific information (as opposed to other information one could potentially share) is a means by which users can signal aspects of their identity that are important to them (e.g., Osmundsen et al. Reference Osmundsen, Bor, Vahlstrup, Bechmann and Petersen2021; Van Bavel et al. Reference Van Bavel, Harris, Pärnamets, Rathje, Doell and Tucker2021). Political information sharing on social media will therefore likely feature partisan curators—that is, users who selectively share information that promotes their political in-group or detracts from political out-groups. These users are, in a sense, performing “hidden labor” for their preferred party, attempting to shape the character of online discourse by selectively sharing politically favorable information.Footnote 3
The underlying logic and architecture of information sharing on social media is therefore likely to produce curation bubbles, or sets of users who share and consume content with consistent appeal from a variety of sources. We view curation bubbles as a general property of social media not limited to partisanship. For example, Taylor Swift fans will curate information related to Taylor Swift from a variety of sources. This will include atypical sources, such as ESPN, when ESPN publishes stories about Taylor Swift. Here, we are interested in partisan curation bubbles, or users who tend to share (and, through homophilous tie formation, see) politically consistent information from a variety of sources.
If and when politically neutral or distant sources publish individual stories that are useful for promoting partisan identities and interests, partisan users will share them with their followers (who are in turn likely to be co-partisans themselves), introducing heterogeneity into those sources’ audience for their constituent stories. By extension, partisan curation bubbles are formed via co-partisan users sharing information favorable to their party. The breadth and variety of information available on the internet allows partisan users to easily find politically favorable information (Peterson and Iyengar Reference Peterson and Iyengar2021). This information can originate from a variety of sources, and indeed partisans tend to overestimate the extent to which mainstream outlets perceived as ideologically distinct offer substantively different coverage (Peterson and Kagalwala Reference Peterson and Kagalwala2021). Furthermore, politically favorable information may be most useful for promoting one’s party precisely when it is attributable to a source perceived to be politically neutral or distant, as this can increase its credibility (Baum and Groeling Reference Baum and Groeling2009). Temporal variation in whether the news is broadly favorable to the political left or right can also introduce selective engagement with the news itself (Kim and Kim Reference Kim and Kim2021), which would lead to variation in which partisan curation bubbles are circulating more or less raw information at any given time.
Figure 1 provides an illustrative characterization of the partisan curation process, in comparison to a process solely driven by users consuming information directly from sources. In both cases, there are three sources that, based on users’ overall consumption behavior, appear to be left-leaning, neutral, and right-leaning, respectively. In Figure 1a, this is reflected by two left-leaning users consuming the left-leaning source, two right-leaning users consuming the right-leaning source, and all four users consuming the neutral source. In Figure 1b, there are two curators mediating these users’ consumption. Curator A only shares blue stories with the two left-leaning users who follow them and Curator B only shares red stories with the two right-leaning users who follow them, irrespective of the sources that produced those stories. The pattern of consumption is quite integrated at the producer level yet completely segregated at the story level. Source B, in particular, appears neutral overall not by producing stories that all users consume, but by producing stories that are curated by either left-leaning or right-leaning users.
Partisan curation bubbles carry implications for how researchers understand the political valence of the information being shared on social media. Empirical researchers frequently quantify source-level partisan slant using estimates of the partisanship of news outlets’ overall audiences (e.g., Eady et al. Reference Eady, Nagler, Guess, Zilinsky and Tucker2019; Garimella et al. Reference Garimella, Smith, Weiss and West2021; Guess Reference Guess2021; Robertson et al. Reference Robertson, Green, Ruck, Ognyanova, Wilson and Lazer2023). These estimates typically represent a normalized ratio of how often URLs from the given domain were shared by Democrats compared to Republicans. For instance, a domain shared exclusively by Democrats would receive a score of −1, a domain shared exclusively by Republicans would receive a score of 1, and a domain shared by equal numbers of Democrats and Republicans would receive a score of 0. The major exceptions that construct scores at the URL as well as domain level are Bakshy, Messing, and Adamic (Reference Bakshy, Messing and Adamic2015) and González-Bailón et al. (Reference González-Bailón, Lazer, Barberá, Zhang, Allcott, Brown and Crespo-Tenorio2023). The former evaluates exposure to cross-cutting partisan content on Facebook; the latter examines segregation in news consumption on Facebook. Both sets of results are consistent with the possibility of partisan curation bubbles, but neither directly studies their presence.
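To make this conventional measure concrete, the following minimal Python sketch implements the normalized sharing ratio described above; the function name and counts are illustrative rather than drawn from any specific dataset.

```python
# A minimal sketch of the conventional domain-level audience measure: a normalized
# ratio of Republican to Democratic sharing, so that -1 means a domain shared only
# by Democrats, +1 only by Republicans, and 0 equal sharing by both.
def domain_audience_score(dem_shares: int, rep_shares: int) -> float:
    return (rep_shares - dem_shares) / (rep_shares + dem_shares)

print(domain_audience_score(1000, 0))    # -1.0: exclusively Democratic audience
print(domain_audience_score(250, 750))   #  0.5: mostly Republican audience
```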
The core assumption of this approach—common to many approaches for quantifying political valence on social media (e.g., Barberá Reference Barberá2015)—is that behavior reflects revealed preferences. This is agnostic to the substance of the content in question, in contrast with methods that infer the slant of a given story or source based on its text (Gentzkow and Shapiro Reference Gentzkow and Shapiro2010; Ho and Quinn Reference Ho and Quinn2008).Footnote 4 This can make reputations self-fulfilling. If, for example, Republican users avoid the New York Times because it is regarded as left-leaning, then the New York Times will garner a left-leaning audience regardless of what the newspaper publishes (Peterson and Kagalwala Reference Peterson and Kagalwala2021)—which will carry through to its location on the [−1,1] scale. Similarly, a score of 0 doesn’t mean that the domain is “neutral” in any sense deeper than that it was shared by Democrats and Republicans at equal rates. In other words, the average partisanship of sources’ audiences is a relative measure of partisanship, not an absolute one (Guess Reference Guess2021; Robertson et al. Reference Robertson, Jiang, Joseph, Friedland and Lazer2018).
As the stylized example in Figure 1 suggests, partisan curation bubbles have the potential to distort estimates of partisan appeal at the source level because they can introduce substantial heterogeneity at the story level. Importantly, this distortion is unlikely to be uniform—there will be more story-level heterogeneity in audience partisanship within sources that carry more moderate overall estimates. The reason for this is mechanical as well as theoretical: there is only one way for individual stories to aggregate to an extreme domain score (circulation among consistently partisan audiences), but there are two ways to produce a moderate score. A moderate domain can produce stories that are consistently circulated by both Democrats and Republicans at relatively even rates, or it can produce stories that are disproportionately circulated by either Democrats or Republicans. The moderate domain-level average will only reflect the individual stories that produced it in the former case. However, as partisan curators selectively share stories that are socially useful for them, the latter will frequently occur (we expand on this point in Appendix A of the Supplementary Material).
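The following toy calculation (our own construction, not real data) illustrates this aggregation problem: two hypothetical domains receive the same moderate average score, but only one of them produces stories whose individual audiences resemble that average.

```python
# A toy numeric illustration of the two ways a domain can end up with a
# "moderate" average audience score.
import numpy as np

consistently_moderate = np.full(100, 0.0)            # every story's audience near 0
alternating_partisan = np.repeat([-0.8, 0.8], 50)    # stories swing between partisan audiences

# Both domains aggregate to the same moderate score of 0.0 ...
print(consistently_moderate.mean(), alternating_partisan.mean())   # 0.0  0.0
# ... but only the first domain's score describes its individual stories.
print(consistently_moderate.std(), alternating_partisan.std())     # 0.0  0.8
```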
Finally, we note that our theoretical framework is agnostic as to the potential role of social media platforms’ recommendation algorithms. While algorithmic curation is undoubtedly important for determining which information users see, algorithms themselves do not inevitably lead to the consumption of politically homogeneous information. Algorithms can optimize on a variety of criteria, some politically salient and others not (Bandy and Diakopoulos Reference Bandy and Diakopoulos2021; Fischer, Jaidka, and Lelkes Reference Fischer, Jaidka and Lelkes2020), and different platforms may make different design choices that could (either intentionally or incidentally) encourage or discourage exposure to counter-attitudinal information (Garrett and Resnick Reference Garrett and Resnick2011). For example, algorithm-based recommendations from centralized news aggregators such as MSN or Google may be more likely to direct users toward large, mainstream sources than they are to direct users toward niche, ideological sources (Guess Reference Guess2021). The best evidence in this area on social media in particular comes from a platform-wide experiment on Twitter, which found that its algorithmic timeline led users to be exposed to more political content than users who remained on chronological timelines (Huszár et al. Reference Huszár, Ktena, O’Brien, Belli, Schlaikjer and Hardt2022). However, that same study found inconsistent effects with respect to whether the amplification of political content was disproportionately in favor of left- or right-leaning content. While we are unable to isolate the potential contributions of platforms’ algorithms to our empirical findings, we view it as exceedingly unlikely for individual stories to circulate among politically atypical audiences in the absence of users intentionally curating those stories for their social ties.
Hypotheses
Our theoretical framework carries a set of empirical implications that we test in this article.
First, because networked curation occurs at the story level, we expect to observe audience heterogeneity within sources.
Hypothesis 1a: Productive curation. The partisan composition of sharing behavior will exhibit story-level heterogeneity within sources.
Hypothesis 1b: Consumptive curation. The partisan composition of viewing behavior will exhibit story-level heterogeneity within sources.
We further expect that this heterogeneity corresponds with substantive differences in the latent partisan appeal of the information being circulated and that it is not, in expectation, due to idiosyncrasies such as “hate-sharing” or noise.
Hypothesis 2: Partisan audience scores estimated at the story level will reflect the substantive partisan appeal of those stories.
Finally, we expect systematic variation in the extent to which partisan curation bubbles pose a challenge to interpreting and using source-level estimates of audience partisanship. Specifically, moderate domain-level estimates of audience partisanship are more likely to mischaracterize the partisan audience for any given story. Relatively more extreme source-level estimates, by contrast, will more frequently reflect the audiences for each individual story.
Hypothesis 3: Moderate source-level scores will more frequently mischaracterize the partisan appeal of their constituent stories.
We illustrate these general points in Table 1, which reports the stories (URLs) from the Wall Street Journal with the 10 most Republican audience scores and the 10 most Democratic audience scores. Domain-level scores often identify the Wall Street Journal as “neutral” (Bakshy, Messing, and Adamic Reference Bakshy, Messing and Adamic2015; Gentzkow and Shapiro Reference Gentzkow and Shapiro2010), and it carries a relatively centrist domain-level audience score of –0.34 in our Twitter data. However, we see that this score, if applied to every story produced by the Wall Street Journal, fails to adequately describe its cross-cutting content. The stories disproportionately circulated by Republicans are largely conservative opinion pieces from the editorial page. In contrast, none of the stories disproportionately circulated by Democrats are opinion pieces; they are mainstream reporting with content that is good news for Democrats and bad for Republicans. In other words, the Wall Street Journal does not have a consistent moderate audience; it produces individual stories with differential partisan appeal that reach different partisan audiences. This is the dynamic we will explore and test through our curation bubbles framework.
Note: On the left, the ten headlines with the most left-leaning audience scores; on the right, the ten headlines with the most right-leaning audience scores. Audience scores drawn from Twitter sample.
DATA AND METHODS
We use data from Twitter and Facebook to examine both domain- and story-level partisan curation bubbles in the United States. This cross-platform comparison allows for validation of our key results, but comes with challenges. People use different social media platforms for different reasons (Evans et al. Reference Evans, Pearce, Vitak and Treem2017), meaning that we would expect variation in engagement and exposure across the two platforms. However, both platforms are of scientific interest, with Twitter being particularly influential among journalists and Facebook being the social media platform the general public most frequently uses for news consumption overall (Jurkowitz and Gottfried Reference Jurkowitz and Gottfried2022; McGregor and Molyneux Reference McGregor and Molyneux2018; Molyneux and McGregor Reference Molyneux and McGregor2021). While these platforms’ user bases differ in size and composition, leading us to expect variation in the precise stories and domains which circulate on each site, users on both platforms engage in similar styles of networked curation.
Perhaps the bigger challenge to cross-platform analysis is methodological. Data on Twitter and Facebook are collected and structured differently, requiring slightly different approaches for estimating partisanship, as discussed in detail below. These differences, however, also come with opportunities. Our Twitter data contain information for individual users with fine-grained measures of their likely partisan affiliation. Our Facebook data do not contain such individual-level data, but they do include clicks, reactions, and views in addition to shares. Each platform therefore allows us to test phenomena that the other does not.
In total, our Twitter data consist of 405,531 unique URLs shared on that platform between January 1, 2017 and December 31, 2018. These URLs originated from a total of 8,378 domains. Our Facebook data consist of 218,395 unique URLs from 908 domains shared between January 1, 2017 and February 28, 2021. We analyze less content on Facebook because we focus on domains and URLs less affected by privacy-preserving noise. Each of these datasets and their partisanship measures are described in detail below.
Dataset 1: Twitter Users with Matched Voter Data
For this study, we collected tweets from a panel of Twitter users matched to U.S. voting records. Taking a user-focused approach to data collection allows us to identify a consistent population over time and to bring in user-level demographic information, including measures of party affiliation. A pilot version of this dataset was described in Grinberg et al. (Reference Grinberg, Joseph, Friedland, Swire-Thompson and Lazer2019) and more descriptives are provided in Hughes et al. (Reference Hughes, McCabe, Hobbs, Remy, Shah and Lazer2021) and Shugars et al. (Reference Shugars, Gitomer, McCabe, Gallagher, Joseph, Grinberg and Doroshenko2021). For purposes of this analysis, two details from those papers are relevant: first, although slightly more white and female than the population of American Twitter users, our panel is otherwise generally representative of Twitter users (Hughes et al. Reference Hughes, McCabe, Hobbs, Remy, Shah and Lazer2021); second, our vendor for voter data (TargetSmart) provides a modeled estimate of party identification that correlates well with aggregate electoral results, allowing us to avoid the vagaries of interpreting party registration across states (Shugars et al. Reference Shugars, Gitomer, McCabe, Gallagher, Joseph, Grinberg and Doroshenko2021).Footnote 5
Panel users were identified in 2017. Starting with 290 million profiles retrieved from Twitter’s 10% Decahose sample, we searched for profiles in which the Twitter names (display name or handle) and locations matched entries in the voter file that were unique at the city level (or state level, if the Twitter profile does not list a city). We successfully matched 1.6 million accounts corresponding to registered U.S. voters. Because some users may go inactive, this represents an upper bound on our population size. Once identified, we retroactively collected panelists’ past tweets dating back to 2010. Since 2017, we have regularly collected all new, publicly posted panelists’ tweets.
In this analysis, we examine URLs shared or retweeted by our panelists between January 1, 2017 and December 31, 2018. We include retweets because retweeting is the primary means of sharing content authored by others. While it is certainly the case that, on the margins, some sharing (and by extension in our Facebook data, clicking, reacting, and viewing) behavior is done with disapproval of the underlying content, past work has found that on Twitter this sort of “hate sharing” is concentrated in quote tweets (Wojcieszak et al. Reference Wojcieszak, Casas, Yu, Nagler and Tucker2022), and so we exclude URLs shared through this mechanism from our analysis.Footnote 6 We restrict our focus to URLs shared a minimum of ten times, giving us an initial set of 1,404,035 unique URLs originating from 82,293 domains. After excluding URLs not likely to contain political content (see discussion below) and domains with fewer than one thousand total shares, we focus our analysis on a subset of 369,675 politically relevant URLs originating from 718 domains.
Dataset 2: Facebook URLs
To analyze partisan circulation of content on Facebook, we use the Facebook Open Research & Transparency (FORT) URLs Shares dataset which is available to researchers through a collaboration with Social Science One (King and Persily Reference King and Persily2020; Messing et al. Reference Messing, DeGregorio, Hillenbrand, King, Mahanti, Mukerjee and Nayak2021). The dataset counts the number of people who viewed, clicked, reacted to, or shared any given URL. A URL must have received at least one hundred public shares to be included in the dataset, but data from private individuals are included in the released counts. The dataset fulfills differential privacy guarantees by adding Gaussian noise to all counts (Messing et al. Reference Messing, DeGregorio, Hillenbrand, King, Mahanti, Mukerjee and Nayak2021). Because the variance of the added noise is constant across URLs, the signal-to-noise ratio is highest for high-engagement URLs.
Specifically, we begin by collecting all URLs from domains shared on Facebook at least 1 million times in the US between January 1, 2017 and February 28, 2021. This initial sweep yields 5,545,381 URLs from 1,132 domains. We then impose three filtering processes. First, to avoid irregularities introduced by the added statistical noise (Buntain et al. Reference Buntain, Bonneau, Nagler and Tucker2023), we only consider URLs that have been shared at least 1,000 times, viewed 10,000 times, clicked 5,000 times, and reacted to 5,000 times. Second, we remove all URLs not classified as political. Finally, we remove domains with fewer than ten unique URLs. These three filtering processes trim the Facebook dataset to 214,995 unique political URLs from 780 domains.
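As an illustration, the following pandas sketch applies the three filters described above to a hypothetical DataFrame of URL-level engagement counts; the column names are assumptions about data layout rather than the FORT dataset’s actual schema.

```python
# A minimal sketch of the three Facebook filtering steps, assuming a hypothetical
# DataFrame `urls` with one row per URL and columns: shares, views, clicks,
# reactions, is_political, and domain.
import pandas as pd

def filter_facebook_urls(urls: pd.DataFrame) -> pd.DataFrame:
    # 1. Drop low-engagement URLs most affected by the added privacy-preserving noise.
    kept = urls[(urls["shares"] >= 1_000) & (urls["views"] >= 10_000) &
                (urls["clicks"] >= 5_000) & (urls["reactions"] >= 5_000)]
    # 2. Keep only URLs classified as political.
    kept = kept[kept["is_political"]]
    # 3. Drop domains with fewer than ten unique political URLs.
    urls_per_domain = kept.groupby("domain")["shares"].transform("size")
    return kept[urls_per_domain >= 10]
```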
Classifying Political Content
Since we are interested in the curation of political content, we filter both datasets to URLs we classify as political. To do this, for every URL, we retrieve the title and “blurb”—the short text which is displayed for a URL on social media. For Facebook, this information is directly available through the FORT URLs Shares dataset. For Twitter, we scrape this information.Footnote 7 For both platforms, we classify each URL as related to politics or not politics using a convolutional neural network and word vectors initialized with the GloVe pretrained embedding (Pennington, Socher, and Manning Reference Pennington, Socher and Manning2014). The final classifier is trained on New York Times, Wikipedia, and Facebook data and achieves a precision of 99% and a recall of 92%.Footnote 8
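The following sketch shows one way such a classifier could be assembled with TensorFlow/Keras; it is a simplified stand-in rather than our exact architecture or training data, and the GloVe file path, vocabulary size, and sequence length are illustrative assumptions.

```python
# A minimal sketch (not the authors' exact architecture) of a CNN text classifier
# over URL titles and blurbs, initialized with pretrained GloVe vectors. Assumes
# `texts` (one title+blurb string per URL) and `labels` (1 = political) exist.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

MAX_WORDS, MAX_LEN, EMB_DIM = 20_000, 100, 100

# Tokenize the title+blurb strings into padded integer sequences.
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(texts)
X = tf.keras.preprocessing.sequence.pad_sequences(
    tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

# Build an embedding matrix from a pretrained GloVe file (hypothetical path).
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.split()
        glove[word] = np.asarray(vec, dtype="float32")
emb_matrix = np.zeros((MAX_WORDS, EMB_DIM))
for word, i in tokenizer.word_index.items():
    if i < MAX_WORDS and word in glove:
        emb_matrix[i] = glove[word]

# Convolutional classifier over the embedded token sequence.
model = tf.keras.Sequential([
    layers.Embedding(MAX_WORDS, EMB_DIM,
                     embeddings_initializer=tf.keras.initializers.Constant(emb_matrix)),
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # predicted probability the URL is political
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model.fit(X, np.asarray(labels), epochs=3, validation_split=0.1)
```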
Estimating Partisanship
We estimate a URL’s partisanship as the average partisanship of interactions with that content. This means slightly different things on different platforms, though we have conducted our analysis so as to make the platforms as closely comparable as possible. We elaborate on relevant measurement considerations here.
Our Twitter panel is matched to a commercial voter file that includes a reliable modeled estimate of each user’s likelihood of identifying as a Democrat on a 0–100 scale.Footnote 9 Where possible, we use this numeric representation of likely Democratic identification rather than trichotomizing the measure to partisan categories in order to preserve as much information about model uncertainty as possible. For the purposes of capturing the partisanship of sharing behavior in a manner comparable to traditional approaches, we implement a linear transformation of this score by subtracting it from 50 and then dividing by 50 to put this modeled estimate on a $ [-1,1] $ scale running from most Democratic to least Democratic. This is preferable to relying solely on party registration, which is not collected in all states.
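A minimal sketch of this transformation, assuming a hypothetical variable holding the 0–100 modeled likelihood of identifying as a Democrat:

```python
def user_partisanship(p_dem: float) -> float:
    """Map a 0-100 Democratic-identification score onto [-1, 1].

    100 (most likely Democrat) maps to -1; 0 (least likely Democrat) maps to +1.
    """
    return (50 - p_dem) / 50

print(user_partisanship(100))  # -1.0
print(user_partisanship(25))   #  0.5
```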
Our Facebook data do not include a direct measure of user partisanship, but aggregate interaction counts into five categories of political ideology: −2, −1, 0, +1, and +2, from very liberal to very conservative. These labels are included with the FORT URL Shares dataset and are estimated based on the political pages a user follows, similar to Barberá et al.’s tweetscores (Barberá et al. Reference Barberá, Jost, Nagler, Tucker and Bonneau2015; Messing et al. Reference Messing, DeGregorio, Hillenbrand, King, Mahanti, Mukerjee and Nayak2021). While typically interpreted as ideology, this measure anchors Democratic politicians on one side and Republican politicians on the other. For consistency with our Twitter data, we therefore refer to this measure as partisanship.
Added Gaussian noise in the Social Science One data can make this calculation difficult: popular but hyperpartisan URLs in our Facebook dataset are shown as having negative share counts among out-partisans. Constructing audience scores with these negative counts could lead URLs to fall outside of the $ [-1,1] $ range when normalizing. To avoid this issue, for URLs with any negative counts in any political categories, we add the largest absolute value of category-level negative counts to all categories. This coerces the minimum category-level count to zero and constrains the resulting audience score to the $ [-1,1] $ range. This allows us to calculate political scores in a straightforward manner without substantively altering our methodological approach or eventual results.
For both platforms, we then calculate a URL’s partisan audience score as the average partisanship of its interactions. For Twitter, this means assigning each sharing event the modeled partisanship of the user who shared the URL, and then averaging those scores. For Facebook, we construct this average based on the counts of shares in the five partisanship categories, weighting interactions by the partisan values of −2 to +2, before normalizing to the common $ [-1,1] $ scale. Consistent with past work finding that Twitter has a more left-leaning user base than Facebook (Wojcik and Hughes Reference Wojcik and Hughes2019), the total average audience score on Twitter is −0.39, whereas on Facebook, the analogous score (based on sharing behavior) is −0.12.
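The following Python sketch illustrates both URL-level calculations; the data structures (a dictionary of category counts for Facebook, a list of sharer scores for Twitter) are illustrative assumptions about data layout rather than the datasets’ actual schemas.

```python
# A minimal sketch of the URL-level audience scores described above.
import numpy as np

def facebook_url_score(counts: dict) -> float:
    """Noise-shifted, weighted average of share counts over the five partisanship
    bins (-2 very liberal ... +2 very conservative), normalized to [-1, 1]."""
    values = np.array([counts[k] for k in (-2, -1, 0, 1, 2)], dtype=float)
    if values.min() < 0:                 # privacy-preserving noise can make counts
        values += abs(values.min())      # negative; shift so the minimum is zero
    weights = np.array([-2, -1, 0, 1, 2], dtype=float)
    return float((values * weights).sum() / values.sum() / 2)  # divide by 2 -> [-1, 1]

def twitter_url_score(sharer_scores: list) -> float:
    """Average the [-1, 1] modeled partisanship of every sharing event."""
    return float(np.mean(sharer_scores))

# Example: a URL shared mostly by conservative users on Facebook.
print(facebook_url_score({-2: 50, -1: 100, 0: 300, 1: 900, 2: 650}))  # ~0.5
```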
We use sharing behavior on both Twitter and Facebook as our primary estimate of content partisanship on these platforms. In order to preserve information regarding individuals who share the same content multiple times, we use the total number of shares rather than the unique number of sharers, which has been used in past work (Bakshy, Messing, and Adamic Reference Bakshy, Messing and Adamic2015; Robertson et al. Reference Robertson, Jiang, Joseph, Friedland and Lazer2018).Footnote 10
In addition to this cross-platform comparison, we calculate three more estimates of content partisanship on Facebook using measures of views, clicks, and reactions. While only available for our Facebook dataset, these measures establish the robustness of our main findings and give additional insight into the multifaceted curation process of social media. “Views” refer to the number of times a piece of content appeared within a user’s feed; “clicks” capture the consumption choices of what users click on once they are exposed; and finally, “reactions” indicate users’ public responses of “like,” “love,” “haha,” “wow,” “sorry,” or “anger.”
Statistical and Substantive Evidence of Curation Bubbles
We use a number of strategies to test the empirical implications of our curation bubbles framework. Our tests of H1a and H1b begin with comparing URL- and domain-level audience scores on each platform. If story-level partisan composition follows source-level composition, these distributions will be similar. However, if story-level partisan composition is heterogeneous as we hypothesize, these distributions will differ. Specifically, the distributions of URL-level scores should exhibit more extremity than the distributions of domain-level scores.
On Facebook, we are able to test both productive (H1a) and consumptive (H1b) curation. While the Twitter sharing data only allow us to test H1a, the user-level data from this platform allow for further tests by user partisanship. For example, we can compare the average domain-level audience score of URLs Democrats share with the average URL-level audience score of URLs Democrats share. This allows us to further test H1a by examining possible partisan drivers of our results.
Differences between URL and domain-level distributions are important because we expect that scores estimated at the story level will reflect the substantive partisan appeal of those stories (H2). To test this, we had a team of hand coders evaluate the partisan appeal of a sample of one thousand news stories drawn from our Twitter data. These hand coders, a collection of graduate students and postdocs, were asked to evaluate the appeal of selected stories to Democrats (−1) or to Republicans (1), or both equally (0). The full instructions are included in Appendix B of the Supplementary Material. We sampled URLs for coding with probability proportional to the absolute deviation between the URL-based audience score and domain-based audience score; that is, we oversampled stories in partisan curation bubbles. Each coder evaluated five hundred stories (Krippendorff’s $ \alpha $ = 0.673), and we averaged the results to produce a hand-coded score of partisan appeal on the same scale as the URL-based audience score.
Finally, we expect systematic variation in how well domain estimates capture the substantive partisan appeal of their constituent stories (H3). Specifically, we expect that moderate domain scores are more likely to mischaracterize the partisan appeal of stories (URLs) from that domain. We test this by estimating the extent to which we can statistically distinguish URL-level audience scores from their parent domains’ audience scores. These tests also serve as an important robustness check for H1a and H1b, demonstrating that differences in distributions are not merely due to partisan variation in the volume of URLs associated with different types of sources.
More formally, we test the extent to which individual URLs have partisan audiences that are statistically and substantively distinguishable from the aggregate audience of their associated domain. This assumes that, in the absence of partisan curation bubbles, URL-level estimates of audience composition would be sampled from a normal distribution centered at the domain-level audience score.Footnote 11 We can then test, for each constituent story, whether its observed audience score is statistically distinguishable from a story generated under this null hypothesis.
By way of example, consider individual URLs associated with the New York Times on Twitter. The mean partisanship (and therefore the domain score) for nytimes.com is $ -0.59 $ and the standard deviation is $ 0.59 $ . Under the null hypothesis of no partisan curation bubbles, we would expect most URLs to have audience scores that fall within an interval characterized by the standard error of the mean. For the New York Times, this interval is centered on $ -0.59 $ , with a width determined by our chosen confidence level, the domain-level standard deviation of shares $ (0.59) $ , and the square root of the number of URL shares. One such URL, an editorial criticizing a court decision on voter-registration policies,Footnote 12 was shared 75 times and has a URL score of −0.72. This point estimate is to the left of the domain-wide average, but is not statistically distinguishable from it at the 99% confidence level, as this score falls within the interval $ -0.59\pm 2.57\frac{0.59}{\sqrt{75}}=[-0.76,-0.41] $ . On the other hand, an editorial denouncing the shooting of Representative Steve ScaliseFootnote 13 (175 shares, URL score $ 0.25 $ ) falls well outside of the interval, $ -0.59\pm 2.57\frac{0.59}{\sqrt{175}}=[-0.70,-0.47] $ , allowing us to infer that the audience for this specific story does not reflect a data-generating process in which every New York Times story’s audience is sampled from an identical distribution.
One concern with this approach is that, given the large share volume for many stories, we could reject the null hypothesis of no difference despite trivial substantive differences. To address this, we conduct a further test for substantive significance by widening the confidence intervals by 0.1 in each direction, analogous to the use of two one-sided tests in equivalence testing (Rainey Reference Rainey2014). Testing for differences of at least 0.1 on the [−1,1] scale (i.e., 5% of the full possible range) corresponds to perceptible differences in audience composition on the left and the right. For example, the difference in Twitter-based domain scores between breitbart.com and nationalreview.com is 0.11 and the difference between thenation.com and theatlantic.com is 0.09.
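A minimal sketch of this combined statistical and substantive test, using the New York Times examples discussed above; the function and variable names are illustrative.

```python
# A minimal sketch of the story-level test: is a URL's audience score statistically
# AND substantively (by more than the 0.1 margin) distinguishable from its parent
# domain's average audience score?
import math

def url_differs_from_domain(url_score: float, n_shares: int,
                            domain_mean: float, domain_sd: float,
                            z: float = 2.57, margin: float = 0.1) -> bool:
    half_width = z * domain_sd / math.sqrt(n_shares)   # 99% confidence half-width
    lower = domain_mean - half_width - margin          # widen by the substantive
    upper = domain_mean + half_width + margin          # margin in each direction
    return url_score < lower or url_score > upper

# Worked examples from the text: nytimes.com has mean -0.59 and sd 0.59.
print(url_differs_from_domain(-0.72, 75, -0.59, 0.59))   # False: inside the interval
print(url_differs_from_domain(0.25, 175, -0.59, 0.59))   # True: outside even the widened interval
```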
For each domain, we then calculate the proportion of constituent URLs that have audience scores statistically and substantively distinct from the domain-level average. For example, for the New York Times, only 5% of stories are substantively different from the domain score of –0.59. Per H3, we expect this proportion to be higher for more moderate domains (e.g., those with a score close to 0), and lower for more extreme domains (e.g., those closer to –1 or 1).
RESULTS
We evaluate: whether story-level heterogeneity in partisan appeal is reflected in sharing behavior (H1a) and viewing behavior (H1b); whether partisan appeal is recognizable at the story level (H2); and whether this heterogeneity leads moderate source-level estimates to mischaracterize the partisan appeal of their constituent stories more frequently than extreme source-level estimates (H3).
Testing for Productive and Consumptive Curation
We first show evidence of productive curation (H1a) by plotting the distributions of domain- and URL-level audience scores based on sharing behavior on Twitter and Facebook in Figure 2. Here, we focus on domains within the top quartile by number of political URLs, though in Appendix I of the Supplementary Material, we show that these results are consistent across a variety of thresholds. In both cases, the distributions of URL and domain scores differ from one another. The differences are statistically significant under a Kolmogorov–Smirnov test (Facebook: $ D=0.12 $ , $ p=0.0101 $ ; Twitter: $ D=0.23 $ , $ p=0.0003 $ ). When visually comparing the distributions between platforms, it may seem counterintuitive that the Twitter distributions yield the larger test statistic. The KS statistic, however, captures only the largest absolute deviation between the empirical cumulative distribution functions rather than the full shape of the distributions. Under a distance measure that compares the full distributions (the Wasserstein distance), the larger substantive difference between the distributions on Facebook is apparent ( $ W=0.18 $ for Facebook; $ W=0.08 $ for Twitter). What is relevant here is that in both cases the Kolmogorov–Smirnov test rejects the null hypothesis of no difference between the domain- and URL-level distributions. As we show below, the relative visual similarity between the domain- and URL-level distributions on Twitter masks substantial heterogeneity across domains.
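For reference, both distributional comparisons can be computed with standard SciPy routines, as in the following sketch; the randomly generated arrays are stand-ins for the actual domain- and URL-level audience scores.

```python
# A minimal sketch of the two distributional comparisons reported above.
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(0)
domain_scores = rng.uniform(-1, 1, 200)    # stand-in for domain-level audience scores
url_scores = rng.uniform(-1, 1, 5000)      # stand-in for URL-level audience scores

# KS statistic: the largest gap between the two empirical CDFs.
ks_stat, p_value = ks_2samp(domain_scores, url_scores)

# Wasserstein distance: compares the full shape of the two distributions.
w_dist = wasserstein_distance(domain_scores, url_scores)
print(ks_stat, p_value, w_dist)
```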
In our Twitter sample, there is considerably more political news sharing on the political left than on the right (the platform-wide audience score is –0.39), which is at least partially attributable to the partisan composition of Twitter’s user base and our sample of Twitter users (Hughes et al. Reference Hughes, McCabe, Hobbs, Remy, Shah and Lazer2021), while Facebook is more balanced (its corresponding score is –0.14). However, on both platforms—and especially on Facebook—the distributions of audience scores at the URL level exhibit more extremity than they do at the domain level —providing preliminary support for H1a.
To better understand productive curation in the context of a platform’s user base, we further examine variation in domain- and URL-level audience scores by users’ modeled partisanship (Figure 3). For ease of visualization, here we trichotomize our user-level measure of modeled partisanship into likely Democrats ( $ p(Dem)>0.65 $ ), unlikely Democrats ( $ p(Dem)<0.35 $ ), and users for whom likely partisanship is uncertain ( $ 0.35\le p(Dem)\le 0.65 $ ). The figure shows that domain-level scores mischaracterize the sharing profiles of a significant number of users in all three groups,Footnote 14 but especially those who are unlikely to be Democrats. Under a domain measure, 28% of unlikely Democrats’ political information sharing is, on average, to the right of 0; and 1.5% is to the right of 0.5. Using a URL measure, these respective percentages are 39.1% and 7.3%. This indicates that there are a substantial number of unlikely Democrats who tend to share political information from sources with generally left-leaning audiences when specific stories from those sources are disproportionately shared by right-leaning users, further supporting H1a.
We illustrate this dynamic in Figure 4, which plots individual URLs’ audience scores and share volume on both Twitter and Facebook for the New York Times, the Wall Street Journal, Mediaite, Fox News, RT, and Reason. These cases are illustrative in that they vary in size and overall audience partisanship. The domain-level audience score for each is shown with a vertical dashed line; individual stories with audience scores statistically and substantively distinguishable from the domain score at the 99% level are shown with greater opacity, and in blue (more Democratic than expected under the null) or red (less Democratic than expected under the null), relative to those that are within this uncertainty interval. Stories that could be statistically significantly distinguished from the source under a null hypothesis of no difference, but whose difference did not meet our threshold of substantive significance, are shown in yellow. This figure shows that, even for sources with more extreme overall audience scores and a relatively lower share of stories in partisan curation bubbles, atypical partisan audiences do often find specific information from those sources to circulate at high volume, consistent with H1a.
This figure also previews the dynamic we will systematically test in H3. For sources with neutral domain scores, story-level fluctuations between different partisan audiences are in some cases the norm—especially for the stories that circulate at high volume. The Wall Street Journal’s well-known divide between its “hard news” and editorial content, discussed above, is further apparent here—as is Mediaite’s idiosyncratic audience.Footnote 15 While we further test H3 across all sources, these results provide preliminary evidence that domain-level scores near zero cannot be straightforwardly interpreted as indicating reliably neutral content. Moreover, it is precisely the domains with the most neutral audience scores that exhibit the most within-domain partisan heterogeneity. These domains are not garnering neutral audience scores solely by producing content that is consistently shared by Democrats and Republicans at equal rates; they often produce content that is alternately shared by either Democrats or Republicans disproportionately.
We next test whether the heterogeneity we observe in productive curation (H1a) extends to consumptive curation (H1b). Using Facebook’s FORT URLs dataset, we recalculate domain- and URL-level scores using the consumptive measures of clicks, reactions, and views, along with the productive measure of shares for comparison. The results are shown in Figure 5. While we find that public-facing behaviors (shares and reactions) exhibit more extremity than private behaviors (clicks and especially views), we again find that the domain-based approach consistently understates the partisan extremity of engagement, regardless of how engagement is measured. This supports H1b, showing that story-level heterogeneity in sharing behavior (productive curation) is carried through to story-level heterogeneity in viewing behavior (consumptive curation).
Substantive Differences in Partisan Appeal
We argue that this heterogeneity is attributable to substantive differences in story-level partisan appeal (H2). To test this, we use the subset of one thousand URLs for which we have hand coded partisan appeal. Specifically, we test whether variation in humans’ assessments of whether Democrats or Republicans would view a story more favorably is better explained by URL-level or domain-level audience scores. We find there is a strong correlation ( $ r=0.75 $ ) between the URL audience scores and the evaluated partisan appeal, and that this relationship is much stronger at the URL level than at the domain level ( $ r=0.55 $ ). This is further illustrated in Figure 6, which plots human-evaluated partisan appeal against URL-level audience scores, with domain-level audience scores reflected in the color gradient. When partisan appeal is regressed against URL and domain scores together (Table E.1 in the Supplementary Material), an F-test supports the inclusion of domain scores as improving model fit ( $ F=41.01 $ ), but the substantive improvement is minimal, shifting the adjusted $ {R}^2 $ from 0.554 to 0.571. Put simply, we find support for H2: variation in URL-based audience scores reflects variation in the substantive partisan appeal of a story, and this is not a product of source cues.
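A minimal sketch of this model comparison using statsmodels, assuming a hypothetical DataFrame of the one thousand hand-coded URLs with columns for the coded appeal and the two audience scores:

```python
# A minimal sketch of the nested model comparison, assuming a hypothetical
# DataFrame `coded` with columns: appeal (hand-coded, -1 to 1), url_score,
# and domain_score.
import statsmodels.formula.api as smf

restricted = smf.ols("appeal ~ url_score", data=coded).fit()
full = smf.ols("appeal ~ url_score + domain_score", data=coded).fit()

# F-test for whether adding domain scores improves fit beyond URL scores alone.
f_stat, p_value, df_diff = full.compare_f_test(restricted)
print(f_stat, restricted.rsquared_adj, full.rsquared_adj)
```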
Differential Implications for Domain-Level Estimates
Finally, we test for systematic differences in the extent to which curation bubbles distort source-level estimates of partisan audiences across the [−1,1] scale (H3). Figure 7 plots, for our Twitter data, the proportion of each domain’s stories whose audience scores are statistically and substantively distinguishable from the domain-level audience score (i.e., the proportion of stories that differ from the domain-level average by more than 0.1 after accounting for sampling uncertainty) against the domain-level audience score, with illustrative sources labeled. We supplement this analysis in Figure 8, showing the same dynamics across both productive and consumptive measures using our Facebook data.
While Facebook’s larger user base and higher volume narrow confidence intervals such that larger proportions of stories are statistically distinguishable from their domain-level averages in general, both platforms show a clear trend. Although every domain at least occasionally produces stories that circulate among atypical partisan audiences, this is significantly more common as domains’ audience scores approach zero (see OLS regressions in Appendix H of the Supplementary Material). That is, the more neutral the domain-level audience score, the more frequently that domain’s constituent stories have partisan audience scores that are substantively different than the domain’s audience as a whole. This supports H3, showing that moderate domain scores tend to mischaracterize the partisan appeal of their constituent stories at the highest rates.
Results for specific domains, shown on the plot, reflect qualitative understandings of those domains’ audiences as well. For example, hyper-partisan outlets such as Daily Kos and the Huffington Post on the left, and Breitbart and Fox News on the right, have extreme domain scores and fewer stories that circulate among atypical partisan audiences. By contrast, the Wall Street Journal’s domain-level audience score of –0.34 is frequently ill-suited to describe individual stories the newspaper publishes, as previously indicated in Table 1. Furthermore, the domains with audience scores that least frequently capture the partisan appeal of their constituent stories are those with audience scores near zero, such as Mediaite or the New York Post. It is also worth noting that domains with audience scores near zero and relatively less within-domain heterogeneity are often outlets that are ideological in ways that do not neatly reflect partisanship in the US, such as the Russian state-sponsored RT.
DISCUSSION
We find evidence of partisan curation bubbles across our analyses, as users share and consume information with consistent partisan appeal from a variety of sources. These partisan curation bubbles frequently lead to story-level heterogeneity within sources for both productive (H1a) and consumptive (H1b) curation. This audience heterogeneity likely reflects heterogeneity in within-source partisan appeal, as audience scores estimated at the story level do reflect the partisan appeal of content (H2). Furthermore, we find systematic variation in this heterogeneity—with more moderate estimates of domain-level audience partisanship more frequently mischaracterizing the partisan valence of individual stories (H3). This suggests that relatively moderate domain-level scores are often the result of different stories circulating among different partisan audiences, rather than every story reaching a consistently balanced audience.
It is likely that elements of these curation processes predate the internet and social media. For example, opinion leaders who subscribed to a given newspaper may have tended to read and talk about particular stories that matched their prior political preferences. However, observing this process prior to the internet would have required an impossible scale of instrumentation; in a sense, the internet has merely made the process visible. For example, recent work examining user behavior within Google Search indicates that even though users’ search results do not systematically vary by partisanship, their choice of which search results to click on does (Robertson et al. 2023). Future work could probe the theoretical mechanisms underlying these findings, for example by experimentally manipulating the pairing of politically (in)congruent stories with politically (in)congruent sources to directly test the extent to which users are willing to share politically favorable information from ideologically distant sources.
However, we argue that the internet has dramatically changed the structure of supply and demand, making networked processes of curation far more important. The fundamental logic of the internet is competition for attention at the level of individual stories, and the ratio of available information to human attention has increased by many orders of magnitude. This contrasts with pre-internet competition at the outlet level, in which consumers chose stations to watch and newspapers to subscribe to. The networked curation processes of social media allow individuals to delegate the task of navigating a functionally infinite amount of information to other users who regularly share information that appeals to their identities and interests. One natural result of this process is partisan curation bubbles.
Purely with respect to measurement, our findings suggest that source-level measures of audience partisanship should be used with caution, as they risk overestimating the partisan diversity of information consumption. All but the most extreme sources have a meaningful amount of partisan heterogeneity at the story level, and for some sources this is the rule rather than the exception. There will be times when source-level aggregation is theoretically warranted or practically necessary. In settings outside of social media, where information consumption is not characterized by networked curation, source- and story-level estimates may generate similar results. Our point is to emphasize that source-level aggregation is a measurement choice that must be considered on a case-by-case basis.
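As a toy numerical illustration of this risk (all values are invented), a source whose individual stories circulate among strongly partisan audiences can still carry a near-zero source-level score:

```python
# Invented numbers: individual stories reach strongly partisan audiences,
# yet averaging to the source level suggests a "moderate" audience.
import numpy as np

story_scores = np.array([0.80, -0.75, 0.85, -0.80, 0.70, -0.82])  # [-1, 1] scale

print(f"source-level score: {story_scores.mean():+.2f}")               # ~ -0.00
print(f"mean absolute story score: {np.abs(story_scores).mean():.2f}")  # ~ 0.79
```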
More broadly, these findings shed new light on the macro structure of information consumption on feed-based social media. While we empirically demonstrate that information consumption on these platforms is more politically homogeneous than prior empirical accounts suggest, we view the networked curation processes that produce these results as a feature of democratic participation, even as others might view the resulting polarized consumption as a normative concern. Social media is, at its core, social, allowing users to deploy information to perform their identities and advance their interests in the context of democratic participation. To the extent that these identities and interests diverge—particularly among the most politically engaged, who are the most likely to perform opinion-leading functions on social media (Hughes 2019)—so too will the information that circulates among different audiences. While much of the literature takes polarized information consumption as distressing for democracy, it is not obvious that this, in and of itself, is a problem to solve (Kreiss and McGregor 2023). In this sense, these findings underscore the long-standing trade-offs between exposure to opposing views and democratic participation (Mutz 2006; Stroud 2011)—with different sites at which individuals express themselves and exchange their views being better suited for one or the other.
While the analyses here focus on audience partisanship, our theoretical framework problematizes source-level analyses of information consumption on social media more generally. For example, with respect to the study of political misinformation, preliminary evidence indicates that users interested in promoting false or misleading narratives often strategically repurpose factually true information from reliable sources in order to do so (Goel et al. 2024). Domain-level measures of political information cannot detect this behavior, but it follows naturally from individuals engaging with and using information on social media to perform their identities and advance their interests. Furthermore, individuals who report low levels of trust in mainstream sources on surveys may base these evaluations more on the sources’ reputations than on their specific interactions with information those sources produce (Peterson and Kagalwala 2021), and likely still recognize that such sources are perceived as credible by others (see also Pennycook and Rand 2019). Despite their stated distrust of mainstream sources overall, these individuals may nevertheless find specific information from those sources useful when it suits their purposes (Baum and Groeling 2009). Accounting for networked curation is crucial for aligning theory and measurement on large-scale platforms where such affordances are available.
Finally, it is important to note that socio-technical systems are elastic and that different design choices may lead to different outcomes (Bail 2021). For example, newer platforms such as TikTok have de-emphasized the curation influence of followed accounts in favor of relying more directly on the estimated relevance of specific pieces of content. This variation in platform features and affordances suggests a promising line of future research examining the dynamics and democratic outcomes of networked curation across different platforms. We believe that content choice for information consumers is permanently expanded relative to the twentieth century. While individual curation bubbles may pop, the process of networked curation connecting people to content they want to see from a vast set of choices is a permanent feature of the information landscape.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S0003055424000984.
DATA AVAILABILITY STATEMENT
The raw data underlying this article cannot be shared due to privacy concerns arising from matching data to administrative records, data use agreements, and platforms’ terms of service. Research documentation (including all code) and secondary Twitter data that preserves user anonymity are openly available at the American Political Science Review Dataverse: https://doi.org/10.7910/DVN/1ONKDX.
ACKNOWLEDGMENTS
The authors would like to thank Taylor Carlson, Bruce Desmarais, Shannon McGregor, and C. Daniel Myers; participants at the 2021 meetings of the American Political Science Association, Midwest Political Science Association, and Society for Political Methodology; participants at the Duke Behavior and Identities Workshop; and three anonymous reviewers for helpful feedback on earlier versions of this manuscript. The authors would also like to thank Ronald Robertson for making reference data available and John Harrington for research assistance.
FUNDING STATEMENT
S.M. was supported by the John S. and James L. Knight Foundation through a grant to the Institute for Data, Democracy & Politics at the George Washington University. S.C. is supported by a Bloomberg Data Science Ph.D. Fellowship. D.L. acknowledges support from the William & Flora Hewlett Foundation and the Volkswagen Foundation.
CONFLICT OF INTEREST
The authors declare no ethical issues or conflicts of interest in this research.
ETHICAL STANDARDS
Facebook data in this study were obtained from Meta, as part of Facebook Open Research & Transparency (FORT), an initiative to facilitate the study of social media’s impact on society. Researchers seeking permission to use the FORT platform must (1) apply to become an approved partner and (2) sign the Research Data Agreement (RDA), a publicly available legal agreement. The RDA prohibits sharing Facebook data with any third party. Researchers may request access to Facebook data at https://socialscience.one/rfps. Collection of Twitter data and linkage to administrative records was approved by the Institutional Review Board at Northeastern University (#17-12-13). The authors affirm that this article adheres to the principles concerning research with human participants laid out in APSA’s Principles and Guidance on Human Subject Research (2020).