Motivation and Contribution
Toward a Framework for the Politics of Internet Control
The politics of the internet has been studied from a variety of angles. Two, in particular, have proceeded in parallel. First is the burgeoning literature on digital censorship. It has tracked the explosion of censorship technology,Footnote 1 and the profusion of citizen responses.Footnote 2 Second is the emerging line of inquiry on trade in digital goods and services.Footnote 3 It encompasses new forms of trade and trade distortion in the digital age,Footnote 4 and new modes of interstate interaction engendered therein.Footnote 5
This parallel is peculiar. Internet control and digital trade are inextricably linked, as observed by numerous practitioners in democratizationFootnote 6 and in trade liberalization.Footnote 7 Internet control, defined here as the restriction of internet traffic via the blocking of web domains,Footnote 8 has often been decried as digital protectionism that unfairly advantages certain domestic sectors.Footnote 9 Disputes over this very issue have occurred on both bilateral and multilateral levels.Footnote 10 Nonetheless, there has been no coherent articulation of how internet control implicates digital trade or of how its distributional consequences bear on domestic politics and interstate relations.
In this paper, I advance a framework that connects the dots and, in so doing, traces out the logic of internet control in an autocratic state. It begins by distinguishing between three components of information: (1) ideas that propel political action; (2) data as a factor of production; and (3) knowledge as a driver of innovation. Insofar as all three are bound up in information flow, measures to restrict one also disrupt the others. Because of this, internet control intended to restrict ideas and thus prevent domestic challenges to regime security generates two externalities.
First, controls of this kind benefit domestic data-intensive firms in large economies with a high level of internet connectivity. For sectors that use data as an input factor, internet control not only distorts the quantity of foreign digital products available to domestic consumers but also boosts the factor endowment for these domestic sectors by forcing domestic consumers to contribute their data to domestic producers. With induced growth, the data-intensive firms become more likely to expand overseas, which increases the likelihood of foreign access to domestic data. This impedes the state's competing objective of preventing foreign challenges to regime security, because it undermines data sovereignty, defined here as the total and absolute control of domestically originated data by the state in question.Footnote 11 Second, such controls hurt domestic knowledge-intensive actors who rely on access to knowledge from the outside world in generating innovation. Of the actors affected in this way, the state will accommodate only foreign knowledge-intensive actors operating within its borders who can credibly threaten immediate retaliation if denied access.
To test these two information externalities, I examine China's system of internet control, exploiting a major internet control shock that occurred in 2014. I find that internet control gives Chinese data-intensive firms an approximately 26 percent marginal increase in revenue compared to other Chinese firms, and up to 50 percent for the most data-intensive firms. However, this advantage does not translate beyond the domestic context. Despite China's internet control, US data-intensive firms have performed marginally better than their Chinese counterparts. This suggests the presence of countervailing forces, one of which I test through an analysis of China's research sector. There, the same internet control shock is associated with a decline in research quality of 10 percent, and up to 15 percent for the most knowledge-intensive disciplines. An analysis of US and Chinese research output reveals that internet control reduces the research quality of Chinese researchers in any discipline by 22 percent compared to their US counterparts. With qualitative evidence, I then explicate how internet control's dual externalities pose not one but two dilemmas: one between internal and external threats to regime security, and the other between imminent political threats and immediate economic costs. In both instances, foreign actors wield momentous sway over the autocrat's calculus.
Contribution to the Literature
In connecting digital censorship with digital trade, this paper contributes to both strands of the literature. Studies have duly noted the political repercussions of digital censorship,Footnote 12 but none have scrutinized its distributional consequences. References to censorship as a “tax” on information access are chiefly confined to the context of political repression.Footnote 13 In contrast, this paper shows how information externalities distort market outcomes beyond the political objective of digital censorship. In quantifying the divergent effects of internet control on different actors in the economy, it demonstrates how such control yields dividends for domestic data-intensive sectors while imposing costs on the economy as a whole.
This paper also contributes two novel insights to the growing body of research on digital trade. Prior works have explored the political-economy ramifications of the unique properties of informational goods, both quantitativelyFootnote 14 and qualitatively.Footnote 15 My empirical test of the two information externalities in concrete, quantitative terms refines prior conjectures by showing that prevailing trade models underestimate the benefit to domestic data-intensive sectors while overlooking the cost to domestic knowledge-intensive sectors. In so doing, I uncover how, beyond the intended winners and losers,Footnote 16 the state's manipulation of information creates unintended winners and losers owing to the structure of information flow, which encapsulates multiple components.
Finally, this paper enriches the debate on the “dictator's dilemma.” Politically motivated control of information flow has been argued to come at an economic cost.Footnote 17 Autocrats face a dilemma between risking political unrest by allowing in too much information and risking economic unrest by allowing in too little.Footnote 18 I challenge this framing in two ways. First, I unpack how it misattributes the source of the incentive for the autocrat to limit control. That incentive stems not from a general concern about long-term growth but from a specific concern about the immediate costs that certain actors may impose. Second, I highlight a new dilemma in the digital age between preventing domestic challenges to regime security through internet control and preventing foreign challenges to regime security through data sovereignty. In empowering domestic firms with droves of data, internet control weakens the autocrat's control over such data when these firms later expand overseas as a result of their growth.
In the next section I present my theory on the two information externalities of internet control and four testable hypotheses. I then introduce my empirical case, China's internet control, and detail my data and methodology. With that, I present my quantitative results. The qualitative section corroborates the implications for state strategy, after which I conclude with reference to future directions and policy relevance.
Theory and Hypotheses
Ideas, Data, Knowledge
My theory begins by recognizing three distinct components of information—ideas, data, and knowledge—based on earlier conceptualizations of the structure of information.Footnote 19 Of particular relevance is the definition of information as consisting of (1) ideas, or bit strings that are “set[s] of instructions for making an economic good”; and (2) data, such as “driving data, medical records, and location data.” Whereas scores of images serve as training data for machine learning algorithms, the resulting algorithm as a set of “forecasting rules” exemplifies an idea.Footnote 20
While useful for economic analyses, this definition omits a category of information central to civic and political life. Whether it is an ideology deemed threatening to the regime or a rallying call for assembly, information that inspires or facilitates political action has been the prime target for digital censorship.Footnote 21 Information of this kind is more like ideas than data in that it requires the interpretation and sense-making of a human actor.Footnote 22 At the same time, it differs from the foregoing examples of an idea in that its primary objective is to prompt a political action rather than to produce an economic good. For the purposes of my theory, I term human-actionable information intended for political action “ideas” and human-actionable information intended for economic production “knowledge.”
One may conceptualize the distinction between data and knowledge with respect to economic production as that between input factor and total factor productivity (TFP). Let the total output, Y, be a function of TFP, A; capital as an input factor, K; and labor as an input factor, L. Whereas knowledge, such as technical know-how, affects total output through TFP by altering the returns to input factors, data does so in a different way. For data-driven firms such as Google and Uber, user data—from search history to driving routes—are used to train algorithms that undergird their core products, from which they derive a major stream of their revenue.Footnote 23 Data thus enters the equation as a factor of production that is distinct from capital and labor.
Equation (1) conceptually illustrates how information affects total output via its two economic components, knowledge and data.Footnote 24 TFP, A, is a function of knowledge, Kn, while data, D, enters as a factor of production alongside capital and labor:Footnote 25
$$Y = A(Kn)\,F(K, L, D) \qquad (1)$$
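Equation (1) makes output depend on TFP, A(Kn), and on capital, labor, and data. To make the asymmetry between the two channels concrete, the sketch below assumes a Cobb-Douglas form, Y = A(Kn) K^α L^β D^γ, with a linear mapping A(Kn) = a0 + a1·Kn. The functional forms, the parameter values, and the helper names `tfp` and `output` are illustrative assumptions, not the paper's specification or estimates. The point is qualitative: a loss of knowledge rescales output for every factor mix, whereas a gain in data works only through data's own factor share.

```python
def tfp(knowledge, a0=1.0, a1=0.5):
    """Hypothetical TFP mapping A(Kn) = a0 + a1 * Kn (linear form assumed)."""
    return a0 + a1 * knowledge


def output(knowledge, capital, labor, data, alpha=0.3, beta=0.5, gamma=0.2):
    """Illustrative Cobb-Douglas instance of Y = A(Kn) * F(K, L, D).

    The exponents are assumed values that sum to 1 (constant returns to scale).
    """
    return tfp(knowledge) * capital**alpha * labor**beta * data**gamma


baseline = output(knowledge=2.0, capital=100.0, labor=50.0, data=10.0)
# Two hypothetical shocks: a larger data endowment vs. reduced access to knowledge.
more_data = output(knowledge=2.0, capital=100.0, labor=50.0, data=20.0)
less_knowledge = output(knowledge=1.0, capital=100.0, labor=50.0, data=10.0)

print(f"doubling data:     {more_data / baseline:.3f}x baseline output")  # 1.149x
print(f"halving knowledge: {less_knowledge / baseline:.3f}x baseline output")  # 0.750x
```

Under these assumed parameters, doubling data raises output by a factor of 2^γ, while halving Kn lowers A from 2.0 to 1.5 and so scales output by 0.75 regardless of how much capital, labor, or data is in use.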
Information Externalities and Distributional Consequences
Given that information contains ideas, data, and knowledge, when a state blocks foreign web domains to restrict the flow of ideas, it also disrupts the flow of both data and knowledge. Domestic consumers now face impeded access to foreign digital products, from search engines to social media platforms. This compels them to switch to domestic substitutes. If Google is blocked, for instance, domestic users will resort to an indigenous search engine if one exists. Figure 1 provides a striking visualization of the substitutive relationship between Google and an indigenous search engine when the former's domain experienced disruptions in China.Footnote 26
The expanded user base increases both sales revenue and the supply of data, the latter because of the prevalence of barter trade, in which consumers pay for digital products not with money but with their data.Footnote 27 In autocracies, user data collected by domestic producers may be further transacted with the government for the latter's political ends.Footnote 28 Treating internet control simply as a tariff or quota, without considering these critical features of digital trade, would therefore misstate its welfare effects in two ways. It would overestimate the loss in domestic consumer surplus, given the high substitutability between domestic and foreign digital products that are both “free” to use; and it would underestimate the domestic producer surplus that firms in data-intensive sectors gain from the supply of data and, in turn, their capacity for growth and expansion.
Concurrently, domestic knowledge-intensive sectors that rely on existing knowledge for their own knowledge production now face impeded access to external knowledge. Anecdotes abound regarding the decline in productivity for researchers when sites such as Google Scholar get blocked. Any or all of three scenarios can occur: (1) Researchers may see a reduction in the amount of external knowledge they can acquire per unit time, such as when network disruptions limit their ability to read articles on Google Scholar (“aware, willing, but unable”).Footnote 29 (2) Researchers may be discouraged by such disruptions from trying to acquire external knowledge (“aware but unwilling”).Footnote 30 (3) Researchers may be altogether unaware of some external knowledge due to lack of exposure (“unaware”).Footnote 31
Relative to standard trade distortions, welfare transfers to those affected by the negative knowledge externality are complicated by three factors. First, the decline in knowledge production does not immediately translate into a decline in total output. The state must weigh this delayed cost against more pressing threats to regime security when deciding to impose internet control. Second, the cost to knowledge producers, who are scattered throughout the economy, is more diffuse than the benefit to data-intensive producers, who are fewer in number and better resourced. This presents collective action challenges for the former group.Footnote 32 Third, conventional metrics for innovation, discussed later in the empirical analysis, obscure the marginal effect of information access and therefore cannot guide precise compensation to those affected by internet control. Attempts at direct welfare transfer through measures such as research-and-development (R&D) spending would thus entail gross inefficiency.Footnote 33 These dynamics indicate that the “dictator's dilemma” framing overstates the extent to which the need for innovation restrains the autocrat.
Figure 2 conceptually illustrates how politically motivated internet control aimed at restricting ideas generates a positive externality for domestic data-intensive sectors and a negative externality for domestic knowledge-intensive sectors. I next spell out the two information externalities as testable hypotheses, before testing them in the sections to follow.
Positive Externality for Domestic Data-Intensive Actors
Different actors in the economy depend on access to data as an input factor to different degrees. Firms that derive most of their revenue from creating data-driven algorithms are more dependent on data than, say, those that profit mainly from producing physical goods.Footnote 34 In the event of internet control, domestic consumers are less able to access foreign digital products and more likely to switch to domestic substitutes, driving up demand for the latter. This leads to an increase in revenue for domestic data-intensive firms, both directly from an increase in sales and indirectly from an increase in the supply of raw materials, or data in this case. Hence,
Hypothesis 1: Internet control incurs financial gains for domestic data-intensive firms relative to their domestic non-data-intensive counterparts.
By the same process, foreign data-intensive firms lose out on potential sales and the potential supply of data from consumers in the country under internet control. Hence,
Corollary Hypothesis 1: Internet control incurs financial gains for domestic data-intensive firms relative to their foreign data-intensive counterparts.
Research on the digital economy indicates a scale effect,Footnote 35 which suggests that these hypotheses presuppose a threshold of data endowment in the state. The positive externality therefore applies to states with a large population and a high level of internet connectivity, where a sufficient volume of data can be made available to domestic data-intensive firms that produce substitutes for foreign digital products.Footnote 36
Negative Externality for Domestic Knowledge-Intensive Actors
Similarly, different actors in the economy depend on access to knowledge to different degrees. Researchers who produce knowledge primarily by reviewing the existing literature are more dependent on knowledge than those who do so primarily through other types of activities, such as experiments.Footnote 37 In the event of internet control, domestic researchers are less able to access the literature from the outside world. This decrease in knowledge access leads to a steeper decline in the rate of knowledge production for the more knowledge-intensive disciplines, resulting in a greater decline in the quality of research. Hence,
Hypothesis 2: Internet control incurs a greater decline in research quality for domestic knowledge-intensive disciplines relative to their domestic non-knowledge-intensive counterparts.
Because the detriment from internet control affects all domestic researchers, it should translate into a decline in research quality for domestic researchers relative to their foreign counterparts across all disciplines, regardless of knowledge-intensity. Hence,
Corollary Hypothesis 2: Internet control incurs a decline in research quality for domestic researchers relative to their foreign counterparts for any given discipline.
Based on these formulations, I now synthesize the political consequences of the two information externalities and implications for the state's strategy.
Implications for State Strategy
The autocratic state is first concerned with preventing domestic challenges to its regime security. Autocracies adept at suppressing and manipulating information are advantaged over overtly violent dictatorships in countering domestic opposition.Footnote 38 This incentivizes the autocrat to leverage internet control in restricting the inflow of instigative ideas and domestic communications that facilitate collective action,Footnote 39 which causes the two information externalities. Yet the autocrat is also concerned with foreign challenges to the regime. One way in which it seeks to prevent such challenges is by pursuing data sovereignty, such as through data localization and cross-border data flow restrictions. As I will explain, the positive data externality creates tension between these two objectives.Footnote 40
Political Consequences of Positive Data Externality
The windfall of data and revenue from the positive data externality makes domestic data-intensive firms more likely to grow and expand globally, such as by listing offshore. Doing so may compel compliance with foreign regulations that curtail the autocratic state's control over the firms’ data. This can occur directly, through competing requirements for data localization in foreign territories, or indirectly, through weakened state oversight over these firms. Consequently, one should expect a “one-two punch” from the state to retain control over domestic data held by these firms.
First is a move to reassert data sovereignty with respect to all domestic actors, which may entail stricter and/or more pervasive mandates for state authority over domestic data and prohibitions of foreign access to such data. Second is a move to curb overseas expansion by data-intensive firms which, due to its specificity, may entail targeting individual firms with extensive foreign ownership and/or plans for such expansion. While firm compliance is generally expected in autocracies, signs of noncompliance from data-intensive firms that have benefited from the positive data externality will be met with exceptionally harsh treatment. As profit maximizers like any other firm, the data-intensive firms must now balance growth against a risk of state sanction that is heightened by the wealth of domestic data they possess.
Political Consequences of Negative Knowledge Externality
As previously outlined, domestic knowledge-intensive actors are limited in their bargaining power. Direct compensatory welfare transfer by the state would also be inefficient. As a result, the state is not incentivized to offset the negative externality for domestic knowledge-intensive actors beyond limiting the scope of internet control where doing so does not compromise regime security.
One exception is foreign knowledge-intensive actors in the state who are parties to a contract that conditions resource provision to the state on freedom of information access. Typically concentrated in large urban areas, these foreign actors are better positioned for mobilization than their domestic counterparts. More importantly, they are able to impose immediate economic costs on the state, either by invoking legal provisions or by withholding the resources. If the costs are substantial, the state will be incentivized to allow privileged internet access for this specific group of foreign knowledge-intensive actors.
Data
Case Selection: China's System of Internet Control
I test the two information externalities in my theory through a quantitative analysis of internet control in China. This case uniquely satisfies both the scope and strength requirements for treatment administration. First, my hypotheses on the bifurcated effects on data-intensive versus knowledge-intensive actors require that the internet control in question affects both types of actors—ideally, all actors in the economy. In other words, it should be universal or near-universal in scope. China's internet control, popularly dubbed the Great Firewall, offers the closest real-world case to this setting.Footnote 41 China's DNS filter blocks hundreds of thousands of domains, with a gamut of subject matter extending far beyond political content.Footnote 42
Second, the internet control must have persisted for a sufficiently long period, with minimal circumvention, to enable meaningful observation of its effects. China's internet control, again, meets this criterion. Unlike censorship shocks elsewhere in the world, which are usually in response to specific events and relatively brief,Footnote 43 China's internet control is so entrenched that many in the younger generation have reportedly grown up with little awareness of digital products such as Google and Facebook.Footnote 44 As of 2018, only 5 percent of China's urban residents reported attempting to circumvent internet control; because this proportion was presumably much higher than the national average, circumvention nationwide was likely rarer still.Footnote 45
Treatment Variable: Measuring Internet Control Through Domain Accessibility
In this setting, treatment occurred when internet control in China shifted from limited, domain-specific censorship to an across-the-board regime of control. The treatment variable must therefore capture both the timing and the degree of this change. In practice, this requires regularly measuring the accessibility of foreign web domains from inside China. Earlier measurements of internet control suffer from various drawbacks, including coder subjectivity, high noise-to-signal ratio, low measurement frequency, narrow scope, sampling bias, and insufficient historical coverage.Footnote 46 Given these limitations, I have coded my treatment variable using data from GreatFire, the only known resource of its kind.
GreatFire is an independent group that has used servers in China to test the accessibility of hundreds of thousands of web domains since 2011.Footnote 47 The extensive scope is complemented by a high testing frequency—nearly daily for popular domains.Footnote 48 I collect accessibility data for the 100 most visited websites in the world.Footnote 49 This yields 27,691 observations.Footnote 50 Figure 3 depicts the final, interpolated internet control history in China based on the testing data.
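Because domains are tested at uneven intervals, turning test records into a continuous control history requires some interpolation. The paper does not state its interpolation rule, so the sketch below assumes the simplest one: each domain keeps its last observed status until it is tested again, and the daily measure is the share of tracked domains recorded as blocked. The record layout `(domain, day, blocked)` and the function name `daily_block_share` are hypothetical.

```python
from collections import defaultdict


def daily_block_share(tests, first_day, last_day):
    """Share of tracked domains recorded as blocked on each day.

    tests: iterable of (domain, day, blocked) tuples, with day an integer index
    and blocked a bool. Between tests, each domain keeps its last observed status
    (an assumed carry-forward interpolation). A domain enters the denominator
    only from the day of its first test.
    """
    by_domain = defaultdict(dict)
    for domain, day, blocked in tests:
        by_domain[domain][day] = blocked

    status = {}  # last observed status per domain
    shares = {}
    for day in range(first_day, last_day + 1):
        for domain, observed in by_domain.items():
            if day in observed:
                status[domain] = observed[day]
        if status:
            shares[day] = sum(status.values()) / len(status)
    return shares


# Hypothetical test records, not GreatFire data.
tests = [
    ("google.com", 0, True),
    ("wikipedia.org", 0, False),
    ("github.com", 2, False),
    ("wikipedia.org", 3, True),
]
print(daily_block_share(tests, first_day=0, last_day=4))
```

Carry-forward is a conservative choice for sparse tests; linear interpolation of a 0/1 status would instead produce fractional values between tests.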
I consult previous research and media reports to validate this measurement. Together, they document a massive wave of internet control in 2014,Footnote 51 including a major shock in early June when the Chinese state cracked down on foreign websites, allegedly in anticipation of the twenty-fifth anniversary of the Tiananmen Square incident.Footnote 52 The anniversary has been nicknamed Internet Maintenance Day in recognition of the state's intensified website blocking around this time each year,Footnote 53 which makes numerous domains inaccessible for days without explanation.Footnote 54
This wave of internet control is visible in Figure 3 as the large red segment beginning in early June 2014. I exploit this shock as my treatment because it uniquely meets the two conditions above: it is near-universal in scope, covering almost all of the websites tested (a “wide” dosage); and it spans a lengthy two years, through mid-2016, with only a brief period of relaxation (a “deep” dosage).Footnote 55
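One way to make the “wide” and “deep” criteria operational is to look for the longest stretch of consecutive days on which the share of blocked domains stays above some cutoff. The sketch below assumes a daily series of block shares and a hypothetical cutoff of 0.8; neither the cutoff nor the function `longest_run_above` comes from the paper, which identifies the 2014 shock from Figure 3 and the documentary record.

```python
def longest_run_above(shares, threshold):
    """Longest stretch of consecutive days whose block share is at least `threshold`.

    shares: dict mapping integer day -> share of tested domains blocked.
    Returns (start_day, end_day, length_in_days), or None if no day qualifies.
    A missing day breaks a run.
    """
    best = None
    start = prev = None
    for day in sorted(shares):
        if shares[day] >= threshold:
            # Start a new run unless this day directly follows the previous qualifying day.
            if start is None or day != prev + 1:
                start = day
            prev = day
            length = day - start + 1
            if best is None or length > best[2]:
                best = (start, day, length)
        else:
            start = None
    return best


# Hypothetical daily shares; the 0.8 cutoff for "wide" coverage is an assumption.
shares = {0: 0.1, 1: 0.9, 2: 0.95, 3: 0.2, 4: 0.9, 5: 0.85, 6: 0.9, 8: 0.9}
print(longest_run_above(shares, threshold=0.8))  # (4, 6, 3)
```

The threshold controls width and the run length measures depth; a brief dip below the cutoff, like the 2015–16 relaxation, would split a run under this strict rule.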
With internet control as the treatment, I now explain my coding of the two “treatment uptake” variables, which measure how much actors rely on the internet for their productive and innovative activities. These variables measure (1) how much firms in each sector depend on data as an input factor, or “data-intensity”; and (2) how much researchers in each academic discipline depend on knowledge access for research, or “knowledge-intensity.”
Sector-Level Data-Intensity
There are currently few systematic measurements of sector-level data-intensity. Existing measurements are either unavailable for Chinese firms, based on outdated data, or too coarse to capture variation across different digital-technology sectors.Footnote 56 To address these challenges, I develop two original measures of data intensity tailored for examining the impact of internet disruption on firms across sectors. First, I identify technology classes that contain data-intensive subclasses based on the inclusion of the keyword “data” in the US Patent and Trademark Office's patent class list.Footnote 57 I then identify patents in the Office's database that meet this criterion, and the corresponding US and Chinese assignee firms.Footnote 58 For each firm, I calculate the percentage of its patents that are data related. This continuous variable of data intensity is subsequently dichotomized and matched with all Chinese firms by NAICS code.
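The patent-based measure can be sketched in three steps: compute each assignee's percentage of data-related patents, dichotomize at the sector level, and attach the sector flag to every Chinese firm through its NAICS code. The paper does not specify how firm scores are aggregated to a NAICS code or where the cutoff lies, so the within-code mean and the default cutoff of 0 below are assumptions, as are the function names and the toy class and NAICS codes.

```python
from collections import defaultdict
from statistics import mean


def firm_data_intensity(patents, data_classes):
    """Percentage of each firm's patents that fall in a data-related class.

    patents: iterable of (firm, patent_class) pairs.
    data_classes: set of class codes flagged by the keyword "data".
    """
    total = defaultdict(int)
    data = defaultdict(int)
    for firm, patent_class in patents:
        total[firm] += 1
        if patent_class in data_classes:
            data[firm] += 1
    return {firm: 100.0 * data[firm] / total[firm] for firm in total}


def naics_data_flags(firm_scores, firm_naics, threshold=0.0):
    """Dichotomize at the NAICS level: 1 if the mean firm score exceeds `threshold`.

    Averaging firms within a code and the default cutoff are assumptions.
    """
    by_code = defaultdict(list)
    for firm, score in firm_scores.items():
        by_code[firm_naics[firm]].append(score)
    return {code: int(mean(scores) > threshold) for code, scores in by_code.items()}


# Hypothetical assignees, class codes, and NAICS assignments.
patents = [("FirmA", "C1"), ("FirmA", "C1"), ("FirmA", "C2"), ("FirmB", "C2")]
scores = firm_data_intensity(patents, data_classes={"C1"})
flags = naics_data_flags(scores, {"FirmA": "519130", "FirmB": "334413"})

# Attach each Chinese firm to its sector's flag (unmatched codes default to 0, an assumption).
chinese_firms = {"CN-1": "519130", "CN-2": "334413", "CN-3": "111110"}
treatment_uptake = {firm: flags.get(code, 0) for firm, code in chinese_firms.items()}
print(treatment_uptake)  # {'CN-1': 1, 'CN-2': 0, 'CN-3': 0}
```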
For a second measurement, I assign 1 to sectors that have the word “internet” in their NAICS definition, and 0 otherwise.Footnote 59 This assignment is then matched with all Chinese firms by NAICS code. Both of my data-intensity measurements reflect current variation in factor intensity across sectors and discriminate at the five- or six-digit NAICS code level. The second measurement more specifically captures internet-related data intensity. Tables A1 and A2 in the online supplement list the data-intensive sectors identified by these two measurements.
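The second measure is a keyword rule on NAICS definitions. The sketch below matches “internet” as a whole word, case-insensitively, so that a term such as “internetwork” is not counted; whether the original coding used whole-word matching is an assumption. The definitions shown are NAICS titles used as stand-ins for the full definition text, and `internet_flags` is a hypothetical helper.

```python
import re

INTERNET = re.compile(r"\binternet\b", re.IGNORECASE)


def internet_flags(naics_definitions):
    """Map each NAICS code to 1 if its definition contains the word "internet", else 0."""
    return {code: int(bool(INTERNET.search(text))) for code, text in naics_definitions.items()}


# NAICS titles standing in for the full definition text (illustrative only).
definitions = {
    "519130": "Internet Publishing and Broadcasting and Web Search Portals",
    "334413": "Semiconductor and Related Device Manufacturing",
}
flags = internet_flags(definitions)

# Match Chinese firms to the sector flag by NAICS code.
chinese_firms = {"CN-1": "519130", "CN-2": "334413"}
print({firm: flags[code] for firm, code in chinese_firms.items()})  # {'CN-1': 1, 'CN-2': 0}
```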
Discipline-Level Knowledge-Intensity
To determine the degree to which researchers in a given discipline rely on the internet, I measure their dependency on the literature to generate research output. This can be proxied by the density of references cited. In bibliometrics, reference density has been quantified using measures such as references per article and references per page to study citation patterns.Footnote 60 Of the two, references per page is more suitable for our purposes as it accounts for article length, which varies greatly across disciplines.
I use data from the Web of Science to compile references per page for all disciplines in my sample.Footnote 61 To reduce noise in my measurement, I sample the 1 percent most-cited single-discipline research articles in each discipline.Footnote 62 For each discipline, I divide the total number of references by the total number of pages. Figure A1 in the online supplement visually summarizes this variable.
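The reference-density computation can be sketched as follows. Within each discipline, the most-cited articles are retained and total references are divided by total pages, which gives a ratio of sums rather than an average of per-article ratios. The field names, the rounding rule for the top share (rounded up, at least one article), and the toy numbers are assumptions; the toy example uses `top_share=0.5` only so that the small sample keeps more than one article.

```python
import math
from collections import defaultdict


def references_per_page(articles, top_share=0.01):
    """Reference density for each discipline, computed from its most-cited articles.

    articles: iterable of dicts with hypothetical keys "discipline",
    "citations", "references", and "pages". Within each discipline, keeps
    the top `top_share` of articles by citations (rounded up, at least one),
    then divides total references by total pages.
    """
    by_discipline = defaultdict(list)
    for article in articles:
        by_discipline[article["discipline"]].append(article)

    density = {}
    for discipline, group in by_discipline.items():
        ranked = sorted(group, key=lambda a: a["citations"], reverse=True)
        k = max(1, math.ceil(top_share * len(ranked)))
        top = ranked[:k]
        density[discipline] = sum(a["references"] for a in top) / sum(a["pages"] for a in top)
    return density


# Hypothetical toy records, not Web of Science data.
sample = [
    {"discipline": "Mathematics", "citations": 10, "references": 20, "pages": 20},
    {"discipline": "Mathematics", "citations": 5, "references": 40, "pages": 10},
    {"discipline": "Mathematics", "citations": 1, "references": 5, "pages": 30},
    {"discipline": "History", "citations": 3, "references": 90, "pages": 30},
    {"discipline": "History", "citations": 2, "references": 10, "pages": 10},
]
print(references_per_page(sample, top_share=0.5))  # {'Mathematics': 2.0, 'History': 3.0}
```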
Dependent Variables and Covariates
To measure the impact of internet control on firm performance, I use quarterly firm-level revenue data from Compustat Global for Chinese and US listed firms from 2000 to 2019.Footnote 63 As covariates, I include firm-level variables that likely correlate with the outcome and for which less than a third of observations are missing. These are total assets, which proxies for firm size, and total liabilities, which proxies for leverage. The online supplement presents summary statistics for Chinese firms in 2013, just before the 2014 shock. Chinese data-intensive firms, many being young technology companies, tended to be smaller in size and leverage than the rest (Tables A3, A4, and A5).
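The covariate inclusion rule (less than a third of observations missing) is simple to state but easy to misapply at the boundary. The sketch below assumes one dict per firm-quarter and treats `None` or an absent key as missing; a variable missing in exactly one-third of rows is excluded. The variable names are illustrative, not Compustat field mnemonics.

```python
def usable_covariates(rows, candidates, max_missing=1 / 3):
    """Keep candidate covariates whose share of missing values is below `max_missing`.

    rows: list of dicts, one per firm-quarter; None or an absent key counts as missing.
    """
    n = len(rows)
    keep = []
    for var in candidates:
        missing = sum(1 for row in rows if row.get(var) is None)
        if missing / n < max_missing:
            keep.append(var)
    return keep


# Hypothetical firm-quarter rows with illustrative variable names.
rows = [
    {"total_assets": 10.0, "total_liabilities": 4.0, "rd_expense": None},
    {"total_assets": 12.0, "total_liabilities": None, "rd_expense": None},
    {"total_assets": 9.0, "total_liabilities": 3.5},
    {"total_assets": 15.0, "total_liabilities": 6.0, "rd_expense": 1.2},
]
print(usable_covariates(rows, ["total_assets", "total_liabilities", "rd_expense"]))
# ['total_assets', 'total_liabilities']
```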
Unlike firms, which are classified by sector, institutions routinely conduct research across disciplines; and research by one institution often involves authors from multiple countries.Footnote 64 I therefore measure the impact of internet control on research performance at the research-article level. I collect Web of Science data for all single-discipline research articles produced in mainland China and in the United States from 2011 to 2020.Footnote 65 Following earlier approaches,Footnote 66 I proxy the quality of an article with the number of forward citations it has received. I include covariates, such as article age, that correlate with the outcome. To minimize small-sample bias, I examine only the thirty-one disciplines with at least thirty research articles published from each of the two countries in each year. Table A6 in the online supplement presents summary statistics for the Chinese sample.
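The sample restriction requires a minimum count in every country-year cell, not just in total. A sketch of that filter, assuming one record per article with hypothetical keys `discipline`, `country`, and `year`, and defaults matching the 2011–2020 window and the threshold of thirty:

```python
from collections import Counter


def eligible_disciplines(articles, countries=("CN", "US"),
                         years=range(2011, 2021), min_count=30):
    """Disciplines with at least `min_count` articles from each country in every year.

    articles: iterable of dicts with hypothetical keys "discipline", "country", "year".
    """
    articles = list(articles)
    counts = Counter((a["discipline"], a["country"], a["year"]) for a in articles)
    disciplines = {a["discipline"] for a in articles}
    return sorted(
        d for d in disciplines
        if all(counts[(d, c, y)] >= min_count for c in countries for y in years)
    )


# Toy check with a two-year window and a threshold of 2 (the study uses 2011-2020 and 30).
toy = []
for country in ("CN", "US"):
    for year in (2011, 2012):
        toy += [{"discipline": "Physics", "country": country, "year": year}] * 2
toy += [{"discipline": "Law", "country": "CN", "year": y} for y in (2011, 2011, 2012, 2012)]
toy += [{"discipline": "Law", "country": "US", "year": y} for y in (2011, 2011, 2012)]
print(eligible_disciplines(toy, years=range(2011, 2013), min_count=2))  # ['Physics']
```

A single thin cell, here US articles in law in 2012, is enough to drop a discipline.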
Methodology Overview
In an experimental design, one would randomly assign actors to an environment with internet control or to one without and compare outcomes across the two groups. In reality, one does not observe the counterfactual performance of Chinese firms or researchers in the absence of internet control. My research design assumes that treatment was exogenous, particularly to the market; that is, T ⊥ (Y(0), Y(1)).Footnote 67
There is reason to believe that the 2014 internet control was not imposed to help Chinese digital technology companies. The domains of their main competitors, such as Google, Amazon, and Facebook, had either been blocked long before 2014 or were not blocked more than other domains, as my measurement indicates in the previous section. The vast majority of state support did not go to data-intensive sectors.Footnote 68 In fact, tension between the state and the Chinese tech giants long predates the crackdown that began in 2020, as elaborated later in the qualitative section.Footnote 69 Far from being cash cows kept by the government, Chinese tech giants have historically had substantial foreign ownership.Footnote 70 China's recent move to rein in its data-intensive sectors through “golden shares” obscures the fact that these shares were first introduced in 2013 to reduce the state's role in these sectors.Footnote 71 To probe for just what may have prompted the 2014 shock, I interviewed practitioners and industry experts with proximity to the internet policymaking process in China. My interviews suggest that the need for domestic stability has been the principal driver of internet control shocks. Crackdowns typically occur just before anticipated social unrest and major political events, such as the National People's Congress, when protests are more likely than usual.Footnote 72
Yet even with exogeneity in treatment, the treatment uptake variables, data intensity and knowledge intensity, are not randomly assigned. This means that treatment assignment, which is the interaction between treatment and treatment uptake, is not random. Considering this, I leverage a series of empirical strategies to identify the marginal effect of internet control. First, I apply a matching method designed for panel data to identify the effect on Chinese data-intensive firms relative to other Chinese firms. Second, by exploiting the geographical variation in treatment exposure with a triple-difference estimator, I parse out the effect on Chinese data-intensive firms relative to their US counterparts. Third, to identify the effect on Chinese research output, I disentangle the treatment effect from the selection effect using a negative binomial model and a Poisson model with fixed effects. Fourth, I adopt two similar models for a difference-in-differences estimation of research output from China and the United States. The next two sections detail these strategies.
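The logic of the triple-difference comparison can be shown with cell means. The sketch below treats China as exposed and the United States as the comparison, and subtracts the US difference-in-differences (data-intensive vs. other firms, post vs. pre) from the Chinese one, so that any global boom in data-intensive sectors cancels out. It is a minimal illustration under these assumptions, without the covariates, firm-level structure, or standard errors a regression implementation would use; the function name and the synthetic data are hypothetical.

```python
from statistics import mean


def triple_difference(obs):
    """Cell-means DDD estimate.

    obs: iterable of (country, data_intensive, post, outcome), where country is
    "CN" (exposed to internet control) or "US" (comparison), and data_intensive
    and post are bools. Returns the Chinese difference-in-differences
    (data-intensive vs. other firms, post vs. pre) minus the same
    difference-in-differences for US firms.
    """
    cells = {}
    for country, data_intensive, post, y in obs:
        cells.setdefault((country, data_intensive, post), []).append(y)
    m = {cell: mean(values) for cell, values in cells.items()}

    def did(country):
        return ((m[(country, True, True)] - m[(country, True, False)])
                - (m[(country, False, True)] - m[(country, False, False)]))

    return did("CN") - did("US")


# Synthetic outcomes: a common time trend, a global data-sector boom after
# treatment, and an extra 0.3 for Chinese data-intensive firms only.
obs = []
for country in ("CN", "US"):
    for di in (False, True):
        for post in (False, True):
            y = (1.0 + 0.1 * post + 0.2 * di + 0.05 * (country == "CN")
                 + 0.1 * (di and post) + 0.3 * (country == "CN" and di and post))
            obs.append((country, di, post, y))
print(round(triple_difference(obs), 6))  # 0.3
```

The 0.1 data-sector boom shared by both countries drops out, and only the China-specific 0.3 survives.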
Empirical Analysis of Positive Data Externality
Matching Strategy for Chinese Firms
Given the quasi-experimental setting, one would match each treatment-uptaking observation with non-uptaking observations to construct the counterfactual. However, my panel data consist of a different bundle of data-intensive and non-data-intensive firms in each period. The limited number of available covariates also constrains our ability to directly control for potential confounders. To address these issues, I implement the PanelMatch method, which matches each treated observation with control observations in the same period that have an identical treatment history for up to a specified number of periods. These are refined using matching or weighting methods so that the treated and matched control observations are reasonably balanced on observed confounders. Average treatment effects are then estimated using the difference-in-differences estimator with bootstrapped standard errors.Footnote 73 Using PanelMatch, I match each data-intensive Chinese firm with the maximum possible number of non-data-intensive Chinese firms for ten lag periods (calendar quarters) on total assets, total liabilities, and revenue.Footnote 74 I then estimate the average treatment effect of the 2014 internet control shock on firm-level revenue for ten lead periods after treatment.
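The core of the matched difference-in-differences logic can be sketched in plain Python. For a treatment onset at period t, each treated unit is matched to units that are untreated at t and share its exact treatment history over the preceding lags; the effect at a given lead is the treated unit's change from t−1 to t+lead minus the mean change among its matches. This is a simplified illustration, not the PanelMatch package: it omits the covariate refinement on total assets, total liabilities, and revenue, and the bootstrap. The function name, data layout, and toy panel are hypothetical.

```python
from statistics import mean


def matched_did(treat, outcome, t, lags, lead):
    """Simplified PanelMatch-style DiD estimate for treatment onset at period t.

    treat[u][s] and outcome[u][s] give unit u's treatment status (0/1) and
    outcome in period s. A unit counts as treated if untreated at t-1 and
    treated at t. Controls are units untreated at t whose treatment history
    over periods t-lags..t-1 matches the treated unit's exactly. No covariate
    refinement or bootstrap is performed.
    """
    effects = []
    for i in treat:
        if not (treat[i][t - 1] == 0 and treat[i][t] == 1):
            continue
        history = [treat[i][s] for s in range(t - lags, t)]
        controls = [j for j in treat
                    if treat[j][t] == 0
                    and [treat[j][s] for s in range(t - lags, t)] == history]
        if not controls:
            continue
        treated_change = outcome[i][t + lead] - outcome[i][t - 1]
        control_change = mean(outcome[j][t + lead] - outcome[j][t - 1] for j in controls)
        effects.append(treated_change - control_change)
    return mean(effects) if effects else None


# Hypothetical toy panel: unit "A" is treated from period 2; "B" and "C" never are.
treat = {"A": [0, 0, 1, 1, 1], "B": [0, 0, 0, 0, 0], "C": [0, 0, 0, 0, 0]}
outcome = {"A": [1.0, 1.0, 1.5, 2.0, 3.0],
           "B": [1.0, 1.0, 1.2, 1.4, 1.6],
           "C": [2.0, 2.0, 2.2, 2.4, 2.8]}
print(round(matched_did(treat, outcome, t=2, lags=2, lead=2), 6))  # 1.3
```

Looping `lead` from 0 to 9 would trace out an effect path analogous to the ten lead periods described above.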
The results strongly comport with my hypothesis of a positive effect of internet control on data-intensive firms (Figure 4). Ten periods after treatment—about two to three years out—the data-intensive firms on average see a 26 percent revenue gain over their non-data-intensive counterparts. The positive effect emerges as early as three quarters after treatment, increases, and plateaus at around nine quarters. Remarkably, the fluctuation from the third to the sixth quarter closely aligns with the noticeable break in treatment in 2015–16, as shown in Figure 3.
I further investigate my hypothesis from the reverse angle. Here, I set the treatment date to July 2008, just before the 2008 Summer Olympics in Beijing. At this time, the Chinese internet underwent an exceptional, brief period of liberalization in anticipation of an influx of foreign visitors.Footnote 75 Numerous routinely blocked websites suddenly became accessible. The abrupt removal of the baseline level of control constitutes an “anti-treatment” that should have a negative effect on domestic data-intensive firms, and this is indeed the case (see supplemental Figure A2). For a few quarters after the relaxation of internet control, Chinese data-intensive firms saw a decline in revenue relative to other Chinese firms. The brevity of this effect is consistent with the restoration of control right after the Olympics.Footnote 76
Robustness Checks and Placebo Tests
I perform a number of robustness checks, with results presented in supplemental Figure A3. First, I reduce the maximum number of matched observations to twenty and rerun the estimation. Second, I refine my matched set with a variety of matching and weighting methods, which helps ensure that the result is not driven by any particular method. These estimations return similar results. Third, I include only firms with above-median data-intensity scores in my treatment-uptaking sample to see whether the result is driven by a certain stratum of firms. In fact, the effect doubles, to over 50 percent revenue gain for the most data-intensive firms. Among them are those specializing in such products as web search portals (Table A1), as my theory posits. Fourth, I try the alternative data-intensity measurement that uses NAICS keywords, which yields statistically weaker but substantively comparable estimates.
Finally, I address concerns with pretreatment trends and spurious treatment effects. Given that the matching strategy relies on the parallel-trend assumption, I conduct a placebo test for two years before treatment. There are no statistically significant differences between the treatment-uptaking and non-uptaking firms throughout this period (Figure A4). For an additional placebo test, I draw a sample of non-data-intensive firms equal in number to the data-intensive firms used in the main analysis, match them with other non-data-intensive firms, and rerun the estimation. I repeat this process for thirty iterations and plot the averaged point estimates with bootstrapped standard errors. As expected, one does not see any treatment effect, which is essentially the difference between two non-uptaking samples (Figure A5).
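The iterated placebo test can be sketched as follows. For brevity, the sketch compares simple pre/post changes between a randomly relabeled group and the remaining non-data-intensive firms instead of rerunning the full matching procedure in each draw. It also reports only the averaged point estimate and its draws, not bootstrapped standard errors. The function name `placebo_estimate` and its inputs are hypothetical.

```python
import random
from statistics import mean


def placebo_estimate(changes, n_fake, iterations=30, seed=0):
    """Average placebo 'effect' from randomly relabeling untreated units as treated.

    changes: dict unit -> pre/post change in outcome, for non-data-intensive firms only
    n_fake:  number of units relabeled as 'treated' in each draw
             (set equal to the number of truly data-intensive firms)
    """
    rng = random.Random(seed)  # seeded so the draws are reproducible
    units = sorted(changes)
    estimates = []
    for _ in range(iterations):
        fake = set(rng.sample(units, n_fake))
        fake_group = [changes[u] for u in units if u in fake]
        rest = [changes[u] for u in units if u not in fake]
        # Both groups are non-uptaking, so this difference should hover around zero.
        estimates.append(mean(fake_group) - mean(rest))
    return mean(estimates), estimates


# Hypothetical firms with identical changes: the placebo effect is exactly zero.
avg, draws = placebo_estimate({f"firm{i}": 2.0 for i in range(10)}, n_fake=3)
print(avg, len(draws))  # 0.0 30
```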
Triple-Difference Estimator for Chinese and US Firms
My first corollary hypothesis concerns the impact of internet control on Chinese data-intensive firms relative to their foreign counterparts. The US Trade Representative (USTR), for one, views China's internet control as a form of digital protectionism that has cost “billions of dollars in potential US business.”Footnote 77 Since the internet control shock occurred only in China and not in the United States, I exploit the geographical variation in treatment exposure with a triple-difference estimator,Footnote 78 modeling the revenue of firm $i$ at time $t$ as a function of the indicators defined below and their interactions.
The dummy variable $D_i$ denotes being a data-intensive firm; $T_t$ denotes being in a treated period; and $C_i$ denotes being a Chinese firm. This design exploits three sources of variation to account for country-specific confounders, selection into data-intensive sectors by firms in either country, and trends in data-intensive sectors that affect both countries. I add year fixed effects, $\alpha_t$, to address time-varying unobserved confounders. In $\mathbf{Z}_{it}$, I include two salient time-varying firm-level controls, firm size and leverage. Because I hypothesize that internet control in China benefits Chinese data-intensive firms relative to other Chinese firms but not US data-intensive firms relative to other US firms, I expect the coefficient of the triple interaction term, $\beta_1$, to be positive and significant. Supplemental Tables A7 and A8 present results for the naive and saturated models. For each model, I use the full sample, the above-median data-intensive sample, and the full sample with the alternative data-intensity measurement. Standard errors are clustered at the sector level, where treatment assignment occurred.
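A standard triple-difference regression consistent with the variables defined above can be sketched as follows. The lower-order coefficients $\beta_0, \beta_2, \ldots, \beta_7$, the coefficient vector $\boldsymbol{\gamma}$, the error term $\varepsilon_{it}$, and the inclusion of every lower-order term are assumptions of this sketch, not the exact specification behind Tables A7 and A8.

$$
\text{Revenue}_{it} = \beta_0 + \beta_1 D_i T_t C_i + \beta_2 D_i T_t + \beta_3 D_i C_i + \beta_4 T_t C_i + \beta_5 D_i + \beta_6 C_i + \beta_7 T_t + \alpha_t + \boldsymbol{\gamma}^{\top}\mathbf{Z}_{it} + \varepsilon_{it}
$$

In this form, $\beta_1$ measures the post-treatment change in the gap between data-intensive and other firms in China, net of the corresponding change in the United States. If $T_t$ does not vary within a year, $\beta_7$ is absorbed by the year fixed effects $\alpha_t$ and is not separately identified.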
We see that none of the naive estimates are significant, whereas those from the saturated models are significant but counter to the expectation. Based on these, one cannot reject the null for Corollary Hypothesis 1. The US data-intensive firms, including many so-called Big Tech firms, appear to have more than offset any data advantage for the Chinese data-intensive firms. A boost in data as an input factor is but one source of revenue growth. That the US data-intensive firms have outperformed their Chinese counterparts despite internet control hints at countervailing forces.
My theory points to one such force: the negative knowledge externality that co-occurs with the positive data externality. In hampering knowledge production, it ultimately undercuts growth for all actors in the economy regardless of the input factor. I now turn to the second set of hypotheses on internet control's detriment to innovation.
Empirical Analysis of Negative Knowledge Externality
Negative Binomial Estimator for Chinese Research Output
I investigate the impact of internet control on Chinese research output by way of a modified difference-in-differences design. Because citation count data often exhibit high skewness and overdispersion,Footnote 79 I adopt a negative binomial model to estimate the 2014 internet control shock's marginal effect on Chinese article-level forward citations.
$K_i$ denotes the knowledge intensity of the discipline, and $T_i$ denotes having been published in a treated period.Footnote 80 By exploiting variation in knowledge intensity across disciplines, the model accounts for discipline-specific trends. Because the time dimension collapses in the cross-sectional data set, I control for article age, $A_i$, which correlates strongly with citation counts.Footnote 81 I also control for number of co-authors, $N_i$, which correlates positively with citations.Footnote 82 Journal fixed effects, $\alpha_i$, are added to all models. Supplemental Figure A6 and Table A9 attest to parallel trends in citations between knowledge-intensive and non-knowledge-intensive disciplines prior to 2014.
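A conventional way to write such a negative binomial specification with the symbols defined above is sketched below. The outcome $Y_i$ (forward citations of article $i$), its mean $\mu_i$, the dispersion parameter $\theta$, and the lower-order coefficients $\beta_0, \beta_2, \ldots, \beta_5$ are notation introduced for this sketch, not taken from the text.

$$
Y_i \sim \operatorname{NegBin}(\mu_i, \theta), \qquad
\log \mu_i = \beta_0 + \beta_1 K_i T_i + \beta_2 K_i + \beta_3 T_i + \beta_4 A_i + \beta_5 N_i + \alpha_i
$$

Under this form, $\exp(\beta_1)$ is the incidence-rate ratio of the interaction: the factor by which the post-2014 change in expected citations is multiplied for each one-unit increase in $K_i$.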
Given my hypothesis that internet control engenders a greater decline in research quality for more knowledge-intensive disciplines, I expect the coefficient of the interaction term, $\beta_1$, to be negative and significant. Table 1 presents the main results, with incidence-rate ratios in square brackets.Footnote 83 Standard errors are clustered at the discipline level, where treatment assignment occurred.
Note: Clustered (discipline-level) standard errors in parentheses. *p < .10; **p < .05; ***p < .01.
Main Results and Robustness Checks
Across all four models, the coefficients of interest are not only statistically significant (one at p < 0.01, two at p < 0.05) but also substantively large. Models 1 and 2 employ the original, continuous variable of knowledge intensity. Model 2 focuses on the 50 percent most-cited articles published in a given discipline in a given year, which reduces noise by excluding low-quality articles. The incidence-rate ratios suggest that, on average, internet control in China is associated with a close to 10 percent marginal reduction in research quality, conditional on the knowledge intensity of a discipline.
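Translating a log-link coefficient into the incidence-rate ratios and percentages reported here is simple arithmetic: the ratio is $\exp(\beta)$ and the percent change is $100(\exp(\beta) - 1)$. The sketch below uses a hypothetical coefficient of $-0.105$, chosen only to illustrate a reduction of roughly 10 percent. It is not a value from Table 1, and the function name `irr_and_percent` is likewise hypothetical.

```python
import math


def irr_and_percent(beta):
    """Incidence-rate ratio and implied percent change for a log-link count-model coefficient."""
    irr = math.exp(beta)
    return irr, 100.0 * (irr - 1.0)


# Hypothetical coefficient, for illustration only.
irr, pct = irr_and_percent(-0.105)
print(f"IRR = {irr:.3f}, change = {pct:.1f}%")  # IRR = 0.900, change = -10.0%
```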
I then dichotomize the knowledge intensity variable. In model 3, I assign 1 to disciplines of median knowledge intensity or higher, and 0 otherwise. In model 4, I assign 1 to disciplines of knowledge intensity at least one standard deviation above the mean, and 0 to those at least one standard deviation below the mean. The results largely hold, and the controls for article age and number of co-authors behave as expected across all models.
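The two binary codings can be sketched as follows. The function name `dichotomize`, the use of the population standard deviation, and the toy scores are assumptions. Disciplines lying within one standard deviation of the mean are simply left out of the model 4 coding, since the text assigns them neither 1 nor 0.

```python
from statistics import mean, median, pstdev


def dichotomize(intensity):
    """Return the median-split and the +/-1 SD codings of knowledge intensity.

    intensity: dict discipline -> continuous knowledge-intensity score
    """
    scores = list(intensity.values())
    cut = median(scores)
    # Model 3 coding: 1 at or above the median, 0 otherwise.
    median_split = {d: int(k >= cut) for d, k in intensity.items()}
    # Model 4 coding: 1 if >= mean + 1 SD, 0 if <= mean - 1 SD; all others dropped.
    mu, sd = mean(scores), pstdev(scores)
    sd_split = {
        d: int(k >= mu + sd)
        for d, k in intensity.items()
        if k >= mu + sd or k <= mu - sd
    }
    return median_split, sd_split


# Hypothetical scores for five disciplines.
print(dichotomize({"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0, "e": 4.0}))
# ({'a': 0, 'b': 0, 'c': 1, 'd': 1, 'e': 1}, {'a': 0, 'e': 1})
```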
For an additional robustness check, I repeat the preceding analyses using a Poisson model, given only moderate overdispersion in the data.Footnote 84 The estimates are more significant (three at p < 0.01, one at p < 0.05) and larger in magnitude (Table A10). Based on models 1, 3, and 4, internet control in China is associated with about a 15 percent marginal reduction in research quality, conditional on knowledge intensity.
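A rough diagnostic for the degree of overdispersion is the variance-to-mean ratio of the citation counts. A Poisson variable has a ratio of about 1, and ratios well above 1 favor the negative binomial model. The sketch below is unconditional (it ignores the covariates), so it only approximates the conditional overdispersion that matters for model choice. The threshold for "moderate" used in the article is not stated, and the function name is hypothetical.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio of a count variable such as forward citations.

    Under a Poisson model the ratio is about 1; values well above 1 signal
    overdispersion, which a negative binomial model accommodates.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined for an all-zero sample")
    return pvariance(counts) / m


print(dispersion_index([0, 0, 0, 10]))  # 7.5: heavily skewed toy counts
```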
Difference-in-Differences Estimator for Chinese and US Research Output
To examine the impact of internet control on domestic researchers vis-à-vis their foreign counterparts, I again exploit the geographical variation in treatment exposure between China and the US with a difference-in-differences estimator.
The dummy $C_i$ denotes being produced by author(s) in China. I likewise add controls for article age and number of co-authors, along with journal fixed effects and discipline fixed effects, $\alpha_{2i}$. Because I hypothesize that internet control hurts Chinese researchers in any discipline relative to their US counterparts, I expect the coefficient of the interaction term, $\beta_1$, to be negative and significant. Table 2 presents the results for both the negative binomial and Poisson models, with standard errors clustered at the discipline level.
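The cross-national comparison can be sketched analogously. Here $\alpha_{1i}$ (journal fixed effects), $\mu_i$, and the lower-order coefficient labels are assumed notation, and $\alpha_{2i}$ denotes the discipline fixed effects named in the text.

$$
\log \mu_i = \beta_0 + \beta_1 C_i T_i + \beta_2 C_i + \beta_3 T_i + \beta_4 A_i + \beta_5 N_i + \alpha_{1i} + \alpha_{2i}
$$

Forward citations $Y_i$ would follow a negative binomial distribution with mean $\mu_i$ or, in the alternative model, a Poisson distribution with the same mean.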
Note: Clustered (discipline-level) standard errors in parentheses. ***p < .01.
The estimates, significant at p < 0.01 in both models, suggest that internet control has reduced the quality of research by Chinese researchers by more than 22 percent compared to their US counterparts, irrespective of the discipline. While China has caught up with the United States in aggregate research quality,Footnote 85 such metrics mask the damage from internet control at the margin: China would be still more innovative without such controls, even markedly so. This also helps elucidate how the “dictator's dilemma” exaggerates the autocrat's concern about internet control's harm to innovation. Even if sizable, such harm might only manifest when interacted with knowledge intensity or after accounting for confounders.
Based on the foregoing, we can confidently reject the null for both H2 and Corollary Hypothesis 2. In obstructing the flow of knowledge, internet control most acutely hurts domestic knowledge-intensive researchers. But no matter the knowledge domain, it hurts all domestic researchers. To the extent that innovation hinges on knowledge creation, internet control inhibits growth regardless of the mix of domains or sectors the state may seek to strategically foster.
Evidence for State Strategy
I conclude this theoretical proposal by presenting preliminary evidence for its implications for state strategy: following internet control, China clamped down on domestic data-intensive sectors that had benefited from the positive data externality. It did so through a combination of broad-based legislation on data sovereignty and targeted campaigns aimed at curbing individual firms’ overseas expansion. Concurrently, the state sought to defuse discontent from the negative knowledge externality by limiting the scope of internet control generally and by allowing privileged internet access for certain foreign knowledge-intensive actors specifically. These findings underscore the disproportionate influence of short-term interests and foreign actors on the autocrat's decisions.
Reasserting Data Sovereignty: Legislation and Crackdown
In late 2016, shortly after the apparent abatement of internet control (Figure 3), China enacted its Cybersecurity Law.Footnote 86 Ambitious in scope but ambiguous in terminology, it set the tone for a succession of laws that would cover all aspects of data sovereignty. These include the National Intelligence Law,Footnote 87 the Data Security Law,Footnote 88 and the Personal Information Protection Law.Footnote 89 Persisting across these legislative efforts is the reassertion of the state's absolute authority over domestic data through localization and handover mandates,Footnote 90 and notably through tighter prohibition of access to such data by foreign entities, government or private.Footnote 91 The vague definitions grant the state vast discretion in determining the liability of domestic firms and in levying punishment.Footnote 92
However extensive, broad-based legislation could accomplish only part of the state's objective. It could not prevent profit-maximizing firms from seeking opportunities abroad and weakening the state's oversight of their data in doing so.Footnote 93 Vague provisions lose potency when challenged by conflicting but better-codified stipulations from another jurisdiction. What became known as China's crackdown on tech was part and parcel of the state's attempt to address this residual concern.Footnote 94 State authorities cited anticompetitive behavior, privacy violations, and data security malpractices as bases for the suspension of Ant Group's initial public offering,Footnote 95 the investigation leading to DiDi's delisting from the NYSE,Footnote 96 and the probe into BOSS Zhipin following its parent company's NASDAQ listing.Footnote 97 Beneath these decisions, however, lay a deep-seated fear of “disorderly capital expansion,” code-speak for when a firm has amassed enough financial clout to pose a political threat to the regime.Footnote 98
Even so, it was the Cyberspace Administration of China, not agencies that oversee offshore listing such as the China Securities Regulatory Commission, that did much of the disciplining.Footnote 99 This hints that data, not just capital, was at stake. Holding vast troves of domestic data vulnerable to exploitation by foreign actors, these firms, already viewed as a threat from within, now also posed a risk to the regime from without.Footnote 100 The apprehension may not be misplaced. The handover of audit working papers, for example, could result in the retention of raw user data and communications between Chinese companies and government agencies for US regulatory inspection for three consecutive years.Footnote 101 Even if handover were not mandatory for compliance, it might still pose too great a risk if the data itself were of a particular kind. DiDi, as one of a handpicked group of Chinese entities licensed for detailed surveying and mapping, would present just this type of risk if foreign actors were able to access the company's coveted real-time location data, including data on Chinese defense zones.Footnote 102
The high-flying Chinese data-intensive firms were not simply getting their wings clipped by conflicting compliance requirements. They were being pressed against their primal drive for profit by the regime's insistence on “equal importance to internal and external security.”Footnote 103 Engorged with a frightful mix of capital and data, these firms could invite crushing force from the state's iron fist with even the faintest crack of disobedience.Footnote 104 The crackdown cost the Chinese firms trillions and eroded their once-enviable position on a par with their US counterparts.Footnote 105 Since then, trade complaints about the Great Firewall and allegations of US Big Tech's “jealousy” of their Chinese rivals have quietly given way to other stressors in bilateral relations.Footnote 106 The backlash reset whatever advantage the Chinese firms had won from internet control.Footnote 107 The self-same profit motive has sent the Chinese tech giants and US Big Tech down divergent paths.
Minimizing Collateral Damage: AI-Powered Censorship and Selective Accommodation of Foreign Actors
Due to the inefficiency of directly compensating domestic knowledge-intensive actors for the negative knowledge externality, as previously described, the state will first limit the scope of internet control so long as doing so does not undermine domestic stability. Figure 3 illustrates such an attempt. Since 2017, across-the-board internet control has eased appreciably. The government has explored tailored measures that target, for example, sensitive segments of a domain while keeping the rest accessible.Footnote 108 AI has further fine-tuned censorship, with natural language processing and image recognition now widely embedded in China's popular mobile apps, such as WeChat.Footnote 109 Increasingly sophisticated censorship algorithms have driven down both false negatives and false positives.Footnote 110 In reducing false negatives, AI detects more anti-regime content faster.Footnote 111 In reducing false positives, AI allows through more innocuous content, minimizing the negative knowledge externality without compromising control.
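The two error types can be made concrete with a toy tally. The labels, the filter decisions, and the function name `filter_errors` are hypothetical and describe no actual system. The point is only that blocking an entire domain avoids false negatives at the cost of many false positives. A finer-grained filter can cut false positives, and with them the collateral loss of innocuous content, without adding false negatives.

```python
def filter_errors(labels, blocked):
    """Count false positives and false negatives for a binary censorship filter.

    labels:  1 if an item is targeted content, 0 if innocuous
    blocked: 1 if the filter blocks the item, 0 if it lets it through
    """
    pairs = list(zip(labels, blocked))
    false_pos = sum(1 for y, b in pairs if y == 0 and b == 1)  # innocuous content blocked
    false_neg = sum(1 for y, b in pairs if y == 1 and b == 0)  # targeted content missed
    return false_pos, false_neg


# Hypothetical pages on one domain: two targeted items, four innocuous ones.
labels = [1, 1, 0, 0, 0, 0]
print(filter_errors(labels, [1, 1, 1, 1, 1, 1]))  # blocking the whole domain: (4, 0)
print(filter_errors(labels, [1, 1, 0, 0, 1, 0]))  # finer-grained filter: (1, 0)
```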
While knowledge-intensive actors in general hold little power over the state, one notable exception is the foreign knowledge-intensive actors in the state. More precisely, they are those with whom the state has entered into various forms of contracts that require the state to ensure them freedom of information access in exchange for their provision of resources. Faced with hindrances similar to those of their domestic counterparts, these foreign actors have the option to retaliate by imposing an immediate economic cost on the regime. They may do so by invoking provisions for such access in the contract or by withholding the resources. For either to work, however, the threatened cost must be high.
The Sino-Foreign Cooperative University Union is one framework that imparts such de jure leverage to its member institutions, the “joint-venture universities.”Footnote 112 In principle, these institutions are not subject to the same restrictions on information access as their Chinese counterparts. For US accreditation, the Chinese government must demonstrate that the student experience at these institutions is on a par with that in the United States.Footnote 113 In practice, experiences vary. At New York University Shanghai, web domains blocked elsewhere in China are generally accessible via the institution's network. However, at another such institution, Duke Kunshan University, the network follows a different protocol, blocking some domains that are accessible at NYU Shanghai.Footnote 114
The Schwarzman Scholars program at Tsinghua University represents a different kind of leverage. At over USD 575 million, the program is the “single largest philanthropic effort in China's history.”Footnote 115 An endowment this size enables the founder, Stephen A. Schwarzman, to act as the de facto guarantor of freedom.Footnote 116 When asked whether he would “keep things very free” and maintain “total academic freedom” at his college, Schwarzman said, “Yes. Absolutely … And we've made that clear to our friends at Tsinghua and they agree completely.”Footnote 117
Indeed, at the Schwarzman College, virtual private networks are embedded in the network for credentialed users, which affords them a browsing experience similar to that in the United States—unlike their “friends at Tsinghua.” Other students at Tsinghua do not enjoy institution-sponsored unrestricted internet access, nor do those at other elite institutions such as Peking University.Footnote 118 Rather than the elevated status or exceptional productivity of the institutions, it is the leverage held by the foreign actors that motivates the state to make accommodations in this peculiarly discriminating manner.
Concluding Remarks
In this paper I begin with the three distinct components of information: ideas, data, and knowledge. Internet control intended to restrict ideas generates a positive externality for domestic data-intensive sectors and a negative externality for domestic knowledge-intensive sectors. Quantitative analysis of the case of China strongly supports both hypothesized externalities. I then postulate that the positive data externality impedes the state's competing objective of data sovereignty when domestic data-intensive firms expand overseas. Meanwhile, the state shields certain foreign knowledge-intensive actors from the negative knowledge externality to avoid the immediate costs they might otherwise impose. Qualitative evidence comports with these implications, accentuating the double challenge posed by internet control's dual externalities.
Many theoretical and empirical extensions can be made, of which I highlight three. First, just as the USTR has accused China of digital protectionism, China has protested US and EU sanctions on its firms, such as Huawei, and in some cases threatened retaliation.Footnote 119 A fuller assessment of the trade repercussions of internet control in a cross-border setting should take into account retaliatory acts and any boomerang effect beyond the initial impact.Footnote 120
Second, a closer look into the negative knowledge externality warrants an investigation into its mechanisms. One hypothesis is that internet control reduces research quality by limiting domestic researchers’ exposure to frontier knowledge from the outside world. Text-similarity measures have been used to track idea diffusion, including in scientific innovation.Footnote 121 Such methodologies can be applied to test this hypothesis by comparing research from China with that from the rest of the world, where one would expect less similarity between them following internet control.
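A minimal version of such a text-similarity measure is the cosine similarity between word-count vectors. The sketch below uses a deliberately crude bag-of-words representation. Applied work typically relies on richer representations such as TF-IDF weighting or embeddings, and would aggregate over large corpora rather than single sentences. The two snippets and the function names are hypothetical. Under the hypothesis above, similarity between Chinese and non-Chinese research would be expected to fall after internet control.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude tokenizer, for illustration only."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors: 0 = no shared words, 1 = identical proportions."""
    dot = sum(count * b[word] for word, count in a.items())  # a Counter returns 0 for missing words
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical abstracts: four of six words are shared.
doc_china = bag_of_words("graph neural networks for protein folding")
doc_elsewhere = bag_of_words("graph neural networks for drug discovery")
print(round(cosine_similarity(doc_china, doc_elsewhere), 3))  # 0.667
```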
Third, as internet connectivity continues to rise and indigenous digital products proliferate in the Global South, more states, both autocratic and democratic, will meet the scope conditions of my theory and provide fertile testing ground. It would be worthwhile to explore how information externalities manifest in democracies. The positive data externality may incentivize domestic data-intensive sectors to lobby for the state to block foreign competitors’ web domains. The state may likewise be incentivized to pursue such protectionist internet control in return for support from these sectors.Footnote 122 Moreover, that the protectionist benefit exists as an externality facilitates the justification of these measures under such guises as national security and privacy concerns. India's increase in internet control concurrent with its stunning increase in internet connectivity typifies a scenario for formulating and testing these hypotheses in a democratic context.Footnote 123 My theory also supplies an additional lens for analyzing events, such as the evolving situation of TikTok in the United States, that straddle trade and national security.Footnote 124
One final caveat is that advancements in generative AI may induce heavier reliance on data over knowledge in producing innovation. The positive data externality from internet control may therefore compensate for the negative knowledge externality. However, the resulting innovation may be less novel due to greater data homogeneity.Footnote 125 An inquiry into the emergent relationship between politics, information, and innovation in the age of generative AI will illuminate our understanding of state power and of human progress.
Data Availability Statement
Replication files for this article may be found at <https://doi.org/10.7910/DVN/OX6G1A>.
Supplementary Material
Supplementary material for this article is available at <https://doi.org/10.1017/S0020818324000237>.
Acknowledgments
For extensive feedback I thank Yasheng Huang, In Song Kim, Kenneth Oye, and members of the Kim Research Group. For helpful comments I thank Pablo Beramendi, Daniel Drezner, Richard Freeman, Kathleen McNamara, Abraham Newman, Elan Pavlov, Nathaniel Persily, James Prieger, Robert Reich, Tuan-Hwee Sng, Anton Sobolev, Neil Thompson, Paul Vaaler, Josephine Wolff, and meeting participants at MIT, Stanford University, Georgetown University, Carnegie Mellon University, University of California San Diego, University of Pennsylvania, TPRC, New Faces in Chinese Politics Conference, Cybersecurity Law and Policy Scholars Conference, Politics and Computational Social Science conference, National Bureau of Economic Research, International Political Economy Society, and the American Political Science Association's annual meeting. I am indebted to the editors and the anonymous reviewers for their thoughtful input.
Funding
Research for this paper received financial support from MIT, Stanford University, Georgetown University, the Smith Richardson Foundation, and the Horowitz Foundation for Social Policy.