The causal connection between culture and economic growth has garnered increasing attention. Although the broad linkage can be traced back to Adam Smith, recent work focuses on specific functions of culture. Some, for instance, argue that wealth may in part be driven by an “inclusive” culture in society, where no one is called heretical for challenging the conventional wisdom, or where there is an institutionalized medium, such as journals, that guarantees the freedom of speech for ideas that challenge orthodoxy (Mokyr 2002, 2016; McCloskey 2016). Empirical analysis has demonstrated that certain individual beliefs and preferences, when geographically concentrated, are growth enhancing (Guiso, Sapienza, and Zingales 2006; Tabellini 2008, 2010). Others show that inclusivity matters but excessive diversity may have adverse effects by incurring high transaction costs (Ashraf and Galor 2013).
Among the myriad attributes of culture, language offers a tangible and testable causal channel to economic growth. Language, in theory, can be conceptualized as a medium to gain market access, information, public goods, and rents. Many societies use language as a political instrument to build barriers to entry or deny certain groups access to market transactions and economic resources (Laitin and Ramachandran 2016; Liu 2014). Language can also function as a focal point around which to build effective political coalitions (Fearon 1999). This is in part why speakers of the same language in highly diverse societies typically coalesce as an ethnic group in the competition for political power and economic opportunities.Footnote 1 Thus, the ability of ethnic groups to use language as an instrument of power can have profound consequences, not only for the economy but for society as a whole.
Few theoretical and empirical works systematically address why some ethnic groups gain the ability to use language more effectively than others. Although social science research has offered general hypotheses, the roots of the standardization of core cultural attributes for ethnic groups, especially language, have not been widely investigated.Footnote 2 Language is one of the most difficult cultural attributes to standardize because, compared to flags and anthems, a codified language requires literacy.Footnote 3 Benedict Anderson (2006) comes closest to theorizing about language rationalization. His thesis connects the profit motive of printers and booksellers (“print-capitalism”), which first arose following the invention of the printing press in late fifteenth-century Europe, to “national consciousness” (Anderson 2006, Chapter 3). Yet, as I show later, a broad description of this linkage leaves multiple possible causal mechanisms unaccounted for. Moreover, systematic evidence that tests Anderson's hypotheses about language standardization has not yet been offered.
I fill these gaps by providing a simple conceptual framework and empirical analysis of how language standardization occurs for European ethnic groups. I distinguish two plausible channels through which, following Anderson's hypothesis, the acquisition of the printing press could spur the greater use of the vernacular over Latin. The first is the selection process. Here, access to print technology gave early-modern ethnic groups an impetus to make innovations, including, in later periods, vernacular codification when such development was deemed necessary and viable. Accessibility is critical because fixing language was never a reason for the spread of presses. Ethnic groups with a sovereign state benefited from this channel. The second process is the one whereby standardization was a conscious choice on the part of early-modern printers to promote vernacular publications to meet their interest in education, proselytization, and profit. Many early-modern ethnic groups were able to codify their tongue in modern times without independent statehood by taking advantage of this coincidence. The existence of multiple routes from print technology to the same outcome, language standardization, reflects the wide-ranging effects of the printing press (Eisenstein 1979; Dittmar 2011).
This article presents a new data set to explore these links with statistical evidence for 171 European ethnic groups for the period between 1400 and 2000 ce. The evidence is consistent with the hypothesis that the acquisition of the printing press is positively associated with standardization of the vernacular and that early adoption of that technology corresponds to greater chances of standardization. Using event history models, I find that ethnic groups that adopted the press have a four- to nine-fold higher chance of having their vernacular standardized. Similarly, the evidence shows that early adopters were six percentage points more likely to rationalize their vernacular in the modern period than were latecomers. The results hold whether or not ethnic groups comprised an independent state. The main findings are based on Cox proportional hazards models, but I also test their robustness using logistic regression models. I address endogeneity concerns, first, by accounting for the possibility that the spread of print technology is driven by human capital and, second, by using the geographical distance to Mainz as a source of exogenous variation for the spread of presses. All models confirm that the adoption of print technology has a substantive and significant impact on language standardization.
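As a reading aid for the effect sizes reported above: in a Cox proportional hazards model, a covariate's coefficient is a log hazard ratio, so a four- to nine-fold increase corresponds to coefficients between ln 4 and ln 9. The sketch below only illustrates that conversion. The function name and the coefficient values are assumptions chosen to reproduce the quoted bounds; they are not estimates from the article's models.

```python
import math

def hazard_ratio(beta: float) -> float:
    """Convert a Cox model coefficient (a log hazard ratio) into a hazard ratio."""
    return math.exp(beta)

# Hypothetical coefficients, chosen so that the hazard ratios equal 4 and 9,
# the bounds quoted in the text. They are NOT the article's estimates.
beta_low = math.log(4)   # about 1.386
beta_high = math.log(9)  # about 2.197

for beta in (beta_low, beta_high):
    print(f"beta = {beta:.3f} -> hazard ratio = {hazard_ratio(beta):.1f}")
```

A hazard ratio compares instantaneous rates of standardization between adopters and non-adopters, which is why it is reported on a different scale from the six-percentage-point difference in probability.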
In this article, I make two contributions. First, I seek to demonstrate that there are non-state routes accounting for the variation in cultural consolidation among ethnic groups. Territorial sovereignty plays so integral a role in cultural preservation that it is commonly embraced in the literature on ethnicity and nationalism as a precondition for the construction of nations (Anderson 2006; Breuilly 1993; Gellner 2006; Hechter 2000). Yet recent research on the causes of ethnolinguistic diversity has empirically shown that the state is not the only determinant and that it is critical to consider non-state determinants such as geography and technology (Ahlerup and Olsson 2012; Ashraf and Galor 2013; Laitin, Moortgat, and Robinson 2012; Michalopoulos 2012; Spolaore and Wacziarg 2009). This article focuses on the impact of access to print technology and the timing of technological access to address variation in a core attribute of ethnicity, namely, language standardization. My second contribution is to show the deep historical roots of contemporary politics that harnesses ethnicity and culture as political instruments. An emerging body of empirical work in economic and political history demonstrates that the impact of historical events on current economic and political outcomes can persist for centuries (Comin, Easterly, and Gong 2010; Nunn 2014; Olsson and Paik 2016; Spolaore and Wacziarg 2013; Stasavage 2014).
This article joins this growing literature by providing evidence that contemporary ethnic groups' ability to use language for political participation and access to the market was determined many centuries ago.
CULTURAL RATIONALIZATION, ETHNICITY, AND THE NATION
The unit of analysis in this article is the ethnic group. Before laying out hypotheses, it is critical to define “ethnicity” and the related concept, the “nation,” and address the relationship between the two. I define ethnicity as a category that delimits membership in a group and ethnic group as a collectivity determined by a set of attributes shared among its members. Following many scholars, I use the myth of common descent as the most basic characteristic for the definition.Footnote 4 As explained later, many ethnic groups attempted to consolidate their cultural practices as a path to political survival in a centuries-long competition over resources and territory, which was particularly intense in Europe (McNeill 1982).Footnote 5 Any symbols reminiscent of common origin may be selected for group consolidation, such as a memorable landscape, historic buildings, clothing, or a flag. By contrast, the nation may be defined as a collectivity with standardized culture that commands loyalty among the community members. Standardization here is synonymous with institutionalization;Footnote 6 a nation often designates certain cultural practices to be written as “official,” as in an anthem, interpretations of its origins, and language use, which are consistently invoked to consolidate group membership. Successful language standardization, in particular, requires literacy among the members of the nation so that official culture is effectively promulgated and sustained from one generation to the next. Ethnicity, in short, is distinguished from the nation in terms of standardization.
When ethnicity and nation are understood in this way, a few theoretical and empirical implications emerge. First, the ethnic group can be an analytical building block when considering questions about the nation. This approach is useful not only for conceptual distinction but also for empirical analysis. Second, it renders the category—nation—more amenable to concrete observation, which opens a way for causal identification. One such strategy may be to identify the timing of standardization as one salient dimension of ethnic groups’ cultural attributes and examine the mechanism by which standardization occurs.
Among the multitude of cultural practices, the standardization of the vernacular is one of the most important in specifying the mechanism through which ethnic groups become consolidated nations. The rationalization of non-linguistic dimensions, say an anthem, may precede that of linguistic ones, but enforcement would predictably be challenging in the absence of a shared means of communication. By contrast, once a vernacular is codified, the cost of access to information falls drastically, thereby making consolidation of the other dimensions of culture more efficient. In a classic study of cultural rationalization in Third-Republic France in the late nineteenth century, Eugen Weber (1976, p. 313) recounts the centrality of standardized language instruction in public schools in the countryside. Weber vividly illustrates how a pupil who uttered words in an “unauthorized” tongue, patois, was chastised by having to hold a display showing a faux pas until the next student committed one. Further, a unitary language has a spillover effect in social organization. For instance, it makes the administration of military-related tasks such as recruitment, training, and the solidarity of personnel effective and therefore enhances fighting capacity. Uniform language fulfills a variety of functions, not just as a chief repository of cultural knowledge, but also as a useful instrument for policy.
My focus on language is certainly not new. In fact, language plays a central analytic role in the literature on ethnicity and nationalism (Anderson 2006; Bell 2001; Gellner 2006; Kohn 2005). Convention holds that a nation may be observed when a government is conducted in the predominant tongue of its territory (Gellner 2006; Hechter 2000; Hobsbawm 1990; Tilly 1994). The extant literature indicates that fixing the vernacular is assumed to be critical to achieve what Andreas Wimmer calls “the rule of like over like” (2002, p. 213). Across Europe, language historically served as one of the most important cultural dimensions with which to define membership in a given nation. The rise of modern citizenship, an institution in which language fluency is typically a requirement, is a good illustration.Footnote 7 Indeed, contemporary works use language as the dependent variable to theorize about the variation in the salience of identity across time or within an ethnic group (Laitin 1998, 2007; Brubaker 2004) and the variation of “language regimes,” that is, state policy on language instruction (Albaugh 2015; Cardinal and Sonntag 2015; Liu 2014; Safran and Laponce 2005).
A systematic investigation as to why language rationalization has become the default choice for ethnic groups seeking survival has not yet been undertaken. There is little doubt that the codification of the vernacular is, in theory, a highly costly project. It entails the classification of speech into main “trunks” and “branches” (as in language trees), the transcription of speech into letters, the establishment of a grammar and related rules on usage, and finally the construction of an orthography, a set of rules about spelling. Once these rules are clearly written out and consistently used and taught among the speakers, a language may be said to have been “standardized.” Publication of a dictionary may come at the very end of this lengthy process. Competent and devoted experts would be needed to bring the project to completion. It is, therefore, easy to imagine that each step is a labor-intensive, time-consuming, and financially demanding process. High fixed costs should make it clear that language standardization is not a “natural” choice. Yet, empirically, no well-known ethnic group, such as the English, French, Italians, or Russians, lacks a codified language. Lesser-known ones, aspiring to survive, attempt to follow suit by consciously using their tongues in everyday communication and school instruction.Footnote 8 Two questions arise with respect to the high fixed costs, in labor and time, of language rationalization. The first is a theoretical question. Why do some ethnic groups make such a costly investment and fix their vernacular? The second is empirical. Do hypotheses on the first question explain the variation of language standardization across ethnic groups? Addressing these questions fills an important lacuna in the study of ethnicity, nations, and ethnic diversity.
PRINT TECHNOLOGY AND LANGUAGE STANDARDIZATION
The Technological Innovation
I hypothesize a positive relationship between the printing press and language standardization. But it is crucial, first, to consider the technology's broad impact beyond language. The most general and profound effect that print technology brought to bear would be to reduce the cost of access to information (Mokyr 2005; Bernstein 2013). Metal movable typography, which was invented by Johannes Gutenberg, among others, circa 1450 in Mainz, Germany, greatly encouraged innovations in knowledge production on many scales. Innovations occurred more cheaply, spread more widely and more quickly, and the incentives to produce grew stronger.Footnote 9 The diffusion of printing technology was remarkably swift by fifteenth-century standards. More than 110 cities had a press established by 1480 (Febvre and Martin 1976, p. 182) and the number grew to over 240 by 1500 (Clair 1976). Thereafter, book production surged dramatically. There were an estimated 5 million manuscripts produced during the fifteenth century in a dozen European countries, a 358-fold increase from the sixth century and an 82 percent increase from fourteenth-century production levels (Buringh and van Zanden 2009, p. 416). Similarly, printed book production increased 6.3 times in the first half of the sixteenth century from the 12 million books printed during the incunabula period of 1450–1500 (Buringh and van Zanden 2009, p. 417).Footnote 10 In this period, the price of books dropped by two-thirds (Dittmar 2011). Moreover, the technology significantly reduced person-hours to the extent that it pushed scribes out of business (Bernstein 2013). Figure 1 documents the rapid diffusion of printing technology among the European ethnic groups in my data set (described in detail later).
Of the 96 groups that acquired the press, roughly two-fifths (41.7 percent) did so during the incunabula period. Figure 1 counts the first year of print adoption for each group.
The State's Role in the Spread of Print and Language Standardization
The reduced cost of access to print media served as a catalyst for activities beyond book production.Footnote 11 The initial demand for the new technology came primarily from university instructors, clergy, lawyers, and wealthy merchants (Febvre and Martin 1976, pp. 172–80). Yet the cost of operation remained high and printers were chronically short of capital (Febvre and Martin 1976, Chapter 4). As a result, printers had to work in as many markets as possible, because uncertainty about the emerging market for print media made it a highly risky venture (Pettegree 2010, Chapter 3). For printers, finding a profitable market and sustaining it were equally critical.
States played little initial role in generating supply or demand for the new technology. In premodern Europe, industrial technology like the movable-type press spread primarily through skilled workers rather than states or related political channels (Cipolla 1972). Nevertheless, print technology had a ripple effect in politics by making the dissemination and enforcement of rules cheaper. It proved useful for legal, legislative, and administrative purposes (Graff 1987, p. 109). There is some evidence that states took advantage of print media for two of their core functions: war and taxation. For instance, in the 1510s Maximilian I of the Holy Roman Empire issued propaganda broadsheets and pamphlets to raise manpower and revenue (Pettegree 2010, p. 132). The state also had the potential to induce supply of print by serving as a reliable patron for printers, who were always looking for one. At the same time, premodern European states chronically suffered from an unreliable flow of revenue due to a limited degree of institutional centralization (Dincecco 2015). All these rationales point to a favorable condition under which the state could employ printers as royal servants and create a win-win relationship which could be long-lasting, because revenue generation and law enforcement were permanent features of governance. Although printers called for monopoly rights to create a more secure business environment, premodern European states were generally unable to enforce regulations due to their limited institutional capacity (Pettegree 2010, p. 73). Evidence on the political use of the printing press by the state seems indirect at best for the premodern era.
Equally important, independent statehood is not a prerequisite for language standardization. The development of the vernacular for the Bretons offers an illustrative case. Prior to incorporation into France in 1532, there was no clear evidence of Breton being used for official purposes during the five centuries of the independent kingdom (Price 1998, p. 36). However, from around 1450 the Breton language was substantially recorded in manuscripts, and the arrival of the printing press in Rennes in 1485 facilitated stability in language use. The reduced cost of printing led to the publication of a trilingual Breton-French-Latin dictionary in 1499 (the first printed book in Breton) and to the introduction of simplified spelling rules (orthography) in the mid-seventeenth century (Price 1998, pp. 36–37). A greater supply of language use began in the early nineteenth century, when literate Bretons produced poetry, history, and novels in Breton, following the publication of grammar books and French-Breton dictionaries in the “purified” form by influential linguist Jean-François Le Gonidec (Hardie 1948, p. 10; Minahan 2000, p. 131). The separatist movement of the twentieth century, though it failed, gave an impetus for further linguistic sophistication, including spelling unification across the Breton-speaking region of France. Despite the legal exclusion of Breton from the French school system from 1880 to 1951, Breton use became more widespread in journals and periodicals, culminating in the 1958 monolingual dictionary, Geriadur istorel ar brezhoneg (Historical Dictionary of Breton) (Dalby 1998, p. 64). The example of the Breton language demonstrates, first, that printing technology played an important role in stabilizing the vernacular and, second, that standardization is feasible without an independent polity.
Evidence from the cultural history of early-modern Europe indicates that the spread of movable type and the development of print media relied less on state actors than on private ones. Printers had to secure access to capital through routes other than the public one: by creating a joint venture, locating private patrons, and relocating to high-demand cities like Wittenberg, the home of the Protestant Reformation (Pettegree 2010). Thus, the theoretical linkage between the printing press and language standardization is likely to be as strong or even stronger outside the state.
Print and Vernacular Standardization
To begin, I discuss two ways in which the press gave rise to the greater use of the vernacular over Latin. First, printers played a role in triggering competition between vernaculars and Latin. As mentioned in the previous section, print was a high-risk business, so printers had to explore demand in academic and private markets. As Anderson famously points out, after quickly filling the “thin” Latin market, these printers cultivated a “thicker” vernacular one in the lay public, which could only understand the vernacular as no ethnic groups had Latin as a native language (Anderson 2006, p. 38). Indeed, the growing popularity in the use of the vernacular begot the “esotericization of Latin” (Anderson 2006, p. 42), which was made apparent after 1530. The competition largely ended within 80 years following the invention of the Gutenberg press (Febvre and Martin 1976, p. 320).Footnote 12
Second, access to print had the effect of stabilizing languages (McKitterick 1998, p. 296). This point is often overlooked or made in passing in the literature but is crucial. Once words are produced in printed form and widely circulated, they tend to acquire “staying power” in terms of consistency in usage. Furthermore, the technology of mass production enhanced scalability. One conspicuous dimension on which the printing press contributed to language rationalization is spelling. Prior to the invention of the press, spelling was primarily phonetic; thereafter, it became increasingly consistent as the transmission of words was based on a mechanical print medium (Steinberg 1974, p. 125). When printers processed text for publication, they simplified spelling at their own discretion to make their work more efficient (Eisenstein 1979, p. 87). Once types were set, subsequent printers would keep using them when printing the same words to save time. The power of the press to fix language use attracted the attention of school teachers and priests who were concerned about what they deemed the inconsistent or improper use of language, and some of them subsequently became printers themselves. It is crucial to recognize that the technology was not invented to enable or facilitate the rise of the vernacular; standardization was an unintended consequence of it.
What processes then account for the link between the greater use of the vernacular and codification? In Imagined Communities, Anderson made an argument about the link between the profit motive of printers and booksellers (print-capitalism) and national consciousness. He theorizes that “print-capitalism gave a new fixity to language, which … helped build that image of antiquity so central to the subjective idea of the nation” (Anderson 2006, p. 44). This hypothesis describes the processes of language rationalization in general, but two points remain unaddressed. The first is to specify the causal mechanisms linking the press to language standardization. The second concerns the dimension of duration in the standardization process, which is highly time-consuming and labor-intensive. Although the print-capitalism thesis holds that the vernaculars won the competition with Latin relatively quickly, it does not consider the new race between vernaculars. The new rivalry lasts significantly longer than the competition with Latin. Latecomers in the game may be severely disadvantaged in that they miss the opportunity to standardize their tongue and are compelled to adopt similar ones that are better codified. These two points merit further discussion.
There are two plausible causal mechanisms from print technology to vernacular codification. The first is the selection process. If early-modern printers established presses in homeland cities of some ethnic groups and started to produce books in the vernacular, the technology would likely prove to be useful later when language standardization became viable or necessary. Even though the original aim of print adoption was printers’ incentive to generate revenue, accessibility to the relevant technology was a catalyst for subsequent innovations. The state is clearly a beneficiary in this hypothesis. Although European states did not create a strong demand for or supply of print, they benefited from the reduced cost of producing print media to facilitate the enforcement of rules and revenue generation. When states’ investment in language rationalization is rewarded by the greater degree of fiscal and institutional centralization, other states may quickly follow suit to stay competitive. Similarly, the selection mechanism can also explain why the language of demographically small ethnic groups was codified. As illustrated in the case of the Bretons, if there was a history of vernacular print, it could allow language entrepreneurs to revive or solidify their language to achieve national unity and to increase the chances of group survival without an independent state. Thus, the selection process can account for competitive and niche mechanisms.
The second process is conscious choice. Actors may take advantage of the access to printing technology using the vernaculars to advance their interests. Two paths merit attention. The first is the profit motive on the part of the printers. An increase in vernacular publications over those of Latin stirred new demand for translations. The translation business began to flourish in the early sixteenth century with many print offices becoming workshops for translators busily producing classical works in the vernaculars (Febvre and Martin 1976, p. 272). Ethnic entrepreneurs who wanted to advance learning in their native tongues, such as school teachers and university professors, would have an incentive to exploit the declining cost of becoming literate and producing books. Another path is Bible translation. Protestants had an incentive to translate the Latin Bible into the local languages in the proselytization effort. The motive was particularly strong among Protestant reformers. They wrote books and translated Luther's catechism and other writings into many languages. These were often the first printed vernacular works and proved to be a critical foundation for language standardization in subsequent centuries. The Slovene language fits this pattern. The first printed book in Slovene was a Protestant catechism, translated in 1551 by Primož Trubar, a Protestant preacher. This left the printed record of the language for writers in the nineteenth and twentieth centuries, who modernized Slovene with the publication of newspapers, books, and periodicals (Biggins and Crayne 2000). Given that literate ethnic Slovenes spoke German rather than Slovene, which was considered a peasant tongue for the centuries of Habsburg rule, the earlier translation effort allowed the Slovene language to survive. Similar examples can be found in Estonian, Latvian, Livonian, Lithuanian, and Finnish (Steinberg 1974, p. 122). The conscious-choice process offers an alternative path to language standardization.
Figure 2 shows the cumulative publications of vernacular dictionaries between 1750 and 2000 among the 104 ethnic groups in my data set, broken down by those groups with a record of print adoption and those without it. It indicates that no matter the route, access to the printing press gives ethnic groups a strong advantage in consolidating the vernacular.
The Significance of the Early Adoption of the Press
It is imperative to understand the timing of access to the printing press because language standardization is a slow process. It easily spans a few centuries, even under ideal circumstances in which an ethnic group had state resources and institutions at its disposal specifically to develop its vernacular. Lexicography, in the monolingual edition, entails three steps: collecting words, “tokenizing” or making a tangible representation of words, and making an entry of these words (Brown 2006b, p. 113). For each entry, lexicographers need to complete several tasks: (1) orthography (the rules about spelling); (2) guidance on pronunciation; (3) the classification of word-class (e.g., noun, verb); (4) definition making; (5) examples; (6) phraseology; (7) disputed points of usage; and (8) etymology and word histories (Brown 2006b, p. 113).
This is an ideal level of progress for language standardization. Codified tongues play a central role in commanding authority, prestige, and legitimacy when ethnic entrepreneurs seek to mobilize the community and invoke a distinct, “national” identity. Language can have these functions. Advocates for codification typically justified it as a way to “purify” the vernacular by separating it from words that were characterized as “dirt” or “chaff” because their etymology could be traced to foreign tongues (Burke 2004, pp. 144–50). Such mingling occurred because pre-standardized vernaculars were a mix of “indigenous” words, which originated from the local language, and “foreign” words, which were imported from other locales as a result of trade or geographical proximity or had evolved from older languages like Latin. Standardization has this distinct attribute in comparison to other dimensions of language development such as grammar-making, which primarily concerns setting up the structural rules that govern the language.
The two processes I hypothesized earlier each have a distinct route to language standardization. First, the selection process stresses state capacity to promote vernacular codification. Ethnic groups with a sovereign state started in the late sixteenth century to establish a state-funded academy devoted to studying the vernacular, following the model of the Italian academy, Accademia della Crusca, founded in Florence in 1584. The state-centric approach is historically a phenomenon of continental Europe. Ethnic groups that adopted this model include the French, Spanish, Portuguese, Swedish, Russian, and Danish, among others. Their centuries-long effort culminated in delivering monolingual dictionaries as the authoritative source of language use in the nineteenth and twentieth centuries (the Italian dictionary was published in 1861).
The second process focuses on actors' conscious choice to develop the vernacular. Given the high fixed costs of codification, ethnic groups have to secure access to patrons. Such an investment grants specialists time to concentrate on their project. The codification of the English language fits this model. Unable to secure royal patronage, Samuel Johnson managed to make contracts with five booksellers, for the sum of £1,575, to cover the expenses (Brown Reference Brown2006a, p. 130).Footnote 13 Johnson's Dictionary of the English Language, published in 1755, has been widely regarded as the standard-bearer. This illustration points to a language standardization process alternative to the state-centric one.
EMPIRICAL STRATEGY
I hypothesize that variation in language standardization depends on the timing of each ethnic group's acquisition of the printing press. Early adoption affords time to make literacy more accessible and develop a richer lexicography, by creating a specialized state agency or seeking patronage to cover the fixed cost of the codification effort. By contrast, late adopters of the press do not have this luxury of time. If they already possess a state and resources, they might not have as strong an incentive to codify their own vernacular as early adopters. Latecomers could opt to accept the more developed language of one of their neighbors as a solution, which likely reduces the possibility of constructing a distinct group of their own.
To test my argument, I constructed a new data set that contains information on various dimensions related to language standardization. It is unique in that I collect historical data with European ethnic groups as the unit of analysis. The data set is composed of 171 existing ethnic groups in Europe that are observed for the period between 1400 and 2000 ce.Footnote 14 I draw primarily on Minahan (Reference Minahan2000) for entries on these ethnic groups and add others referred to, but not entered, in Minahan's volume. I double-check ethnic groups against Ethnologue, compiled by M. Paul Lewis, Gary F. Simons, and Charles D. Fennig (Reference Lewis, Simons and Fennig2013). The list of 171 ethnic groups is given in the Online Appendix. The starting point is 1400 and I record data at 50-year intervals, unless otherwise noted. To enable the observation of covariates at the ethnic-group level, I locate a “homeland” city for each group and find information on political, economic, social, and geographical dimensions for these cities. I regard these observations as specific attributes for ethnic groups.Footnote 15 To date, this is one of the most detailed data sets on European ethnicity and nationalism. Figure 3 displays the geographical location of the homeland cities for 171 European ethnic groups.
The outcome variable is the timing of language codification. As I defined the term “nation” by the standardization of cultural attributes for ethnic groups, this coding strategy allows me to capture the standardization process of a core attribute of ethnicity, language, and draw implications for how ethnic groups consolidate their cultural practices. I operationalize the timing of language standardization by taking the first publication year of a comprehensive vernacular dictionary for each ethnic group. The qualifier “comprehensive” refers to a “modern,” monoglot dictionary that aims to cover most, if not all, words in the alphabet of the given language and that scholars in relevant fields such as lexicography, linguistics, and cultural history judge as setting a linguistic standard. As Sidney I. Landau (Reference Landau2001) points out, these comprehensive dictionaries are distinct in purpose, substance, and scope from glossaries and encyclopedias, although the latter may often bear the label “dictionary” in the title.Footnote 16 In a similar vein, this qualifier excludes polyglot dictionaries such as the premodern translation works between a vernacular and Latin. Wherever possible, I try to find a dictionary specifically compiled for minority ethnic groups whose vernaculars today may be classified as “dialects.”Footnote 17 The dictionary data rely mainly on Peter Burke (Reference Burke2004), Andrew Dalby (Reference Dalby1998), and Glanville Price (Reference Price1998). For some ethnic groups in the Caucasus, I use Sebastian Nordhoff, Harald Hammarström, Robert Forkel, et al. (Reference Nordhoff, Hammarström and Forkel2013). As Table 1 shows, only a quarter of the observed cases (n = 26) achieved standardization before the twentieth century, while the rest codified their tongue within the last century, especially the latter half. The data suggest that language standardization is a modern phenomenon.
For many ethnic groups that are demographically small and stateless, it is still an ongoing project. Figure 4 exhibits the geographical distribution of vernacular dictionaries between 1800–2000.
Source: See the Empirical Strategy section for the description of each variable.
The main predictor is the timing of acquisition of the movable-type printing press. I operationalize it by recording the first date that the technology arrived in the homeland city for each ethnic group. In this process, I make a distinction between the year when the Gutenberg press was adopted in a homeland and the year when vernacular books were printed. These two sets of years are not always identical and the latter typically occurs later. In this article, I choose the former because one of the empirical goals is to assess the effect of accumulated time of technological access for a given ethnic group. The print variable comes from various sources, but primarily from Colin Clair (Reference Clair1976) and Lucien Febvre and Henri-Jean Martin (1976), which cover Western, Southern, and Central Europe.Footnote 18 For the ethnic groups in the German-speaking area, I use Christoph Reske (Reference Reske2007).
I assess the relationship between the printing press and a vernacular dictionary by taking the following steps. First, I evaluate the demand-side mechanism, in which social and economic developments shape demand for the printing press. As discussed earlier, printers were essentially capitalists who were willing to go to locales likely to yield a higher return on their investments. This mechanism captures whether such pre-Gutenberg activities determine the acquisition of the press and subsequent language standardization. Drawing from recent research in economic and political history, I control for the following set of variables. The first is the university. Universities would benefit from an ability to mass-produce books and other printed material to promote literacy. I rely on Henry C. Darby and Harold Fullard (Reference Darby and Fullard1970) and Walter Rüegg (1992–2011, 4 Volumes) on the history of European universities to obtain the founding date. The second control is the bishopric. As with universities, the church stood to benefit: the printing press would make the proselytization effort easier and more efficient, with the capacity to print pamphlets, posters, and booklets on a large scale. I use David M. Chaney (Reference Chaney2015) for the establishment year of a diocese or archdiocese. For the university and bishopric, I record the year of foundation. The third control is urban potential. Following Jan de Vries (Reference de Vries1984), it is computed for each city as the sum, over all other cities in my data set, of each city's population in the given period divided by its geographical distance from the city in question.Footnote 19 This variable gives a sense of whether a city is surrounded by competing urbanizing towns or located in a more sparsely populated area. Higher values indicate greater potential for urbanization.
The standard source on population size in preindustrial Europe is Paul Bairoch, Jean Batou, and Pierre Chévre (Reference Bairoch, Batou and Chévre1988), which covers the period 800 through 1850 for hundreds of European cities. Maarten Bosker, Eltjo Buringh, and Jan Luiten van Zanden (Reference Bosker, Buringh and Luiten van Zanden2013) correct some of the data in Bairoch, Batou, and Chévre, and I follow their updates in compiling my data. For the period after 1850, I use several statistical handbooks including Brian R. Mitchell (Reference Mitchell2003a, Reference Mitchell2003b). The population data are available at the 100-year interval, so I take the average to compute values for 50-year periods.
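The urban potential measure and the 50-year averaging described above can be illustrated with a minimal sketch. It assumes that urban potential sums population over distance across all other cities, and that a 50-year value is the simple mean of the two surrounding century values; the city names, populations, and distances are hypothetical toy inputs, not values from the data set.

```python
def urban_potential(city, populations, distance_km):
    """Sum over all other cities of (population / distance to `city`)."""
    return sum(
        pop / distance_km[frozenset((city, other))]
        for other, pop in populations.items()
        if other != city
    )


def fill_midpoints(series):
    """Add 50-year values as the mean of the two surrounding century values."""
    filled = dict(series)
    years = sorted(series)
    for earlier, later in zip(years, years[1:]):
        if later - earlier == 100:
            filled[earlier + 50] = (series[earlier] + series[later]) / 2
    return filled


# Hypothetical toy inputs: populations in thousands, distances in kilometers.
populations_1500 = {"city_a": 10, "city_b": 20, "city_c": 40}
distance_km = {
    frozenset(("city_a", "city_b")): 100.0,
    frozenset(("city_a", "city_c")): 200.0,
    frozenset(("city_b", "city_c")): 400.0,
}

# city_a: 20/100 + 40/200
potential_a = urban_potential("city_a", populations_1500, distance_km)

# Century observations for one hypothetical city, expanded to 50-year steps.
century_population = {1400: 10, 1500: 20, 1600: 40}
half_century_population = fill_midpoints(century_population)
```

Keying distances by an unordered pair avoids storing each distance twice; whether the published measure also includes a city's own population term is not stated here, so this sketch leaves it out.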
A second set of controls concerns the supply-side mechanism. These variables can accelerate or delay language rationalization. A major covariate is war. Early-modern European history is characterized by recurring warfare, in which growing costs of fighting and preparing for war largely determined how to organize state entities most effectively (Bean Reference Bean1973; Tilly Reference Tilly and Tilly1975, Reference Tilly and Evans1985, Reference Tilly1992). A unitary language may emerge as desirable in this process. If tax collectors and the subjects they tax communicate in a mutually intelligible tongue, the administration of raising resources and manpower becomes more effective. War can, therefore, serve as a catalyst that spurs the incentive for language rationalization. I draw on the database compiled by Peter Brecke (Reference Brecke1999) for war-related data. It records any conflict in the world with a minimum of 32 casualties in the period between 1400 and 2000. As Brecke uses states as the unit of analysis, I take care to localize the incidence of war at the ethnic-group level to the extent possible. If, for example, a war took place in Scotland prior to union with England, I regard it as having an impact on the Scots but not on their neighbors (the English and the Welsh). However, my motive here is to capture the institutional effect of war on taxation and governance. Therefore, if no mention is made about the place of a given war, I assume that the war uniformly affects the residents of the country. For instance, if a war occurred in the Habsburg Empire, I regard that war as affecting all ethnic groups within the imperial domain. I construct war frequency, which measures how often an ethnic group experiences war in any 50-year period, to capture war's impact.
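The coding rule for war frequency can be sketched as follows. This is only an illustration under stated assumptions: each record carries a location and a start year, a location is either a single ethnic group (a localized war) or a polity whose member groups are all treated as affected, and periods are 50-year bins starting in 1400. The names `group_a`, `empire_x`, and the years are hypothetical and are not drawn from Brecke's catalog.

```python
from collections import Counter


def period_start(year, first=1400, width=50):
    """Return the first year of the 50-year bin that contains `year`."""
    return first + ((year - first) // width) * width


def war_frequency(records, polity_members, first=1400, last=2000):
    """Count wars per (ethnic group, 50-year period).

    A record whose location is a polity in `polity_members` is assigned to
    every member group; otherwise the location is taken to be one group.
    """
    counts = Counter()
    for location, year in records:
        if not first <= year <= last:
            continue
        for group in polity_members.get(location, [location]):
            counts[(group, period_start(year, first))] += 1
    return counts


# Hypothetical toy records: (location, start year).
records = [
    ("group_a", 1513),
    ("group_a", 1560),
    ("empire_x", 1529),
    ("empire_x", 1532),
]
polity_members = {"empire_x": ["group_b", "group_c"]}

frequency = war_frequency(records, polity_members)
```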
Another supply-side covariate is overseas trade. If an ethnic group's homeland city is located on or near the coast, oceanic trade can promote language standardization for greater efficiency. In European history, geography played an important role in economic activity.Footnote 20 Because roads for overland trade were not well-paved and thus proved unreliable, access to ports provided a critical precondition for economic growth. This access, in turn, may provide incentives for a unitary language. I use Christos Nüssli (Reference Nüssli2011) to produce an indicator taking the value of one if an ethnic group's homeland lies on an oceanic coast.Footnote 21 While access to trade is favorable to growth, other geographical conditions may have the opposite effect. In particular, “bad terrain” may prove prohibitively costly for undertaking vernacular codification. To assess this impact, I include a series of time-invariant measures. One is a set of variables such as terrain ruggedness and land elevation above the sea level for each ethnic group. I obtain these observations from the Global Land One-kilometer Base Elevation project (GLOBE) database (GLOBE Task Team and others 1999).Footnote 22 Related is a measure on island, an indicator taking the value of one if an ethnic group's homeland is on an island. All these geographical variables are intended to capture different dimensions of geography's impact.
In addition, I include indicators on the Russian, Ottoman, and Habsburg Empires based on Nüssli (Reference Nüssli2011). They take the value of one if an ethnic locale was under any of these polities at the beginning of the century. These are designed to capture the institutional impact on the acquisition of the printing press. Historians indicate that Russians and Ottomans in particular had centralized control over private, vernacular print until the eighteenth century.Footnote 23 Thus it is expected that the ethnic groups under Russian or Ottoman rule would be late in adopting the printing press and consequently in standardizing their vernacular.
Finally, I use two indicators to capture potential long-term institutional effects. One is a set of variables for the Roman Empire's influence. The Romans built roads between cities, which gave these cities an opportunity to develop institutional capacity in economic activity and political organization. I use Richard Talbert (Reference Talbert2000), Nicholas G.L. Hammond (Reference Hammond1981), Johan Åhlfeldt (Reference Åhlfeldt2015), and Pleiades (2015) to collect information on the Roman legacy. More specifically, I create an indicator taking the value of one if a city had major or minor Roman roads.Footnote 24 Another institutional effect I control for is the impact of the Protestant Reformation. The data set includes two measures. One is the measure on geographical distance to Wittenberg or Zürich, two crucial cities in understanding the movement. Following Steven Pfaff and Katie E. Corcoran (Reference Pfaff and Corcoran2012), I construct this variable by taking the nearest distance between the given ethnic homeland and either of the Protestant centers. It is important to note that this variable captures not the impact of the confessional movement, but the magnitude of the religious reform on institutions. One of the Reformation's consequences is that cities and other localities were forced to enact institutional reforms to reduce tension after being forced to choose a side. One example is education reform for the upper-class and university education (Gorski Reference Gorski2003, p. 19). This implies that the Reformation could promote vernacular literacy and a demand for a linguistic standard. Proximate location to Wittenberg or Zürich indicates a greater impact of such institutional reform. The second measure on the Reformation is a fixed effect on the predominant religious preference (Catholicism, Protestantism, Orthodoxy, or Islam) for each group.Footnote 25 Geographical indicators for the region are also included in the data set.
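The nearest-distance measure to the two Protestant centers can be sketched in a few lines. The sketch assumes great-circle (haversine) distance on a spherical Earth, which may differ from the distance definition used in the underlying sources; the coordinates are approximate, and the example homeland (roughly the location of Prague) is purely illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Approximate coordinates in degrees (latitude, longitude); illustrative only.
PROTESTANT_CENTERS = {
    "Wittenberg": (51.87, 12.65),
    "Zürich": (47.38, 8.54),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_center(homeland):
    """Return (name, distance_km) of the closer of the two centers."""
    return min(
        ((name, haversine_km(homeland, coords))
         for name, coords in PROTESTANT_CENTERS.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical homeland city, roughly at the location of Prague.
example_homeland = (50.08, 14.44)
center, distance = nearest_center(example_homeland)
```

The same `haversine_km` helper would apply to any single reference city, so a distance to one fixed point needs only a one-entry dictionary.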
ESTIMATION RESULTS
Baseline Regression
To begin, I document the baseline correlation between the printing press (the explanatory variable) and vernacular dictionaries (the outcome variable). The bivariate relationship is shown in Table 2. Of the 171 observations, approximately half (48.5 percent) acquired the printing press and achieved vernacular codification. For those ethnic groups that have the recorded date of print adoption, 86 percent standardized the vernacular by 2000 (the cutoff year for this study). By contrast, for those observations without access to the technology, 72 percent had not achieved standardization by 2000. It is important to highlight that language standardization is an ongoing project. Although 67 groups did not publish monolingual dictionaries and are defined as “unstandardized,” they may be able to codify their vernacular in the future.
Second, I discuss the timing of printing press adoption. Of the 83 ethnic groups that both acquired the press and standardized their language, the elapsed time between the two events is, on average, 363 years (the median is 396 years). The lag is long because 75 percent of the groups that acquired the press did so by the early seventeenth century, while 75 percent of those that standardized their language did so in the twentieth century. These two pieces of descriptive evidence provide preliminary support for my hypotheses that the adoption of the printing press is positively linked to language standardization and that, given this lag, the early adoption of presses is crucial to it.
To test my argument more systematically, this article employs the following estimation strategies. First, I use the Cox proportional hazards model to examine the effect of time-varying covariates on vernacular codification. A key advantage of the Cox model is that it does not require the baseline hazard rate to follow a particular distribution. Instead, the duration times are parameterized in terms of a given set of covariates (Box-Steffensmeier and Jones Reference Box-Steffensmeier and Jones2004, p. 49). Positive coefficients are interpreted as increasing the hazard of the event of interest as a function of the covariates, namely increasing the chances of language standardization for ethnic groups. By contrast, negative coefficients are interpreted as decreasing the hazard, meaning that the chances of language rationalization decrease. More specifically, I estimate the following reduced-form model:
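A form consistent with the terms defined in the surrounding text can be sketched as follows. The label Press_i for the print indicator is an assumption introduced here for illustration; β, γX_i, δ, and h_0(t) follow the notation used in the text, but the exact layout of the published equation may differ.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta\,\mathrm{Press}_i + \gamma X_i + \delta\right)
```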
Here h_i(t) is the hazard of codification, that is, the publication of a vernacular dictionary, for ethnic group i at time t. This hazard is parameterized by whether ethnic groups acquired the printing press (the β's), a vector of covariates γX_i, and a set of fixed effects δ for empires, geography, and religion. h_0(t) is the baseline hazard, which is left unspecified and drops out of the estimation. In addition to the Cox model, I estimate the same specifications with logistic regression as robustness checks.Footnote 26
Table 3 documents regression estimates for the Cox and logistic models, each in four sets of specifications: (1) the bivariate model; (2) the model with demand-side covariates; (3) the model with supply-side covariates (the demand-side ones included); and (4) the fully-specified model that includes all the controls and fixed effects. The bivariate model suggests that if ethnic groups acquire a printing press in the homeland, their hazard of standardizing the vernacular is 8.66 times that of groups without access to the technology (the hazard ratio is obtained as exp(2.159) in Model 1). The result holds when the demand-side covariates are introduced. The magnitude of the hazard ratio is attenuated to 4.95 in Model 3, with the founding of a university significantly and positively correlated with language standardization. Other specifications exhibit a similar pattern. The supply-side scenario indicates that rough terrain and war have a significantly negative association with language standardization, although the substantive effect of the press remains approximately the same at 4.29 (Model 5). Finally, the estimated effect of the press is robust to a host of fixed effects on imperial rule, religion, and geographical region. In all specifications, the adoption of print technology is significantly and positively correlated with vernacular codification, and the result is robust to the inclusion of a host of covariates and fixed effects.
∗ = Significant at the 10 percent level.
∗∗ = Significant at the 5 percent level.
∗∗∗ = Significant at the 1 percent level.
Notes: Robust standard errors clustered by ethnic groups for all models. Intercept in the logistic models is not reported. Western Europe and Catholic are used as the reference category, respectively, for geographic region fixed effects and for religion fixed effects so they are omitted. Full results are reported in the Online Appendix.
Source: See the Empirical Strategy section.
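The conversion between Cox coefficients and the hazard ratios quoted in the discussion of Table 3 is a one-line calculation. The sketch below uses only figures stated in the text (the coefficient 2.159 and the hazard ratios 4.95 and 4.29); the function names are illustrative.

```python
import math


def hazard_ratio(coefficient):
    """Hazard ratio implied by a Cox coefficient: exp(coefficient)."""
    return math.exp(coefficient)


def coefficient_from_ratio(ratio):
    """Cox coefficient implied by a hazard ratio: log(ratio)."""
    return math.log(ratio)


# Bivariate model: the text reports a coefficient of 2.159.
bivariate_ratio = hazard_ratio(2.159)
print(round(bivariate_ratio, 2))  # 8.66

# Coefficients implied by the hazard ratios reported for Models 3 and 5.
implied_model_3 = coefficient_from_ratio(4.95)
implied_model_5 = coefficient_from_ratio(4.29)
```

A hazard ratio above one corresponds to a positive coefficient and one below one to a negative coefficient, which is why the sign of the coefficient can be read directly as the direction of the effect.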
A second hypothesis is that the early acquisition of the printing press is crucial for the early codification of the vernacular. To test this proposition, I create counterfactual scenarios where ethnic groups acquired the press at different times and estimate the cumulative hazard for language standardization. I set three scenarios at 150-year intervals (1500, 1650, and 1800 ce) and estimate the cumulative hazard from the fully-specified Cox model. Cumulative hazard rates denote the total amount of risk that has been accumulated over time. This exercise shows that the advantage of early adoption becomes discernible over time. The cumulative hazard rate reaches 0.2 for early adopters (presses by 1500), meaning that the accumulated risk of language standardization is estimated at 0.2 for ethnic groups that acquired print by 1500. The estimated cumulative hazard is 0.16 for “mid”-adopters (presses by 1650) and 0.14 for late adopters (presses by 1800). Though the differences in magnitude are not large, the simulation offers two insights. First, early technology adoption can trigger knowledge accumulation. Vernacular printing leaves a written record of cultural practices for ethnic groups. The stock of knowledge in a unique language grows over time through access to concrete, verifiable information in printed material (Mokyr Reference Mokyr2002, Reference Mokyr, Aghion and Durlauf2005). Printing technology can facilitate cultural innovation by lowering the costs of writing novels, mythologies, and history, which in turn consolidates language. Second, this simulation suggests that increasing efficiency in technology may not permit quick catch-up in cultural rationalization for late adopters. Although the costs associated with printing are expected to have fallen over this period, the challenge of assembling knowledge on culture does not appear to be quickly overcome.
Estimates from these scenarios indicate that the timing of acquiring the enabling technology matters for understanding language standardization.
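To make the cumulative hazard figures easier to read, they can be translated into implied event probabilities with the standard survival-analysis identity S(t) = exp(-H(t)), so that the probability of the event by time t is 1 - exp(-H(t)). The sketch below applies that identity to the three cumulative hazards reported in the text; the resulting probabilities are an illustration and are not reported in the article.

```python
import math


def event_probability(cumulative_hazard):
    """Probability that the event has occurred, given cumulative hazard H."""
    return 1.0 - math.exp(-cumulative_hazard)


# Cumulative hazards reported in the text, keyed by adoption scenario.
scenarios = {1500: 0.20, 1650: 0.16, 1800: 0.14}
probabilities = {
    year: event_probability(hazard) for year, hazard in scenarios.items()
}

for year, probability in probabilities.items():
    print(year, round(probability, 3))
```

For small values the cumulative hazard and the implied probability are close, which is why the ordering across the three scenarios is unchanged by the conversion.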
Robustness Checks: The Role of the State
Regression estimates have so far supported my hypotheses that the adoption of printing technology is positively correlated with language standardization and that the timing of adoption is crucial. However, there are two endogeneity concerns about my causal claims. One is the impact of independent statehood. This channel represents the selection process, in which ethnic groups with access to an independent state were disproportionately endowed with resources and capabilities that accelerate language standardization. This process could have begun before 1400 ce, the year when my analysis begins, in which case print adoption would be endogenous to it. For instance, although the literature suggests that the spread of the press depended primarily on skilled workers (Cipolla Reference Cipolla1972), an early start on building state capacity in taxation or rule enforcement can determine the demand for the printing press, because the technology can make governance more efficient. It can thus be imagined that the acquisition of print technology is a function of having an independent state. To address this concern, I employ the following empirical strategy. I begin by constructing an alternative data set with the state as the unit of analysis (n = 47). Within this sample, I recode the printing variable by identifying its first adoption year for any city in each state. I then reestimate the impact of printing technology on vernacular codification. For states with multiple official languages, I take the first publication date of a vernacular dictionary for each tongue and use it as an alternative outcome (my list has up to four official languages). The Online Appendix offers the list of the new sample with key variables. Table 4 presents the Cox regression estimates for the state sample.
∗ = Significant at the 10 percent level.
∗∗ = Significant at the 5 percent level.
∗∗∗ = Significant at the 1 percent level.
Notes: Robust standard errors clustered by ethnic groups for all models. Western Europe and Catholic are used as the reference category, respectively, for geographic region fixed effects and for religion fixed effects so they are omitted. Full results are reported in the Online Appendix.
Source: See Figure 1 for the sources of the printing press.
All models include the covariates from the fully-specified model. The impact of the first printing press on language standardization remains significantly positive. The magnitude of coefficients is stable across alternative lists of official languages and much greater than that in the ethnic-group sample. This is unsurprising, because the vast majority of the 47 states have a recorded date of print adoption and have standardized the vernacular. At the same time, if unobserved state-related forces determine language standardization, the impact of print on those ethnic groups without access to a state is expected to be statistically indistinguishable from zero. Column 5 of Table 4 reports estimates of Cox regression for these groups (n = 124). Although the substantive effect of print technology is much smaller than in the state-only sample, the results remain similar to Table 3. This exercise suggests that while the selection process in which the adoption of the press is a function of having a state may be at work, unobserved state capacity is unlikely to determine the chances of language standardization.
Robustness Checks: Human Capital's Impact
The second endogeneity concern is the influence of human capital on the development of the vernacular. While urban potential is controlled for as a proxy for economic growth, a more specific channel through human capital, or individual-level literacy, may account for language standardization. The literature on economic history and on modern-era growth holds that strong cognitive skills can enhance economic well-being.Footnote 27 An increase in literacy is hypothesized to have a spillover effect on investment in the vernacular for greater efficiency in communication, writing, and business transactions among ethnic-group members. In the context of early-modern Europe, a major driver of human capital is the Protestant Reformation (Becker and Woessmann Reference Becker and Woessmann2009, Reference Becker, Woessmann and McCleary2011; Boppart, Falkinger, and Grossmann Reference Boppart, Falkinger and Grossmann2014; Cantoni, Dittmar, and Yuchtman Reference Cantoni, Dittmar and Yuchtman2016).Footnote 28 The religious movement prodded lay followers to read the Bible in the vernacular. Literate reformers also issued church ordinances to reinforce lay literacy (Dittmar and Meisenzahl Reference Dittmar and Meisenzahl2016). Recent research demonstrates that (early) access to the printing press played a critical role, because campaigners for religious reform would take advantage of the technology's capacity to mass-produce, disseminate information, and canvass support through the vernacular Bible, ordinances, and broadsheets (Rubin Reference Rubin2014). If the human capital hypothesis is correct, ethnic groups that observe Protestantism as the primary religion are expected to exhibit high literacy, which in turn is positively correlated with language standardization. The adoption of the printing press may be endogenous to this process.
This article has already addressed the impact of the Protestant Reformation on print and vernacular codification by including the religion fixed effects and taking the shorter distance from an ethnic group homeland to either Wittenberg or Zürich. Yet these measures may be too broad to capture the human capital channel precisely.
To account for this channel, I construct the vernacular Bible indicator. It is an appropriate proxy for a source of human capital in early-modern Europe, because it captures the Protestant advocacy of lay literacy built on access to printing technology. This variable takes the first publication date (year) of the vernacular translation of the Bible drawn from Ethnologue, compiled by Lewis, Simons, and Fennig (Reference Lewis, Simons and Fennig2013). The data are supplemented by Price (Reference Price1998). As with the dictionary publication, the year 2000 is used as the cutoff. My strategy is, first, to estimate the impact of the vernacular Bible in the fully-specified Cox and logistic models minus the print variable (i.e., Columns 7 and 8 in Table 3, respectively). Second, I reintroduce the print variable with the Bible variable included. If the human capital channel determines the chances of language standardization, the vernacular Bible should be positively associated with language standardization. Moreover, the substantive impact of the printing press is expected to be statistically indistinguishable from zero, while the Bible's effect should be retained.
Table 5 documents the role of human capital in language standardization. As before, all models are fully-specified ones including the fixed effects. Table 5 shows that the vernacular Bible is significantly and positively correlated with vernacular codification in all models. In Model 1, ethnic groups with the Bible have 1.7 times greater chances of language standardization than those without it. When the press is included in Model 3, the magnitude is attenuated (with the chances of the event now 1.59 times greater) but stays positive. Yet printing technology exhibits a much greater magnitude in the same model, in which the press increases the chances of language standardization fourfold. Even when compared to fully-specified Model 7 of Table 3, the hazard ratio for print drops only slightly, by 0.26. Although the human capital channel may account for why ethnic groups standardize the vernacular, the printing press hypothesis remains robust and exhibits a greater impact.
∗ = Significant at the 10 percent level.
∗∗ = Significant at the 5 percent level.
∗∗∗ = Significant at the 1 percent level.
Notes: Robust standard errors clustered by ethnic groups for all models. Intercept in the logistic models is not reported. Western Europe and Catholic are used as the reference category, respectively, for geographic region fixed effects and for religion fixed effects so they are omitted. Full results are reported in the Online Appendix.
Source: See Figure 1 for the sources of the printing press and see Lewis, Simons, and Fennig (Reference Lewis, Simons and Fennig2013) and Price (Reference Price1998) for the vernacular Bible.
To disentangle the regression results from Table 5, it is useful to revisit the sequence of historical events. The invention of movable type preceded not just the Protestant Reformation but also Bible translation for all cases in my observations. Many European ethnic groups enjoyed vernacular print earlier than the translation of the Bible promoted by Protestants. Model 3 of Table 5 shows that the hazard ratio of the press is 2.53 points greater than that of the Bible, pointing to the importance of earlier adoption of the press for European ethnic groups. The chronological order matters when considering the differences in the magnitude of these different channels.
Distance to Mainz as an Instrumental Variable
The regression analysis thus far documents the positive association between the printing press and language standardization, which is robust to the inclusion of a host of covariates under various scenarios. However, as previously mentioned, the distribution of the printing press is not random. Unobserved (pre-press) characteristics may drive the technology's spread or jointly determine press adoption and language standardization. To account for this endogeneity concern, I follow Jeremiah E. Dittmar (Reference Dittmar2011) and Jared Rubin (Reference Rubin2014) in exploiting the exogenous variation of the geographical distance to Mainz, Germany, as an instrumental variable for the homelands of European ethnic groups.
Distance to Mainz is an ideal instrument for my argument. Such an instrument should be correlated with the printing press, but not with unobserved determinants of vernacular dictionaries. In other words, it should affect the outcome variable only through the proposed causal mechanism. The distance-to-Mainz variable satisfies these criteria. To begin, it has been shown that the Gutenberg press diffused through Europe in roughly a concentric-circle fashion. This describes not only the spread of the press, but also the patterns of human interactions in early-modern Europe more generally. Dittmar (Reference Dittmar2011) points out that Gutenberg and his collaborators jealously shielded the proprietary knowledge of the technology. Only nearly a century after the invention was the earliest known manual on the metal type published (Dittmar Reference Dittmar2011, p. 1140). Geographical proximity thus offers greater chances of accessing information. More broadly, premodern times were characterized by the “small world,” in which the distribution of technology or information occurred in a concentric-circle manner. Using a mathematical model, Seth A. Marvel, Travis Martin, Charles R. Doering, et al. (Reference Marvel, Martin and Doering2013) show that the bubonic plague, which caused the Black Death and killed roughly one-third to one-half of the European population in the mid-fourteenth century, spread in this manner. Hence, the closer an ethnic group's homeland is to Mainz, the more likely it is to adopt the printing press. Figure 5, albeit not perfect, broadly supports this statement.
The second rationale for using this instrument is the absence of a theoretical connection between Mainz and language standardization. No known ethnic groups have identified Mainz as their homeland city or chosen it by its geographical proximity to Mainz; nor did the city play a role in the diffusion of the vernacular use. Similarly, Rubin (Reference Rubin2014) demonstrates that Mainz was not a political, economic, or religious center before the printing press. That Mainz was an ordinary town also implies that it is unlikely to be connected to war or economic growth. The invention may have fueled the pace of growth, but the location of Mainz is unlikely to predict it. Recent empirical works also exploit the city's exogeneity in their estimation methods for outcomes such as the Protestant Reformation and economic growth in early-modern Europe (Dittmar Reference Dittmar2011; Rubin Reference Rubin2014).
To use the distance to Mainz as the instrument, I estimate an IV probit model with the following system of equations:
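One way to write the two-stage system is sketched here. The notation is assumed for illustration and is not taken from the source: $\text{Press}_i$ is press adoption for ethnic group $i$, $\text{DistMainz}_i$ is the distance from the group's homeland to Mainz, $\mathbf{X}^{(1)}_i$ and $\mathbf{X}^{(2)}_i$ are the stage-specific control vectors, and $\text{Dictionary}_i$ indicates vernacular codification by the relevant cutoff year.

```latex
\begin{align}
\text{Press}_i &= \alpha_0 + \alpha_1\,\text{DistMainz}_i
                 + \mathbf{X}^{(1)\prime}_i \boldsymbol{\gamma} + \nu_i
                 && \text{(first stage, OLS)} \\
\Pr(\text{Dictionary}_i = 1) &= \Phi\!\left(\beta_0 + \beta_1\,\widehat{\text{Press}}_i
                 + \mathbf{X}^{(2)\prime}_i \boldsymbol{\delta}\right)
                 && \text{(second stage, probit)}
\end{align}
```

Here $\widehat{\text{Press}}_i$ stands for the fitted value from the first equation, and $\beta_1$ is the coefficient of interest.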
I use ordinary least squares (OLS) to estimate the first stage, where the outcome variable is the adoption of the printing press. In the second stage, I estimate a probit regression, where Φ denotes the normal cdf, that includes the predicted values of press adoption from the first-stage model. The model includes all controls from the fully specified model in Table 3, but for the first stage I recalculate the urban potential and war frequency variables to confine their effects to the fifteenth century. I do this to see whether Mainz and its distance to the ethnic homelands were correlated with these variables prior to the Gutenberg invention. In addition, I omit the distance to Wittenberg or Zürich and replace the Protestant with Catholic fixed effects in the first stage to follow the chronological sequence of events.Footnote 29 This calibration also allows me to assess whether Mainz was an economic center or underwent conflict before or around the time when the printing press was invented. In the second stage, I use the predicted values from the first-stage regression to estimate language standardization with the same set of covariates, with the distance to Wittenberg or Zürich and the Protestant dummy reintroduced. To see the impact of the printing press on early standardization, I subset the data set according to whether vernacular codification occurred (1) by 1850, (2) by 1900, (3) by 1950, and (4) by 2000.
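The two-step logic can be illustrated on simulated data. The sketch below is not the estimation code behind Table 6. It assumes a single instrument and no controls, every variable name and parameter value is hypothetical, and the F-statistic is the classical one rather than a cluster-robust version.

```python
import math
import random


def norm_cdf(z):
    """Standard normal cdf (the Phi of the probit model)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def ols_one_regressor(x, y):
    """OLS of y on a constant and x; returns intercept, slope, classical F."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # With one regressor, the F-statistic equals the squared t-statistic.
    f_stat = slope ** 2 / (rss / (n - 2) / sxx)
    return intercept, slope, f_stat


def probit_one_regressor(x, y, max_iter=50, tol=1e-10):
    """Fisher scoring for P(y = 1) = Phi(a + b * x); returns (a, b)."""
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = a + b * xi
            p = min(max(norm_cdf(z), 1e-10), 1.0 - 1e-10)
            d = norm_pdf(z)
            score = d * (yi - p) / (p * (1.0 - p))  # gradient contribution
            w = d * d / (p * (1.0 - p))             # information weight
            g0 += score
            g1 += score * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical data-generating process: an unobserved confounder raises both
# press adoption and codification; distance affects codification only via press.
random.seed(0)
n = 2000
distance = [random.uniform(0.0, 2.0) for _ in range(n)]   # e.g. thousand km
confounder = [random.gauss(0.0, 1.0) for _ in range(n)]
press = [1.0 if 1.0 - 1.0 * d + 0.5 * u + random.gauss(0.0, 1.0) > 0 else 0.0
         for d, u in zip(distance, confounder)]
dictionary = [1.0 if -0.5 + 1.0 * p + 0.5 * u + random.gauss(0.0, 1.0) > 0 else 0.0
              for p, u in zip(press, confounder)]

# First stage: OLS (linear probability model) of press adoption on distance.
alpha0, alpha1, first_stage_f = ols_one_regressor(distance, press)
press_hat = [alpha0 + alpha1 * d for d in distance]

# Second stage: probit of codification on the first-stage fitted values.
beta0, beta1 = probit_one_regressor(press_hat, dictionary)

print("first-stage slope:", alpha1)
print("first-stage F:", first_stage_f)
print("second-stage coefficient on predicted press:", beta1)
```

A plug-in second stage of this kind yields a point estimate only; its naive standard errors ignore the fact that the fitted values are themselves estimated, so inference would require a correction or a bootstrap.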
Table 6 documents the results from the IV probit regression. It indicates that the impact of the printing press on language standardization is positive and largely significant when press adoption is instrumented with the distance to Mainz. In the first stage, the coefficient on the distance-to-Mainz variable is negative and significant: the closer an ethnic group's homeland is to Mainz, the more likely the group is to acquire print technology. This result is consistent with the empirical literature that uses this instrument. The first-stage F-statistic is approximately 25, which is above the conventional weak-instrument threshold of 10. In the second stage, the coefficient on the printing press is consistently in the expected direction. For the 1850 and 1900 cutoffs, it is positive but not significant. This is largely because of the lack of variation in the outcome variable: by 1850 there were only seven vernacular dictionaries and by 1900, 27. Standard errors shrink as the outcome data for each period become richer. Seventy-five percent of vernacular codification took place during the twentieth century. The coefficients on the print variable after 1900 become significant in the second-stage regression, reflecting the underlying data. Notably, the magnitude remains quite large: access to printing technology increases the probability of producing vernacular dictionaries by 400–470 percentage points in the twentieth century. Despite the lack of data for early periods, the instrumental-variables approach provides additional support for my hypothesis that the printing press predicts language standardization for European ethnic groups.
∗ = Significant at the 10 percent level.
∗∗ = Significant at the 5 percent level.
∗∗∗ = Significant at the 1 percent level.
Notes: Robust standard errors clustered by ethnic groups. IV probit estimation: first stage is OLS, second stage is probit, regressed on predicted values from the first stage. In the first stage, urban potential and war frequency are for the fourteenth centuries. Western Europe is used as the reference category for region fixed effects and thus omitted. In the second stage, Catholic is used as the reference category for religion fixed effects and thus omitted. Full results are reported in the Online Appendix.
Source: See the Empirical Strategy section.
CONCLUSION
I have systematically investigated the association between the printing press and the standardization of the vernacular for European ethnic groups from a long-term historical perspective. I argued that the Gutenberg press substantially reduced the cost of access to information, thus enabling vernaculars to be more widely used and eventually to displace Latin as the primary vehicle of written communication for political, economic, and social transactions. Moreover, I hypothesized that since language codification takes a long time, on the order of centuries, the early acquisition of the press should give ethnic groups a head start in developing their tongue. Using a new data set that I constructed, I have shown that the interval between the arrival of the press and the first vernacular dictionary averages 360 years. Statistical analysis confirms my hypothesis that print technology is positively and significantly associated with language standardization. It also supports my broader arguments that variation in cultural consolidation among ethnic groups is not solely attributable to territorial sovereignty and that historical events have a persistent impact on contemporary outcomes.
To what extent does my argument carry outside Europe? Although such an analysis is beyond the scope of this study, a few Europe-specific attributes appear relevant to language standardization. The African experience, for instance, provides a useful comparative perspective. One similarity is that, consistent with the European experience, recent empirical research documents that (early) access to technologies that enable human-capital development is positively associated with vernacular codification and other outcomes such as democracy in the long run (Cagé and Rueda 2016; Woodberry 2012). However, unlike in Europe, access to printing technology in sub-Saharan Africa beginning in the nineteenth century was largely limited by proximity to Protestant missionaries; in addition, there was no indigenous capacity to produce the movable-type press (Cagé and Rueda 2016, pp. 73, 74). A reliance on imports suggests that the cost of access to information would remain high, leaving less room for literacy in general and the development of vernacular culture more specifically. European colonialism likely reinforced this trend. By comparison, early-modern Europe offered an environment in which printers moved across the continent to spread the technology and there were multiple routes to gain access to it. This article has sought to demonstrate that such conditions were critical to the standardization of many minority tongues in Europe that have survived despite institutional centralization in modern times.