By 1900, the rail network in colonial India was the fourth largest in the world, covering almost 25,000 miles (Bogart and Chaudhary 2016). In striking contrast, public education was poorly funded and saw marginal progress under British rule. Education was an insignificant line item in the government budget: a mere 1.7 percent, compared to 21 percent for railroads, in 1881 (East India 1887). And, in 1891, only 9.6 percent of primary school-age children were in school (Chaudhary 2016). According to official opinion, demand for basic education was low in India, where children helped parents in the field (Chaudhary 2016). By increasing trade, income, and other labor market opportunities, railways may have increased demand for schooling, even in the absence of supply-side government interventions. Our paper asks whether there was a demand-driven increase in education in colonial India in response to the extension of the rail network.
Using decennial census data on literacy from 1881 to 1921, we estimate the effect of railroads on total, male, female, and English literacy at the district level. Railroad construction began in the 1850s, with 52 percent of British Indian districts connected to a railroad by 1881. This increased to 87 percent in 1901 and then 96 percent in 1921. Because changes in enumeration make literacy in the early censuses (1881 to 1901) incomparable with one another and with later censuses, we use two strategies to identify the effect of railroads. The first exploits panel-like variation across birth cohorts within a given district in a given census year. The second exploits cross-sectional variation across districts in a given census year.
Our first approach estimates the differential effect of exposure to railroads across cohorts within districts using the 1911 and 1921 censuses, years with comparable literacy data. The cohorts in our data are age bins for which the census reports literacy: 0 to 10, 10 to 15, 15 to 20, and 20 and above. Since 94 percent of districts were connected to the railroad by 1911, a simple connection indicator would barely vary, so we construct railroad exposure as the cumulative number of years a railroad was present in a district before the youngest member of the cohort of interest reached age 6, the start of primary school. Using this measure in a panel framework, we include district fixed effects, cohort × province fixed effects, and census year × province fixed effects. These rich fixed effects control for time-invariant district characteristics and allow provincial and national factors to affect cohort literacy flexibly over time.
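The within-district cohort comparison can be illustrated with a small simulation. This is a sketch, not the paper's estimation code: it assumes a linear model in which cohort literacy depends on railroad exposure plus district, cohort × province, and census year × province fixed effects, and every number in it (district count, opening years, effect size, noise) is hypothetical. The slope is recovered by least squares on a full set of dummy variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 40 districts in 4 provinces, two census years, four age bins.
N_DIST, N_PROV = 40, 4
CENSUS_YEARS = [1911, 1921]
Y_C = [0, 4, 9, 14]   # years since the youngest member turned 6: bins 0-10, 10-15, 15-20, 20+
TRUE_BETA = 0.05      # invented effect of one railroad year on the outcome

province = np.arange(N_DIST) % N_PROV
opening = rng.integers(1855, 1915, size=N_DIST)  # hypothetical opening years

records = []
for d in range(N_DIST):
    for t_idx, t in enumerate(CENSUS_YEARS):
        for c, y in enumerate(Y_C):
            exposure = max(0, int(t - opening[d]) - y)  # years of exposure before age 6
            records.append((d, province[d], t_idx, c, exposure))
d_id, p_id, t_id, c_id, x = np.array(records).T

# Simulated fixed effects and a cohort-level literacy outcome.
alpha_d = rng.normal(size=N_DIST)
gamma_cp = rng.normal(size=(len(Y_C), N_PROV))
delta_tp = rng.normal(size=(len(CENSUS_YEARS), N_PROV))
literacy = (TRUE_BETA * x + alpha_d[d_id] + gamma_cp[c_id, p_id]
            + delta_tp[t_id, p_id] + rng.normal(scale=0.01, size=x.size))

def dummies(codes):
    """One indicator column per distinct code (lstsq copes with the collinear sets)."""
    _, inverse = np.unique(codes, return_inverse=True)
    return np.eye(inverse.max() + 1)[inverse]

X = np.column_stack([
    x,
    dummies(d_id),                  # district fixed effects
    dummies(c_id * N_PROV + p_id),  # cohort x province fixed effects
    dummies(t_id * N_PROV + p_id),  # census year x province fixed effects
])
coef, *_ = np.linalg.lstsq(X, literacy, rcond=None)
beta_hat = coef[0]
print(f"estimated beta = {beta_hat:.4f} (true value {TRUE_BETA})")
```

Because the dummy sets overlap, the design matrix is rank deficient; `lstsq` still returns the unique least-squares value for the exposure slope as long as exposure is not a linear combination of the fixed effects.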
Our second, cross-sectional approach uses two instruments that exploit variation across districts in the years of railroad exposure in each census between 1881 and 1921. Building on recent techniques in the transportation literature (Redding and Turner 2015), we construct one instrument using an 1852 plan that predates railway construction and favors low-cost routes over gentle terrain rather than direct routes (Davidson 1868). Our second instrument exploits military motives for building railroads by measuring the distance between a district and a tree connecting 67 military cantonments circa 1864, before major railroad expansion began. Military cantonments were located in places at moderate elevation and away from ravines where the enemy could hide. Our exclusion restriction assumes that a district's distances to the cantonment tree and to the lines in the 1852 plan affect literacy only via railroads and are uncorrelated with unobserved determinants of literacy once we control for observable differences in geography, crop suitability, pre-railroad urbanization, and religion across districts.
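The military instrument is a distance-to-network calculation. The text says only that the cantonments are joined by "a tree"; the sketch below assumes a Euclidean minimum spanning tree built with Prim's algorithm on planar (projected) coordinates, and measures a district's distance as the shortest distance from its centroid to any tree edge. The coordinates are hypothetical, and the paper's construction may differ (for example, in its distance metric).

```python
import math

Point = tuple[float, float]

def minimum_spanning_tree(nodes: list[Point]) -> list[tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph; returns edges as index pairs."""
    n = len(nodes)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection cost to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(nodes[u], nodes[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment from a to b."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Project p onto the segment and clamp the projection to the endpoints.
    s = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2
    s = max(0.0, min(1.0, s))
    return math.dist(p, (ax + s * dx, ay + s * dy))

def distance_to_tree(p: Point, nodes: list[Point], edges: list[tuple[int, int]]) -> float:
    """Distance from p to the nearest edge of the tree."""
    return min(point_segment_distance(p, nodes[i], nodes[j]) for i, j in edges)

# Hypothetical cantonment coordinates (planar units, not real locations).
cantonments = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (10.0, 3.0)]
tree = minimum_spanning_tree(cantonments)
print(tree)                                              # [(0, 1), (1, 2), (2, 3)]
print(distance_to_tree((2.0, 1.0), cantonments, tree))   # 1.0
```

Measuring distance to tree edges rather than to the nearest cantonment captures the idea that a line joining two cantonments would pass through the districts between them.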
We find positive and significant effects of railroads on literacy, in particular male and English literacy, in the synthetic panel regressions. A one standard deviation increase in railroad exposure (17 years) increases literacy by 0.29 standard deviations for total literacy, 0.31 for male literacy, and 0.25 for male English literacy. We find small and insignificant effects on female literacy. In our cross-sectional regressions, we also find positive and significant effects of railroad exposure. Standardized coefficients suggest effect sizes ranging from 0.07 to 0.48 standard deviations, depending on the measure of literacy, the census year used, and the specific statistical model. Are these effects large? Our effect sizes are modest compared with similar estimates from the nineteenth-century United States (Atack, Margo, and Perlman 2012). The effects are also modest if we compare them to the impacts of colonial supply-side investments in education.
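Standardized effect sizes like these rescale a raw regression slope by the standard deviations of the treatment and the outcome. A minimal sketch, assuming the usual definition (slope × SD of exposure / SD of outcome): only the 17-year standard deviation of exposure comes from the text, while the raw slope and the outcome standard deviation are hypothetical placeholders.

```python
def standardized_effect(beta: float, sd_x: float, sd_y: float) -> float:
    """Effect of a one-SD change in x, expressed in SDs of y."""
    return beta * sd_x / sd_y

def implied_raw_slope(std_effect: float, sd_x: float, sd_y: float) -> float:
    """Invert the standardization: raw change in y per unit of x."""
    return std_effect * sd_y / sd_x

SD_EXPOSURE_YEARS = 17.0  # from the text
beta_raw = 0.1            # hypothetical: literacy points per railroad year
sd_literacy = 5.0         # hypothetical: SD of literacy in percentage points

print(standardized_effect(beta_raw, SD_EXPOSURE_YEARS, sd_literacy))  # about 0.34
print(implied_raw_slope(0.29, SD_EXPOSURE_YEARS, sd_literacy))        # about 0.085
```

Read in reverse, a standardized effect of 0.29 combined with a hypothetical outcome SD of 5 points would correspond to roughly 0.085 literacy points per additional railroad year.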
Why did railroads increase education? The proximate mechanism is higher school enrollment. Using data on primary and secondary school enrollment, we find that a one standard deviation increase in railroad exposure increases secondary enrollment by 0.2 to 0.55 standard deviations, with larger standardized effects in panel models than in cross-sectional regressions. We find small and insignificant effects on primary enrollment. One interpretation consistent with these results is that railroads changed not whether children initially enrolled in school, but rather how long they remained in school. Another interpretation arises from a data limitation. Many “secondary” schools were in fact elite schools with attached primary classes, so our results may be driven by higher enrollment in elite primary classes.
What deeper mechanisms link railroads with greater secondary schooling and literacy? We offer tentative answers using OLS mediation techniques (Imai, Keele, and Tingley 2010; Imai et al. 2011). Past work has shown that railroads increased agricultural income (Donaldson 2018), which, in theory, can increase literacy if schooling is a normal good. Yet, we find that agricultural income is not a significant mediator. Rather, income taxes, urbanization, and service sector employment are key mediators by which railroads increase literacy and enrollment. Because we cannot disaggregate the mediating effects of rising non-agricultural income, the relaxation of income constraints for families, and increasing returns to literacy, we view these results as suggestive evidence that railroads increased the demand for education via non-agricultural channels.
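Under linear models with no treatment-mediator interaction, the average causal mediation effect (ACME) in the Imai, Keele, and Tingley framework reduces to the product of two coefficients: the effect of treatment on the mediator and the effect of the mediator on the outcome holding treatment fixed. The sketch below shows that special case on simulated data; the single mediator, the effect sizes, and the variable names are hypothetical, and the paper's controls and inference procedures are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: treatment T (railroad years), mediator M (e.g., urbanization), outcome Y.
T = rng.uniform(0, 40, size=n)
M = 0.5 * T + rng.normal(scale=5.0, size=n)     # mediator model: a = 0.5
Y = 0.2 * T + 0.3 * M + rng.normal(size=n)      # outcome model: direct = 0.2, b = 0.3

def ols(y, *regressors):
    """OLS coefficients, with the intercept in position 0."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

a = ols(M, T)[1]                # effect of T on M
_, direct, b = ols(Y, T, M)     # direct effect of T, and effect of M holding T fixed
total = ols(Y, T)[1]            # total effect of T on Y

acme = a * b                    # mediated (indirect) effect
print(f"ACME={acme:.3f} direct={direct:.3f} total={total:.3f} share mediated={acme / total:.2f}")
```

In-sample OLS guarantees that the total effect equals the direct effect plus the product a × b, so the share mediated is just ACME divided by the total effect.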
Our paper contributes to three literatures. First, we contribute to the rich economic history literature on Indian railroads. Much has been written about the effects of colonial railroads on trade, with studies showing large effects on price convergence and income (e.g., Studer 2008; Donaldson 2018), small positive effects on city growth (Fenske, Kala, and Wei 2023), ambiguous effects on cropping patterns, and null effects on wage convergence (Collins 1999). Indian railroads have also featured in debates on colonization. Critics argue that the financing of Indian railroads delivered excessive returns to British investors, that the network benefitted colonial interests by emphasizing port-to-interior connections over interior-to-interior connections, and that railroads worsened the negative consequences of famines (Dubey 1965; Satya 2008). In this view, railroads did not industrialize India because they were built to benefit the Empire. An alternative view argues that, although railroads helped colonial interests, they had positive effects on income, and returns to British investors were not excessive (Bogart and Chaudhary 2019; Hurd 1983). We add to this literature by showing that there were positive effects on schooling, though these favored men, English literacy, and secondary enrollment.
Second, we contribute to the literature on the effects of transportation infrastructure. For example, building on classic work by Fogel (1964), Donaldson and Hornbeck (2016) find large effects of railroads on market integration and income. Donaldson (2018) finds colonial Indian railroads reduced transport costs and increased agricultural income, which in turn reduced real income volatility and mitigated the effects of famine (Burgess and Donaldson 2017). Much of this work focuses on prices, trade, income, and market integration. A notable exception is Atack, Margo, and Perlman (2012), who study U.S. school attendance in the nineteenth century. Tang (2017), similarly, looks at mortality effects of railroads in Meiji Japan, while Zimran (2020) examines impacts on stature in the United States. Our paper looks at the effects of historical railroads on literacy and enrollment, outcomes more commonly examined in work on recent transportation projects (roads and highways) rather than older projects (railroads). We show that the impacts of transportation infrastructure on human capital have not been limited to modern economies.
Third, our paper contributes to the literature on the roles of demand and supply in explaining schooling (Glewwe and Muralidharan 2016). Many papers estimate the effect of labor demand shifts on education in India, with positive effects due to outsourcing facilities (Jensen 2012) and negative effects related to the National Rural Employment Guarantee Act (Li and Sekhri 2020). These relate to larger debates on the relative efficacy of demand versus supply interventions in schooling (Banerjee and Duflo 2011). On one side, scholars argue that increasing demand for education is sufficient to increase schooling, while other scholars argue that public investments are necessary to increase education in developing countries. Our paper shows that one of the biggest infrastructure expansions, railroads, had positive effects on literacy and enrollment in India. Yet, these effects are modest, so as a means of raising education, railroads would not be cost-effective compared with increased public funding of education.
HISTORICAL BACKGROUND
Literacy in British India
As British rule spread in India, existing indigenous schools were either replaced by or incorporated into the new colonial education system. This was a slow and uneven process that was largely complete by the end of the nineteenth century (Nurullah and Naik 1951). Public education funding, which was meager in the 1850s when the British Crown took control from the East India Company, increased from 1.5 percent of the colonial budget in 1882 to 5.2 percent in 1921, a sum that was still below 1 percent of national income (Chaudhary 2016). Did the transition to colonial schooling increase literacy? Unfortunately, we cannot answer that question because there are few comparable estimates of literacy before the 1880s. We know indigenous village schools were common in the early nineteenth century. They attracted boys from different backgrounds, but few girls (Rao 2020). Yet, these accounts offer few specifics on literacy. One notable exception is the work of William Adam, a Scottish missionary. He estimated that literacy was around 4 percent (ability to read and write) to 6 percent (ability to sign) across a handful of districts in eastern India in the 1830s (Adam and Long 1868). It is hard to extrapolate from these estimates, however, because we do not know whether Adam’s districts were positively or negatively selected compared to the Indian average.
Beginning with the 1881 census, we know that male literacy increased from 6 percent in 1881 to 11 percent in 1921, while female literacy increased from under 1 percent to 2 percent (Online Appendix Table A2). Literacy increased because more children went to school. Indeed, enrollment increased faster than literacy, from one in ten children attending school in 1891 to just over one in five in 1921 (Chaudhary 2016). This is not to say people did not learn to read and write outside of formal schools. Rather, schools offered a natural venue to acquire functional literacy in a society where the majority of adults were not literate.
The increase in enrollment mirrors the increase in schools. Between 1891 and 1921, schools per 100,000 people rose by more than half, from 44 to 70, with publicly managed and funded schools more than tripling from 9 to 30, while privately managed publicly funded schools rose by more than half, from 26 to 41 per 100,000 people (Chaudhary 2010a). Public funding was used to increase the number of schools and reduce fees. Although public primary education was not free, fees were not a significant barrier for skilled workers. For example, annual primary school fees in 1900 represented less than 0.5 percent of the annual wages of a skilled laborer (Chaudhary 2016).
On the expenditure side, public funding of education was decentralized to provinces in the 1870s, with further decentralization of primary education to districts and municipalities in the 1880s. The decentralization led to big differences in public spending across provinces, for example, between the coastal provinces of Bombay and Bengal. Bombay received more public funds and built a network of publicly funded and managed schools charging low fees. Bengal, by contrast, received fewer public funds and built its network by subsidizing privately managed aided schools, which charged higher fees (Chaudhary 2010a). Other provincial systems fell in between those of Bombay and Bengal.
Yet, these differences in public spending across provinces did not translate into corresponding differences in enrollment or literacy. The coastal provinces of Bengal, Bombay, and Madras, despite the contrasting funding models of Bengal and Bombay, all had higher enrollment and literacy in each decade than the interior: male literacy averaged 20 percent, compared with 11 percent in the Central Provinces and United Provinces. Apart from regional differences, castes ranked higher in the Hindu caste hierarchy were more literate and better educated. In comparison, literacy among lower castes, also known as depressed castes or Scheduled Castes in modern India, averaged 1.6 percent. Tribal groups had even lower literacy at under 1 percent.
Against this backdrop of low but varying literacy, few scholars have looked at the effects of demand shifters in explaining levels of schooling. Much of the scholarship argues that poor public funding led to low literacy, which is a reasonable conclusion given the national patterns (Chaudhary 2016). Yet, differences in public spending alone cannot explain the differences in outcomes across and within provinces. To that end, we study whether and how railroads affected the demand for basic literacy by exploiting temporal and spatial variation within British India.
Railroads in Colonial India
In contrast to their limited investment in schooling, the British were early promoters of railroads in India, building an extensive rail network. The first passenger line opened in 1853, connecting Bombay to Thane, a distance of 20 miles. Prompted by mercantile interests in Britain, the early lines connected the ports of Bombay, Calcutta, and Madras to the interior. Given the few good roads and navigable rivers, British firms hoped railroads would lower the costs of exporting raw cotton from India and of importing British manufactured goods to new Indian markets (Thorner 1951). The British expected goods traffic to be significant and passenger traffic to be insignificant. This expectation proved wrong, with passenger traffic accounting for 35 percent of revenues.
Indian railroads were built by British firms with British financing, albeit subsidized by a guaranteed dividend backed by the Government of India (GOI). Such firms were the main players up until the 1870s, when the GOI began to build lines. This was followed by mixed public-private partnerships in the 1880s. Such partnerships were the norm until the 1920s (Sanyal 1930). Route mileage expanded quickly in the early decades, especially from 1881 to 1901. Total route miles increased from 9,890 in 1881 to 17,308 in 1891, 25,363 in 1901, 32,839 in 1911, and then 37,266 in 1921 (Bogart and Chaudhary 2016).
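The route-mileage figures in this paragraph can be turned into decadal changes to see where the expansion was fastest. A short sketch using only those numbers:

```python
# Total route miles by census year, as reported in the text (Bogart and Chaudhary 2016).
route_miles = {1881: 9_890, 1891: 17_308, 1901: 25_363, 1911: 32_839, 1921: 37_266}

def decadal_changes(miles: dict[int, int]) -> list[tuple[int, int, int, float]]:
    """Return (start year, end year, miles added, percent growth) for consecutive years."""
    years = sorted(miles)
    changes = []
    for start, end in zip(years, years[1:]):
        added = miles[end] - miles[start]
        changes.append((start, end, added, 100 * added / miles[start]))
    return changes

for start, end, added, pct in decadal_changes(route_miles):
    print(f"{start}-{end}: +{added:,} miles ({pct:.1f}% growth)")
```

On these figures, the network grew by 75.0 percent in the 1880s, 46.5 percent in the 1890s, 29.5 percent in the 1900s, and 13.5 percent in the 1910s, so proportional growth slowed in every decade after 1881.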
Figure 1 maps the spread of the network from 1881 to 1921. The important ports were connected to the interior before 1881. Many lines crossed the densely populated Indo-Gangetic plain, with fewer interior lines on the Deccan plateau. Early proposals, such as the Kennedy plan in 1852, called for lines parallel to the coast in order to economize on costs. Some of these lines were never built because subsequent officials opted for more expensive routes cutting through mountains (Davidson 1868). We return to the Kennedy plan below in order to construct an instrument for route placement.
Although British firms built the railroads, the GOI dictated route placement. What guided its decisions? Military, commercial, and famine concerns were cited as the main drivers in official correspondence (Hurd 1983). Following the Sikh Wars in the 1840s and the Indian Mutiny in 1857, the British were cognizant of the need to transport troops and supplies across the country at low cost. Existing transport routes were of poor quality and slow, which made it necessary to station troops at multiple locations in the event of an uprising (Parliamentary Papers 1854). On the commercial side, British merchants lobbied for Indian railroads that would connect the ports to cotton-growing regions in the interior and link the eastern and western ports to Delhi in the north. Another consideration was famine. Following devastating crop failures and famines in the 1870s, the GOI built “protective lines” in famine-prone regions. Finally, a few small lines were built connecting the plains to the hill stations. Although route placement was not random, it does not point uniformly to positive or negative selection of districts with respect to the subsequent increase in literacy. Rather, a mix of factors affected where and when railroads were built. Coastal districts with important ports were connected early, as were those in the Ganga plains. Yet, a few cotton-growing interior districts were connected before 1881, as were districts closer to Afghanistan. Neither group would be considered positively selected in terms of potential literacy gains. To address the endogeneity of railroads, we compare cohorts within districts in panel models and, among other cross-sectional models, use an instrumental variables strategy.
Conceptual Framework
To motivate the empirical exercise, we describe a simple framework linking railroad exposure to schooling in this sub-section. In colonial India, parents usually made the decision to send their children to school and to keep them in school. What affected this decision? Among rural households, cultivators and tradesmen sent their boys to the village primary school, as did some laborers. For cultivating families, the opportunity cost of child labor was higher because children helped their families, especially in the sowing and harvest seasons (Hartog 1929; Sharp 1914, 1919). This was less of a concern for tradesmen.
There was little legal compulsion to send children to school. Weak compulsory schooling laws were introduced in a few towns in the late 1910s, but they were rarely enforced (Nurullah and Naik 1951). Fees were common in primary schools, but as noted earlier, they were low. Scholarships were also available to defray the costs, though landless families were probably less aware of such opportunities than tradesmen and landed cultivators (Sharp 1914). If rural families wanted their children to continue beyond the local or neighboring village school, they would send them to a secondary school in a nearby town. Some urban schools had hostels, but it was more common for children attending an urban school to stay with extended family. Fewer than 10 percent of children in urban schools stayed in a hostel.
Learning to read, write, and count effectively was an important skill for families engaged in trade and commerce. With basic literacy, rural men could better perform their occupational tasks. They could also work as teachers and postmen in their villages, earning wages well above those of skilled labor in most provinces (Chaudhary 2016). Attending a secondary school led to further opportunities in the colonial bureaucracy, law, and professional employment. This was a draw for landed zamindar families looking to transition from rural to urban living.
Among urban households, richer families would send their children to the primary stages of middle or English high schools. Fees were higher at these schools, but they did not deter the growing demand for English education among rich and middle-class Indian families during this period (Basu 1974). Learning English was a necessary skill to secure well-paying government and other service sector jobs. Some girls would also attend urban schools, albeit with fewer job prospects. Teaching was a common occupation for literate women of poor means, most of whom would have been educated in a town. Indeed, girls from rural families were less likely to attend primary school or move to a nearby town for more schooling.
How did railroads affect this household decision-making? By reducing price dispersion and increasing trade, railroads increased agricultural incomes in colonial India (Donaldson 2018). This, in turn, would have generated income and substitution effects. Increasing agricultural incomes would lead families to consume more schooling if schooling is a normal good (the income effect). For rural families, higher incomes would also relax credit constraints on sending children, mostly boys, to higher-quality secondary schools.
Yet, rising agricultural incomes would also increase the opportunity cost of child labor (the substitution effect), leading some families, in particular cultivators and landless laborers, to send fewer children to school. More agricultural income could also lead to the expansion of rural primary schools if land taxes increased. But there is no evidence of such a supply-side response because land taxes did not increase in sync with agricultural incomes (Kumar 1983). Apart from agriculture, railroads also had small and positive effects on city growth (Fenske, Kala, and Wei 2023). This would increase the returns to education through agglomeration and more service and professional employment. While agricultural and non-agricultural channels likely affected male literacy, non-agricultural channels may have been more important for female literacy.
Unlike transportation costs, which fall immediately with the opening of a railway, many of these conceptual links from railroads to literacy take time to develop. Indeed, spillovers from railroads are likely to grow as exposure to railroads increases. In the agricultural sector, income effects operate through channels such as price discovery (Aker 2010), the formation of new links between buyers and sellers (Jensen and Miller 2018), learning about alternative crops (Munshi 2004), and learning about the return to education in agriculture (Foster and Rosenzweig 1995), all of which involve frictions that prevent them from being immediate. They require the development of new networks and relationships. Credit constraints, similarly, would ease only gradually because rural credit markets, characterized by high interest rates and scarcity (Nath 2022), may be slow to transform.
Similarly, opportunities in the non-agricultural sector would increase gradually with a railway connection. Such growth requires firms to establish, relocate, and grow, all of which takes time. The growth of the bureaucracy in any city connected to the railway would not be immediate. Nor would urbanization immediately increase the number of workers and consumers. Furthermore, if railroads contribute to agglomeration effects, they would also affect city growth rates and, hence, the growth of urban returns to education. Such conceptual links suggest that the duration of railroad exposure is perhaps better suited to capture the effect of railroads on schooling because it allows for differential effects between a place connected to a railway for one year and another connected for ten years. A simple indicator for the presence of railroads may not capture these links.
Against this background, more exposure to colonial railroads could lead to more literacy and schooling if (1) the income effect of rising agricultural incomes exceeded the substitution effect of increasing the opportunity cost of child labor, (2) rising incomes relaxed household income constraints on sending children to school, (3) railroads increased opportunities in the non-agricultural sector, such as in industry and services, and (4) rising urbanization increased the returns to education.
DATA
We construct a new district-level dataset for British India from 1881 to 1921 to test the relationship between railroads and education. Our data draw on four primary sources: (1) the decennial censuses of 1881 to 1921, which we use to measure literacy and several other control variables; (2) the 1934 edition of History of Indian Railways Constructed and in Progress, which we use to code railroad opening dates; (3) the District Gazetteers of India, which we use to code primary and secondary enrollment rates; and (4) multiple sources of Geographic Information System (GIS) data.
Measuring Literacy
The colonial census reports literacy by gender and age bins. Beginning in 1901, the census also reports English literacy. Despite the richness of these data, measuring literacy consistently over time is difficult because of changes in definition and measurement. In the 1881 and 1891 censuses, individuals were classified into three categories: literate, learning, and illiterate. Yet, enumerators were given no guidance on measuring literacy or accounting for learners apart from an age threshold (Gait 1913). Age bins also differed across provinces.
Beginning with the 1901 census, the “learning” category was dropped and literacy was reported for standard age bins: those under age 10, aged 10 to 15, aged 15 to 20, and those over age 20. A uniform definition was adopted, namely “the ability to read and write.” Yet again, enumerators were not given official guidance on measuring literacy. Some provinces used a rigorous standard, while others enumerated individuals as literate if they could sign their name (Gait 1913). It was only in 1911 that a uniform standard, the ability to read and write a short letter, was introduced. This makes literacy in the 1901 census difficult to compare with literacy in later censuses. For example, many children under age 10 were counted as learners in the 1891 census, and some children under 10 were recorded as literate in the 1901 census, but not in subsequent censuses. To get around these issues, our panel analysis uses cohort literacy from the 1911 and 1921 censuses, when literacy was uniformly measured. Our cross-sectional analysis uses total literacy in each census year.
Online Appendix Table A1 shows literacy by cohort, gender, and language from 1901 to 1921. These are crude literacy rates equal to the number of literates in each group divided by the population of that group. Men were more literate than women, though this gender gap narrowed over time. English literacy was low in absolute terms but sizeable as a share of total literacy. For example, almost 15 percent of literate individuals in 1921 were also literate in English. Most children learned to read and write in a vernacular language before learning English (Sharp 1914), so English literacy was a measure of upper-tail human capital. Online Appendix Table A2 shows total, male, and female literacy for each cross-section, while Online Appendix Figure 1 shows the distribution of total, male, and female literacy. While the distribution of literacy was highly skewed in 1881, it became more dispersed by 1921.
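The crude rates described here are simple ratios. A minimal sketch of the two quantities in this paragraph, using hypothetical counts for one group rather than census figures: the crude literacy rate, and the share of literates who are also literate in English.

```python
def crude_rate(count: int, population: int) -> float:
    """Share of a group with an attribute, e.g., literates divided by population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population

# Hypothetical counts for one group (illustrative only).
population = 200_000
literate = 14_000
literate_in_english = 2_000   # a subset of the literate

literacy_rate = crude_rate(literate, population)             # 0.07
english_rate = crude_rate(literate_in_english, population)   # 0.01
english_share_of_literates = literate_in_english / literate  # about 0.143
print(literacy_rate, english_rate, round(english_share_of_literates, 3))
```

Because English literates are a subset of literates, the English rate can be low relative to the whole population while the share of literates who know English is sizeable, which is the pattern described for 1921.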
Measuring School Enrollment
Unlike literacy, which measures the stock of human capital, enrollments capture the flow of human capital. As we expect railroads to affect the stock of literacy by increasing the flow of children into school, we complement our analysis of literacy with an analysis of school enrollment.
District enrollments are not reported in the colonial census. Rather, they are reported in many district gazetteers, which are less uniform. Nonetheless, we construct a series on primary and secondary enrollment between 1894 and 1911, the years with the most uniform data. This is an unbalanced panel, as most provinces report enrollment for a subset of years.
We measure primary school enrollment as the number of children enrolled in primary schools divided by the cohort under age 10. It averaged 6 percent in 1901 and 1911, compared to 3 percent for secondary enrollment, which we measure as the number of children in schools other than primary schools divided by the cohort aged 10 to 15. A shortcoming of these data is that many secondary schools had attached primary classes, so some primary-aged children are included in secondary enrollment. For example, 47 percent of children in English secondary schools in 1912 were in primary classes, a share that exceeded 60 percent in Assam and Eastern Bengal (Sharp 1914). Such primary classes were of higher quality than regular vernacular primary schools. As stated in Richey (1923), “The fact is that a very large percentage of the boys receiving elementary education in towns are not attending primary schools but the preparatory departments of secondary schools. It is only parents of the poorest class who send their boys to municipal primary schools” (p. 109). While this introduces some measurement error in enrollment, we are unaware of district-level enrollment data for all primary school children, regardless of school type.
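The two enrollment measures are ratios with different age denominators. The sketch below computes both from hypothetical counts (chosen to match the 6 and 3 percent averages) and illustrates the measurement issue: it reassigns an assumed share of secondary-school pupils to primary enrollment. The reassignment is a hypothetical illustration, not an adjustment made in the paper, and the 47 percent figure refers only to English secondary schools in 1912, so applying it to all secondary pupils is purely illustrative.

```python
def enrollment_rates(primary_pupils: int, secondary_pupils: int,
                     pop_under_10: int, pop_10_to_15: int,
                     attached_primary_share: float = 0.0) -> tuple[float, float]:
    """Return (primary, secondary) enrollment rates.

    attached_primary_share is a hypothetical share of pupils in secondary schools
    who were actually in attached primary classes. The default of 0.0 gives the
    unadjusted measures described in the text.
    """
    in_attached_primary = secondary_pupils * attached_primary_share
    primary_rate = (primary_pupils + in_attached_primary) / pop_under_10
    secondary_rate = (secondary_pupils - in_attached_primary) / pop_10_to_15
    return primary_rate, secondary_rate

# Hypothetical district counts.
print(enrollment_rates(6_000, 1_500, 100_000, 50_000))         # unadjusted: about (0.06, 0.03)
print(enrollment_rates(6_000, 1_500, 100_000, 50_000, 0.47))   # illustrative reassignment
```

With the illustrative 47 percent reassignment, the secondary rate in this example falls from 3 percent to about 1.6 percent, which shows why attached primary classes can make secondary enrollment partly a measure of elite primary schooling.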
Measuring Access to Railroads
To estimate the effect of railroads, we follow Fenske, Kala, and Wei (2023) to code the opening dates of railway access in each district. Following a procedure similar to Donaldson (2018), Fenske, Kala, and Wei construct a polyline shapefile of the Indian railway network with an opening date for each segment. These dates are based on the 1934 edition of History of Indian Railways Constructed and in Progress. For each listed railway line, they record the opening dates along with the beginning and end points of each line. We intersect this shapefile of railway lines with a map of modern sub-districts. Using a GIS mapping of colonial districts to these modern sub-districts, we compute the earliest year that each colonial district is connected to the railroad.
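Once the spatial intersection has been done in GIS software, the final aggregation step is a grouped minimum. The input format below is hypothetical: a list of (sub-district, segment opening year) pairs from intersecting the railway polylines with sub-district boundaries, plus a crosswalk from modern sub-districts to colonial districts. A district's connection year is the earliest opening year among segments crossing any of its sub-districts; districts with no intersecting segment remain unconnected (None).

```python
def earliest_connection(intersections, crosswalk):
    """Map each colonial district to its earliest railroad opening year, or None.

    intersections: iterable of (subdistrict_id, opening_year) pairs, one per railway
        segment crossing that sub-district (hypothetical format).
    crosswalk: dict mapping subdistrict_id -> colonial district name.
    """
    first_year = {district: None for district in sorted(set(crosswalk.values()))}
    for subdistrict, year in intersections:
        district = crosswalk.get(subdistrict)
        if district is None:
            continue  # segment lies outside the mapped colonial districts
        current = first_year[district]
        if current is None or year < current:
            first_year[district] = year
    return first_year

# Hypothetical identifiers and years.
crosswalk = {"sd1": "A", "sd2": "A", "sd3": "B", "sd4": "C"}
intersections = [("sd1", 1870), ("sd2", 1862), ("sd3", 1895), ("sd2", 1880), ("sd9", 1850)]
print(earliest_connection(intersections, crosswalk))  # {'A': 1862, 'B': 1895, 'C': None}
```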
We use the date of opening to measure the duration of railroad exposure in a district, which, as we noted earlier, better captures the conceptual links from railroads to schooling. We refine this idea further in our panel analysis and assume that years of railroad exposure affect literacy only up to the beginning of elementary school. If, for example, a railroad arrived in a district in 1893, it would not affect literacy for the cohort aged 20 and above in 1901 because they would have been age 12 and above in 1893 and would have finished primary school. So, the cohort aged 20 and above would have no exposure to railroads (coded as 0). In contrast, the arrival of railroads in 1893 would affect cohorts under age 10 in 1901 because none of them would yet have started elementary school when railroads arrived.
Since the age bins do not correspond exactly to elementary school years, we use the youngest age in each bin to measure cumulative exposure up to elementary school. Our measure is the number of years a railroad has been operating in a district minus the number of years since the youngest member of a cohort would normally have begun elementary school at age 6. Footnote 14 Denote this number of years since schooling began as y(c). For the cohort aged 20 and above, y(c) is 14; for the cohort aged 15–20, it is 9; for the cohort aged 10–15, it is 4; and for the cohort below age 10, it is 0. For cohort c with y(c) years since schooling began, in district d with a railroad that opened in year r, measured in census year t, we define our treatment measure $\text{RailroadYears}_{cdt}$ as:

$$\text{RailroadYears}_{cdt} = \max\{0,\ (t - r) - y(c)\} \tag{1}$$

so that exposure is 0 when the railroad arrived after the youngest member of the cohort had begun school, or when the district has no railroad by year t.
It may well be that railroads continue to affect schooling until a child finishes elementary school. We therefore construct a second measure: the number of years a railroad has operated in a district minus the number of years since the youngest member of a cohort would normally have finished elementary school. Footnote 15 This measure assumes railroads affect literacy up to age 12 for the index age in a cohort. In Equation (1), this is equivalent to replacing y(c) with 8 for the cohort aged 20 and above, 3 for the cohort aged 15 to 20, and 0 for the cohorts aged 10–15 and 0–10.
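Both cohort exposure measures reduce to a single floored subtraction. The sketch below is a hypothetical illustration (the function and dictionary names are ours, not the paper's): it uses the offsets y(c) listed in the text for the start-of-school measure and the replacements 8, 3, 0, 0 for the end-of-school measure, and floors exposure at 0 as in the 1893 example.

```python
# Offsets y(c): years since the youngest cohort member would have started
# (age 6) or finished (age 12) elementary school, as listed in the text.
Y_START = {"0-10": 0, "10-15": 4, "15-20": 9, "20+": 14}
Y_END = {"0-10": 0, "10-15": 0, "15-20": 3, "20+": 8}


def railroad_years(census_year, opening_year, cohort, offsets=Y_START):
    """Cumulative railroad exposure for a cohort, floored at zero.

    opening_year is None for a district with no railroad by census_year.
    """
    if opening_year is None:
        return 0
    return max(0, census_year - opening_year - offsets[cohort])


# Railroad opened in 1893, measured in the 1901 census.
for cohort in Y_START:
    print(cohort, railroad_years(1901, 1893, cohort), railroad_years(1901, 1893, cohort, Y_END))
```

Passing an offsets dictionary of zeros gives the simple years-since-connection count used in the cross-sectional analysis.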
As constructed, the two measures bound the duration of exposure to railroads and assume that railroads affect literacy only up to the beginning or end of elementary school. Our first measure assumes parents decide whether to enroll their children in school based on cumulative exposure to the railroad up to the beginning of elementary school (age 6). Our second measure assumes parents decide whether to keep their children in elementary school based on cumulative exposure to the railroad up to the end of elementary school. Footnote 16
For the cross-sectional analysis, we count the number of years a district has been connected to a railroad in each census year. As seen in Online Appendix Table A2, 52 percent of districts were connected to a railroad by 1881, increasing to 96 percent by 1921, with much of the increase occurring before 1901. Because almost all districts are connected in the later censuses, a simple connection indicator varies little, and the railroad years measure captures more of the variation across districts. For example, railroad years averaged 7.6 years across districts in 1881, increasing to 22 years in 1901 and 40 years in 1921. Footnote 17
Geographic and Socioeconomic Controls
We construct a rich set of geographic variables to control for the geographical selection of railroad exposure. We compute the latitude and longitude of each district's centroid. We control for ruggedness from Nunn and Puga (2012). We control for altitude, precipitation, temperature, slope, and suitability for growing specific crops such as cotton, dryland rice, wetland rice, and wheat, averaged over raster cells within a district. These are taken from the Food and Agriculture Organization of the United Nations’ Global Agro-Ecological Zones (FAO-GAEZ) data portal. Since proximity to the coast and rivers likely influenced railroad access, we include indicators for rivers and coastal districts as captured in Natural Earth Data’s shapefile maps of rivers and coasts. We also control for medieval ports recorded by Jha (2013). We control for the seasonality of rainfall: using historical rainfall data from Matsuura and Willmott (2018), we compute the Feng, Porporato, and Rodriguez-Iturbe (2013) entropy-based measure of seasonality. Finally, we control for the Kiszewski et al. (2004) index of the stability of malaria transmission.
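For readers unfamiliar with the seasonality measure, the sketch below follows our reading of Feng, Porporato, and Rodriguez-Iturbe (2013): the relative entropy D of the monthly rainfall distribution against a uniform distribution (in bits), scaled by annual rainfall relative to the maximum annual rainfall in the sample. The input format (12 monthly totals per district) and the choice of normalizing constant are assumptions, not details confirmed by the text.

```python
import math


def relative_entropy(monthly_rain):
    """D = sum_m p_m * log2(p_m / (1/12)), treating 0 * log(0) as 0.

    D is 0 for perfectly even rainfall and log2(12) when all rain falls in one month.
    """
    total = sum(monthly_rain)
    d = 0.0
    for r in monthly_rain:
        if r > 0:
            p = r / total
            d += p * math.log2(p * 12)
    return d


def seasonality_index(monthly_rain, max_annual_rain):
    """S = D * R / R_max: relative entropy scaled by annual rainfall R.

    max_annual_rain (R_max) is assumed to be the largest annual total in the sample.
    """
    annual = sum(monthly_rain)
    return relative_entropy(monthly_rain) * annual / max_annual_rain


# Hypothetical monsoon-style district: most rain falls June to September.
monsoon = [5, 5, 10, 15, 40, 250, 400, 350, 200, 60, 15, 5]
print(round(seasonality_index(monsoon, max_annual_rain=2500.0), 3))
```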
Apart from geography, we control for the scale of urbanization before the advent of railroads using the population of cities enumerated in Chandler and Fox (1974), circa 1850. These cities range in population from 11,000 to 580,000 across 52 districts. This controls for the tendency of more urban districts to be connected to railroads before less urban ones. We also control for the religious and caste composition of a district, including the shares of Brahmans (traditional Hindu elites), Muslims, Christians, and tribal groups. These shares are intended to capture historical differences in education among groups that may be correlated with railroad access. These data are taken from the colonial censuses.
ESTIMATION STRATEGY
Our main results exploit variation within districts and across cohorts to identify the effects of railroads on literacy. We complement this synthetic panel exercise with cross-sectional results.
Synthetic Panel
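We estimate the following model using the log of the literacy rate by census year, district, and cohort as the outcome:

$$\ln(\text{LiteracyRate}_{cdt}) = \beta\,\text{RailroadYears}_{cdt} + \theta_d + \delta_p \times \eta_t + \delta_p \times \gamma_c + \varepsilon_{cdt}$$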
In this model, $\text{LiteracyRate}_{cdt}$ is literacy for cohort c in district d and census year t. We use log literacy because the literacy rate is highly skewed, as shown in Online Appendix Figure 1. We estimate the model for t ∈ {1911, 1921} and cohorts c ∈ {0–10, 10–15, 15–20, 20+}. $\text{RailroadYears}_{cdt}$ measures the cumulative years of railroad exposure for cohort c in district d in year t.
We control for several fixed effects. First, district fixed effects, $\theta_d$, capture unobservable time-invariant district features that lead some districts to get railroads before others and that may correlate with literacy. Second, we include province × year and province × cohort fixed effects, captured by $\delta_p \times \eta_t$ and $\delta_p \times \gamma_c$, to control for provincial changes in census enumeration methods by year and cohort. Such flexible controls address most measurement concerns related to literacy and account for omitted variables at the province and cohort level that may change over time. We cluster standard errors by district to account for serial correlation within districts.
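The fixed-effects specification can be illustrated on simulated data. The sketch below is hypothetical throughout (the data-generating process, the true coefficient of 0.02, and all names are invented): it builds one row per district × cohort × census year cell, adds district, province × year, and province × cohort dummies, and recovers the coefficient on railroad years by least squares. It omits the district-clustered standard errors used in the paper; a dedicated panel regression package would normally handle both steps.

```python
import numpy as np

rng = np.random.default_rng(0)
n_districts, n_cohorts, n_years, n_provinces = 40, 4, 2, 3
province = rng.integers(0, n_provinces, size=n_districts)  # hypothetical province of each district

# One row per district x cohort x census-year cell.
d_id, c_id, t_id = (a.ravel() for a in np.meshgrid(
    np.arange(n_districts), np.arange(n_cohorts), np.arange(n_years), indexing="ij"))
p_id = province[d_id]

# Simulated exposure and log literacy with a known coefficient.
true_beta = 0.02
rail_years = rng.uniform(0, 40, size=d_id.size)
theta = rng.normal(size=n_districts)                   # district effects
prov_year = rng.normal(size=(n_provinces, n_years))    # province x year effects
prov_cohort = rng.normal(size=(n_provinces, n_cohorts))  # province x cohort effects
log_lit = (true_beta * rail_years + theta[d_id] + prov_year[p_id, t_id]
           + prov_cohort[p_id, c_id] + rng.normal(0, 0.1, size=d_id.size))


def one_hot(codes):
    """Dummy matrix with one column per distinct code."""
    _, inverse = np.unique(codes, return_inverse=True)
    return np.eye(inverse.max() + 1)[inverse]


X = np.column_stack([
    rail_years,
    one_hot(d_id),
    one_hot(p_id * n_years + t_id),    # province x year cells
    one_hot(p_id * n_cohorts + c_id),  # province x cohort cells
])
# The dummy blocks are collinear with each other; lstsq still returns the
# unique least-squares value for the (estimable) railroad coefficient.
beta_hat = np.linalg.lstsq(X, log_lit, rcond=None)[0][0]
print(round(beta_hat, 4))
```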
In this setup, we identify the effects of railroads using variation in cumulative railroad exposure across cohorts within districts over time. The key identifying assumption is that railroad exposure is uncorrelated with the error term $\varepsilon_{cdt}$. We believe this is a reasonable assumption given the flexible fixed effects included in the model. As a robustness check, we run the same analysis using only the 1901 census, controlling for district fixed effects and province × cohort fixed effects. Since this exercise uses a single census, changes across censuses in the standards used to measure literacy are not an issue.
Cross-Section
We complement the panel methods with two cross-sectional models as follows.
ORDINARY LEAST SQUARES
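We exploit the complete data from 1881 to 1921 using repeated cross-sections. For each census year t ∈ {1881, 1891, 1901, 1911, 1921}, we estimate a separate OLS regression of the following form:

$$\ln(\text{LiteracyRate}_{dt}) = \beta\,\text{RailroadYears}_{dt} + \mathbf{x}_{dt}'\pi + \delta_p + \varepsilon_{dt}$$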
In this equation, $\ln(\text{LiteracyRate}_{dt})$ is the log literacy rate in district d in year t. Unlike cohort literacy in the synthetic panel, this measure picks up adult literacy because it counts all literates regardless of age. $\text{RailroadYears}_{dt}$ is the number of years district d has had a railroad as of year t; it is 0 if the district is unconnected in t. The vector $\mathbf{x}_{dt}$ includes the GIS controls, pre-rail urbanization, and social controls described in the previous section. We also include province fixed effects, $\delta_p$. Finally, $\varepsilon_{dt}$ is the error term.
Such a regression may generate biased estimates of the causal effect of railroads. For example, if more developed districts were the first to receive railroads, our estimated effect of railroad years would be biased upward because it would conflate the effects of railroads with those of prior development. On the other hand, if famine-prone areas received access early on, our estimates would likely be biased downward. To address such endogeneity concerns, we use instrumental variables.
INSTRUMENTAL VARIABLES
We construct two instruments for $\text{RailroadYears}_{dt}$ that exploit different sources of variation. First, we build a tree spanning the 67 British military cantonments that existed in 1864, before the major expansion of railroads. If military concerns drove the placement of railroads, we expect cantonments where army troops were stationed to have received early access. Using Prim’s algorithm, we construct the minimum spanning tree, that is, the shortest tree that connects all 67 military cantonments. Figure 2 shows a map of this tree superimposed on the 1881 railway network. We then compute the distance of each district from the spanning tree and use the log of one plus this distance as an instrument for $\text{RailroadYears}_{dt}$.
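Prim’s algorithm grows the tree one vertex at a time, always adding the cheapest link from the current tree to a vertex outside it. The sketch below assumes cantonment and district-centroid coordinates are already projected to a planar system (for example, kilometres), so Euclidean distance is appropriate; the coordinates and function names are hypothetical. It builds the minimum spanning tree, measures a centroid's distance to the nearest tree edge, and takes log(1 + distance).

```python
import math


def prim_mst(points):
    """Prim's algorithm on the complete graph of points (Euclidean weights).

    Returns the tree as a list of (parent, child) index pairs. O(n^2).
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = math.dist(points[u], points[v])
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    # Projection parameter, clamped so the nearest point stays on the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def log_distance_to_tree(centroid, points, edges):
    """log(1 + distance to the nearest tree edge); needs at least one edge."""
    d = min(point_segment_distance(centroid, points[i], points[j]) for i, j in edges)
    return math.log1p(d)


cantonments = [(0, 0), (10, 0), (10, 10), (30, 0)]  # hypothetical projected coordinates
tree = prim_mst(cantonments)
print(tree)  # [(0, 1), (1, 2), (1, 3)]
print(log_distance_to_tree((5, 3), cantonments, tree))  # log(1 + 3)
```

The same point-to-segment distance, applied to the segments of a digitized polyline, would give a district's distance to the lines in the Kennedy plan.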
According to Kulkarni (1979), two factors determined the location of cantonments. First, these places had to be “suitable for outdoor training round the year” (p. 214). This favored areas at moderate elevations. Second, cantonments could not be located near ravines where an enemy could hide. Both factors reflect features of climate and geography that are plausibly exogenous to other determinants of human capital, conditional on the geographic and socio-economic controls. In addition, the northern Indian plains were more vulnerable to attack than hilly areas, which in turn motivated the establishment of cantonments there (Kulkarni 1979). Our analysis includes province fixed effects that capture this dimension of location.
Our second instrument uses Major J. P. Kennedy’s 1852 proposal for building railroads. Major Kennedy was the consulting engineer for the GOI and pushed for building low-cost railroads that, in his view, would confer innumerable benefits “to the growth of everything connected with the extension of British interests in India as well as with the industry, the wealth, and the comfort of its vast population” (Parliamentary Papers 1854, p. 3). Yet, Major Kennedy was aware of the costs of building railroads. So, he emphasized lower-cost routes connecting the ports with the interior. In particular, his plan called for a network in “strict harmony with the natural advantages” of the country. Unlike routes that would cut through the Eastern and Western coastal ranges of India, his plan called for routes that favored softer gradients, following the coast and natural topography.
Donaldson (2018) used portions of the Kennedy plan that were not implemented to construct placebo lines. In many cases, Kennedy’s routes were adopted, as seen by comparing the Kennedy plan in Figure 3 to the actual network in Figure 1. In other cases, more expensive routes were selected. Our identifying assumption is that, conditional on controls, the 1852 Kennedy plan is uncorrelated with factors that would affect literacy other than access to railroads. To construct the instrument, we convert the map of Kennedy’s proposal into a polyline shapefile and calculate the shortest distance of each district from this route. We use the log of one plus the distance to the lines in the Kennedy plan as an instrument for $\text{RailroadYears}_{dt}$. We report results using the two instruments in the same regression as well as results using each instrument individually.
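With two excluded instruments and one endogenous regressor, the model is overidentified, and two-stage least squares is the standard estimator. The sketch below simulates a hypothetical confounder that raises both railroad years and literacy, so OLS is biased upward while 2SLS recovers the assumed true coefficient of 0.02. It is an illustration only: it reports point estimates, not the robust standard errors, Kleibergen-Paap F-statistics, or Hansen tests reported in the paper, and all names and data are invented.

```python
import numpy as np


def two_stage_least_squares(y, endog, instruments, exog):
    """2SLS coefficients for y on [endog, exog], instrumenting endog.

    exog should contain a constant column. Returns coefficients with the
    endogenous regressor first. Standard errors are omitted in this sketch.
    """
    X = np.column_stack([endog, exog])
    Z = np.column_stack([instruments, exog])
    # First stage: project every regressor on the instruments and exogenous controls.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted values.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=(n, 2))   # hypothetical log distances to the tree and to the plan
u = rng.normal(size=n)        # unobserved confounder (e.g., prior development)
rail = 30 - 3 * z[:, 0] - 2 * z[:, 1] + 5 * u + rng.normal(size=n)
log_lit = 0.02 * rail + 0.3 * u + rng.normal(0, 0.1, size=n)
const = np.ones(n)

b_iv = two_stage_least_squares(log_lit, rail, z, const)[0]
b_ols = np.linalg.lstsq(np.column_stack([rail, const]), log_lit, rcond=None)[0][0]
print(round(b_ols, 3), round(b_iv, 3))
```

Closer distance lowers the instruments and raises railroad years, so both first-stage coefficients are negative, mirroring the intuition that districts near the tree or the planned lines were connected earlier.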
RESULTS
Synthetic Panel
Table 1 shows our main results on railroad exposure, which exploit variation across cohorts within districts in 1911 and 1921. Column (1) focuses on the log of total literacy, Column (2) on male literacy, and Column (3) on female literacy. In the second panel, we show results for English literacy. We report results for non-English literacy in the bottom panel. We calculate non-English literacy by subtracting English literates from total literates and dividing by the relevant population. Footnote 18
Notes: Robust standard errors clustered at the district level in parentheses. *** p<0.01, ** p<0.05, * p<0.1. The outcome is log literacy; the unit of analysis is the district × cohort × year cell. All regressions include fixed effects for district, cohort × province, and year × province.
Source: See text for details.
As seen in Table 1, the coefficient on railroad exposure is positive and significant for total and male literacy, but not for female literacy. In terms of magnitude, the standardized β coefficients (multiplying the β coefficient in Table 1 by the standard deviation of cohort railroad years, 17.63 years, and dividing by the relevant standard deviation of log literacy) range from $\frac{{0.0202 \times 17.63}}{{1.250}} = 0.29$ standard deviations for total literacy to $\frac{{0.0224 \times 17.63}}{{1.267}} = 0.31$ standard deviations for male literacy in the top panel. An alternative way to gauge magnitudes is to consider a counterfactual in which a railroad was connected ten years earlier. In this scenario, predicted literacy rates would increase, on average, by 0.20 percentage points in the 0–10 age bin, 1.38 percentage points in the 10–15 age bin, 1.91 percentage points in the 15–20 age bin, and 1.76 percentage points in the 20+ age bin.
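The standardized-coefficient arithmetic and the log-to-percentage-point conversion can be checked in a few lines. With the rounded inputs shown in the text, the total-literacy figure comes to about 0.285 and the male-literacy figure to about 0.312; the small gap from the reported 0.29 presumably reflects rounding of the published coefficient. The percentage-point helper uses a hypothetical baseline literacy rate and ignores the zero floor on cohort exposure, so it does not reproduce the reported counterfactuals, which depend on cohort-specific baselines not given here.

```python
import math


def standardized_beta(beta, sd_x, sd_y):
    """Effect of a one-SD change in x, in SDs of y."""
    return beta * sd_x / sd_y


total_std = standardized_beta(0.0202, 17.63, 1.250)  # about 0.285
male_std = standardized_beta(0.0224, 17.63, 1.267)   # about 0.312


def pp_change(beta, extra_years, baseline_pct):
    """Percentage-point change in a rate when log(rate) rises by beta * extra_years."""
    return baseline_pct * (math.exp(beta * extra_years) - 1)


# Hypothetical cohort with 5 percent literacy and 10 extra years of exposure.
example = pp_change(0.0202, 10, 5.0)
print(round(total_std, 3), round(male_std, 3), round(example, 2))
```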
We find smaller effects on male English literacy than on male non-English literacy, with standardized coefficients of 0.25 for English and 0.34 for non-English literacy. In contrast to male literacy, we find small and insignificant effects of railroads on female literacy, female English, and female non-English literacy. Footnote 19
This exercise includes the cohort aged 20 and above. Members of this cohort aged 20 began elementary school at age 6, 14 years before the census, but those aged 30 were already 16, and past elementary school, 14 years before the census. To ensure our results are not driven by such mismeasurement of cohort railroad years, we re-estimate the regressions for the cohorts under 10, 10–15, and 15–20, dropping those aged 20 and above. Any measurement error in cohort railroad exposure is smaller in these narrower age bins. As seen in Online Appendix Table A6, the results are similar, albeit with stronger effects on non-English literacy than on English literacy.
As outlined earlier, our main measure of railroad exposure sets the index age of a cohort at the beginning of elementary school (age 6). As a robustness check, we use the second measure, which sets the index age at the completion of elementary school (age 12). Online Appendix Table A7 shows these results. We find similar results, with positive and significant effects only for male literacy. The magnitudes are marginally smaller, at 0.26 standard deviations for total literacy and 0.29 standard deviations for male literacy.
An advantage of using the 1911 and 1921 censuses is the consistent enumeration of literacy across the two years. A disadvantage is that 94 percent of districts are connected by 1911. In Online Appendix Table A8, we instead exploit variation across cohorts within districts in the 1901 census. We find positive effects of railroads on male and English literacy, although the estimates for total male and non-English male literacy are smaller in magnitude and less precisely estimated than for 1911 and 1921. Increasing railroad exposure by 14 years (the standard deviation of cohort railroad years) increases male literacy by 0.095 standard deviations and male English literacy by 0.24 standard deviations. We again find small and insignificant effects on female literacy.
Cross Section
We turn next to cross-sectional results. Table 2 reports OLS estimates for each census year. While we report robust standard errors in this table, we show in Online Appendix Table A9 that our results are similar when we use Conley’s (1999) standard errors to adjust for spatial correlation in the error term. Footnote 20 Columns (1) to (3) show results for log total literacy with no controls in (1), with province fixed effects in (2), and with province fixed effects and the full set of controls in (3). In Columns (4), (5), and (6), we report results for male, female, and English literacy. Two patterns stand out. First, the estimates are positive and significant across specifications. Second, the effects are larger for female and English literacy than for male literacy in the later years.
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. These cross-sectional models include province fixed effects; social controls, namely the shares of Brahmans, Christians, Muslims, and Tribes; GIS controls, namely area, latitude, longitude, altitude, precipitation, slope, temperature, ruggedness, malaria transmission, seasonality, indicators for coastal districts, rivers, and medieval ports, and suitability for specific crops such as cotton, dryland rice, wetland rice, wheat, and tea; and the city population recorded in Chandler and Fox (1974) circa 1850.
Source: See text for details.
In Specifications (3) to (6), which include the controls and province fixed effects, standardized β coefficients range from 0.08 to 0.22 standard deviations, with those for English and female literacy at the higher end of the range. For example, the 1921 standardized coefficient for female literacy, 0.15, is larger than that for male literacy, 0.069. The effect on English literacy, 0.16 standard deviations, is also larger than that on male literacy. Footnote 21 Finally, the consistency of the estimates across the five census years from 1881 to 1921 is reassuring, as it suggests that one-time mortality shocks, such as the 1918 influenza epidemic, are not driving the results. Footnote 22
Table 3 shows the second-stage instrumental variables results using the military cantonment and 1852 Kennedy plan instruments. We show the first-stage results for each year in Online Appendix Table A12. Columns (1)–(6) correspond to the same outcomes and controls as in Table 2. The two instruments strongly predict railroad years in each census year, as indicated by the large Kleibergen-Paap F-statistic (KPF). Using the Hansen J test, we fail to reject the overidentifying restriction in the majority of specifications. Footnote 23
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. These cross-sectional models include province fixed effects; social controls, namely the shares of Brahmans, Christians, Muslims, and Tribes; GIS controls, namely area, latitude, longitude, altitude, precipitation, slope, temperature, ruggedness, malaria transmission, seasonality, indicators for coastal districts, rivers, and medieval ports, and suitability for specific crops such as cotton, dryland rice, wetland rice, wheat, and tea; and the city population recorded in Chandler and Fox (1974) circa 1850.
Source: See text for details.
Our IV results confirm our previous findings: railroads positively predict literacy. In terms of magnitude, the IV estimates are largest for English literacy, followed by female literacy and then male literacy. For example, in standardized terms, the effects of railroad years on 1901 literacy are 0.48 standard deviations for English literacy, 0.43 for female literacy, and 0.41 for male literacy. We find similar patterns in effect sizes in the other years. These estimates are larger than the OLS estimates reported in Table 2. The IV estimates are local average treatment effects (LATE): the effect of additional railroad years for districts that gained access to railroads earlier because of their proximity to military cantonments and to the lines in the 1852 Kennedy plan. Such districts are likely to include relatively isolated places that were connected incidentally because they lay on a direct line between major centers. It may well be that such isolated places benefited more from railroads, which would account for the larger effect sizes.
Discussion
Are these effects big or small? To answer this question, we first benchmark our results against those in Atack, Margo, and Perlman (2012). They estimate the effect of railroads on individual school enrollment in the United States. Their estimates suggest that increasing rail access across U.S. counties in the 1850s predicts 56 percent of the increase in mean school enrollment between 1850 and 1860 (p. 16). We find smaller effects in India. In our case, increasing exposure to railroads between 1881 and 1891 predicts 16 percent of the actual increase in literacy. Footnote 24 It may well be that infrastructure expansions have larger spillovers in more developed countries.
Another way to consider the size of these estimates is in comparison to supply interventions. Chaudhary (2010b), using causal estimates of the effect of public education spending on literacy, finds it would have cost the colonial government roughly 3 rupees to make an additional person literate. To construct a similar estimate for railroads, we have to monetize the increase in railroad years. A crude approach is to use the change in capital outlay and working expenses between the relevant years, which we obtain from Bogart and Chaudhary (2016). By this approach, the increase in railroad years between 1881 and 1891, 6.28 years, translates into 844,889,000 rupees. This increase predicts 16 percent of the increase in literacy between 1881 and 1891, or 401,245 additional literates. Dividing the cost by the number of additional literates implies a cost of around 2,100 rupees to make one additional person literate. This is a simple, illustrative back-of-the-envelope exercise. Railroads conferred many benefits on Indian society that are not captured here. What this exercise does show is that railroad effects on schooling would have had to be implausibly large for railroads to be a cost-effective strategy for expanding mass education.
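The back-of-the-envelope calculation reduces to two divisions. The sketch below uses only figures stated in the text (the monetized increase in railroad years, the predicted additional literates, and the roughly 3 rupees per literate from Chaudhary 2010b) to reproduce the cost per additional literate and its ratio to the supply-side benchmark.

```python
rail_cost_rupees = 844_889_000   # monetized increase in railroad years, 1881-1891
additional_literates = 401_245   # 16 percent of the 1881-1891 increase in literacy
supply_side_cost = 3             # rupees per additional literate (Chaudhary 2010b)

cost_per_literate = rail_cost_rupees / additional_literates  # about 2,106 rupees
ratio_to_schooling = cost_per_literate / supply_side_cost    # about 700 times as costly

print(f"{cost_per_literate:,.0f} rupees per literate; {ratio_to_schooling:,.0f}x the supply-side cost")
```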
Both the cross-sectional and synthetic panel methods, then, point to positive and significant effects of railroads on male and English literacy. Why do we find significant effects on female literacy in the cross-sectional regressions but not in the synthetic panel? First, the local average treatment effects estimated by the two approaches may differ. For instance, in the cross-sectional estimation, districts connected early to a railroad have the highest values of railroad exposure. By contrast, in the panel, some of the cohorts with the most residual exposure to railroads, net of district fixed effects, are in regions such as Dera Ghazi Khan and Chittagong, where outcomes for women have traditionally lagged those of men.
Second, in the cross-section, we allow literacy of the entire population to respond to the duration of railroad exposure, regardless of the age at which this exposure occurred; it may be that railroads had larger effects on literacy acquired later in life among women. Unlike the cross-section, our panel regressions exploit variation in cohort exposure before the beginning of elementary school and cannot capture this aspect of exposure to railroads.
Third, on statistical grounds, fixed effects approaches like ours can exacerbate attenuation bias due to measurement error. One reason literacy in general, and female literacy in particular, may be mismeasured is what Dyson (1989, p. 165) identifies as “female age shifting into the reproductive span” in the colonial censuses: women’s ages were sometimes misstated toward their main reproductive years. Since the cross-sectional regressions study total male and female literacy, they avoid measurement error in age enumeration. Further, the standard deviation of female literacy across cohorts within districts is only 0.6 percentage points, compared to a between-district standard deviation of 2.9 percentage points. Such limited within-district variation may also attenuate the synthetic panel estimates.
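One common way to compute within and between standard deviations, in the style of Stata’s `xtsum`, is sketched below: the between figure is the standard deviation of district means, and the within figure is the standard deviation of deviations from each district’s own mean. The toy data are hypothetical, and whether the paper computes its figures exactly this way is an assumption.

```python
import numpy as np


def within_between_sd(values, groups):
    """Return (within SD, between SD) of values across groups.

    between: SD of group means (one observation per group).
    within: SD of deviations from each group's own mean.
    """
    values = np.asarray(values, dtype=float)
    _, inverse = np.unique(np.asarray(groups), return_inverse=True)
    group_means = np.bincount(inverse, weights=values) / np.bincount(inverse)
    between = np.std(group_means, ddof=1)
    within = np.std(values - group_means[inverse], ddof=1)
    return within, between


# Hypothetical female literacy rates (percent) for two districts, three cohorts each.
lit = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
district = ["a", "a", "a", "b", "b", "b"]
within, between = within_between_sd(lit, district)
print(round(within, 3), round(between, 3))
```

When the within figure is small relative to the between figure, as in the paper’s 0.6 versus 2.9, a given amount of age-misreporting noise makes up a larger share of the variation that district fixed effects leave behind.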
MECHANISMS
In this section, we document the proximate mechanism through which railroads increased literacy—greater school enrollment. We then provide suggestive evidence on the deeper mechanisms linking railroads to schooling.
Enrollment
Table 4 shows the enrollment results for the panel and cross-sectional methods. As seen in the top panel, where we include district and year fixed effects, increasing exposure to railroads has a positive and significant effect only on secondary enrollment. Indeed, the coefficient on primary enrollment is negative, albeit insignificant. Railroads could increase secondary enrollment without similarly increasing primary enrollment if, for example, they had no effect on the extensive margin but raised the continuation rate into secondary education. Given that many of the secondary schools in the data combined secondary schooling with elite primary education, these results may also reflect greater enrollment in elite primary schools. What these results rule out is the interpretation that railroads led to an increase in children attending basic vernacular primary schools.
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. These cross-sectional models include province fixed effects; social controls, namely the shares of Brahmans, Christians, Muslims, and Tribes; GIS controls, namely area, latitude, longitude, altitude, precipitation, slope, temperature, ruggedness, malaria transmission, seasonality, indicators for coastal districts, rivers, and medieval ports, and suitability for specific crops such as cotton, dryland rice, wetland rice, wheat, and tea; and the city population recorded in Chandler and Fox (1974) circa 1850.
Source: See text for details.
In terms of magnitude, railroad exposure increases secondary enrollment by 0.55 standard deviations in Specification (6) of the panel analysis and by 0.42 standard deviations in the 1911 IV specification. These standardized β coefficients are larger than those for literacy in both the panel and cross-sectional models. This is unsurprising. We would expect railroads to have bigger effects on the flow of children into school than on the stock of literates because dropout rates were high: many children left primary school before completing the 3 to 4 years of schooling that educationists of the period considered necessary to become literate (Parulekar 1939).
Agricultural Income and Land Taxes
Railroads had large effects on price convergence, trade, and agricultural income in India (Donaldson 2018). Did rising agricultural incomes then mediate the link between railroads and higher literacy? In Tables 5 and 6, we conduct a mediation analysis suited to an OLS framework (Imai, Keele, and Tingley 2010; Imai et al. 2011). As with enrollment and literacy, the mediators enter these regressions in logs. Table 5 shows the mediation results for total literacy, and Table 6 shows them for secondary enrollment. Footnote 25 As seen in Specifications (1) and (2) in the top panel, the coefficient on agricultural income is small, negative, and insignificant.
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. The outcome is log literacy in the respective year. These cross-sectional models include province fixed effects; social controls, namely the shares of Brahmans, Christians, Muslims, and Tribes; GIS controls, namely area, latitude, longitude, altitude, precipitation, slope, temperature, ruggedness, malaria transmission, seasonality, indicators for coastal districts, rivers, and medieval ports, and suitability for specific crops such as cotton, dryland rice, wetland rice, wheat, and tea; and the city population recorded in Chandler and Fox (1974) circa 1850.
Source: See text for details.
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. The outcome is log secondary enrollment in the respective year. These cross-sectional models include province fixed effects; social controls, namely the shares of Brahmans, Christians, Muslims, and Tribes; GIS controls, namely area, latitude, longitude, altitude, precipitation, slope, temperature, ruggedness, malaria transmission, seasonality, indicators for coastal districts, rivers, and medieval ports, and suitability for specific crops such as cotton, dryland rice, wetland rice, wheat, and tea; and the city population recorded in Chandler and Fox (1974) circa 1850.
Source: See text for details.
In Specifications (3) and (4), we also rule out a link from agricultural income to education via public funding. Surcharges on existing land taxes were a key funding source for the district boards that managed rural primary education. While there could in theory be a positive link from railroads to agricultural income to land taxes, we find that land taxes per capita are not a significant mediator for literacy or secondary enrollment. These results are unsurprising because land taxes were revised infrequently in most parts of the country in the late nineteenth and early twentieth centuries (Kumar 1983). While railroad years are correlated with agricultural income, they are uncorrelated with land taxes. Footnote 26 There is no evidence that rising agricultural incomes mediate our railroad results.
Apart from agricultural income, railroads may have increased non-agricultural income and urbanization, which in turn would have increased the returns to education, thus linking railroads to education. We indirectly test whether increasing returns to skill, urbanization, and rising non-agricultural incomes play a mediating role by looking at income tax revenues, the urbanization share, and the share of workers in industry and services. Income taxes were assessed on non-agricultural income using a schedule that varied by income source. Since income from agriculture was not taxed, this measure captures income from industrial and professional employment. The share of non-agricultural workers and income taxes are thus both proxies for the returns to education.
Using data on income taxes from the District Gazetteers for 1901 and 1911, Specifications (5) and (6) in the top panel of Tables 5 and 6 show that income taxes have a positive and significant coefficient for both literacy and secondary enrollment. They mediate 30 percent to 46 percent of the effects of railroads on literacy and 25 percent to 43 percent of the effects on secondary enrollment. Rising non-agricultural incomes may have led to income effects and eased liquidity constraints, leading more families to “buy” schooling for their children.
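In a linear model without a treatment-mediator interaction, the mediated effect in the Imai, Keele, and Tingley (2010) framework reduces to the product of two coefficients: the effect of railroad years on the mediator, times the effect of the mediator on the outcome holding railroad years fixed. The sketch below computes that product and the share of the total effect it represents on simulated data. All names and the data-generating process are hypothetical, and it omits the simulation-based confidence intervals and sensitivity analysis that the Imai et al. approach provides.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def mediation_share(y, treat, mediator, controls):
    """Share of the total effect of treat on y that runs through mediator.

    Linear models, no treatment-mediator interaction: indirect effect = a * b.
    With OLS on a common sample, total = direct + a * b holds exactly.
    """
    C = np.column_stack([np.ones(len(y)), controls])
    total = ols(y, np.column_stack([treat, C]))[0]
    a = ols(mediator, np.column_stack([treat, C]))[0]   # treat -> mediator
    coefs = ols(y, np.column_stack([treat, mediator, C]))
    direct, b = coefs[0], coefs[1]                      # mediator -> y, holding treat fixed
    return a * b / total, total, direct, a * b


rng = np.random.default_rng(2)
n = 5000
controls = rng.normal(size=(n, 2))
rail = rng.uniform(0, 40, size=n) + controls[:, 0]
log_tax = 0.05 * rail + 0.3 * controls[:, 1] + rng.normal(0, 0.5, size=n)  # hypothetical mediator
log_lit = 0.01 * rail + 0.2 * log_tax + 0.1 * controls[:, 0] + rng.normal(0, 0.3, size=n)

share, total, direct, indirect = mediation_share(log_lit, rail, log_tax, controls)
print(round(share, 2))  # the simulated truth is 0.01 / 0.02 = 0.5
```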
In the bottom panel, we look at urbanization and the share of workers in industry and services. To measure urbanization, we compute the share of a district’s population living in towns with at least 5,000 persons, using census data on city populations from Fenske, Kala, and Wei (2023). We construct the employment shares from decennial census labor force data compiled by Fenske, Gupta, and Yuan (2022). Similar to income taxes, urbanization mediates between 38 percent and 48 percent of the effects on total literacy, but a smaller share, 9 percent to 16 percent, of the effects on secondary enrollment. Service sector employment also appears to partially mediate the results, but less so than urbanization and income taxes: it mediates anywhere from 6 percent to 16 percent of the effect of railroads on literacy and secondary enrollment. Lawyers and public administrators, among other professionals, were part of the service sector. Such workers were more educated than the rest of the population and were paid higher wages than workers in other skilled occupations. These measures do, however, conflate income effects with rising returns to education. We have no way of disentangling these channels and hence interpret these results as evidence of their joint importance.
CONCLUSION
We study the effects of railroads on Indian literacy and enrollment using district-level data from 1881 to 1921. We find positive and significant effects of railroads on male and English literacy. Our results are robust in both panel models where we exploit variation in railroad exposure across cohorts within districts and in cross-sectional models where we control for the endogeneity of railroad exposure using instrumental variables. Railroads lead to greater literacy via higher secondary enrollment. We find no evidence that agriculture is an important mediator. Rather, non-agricultural income, urbanization, and service sector employment are key mediators of the link between railroads and higher schooling.
While a large literature has estimated the contribution of railroads to economic growth, most studies using social savings or other methodologies ignore spillovers to other parts of the economy. To accurately assess the costs and benefits of railroads, which for many countries were their single most expensive public investment of the nineteenth century, we need to account for these spillovers, positive and negative. Our findings suggest the social savings estimates of Indian railways would indeed be higher if we account for their positive spillovers on human capital.
Our findings also speak to current policy debates that pit demand against supply policies to improve schooling in developing countries. Our back-of-the-envelope exercise illustrates that even a large and positive demand shock such as railways had a much smaller impact on Indian illiteracy, relative to its cost, than increasing public education investments.