Introduction
There is nowadays widespread consensus that poor state capacity is one of the most important hurdles to economic growth and to the resilience of the economic system (e.g. Acemoglu, 2005; Acemoglu and Robinson, 2013; Besley and Persson, 2009, 2010; Dincecco and Katz, 2016; Herbst, 2014; Migdal, 1988). Countries with poor state capacity have little capability to tax households and firms, provide critical public goods and services, and build the institutions necessary to respond effectively to negative shocks. Empirical studies indicate that these countries are considerably affected by the consequences of natural disasters (e.g. Kahn, 2005; Strömberg, 2007) and that they are at high health risk because of poor medical equipment and a lack of hospitals, personnel, and health robustness (e.g. Franklin, 2023).Footnote 1 Nevertheless, the evidence about the geographical distribution of the reported Covid-19 virus outbreak seems inconsistent with this theory. Both the reported infection and mortality rates were far higher during the initial outbreak in high-income countries (e.g. Chang et al., 2022; Schellekens and Sourrouille, 2020). Does this mean that the containment of the Covid-19 virus was less effective in countries with a higher level of state capacity? Or that there is some overlooked dimension of state capacity at play? What can we learn from it about the state capacity concept?
The Covid-19 outbreak was a pandemic that caused millions of casualties and put the world on hold for years. Although there are indications that the virus was already circulating undetected in 2019, the first documented cases were reported in China at the end of that year, and the virus quickly spread all over the world. By March 2020, no continent was left untouched and the World Health Organization (WHO) officially declared Covid-19 a global pandemic. While its impact was highly severe and tangible, the spread of the virus was to a large extent driven by undocumented cases, many of which were asymptomatic or likely not severely symptomatic (e.g. Li et al., 2020). These cases were difficult to detect, yet highly contagious. Poor detection of the virus therefore implies infected cases (or deaths from Covid-19) that are undocumented – in other words, a discrepancy between the true and the reported infection (or mortality) rate (under-reporting). In this paper, we develop a conceptual framework to study the institutional root of this statistical discrepancy and use several sources of data to provide evidence on the role of institutions during the first wave of the pandemic.
Specifically, we draw a connection between this detection effect (systematically documented in the medical literature) and the state capacity concept. We argue that this effect is related to the ‘information capacity’ of a state, a dimension of state capacity that is concerned with the ability to make societies ‘legible’ to the gaze of the central authority (Scott, 2009). Throughout history, countries have invested in building institutional functions aimed at gathering and processing information (Brambor et al., 2020; D'Arcy and Nistotskaya, 2017; Lee and Zhang, 2017; Scott, 2020). These include statistical agencies and other state branches devoted to collecting data through censuses (Vom Hau et al., 2023), cartographic representation (Dimitruk et al., 2021), or medical surveys. These functions are critical to helping governments calibrate and implement policy; yet, they are unequally distributed worldwide. For instance, according to the World Bank's Statistical Performance Index (SPI), which captures the capacity of national statistical systems, the United States scores 92.8 while Somalia scores 48.4 (Dang et al., 2023). A similar ranking can be retrieved from alternative indexes, such as the one provided by the STANCE project (Brambor et al., 2020).
Did these differences in information capacity impact the distribution of the reported infection (or mortality) rate? We argue that, because the Covid-19 virus was so difficult to detect, information capacity (through, for example, PCR tests, the availability of chemical reagents, infrastructure, and competent personnel) was critical for assessing the extent of the pandemic, understanding its dynamics, and crafting the appropriate responses that were essential for saving human lives. From a purely statistical point of view, the lack of information capacity generated poor detection and under-reporting, thus deflating the reported infection (or mortality) rate. Importantly, since detection is related to the level of information capacity, it follows that the under-reporting of the pandemic is not randomly distributed but is a (decreasing) function of state capacity.
In Table 1 we illustrate the two channels on which we focus in this paper and through which state capacity affects the reported infection (or mortality) rate. The first channel we consider is the containment effect. The containment effect is the result of better healthcare capacity, which a state builds over the years to achieve healthcare goals and respond to epidemiological shocks (Kandel et al., 2020; Liang et al., 2020; Serikbayeva et al., 2021). This capacity includes the provision of and access to care, through health centres and hospitals, as well as the quality of care. It also embraces the capacity to implement health-related policies, such as vaccination or treatment of diseases. The containment effect acts to reduce the Covid-19 infection rate among the population. The second channel is the detection effect. As pointed out above, this effect stems from better information capacity and increases the reporting of infected cases (or of deaths caused by the disease) thanks to a higher ability to gather and process the relevant information. Therefore, as one can see from column 3 of Table 1, state capacity is likely to exert at least two effects (we will discuss some others below) that go in opposite directions – possibly generating complex non-linear outcomes.
Table 1. Two main effects of state capacity on the reported Covid-19 cases

We bring this theory to the data by analysing the virus outbreak in Colombia during its first phase (i.e. up to June 2020) as well as cross-country data related to the pandemic. The first analysis exploits variation in the local state capacity level across Colombian municipalities. Colombia is arguably a suitable setting to study this question. First, the country has a long history of relative absence of state capacity. Before the 1930s, there was little local public good provision in most of the country. Decentralization in the 1980s favoured a systematic local public good provision, but this system triggered variability in local state capacity throughout the country (Acemoglu et al., 2015). Second, the country's healthcare system is to a great extent decentralized. During the pandemic, the Colombian central government decided to delegate the testing and detection of Covid-19 cases to local healthcare providers – a decision that, according to Laajaj et al. (2021), ‘dramatically increased the detection rate among the rich’ since the providers’ ‘quality of service […] is highly correlated with income levels’. Third, the Colombian Ministry of Health publishes a rich micro dataset on all of the country's diagnosed Covid-19 cases. Importantly, the dataset provides information on the infected person's municipality of residence, giving us the opportunity to link the information on the severity of Covid-19 with local state capacity-related data, as provided by Acemoglu et al. (2015). In the second part of the article, we complement the Colombian analysis with cross-country evidence that also allows us to use, as explanatory variables, specific measures of information capacity that have been developed in recent years. Our strategy is therefore to combine two analyses that together can provide both sufficient internal validity and a satisfactory degree of generalization.
In both analyses, we estimate a robust U-shaped relationship between levels of state capacity and the reported Covid-19 diffusion among the population, which is consistent with our conceptual framework. Not only do we find this relationship in both settings, but it also holds when we use several alternative measures of state capacity and control for other possible channels. Using direct proxies for healthcare and information capacity also allows us to identify the containment and detection effects in a regression analysis. As expected, their estimated signs are negative and positive, respectively. We also find evidence of complementarity between these two effects, which suggests that detection mechanisms can help improve the public health measures needed to contain the virus.
It is important to point out that we do not claim a strict, causal interpretation of our results. We note in fact that alternative explanations could be at play, affecting the intensity of the disease or the capacity of the state to contain or detect the virus. Mobility, local commuting, and exchanges with the rest of the world were among the factors that, according to the literature, could have explained a more severe outbreak of the pandemic (Lin et al., 2022; Price and Adu, 2022; Rahman and Thill, 2022). Moreover, while the elderly population is more vulnerable to diseases, a larger share of young people is generally associated with a more active and mobile community (Caselli et al., 2022). Finally, the stringency of the restrictions on mobility could have also accounted for different transmission rates. We try to address many of these aspects in our regression analysis. For example, in the cross-country analysis, we control for the degree of openness of an economy, the origin of the country's legal system, measures of institutional quality, the stringency index, as well as other demographic dimensions such as population density, the share of population living in an urban area, or the population age distribution. In the Colombian analysis, we also include dummies for the provincial head town and control for the distance to the closest highway as well as for the presence of criminal organizations.
Our paper speaks to several strands of literature. First, it relates to prior work that posits that state capacity is a multifaceted and significantly complex concept (e.g. Ricciuti et al., 2019; Williams, 2021). While scholars generally agree that state capacity is a multi-dimensional analytical construct, there is an ongoing debate about the best way to approach its practical manifestations (Hendrix, 2010). These works put the focus on different sets of dimensions, but there is no consensus on which aspect of state capacity should be deemed most fundamental (e.g. Centeno et al., 2017; Foa and Nemirovskaya, 2016; Lindvall and Teorell, 2016; Soifer, 2008). There is also broad consensus on the fact that whatever ‘latent’ core dimensions of state capacity one identifies, these should be considered as mutually supporting and interlinked (Hanson and Sigman, 2021). Our work shows in a real-world scenario how two different dimensions of state capacity, healthcare and information capacity, produce opposing effects on the reported Covid-19 cases, containment and detection, that generate a complex pattern that would be difficult to puzzle out without a detailed understanding of the state capacity concept.
Second, we connect to a recent, fast-growing literature on information capacity. Scholars have traditionally paid little attention to the role of information, even though information was central to Douglass North's theory of state-building (e.g. Hodgson, 2006; North, 1991). Recent works have demonstrated that the state's capacity to gather and process information is critical for state-building, as it helps governments to best calibrate policy (e.g. Brambor et al., 2020; D'Arcy and Nistotskaya, 2017; Lee and Zhang, 2017; Scott, 2009, 2020). Empirical analyses have further supported this argument, providing evidence that these aspects have practical implications for state-building in South Africa (Dimitruk et al., 2021) and in Latin American countries (Vom Hau et al., 2023). In our paper, we document the importance of information capacity in responding to the Covid-19 pandemic and underline the (overlooked) detection effect of information capacity on the reported infection (and mortality) rate.
Last but not least, our analysis contributes to improving policymakers’ understanding of a complex phenomenon such as the diffusion of Covid-19. While the fact that testing was critical is widely recognized by scholars and the general public (e.g. Brotherhood et al., 2020; Piguillem and Shi, 2022), the capacity to test has not yet been related, to the best of our knowledge, to the state capacity concept. In this paper, we inform policymakers and scholars that the level of information governments could rely on was not randomly distributed but was an increasing function of a country's information capacity level: governments with a high state capacity level had the advantage of testing (stemming from a higher level of information capacity) which, through the detection effect, unintentionally translated into a larger reported infection rate. It is likely that this informational advantage also helped these governments to make better policy decisions. Our analysis therefore indicates that the concept of state capacity and all its dimensions are important for understanding the geography of the reported distribution of the Covid-19 pandemic.
Theory and background
In this section, we provide a brief contextualization of the concept of state capacity by focusing on its role during a pandemic. We first explain why the Covid-19 pandemic is a suitable context to reflect on the state capacity concept; after that, we sketch out a theoretical framework to interpret the observed empirical pattern of the infected cases.
The Covid-19 pandemic
The onset of the Covid-19 pandemic has marked a watershed moment in global health and governance. After being first detected in late 2019 in China, the novel coronavirus rapidly crossed international borders, triggering a cascade of national and regional emergencies. By March 2020, when no continent was left untouched, the WHO officially declared Covid-19 a global pandemic. Our analysis purposely focuses on the first wave of the Covid-19 pandemic – specifically, between March and June 2020. This is the period that allows us to examine the responsiveness of states and public institutions to the unprecedented challenges posed by the pandemic and to isolate immediate institutional reactions, free from the influence of hindsight and of adaptations stemming from best practices and new technologies (such as the vaccines) that arose only in later stages. In other words, during the first wave an effective response could only rely on the level of state capacity that countries had developed throughout history – a level that is fixed in the short run.
In order to put the institutional challenges faced by governments in the right context, it is worth highlighting that the Covid-19 pandemic was different from any prior emergency countries had been called to cope with. The spread of the virus was driven by undocumented cases, many of which were asymptomatic or likely not severely symptomatic (e.g. Li et al., 2020). Responding effectively to the pandemic meant for public authorities not just acquiring information on the number of casualties, as in the aftermath of a natural disaster; it meant collecting as much information as possible on a phenomenon that was very hard to detect in order to estimate the true diffusion of the virus in the population. In other words, dealing with the pandemic required two different aspects of state capacity: healthcare capacity, which helped contain the diffusion of the virus; and information capacity, which helped understand the ‘true’ magnitude of the pandemic. Below, we review these two dimensions of state capacity in relation to Covid-19.
Healthcare capacity and the containment effect
One of the most important dimensions of state capacity is the ability of the state to provide goods and services to respond to the population's healthcare-related needs. With this goal, states have built hospitals and medical schools, established widespread health centres, and improved the quality of care. Healthcare capacity is essential to implement health-related policies, such as vaccination or treatment of diseases, and to respond effectively to epidemiological shocks, through an effective allocation of resources in the face of crises (Kahn, 2005; Keefer et al., 2011; Persson and Povitkina, 2017) and prompt containment measures aimed at mitigating the spread of viruses (Kandel et al., 2020; Liang et al., 2020; Serikbayeva et al., 2021). According to this argument, in the context of the Covid-19 pandemic, one should expect that high levels of state capacity help contain the transmission of the disease through better prevention and, more generally, healthcare-related policies – an effect which we label the ‘containment effect’. We illustrate the containment effect graphically in Figure 1a. This effect corresponds to a downward-sloping line in a plane with the level of state capacity on the x-axis and the infection rate on the y-axis. According to this channel, one extra dollar invested in state capacity is associated with a reduction in the infection rate.

Figure 1. Containment and detection effects. (a) Containment effect, (b) under-reporting due to lack of detection.
Notes: the two panels depict the pattern of the reported Covid-19 infection rate (on the y-axis) for different levels of state capacity (on the x-axis). In panel (a) we illustrate a theoretical scenario in which state capacity affects the Covid-19 infection rate only through the containment of the virus. In panel (b) we show how the lack of detection generates under-reporting of the infected cases and how this under-reporting is decreasing in the level of state capacity. The red lines illustrate two possible patterns of the reported infection rate. The dashed red line illustrates a situation in which the detection effect is mild, while the thick red line shows one in which the detection effect is substantial. The distance between the true Covid-19 infection rate (black line) and the reported infection rate (red line) is the undetected rate.
Information capacity and the detection effect
A second critical dimension of state capacity in relation to Covid-19 is information capacity. States and local institutions require accurate observations of phenomena to inform their decision-making processes. A growing body of scholarly work has been shedding light on the importance of the institutional functions devoted to collecting and processing information within the wide array of the state's capabilities (Lee and Zhang, 2017; Scott, 2020). This strand of literature emphasizes that fiscal capacity is closely linked to ‘information capacity’, defined as the ability to render societal practices ‘legible’ to the gaze of the central authority (Scott, 2009). There have even been attempts to develop measures of state capacity premised on the notion that we can proxy the overall dimension by examining how institutions handle data and information, leveraging the reliability of population and property censuses, assessing the timing and frequency of the release of statistical yearbooks, and considering the presence of a government agency with data processing tasks (Brambor et al., 2020; D'Arcy and Nistotskaya, 2017; Lee and Zhang, 2017). In a similar vein but with a narrower, purely descriptive scope, the World Bank's collection of ‘Statistical Performance Indicators’ is an attempt to portray the maturity and performance of national statistical systems.Footnote 2
In the event of a crisis, such as the Covid-19 pandemic, gathering data serves as an indispensable tool for assessing its extent, understanding its dynamics, and crafting appropriate responses. One should therefore expect that information capacity (through, for example, PCR tests, the availability of chemical reagents, infrastructure, and competent personnel) increases the detection rate of the virus, pushing up, through this channel, the reported number of Covid-19 cases; in other words, one extra dollar invested in state capacity is associated with an increase in the accuracy of the information regarding the true distribution of Covid-19 cases.
Summing-up the state capacity effects
In Figure 1b we sum up graphically the two above-mentioned effects of state capacity. If containment were the only channel through which state capacity affects the infection rate, or if information capacity were not relevant for Covid-19, one should expect to see the thick black line. If instead information capacity is important, one should also consider that its effect, detection, pushes up the reported number of Covid-19 cases by reducing the undetected rate. Therefore, one extra dollar invested in state capacity reduces the infection rate but also the under-reporting of this rate. Consider first the low state capacity areas. There, both containment and detection are low. The true infection rate is therefore high, but because of poor detection there is considerable under-reporting (marked by the vertical dashed line); therefore, the reported number of infected cases is relatively low (red curve). Consider now high state capacity areas, at the other end of the spectrum. There, both containment and detection are high. Accordingly, the distance between the true and the reported infection rates is relatively small. The interplay of these two channels can give rise to non-linear effects whose outcome is consistent, as shown in Figure 1b, with a U-shaped pattern of the reported infection rate across levels of state capacity.
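The interplay of the two effects can be sketched numerically. The snippet below is a minimal illustration, not the formal model behind Figure 1: it assumes a true infection rate that falls linearly with state capacity and a detected share that rises convexly with it, and it obtains the reported rate as their product. The function names, the functional forms, and all parameter values are hypothetical choices made only for illustration.

```python
# Illustrative sketch only: functional forms and constants are assumptions,
# not estimates or formulas taken from the paper.

def true_rate(s):
    """True infections per 1,000 inhabitants; falls with state capacity s in [0, 1] (containment)."""
    return 10.0 - 4.0 * s


def detection_share(s):
    """Share of true cases that gets detected; rises with s (detection)."""
    return 0.3 + 0.7 * s ** 3


def reported_rate(s):
    """Reported rate = true rate x detected share."""
    return true_rate(s) * detection_share(s)


# Evaluate on an evenly spaced grid of state capacity levels.
grid = [i / 100 for i in range(101)]
reported = [reported_rate(s) for s in grid]

# Index of the lowest reported rate; an interior trough signals a U shape.
trough = min(range(len(grid)), key=lambda i: reported[i])

print(f"reported rate at lowest capacity : {reported[0]:.2f}")
print(f"lowest reported rate             : {reported[trough]:.2f} at s = {grid[trough]:.2f}")
print(f"reported rate at highest capacity: {reported[-1]:.2f}")
```

Under these assumed forms the reported rate first falls, because containment dominates while detection is still weak, and then rises as detection closes the gap with the true rate. Other parameter choices can produce a monotone pattern, which is why the shape is ultimately an empirical question.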
Other channels related to state capacity
It is worth noting that there could be other channels, related to state capacity, that potentially affect the transmission rate. For instance, a long-standing tradition of strong institutions is likely to foster social trust, facilitate voluntary compliance with rules, and garner support for public activities (Andriani and Sabatini, 2015; Besley and Dray, 2024; Newton and Norris, 2000). Compliance might also be driven by a higher detection rate (and therefore by a higher information capacity level) as people may adopt more cautious behaviours as the reported infection (and mortality) rates rise (Agüero and Beleche, 2017; Laxminarayan and Malani, 2011; Philipson, 2000). Furthermore, high levels of state capacity tend to increase political stability through legitimization, thereby facilitating the enforcement and acceptance of stringent policies needed to mitigate the transmission of diseases (Andersen et al., 2016). Finally, since inequality is found to worsen the consequences of pandemic outbreaks (Guillén, 2020), we could expect better outcomes in places where state capacity is well-developed since it is generally associated with lower income inequality (Acemoglu et al., 2011; Hollenbach and Silva, 2019). Graphically, all these channels are likely to generate an upward, clockwise rotation of the black curve in Figure 1, making the containment effect steeper.
Data description and empirical strategy
We test our conceptual framework using evidence from the so-called first phase of the Covid-19 pandemic (i.e. March to June 2020) in Colombia as well as worldwide. In this section, we provide a description of the data we use.
Colombian municipality-level data
Our main piece of evidence is offered by the Colombian municipalities.Footnote 3 In our study, we focus on the cumulative number of infected cases and deaths from Covid-19 during the first phase of the pandemic. To this aim, we gather the information on Covid-19 infected cases published by the Colombian Ministry of Health (Ministerio de Salud y Protección Social).Footnote 4 Data are collected as CSV files supplying information on any (known) person infected with Covid-19 in the country. Critically, the data provide information on the municipality of residence of the infected person and on whether the person recovered from the disease or passed away. Since we are interested in matching these data with the municipality's level of state capacity, we compute the infection rate and the fatality rate at the municipality level per 1,000 inhabitants. Our sample comprises 1,018 municipalities. The average infection rate is 5.859 and the average fatality rate is 0.206.Footnote 5 Figure 2 illustrates the spatial distribution of the two rates across Colombian municipalities. The two maps in the figure make clear that there is considerable variation around the mean values, with observed standard deviations of 9.256 and 0.361, respectively.
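The rate construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the field names (`municipality`, `status`), the status label `"deceased"`, and the toy records and population figures are hypothetical and do not reflect the actual layout of the Ministry's files.

```python
from collections import Counter


def rates_per_1000(case_records, population):
    """Return infection and fatality rates per 1,000 inhabitants by municipality.

    case_records: one dict per diagnosed person (hypothetical field names).
    population:   dict mapping municipality -> number of inhabitants.
    """
    cases = Counter(r["municipality"] for r in case_records)
    deaths = Counter(
        r["municipality"] for r in case_records if r["status"] == "deceased"
    )
    # Municipalities without any recorded case get a rate of zero.
    return {
        m: {
            "infection_rate": 1000 * cases[m] / pop,
            "fatality_rate": 1000 * deaths[m] / pop,
        }
        for m, pop in population.items()
    }


# Toy data, invented for illustration only.
records = [
    {"municipality": "A", "status": "recovered"},
    {"municipality": "A", "status": "deceased"},
    {"municipality": "A", "status": "recovered"},
    {"municipality": "B", "status": "recovered"},
    {"municipality": "B", "status": "recovered"},
]
population = {"A": 2000, "B": 4000, "C": 1000}

rates = rates_per_1000(records, population)
for name, r in rates.items():
    print(name, r)
```

Both rates use the municipality's population as the denominator, so the fatality rate here is deaths per 1,000 inhabitants rather than deaths per diagnosed case.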

Figure 2. Covid-19 severity across Colombian municipalities. (a) Infection rate, (b) mortality rate.
We match data on Covid-19 cases with municipality-level data on the level of state capacity used by Acemoglu et al. (2015). The authors put emphasis on the historical relative absence of state capacity in the country as well as on its great variability. Historical differences in local state capacity levels were then further amplified by a series of reforms that made Colombia a decentralized state where many aspects of the organization of the public sphere are decided locally. Apart from police, courts, and public hospitals, all other agencies fall under the jurisdiction of the municipalities. We focus on the local capacity to raise taxes – that is, on the fiscal capacity of a municipality. Local taxes in Colombia are mainly the industry and commerce tax and the property tax, which are collected to finance public spending and state agencies.Footnote 6 To measure fiscal capacity, we use information on the diffusion of tax collection offices as a proxy for the ability of a municipality to raise the citizens’ expected cost of evading taxes (e.g. Allingham and Sandmo, 1972). We then compute the number of tax offices in a municipality per 1,000 inhabitants. Over 48% of the Colombian municipalities do not have a tax office. In the median municipality, there is one office for every 20,000 people, while, at the 95th percentile, there is one office for every 5,000 people.
We note that this measure of fiscal capacity works well at the extensive margin (i.e. whether there is an office or not);Footnote 7 however, the intensive margin may arguably capture other factors such as the size of the office – information that we do not have. Moreover, there might be other aspects that affect the local fiscal capacity level beyond the diffusion of tax collection offices. An important one is the presence of criminality, which might capture resources and undermine a municipality's capacity to raise taxes. We will try to address this concern in some robustness checks below, where we draw from Acemoglu et al. (2015) the number of guerrilla attacks between 1988 and 2004 by the Ejército de Liberación Nacional (ELN) and the Fuerzas Armadas Revolucionarias de Colombia (FARC), the two most structured insurgent groups in the country.
In addition, we will use other proxies for the local level of state capacity that capture the so-called ‘infrastructural power’ of the state (e.g. Mann, 2012). We use the number of police inspections; of police posts; of courts; of telecom offices; of post offices; of agricultural bank offices; of hospitals; and the number of jails. All these figures are gathered from Acemoglu et al. (2015) and are expressed per 1,000 inhabitants (i.e. divided by the municipality's population and multiplied by 1,000). We also employ the share of the population with access to water at home, covered by sewage, and with electric power.
In order to better interpret our results, we work with latent dimensions of state capacity, which we compute by performing a principal component analysis. We retain the first three components, whose eigenvalues exceed the rule-of-thumb threshold of 1 and which together explain 70% of the variance (see online Appendix Figure A1 and Table A2). Their correlations with the single variables are reported in Figure 3. The matrix, showing how strongly each proxy loads on these factors, allows us to link each principal component to a distinct latent variable. Namely, the first principal component is highly correlated with commercial and institutional services, since the number of post offices, local bank branches, and telecommunication facilities report the highest correlation coefficients. The second one seems to capture the provision of essential public infrastructures, such as sewage coverage and aqueduct water supply, which might also have a direct effect on the health conditions of the population. Finally, the third factor mostly correlates with variables more directly dependent on the central government, such as police inspections, hospitals, and our preferred proxy, tax collection offices per 1,000 inhabitants. Overall, the first component can be considered the expression of small-scale community services, while the second one summarizes the supply of basic public utility services. The third component, instead, might be more aligned with the standard interpretation of state capacity as the ability to impose taxation and spend its proceeds, as well as to enforce law and order.
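The retention rule and the correlation matrix described above can be sketched as follows. This is a minimal illustration using NumPy on synthetic data, assuming that the analysis is run on standardized proxies (i.e. on their correlation matrix), which the text does not state explicitly. The function name `pca_kaiser`, the six synthetic proxies, and the two latent factors that generate them are hypothetical and unrelated to the Colombian data.

```python
import numpy as np


def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    X: array of shape (n_observations, n_proxies).
    Returns all eigenvalues (descending), the explained-variance shares,
    the scores of the retained components, and their loadings, i.e. the
    correlation of each proxy with each retained component.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    keep = eigvals > 1.0                         # rule-of-thumb threshold
    explained = eigvals / eigvals.sum()

    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized proxies
    scores = Z @ eigvecs[:, keep]
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, explained, scores, loadings


# Synthetic example: six proxies driven by two independent latent factors.
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack(
    [f1 + 0.5 * rng.standard_normal(n) for _ in range(3)]
    + [f2 + 0.5 * rng.standard_normal(n) for _ in range(3)]
)

eigvals, explained, scores, loadings = pca_kaiser(X)
print("eigenvalues:", np.round(eigvals, 2))
print("variance explained by retained components:",
      round(float(explained[eigvals > 1.0].sum()), 2))
print("loadings (proxies x retained components):")
print(np.round(loadings, 2))
```

The sign of each component is arbitrary, so loadings should be read by their magnitude and by which proxies move together rather than by their sign alone.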

Figure 3. Correlation matrix of Colombian municipalities’ state capacity measures and principal component.
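The component-retention step described above can be illustrated with a minimal sketch. It is not the code used for the paper: the data are simulated, the function name `pca_kaiser` is hypothetical, and we assume the analysis is run on standardized proxies (that is, on their correlation matrix). Components with an eigenvalue above 1 are kept, and the loadings are the correlations between each proxy and each retained component, which is the kind of matrix reported in Figure 3.

```python
import numpy as np

def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    Hypothetical helper for illustration; X has one row per municipality
    and one column per state capacity proxy.
    """
    # Standardize each proxy so the PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                         # rule-of-thumb threshold
    scores = Z @ eigvecs[:, keep]                # component scores
    # Correlation between each proxy and each retained component.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep].sum() / eigvals.sum()
    return eigvals, scores, loadings, explained

# Simulated example: six proxies driven by two latent factors (assumed data).
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=n) for _ in range(3)]
)
eigvals, scores, loadings, explained = pca_kaiser(X)
print(np.round(eigvals, 2), round(explained, 2))
```

In this simulated setting two components clear the threshold because the data were built from two latent factors; with the Colombian proxies the same rule retains three.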
Country-level data
We complement the micro-analysis from Colombia with cross-country data on the magnitude of the Covid-19 outbreak, which allow us to study how the reported number of cases relates to the level of state capacity worldwide. The information on the country-level number of infected cases and deaths is retrieved from the Coronavirus Pandemic (Covid-19) section of Our World in Data (Ritchie et al., 2020). We compute the per-capita cumulative number of infected cases and deaths using population data from the World Development Indicators (WDI).Footnote 8 We collect data on 193 countries across the globe, with the average country reporting around 10 cases (and 0.17 deaths) per 1,000 inhabitants.Footnote 9
We match this information on the Covid-19 diffusion with data on state capacity provided by Besley and Persson (2011). This dataset has been extensively used by scholars and is particularly rich in the fiscal capacity dimension, supplying IMF data on taxes that cover the period 1975–2000 (information that is therefore predetermined with respect to the pandemic outbreak). The dataset reports four measures of fiscal capacity that capture the level of sophistication of the fiscal system and the country's capability of raising taxes. First, we use the share of taxes in GDP, which describes a country's government size. We then use the share of income taxes in total taxes, 1 minus the share of trade taxes in total taxes, and the difference between the share of income taxes and the share of trade taxes, the idea being that countries with limited fiscal capacity tend to raise tax revenues by taxing the products that are easiest to track, such as shipped commodities that are easily checked at the border (see e.g. Becker et al., 2020; Cantoni et al., 2019; Sánchez De La Sierra, 2020). We take the average of these shares over the period, representing the long-run level of capacity of a state.Footnote 10
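As an illustration of how the four measures relate to one another, the sketch below computes them for a single hypothetical country and averages them over 1975–2000. The function name, the dictionary keys, and the numbers are assumptions made for this example; they are not taken from the Besley and Persson dataset.

```python
from statistics import mean

def long_run_fiscal_capacity(rows, start=1975, end=2000):
    """Average the four fiscal capacity measures over [start, end].

    Each row is a dict with hypothetical keys 'year', 'tax_gdp',
    'income_tax_share' and 'trade_tax_share' (shares between 0 and 1).
    """
    window = [r for r in rows if start <= r["year"] <= end]
    if not window:
        raise ValueError("no observations in the averaging window")
    income = mean(r["income_tax_share"] for r in window)
    trade = mean(r["trade_tax_share"] for r in window)
    return {
        "tax_gdp": mean(r["tax_gdp"] for r in window),
        "income_tax_share": income,
        "non_trade_tax_share": 1.0 - trade,     # 1 minus trade taxes share
        "income_minus_trade": income - trade,   # income share minus trade share
    }

# Made-up yearly observations; the 1970 row falls outside the window.
rows = [
    {"year": 1970, "tax_gdp": 0.10, "income_tax_share": 0.20, "trade_tax_share": 0.50},
    {"year": 1980, "tax_gdp": 0.12, "income_tax_share": 0.25, "trade_tax_share": 0.45},
    {"year": 1990, "tax_gdp": 0.16, "income_tax_share": 0.35, "trade_tax_share": 0.35},
]
measures = long_run_fiscal_capacity(rows)
print(measures)
```

A country that relies mostly on trade taxes obtains a low non-trade share and a negative income-minus-trade difference even when its taxes-to-GDP ratio looks sizeable.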
To make sure that our findings are not driven by particular features of the Besley and Persson (2011) dataset, we also employ, as a robustness check, country-level fiscal data from the United Nations Government Revenue Dataset (UNU-WIDER, 2023). Similarly, we use data from the V-Dem project (Coppedge et al., 2023) to account for institutional differences in the rule of law that may confound the impact of state capacity. In further robustness checks presented below, we include the Oxford Covid-19 Stringency Index for government policies addressing the pandemic (Hale et al., 2021), as well as official figures for population density (Ritchie et al., 2023) and recent estimates of the urbanization rate provided by the European Commission (2023).
Finally, in a further analysis, we make use of two direct proxies of information capacity. The first proxy is the SPI provided by the World Bank (Dang et al., 2023). This tool comprises 51 micro indicators grouped into five macro ‘pillars’, designed to represent the performance of a country's statistical system. In practice, the World Bank's framework assigns a score to each indicator, which is then mapped to one of the key areas of evaluation of national statistical systems: (i) use of data, (ii) services, (iii) data products, (iv) sources, and (v) data infrastructures. We compute and use in the analysis the country average score between 2015 and 2019, the former being the starting year of the current aggregation method. The second, alternative proxy of information capacity comes from Brambor et al. (2020), who develop a time-varying measure of a state's information capacity by studying the evolution of civil and population registers, statistical agencies, and national censuses for a panel of countries from 1750 to date. We use the synthetic index that the authors obtain through a principal component analysis, as it is directly comparable with the World Bank's SPI, taking its average value in the period 1980–2015.
Empirical strategy
Our conceptual framework indicates that the containment and detection effects interact with each other in a non-linear fashion, generating a pattern of the reported Covid-19 infection rate (or mortality rate) which is consistent with a U-shaped function of the state capacity level when the latter effect is substantial (see Figure 1b). We test this by estimating the following specification in which variation in fiscal capacity maps into differences in the reported infection (or mortality) rate, $y_{ir}$, through the quadratic function $f(\cdot)$:

where i indexes municipalities in the Colombian analysis (countries in the cross-country analysis) and r indexes regions (continents). $f(\cdot)$ is a quadratic function of a measure of fiscal capacity described in section ‘Data description and empirical strategy’, while X is a vector of covariates that accounts for other potential mechanisms (described below for each of the two analyses). Since our source of variation is at the municipality level, we include region fixed effects, $\mu_r$, when using Colombian data, which allows us to compare municipalities that are proximate to each other. When using cross-country data, $\mu_r$ indicates continent fixed effects. $\varepsilon_{ir}$ is the error term; standard errors are robust to heteroscedasticity. Finally, we take a logarithmic transformation of our left-hand-side variables to account for their skewed distribution.
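For concreteness, one way of writing a specification that matches this description is sketched below. The symbols $S_{ir}$ (the fiscal capacity measure), $\alpha_1$, $\alpha_2$, and $\gamma$ are notation assumed here for illustration, and we write the logarithm explicitly on the left-hand side; the displayed form should be read as a reconstruction under these assumptions rather than as the exact Equation 1.

```latex
\log y_{ir} = f(S_{ir}) + X_{ir}'\gamma + \mu_r + \varepsilon_{ir},
\qquad
f(S_{ir}) = \alpha_1 S_{ir} + \alpha_2 S_{ir}^2 .
```

Under this notation, a U-shaped relationship corresponds to $\alpha_1 < 0$ and $\alpha_2 > 0$, with the turning point $S^{*} = -\alpha_1 / (2\alpha_2)$ lying inside the observed range of $S_{ir}$.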
In addition to Equation 1, in online Appendix Section C we also provide a complementary test of the individual effects of healthcare and information capacity. The test takes inspiration from Borjas (2020), who notes that the rate of infected persons in the population can be decomposed into two distinct ratios: the percentage of tested persons who are positive and the incidence of testing.Footnote 11 Borjas’ decomposition implies that the diffusion of Covid-19 in the population depends not only on the capacity to contain the spread of the virus (which should reduce the share of tested persons who are positive) but also on the capacity to test, that is, on the capacity to provide information about the true diffusion of the disease. This latter channel should relax the under-reporting problem and therefore increase the reported infection rate. We employ this logic in the following regression framework:

where $y_{ir}$ is the reported infection rate in the population and $H_{ir}$ and $I_{ir}$ are proxies of healthcare and information capacity, respectively. Our theory is consistent with $\hat{\beta}_1 < 0$ (containment effect) and $\hat{\beta}_2 > 0$ (detection effect).Footnote 12
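The logic of the decomposition and of the associated regression can be sketched as follows. The symbols $C_{ir}$ (confirmed positive cases), $T_{ir}$ (persons tested), and $N_{ir}$ (population) are notation assumed for this illustration, and the linear form with an intercept $\beta_0$ is likewise an assumption; the exact specification, including any controls and fixed effects, is the one reported in online Appendix Section C.

```latex
\frac{C_{ir}}{N_{ir}}
  = \underbrace{\frac{C_{ir}}{T_{ir}}}_{\text{positivity rate}}
    \times
    \underbrace{\frac{T_{ir}}{N_{ir}}}_{\text{testing incidence}},
\qquad
y_{ir} = \beta_0 + \beta_1 H_{ir} + \beta_2 I_{ir} + \varepsilon_{ir} .
```

Healthcare capacity is expected to act mainly on the first ratio (fewer positives among those tested) and information capacity mainly on the second (more tests per inhabitant), which is why the two coefficients are expected to carry opposite signs.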
Evidence from Colombia
We start by examining the link between state capacity and the reported Covid-19 diffusion, leveraging variation across Colombian municipalities. In this case, the main explanatory variable is the number of tax collection offices per 1,000 inhabitants, and the vector of covariates (X in Equation 1) includes a series of variables that may have an impact on the pandemic through channels that are unrelated to the state capacity dimension. These controls are: the municipality's population; the distance from the closest highway; the share of inhabitants below the poverty line; and a dummy equal to 1 if the municipality is the department's capital. We also control for the percentage of Covid cases imported from abroad. Therefore, we identify the effect of a higher tax collection capacity on the reported number of Covid-19-related cases by comparing municipalities of the same size, the same incidence of poverty, and the same degree of openness and administrative importance. Finally, we add region fixed effects, which allow us to exploit the within-region variation in state capacity while accounting for all unobserved differences across regions that may play a role in explaining the pandemic-related outcomes.
Estimation results are reported in columns 1 and 2 of online Appendix Table A4, where the dependent variables are the infection rate and the mortality rate, respectively. The estimated coefficients of the quadratic specification are both statistically significant and their signs indicate that the curve is U-shaped. To better visualize these findings, we compute marginal effects and predicted values of the dependent variables. The latter are illustrated in Figure 4, with panel (a) depicting the predicted values of the infection rate and panel (b) those of the mortality rate. The estimated shape in both graphs is seemingly U-shaped, suggesting that several dynamics related to state capacity (containment and detection) are simultaneously at play. In online Appendix Table A5, we show that these estimation results are not driven by the inclusion of the fixed effects and the controls. When we exclude them we observe, as expected, a smaller $R^2$; nevertheless, the sign and magnitude of the estimated coefficients are substantially unchanged.

Figure 4. Infection and mortality rate predicted by Colombian municipalities’ fiscal capacity. (a) Infection rate, (b) mortality rate.
Notes: the figures depict the predictions of our main specification (1) with infection rate (a) and mortality rate (b) as the dependent variable, including as controls the municipality's population in 1995, the distance from the closest highway, the share of people in poverty, and the number of Covid-19 cases imported from abroad out of 100 positive tests, as well as department fixed effects and a dummy for department capitals. 90% confidence intervals with heteroscedasticity-robust standard errors are reported.
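The estimation behind this kind of figure can be sketched on simulated data. The example below is not the paper's code: the function name `quadratic_fe_ols`, the data-generating process, and the choice of the HC1 variant of heteroscedasticity-robust standard errors are assumptions made for illustration. It regresses a (log) outcome on a capacity measure and its square with group fixed effects, then checks the sign pattern and the location of the turning point.

```python
import numpy as np

def quadratic_fe_ols(y, s, groups, controls=None):
    """OLS of y on s, s^2, optional controls and group dummies.

    Returns coefficients and HC1 robust standard errors. The full set of
    group dummies plays the role of the fixed effects (no separate intercept).
    """
    labels = np.unique(groups)
    dummies = (groups[:, None] == labels[None, :]).astype(float)
    cols = [s, s ** 2]
    if controls is not None:
        cols.extend(controls.T)                  # one 1-D array per control
    X = np.column_stack(cols + [dummies])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)       # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k) # HC1 small-sample scaling
    return beta, np.sqrt(np.diag(cov))

# Simulated data with a true U-shape and a turning point at s = 1.
rng = np.random.default_rng(42)
n = 2000
s = rng.uniform(0.0, 2.0, size=n)
groups = rng.integers(0, 5, size=n)
group_effect = np.array([0.0, 0.5, -0.3, 0.2, 0.1])[groups]
y = 1.0 - 2.0 * s + 1.0 * s ** 2 + group_effect + rng.normal(scale=0.3, size=n)

beta, se = quadratic_fe_ols(y, s, groups)
turning = -beta[0] / (2.0 * beta[1])
is_u_shaped = beta[0] < 0 < beta[1] and s.min() < turning < s.max()
print(np.round(beta[:2], 2), np.round(se[:2], 3), round(turning, 2), is_u_shaped)
```

Checking that the turning point lies inside the observed range matters because a negative linear and a positive quadratic coefficient alone are also compatible with a relationship that is monotonic over the sample.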
We note that the curve is not symmetric around some intermediate state capacity values, since the areas with the strongest institutions report the largest predicted Covid-19 impact. The explanation might be that, overall, the detection effect is particularly high, as illustrated, for example, by the thick red line in Figure 1(b). There, we explained that a large detection effect makes the under-reporting of the infected cases sizable (as illustrated by the vertical dashed lines in Figure 1(b)). In addition to this, the comparatively higher impact of Covid-19 on these high-state capacity municipalities may also be partly due to other factors that correlate with a robust administrative and institutional structure, such as the openness to international relations and the patterns of local mobility. As argued above, these factors are only imperfectly accounted for by our controls and may contribute to explaining the high rates of infections and fatalities.
To test the robustness of our results, we repeat our estimations using principal components. As explained above, these are computed from a comprehensive set of variables that proxy various dimensions of local state capacity. As before, we report the estimated coefficients of the regression analysis in the online Appendix (see Table A6) and illustrate here the related predicted values of the dependent variables. These are shown in Figures 5 and 6, where the dependent variables are the Covid-19 infection and mortality rate, respectively. As one can see, the first and third principal components display a notable U-shaped relationship with the pandemic-related measures. This is possibly explained by the fact that these are, as pointed out above, closely tied to the standard ‘fiscal’ and ‘administrative’ interpretation of state capacity. Since both display a U-shaped correlation with the reported pandemic impact across Colombian municipalities, this analysis seems to support our argument regarding the heterogeneous effects of state capacity.

Figure 5. Infection rate and Colombian municipalities’ state capacity principal components.
Notes: the figures depict the predictions of our main specification (1) with infection rate as the dependent variable and each of the first three principal components in turn as the main explanatory variable, including as controls the municipality's population in 1995, the distance from the closest highway, the share of people in poverty, and the number of Covid-19 cases imported from abroad out of 100 positive tests, as well as department fixed effects and a dummy for department capitals. 90% confidence intervals with heteroscedasticity-robust standard errors are reported.

Figure 6. Mortality rate and Colombian municipalities’ state capacity principal components.
Notes: the figures depict the predictions of our main specification (1) with mortality rate as the dependent variable and each of the first three principal components in turn as the main explanatory variable, including as controls the municipality's population in 1995, the distance from the closest highway, the share of people in poverty, and the number of Covid-19 cases imported from abroad out of 100 positive tests, as well as department fixed effects and a dummy for department capitals. 90% confidence intervals with heteroscedasticity-robust standard errors are reported.
The second principal component, instead, has a more straightforward correlation with Covid-19 statistics from Colombian municipalities, especially with regard to the mortality rate. As stressed above, the second principal component seems to capture a latent dimension linked to the provision of sanitation services. As such, we may expect this factor to be directly linked to the general health conditions of the citizens within each municipality. If the population is, by and large, healthier, it should be easier to detect Covid infections (and deaths) when these occur, whereas in less equipped municipalities they risk being mistaken for the result of other chronically prevalent diseases. The (roughly) monotonically increasing relationship between this public health-related measure and pandemic outcomes is another piece of evidence that the detection effect was significant in explaining the reported Covid-19 rates. Once again, we re-run the regressions without controls and fixed effects. The estimated coefficients, reported in Table A7, are barely affected by the exclusion of the full set of covariates as compared to the coefficients in Table A6.
Finally, as noted in section ‘Colombian municipality-level data’, our main explanatory variable might incorrectly proxy a municipality's fiscal capacity in the presence of criminal groups. In such a case, the existence of a tax collection office, taken at face value, would misleadingly suggest an effective control of the area by the state, while some criminal gang is the actual ruler of the place. This concern is particularly relevant for a country like Colombia that has been enduring armed struggles since the 1960s. To address this issue, we re-estimate Equation 1 adding the number of guerrilla attacks per capita in the 1988–2004 period to the control set. The regression results, displayed in Table A8, indicate that our findings are robust to this potential confounding factor, as the estimated coefficients for fiscal capacity are not significantly altered by the inclusion of the guerrilla-related control variable.
Cross-country evidence
In the cross-country analysis, we employ as explanatory variables the average values of four measures of fiscal capacity over the 1975–2000 period, namely: the share of taxes in GDP, the share of income taxes in total taxes, 1 minus the share of trade taxes in total taxes, and the difference between the share of income taxes and that of trade taxes. We re-estimate Equation 1 for both the infection and the mortality rate with each of these measures, controlling for the country's GDP per capita, population, the share of trade in GDP, and the shares of young (under 14 years of age) and old (over 65 years of age) people. These demographic covariates should help control for any phenomenon that may jointly affect fiscal systems and pandemic dynamics through the age structure of the population. For instance, a larger share of elderly people might, on the one hand, entail that more resources are to be drawn from the citizens in order to fund pensions and healthcare, while, on the other hand, it could also facilitate the spread of the Covid-19 virus as it can infect more fragile individuals. We also include continent fixed effects, allowing us to exploit the variation within sufficiently comparable sets of countries, and a categorical variable that expresses each country's legal origin (English, French, German, Socialist, or Scandinavian), which should account for unobservable factors that are common across similar institutional systems (e.g. La Porta et al., 1997). However, we document below that the exclusion of these controls, or the inclusion of other potentially relevant ones, does not change our results.
Figures 7 and 8 illustrate the predicted values for the infection rate and the mortality rate, respectively (see online Appendix Table A9 for the estimated coefficients). Since the number of countries in this sample is roughly one-tenth of the number of Colombian municipalities, the coefficients for the cross-country variables are in general less precisely estimated than those of section ‘Evidence from Colombia’, as shown by the wider confidence intervals (or the larger standard errors in Table A9). Nevertheless, all four state capacity proxies display the same U-shaped relationship with Covid-19 reported statistics that we found in the Colombian case. The worst-performing proxy in terms of prediction is the most general of the four, that is, the ratio between the total amount of taxes and GDP. It can be argued that this is a less accurate proxy of state capacity than the others: even countries with poor institutional arrangements could boast relatively high taxes-to-GDP numbers if most of the taxes come from easily accessible sources, such as customs duties. Accordingly, the institutional ability to tax residents’ income and collect taxes from non-trade-related sources should be better suited to proxy sophistication in raising taxes across states with heterogeneous exposures to international trade.

Figure 7. Infection rate and fiscal capacity across countries.
Notes: the figures depict the predictions of our main specification (1) with infection rate as the dependent variable and the share of taxes in GDP, the share of income taxes in total taxes, the share of non-trade-related taxes in total taxes, and the difference between the shares of income and trade taxes, respectively, as the main explanatory variable. The set of controls includes the country's population and GDP per capita, the share of people under 14 and over 65 years of age, a categorical variable expressing the country's legal origin, as well as continent fixed effects. 90% confidence intervals with heteroscedasticity-robust standard errors are reported.

Figure 8. Mortality rate and fiscal capacity across countries.
Notes: the figures depict the predictions of our main specification (1) with mortality rate as the dependent variable and the share of taxes in GDP, the share of income taxes in total taxes, the share of non-trade-related taxes in total taxes, and the difference between the shares of income and trade taxes, respectively, as the main explanatory variable. The set of controls includes the country's population and GDP per capita, the share of people under 14 and over 65 years of age, a categorical variable expressing the country's legal origin, as well as continent fixed effects. 90% confidence intervals with heteroscedasticity-robust standard errors are reported.
Finally, we show the robustness of our results by running the following checks. First, we replicate our analysis after dropping all controls and fixed effects (online Appendix Table A10). Second, we employ fiscal measures from an alternative source, namely the United Nations University World Institute for Development Economics Research Government Revenue Dataset (UNU-WIDER, 2023). Since this dataset covers a larger number of countries than Besley and Persson (2011), the sample size increases to about 160 countries (Table A11). Third, we add to the regression two measures capturing the degree of the rule of law and of constraints imposed on the executive (Table A12); this is important to account for potential manipulation of information, which is more likely to take place in non-democratic regimes. Fourth, we further control for the strictness of the anti-pandemic policies enacted by each government (Table A13). This can help us account for the differences in the spread of the virus between countries that implemented harsh measures and comparable ones whose governments took a softer stance out of political calculation. Fifth, we control for population density and urbanization rates, two factors that can explain the variability in the probability of contagion across states with similar institutional arrangements (Table A14).
Reassuringly, all these alternative specifications display results that are fairly consistent with those in Table A9, with non-trade taxes as the fiscal capacity proxy whose coefficients seem more precisely estimated. The estimated coefficients that differ the most with respect to those of the baseline specification are the ones in Table A13, where we control for the stringency of anti-Covid-19 policies. However imperfectly, government responsiveness to public health threats and the adequacy of its choices are correlated with the level of state capacity. Thus, it is likely that the slight gap with the other specifications can be ascribed to partial collinearity between the Stringency Index covariate and our main explanatory variables.
Conclusions
The Covid-19 pandemic has had a huge impact on our lives and it is still affecting aspects as diverse as the economy, health, and social relationships in all countries around the globe. While the body of scientific research on the pandemic has been massive, we still lack a precise quantitative measure of the mortality and infection rates. In this study, we contend that state capacity is an important predictor of these (reported) rates. State capacity is relevant to explain the Covid-19 outbreak for at least two reasons. First, its healthcare capacity dimension helped contain the virus diffusion through an institutional response to the emergency; second, its information capacity dimension helped policymakers collect data and detect information on the true severity of the virus outbreak. We develop a conceptual framework to explain how institutions affected the reported infection (and mortality) rates, and we present evidence from Colombia, and across countries, that is consistent with these two dimensions interacting with each other and generating non-linear effects.
The focus on the Covid-19 outbreak gives us the opportunity to dwell on the multi-faceted nature of the state capacity concept. Both within Colombia and worldwide, we observe higher reported rates in areas with high state capacity, a pattern that is difficult to explain without taking into account the role of the state as collector and processor of data. We interpret this as an increase in detection that is driven by the public authority's capacity to seek the coronavirus through PCR testing and other capabilities in processing data, that is, its information capacity. When the level of state capacity is lower, this capability declines and, even though containment also weakens, one can observe a reduction in the reported infection and mortality rates. Our results thus indicate that institutions do matter in explaining the reported geographical distribution of Covid-19, once one also incorporates in the analysis the detection effect of state capacity.
Our study has important policy implications. The under-reporting documented in our paper, while possibly temporary (or just related to the first stage of the pandemic), is likely to have had long-lasting effects through ill-informed policymaking. Many of the healthcare-related policies implemented at the time could have been different had better information been available, possibly contributing to saving human lives. The implications of poor information capacity are therefore broad, ranging from being unable to respond to an emergency to adopting the wrong policies because of imprecise information. Our work paves the way for future research on these important aspects of state-building and also warns scholars and policymakers about the complex effects of state capacity during an emergency.
It is useful to stress some limitations of our study. First, we are not able to inform policymakers on whether it is better to allocate an extra dollar to detection or to containment in order to respond effectively to an emergency. This assessment would ideally require a quasi-natural experiment design that allows one to identify the detection effect while holding fixed the impact of healthcare capacity. Second, the relationship between state capacity and the reported infection (and mortality) rates can be even more complex, including the interdependence of the two outlined effects. In the online Appendix we bring suggestive evidence of complementarity between these two effects, with better detection mechanisms helping improve the public health measures needed to address the emergency. Moreover, beyond a certain level, healthcare capacity may exhibit diminishing marginal returns, so that the marginal effectiveness of containment may decrease. All in all, our analysis highlights how state capacity is a complex concept whose effects on socioeconomic outcomes cannot be easily ascertained if not backed by an in-depth theoretical analysis.
Acknowledgements
We thank the editors and four anonymous reviewers for their relevant comments. We are also grateful to Leopoldo Ferguson, Jorge Tovar, and, especially, Massimiliano Onorato for comments and discussion.