Policy Significance Statement
Resources to combat corruption are often scarce. While tools such as audits have proven effective in some contexts, they are often costly and difficult to scale and sustain. Anticipating, with reasonable precision, where politicians are most likely to misbehave is therefore key to fighting corruption. Using data from Colombian municipalities, our approach shows how governments can use artificial intelligence tools to predict where irregularities are most likely to occur. This setup also allows us to identify the features with the greatest predictive power for anticipating misconduct, which is useful for discussions on institutional reform to curb corruption.
1. Introduction
Corruption, the misuse of public office for private gain (Svensson, 2005), has pervasive consequences for the development of countries and the well-being of their populations. Corruption affects governments’ ability to collect taxes (Olken and Pande, 2012), provide public goods and services (Olken, 2007), and correct negative externalities (Bertrand et al., 2007). The private sector is also affected by malfeasance, as firms’ production choices and investment decisions become less efficient under the higher levels of uncertainty that corruption generates (Sequeira and Djankov, 2014). Moreover, corruption acts as a barrier to entry for new firms, thus limiting market competition (Colonnelli and Prem, 2022). Not surprisingly, corrupt countries tend to have lower levels of GDP per capita, worse human capital indicators, and lower levels of trade openness and political freedom (Svensson, 2005).
Despite its relevance, fighting corruption is tough, in part because corruption is difficult to measure and detect. Traditionally, most corruption measures have been based on perception surveys, in which key actors, such as businesspeople or journalists, are asked how corrupt they consider a country to be. These measures, some of which are still in use, were largely developed to make comparisons between countries.Footnote 1 However, comparisons within countries, for example between cities, are also important, as they can inform how resources should be allocated in the fight against corruption. In recent years, there has been notable progress toward more objective and granular indicators of malfeasance. Some of these measures draw on the data contained in each country’s public procurement platform, or on the results of the audits carried out by anti-corruption agencies.
In fact, the evidence suggests that top-down accountability, whose main instrument is audits conducted by high-level agencies, is effective in the fight against corruption (Olken, 2007; Ferraz and Finan, 2008, 2011) and can have positive effects on economic performance (Colonnelli et al., 2022; Colonnelli and Prem, 2022). But audits are a costly and scarce resource in terms of time, money, and human capital. They cannot be carried out everywhere all the time, so they must be allocated efficiently if their effectiveness is to be maximized. In this context, recent advances in the field of artificial intelligence can be useful, since they allow us to anticipate where corruption is most likely to occur. Decarolis and Giorgiantonio (2020) and Gallego et al. (2021b) apply these tools to predict malfeasance in public procurement at the contract level in Italy and Colombia, respectively; Colonnelli et al. (2022) and de Blasio et al. (2020) use them to develop corruption indicators at the municipal level in Brazil and Italy, respectively; while Salles and Delles (2020) follow a cross-country approach.
In this article, we train a set of machine-learning models to predict corruption at the municipality level in Colombia.Footnote 2 For this purpose, we use the results of the prosecutions for disciplinary offenses carried out by the Office of the Inspector General against mayors of the country’s municipalities for the 2008–2011 and 2012–2015 mayoral periods. Using this indicator as the outcome variable, we train four canonical machine-learning models—Random Forests, Gradient Boosting Machine (GBM), Lasso, and Neural Networks—and ensemble their predictions using the Super Learner approach (Polley et al., 2011). These models are fed by 147 municipality-level predictors, which, in turn, are grouped into 10 categories of interest that allow us to understand which characteristics are the most important predictors of corruption. Our results show, first, that misconduct at the municipality level in Colombia can be predicted with tolerable levels of precision. The performance of our models is acceptable, reaching an accuracy of 84% and an area under the receiver-operating characteristic (ROC) curve (AUC) of 0.72, two metrics commonly used in the literature to evaluate algorithms. Second, our feature importance analysis allows us to understand which characteristics have the greatest predictive power when forecasting corruption. The results show that, in the Colombian case, variables associated with the development of the financial sector and human capital have the greatest predictive power. Variables associated with the public sector, local politics, or crime have a lower weight, despite being frequently associated with corruption. Finally, characteristics typically related to the Colombian context, such as armed conflict, illicit activities, or dependence on natural resources, have the least predictive weight.
We consider that the construction of this type of indicator, based on observable characteristics of the municipalities and cutting-edge artificial intelligence models, represents an important tool in the fight against corruption. In fact, in a recent application in the context of the economic and health crisis caused by the coronavirus disease-2019 (COVID-19) pandemic in Colombia, Gallego et al. (2020b) find that municipalities with a higher risk of corruption, according to a predicted index based on a machine-learning model, show a greater increase in the use of discretionary (non-competitive) contracts to respond to the emergency. This result is consistent with the fact that several mayors have been investigated and convicted of misusing resources in the midst of the emergency, demonstrating that corruption indexes based on machine-learning models can indeed serve to anticipate where misconduct is more likely to occur.
From a public policy perspective, this work represents an important contribution, as it illustrates how artificial intelligence can be used to allocate scarce resources. Anti-corruption audits are effective but expensive, so predicting where they will have the greatest impact may help increase government efficiency. Critical events, such as the COVID-19 pandemic, in which governments have to spend a lot in a short time, create opportunities for corruption (Gallego et al., 2021a). Therefore, these types of tools are crucial for governments to respond quickly to the transparency challenges created by such events. Additionally, compared to other studies that predict malfeasance at the contract level (e.g., Decarolis and Giorgiantonio, 2020; Gallego et al., 2021b), in this article we focus on the municipal level, which is important given the way audits are allocated in some countries. Colonnelli et al. (2022) and de Blasio et al. (2020) also predict at the municipal level, for Brazil and Italy, respectively. The advantage of our study is that, by focusing on a country like Colombia, we can analyze the role played by variables that are key in this context, such as those related to the armed conflict, drug trafficking, or certain types of natural resources.
2. Context and Data
2.1. Context
Colombia is considered a country highly affected by corruption. It ranks 96th out of 180 countries in Transparency International’s annual Corruption Perceptions Index. Furthermore, a recent opinion poll reveals that 78% of the population considers that corruption is getting worse in the country, and 29% think it is Colombia’s worst problem. A recent report on this phenomenon, based on national and regional press coverage, found that 69% of corruption cases in the country occur at the municipal level (Transparencia por Colombia, 2019).Footnote 3 This is particularly serious considering that, since the decentralization process that began to consolidate in the 1990s, mayors have come to decide over increasingly larger fractions of the public budget.Footnote 4 In fact, mayors are in charge of providing basic public services in important areas such as education, health, drinking water, and sanitation.
The Office of the Inspector General (PGN, for its Spanish acronym) is the autonomous agency in charge of monitoring the behavior of public officials so that their actions comply with the current disciplinary code. The Inspector General is elected for 4 years by the Senate from a list proposed by the President, the Supreme Court of Justice, and the Council of State. The PGN is empowered to initiate, carry out, and rule on investigations conducted against public servants for disciplinary offenses (Cetina et al., 2020). These investigations may be prompted by press reports, the PGN’s own audits, reports from other agencies, or tip-offs (Martinez, 2019). After opening a case, the officials assigned to it verify and analyze the information received, assess the facts and the responsibility of the people involved, and, if they find sufficient evidence, file charges. The PGN is independent of the three branches of government in Colombia. However, unlike in other countries, the assignment of anti-corruption audits is not random.
Investigations can end in sanctions and removal from office of members of the executive, including mayors, whenever misconduct is proved. According to Transparencia por Colombia (2019), the four most common offenses in recent years are embezzlement (18%), wrongful awarding and signing of contracts (13%), falsification of a public document (12%), and conspiracy to commit a crime (11%). The case of Samuel Moreno, elected mayor of Bogota for the 2008–2012 period, is illustrative. Moreno was suspended and removed from office by the Office of the Inspector General in 2012, accused of corruption in public procurement, in particular in the construction of a public transport trunk line. Moreno is currently in prison, serving a 30-year sentence for these crimes.Footnote 5
2.2. Measuring corruption
Measuring corruption is challenging (Olken, 2007; Olken and Pande, 2012), all the more so for every district of a particular country.Footnote 6 Rather than relying on the perceptions of citizens or key actors—a common approach to measuring corruption championed by international organizations—we follow Colonnelli et al. (2022) and Gallego et al. (2021b) and use a machine-learning approach to predict corruption based on factual corruption detections and observable characteristics of municipalities.
Using information from the Office of the Inspector General of Colombia originally collected by Martinez (2019), we construct a dummy variable indicating whether the mayor of each municipality was prosecuted by this anticorruption agency in the 2008–2011 or 2012–2015 mayoral periods. We define this measure as our outcome variable and combine it with a large set of municipal features to train four machine-learning models: Random Forests, GBM, Lasso, and Neural Networks.
Note that our outcome variable corresponds to PGN prosecutions for violations of the disciplinary code. It is important to highlight several aspects of this measure. First, according to Martinez (2019), in 95% of the cases in his database where the outcome of the prosecution is observed, the investigation leads to a disciplinary sanction. Therefore, this measure really captures instances of misbehavior on the part of public servants, mostly mayors (70%). Second, unfortunately, with the information available we cannot identify which investigations correspond to major offenses and which to minor ones. However, at the municipal level and for the period of interest of this study, the correlation between having an official found guilty and having one removed from office is positive, highly significant, and large in magnitude (Pearson ρ = 0.73). Since this is a purely predictive exercise, this high correlation implies that detecting places where an official is likely to be found guilty also means it is highly likely that an official committed a serious offense against the code. Nonetheless, an important caveat of our analysis is that, although the dependent variable certainly captures the misconduct of public servants, it does not always measure severe acts of corruption.
Finally, this measure may also suffer from what the machine-learning literature calls the selective labels problem. As described above, the decision of when and whom to investigate is up to the PGN, with no randomness in the process. This decision is therefore not free from bias, since investigators could go after cases in which more resources are involved, be intimidated or bribed by illegal actors, or simply make errors of judgment. As a result, the observed outcomes are not necessarily a random sample of the population, which makes the predictive exercise more complicated.Footnote 7 However, it is reassuring that the results presented below, both in terms of the performance of the models and the feature importance, are not substantially different from those of Colonnelli et al. (2022), who use data from proven cases of corruption detected after random audits in Brazil. An interesting avenue for future research—which is outside the scope of this study—is to use the contraction method proposed by Lakkaraju et al. (2017), which “facilitates effective evaluation of predictive models even in the presence of unmeasured confounders (unobservables) which influence both human decisions and the resulting outcomes.” In any case, we acknowledge that these predictive exercises will be more accurate when audits are randomly assigned, which is in itself an important policy recommendation.
2.3. Covariates
We use a total of 147 municipality-level predictors, grouped into 10 categories and measured over the electoral period before the one in which the outcome was measured. The categories are as follows: financial sector, conflict, crime, human capital, local politics, public sector, local demographics, economic activity, illegal activity, and natural resources. The financial sector category includes per capita measures of financial sector employees, bank deposits, bank credits, housing credits, and bank offices, among others. Conflict and crime variables include guerrilla and paramilitary presence and attack indicators, demobilized combatants, kidnappings, homicides, robberies, and so forth. The human capital dimension includes several educational features.
Local politics refers to electoral variables such as the number of candidates, margin of victory, voter turnout, among others. The public sector dimension includes judiciary indicators, expenditures, transfers, and revenues, and state capacity indices. Local demographics refer to population, density, rurality, inequality, child development, access to public services, and so forth. In the case of economic activity, we include GDP measures for different sectors and nighttime lights. The illegal activity dimension includes variables related to coca production and illegal mining. Finally, natural resources refer to features describing the oil, gold, and palm sectors plus some measures of deforestation. Supplementary Table A1 includes the complete list of covariates used to train the models.
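To fix ideas, the sketch below shows one way these predictors could be organized into the 10 categories used in the group-level analysis of Section 3.4. This is an illustrative Python/pandas sketch; the column names are hypothetical placeholders, not the authors’ actual variable names (see Supplementary Table A1 for those).

```python
# Hypothetical mapping from predictor categories to column names in the
# municipality-level dataset. Names are placeholders for illustration only.
import pandas as pd

CATEGORIES = {
    "financial_sector":   ["bank_deposits_pc", "bank_credits_pc", "bank_offices_pc"],
    "conflict":           ["guerrilla_presence", "paramilitary_attacks"],
    "crime":              ["homicide_rate", "kidnappings"],
    "human_capital":      ["enrollment_rate", "test_scores"],
    "local_politics":     ["n_candidates", "margin_of_victory", "turnout"],
    "public_sector":      ["transfers_pc", "own_revenue_pc"],
    "local_demographics": ["population", "rurality", "gini"],
    "economic_activity":  ["gdp_pc", "night_lights"],
    "illegal_activity":   ["coca_hectares", "illegal_mining"],
    "natural_resources":  ["oil_royalties_pc", "gold_production", "deforestation"],
}

def select_category(df: pd.DataFrame, category: str) -> pd.DataFrame:
    """Return only the predictors that belong to one category."""
    return df[CATEGORIES[category]]
```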
3. Machine-Learning Models
In this section, we describe the machine-learning models used to predict corruption, the training procedure, and the measures we use to assess the models’ performance.
3.1. Models
In order to predict municipality-level corruption, we train a set of popular machine-learning models: Random Forests, Gradient Boosting, Neural Networks, and Lasso. Each of these models has its own strengths and weaknesses, and therefore we also rely on an ensemble model that combines the predictive power of all individual models to optimize overall performance (Friedman et al., 2001). We ultimately let the data inform which of the models is best suited for this application, based on their out-of-sample performance.
3.1.1. Lasso
The Lasso regression, first developed by Tibshirani (1996), is similar to a logistic regression model but adds a penalization term based on the sum of the absolute values of the coefficients; in the variant we implement (the elastic net), a second penalization term based on the sum of the squared coefficients is also included. These penalization terms shrink the parameters of the model toward zero, leading to a more parsimonious model than the logistic regression, one that is simpler and less prone to over-fitting. The tuning parameters in the cross-validation are the overall weight of the penalization terms in the objective function (λ) and the relative weight of the absolute sum of the coefficients (α) within the penalty.
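As a concrete illustration, the following sketch sets up a penalized logistic regression of this kind with scikit-learn. The library choice, the pipeline structure, and the grid values are our own assumptions for exposition; the paper does not specify its software or the grids it searched.

```python
# Elastic-net logistic regression with cross-validated penalty weight (C, the
# inverse of lambda) and L1 share (l1_ratio, playing the role of alpha).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lasso_pipeline = make_pipeline(
    StandardScaler(),                         # predictors standardized by mean and SD
    LogisticRegression(
        penalty="elasticnet", solver="saga",  # solver that supports the combined penalty
        max_iter=10_000,
    ),
)

param_grid = {
    "logisticregression__C": np.logspace(-3, 1, 10),        # overall penalty strength
    "logisticregression__l1_ratio": [0.25, 0.5, 0.75, 1.0],  # share of L1 in the penalty
}

lasso_cv = GridSearchCV(lasso_pipeline, param_grid, cv=5, scoring="roc_auc")
# lasso_cv.fit(X_train, y_train) would run the fivefold search described in Section 3.2.
```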
3.1.2. Random Forests
Random Forests are ensembles of many decision trees, where each tree is a sequence of rules that divides the sample into leaves, that is, sub-groups, based on certain variable cutoffs. The prediction for each leaf, in the case of a classification task, is the most common outcome among the training observations in that leaf. The trees are fit with the aim of maximizing the information gain of the resulting partitions of the data. In a Random Forest, each tree is constructed by sampling a random subset of the training data and a random subset of the predictors. Each tree generates a prediction, and the overall prediction of the Random Forest is the average (or the majority vote) of the predictions across all trees. In this application, we keep the number of fitted trees fixed (500) and use cross-validation to determine the optimal number of features available at every node.Footnote 8
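A minimal sketch of this setup, again in scikit-learn and with an illustrative grid for the number of features considered at each split, could look as follows.

```python
# Random Forest with 500 trees held fixed and the number of candidate features
# per split (max_features) tuned by cross-validation. Grid values are illustrative.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rf = RandomForestClassifier(n_estimators=500, random_state=0)

rf_cv = GridSearchCV(
    rf,
    param_grid={"max_features": [5, 10, 20, 40, 80]},  # candidate features per split
    cv=5,
    scoring="roc_auc",
)
```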
3.1.3. Gradient Boosting Machine
GBMs are ensembles of weak learners, in this case decision trees. Under boosting, classification algorithms are sequentially applied to a reweighted version of the training data (Friedman et al., 2000). GBM is a variant of Random Forests in which trees are fitted neither randomly nor independently. Instead, each tree is fitted sequentially to the full dataset, in such a way that the weaknesses of previous trees are identified using gradients of the loss function, allowing subsequent predictors to learn from the mistakes of the previous ones. In other words, a gradient descent procedure is used to minimize the loss when adding new trees. As opposed to Random Forests, observations are not selected via bootstrapping but as a function of past errors. In this way, each new tree offers a slight improvement over the model (Freund et al., 1999). In our models, we keep the learning rate (shrinkage parameter) and the minimum number of observations in the terminal nodes fixed to avoid overfitting, and use our cross-validation procedure to determine the optimal number of trees and the interaction depth.
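The sketch below shows an analogous boosting setup in scikit-learn, holding the learning rate and minimum leaf size fixed while tuning the number of trees and the interaction depth. The specific values are assumptions, not the tuned parameters reported in Table 1.

```python
# Gradient boosting with fixed shrinkage and minimum leaf size, tuning the
# number of trees and the tree depth (interaction depth) by cross-validation.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

gbm = GradientBoostingClassifier(
    learning_rate=0.01,    # fixed shrinkage parameter
    min_samples_leaf=10,   # fixed minimum observations in terminal nodes
    random_state=0,
)

gbm_cv = GridSearchCV(
    gbm,
    param_grid={
        "n_estimators": [100, 250, 500, 1000],  # number of trees
        "max_depth": [1, 2, 3, 4],              # interaction depth
    },
    cv=5,
    scoring="roc_auc",
)
```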
3.1.4. Neural Networks
Neural networks capture the relationship between input and output signals through models that mimic the way biological brains work. These models are composed of three basic elements: an activation function, which, for each neuron, transforms the weighted average of input signals (predictors) into an output signal; a network topology, composed of the number of neurons, layers, and connections used by the model; and a training algorithm, which determines how the connection weights are set so as to activate (or not) each neuron as a function of the input signals. This process determines the final prediction of the model. The optimization problem seeks the optimal weights of the input signals for each node. In our analysis, we keep a logistic activation function fixed and use cross-validation to determine the optimal number of units in the hidden layer (size) and the regularization parameter (decay).Footnote 9
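A corresponding single-hidden-layer sketch in scikit-learn is shown below; there, the weight-decay parameter is called alpha. The grids are illustrative assumptions.

```python
# Single-hidden-layer network with a fixed logistic activation; the number of
# hidden units ("size") and the L2 weight decay are tuned by cross-validation.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

nnet = MLPClassifier(activation="logistic", max_iter=5_000, random_state=0)

nnet_cv = GridSearchCV(
    nnet,
    param_grid={
        "hidden_layer_sizes": [(3,), (5,), (10,), (20,)],  # candidate hidden-layer sizes
        "alpha": [1e-4, 1e-3, 1e-2, 1e-1],                 # weight decay (regularization)
    },
    cv=5,
    scoring="roc_auc",
)
```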
3.1.5. Super learner ensemble
Ensembles are collections of models whose predictions are combined to produce a final prediction. Ensembles—as they result from the combination of different models—usually perform better than their individual components. For our analysis, we use the Super Learner ensemble method developed by Polley et al. (2011). This method finds an optimal combination of the individual models by minimizing the cross-validated risk of their predictions. Van der Laan et al. (2007) show that this ensemble performs asymptotically as well as the best possible weighted combination of its constituent algorithms. Finally, we use the Super Learner not only to stack the individual predictions, but also to test the relative importance of different groups of variables in predicting politicians’ misconduct.
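To make the idea concrete, the following sketch builds a Super Learner-style ensemble by hand: it collects out-of-fold predicted probabilities from each base model and then chooses convex weights that minimize the cross-validated log loss. This mirrors the logic of the Super Learner rather than reproducing the original R implementation of Polley et al.

```python
# Super Learner-style stacking: out-of-fold predictions from each base model
# are combined with non-negative weights that sum to one, chosen to minimize
# the cross-validated log loss on the training data.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_predict

def super_learner_weights(models, X_train, y_train, cv=5):
    # Column j holds model j's out-of-fold P(misconduct) for every municipality.
    Z = np.column_stack([
        cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
        for m in models
    ])

    def cv_risk(w):
        return log_loss(y_train, Z @ w)

    k = Z.shape[1]
    result = minimize(
        cv_risk,
        x0=np.full(k, 1.0 / k),                                      # start from equal weights
        bounds=[(0.0, 1.0)] * k,                                     # non-negative weights
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},  # weights sum to one
    )
    return result.x

# Usage (with the tuned base models sketched above):
# weights = super_learner_weights([lasso_cv, rf_cv, gbm_cv, nnet_cv], X_train, y_train)
```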
3.2. Training and testing
We use an indicator variable for politicians’ misconduct in mayoral term t as our outcome of interest, and all predictors are measured as averages within the mayoral term. In this way, we end up with a cross-sectional dataset covering all municipalities for the 2008–2011 and 2012–2015 mayoral periods. To train our models, we conduct the following steps:
1. We divide our dataset into a training set that uses 70% of the data and a testing set that uses 30%.
2. In our training set, we perform a fivefold cross-validation procedure to train our models and choose the optimal combination of parameters. This method divides the training set into five equal-size samples at random. A model is then fit on four subsamples and tested on the remaining one. We repeat this procedure for each of the five subsamples, so that each of them ends up serving as a validation set, and for each value in the tuning-parameter grid of each model. Finally, the best-performing parameters are chosen.
3. We repeat the previous step 10 times with different random partitions, obtaining 10 “optimal parameters.” We then use their average as our optimal parameter; in the case of integer parameters, we round to the closest integer.
4. Based on these optimal parameters, we assess the performance of our models on the test set, which is never used for training purposes.
We standardize the data by subtracting the mean and dividing by the standard deviation of each predictor. Table 1 shows the optimal parameters from our training procedure for each of our models.
Notes. This table presents the optimal parameters for each of the prediction models we implement after the training procedure described in Section 3.2.
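Putting the pieces together, the sketch below illustrates the procedure of Section 3.2 for a single model: a 70/30 split, ten repetitions of fivefold cross-validation with different random partitions, and the tuned parameter averaged (and rounded) across repetitions. The feature matrix X, the outcome vector y, the choice of Random Forest, and the grid values are all assumptions for illustration.

```python
# Repeated fivefold cross-validation with parameter averaging, assuming a
# feature matrix X and a binary outcome y are already assembled.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

chosen = []
for repeat in range(10):
    cv = KFold(n_splits=5, shuffle=True, random_state=repeat)  # a fresh random partition
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=500, random_state=0),
        param_grid={"max_features": [5, 10, 20, 40, 80]},
        cv=cv,
        scoring="roc_auc",
    )
    search.fit(X_train, y_train)
    chosen.append(search.best_params_["max_features"])

optimal_max_features = int(round(np.mean(chosen)))  # integer parameters are rounded
final_model = RandomForestClassifier(
    n_estimators=500, max_features=optimal_max_features, random_state=0
).fit(X_train, y_train)
```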
3.3. Assessing models’ performance
Once we have calibrated our models following the procedure explained above, we compare their performance on the test set. Our first performance measure of interest is the area under the ROC curve (AUC). This measure captures the trade-off between the true positive rate and the false positive rate as we vary the discrimination threshold. It can also be interpreted as the probability that two randomly selected observations are correctly ordered by their predicted risk of corruption, that is, the probability that the municipality at greater risk of corruption is assigned the higher predicted probability. We complement this measure with each model’s accuracy, the proportion of municipalities correctly classified; sensitivity, the proportion of actual positives identified correctly (true positives over true positives plus false negatives); precision, the proportion of predicted positives that are correct (true positives over true positives plus false positives); and specificity, the proportion of actual negatives identified correctly (true negatives over true negatives plus false positives).
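For concreteness, the sketch below computes these five metrics on the held-out test set from predicted probabilities. The 0.5 classification threshold is our assumption, as the paper does not state the threshold used.

```python
# Test-set performance metrics from predicted probabilities and true labels.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def performance_report(y_true, p_hat, threshold=0.5):
    """AUC plus confusion-matrix-based metrics at a given probability threshold."""
    p_hat = np.asarray(p_hat)
    y_pred = (p_hat >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "auc": roc_auc_score(y_true, p_hat),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positives over actual positives
        "precision": tp / (tp + fp),     # true positives over predicted positives
        "specificity": tn / (tn + fp),   # true negatives over actual negatives
    }
```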
3.4. Identifying best predictors
We begin by assessing which individual municipality characteristics best predict corruption. In the case of tree-based models, importance is measured as the information gain achieved when splitting on each variable. Importance is expressed on a scale from 0 to 100, where 100 is the value assigned to the most important predictor and the information gain of every other variable is expressed relative to it. For the Lasso model, importance is measured by the estimated coefficients, where larger coefficients (in absolute value) correspond to higher importance. Finally, for Neural Networks, importance is determined by the weights that connect neurons within the network.
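The sketch below illustrates these importance measures for a fitted tree-based model and for the penalized logistic regression; predictors are assumed to be standardized so that coefficient magnitudes are comparable.

```python
# Variable-importance summaries on the 0-100 scale (tree models) and by
# absolute coefficient size (penalized logistic regression).
import numpy as np
import pandas as pd

def tree_importance_0_100(fitted_tree_model, feature_names):
    """Scale information-gain importances so the top predictor equals 100."""
    imp = fitted_tree_model.feature_importances_
    return pd.Series(100 * imp / imp.max(), index=feature_names).sort_values(ascending=False)

def lasso_importance(fitted_logit, feature_names):
    """Rank predictors by the absolute value of their estimated coefficients."""
    coefs = np.abs(fitted_logit.coef_).ravel()
    return pd.Series(coefs, index=feature_names).sort_values(ascending=False)
```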
We then move to the analysis of the predictive power of subgroups of related municipality characteristics, to understand which categories matter the most. It could be the case that a group has no single variable that highly predicts corruption, but that the group as a whole has high predictive power. To do this, we estimate models including one category at a time (i.e., excluding all covariates that are not part of it) and compute the resulting AUC for the group. We can then rank the groups according to their AUC and compare each computed AUC with the 50% level, which corresponds to the AUC of a random prediction “model.” The group that raises the AUC the most on its own is the one with the highest predictive power. Finally, we assess the statistical difference in predictive power between groups by computing 95% confidence intervals, obtained from a bootstrap procedure over the test set in which we compute the AUC for each bootstrap sample.
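A minimal sketch of this group-level procedure is shown below: the model is refit on the columns of a single category, the test-set AUC is computed, and the test set is resampled to obtain a 95% bootstrap confidence interval. The base model and the pandas data types are assumptions for illustration.

```python
# Group-level AUC with a bootstrap confidence interval over the test set.
import numpy as np
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def group_auc(model, X_train, y_train, X_test, y_test, columns, n_boot=1000, seed=0):
    """Fit on one category's columns only and bootstrap the test-set AUC."""
    fitted = clone(model).fit(X_train[columns], y_train)
    p_hat = fitted.predict_proba(X_test[columns])[:, 1]

    rng = np.random.default_rng(seed)
    boot_aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_test), len(y_test))  # resample test set with replacement
        if len(np.unique(y_test.values[idx])) < 2:       # skip resamples with one class only
            continue
        boot_aucs.append(roc_auc_score(y_test.values[idx], p_hat[idx]))

    lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
    return roc_auc_score(y_test, p_hat), (lo, hi)
```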
4. Findings
In this section, we present the main results of our analysis. First, we focus on the overall performance of the predictive models. Then, we identify the best individual and group predictors and their link to the corruption literature.
4.1. Models’ performance and the predictability of mayors’ misconduct
Figure 1 plots the ROC curves for each of the four models and for the ensemble. Several aspects stand out. First, all the curves are far enough away from the 45° line, which corresponds to a naive classifier that generates a false positive for each true positive. Second, the Neural Network achieves the worst performance according to this metric, which explains why it is not used in the ensemble, as shown in Table 1. Finally, the performance of the remaining models, in terms of the area under the ROC curve (AUC), is similar and acceptable, without being outstanding.
Table 2 corroborates the previous assertion. In terms of the AUC, Random Forest, GBM, and the ensemble achieve the highest performance (0.72), while the Neural Network achieves the lowest (0.70). Accuracy is similar across all five models, with a hit rate of 84% for Random Forests, GBM, Lasso, and the ensemble, and 81% for the Neural Network. Table 2 reports three additional metrics: sensitivity, specificity, and precision. These metrics indicate that the models tend to produce relatively many false positives, as reflected in the low levels of precision.Footnote 10 In sum, although the models do not reach performance levels as high as in other studies,Footnote 11 an accuracy of 84% and an AUC of 0.72 are still acceptable. Consequently, these models could be used by the authorities to decide where to conduct anti-corruption audits.
Notes. This table presents the model performance for all our prediction models. AUC, accuracy, sensitivity, precision, and specificity are defined in Section 3.3.
4.2. What are the best predictors of mayors’ misconduct?
The analysis of the features with the greatest predictive power to anticipate where mayors are most likely to misbehave is divided into two parts. First, we group the 147 municipality-level characteristics into 10 dimensions of interest to determine how much predictive power each one has. For this, we define the following dimensions: public sector, human capital, economic activity, local demographics, financial development, local politics, natural resources’ exposure, illicit activity, crime, and conflict. We then disaggregate the analysis by studying which individual characteristics, in each model, have the greatest predictive power.
Figure 2 shows the results of the first analysis. Surprisingly, variables associated with financial development rank first, followed by the measures of local demographics, local politics, and human capital. The result is surprising since dimensions that the literature usually associates with corruption, such as those related to the public sector, crime, or conflict, occupy intermediate positions (Rose-Ackerman and Palifka, 2016; Fisman and Golden, 2017). Shaxson (2007), for example, suggests that the resource curse is explained by the higher levels of corruption that exist in countries where these resources are abundant. Other studies suggest that the size and quality of the public sector is a determining factor in the level of corruption (Robinson and Verdier, 2013; Colonnelli et al., 2020b; Gallego et al., 2020a). However, our results challenge all these explanations by suggesting that the main red flags of corruption are in the financial sector (Cooray and Schneider, 2018).
Moreover, variables usually associated with the Colombian context, such as those related to the dependence on natural resources and illicit activities, occupy the last places in this ranking, challenging the view that other manifestations of state weakness would predict where corruption is most likely to occur. In sum, these results corroborate what Colonnelli et al. (2022) find for Brazil, in the sense that variables associated with the private sector have a preponderant weight when explaining the level of corruption within a country, while the characteristics related to the public sector have less weight. These findings contrast with the great emphasis that is usually given to the public sector and public officials when thinking about anti-corruption reforms (Olken and Pande, 2012).
Finally, Figure 3 shows which individual variables have the greatest predictive power for each machine-learning model. It is interesting that in three out of four cases, a financial sector variable ranks first: The number of financial sector workers for the Random Forests and the GBM, and the size of the housing credit market in Lasso. Other financial variables appear consistently in the models, like the number of bank offices.Footnote 12 This result reaffirms what was found above, in the sense that the level of development of the financial sector is key to understanding why some places are more corrupt than others.Footnote 13
Delving into why the development of the financial sector is an important predictor of corruption is outside the scope of this study, among other things, because this exercise is purely predictive and not causal. However, we propose a hypothesis: the degree of concentration and competition in the financial sector, reflected in the size of its main players, can determine the levels of rent-seeking, money in politics, and influence in government decisions. Anecdotal evidence from the Colombian case would give suggestive support to this hypothesis. The AVAL group, the leading financial conglomerate in the country, was related to the so-called Lava Jato scandal and the Brazilian multinational Odebrecht. According to judicial investigations, the Brazilian company and officials from the financial group bribed public servants from the Ministry of Transportation in 2009 to be awarded the construction of a major highway in the country.Footnote 14
Finally, we would like to underscore one of the main limitations of our analysis. As we said before, the exercise presented here is predictive, but not causal. This fact means that these types of tools inform in what kind of places acts of corruption are more likely to occur (e.g., depending on the development of the financial sector). However, the models hardly shed light on the type of reforms or interventions (in that sector) that would help control corruption. To answer this type of questions, causal inference tools such as randomized controlled trials or quasi-experimental methods are still useful, some of which are strengthened by machine learning, such as the doubly robust models proposed by Belloni et al. (Reference Belloni, Chernozhukov and Hansen2014).
5. Conclusions
In this article, we propose the use of artificial intelligence tools to predict where rulers are more likely to commit acts of corruption. We apply these methods to the Colombian case, exploiting the fact that municipal mayors manage a significant fraction of public resources and are frequently involved in corruption scandals. Using information from prosecutions conducted by the Office of the Inspector General against these mayors, we train four canonical machine-learning algorithms, and ensemble their predictions, to forecast where the risk of corruption is greater. The performance of our models is good and allows us to understand which features of municipalities have the greatest predictive power to anticipate where misconduct will occur. Surprisingly, variables associated with the financial sector have the greatest weight, features related to the public sector play a secondary role, and characteristics associated with armed conflict, illicit activities, and dependence on natural resources have the least predictive power.
From a public policy perspective, we consider these tools to be particularly useful. Anti-corruption audits, carried out by independent agencies, have proven effective in curbing this phenomenon (Olken, 2007; Ferraz and Finan, 2008, 2011) and in improving economic performance (Colonnelli and Prem, 2022). But audits are expensive and therefore a scarce resource. Their use must be optimized so that their effectiveness is as high as possible. The use of artificial intelligence tools, as illustrated in this article, helps fulfill this purpose, especially during crises such as pandemics, wars, and natural disasters, in which governments must spend a lot in a short time, which creates opportunities for corruption.
Acknowledgments
We thank Misión de Observación Electoral, Contraloría General de la República, and Luis Martínez for sharing with us the data used in this project. Erika Corzo and Andrés Rivera provided excellent research assistance. We also thank seminar participants at the World Bank and the University of Pennsylvania.
Funding Statement
This work received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Competing Interests
The authors declare no competing interests exist.
Author Contributions
Conceptualization: J.G., M.P., J.V.; Data analysis: J.G., M.P., J.V.; Data curation: J.G., M.P., J.V.; Methodology: J.G., M.P., J.V.; Writing: J.G., M.P., J.V.
Data Availability Statement
The replication data that support the findings of this study are available at https://osf.io/vfdhj/.
Supplementary Materials
To view supplementary material for this article, please visit http://doi.org/10.1017/dap.2022.35.