A large body of work has estimated the effects of variation in presidential power. To identify studies of presidential power systematically, we searched a selection of leading comparative politics journalsFootnote 1 and identified a total of forty-nine studies that included an estimation of presidential power.Footnote 2 The distribution of this work confirms that scholars are increasingly choosing to estimate the effect of presidential power generally. Four were published in 1995–99 inclusive and ten in 2000–04 inclusive, whereas twenty-five were published in 2005–09 inclusive and a further ten in 2010 and 2011 alone. In all but four of these studies, presidential power was operationalized explicitly or implicitly as an explanatory variable. In these forty-five studies, the dependent variable ranged widely across topics such as economic reform, democratic consolidation, the level of protectionism, the effective number of parties, cabinet composition, voter turnout and many others. In thirty of these forty-five studies, variation in presidential power was confirmed to have a significant effect on the outcome under investigation.
What are scholars trying to capture when they estimate the effect of presidential power? In eleven of the forty-nine studies we identified, scholars focused only on a specific aspect of presidential power. For example, Cheibub wished to explain variation in budget balances in democratic systems.Footnote 3 Consistent with his focus, he operationalized a presidential power variable, but only in terms of the president’s power over budgetary policy and the president’s veto power rather than presidential power generally. Thus when scholars wish to test a particular theory of presidential power, there is evidence that they have estimated the effects of only the specific elements of presidential power that relate to that theory.
In the remaining cases, though, scholars stated that they wished to estimate the effect of presidential power generally. A very small number of scholars were more precise about what they understood by this term. For example, Biglaiser and DeRouen stated that they were trying to capture ‘centralized executive authority’.Footnote 4 Hicken and Stoll understood presidential power to be ‘the degree to which power is concentrated in the presidency within the national level of government’.Footnote 5 Yet most scholars stated only that they were interested in the effects of a general term such as presidential power(s), presidential strength, presidential authority, executive power, executive authority or an equivalent term. While there could be semantic differences between these terms, there is no discussion of such differences; scholars have been using them synonymously. Some studies have used the terms presidential power and executive power as direct synonyms.Footnote 6 However, some studies have estimated the effect of variation in the level of constraints on the executive in the system of checks and balances by operationalizing Polity’s XCONST variable or Henisz’s POLCON variable.Footnote 7 Studies have also estimated a presidential power variable and the XCONST executive constraints variable separately.Footnote 8 In short, scholars have been able to distinguish presidential power from executive constraints more broadly. We excluded studies that estimated solely the effect of executive constraints.
Overall, we identified thirty-eight studies in which scholars tried to estimate the impact of presidential power generally. Although they used different terms, we are confident that they were trying to capture the extent to which the presidency was a powerful actor within the national government, rather than either some specific power of the institution or the position of the executive within the system of checks and balances more broadly.
Existing Measures Of Presidential Power
How have scholars tried to estimate the impact of presidential power generally? A number of the thirty-eight studies we identified drew up a discrete measure of presidential power with cross-national country scores. Most studies, though, relied on a measure that had been drawn up by other scholars, whose sole aim was to generate a set of presidential power scores rather than to estimate the empirical effect of variation in the scores. These measures were often available only in specialist journals or online datasets. Therefore, to identify the full set of presidential power measures that has been proposed over the years, it was necessary to move beyond a search of leading journals. To that end, a separate Google Scholar search was conducted using terms such as ‘presidential power measure’ and ‘index of presidential power’. We identified nineteen separate and original measures of presidential power,Footnote 9 plus a further sixteen studies that used one of these measures but reported scores for a different set of countries, gave countries different scores from the original study, or both.Footnote 10 Thus, we have a dataset of thirty-five measures of presidential power.
The methodology used across the thirty-five measures is relatively consistent. The measures are all based on a set of individual indicators of presidential power. Often, the indicators are binary. If a president enjoys a particular power, then a value of 1 is assigned for that indicator, and 0 otherwise. Sometimes the indicators are ordinal. For example, Shugart and Carey propose ten indicators of presidential power, and each indicator has a range of 0–4.Footnote 11 Presidents are then awarded a score within this range for each indicator. Whether the indicator scores are binary or ordinal, the total score for presidential power is invariably the aggregate of the scores for each indicator. This generates a set of cross-national presidential power scores for particular time periods.
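The additive logic of these indices can be illustrated with a minimal Python sketch. The indicator names and scores below are hypothetical and only three indicators are shown; the 0–4 range follows the ordinal scheme described above.

```python
# Hypothetical indicator names and scores, for illustration only.
# Each indicator is scored on an ordinal 0-4 range.
INDICATOR_MAX = 4


def total_power(indicator_scores):
    """Sum the indicator scores into one aggregate presidential power score."""
    for name, score in indicator_scores.items():
        if not 0 <= score <= INDICATOR_MAX:
            raise ValueError(f"{name} is outside the 0-{INDICATOR_MAX} range")
    return sum(indicator_scores.values())


country_a = {"package_veto": 4, "decree": 0, "cabinet_dismissal": 2}
country_b = {"package_veto": 0, "decree": 4, "cabinet_dismissal": 2}

# Two different profiles of powers produce the same aggregate score.
print(total_power(country_a), total_power(country_b))
```

A binary scheme is the special case in which the maximum for each indicator is 1.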
While there are now many different measures of presidential power, there are empirical and theoretical problems with them. First, while none of the measures aimed to assess individual presidents’ personal power, they did capture two different manifestations of presidential power. Some were derived solely from constitutional indicators of presidential power, whereas others were based on a mix of constitutional and behavioral powers, meaning the power of the presidency in ‘actual political practice’.Footnote 12 There are problems with measuring the constitutional powers of presidents, because constitutions can be imperfect measures of actual political power. However, there are also problems with measuring the behavioral power of presidents, because there is the risk of capturing the impact of factors such as party competition rather than the power of the presidency itself.
Secondly, even if we confine ourselves to measures of one type of presidential power, the correlation between the different measures can be relatively low. For example, comparing only those measures that are based on indicators of constitutional powers, the pairwise correlation between the Shugart and Carey and Johannsen measures is −0.19.Footnote 13 The same figure for the Hellman and Frye measures is 0.50, even though both are measuring presidential power scores only in Central and Eastern Europe and the former Soviet Union.Footnote 14 Inevitably, this means that empirical results are likely to be sensitive to the particular measure that is used.Footnote 15
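A pairwise correlation of this kind can only be computed over the countries that both measures score. The following Python sketch shows one way to do so; the country labels and scores are hypothetical, and the function assumes that neither measure is constant across the shared countries.

```python
from math import sqrt


def pairwise_correlation(measure_x, measure_y):
    """Pearson correlation over the countries that both measures score."""
    shared = sorted(set(measure_x) & set(measure_y))
    if len(shared) < 2:
        raise ValueError("need at least two shared countries")
    xs = [measure_x[c] for c in shared]
    ys = [measure_y[c] for c in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical normalized scores; only B, C and D appear in both measures.
measure_1 = {"A": 0.2, "B": 0.5, "C": 0.9, "D": 0.4}
measure_2 = {"B": 0.6, "C": 0.3, "D": 0.7, "E": 0.1}
print(pairwise_correlation(measure_1, measure_2))
```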
Thirdly, there is great variation in the country coverage of the different studies, as well as the time periods covered. Only three of the thirty-five measures covered a large number of countries across political regimes generally.Footnote 16 Some focused on only one particular region, such as Latin America, Eastern Europe and the former Soviet Union, or Africa. Others selected on the basis of a different analytical criterion. For example, Tavits reports the scores for twenty-three countries with weak presidencies.Footnote 17 What is more, scholars have now been proposing presidential power scores for nearly twenty years, yet the scores are not updated after publication. Given that constitutions are often amended, reported presidential power scores can soon go out of date. This means that countries sometimes cannot be reliably included in an estimation even if a presidential power score for that country exists.
Finally, there are problems of construct validity. Fortin has shown that the indicators of any given measure of presidential power are not necessarily capturing a single latent construct.Footnote 18 She performed factor analysis on a dataset that pooled Shugart and Carey’s presidential power scores with Frye, Hellman and Tucker’s scores.Footnote 19 These scores are based on ten indicators of presidential power that capture two different dimensions, one relating to the president’s executive powers with four indicators and another relating to the president’s legislative powers with six indicators. However, Fortin found that seven of the ten indicators cluster into a single factor with eigenvalues greater than 1 and ‘with no evidence of separate latent constructs for legislative and non-legislative powers’.Footnote 20 She also pointed out that the process of aggregating the scores for the individual indicators is problematic. She states: ‘[a]ggregation produces homogeneity claims, meaning that equal scores are substitutable or equivalent’.Footnote 21 However, she noted that ‘each score can be obtained through broad combinations of different powers, and should thus not be considered homogenous in terms of causal analyses’.Footnote 22 She goes on to argue that for any given measure, ‘not all items hypothesized to capture the concept of presidential power seem to matter equally in accounting for composite scores’ and that ‘not all potentially relevant items were tested’.Footnote 23 She concludes that existing indices of presidential power have ‘limited validity’.Footnote 24
Generating A New Set Of Presidential Power Scores
We generate a time-series cross-sectional dataset of presidential power scores with country years as the unit of observation. In doing so, we resist the temptation to construct a new measure of presidential power from scratch. Fortin’s study shows that any measure of presidential power is likely to suffer from a basic problem of construct validity.Footnote 25 She effectively questions whether any measure of presidential power is likely to be valid. We agree with her analysis, but draw a different conclusion. Most social science concepts, such as voter turnout, social equality and corruption, suffer from equivalent problems of construct validity. For that reason, we prefer to emphasize the reliability of the data that underpins the concept we are trying to capture. Specifically, we wish to use the expert information embedded in existing measures, but in a way that generates a more reliable set of cross-national presidential power scores.
To maximize the reliability of our new set of scores, three elements are emphasized. First, we focus solely on measures that record the constitutional power of presidents. To be sure, constitutions can sometimes be imperfect indicators of presidential power, but the overall reliability of our new set of measures is increased by referring solely to information in publicly available documents rather than by including essentially contestable judgments about presidential power in practice. Five of the thirty-five measures of presidential power that we identified provided scores for the behavioral power of presidents.Footnote 26 Excluding them leaves thirty measures. For the purposes of our methodology, two measures of constitutional presidential power that scored only a single country were also excluded, leaving a database of twenty-eight measures from which to generate our new set of scores.
Secondly, we wish to draw upon all of the expert information in these twenty-eight studies, but to generate new scores in a way that indicates their general reliability. This allows researchers to decide whether to include particular countries in any estimation of presidential power. Therefore, standard errors and 95 per cent confidence intervals are reported for each of our presidential power scores.Footnote 27
Thirdly, we wish to maximize the reliability of our scores by accounting for systematic variation between the twenty-eight measures of presidential power and thus reduce the impact of any idiosyncratic measures. To do so, principal component analysis (PCA) is employed. If certain measures are found to vary systematically from others, then it is possible to adjust for the relative importance of those measures when generating our new presidential power scores.
To begin, we identify the time period covered by the presidential power score for all the different countries in each of the twenty-eight original datasets. There can be more than one time period for a given country. For example, there are two time periods for Albania (1991–97 inclusive and 1998–2012 inclusive), corresponding to the first post-communist constitution, which came into force in 1991, and the new constitution that was promulgated in 1998. Eight of our twenty-eight datasets recorded a presidential power score for Albania for the 1991–97 period and three for the later period. Overall, there are scores for a total of 116 countries and 181 country time periods. There was a maximum of four time periods for a number of countries, including Chile and Slovakia, and a maximum of seventeen presidential power scores for one country time period, Romania 1991–2012. The mean number of scores per country time period was 2.7, the modal category was one score for fifty-four country time periods and the median number of scores per country time period was two. The data are therefore organized in country time period format rather than by country year: a country’s score changes only when the constitution is amended in a way that alters its presidential power score. For example, there are two lines for Argentina in the dataset: one for 1984–94 and another for the period from 1995 onwards, following the constitutional amendments in August 1994.
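The country time period format can be pictured as a small lookup structure. The Python sketch below is illustrative only: the record layout and function name are hypothetical, and the scores are placeholders rather than values from the dataset. The period boundaries are those given above for Albania and Argentina.

```python
from typing import NamedTuple, Optional


class CountryPeriod(NamedTuple):
    country: str
    start: int           # first year covered by the constitutional text
    end: Optional[int]   # last year covered, or None if open-ended
    score: float         # placeholder value, not taken from the dataset


periods = [
    CountryPeriod("Albania", 1991, 1997, 0.30),
    CountryPeriod("Albania", 1998, 2012, 0.20),
    CountryPeriod("Argentina", 1984, 1994, 0.50),
    CountryPeriod("Argentina", 1995, None, 0.45),
]


def score_for(country, year):
    """Return the score in force for a country year, or None if uncovered."""
    for p in periods:
        in_period = p.start <= year and (p.end is None or year <= p.end)
        if p.country == country and in_period:
            return p.score
    return None


print(score_for("Argentina", 1994), score_for("Argentina", 1995))
```

Expanding such records into country years for a regression is then a matter of calling the lookup for each year of interest.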
With information about the time period for each country, the first new measure can be calculated (prespow1). Given that presidential power scores are calculated differently across many of the different datasets, a set of mean normalized scores is generated. For each of the twenty-eight datasets, each country score was normalized using the following formula: (x minus minimum possible value)/(maximum possible value minus minimum possible value). For example, Shugart and Carey recorded a score of seventeen for Panama 1972–2012 on their scale from 0–40;Footnote 28 thus their normalized presidential power score for Panama was 0.43 in a range from 0–1. A score for Panama was recorded in four of the twenty-eight datasets. The average of these four normalized scores was 0.47, generating a raw prespow1 measure. The whole set of country scores was then normalized to generate a range from 0–1 to facilitate comparison with our second set of scores below. The final normalized prespow1 score for Panama is 0.45. The full set of raw and normalized prespow1 scores with standard errors and 95 per cent confidence intervals is reported in Table 2 in the online appendix.Footnote 29 A selection of scores is provided in Table 1.
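The normalization and averaging steps can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to produce the reported scores: the function names are hypothetical, the three scores other than Shugart and Carey's are placeholders rather than values from the datasets, and the standard error (sample standard deviation divided by the square root of the number of scores) and the normal-approximation interval are assumptions, since the text does not spell out how they were computed.

```python
from math import sqrt
from statistics import mean, stdev


def normalize(x, lo, hi):
    """(x - minimum possible value) / (maximum possible - minimum possible)."""
    return (x - lo) / (hi - lo)


def summarize(normalized_scores):
    """Mean, standard error and 95% interval (normal approximation, assumed)."""
    m = mean(normalized_scores)
    n = len(normalized_scores)
    se = stdev(normalized_scores) / sqrt(n) if n > 1 else float("nan")
    return m, se, (m - 1.96 * se, m + 1.96 * se)


def rescale(raw_by_unit):
    """Rescale the raw means across all units so the full set spans 0-1."""
    lo, hi = min(raw_by_unit.values()), max(raw_by_unit.values())
    return {k: (v - lo) / (hi - lo) for k, v in raw_by_unit.items()}


# Shugart and Carey: 17 on a 0-40 scale (from the text).
sc = normalize(17, 0, 40)
print(round(sc, 3))  # 0.425, reported as 0.43 in the text

# The remaining three values are placeholders, not the actual dataset scores.
placeholder_scores = [sc, 0.30, 0.60, 0.50]
raw_mean, se, ci = summarize(placeholder_scores)
print(raw_mean, se, ci)
```

The final `rescale` step corresponds to normalizing the whole set of raw country scores to a 0–1 range, which is why a final score can differ from the raw mean.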
To calculate our second new measure, PCA was employed. This method relies on a correlation or covariance matrix. However, there are large gaps in our sample. Any individual measure of presidential power covers only a specific subset of countries and country years. For example, Shugart and Carey may have good coverage of the Americas, but no African countries are included. Moreover, Shugart and Carey’s scores were reported as of 1992. As a result, their data only partially overlap with that of Hicken and Stoll, who code presidential power for the Americas as well as for countries in Asia, Africa and Eastern Europe, and who also have the opportunity to record scores for more recent country years.Footnote 30 Therefore, before we can apply PCA, we need to address the issue of missing data.
We do this by following the method of analyzing incomplete data suggested by Truxillo and performing PCA by using maximum-likelihood estimation with the expectation-maximization (EM) algorithm.Footnote 31 This approach is an alternative to multiple imputation and is particularly suited to PCA, for while principal components can be explicitly computed, as Chen notes, we can also derive the principal components using an EM approach.Footnote 32 This allows us to use the EM algorithm to estimate the missing data. It is essentially an iterative procedure that, without explicitly deriving the sample covariance, enables us to determine the subspace spanned by the dominant eigenvector.Footnote 33 The initial step in this approach involves computing the maximum-likelihood estimates of the mean vector and covariance matrix for our set of twenty-eight presidential power measures.Footnote 34 These estimates are derived from an iterative EM algorithm,Footnote 35 which provides estimates of the missing data based on the observed values within the dataset (that is, the existing measures of presidential power). In doing so, it estimates parameters that take into account any dependencies in the missingness among our measures of power.Footnote 36 Thus the Expectation (E) step fills in the gaps in our data. The now-complete data, including all observed and estimated data points, are processed with maximum-likelihood estimation, or the Maximization (M) step. This provides the updated mean vector and covariance matrix estimates. This process is repeated until the ‘maximum change in the estimates from one iteration to the next does not exceed a convergence criterion’.Footnote 37 That is, with the new estimates from the M step, the E step is repeated, followed again by the M step, and so on. This iterative process continues until we derive reliable estimates of the missing data matrix.
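The alternation between the E and M steps can be made concrete with a deliberately reduced Python sketch. It handles only two measures with a bivariate normal model, whereas the analysis described here involves twenty-eight; the function name and the data are hypothetical, and it is not the software routine used in the article. It assumes each row has at least one observed value and that neither measure is constant.

```python
from statistics import mean, pvariance


def em_bivariate(rows, max_iter=1000, tol=1e-10):
    """ML mean and covariance for two measures with missing entries (None)."""
    obs_a = [a for a, _ in rows if a is not None]
    obs_b = [b for _, b in rows if b is not None]
    # Starting values: observed means and variances, zero covariance.
    mu_a, mu_b = mean(obs_a), mean(obs_b)
    s_aa, s_bb, s_ab = pvariance(obs_a), pvariance(obs_b), 0.0
    n = len(rows)
    for _ in range(max_iter):
        # E step: expected sufficient statistics under the current estimates.
        t_a = t_b = t_aa = t_bb = t_ab = 0.0
        for a, b in rows:
            extra_aa = extra_bb = 0.0
            if b is None:      # conditional mean and variance of b given a
                slope = s_ab / s_aa
                b = mu_b + slope * (a - mu_a)
                extra_bb = s_bb - slope * s_ab
            elif a is None:    # conditional mean and variance of a given b
                slope = s_ab / s_bb
                a = mu_a + slope * (b - mu_b)
                extra_aa = s_aa - slope * s_ab
            t_a += a
            t_b += b
            t_aa += a * a + extra_aa
            t_bb += b * b + extra_bb
            t_ab += a * b
        # M step: update the mean vector and covariance matrix.
        m_a, m_b = t_a / n, t_b / n
        new = (m_a, m_b, t_aa / n - m_a ** 2, t_bb / n - m_b ** 2,
               t_ab / n - m_a * m_b)
        old = (mu_a, mu_b, s_aa, s_bb, s_ab)
        change = max(abs(x - y) for x, y in zip(new, old))
        mu_a, mu_b, s_aa, s_bb, s_ab = new
        if change < tol:       # convergence criterion on the largest change
            break
    return (mu_a, mu_b), (s_aa, s_ab, s_bb)


# Hypothetical normalized scores from two measures; None marks a missing score.
data = [(0.2, 0.3), (0.4, 0.5), (0.6, 0.5), (0.8, None), (None, 0.9)]
print(em_bivariate(data))
```

Once the mean vector and covariance matrix have converged, a missing score can be filled with its conditional mean given the observed score in the same row.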
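With complete data, we can then perform PCA. This method seeks a linear combination of potentially correlated variables that extracts the maximum variance from them. The resulting principal component (Y1) is weighted by the degree to which each original variable explains the variance in the underlying orthogonal dimension.Footnote 38 That is, in generic notation,

$$Y_1 = a_{11}X_1 + a_{12}X_2 + \dots + a_{1p}X_p$$

where $X_1, \dots, X_p$ are the original variables and $a_{11}, \dots, a_{1p}$ are their weights on the first component.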
Each of the twenty-eight measures of presidential power can be treated as a separate variable. Using PCA, a single presidential power score can be generated for each country time period using the information from all twenty-eight measures.Footnote 39 The resulting measure is a linear weighted construct of all existing power measures.Footnote 40 Using this technique, we can control for variation across the twenty-eight measures of presidential power, reducing the impact of idiosyncratic measures on our final presidential power score. This method allows us to weight the contribution of each existing measure of presidential power. Thus the prespow2 scores are a linear construct of all existing presidential power variables, which are weighted by their rotated component scores.Footnote 41 These scores capture the underlying variance explained by each measure of power. The Kaiser-Meyer-Olkin measure of sampling adequacy is quite high, lending credence to our low-dimensional representation of presidential power. In a final step, the raw scores are normalized to generate a range from 0–1. The full set of raw and normalized prespow2 scores with standard errors and 95 per cent confidence intervals is reported in Table 3 in the online appendix. A selection of scores is provided in Table 2 of this article.
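The weighting idea can be illustrated with a compact Python sketch that extracts the first principal component of the correlation matrix by power iteration and rescales the resulting scores to 0–1. It is a simplified stand-in under stated assumptions: the data are hypothetical and already complete, no rotation is applied (unlike the rotated component scores mentioned above), no measure is constant, and the measures are broadly positively correlated so that the all-positive starting vector is not orthogonal to the dominant eigenvector.

```python
from math import sqrt
from statistics import mean, pstdev


def first_component_scores(rows, n_iter=500):
    """Weights and 0-1 scores for the first principal component."""
    p = len(rows[0])
    n = len(rows)
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]   # assumes no measure is constant
    z = [[(r[j] - mus[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(zr[j] * zr[k] for zr in z) / n for k in range(p)]
            for j in range(p)]
    # Power iteration for the dominant eigenvector (the component weights).
    w = [1.0 / sqrt(p)] * p
    for _ in range(n_iter):
        v = [sum(corr[j][k] * w[k] for k in range(p)) for j in range(p)]
        norm = sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    # Raw score: weighted linear combination of the standardized measures.
    raw = [sum(w[j] * zr[j] for j in range(p)) for zr in z]
    lo, hi = min(raw), max(raw)
    return w, [(s - lo) / (hi - lo) for s in raw]


# Hypothetical completed data: four country time periods, three measures.
completed = [
    [0.20, 0.25, 0.30],
    [0.40, 0.35, 0.50],
    [0.60, 0.70, 0.55],
    [0.90, 0.80, 0.85],
]
weights, scores = first_component_scores(completed)
print(weights)
print(scores)
```

A measure that tracks the others closely receives a larger weight, while an idiosyncratic measure contributes less to the composite score.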
Discussion
We have generated a set of presidential power scores for a greater number of countries and country years than any existing dataset. By accounting for the idiosyncrasies of existing measures, we have maximized the reliability of our set of scores relative to any existing measure. By using publicly available measures, our method is replicable. Our scores also have the potential to be dynamic. Our method makes it easy to include new measures of presidential power and generate updated prespow1 and prespow2 scores. In fact, additional measures would be welcomed, as they will help to further increase the reliability of the scores. To be sure, if scholars wish to test a particular theoretical proposition about a certain aspect of presidential power, such as veto power or decree power, they should construct their own measure and estimate its effect.Footnote 42 However, if they wish to examine the effect of presidential power generally, which has been the purpose of the vast majority of studies to date, there is great benefit to be gained from the scores we have generated. With this aim in mind, two points should be emphasized.
First, for both of our measures, standard errors and 95 per cent confidence intervals for each country year have been reported. These indicate the basic reliability of any individual score, so that scholars can make an informed choice about whether to include a country in their estimation. For example, there are only two original scores for Cyprus (1960–), and they differ greatly. The normalized Hicken and Stoll score is 0.325, whereas the normalized Shugart score is 1.Footnote 43 Since Cyprus is the only presidential system in Europe, the relatively high scores for Cyprus in both Tables 1 and 2 might be considered to have good face validity (prespow1=0.64, prespow2=0.70). However, both Tables 1 and 2 show that the confidence intervals for Cyprus are very large, reflecting the differences in the original measures. The way the scores have been generated and reported gives scholars the opportunity to decide whether to include Cyprus in any estimation. Some may wish to include it because of what they might consider to be good face validity. Others may wish to exclude it because of the large confidence intervals. We make no recommendation, but provide the grounds on which scholars can make an informed choice.
Secondly, we also provide the grounds on which scholars can decide which set of scores to use in comparative analysis. Figure 1 compares the range of standard errors for the prespow1 and prespow2 scores for the different regions. It suggests that the prespow2 scores increase the range of the standard errors for Latin America, but decrease it for both Africa and Asia. The effect on the scores for presidents in European countries is minimal. This suggests that scholars wanting to estimate the effect of presidential power solely in Latin America might wish to use the prespow1 scores. For Africa, they might wish to use the prespow2 scores. Scholars who wanted to estimate the effect of presidential power across all regions might also wish to use the prespow2 scores, because on balance the reliability of the whole set of scores is probably slightly greater, even if the range of the standard errors in Latin America is increased. Again, we make no firm recommendation because the choice will be sensitive to the precise cases with which the scholar is working. However, we provide information the scholar can use to make an informed decision.
Conclusion
Studies have increasingly demonstrated that presidential power affects a wide range of political outcomes. However, there are many separate measures of presidential power. By pooling the comparative and local knowledge present in twenty-eight existing measures, we have generated a new set of presidential power scores for a larger number of countries and a longer time series than before. We have also maximized the reliability of these scores by deriving them solely from measures based on constitutional indicators of presidential power, and by using a method that accounts for the idiosyncrasies of country scores in existing measures. In addition, by reporting the standard errors and confidence intervals for all the country years in our measures, we have provided information with which scholars can make an informed choice about whether a particular country should be included in an estimation and which of our measures should be used in comparative studies. Overall, we encourage people to keep developing new measures of presidential power and to update existing measures for as many countries and as long a time period as possible. The advantage of our approach is that new country scores can be easily incorporated, which creates the potential for country coverage to be further extended, for existing country scores to be updated and for cross-national measures to become even more reliable.Footnote 44