1. Introduction
Benefit–cost analysis (BCA) measures individual welfare using willingness to pay (WTP) for whatever impacts an individual experiences as a result of a policy. It aggregates this measure across the affected population without regard to the income of the individuals experiencing the impacts. A dollar’s worth of welfare has the same weight in BCA regardless of whether it is the welfare of a poor person or the welfare of a wealthy person. Two concerns have been expressed with respect to this “unit weighting” of WTP. The first is that, because of diminishing marginal utility of income, a dollar’s worth of consumption is typically considered to have a greater impact on the welfare of the poor than on the welfare of the wealthy, which means that any given quantum of welfare is represented by a smaller number of dollars if it accrues to a poor person than if it accrues to a wealthy person. As a result, BCA is actually biased against the poor and thus may promote inequitable strategies for increasing aggregate welfare. The second concern is that, even if the bias in WTP as a measure of welfare were not an issue (which is to say, even if the marginal utility of income were constant with respect to income), BCA would still place equal weight on the welfare of the poor and the welfare of the wealthy, and thus take no account of the possibility that decision makers and members of society may simply place greater moral weight on the welfare of the poor, a phenomenon that has been referred to as “diminishing marginal moral value of welfare” (Adler, 2013).
The idea of addressing either or both of these two concerns using distributional weights to inflate the WTP of the poor and deflate the WTP of the wealthy has been around for at least six decades (see, e.g., Arrow, 1963; Feldstein, 1972; Harberger, 1978; Boadway & Bruce, 1984; Drèze & Stern, 1987; Blackorby & Donaldson, 1990; Layard & Glaister, 1994; and Adler, 2013). Recently, interest in distributional weighting has risen, in response to a memorandum from the Biden White House calling for ways to address distributional concerns in regulatory review. The memorandum is silent about what kinds of strategies could be used, but it seems clear that substantial changes to current practice are called for, and distributional weighting does not seem beyond the pale. In practice, however, the application of distributional weights has barely gotten started. The UK Treasury Green Book (2018) provides guidance on computing and applying weights, but we are not aware of any BCA conducted within the British government that adopts the guidance. Similarly, there are no instances of distributional weighting in regulatory review by agencies of the US federal government (Robinson et al., 2016).
Fleurbaey and Abi-Rafeh (2016) provide an important possible explanation for this absence of weighting in practice: “The real reason for the lack of success of weighted BCA is most likely the belief among BCA practitioners that there is no simple recipe for determining the…weights.” There are two complicating factors. First, there is a belief among some that selecting weights involves a value judgment about the relative importance of efficiency and equity that should not be made by analysts (Banzhaf, 2011; Adler, 2013). Second, even if there were general acceptance of the proposition that it is appropriate for analysts to select weights, there would be no consensus on precisely what weights to select.
We have addressed the first issue at length in Acland and Greenberg (2023), where we argue that value judgments only enter when addressing differential moral concern for the welfare of the poor versus the wealthy, so that the question of whether to apply weights to address that concern (what we call “equity-weights”) is quite different from the question of whether to apply weights to address the bias introduced by diminishing marginal utility of income (what we call “utility-weights”). The essence of our argument is that utility-weighting is simply a correction of a methodological problem with the measurement of individual welfare, and thus that it is entirely appropriate for analysts to select weights. We argue, though we are not the first to do so, that weights should be selected based on empirical observations of individual preferences, which is the topic of this paper. By contrast, additional weighting by analysts to address disproportionate concern for the welfare of the poor is inappropriate because (a) it does indeed insert moral or ethical values into the analysis, thus taking the assessment of tradeoffs between welfare and equity out of the hands of democratically accountable decision makers, to whom that responsibility should be assigned in a deliberative democracy; and (b) it combines information about welfare and information about equity into a single metric and thus renders invisible the separate information about each value that decision makers need in order to make their own assessments of the relative importance of the welfare and equity impacts of different policies. In this paper, we take this position as given and refer the unpersuaded reader to our earlier paper.Footnote 1
The purpose of this paper is to address Fleurbaey and Abi-Rafeh’s second reason for the lack of progress on the practical application of distributional weights, that there can be no consensus on the appropriate weights to apply. In the case of utility-weighting, we believe there is a solid foundation for a consensus.
There is already a reasonable degree of consensus on the correct way to compute weights, using the ratio of the marginal utility of income of each income level to that of a reference income level, usually the median (Cowell & Gardiner, 1999; Adler, 2013; Nurmi & Ahtiainen, 2018). Using ratios overcomes the problem that the utility of income has no natural scale: rescaling a utility function by a positive constant leaves the ratio of marginal utilities unchanged. Furthermore, in the literature on distributional weighting, and indeed the literature on diminishing marginal utility itself, there is a considerable degree of agreement that the most practicable utility function to use for computing marginal utilities is an iso-elastic, or constant relative risk aversion, utility function, which we explicate below (Kaplow, 2010; Adler, 2013; Nurmi & Ahtiainen, 2018). This utility function has a single parameter, the elasticity of marginal utility of income, with which the appropriate weights can easily be computed. What remains is to determine the appropriate value of that parameter. To accomplish this task, we have conducted a rigorous and comprehensive meta-analysis of 1,711 estimates of the elasticity, from 158 independent studies over four decades.
Contrary to a commonly expressed concern about this parameter, the estimate from our meta-analysis is tightly bounded, with a mean of 1.61 and a 95% confidence interval of 1.18 to 2.05. The 1.61 estimate implies that in distributionally weighted BCA the benefits and costs of households of average size, with incomes at the 25th percentile, should receive a weight that is a little over two and a half times that for households at the median, while the benefits and costs of those at the 75th percentile should receive a weight that is half that for households at the median. We discuss the plausibility of these weights in Section 6.2.
In addition to distributional weights, as discussed below, the marginal utility of income plays a critical role in determining the social discount rate whenever the rate relies on the optimal growth rate method originally developed by Ramsey (1928). This is because the Ramsey formula accounts for the fact that as incomes grow over time, the marginal value of income decreases, so that costs and benefits that occur in the future will generate less welfare for those who experience them than those that occur in the present.Footnote 2 Our estimate of the elasticity of the marginal utility of income implies a social discount rate of approximately 4.0%, with a lower bound of 3.3% and an upper bound of 4.8%.
There have been two meta-analyses that cover some of the same ground as ours yet differ in important ways. Like us, Groom and Maddison (2019) conduct a meta-analysis of the elasticity of the marginal utility of income. However, their analysis is limited to six estimates for the United Kingdom (five of their own and one from Layard et al., 2008). In contrast, we have attempted to find as many estimates as possible for the US and UK, obtaining over 1,700 in all. The meta-analysis conducted by Havranek et al. (2013) is based on a large number of estimates. However, it is focused on the elasticity of intertemporal substitution, which is the inverse of the elasticity of the marginal utility of income, and because the inverse of an average is not equal to the average of the inverses, their pooled value cannot simply be inverted to obtain ours. Because of this focus, it is also limited to only one of several methods for obtaining estimates of the elasticity of the marginal utility of income, while our meta-analysis includes several others. Moreover, as discussed in greater detail later, it has a different objective than ours (examining reasons why the elasticity of intertemporal substitution varies across countries rather than obtaining an up-to-date value that can be used for utility-weighting and discounting by BCA practitioners), and it is based on a very different methodology.
An important contribution of our study is that we conduct meta-regression to assess the significance of the various apparent differences among estimates generated using different methodologies and/or data, in different locations, and at different times. A key result is that regardless of methodology, time period, or location, there does not appear to be a wide disparity in estimates, and those that exist are statistically insignificant, suggesting that one can have some confidence in adopting our recommended elasticity values.
The following section discusses the role the elasticity of the marginal utility of income plays in determining distributional weights and discount rates. Section 3 reviews the variety of methods that have been used to estimate the elasticity. Section 4 describes the sample of elasticity estimates used in the meta-analysis, while Section 5 describes the methods that were used. Section 6 presents the findings from the meta-analysis. Section 7 discusses our conclusions from the meta-analysis.
2. Determining utility-weights and discount rates
2.1. Utility weights
As stated in the introduction, the strategy we adopt for addressing the diminishing marginal utility of income is to assign weights to the WTP of each individual that are based on the ratio of their marginal utility of income to that of some reference income, such as the median, and multiply estimates of costs and benefits to individuals at different income levels by these weights (Cowell & Gardiner, 1999; Adler, 2013; Nurmi & Ahtiainen, 2018). These weights convert the dollar valuation of policy impacts for individuals at any given income level to the valuation they would have if their income were at the median. As stated, we refer to such weights as “utility-weights.” Once utility-weighting is completed, and diminishing marginal utility is thereby accounted for, the utility-weighted net benefit of a policy or program is the net benefit that would be estimated if everyone affected by the policy had median income. On this basis, the welfare impact of a policy or program can be assessed without bias.Footnote 3
As just indicated, determining utility weights requires estimates of the marginal utility of income at different income levels, and in order to make this feasible, it is necessary to make an assumption about the functional form of utility. In the methodologies that have been used to estimate the marginal utility of income, which are described in Section 3, the standard assumption is that preferences can be represented by an isoelastic utility function:
$$ U(y)=\frac{y^{1-\varepsilon }}{1-\varepsilon } \tag{1} $$
where $ y $ is income and $ \varepsilon $ is the elasticity of marginal utility with respect to income.Footnote 4
Given this utility function, the marginal utility of income is
$$ {U}^{\prime }(y)={y}^{-\varepsilon } \tag{2} $$
Because $ \varepsilon $ is the elasticity of marginal utility of income with respect to income, it indicates the percentage decrease in the marginal utility of income caused by a 1% increase in income. The isoelastic function has the desirable property of constant elasticity, which allows for straightforward empirical estimation.Footnote 5
Given this utility function and an estimate of $ \varepsilon $ , utility weights can be readily determined. For instance, setting the utility weight for those at median income (the $ m $ th group) equal to one (which is to say, choosing median income as the reference income), the utility weight ( $ {w}_i $ ) for a higher or lower income group (the $ i $ th group) would be the ratio of the income for the $ m $ th group to the income of the $ i $ th group, taken to the $ \varepsilon $ th power:
$$ {w}_i={\left(\frac{y_m}{y_i}\right)}^{\varepsilon } \tag{3} $$
The information necessary to compute $ \frac{y_m}{y_i} $ for US households is readily obtainable from federal government statistics such as the US Census Bureau’s Annual ASEC Survey, but computing utility weights for various income groups also requires a value for $ \varepsilon $ . While a substantial number of estimates of $ \varepsilon $ are available from previous research, findings vary considerably across studies and the estimates from any one individual study are far from definitive. This is one of our motivations for conducting a meta-analysis of the available estimates of $ \varepsilon $ .
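The weighting rule just described (the ratio of median income to group income, raised to the power $ \varepsilon $ ) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three income levels are hypothetical and are not ASEC figures, and the value 1.61 is the meta-analytic mean reported in the introduction.

```python
def utility_weight(income: float, reference_income: float, epsilon: float) -> float:
    """Utility weight w_i = (y_m / y_i) ** epsilon under isoelastic utility."""
    if income <= 0 or reference_income <= 0:
        raise ValueError("incomes must be positive")
    return (reference_income / income) ** epsilon


def weighted_net_benefit(impacts, reference_income: float, epsilon: float) -> float:
    """Sum WTP-based net benefits after scaling each by its utility weight.

    `impacts` is an iterable of (income, net_benefit_in_dollars) pairs.
    """
    return sum(
        utility_weight(income, reference_income, epsilon) * net_benefit
        for income, net_benefit in impacts
    )


# Hypothetical household incomes, chosen only for illustration.
MEDIAN_INCOME = 70_000.0
EXAMPLE_INCOMES = {"low": 35_000.0, "median": 70_000.0, "high": 140_000.0}

# Mean elasticity reported in the introduction of the paper.
EPSILON = 1.61

weights = {
    label: utility_weight(income, MEDIAN_INCOME, EPSILON)
    for label, income in EXAMPLE_INCOMES.items()
}

if __name__ == "__main__":
    for label, weight in weights.items():
        print(f"{label:>6}: weight = {weight:.2f}")
    # A policy that gives $100 to the low group and costs the high group $100.
    transfer = [(35_000.0, 100.0), (140_000.0, -100.0)]
    print(f"weighted net benefit: {weighted_net_benefit(transfer, MEDIAN_INCOME, EPSILON):.2f}")
```

Because the weight depends only on the income ratio, halving and doubling the reference income produce weights that are reciprocals of one another, and an unweighted net benefit of zero can become positive once the weights are applied.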
2.2. Social discount rates
Some of the approaches to determining the social discount rate involve using market rates of interest as proxies for the social opportunity cost of capital, using the after-tax return on savings as a proxy for the social opportunity cost of foregone consumption, or using a mix of the two (see Boardman et al., 2018). An alternative approach, one that is recommended in the UK Green Book (2018) and the European Union’s guidelines for BCA (European Commission, 2014), involves an optimal growth rate method originally developed by Ramsey (1928). Ramsey’s work produced the following equation, which is known as the Ramsey formula or Ramsey rule, for determining the social discount rate, $ d $ :
$$ d=\rho +g\varepsilon \tag{4} $$
where $ \rho $ is the rate of pure time preference, $ g $ is the rate of growth of income, and $ \varepsilon $ is the absolute value of the elasticity of the marginal utility of income with respect to changes in income. It is argued that the Ramsey rule is appropriate for the appraisal of public projects because, by consistently following it, society will maximize the present value of utility from its current and future per capita consumption (Feldstein, 1964; Lind, 1982; Moore et al., 2004). Equation (4) presumes that the consumption that results from future income should be discounted (1) because of impatience in delaying consumption, as reflected by $ \rho $ ; and (2) in order to address the fact that the welfare value of the marginal dollar of income will in fact be lower in the future, assuming that future income will exceed current income because of economic growth (that is, assuming that $ g $ is positive). Thus, $ g\varepsilon $ is the predicted percentage reduction in the marginal utility of income. The greater the extent to which individuals have a declining marginal utility of consumption (that is, the larger the value of $ \varepsilon $ ), the larger the discount rate will be. As a consequence, future increments to income and consumption will be given less weight in BCAs than current increments to income and consumption. But to use the Ramsey formula requires an estimate of $ \varepsilon $ . The meta-analysis will hopefully contribute to this goal.
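To make the role of $ \varepsilon $ in the Ramsey rule concrete, the following sketch evaluates $ d=\rho +g\varepsilon $ at the lower bound, mean, and upper bound of the elasticity reported in the introduction. The values of $ \rho $ and $ g $ used here are assumptions chosen purely for illustration; this section does not state the values behind the paper's own discount rates, so the output is not expected to reproduce them.

```python
def ramsey_discount_rate(rho: float, growth: float, epsilon: float) -> float:
    """Social discount rate from the Ramsey rule: d = rho + g * epsilon."""
    return rho + growth * epsilon


def present_value(amount: float, years: int, rate: float) -> float:
    """Discount a future amount back to the present at a constant annual rate."""
    return amount / (1.0 + rate) ** years


# ASSUMED inputs (hypothetical, for illustration only).
ASSUMED_RHO = 0.01      # rate of pure time preference
ASSUMED_GROWTH = 0.02   # growth rate of per capita income

# Elasticity values reported in the introduction (95% bounds and mean).
EPSILON_VALUES = {"lower": 1.18, "mean": 1.61, "upper": 2.05}

rates = {
    label: ramsey_discount_rate(ASSUMED_RHO, ASSUMED_GROWTH, eps)
    for label, eps in EPSILON_VALUES.items()
}

if __name__ == "__main__":
    for label, rate in rates.items():
        pv = present_value(100.0, 30, rate)
        print(f"{label:>5}: d = {rate:.2%}, PV of $100 in 30 years = ${pv:.2f}")
```

With $ g $ held fixed, the discount rate is linear in $ \varepsilon $ , so the width of the confidence interval for $ \varepsilon $ translates directly into the width of the range for $ d $ .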
3. Methods for estimating the elasticity of the marginal utility of income
The studies from which we take the estimates of $ \varepsilon $ that we use in our meta-analysis use a variety of methods. We next briefly describe and assess each of these methods.Footnote 6
3.1. Indirect behavioral evidence: lifetime consumption models
The most frequently used approach by far to estimate $ \varepsilon $ relies on either micro survey panel data or (much more frequently) macro time-series data. The theory that underpins this approach is that households allocate consumption over time in order to maximize a multi-period utility function. Exactly how they allocate their consumption depends on their rate of time preference and the curvature of their utility of income function. Under the assumption that individuals allocate such that their rate of time preference is equal to the interest rate, observing how consumption is affected by changes in the interest rate allows inferences to be made about the shape of the utility function.
In utilizing time series data, researchers typically use a log-linearized consumption Euler equation: $ \ln \left({g}_t\right)=c+b{r}_t+{e}_t $ , where $ {g}_t $ is the growth rate in consumption at time $ t $ ; $ c $ is the constant; $ {r}_t $ is the real rate of return on assets (e.g., the stock market return or the treasury bill return); the coefficient $ b $ provides an estimate of the elasticity of intertemporal substitution (EIS),Footnote 7 which is the inverse of $ \varepsilon $ ; and $ {e}_t $ is the error term. Because the causality goes in both directions (i.e., consumption growth affects the return on assets), instruments are usually used for $ {r}_t $ . As can be seen, if $ r $ is viewed as a proxy for the social discount rate, $ d $ in Equation (4), the regression that is used bears a close kinship with Equation (4), the Ramsey equation, once Equation (4) is divided by $ \varepsilon $ , and the $ \rho $ parameter, the rate of pure time preference, is set to zero. Indeed, sometimes, but much less often, there is a direct attempt to estimate $ \varepsilon $ by running the reverse regression of $ {r}_t $ on $ {g}_t $ , although this approach is more subject to weak instruments (see Yogo, 2004).
The estimation is sensitive to model specification and to the choice of the interest rate used to proxy for $ d $ (Evans, 2008). Estimation also requires several assumptions, arguably the most important of which are that capital markets are perfect (e.g., market participants are rational and there are no constraints on borrowing or on information) and that the utility function is additively separable. The estimates may depend on the period over which they are computed, especially if market turbulence or changes in financial regulations occur. They may also be sensitive to which goods are used in the demand estimation, and consumer preferences for those goods may change over time. Additionally, when the estimates rely on macro-data, rather than micro-data, it is not possible to take account of changes in demographic composition or household behavioral changes over the lifespan.
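The logic of recovering $ \varepsilon $ from the Euler-equation regression can be illustrated with synthetic data. The sketch below is a deliberate simplification and not the procedure used in the studies surveyed: it generates a hypothetical return series that is exogenous by construction, so plain ordinary least squares is used in place of the instrumental-variable estimators the literature relies on. All names, sample sizes, and parameter values are assumptions.

```python
import random


def ols_slope_intercept(x, y):
    """Closed-form OLS for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_euler_data(true_eis, constant, n_obs, noise_sd, seed):
    """Hypothetical data: consumption growth = constant + EIS * return + noise.

    Returns are drawn independently of the noise, so there is no reverse
    causality here (unlike in real macro data).
    """
    rng = random.Random(seed)
    returns = [rng.uniform(-0.02, 0.08) for _ in range(n_obs)]
    growth = [constant + true_eis * r + rng.gauss(0.0, noise_sd) for r in returns]
    return returns, growth


TRUE_EPSILON = 1.61  # assumed "true" elasticity used to generate the data
returns, growth = simulate_euler_data(
    true_eis=1.0 / TRUE_EPSILON, constant=0.01, n_obs=5000, noise_sd=0.001, seed=0
)

b_hat, c_hat = ols_slope_intercept(returns, growth)  # b_hat estimates the EIS
epsilon_hat = 1.0 / b_hat                            # epsilon is the inverse of the EIS

if __name__ == "__main__":
    print(f"estimated EIS = {b_hat:.3f}, implied epsilon = {epsilon_hat:.3f}")
```

Because $ \varepsilon $ is obtained by inverting the slope, sampling error in a small estimated EIS is magnified in the implied elasticity, which is one reason the choice of specification and instruments matters in practice.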
3.2. Indirect behavioral evidence: consumer demand analysis
This approach focuses on how utility is affected by the consumption of a particular good or broad category of goods (e.g., all foods). Essentially, the estimate of $ \varepsilon $ is the ratio of the good’s income elasticity to its own-price elasticity, with an adjustment for the share of the total budget accounted for by the good. That is, the following formula, which was developed by Frisch (1959), is used: $ \varepsilon =y\left(1- wy\right)/p $ , where $ y $ is the income elasticity of demand, $ p $ is the compensated own-price elasticity of demand, and $ w $ is the budget share accounted for by the good. The empirical approaches vary, sometimes using single-equation models and sometimes using complete demand systems. Time series data are typically used.
Like the lifetime consumption approach, findings may be sensitive to the specification of the regression model used to estimate $ \varepsilon $ . The category of goods is assumed to be additively separable; that is, the additional utility from consuming the good does not depend on the quantity of any other good that is consumed. Thus, the good selected is often “all foods.” In the case of food, for example, the assumption of additive separability would hold if utility from food consumption for an individual is unaffected by the value of the house in which the individual lives.Footnote 8 Obviously, there is no guarantee that the assumption is met, although some researchers have tested it and failed to reject it (Selvanathan, 1988).
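The Frisch formula quoted above, $ \varepsilon =y\left(1- wy\right)/p $ , is simple enough to evaluate directly. The sketch below uses hypothetical elasticities and a hypothetical budget share for a food aggregate. One assumption is added and should be read as such: because the own-price elasticity is normally negative while $ \varepsilon $ is discussed in this paper as a positive magnitude, the function reports the absolute value of the ratio.

```python
def frisch_epsilon(
    income_elasticity: float, own_price_elasticity: float, budget_share: float
) -> float:
    """Evaluate epsilon = y * (1 - w * y) / p and return its absolute value.

    y: income elasticity of demand
    p: compensated own-price elasticity of demand (typically negative)
    w: budget share of the good
    Taking the absolute value is an assumption made for this illustration.
    """
    if own_price_elasticity == 0:
        raise ValueError("own-price elasticity must be non-zero")
    if not 0 <= budget_share <= 1:
        raise ValueError("budget share must lie between 0 and 1")
    raw = income_elasticity * (1 - budget_share * income_elasticity) / own_price_elasticity
    return abs(raw)


# Hypothetical inputs for a food aggregate (illustration only).
EXAMPLE = {"income_elasticity": 0.5, "own_price_elasticity": -0.3, "budget_share": 0.15}

example_epsilon = frisch_epsilon(**EXAMPLE)

if __name__ == "__main__":
    print(f"implied epsilon = {example_epsilon:.3f}")
```

The estimate is a ratio of two estimated elasticities, so a small own-price elasticity in the denominator makes the result sensitive to estimation error, consistent with the sensitivity to specification noted above.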
3.3. Indirect behavioral evidence: relative risk aversion models
A few researchers have attempted to use insurance data to estimate the so-called coefficient of relative risk aversion, a measure that determines WTP to avoid risks to income, which corresponds to the elasticity of the marginal utility of income if relative risk aversion is independent of wealth.Footnote 9 The model presumes a rational insurance market in which consumers have knowledge of the probability that they will make a claim. The model used to estimate the coefficient of relative risk aversion was developed by Szpiro (1986) and uses time series insurance data.
Szpiro (1986) begins with the following relationship, which is derived in a technical appendix:
$$ I=W-\frac{\lambda }{r(W)} \tag{5} $$
where $ I $ is the amount of insurance, $ W $ is wealth, $ r(W) $ is absolute risk aversion, and $ \lambda $ is insurance loading (i.e., the ratio of premiums to claims minus one). Szpiro (1986) then establishes the relationship between absolute risk aversion and relative risk aversion ( $ \varepsilon (W) $ ) as
$$ \varepsilon (W)=W\,r(W) \tag{6} $$
Under the assumption of constant relative risk aversion (which is the same as isoelastic utility), an assumption which Szpiro (1986) empirically examines and finds cannot be statistically rejected, substituting this expression into the first equation yields
$$ I=W-\frac{\lambda W}{\varepsilon } \tag{7} $$
The negative sign in this model reflects the assumption that while the amount of insurance increases with wealth, it declines with insurance loading.
Because the amount of insurance, $ I $ , cannot be directly observed, but total claims, $ Q $ , and the claim rate, $ q $ , can be, Szpiro (1986) substitutes $ Q/q $ for $ I $ to get $ Q= qW-\left(q/\varepsilon \right)\left(\lambda W\right) $ , resulting in the regression equation
$$ Q={\beta}_0\left( qW\right)+{\beta}_1\left(q\lambda W\right)+e \tag{8} $$
The elasticity can then be computed as $ \varepsilon =-\frac{\beta_0}{\beta_1} $ .
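The step from the claims expression $ Q= qW-\left(q/\varepsilon \right)\left(\lambda W\right) $ to an estimate of $ \varepsilon $ can be illustrated with synthetic data. The sketch assumes the regression has no intercept and two regressors, $ qW $ and $ q\lambda W $ , so that $ {\beta}_0 $ should be close to one and $ {\beta}_1 $ close to $ -1/\varepsilon $ . The data-generating values and function names are hypothetical, and the two-variable least-squares solution is written out by hand to keep the example self-contained.

```python
import random


def simulate_insurance_series(epsilon, n_periods=40, noise_sd=0.0, seed=0):
    """Hypothetical (Q, q, W, lambda) series obeying Q = q*W - (q/epsilon)*lambda*W."""
    rng = random.Random(seed)
    rows = []
    for t in range(n_periods):
        wealth = 1_000.0 * (1.03 ** t)        # W: wealth growing 3% per period
        claim_rate = rng.uniform(0.01, 0.03)  # q: claims per unit of insurance
        loading = rng.uniform(0.1, 0.5)       # lambda: premiums / claims - 1
        claims = claim_rate * wealth - (claim_rate / epsilon) * loading * wealth
        claims += rng.gauss(0.0, noise_sd)    # optional measurement noise
        rows.append((claims, claim_rate, wealth, loading))
    return rows


def estimate_epsilon(rows):
    """No-intercept OLS of Q on x1 = q*W and x2 = q*lambda*W.

    Solves the 2x2 normal equations directly and returns
    (beta0, beta1, epsilon) with epsilon = -beta0 / beta1.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for claims, claim_rate, wealth, loading in rows:
        x1 = claim_rate * wealth
        x2 = claim_rate * loading * wealth
        a11 += x1 * x1
        a12 += x1 * x2
        a22 += x2 * x2
        b1 += x1 * claims
        b2 += x2 * claims
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("regressors are collinear")
    beta0 = (b1 * a22 - b2 * a12) / det
    beta1 = (a11 * b2 - a12 * b1) / det
    return beta0, beta1, -beta0 / beta1


if __name__ == "__main__":
    data = simulate_insurance_series(epsilon=1.61, noise_sd=0.05, seed=42)
    beta0, beta1, eps_hat = estimate_epsilon(data)
    print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, epsilon = {eps_hat:.3f}")
```

Identification here comes from variation in the loading $ \lambda $ over time: if loading were constant, the two regressors would be proportional and $ \varepsilon $ could not be separated from the scale of $ qW $ .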
3.4. Relating subjective well-being to income
A number of happiness surveys have been conducted in recent years in which respondents rate their life satisfaction or “happiness” on a scale of 1–5 or 1–10. To use these findings to estimate $ \varepsilon $ , analysts must first make an assumption about the functional relationship between the reported life satisfaction of individuals and their utility. In particular, the assumption typically made is that happiness is linearly linked to utility through a transformation: $ {h}_{it}={u}_{it}+{v}_{it} $ , where $ {h}_{it} $ is the measure of happiness for individual $ i $ at time $ t $ , $ {u}_{it} $ is individual $ i $ ’s utility at time $ t $ , and $ {v}_{it} $ is an error term that is independent of the circumstances affecting utility (see Layard et al., 2008). Given this transformation, and a measure of the utility of the surveyed respondents, $ \varepsilon $ can be estimated through a modification of Equation (1). This approach appears to have been used only once (Layard et al., 2008), but this study is frequently cited and is the single estimate upon which the UK government bases its recommended utility-weights (Treasury, 2018).
The extent to which individuals accurately measure their happiness is obviously problematic, as is the extent to which what they say reflects their cardinal utility. Moreover, the relationship between measured happiness and utility may not be linear. Nonetheless, Layard et al. (2008) used different data sets to conduct their analysis for different countries (including both the US and the UK) and obtained surprisingly consistent results.
3.5. Asking experts
One way to determine the value of $ \varepsilon $ is to survey persons who are familiar with the concept, an approach that to the best of our knowledge has been used only once (Drupp et al., 2018). It is not clear how even experts could knowledgeably guess at the value unless they were aware of at least a few previous studies or had values in mind for $ d $ , $ \rho $ , and $ g $ and used Equation (4) to obtain a value for $ \varepsilon $ .
3.6. Progressivity of income taxes
An estimate of $ \varepsilon $ can be obtained by using progressivity in income tax schedules under an assumption of equal sacrifice. In other words, the assumption is that society’s (or at least policymakers’) aversion to inequality is such that income tax rates are set so that the loss in welfare is equal among taxpayers at different income levels. As shown by Groom and Maddison (2019) and Evans (2008), operationally, the equal sacrifice assumption implies that the following equation must hold for all income levels:
$$ U(y)-U\left(y-T(y)\right)=k \tag{9} $$
where $ k $ is a constant, $ y $ is income before taxes, and $ T(y) $ is the total tax liability according to the income tax schedule. By assuming an isoelastic utility function and then substituting it into the above equation, the following relationship is obtained:
$$ \frac{y^{1-\varepsilon }}{1-\varepsilon }-\frac{{\left(y-T(y)\right)}^{1-\varepsilon }}{1-\varepsilon }=k \tag{10} $$
By totally differentiating this expression with respect to $ y $ and rearranging the terms, the following equation is derived:
$$ {\left(1- ATR\right)}^{\varepsilon }=1- MTR \tag{11} $$
where $ MTR $ is the marginal tax rate and $ ATR $ is the average tax rate. Taking logs of this expression and solving for $ \varepsilon $ results in an equation that can be used to estimate $ \varepsilon $ :
$$ \varepsilon =\frac{\ln \left(1- MTR\right)}{\ln \left(1- ATR\right)} \tag{12} $$
This approach has been used fairly frequently, but the equal sacrifice assumption itself is subject to obvious criticisms. For example, in addition to possible concerns about inequalities in the income distribution, in setting tax rates policymakers likely also consider potential disincentive effects and the goals of various interest groups. Application of the model is also not entirely straightforward. For example, decisions must be made about which taxes to include and how to compute the marginal and average tax rates. Moreover, tax rates change over time and differ for different demographic groups.
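As a sketch of how the equal sacrifice calculation works in practice, the code below computes the marginal and average tax rates at a given income from a bracket schedule and then applies the relationship that follows from equal sacrifice with isoelastic utility, $ \varepsilon =\ln \left(1- MTR\right)/\ln \left(1- ATR\right) $ . The schedule is entirely hypothetical and is not any country's actual tax code; the decisions the text flags (which taxes to include, how to define the rates) are assumed away here.

```python
import math

# Hypothetical schedule: (lower threshold, marginal rate applied above it).
EXAMPLE_SCHEDULE = [(0.0, 0.10), (20_000.0, 0.20), (80_000.0, 0.30)]


def tax_liability(income: float, schedule) -> float:
    """Total tax T(y) under a piecewise-linear bracket schedule."""
    tax = 0.0
    for index, (threshold, rate) in enumerate(schedule):
        upper = schedule[index + 1][0] if index + 1 < len(schedule) else math.inf
        if income > threshold:
            tax += (min(income, upper) - threshold) * rate
    return tax


def marginal_rate(income: float, schedule) -> float:
    """Marginal tax rate at the given income (rate of the bracket it falls in)."""
    rate_at_income = schedule[0][1]
    for threshold, rate in schedule:
        if income > threshold:
            rate_at_income = rate
    return rate_at_income


def equal_sacrifice_epsilon(mtr: float, atr: float) -> float:
    """epsilon = ln(1 - MTR) / ln(1 - ATR), valid for 0 < ATR < 1 and 0 <= MTR < 1."""
    if not 0 < atr < 1 or not 0 <= mtr < 1:
        raise ValueError("rates must satisfy 0 < ATR < 1 and 0 <= MTR < 1")
    return math.log(1 - mtr) / math.log(1 - atr)


INCOME = 100_000.0
example_tax = tax_liability(INCOME, EXAMPLE_SCHEDULE)
example_atr = example_tax / INCOME
example_mtr = marginal_rate(INCOME, EXAMPLE_SCHEDULE)
example_eps = equal_sacrifice_epsilon(example_mtr, example_atr)

if __name__ == "__main__":
    print(f"T(y) = {example_tax:.0f}, ATR = {example_atr:.3f}, MTR = {example_mtr:.2f}")
    print(f"implied epsilon = {example_eps:.3f}")
```

Under a progressive schedule the marginal rate exceeds the average rate, so the implied $ \varepsilon $ is greater than one; a flat tax with no exemption would imply $ \varepsilon =1 $ .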
3.7. Convex time budget experiments
This approach, which is fairly new and so far not widely used, involves a laboratory experiment in which subjects allocate a pot of money between two time periods, with an interest rate applied to any money allocated to the later time period. Log-linearizing the first-order condition gives the following relationship between allocation in the two periods, the interest rate, and the temporal distance between the two periods:
$$ \ln \left(\frac{c_{t+k}}{c_t}\right)=\frac{1}{\varepsilon}\ln \left(1+r\right)-\frac{\rho }{\varepsilon }k \tag{13} $$
where $ \rho $ is the individual rate of pure time preference, $ k $ is the distance between the two periods, $ r $ is the interest rate, and $ \varepsilon $ is the elasticity of marginal utility of income. Estimating this using censored regression allows for computing $ \varepsilon $ as the inverse of the coefficient on $ \ln \left(1+r\right) $ .Footnote 10 The methodology presents a trade-off for researchers. In order to achieve incentive compatibility, researchers must use real stakes, which requires that the stakes be relatively low (for budget reasons) and that time horizons be short (for practical reasons). In order to elicit preferences using larger stakes and longer time horizons, which would be better suited to BCA purposes, it is necessary to use hypothetical stakes, which are considered less reliable.
3.8. Discussion
Some might question the validity of some or all of these methodologies, relying as they do on formal models that are not universally accepted, and, in most cases, on strong assumptions. We acknowledge that each of the methodologies is subject to critique, and that in some cases there may be disagreement as to whether the formal models used reflect actual human decision making. Our approach to these critiques is twofold. First, we see ourselves as embarked on an enterprise that is fundamentally and necessarily rooted in existing formal models. Ours is primarily a contribution to economic analysis of public policy, which is founded on formal economic theory as a matter of principle. Concave utility functions are embedded in almost all areas of formal economic theory. Given that we are operating within these constraints, it is natural to take the formal models that underlie the various estimation strategies at face value.
It is true that, having adopted certain structural models, it is necessary to make certain assumptions in order to justify the empirical methods (which we have attempted to enumerate), and it is true that the estimates of $ \varepsilon $ generated by the various methodologies are sensitive to violation of these assumptions. Accordingly, the second component of our twofold approach to addressing critiques of the methodologies is to estimate a meta-regression to explore whether estimates of $ \varepsilon $ are sensitive to the methodology used. We report the results of this meta-regression in Section 6, where we find that there is no statistically significant effect of the methodology used.
Finally, we take note of a theoretical finding that the elasticity of the value of a statistical life (VSL) with respect to income is an upper bound on the elasticity of marginal utility of income (Hammitt, 2017). Meta-analysis of the elasticity of VSL finds that the mean is approximately 0.6 (Viscussi & Masters, 2017), far below the large majority of estimates of $ \varepsilon $ , which might call the theoretical finding into question. Furthermore, the model from which this finding is derived rests upon assumptions about how individuals value reductions in fatality risk which we consider to be more problematic than the assumptions made in the above methodologies. In particular, the model reduces the very complex psychology of mortality to a simplistic expected utility model over wealth, ignoring, among other things, the many profound emotions evoked by the prospect of fatal injury. In addition, as the model places only a bound on the elasticity and does not allow for empirical estimation of $ \varepsilon $ itself, we have concluded that this approach is not relevant to our analysis.
4. Available estimates of the elasticity of the marginal utility of income
Because each of the approaches discussed above has its own weaknesses and requires fairly strong assumptions, the best approach is not apparent to us. Nevertheless, as discussed in the following section, it is possible that available estimates tend to converge on what hopefully is the correct value. We explore this possibility through a meta-analysis. To conduct such a meta-analysis, it is necessary to first do a literature search and collect as many of the available estimates of $ \varepsilon $ as possible. We included all the estimates we found that used the methods described in Section 3.
In conducting our literature search, we focused on the US and the UK, two countries with similar cultures and economies, where one could anticipate that the elasticity values might be similar. As will be seen, pooling the available estimates of ε for these two countries provides us with a considerable number of sample points. Although estimates of ε do exist for other countries, especially for those in continental Europe and for Japan, the vast majority of work is for the US and UK. Moreover, as shown by Havranek et al. (Reference Havranek, Horvath, Irsova and Rusnak2013), inter-country variation in the estimates is considerable, for a variety of reasons. As a result, our findings may not be valid for any countries other than the US and the UK. Given the preponderance of results from these two countries, we think it is important to avoid the possibility that inclusion of estimates from other countries might give the impression of a false level of generalizability.
Many of the studies we examined provide multiple estimates of ε, and the database we constructed for the meta-analysis contains these multiple estimates. For reasons discussed later, however, we cluster all the estimates for a single study in our meta-analysis. A few studies used more than one of the approaches described in Section 3, and a few provided estimates for both the US and the UK. In these instances, we treat each approach used and each covered country as a separate study.
In all, we found 168 studies that provided one or more estimates of $ \varepsilon $ . The number of studies and the number of estimates based on each approach are reported in Table 1.
As is evident, studies based on the lifetime consumption model provide the vast majority of estimates of $ \varepsilon $ , with studies based on consumer demand analysis occupying a distant second place. All but one of the studies based on the lifetime consumption model in our database were taken from a multiple nation meta-analysis of EIS conducted by Havranek et al. (Reference Havranek, Horvath, Irsova and Rusnak2013), who did an extremely thorough literature search of EIS estimates. In fact, the one estimate we found that was not included in Havranek et al. (Reference Havranek, Horvath, Irsova and Rusnak2013) was from a study conducted after their search was completed. Their meta-analysis, which covered all nations for which EIS estimates existed, differs from ours by focusing on why EIS estimates vary across countries and by excluding estimates that use approaches other than the lifetime consumption model, as that was beyond the scope of their study. As will be seen in the following section, their statistical approach also differs greatly from ours, and has come to be considered inferior (Hedges et al., Reference Hedges, Tipton and Johnson2010).
5. Methods used in the meta-analysis of available estimates of $ \boldsymbol{\varepsilon} $
Using several meta-analytic tools (see Hedges, Reference Hedges, Cooper and Hedges1994 and Shadish & Haddock, Reference Shadish, Haddock, Cooper and Hedges1994), we pooled the available estimates of $ \varepsilon $ described in Section 4 and estimated their mean value. We also estimated means for several subsets of estimates. In calculating mean values, best practice in meta-analysis involves taking account of the fact that some estimates are statistically more precise than others–as implied by their smaller variances–and calculating a weighted mean, the weight being the inverse of the variance of each individual estimate of $ \varepsilon $ . By using this weighting scheme, estimates of $ \varepsilon $ that are more precisely estimated contribute more to the pooled mean than estimates that are less so.
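In its simplest form, treating every estimate as independent (an assumption that is relaxed in the procedure described next), the inverse-variance weighted mean of estimates $ \varepsilon_i $ with variances $ v_i $ can be written as follows. The single index $ i $ and the symbol $ \bar{\varepsilon} $ are illustrative notation introduced here, not notation taken from the paper.

```latex
\bar{\varepsilon} \;=\; \frac{\sum_{i} \varepsilon_i / v_i}{\sum_{i} 1 / v_i}
```

An estimate with half the variance of another therefore receives twice the weight in the pooled mean.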
In our meta-analysis, a complication in computing the means of the available estimates of $ \varepsilon $ occurs because many of the studies we use provide multiple estimates. Because they are estimated by the same researchers or team of researchers and typically with the same data sets, within-study estimates are likely to be correlated, and thus not independent. In addition, if each estimate is given equal weight in a meta-analysis, studies with a greater number of estimates will be more influential in determining mean values than those with fewer estimates. Using a procedure developed by Hedges et al. (Reference Hedges, Tipton and Johnson2010), we were able to determine that controlling for correlation among estimates within studies affected our estimates of the mean elasticity only at the fourth significant digit. Under this circumstance, Hedges et al. (Reference Hedges, Tipton and Johnson2010) provide a simple methodology for computing the mean and standard error of the elasticity. Thus, unlike Havranek et al. (Reference Havranek, Horvath, Irsova and Rusnak2013), who treated each available estimate in their database equally, our results adjust for the number of estimates in each study and the correlations among those estimates, as well as for the statistical precision of each estimate.
The Hedges et al. (Reference Hedges, Tipton and Johnson2010) procedure determines the weight given to each estimate within a given study, $ {w}_j $ , as follows:
where $ j=1\dots m $ indexes studies, $ {k}_j $ is the number of estimates in study $ j $ , $ i=1\dots {k}_j $ indexes the estimates within study $ j $ , and $ {v}_{ij} $ is the variance of estimate $ i $ in study $ j $ . Thus, there is only a single weight per study, which is applied to each estimate within the study; and estimates in studies with more estimates receive a smaller weight than those in studies with fewer estimates, given similar variances.
Given the value of $ {w}_j $ for estimates in each study, the weighted mean of the estimates across studies, $ b $ , is computed as follows:
Thus, although the sum of the elasticities tends to be greater in studies with a greater number of estimates, their weight tends to be smaller. The mean, $ b $ , can alternatively be estimated via a meta-regression in which $ b $ is the intercept term and values of $ {w}_j $ are used as weights. The advantage of a meta-regression is that each factor that may influence estimates of $ \varepsilon $ (e.g., whether they are based on US or UK data and the estimation method that was used) can be investigated, while holding other factors constant. The results from such a meta-regression are presented later.
The variance of the weighted mean, $ {v}^R $ , is
where $ {\overline{\varepsilon}}_j $ is the unweighted average of the estimates of $ \varepsilon $ within study $ j $ . The square root of $ {v}^R $ is the robust standard error.
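The computation described in this section can be sketched in a few lines of Python. The sketch assumes the simplified correlated-effects form of Hedges et al. (2010) with no between-study variance component, so that $ w_j = 1/(k_j \bar{v}_j) $ with $ \bar{v}_j $ the mean of the $ v_{ij} $ in study $ j $ , $ b = \sum_j w_j \sum_i \varepsilon_{ij} / \sum_j k_j w_j $ , and $ v^R = \sum_j (k_j w_j)^2 (\overline{\varepsilon}_j - b)^2 / (\sum_j k_j w_j)^2 $ . These forms are an assumption chosen to be consistent with the definitions in the text, not a transcription of the paper's equations, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def robust_weighted_mean(estimates):
    """Return (b, robust_se) for an iterable of (study_id, estimate, variance).

    Assumed forms (not confirmed by the source text):
      w_j = 1 / (k_j * mean variance in study j)
      b   = sum_j w_j * sum_i eps_ij / sum_j k_j * w_j
      v_R = sum_j (k_j w_j)^2 (eps_bar_j - b)^2 / (sum_j k_j w_j)^2
    """
    by_study = defaultdict(list)
    for study_id, eps, var in estimates:
        by_study[study_id].append((eps, var))

    numerator = 0.0    # sum_j w_j * sum_i eps_ij
    denominator = 0.0  # sum_j k_j * w_j
    per_study = []     # (k_j * w_j, unweighted study mean)
    for rows in by_study.values():
        k_j = len(rows)
        mean_var = sum(v for _, v in rows) / k_j
        w_j = 1.0 / (k_j * mean_var)  # one weight per study
        eps_sum = sum(e for e, _ in rows)
        numerator += w_j * eps_sum
        denominator += k_j * w_j
        per_study.append((k_j * w_j, eps_sum / k_j))

    b = numerator / denominator
    v_r = sum(kw ** 2 * (eps_bar - b) ** 2 for kw, eps_bar in per_study)
    v_r /= denominator ** 2
    return b, v_r ** 0.5


# Made-up data for illustration only: study "A" has two estimates, "B" has one.
toy = [("A", 1.2, 0.04), ("A", 1.6, 0.04), ("B", 2.0, 0.08)]
b, robust_se = robust_weighted_mean(toy)
```

Because each study contributes a single weight that shrinks with its number of estimates, a study reporting many estimates does not dominate the pooled mean simply by reporting more of them.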
In computing the mean values, we used 158 of the 168 studies that provided estimates of $ \varepsilon $ . Four of the six studies based on the progressivity of income taxes and two of the 13 consumer demand studies did not provide standard errors. Hence, $ {w}_j $ could not be computed for them, and they had to be excluded. In addition, in convex time budget experiments, each respondent is presented with multiple tradeoffs, each of which is treated as a separate observation. For example, in the most extreme such study (Coble & Lusk, Reference Coble and Lusk2010), there were 47 study participants but 4,418 total choices, with each choice treated as an independent sample point. This procedure greatly inflates the values of $ {w}_j $ for the time budget experiments. In fact, our two largest values of $ {w}_j $ came from convex time budget studies, the largest of which was over four times the largest weight among studies using any of the other methodologies. This rendered the convex time budget studies unsuitable for inclusion in the meta-analysis. The implications of leaving out these 10 studies are investigated below.
Consideration was also given to omitting estimates of $ \varepsilon $ with implausible values. For example, negative values are clearly implausible, and perhaps very small positive values are as well. And large positive values, say above 4 or 5, may also be implausible. For example, as suggested by Equation (4), if $ g $ is near 2, as it appears to be, estimates this large imply an exceedingly high social discount rate. While very small and very large estimates do exist in our database, as shown below, excluding values such as those below, say, 0.5 or above, say, 3 would inevitably be arbitrary and seemed likely to introduce bias. Instead, we calculate means with the studies that reported the lowest and highest 10% (and, alternatively, the lowest and highest 20%) of all the estimates excluded, as a robustness check. However, we put the greatest emphasis on the mean value that includes all the estimates we obtained, with the exception of the 10 studies discussed in the previous paragraph. In our judgement, this is the most reliable estimate of $ \varepsilon $ to use in computing utility weights.
6. Findings from the meta-analysis
6.1. Elasticity of marginal utility of income
Table 2 shows the overall weighted mean estimate for all 158 included studies and for several subgroups of studies. The overall weighted mean of $ \varepsilon $ is 1.61, which is statistically significant at the 1% level and has a confidence interval of 1.18 to 2.05.
Had it been possible to include the 10 omitted studies in computing the overall mean, it is highly unlikely that the 1.61 estimate would have changed by very much. These studies account for only 6% of all the available studies. Moreover, although one of the elasticity estimates is 5.5 and two are just above zero, the remaining seven elasticity estimates are between 1.2 and 2.0, fairly close to the 1.61 estimate. To obtain a rough idea of whether the omitted studies would be likely to affect this mean if we knew the standard errors needed for their inclusion, we arbitrarily assume that the standard error of each estimate of $ \varepsilon $ is equal to 1. Using these standard errors and Equation (6), the estimated mean that incorporates the 10 omitted studies is virtually identical to the estimate for the 158 included studies. If the standard error of each estimate is assumed to be 0.5 (or 1.5) instead of 1, the overall mean is again hardly changed from the value for the sample of 158 studies.
Table 2 permits several comparisons of mean estimates. (We test the statistical significance of the results using meta-regression below.) One important comparison is between the mean for all the available studies and the mean estimate that excludes the highest 10% of all the estimates and the lowest 10% of all the estimates. There is no difference in the mean elasticity at three significant digits. Trimming the highest and lowest 20% of estimates increases the mean by a small amount, from 1.612 to 1.676.
Perhaps the most important comparison of weighted means in Table 2 is between studies based on the lifetime consumption approach, the relatively few studies that use the consumer demand approach, and the even smaller number of studies that use other approaches. The mean for consumer demand studies is quite a bit smaller than the mean for the lifetime consumption studies, 1.2 versus 1.7. Because there were only seven studies that use other methods, they were combined. Their mean is closer to that for the lifetime consumption studies. These differences suggest that there could be value in additional studies that use methodologies other than lifetime consumption.
The differences in means between US and UK studies (1.7 versus 1.3) and between older and newer studies (1.9 versus 1.1) appear noteworthy. The difference between studies that rely on micro-data (usually surveys) and those based on aggregate data is relatively small, 1.8 versus 1.5. Arguably, the larger mean that is based on micro-data may be the more reliable one. For example, estimates that rely on aggregate data can be biased because they cannot control for demographic factors (Attanasio & Weber, Reference Attanasio and Weber1993).
In addition to these pairwise comparisons, we conducted a meta-regression of elasticity estimates on each of the variables in Table 2, clustered at the study level to account for correlations among estimates from the same study. Despite the fact that some of the pairwise comparisons appear noteworthy, when all variables were controlled for simultaneously, none of the differences proved statistically significant. We conclude that while the results in Table 2 might suggest further study, the meta-regression results provide some confidence in using a mean value of 1.61 based on the full sample of estimates.
A major concern in conducting meta-analysis is publication bias (see Begg, Reference Begg, Cooper and Hedges1994), which results if the estimates from the studies included in the meta-analysis are unrepresentative of all those that exist. One potential cause of publication bias is that published studies, those that are most likely to be in the meta-database, are more likely to have statistically significant findings, suggesting a negative observed relationship between the estimates and their standard errors for studies in the database because studies with wide confidence intervals would be missing. This does not appear to occur in our case. Of the estimates of $ \varepsilon $ in our database that rely on the lifetime consumption model, over half (54.7%) are statistically insignificant at the 5% level. Of the estimates that rely on other methods, almost a quarter (22.7%) are statistically insignificant.
Another potential source of publication bias is that studies with implausible findings are more likely to be rejected for publication. This seems unlikely for studies using the lifetime consumption approach. For example, 14.2% of the estimates in our database that use this method are negative, while 26.0% exceed 5 in value. In contrast, none of the estimates of $ \varepsilon $ that used other approaches are negative or exceed 5, suggesting that publication bias could exist for them. One way to examine the possibility that estimates with implausible values were unpublished is to determine the number of missing studies that would be required to appreciably change our mean estimate of $ \varepsilon $ . To use Equation (6) in these calculations, we assume that the missing studies would have the same average values for $ {w}_j $ and $ {k}_j{w}_j $ as the studies in our sample. We computed the number of studies with estimates equal to 4.0 (−1.0) necessary to increase (decrease) our estimate of the mean elasticity to the upper (lower) bound of our confidence interval and in both cases found that there would need to have been approximately 100 missing studies, which we consider to be implausible, particularly because the missing studies would be more likely to contain a mix of exceptionally high and low estimates of $ \varepsilon $ rather than all high or all low. We conclude that a strong bias due to missing studies with implausible estimates is improbable.
6.2. Implied utility weights and discount rate
Table 3 shows the utility weights implied by the mean and upper and lower bound values that we suggest for $ \varepsilon $ . In addition to the values for $ \varepsilon $ , the weights are based on Equation (3) and data on income for US households obtained from the US Census Bureau’s Annual ASEC Survey. As defined by the Census Bureau, income is estimated prior to taxes and excludes non-monetary receipts (e.g., SNAP payments and goods that are bartered). In addition, the income of people in group quarters (prisons and elder-care facilities, for example) is not included. The utility weights in Table 3 pertain to households at the 25th, 50th, 75th, and 95th percentiles of household income in 2021. Because income is higher in larger households than in smaller households, and the average size of US households is 2.5, the income amounts used to compute the utility weights are a weighted average of the incomes of two-person and three-person households. The utility weights must equal 1 at the 50th percentile because at that percentile $ {y}_m={y}_i $ and hence $ \frac{y_m}{y_i}=1 $ .
Note: The following formula is used to compute the weights: $ {\left({y}^m/{y}^i\right)}^{\varepsilon } $ where $ {y}^i $ is the income of the i th percentile group, $ {y}^m $ is the median income of the group at the 50th percentile, and ε is the elasticity of the marginal utility of income. Source for 2021 household income at each percentile: US Census Bureau’s Annual ASEC Survey in September 2021 (Ruggles et al., Reference Ruggles, Flood, Goeken, Schouweiler and Sobek2022).
Note that with an elasticity of 1.6, the weight on the 25th percentile household is approximately five times larger than that on the 75th percentile household. To assess whether these weights are plausible, it is necessary to consider the lived experience of households at these levels of income. Household income is $44,250 at the 25th percentile and $128,000 at the 75th percentile, an approximately threefold difference. Based on estimates from the Center for Women’s Welfare at the University of Washington of the income necessary for “self-sufficiency” in urban California, these incomes are approximately 50% and 150% of the necessary income, respectively. If their income were spread evenly across the six categories of expenditures used by the Center, the money available for food each month in the average household at the 25th percentile would be $387. The cost to the average household of the Thrifty Food Plan, which is the US Department of Agriculture’s estimate of the cheapest possible strategy for achieving minimum nutritional needs, is $583 per month. The 25th percentile household thus has 66% of the money it needs to achieve adequate nutrition. An additional $100 per month, if spent on food, would raise that share to about 83%. The welfare impact of that increase in food consumption is probably outside the lived experience of most readers but is perhaps imaginable. It seems reasonable to think that it would be five times as great as the welfare impact of whatever a household with a combined income of $128,000 would spend the same amount of money on.
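The roughly five-to-one comparison can be checked directly from the weight formula $ {\left({y}^m/{y}^i\right)}^{\varepsilon } $ , because the median income cancels when one weight is divided by another. The following is a minimal sketch using the two incomes quoted above; the function names are hypothetical.

```python
def utility_weight(income, median_income, epsilon):
    """Utility weight (y_m / y_i) ** epsilon for a household with the given income."""
    return (median_income / income) ** epsilon


def weight_ratio(income_low, income_high, epsilon):
    """Ratio of the lower-income weight to the higher-income weight.

    The median cancels: (y_m / y_low)^e / (y_m / y_high)^e = (y_high / y_low)^e.
    """
    return (income_high / income_low) ** epsilon


# Incomes at the 25th and 75th percentiles as quoted in the text.
ratio_25_to_75 = weight_ratio(44_250, 128_000, 1.6)
```

With these inputs the ratio comes out at about 5.5, in line with the "approximately five times" statement, and it does not depend on the value used for the median.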
That said, at the bottom and top of the income distribution the weights become very large and very small, respectively. For a household with income of $500,000 the weight is 0.05 and for a household with income of $5,000 it is 81.8, a roughly 1,600-fold difference. We could again try to imagine and compare the lived experience of these two households, but the exercise might become quite abstract. In ongoing work on the practical application of utility-weighting, we have concluded that it is appropriate, in order to avoid controversy, to establish upper and lower thresholds on the weights. The choice of the lower threshold has a considerable effect on utility-weighted net benefits because the weights increase hyperbolically as income goes down, so that sensitivity analysis on the lower threshold is important. In our applied work we have chosen a benchmark upper threshold weight of 5 and a lower threshold of 0.5.
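One simple way to apply such thresholds is to clamp each computed weight to the chosen interval. The sketch below reads the benchmark thresholds as bounds on the weights themselves, which is an interpretation of the text, not something it states outright. The function name is hypothetical, and the median income of 78,000 used in the example is an illustrative placeholder, not a figure reported in the paper.

```python
def capped_utility_weight(income, median_income, epsilon=1.6,
                          floor=0.5, cap=5.0):
    """Utility weight (y_m / y_i) ** epsilon, clamped to [floor, cap]."""
    raw = (median_income / income) ** epsilon
    return min(max(raw, floor), cap)


# Hypothetical median income, for illustration only.
ASSUMED_MEDIAN = 78_000

low_income_weight = capped_utility_weight(5_000, ASSUMED_MEDIAN)     # hits the cap
high_income_weight = capped_utility_weight(500_000, ASSUMED_MEDIAN)  # hits the floor
median_weight = capped_utility_weight(ASSUMED_MEDIAN, ASSUMED_MEDIAN)
```

Exposing `floor` and `cap` as parameters makes it straightforward to rerun an analysis under alternative thresholds as a sensitivity check.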
With respect to the social discount rate, Equation (4) indicates that computing this parameter requires estimates of the rate of time preference ( $ \rho $ ) and the rate of growth of consumption ( $ g $ ), in addition to an estimate of $ \varepsilon $ . Boardman et al. (Reference Boardman, Greenberg, Vining and Weimer2018) suggest that $ \rho $ be set to 1.0% and $ g $ to 1.9%. A recent survey of approximately 180 experts by Drupp et al. (Reference Drupp, Freeman, Groom and Nesje2018) obtained very similar values (at the mean, $ \rho $ = 1.1% and $ g $ = 1.7%). Using the values suggested by Boardman et al. and our estimated 1.6 value for $ \varepsilon $ implies a social discount rate, $ d $ , of 4.0%. If instead, our lower bound for $ \varepsilon $ of 1.2 is used, $ d $ = 3.28%; at the upper bound of 2.0 for $ \varepsilon $ , $ d $ = 4.8%. These values are in the range that government bodies and various scholars have found for the social discount rate when it is based on the Ramsey formula. For example, the UK Green Book (2018) suggests a rate of 3.5% and the European Union recommends 4% (European Commission, 2017).
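The discount rates quoted above can be reproduced with a short calculation. The sketch assumes that Equation (4) takes the standard Ramsey form $ d=\rho +\varepsilon g $ , with all rates expressed in percent per year. That form is consistent with the reported figures but is stated here as an assumption, and the function name is hypothetical.

```python
def ramsey_discount_rate(rho, g, epsilon):
    """Social discount rate under the assumed Ramsey form d = rho + epsilon * g.

    rho and g are in percent per year, so the result is also in percent.
    """
    return rho + epsilon * g


RHO, G = 1.0, 1.9  # values attributed in the text to Boardman et al. (2018)

d_lower = ramsey_discount_rate(RHO, G, 1.2)    # lower-bound elasticity
d_central = ramsey_discount_rate(RHO, G, 1.6)  # central elasticity
d_upper = ramsey_discount_rate(RHO, G, 2.0)    # upper-bound elasticity
```

Each 0.1 change in the elasticity moves the discount rate by 0.19 percentage points at this growth rate, which is why the sensitivity range of 1.2 to 2.0 translates into a spread of about 1.5 percentage points.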
7. Conclusion
The meta-analysis reported in this paper contributes to the literature in three ways. First, it draws on as comprehensive and up-to-date a set of estimates as we could assemble, which is important for advancing a consensus on the correct utility weights and the Ramsey discount rate. Second, unlike previous studies, our meta-analysis accounts for the statistical precision of each of the included estimates and addresses the correlation of estimates within studies. Third, we have estimated a meta-regression and found that there are no systematic differences in estimates of $ \varepsilon $ that would complicate the application of our recommendations.
Using meta-analysis, we establish a value of the elasticity of marginal utility of income, $ \varepsilon $ , intended to contribute to a consensus on how to use weights to correct for bias in BCA caused by the diminishing marginal utility of income. Based on estimates from all the studies that could be included in the meta-analysis, the overall weighted mean of $ \varepsilon $ was found to be 1.61. There is some evidence that this estimate could be a bit low, although the appropriate value is certainly still less than 2.0. For example, when only studies that were based on micro-data (arguably more reliable than those based on aggregate data) were included in the meta-analysis, the mean of $ \varepsilon $ increased to 1.779. However, the most recent studies–those conducted since the turn of the century–had a mean value of only 1.1. Relying on these findings, we tentatively conclude that it is reasonable to base the social discount rate and utility weights on an elasticity of 1.61, with lower- and upper-bound sensitivity testing at 1.2 and 2.0.
This paper is the second in a series of three papers that we hope will move the practice of BCA in the direction of utility-weighting to correct for the bias introduced into BCA by diminishing marginal utility of income. In the first paper we make the case that utility-weighting should be done. In brief our argument is that BCA should be an unbiased measure of aggregate individual welfare, both because we consider that to be the right information to provide to decision makers and because we believe that, in a time of increasing concern about distributional impacts, if BCA continues to be biased against the poor it will become increasingly irrelevant. The theoretical finding that efficiency and desirable distributional outcomes in a frictionless and undistorted system of markets can be achieved by making all laws and regulations on the basis of the potential Pareto criterion, with appropriate transfers being made through the tax system, is simply not relevant to the actual process of policy making.
In this paper and the third, which is ongoing, we attempt to establish that utility-weighting can be done. The two main objections are that there is no basis for a consensus on the correct weights to use and that the information necessary to apply utility-weighting is not available or is prohibitively difficult to apply. In this paper we believe we have successfully countered the first objection. There is a widely agreed upon model for computing weights and a narrowly bounded estimate of the relevant parameter. In the third paper we attempt to confront the second objection. We apply utility weighting to a series of existing, real-world BCAs and develop some tools for applying weights in an unbiased way and conducting sensitivity analysis. We conclude that, indeed, utility-weighting can be accomplished with available data and the kinds of informed assumptions that are routinely made in economic analysis of all kinds.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/bca.2023.29.