1. Introduction
We study statistical wage discrimination against young women. The risk that workers become unavailable due to child-rearing is often cited as the rational reason for discrimination (a notion proposed by Kenneth Arrow and Edmund Phelps; see Schwab, 1986; Norman, 2003), hence the term statistical discrimination. Rationality is premised on accuracy in the perception of this risk. Specifically, rational employers should accommodate delayed child-bearing, as observed across many countries during the past four decades, by lowering the penalty on wages of young women relative to young men.
The existing literature shows that – once realized and thus observable – fertility matters for employers in wage setting as well as hiring decisions (see Correll et al., 2007; Adda et al., 2017; Kleven et al., 2019b; Costa Dias et al., 2020, among others). In experimental studies, the probability of child-bearing appears to discourage employment (Baert, 2014; Becker et al., 2019; He et al., 2023) as well as assignment of high-reward tasks (Peterson Gloor et al., 2021). Much less is known about the effects of the probability of child-bearing on wages. We fill this gap. Trends toward delayed fertility change the perspective for employers: they face a declining probability that an average hired young woman will become a primary care-giver during her contract. This in turn implies – if statistical discrimination is indeed the underlying cause of gender wage gaps – that wages should become more equal between young men and women.
In this paper, we test the conjecture that fertility postponement reduces gender inequality among workers aged 20–30 years. Youth is particularly interesting for two reasons. First, lack of professional experience gives rise to more ex ante gender-based stereotyping about productivity. In addition, tenure with the current employer cannot help in resolving this uncertainty, as future absences due to care-giving remain unknown until actual childbirth and can vary over time (due to the changing needs of the child or children; this is not the case with, e.g., racial gaps, see Altonji and Pierret, 2001, who show that employers can learn about employees' productivity). Second, the already observed substantial delay in fertility makes the probability of child-bearing by young workers close to negligible during a contract with an employer.Footnote 1 Thus, among this group the decline in gender wage inequality should be the steepest.
To test this conjecture, we collect individual-level data for over fifty countries, spanning nearly four decades. Using these databases, we compute comparable estimates of adjusted gender wage gaps (AGWG) among youth. These estimates account for differences in personal characteristics such as education, marital status, family composition, occupation, industry, and urban residence. We obtain them with the help of the Nopo (2008) decomposition. This method guarantees that we compare like with like: it relies on the exact matching of all available characteristics to estimate the AGWG. The AGWG measures reflect the concept of equal pay for equal work, separating gender wage equality from the improved position of women in the labor market brought about by other secular trends, such as improved educational attainment and increased participation in the labor force (see Neels et al., 2017; Blau and Kahn, 2017). In total, we obtain over 1,200 comparable estimates of AGWG across countries and years. This is an unbalanced panel, which we regress against a measure of fertility timing. We use mean maternal age at first birth (MAB1). This measure portrays the risk for employers that a young woman will become a primary caregiver during the employment contract. It is more adequate for this purpose than the completed fertility rate (CFR) or the total fertility rate (TFR).Footnote 2
A priori, one could expect a bidirectional relationship between fertility timing and AGWG among youth (see Goldscheider et al., 2013; Baizan et al., 2016; Vignoli et al., 2020, among others). More egalitarian countries can offer better opportunities for women, which increases the opportunity costs of becoming mothers, leading to postponement of the first childbirth. Moreover, other mechanisms could affect both fertility timing and gender wage gaps among youth. To address these concerns, we propose several instruments, which are intended to reduce endogeneity bias. We also provide specifications with country fixed effects, which address unobserved time-invariant characteristics. Thus, we overcome two important challenges faced by earlier studies. First, data limitations typically imply that studies cover a single country, which impedes deriving conclusions related to slow-moving processes such as changes in fertility patterns.Footnote 3 Second, published estimates of the gender wage gap rarely refer to youth alone, and are not directly comparable across countries and time.Footnote 4
Our analysis suggests that delayed fertility indeed contributed to reducing the gender wage gap among youth. Point estimates consistently suggest that a one-year delay in fertility timing is associated with a 15% reduction in the gender wage gap among youth. These results confirm that employers’ perceptions of the risk related to child-bearing and child-rearing follow demographic trends. However, the econometric model alone cannot tell us whether this estimate accurately reflects the risks related to fertility.Footnote 5 We complement our econometric analysis with a simulation exercise that recovers the productivity gap associated with childbearing based on two inputs: the probability of child-bearing and the childcare time gap between men and women. We show that the marginal predictions from our econometric model exceed the simulated productivity gaps in the case of the US, the UK, and Ireland. For most continental European countries, we find that the marginal predictions are aligned with the productivity gap simulations. In these countries, the adjusted gender wage gaps among youth display magnitudes consistent with the data-driven simulations of the productivity gap.
Our paper is structured as follows. First, we discuss the relevant literature from the social sciences in section 2, identifying the knowledge gaps that our study aims to fill. Second, we move to data and methods in section 3. This paper involves extensive harmonization of a vast collection of individual-level data, which we document in the paper and in the appendices. We also explain the methodological choices concerning the estimation of adjusted gender wage gaps. Section 3 concludes with extensive descriptive analyses documenting the correspondence between the AGWG and the fertility timing patterns. Section 4 presents the results and discusses their robustness. The final section concludes with limitations and policy implications of our study.
2. Literature
Starting from Becker (1971), Phelps (1972), and Arrow (1973), statistical discrimination is a recognized mechanism explaining the differences in wages between a favored and a disfavored group.Footnote 6 In a nutshell, consider the argument proposed by Becker et al. (2019): when people become parents they can experience a drop in productivity, for example, due to absences or the inability to work certain hours (see also Goldin, 2014). The employer does not know whether a young worker will become a parent during their tenure at a given business, but they expect that if parenting occurs, then the productivity drop for women will be larger than for men. Rational employers will internalize this expected productivity drop by offering lower wages to women. This is a form of statistical discrimination because it involves averaging over all women on the one hand and all men on the other, regardless of their actual fertility intentions and realizations. Female candidates, foreseeing this wage discounting, may find it optimal to request lower wages in order to obtain any employment.Footnote 7
This form of statistical discrimination is different from the racial gaps studied in the literature (Altonji and Pierret, 2001), since employers cannot learn about potential productivity drops from continued interaction with the employee. Moreover, if employers expect that women will become mothers, the fact that they remain childless up to some point increases the conditional probability of having children in the future and leads to even larger wage reductions for women compared to men. In addition, care needs might vary over time (e.g., if children catch certain diseases), so the uncertainty surrounding productivity drops might not be resolved with childbirth.
Experiments show that in short-term interactions, statistical discrimination against women in stereotypically male tasks (programming) shrinks when information on performance is provided, but it does not vanish entirely (Bohren et al., 2019). While this result is a cause for some optimism, updating beliefs about actual workers might be insufficient to address statistical discrimination. On the one hand, employers update beliefs about the unknown population of potential workers much more slowly than they update their beliefs about hired workers (Pager and Karafin, 2009). On the other hand, attention discrimination effectively limits the updating of beliefs, thus reinforcing inaccurate premises upon which statistical discrimination is based (Bartoš et al., 2016).
Correspondence and audit studies have found that both the actual birth of a child (and parenthood status, see Petit, 2007; Baert, 2014; Bygren et al., 2017; Hipp, 2020; Kleven et al., 2019a, among others) and potential parent status matter for callback rates, which are usually lower for mothers and potential mothers (Baert, 2014; Becker et al., 2019; He et al., 2023; Wang and Chen, 2023). Besides callbacks, vignette experiments suggest that the type of positions (tasks) offered to women of childbearing age differs from offers to men and other women (Peterson Gloor et al., 2021).
Observational studies have consistently found that realized fertility leads to wage declines for women, both compared to childless women and to men (Gangl and Ziefle, 2009; Adda et al., 2017; Fuller and Cooke, 2018; Kleven et al., 2019b; Costa Dias et al., 2020; Cukrowska-Torzewska and Lovasz, 2020). These studies find that wage gaps related to motherhood are prevalent across countries (Kleven et al., 2019a), and that wage gaps tend to exceed productivity gaps between parents and non-parents (Gallen, 2023).Footnote 8 Descriptive evidence suggests that delayed fertility is beneficial for wages, ceteris paribus. However, the extent to which women can benefit from postponing childbirth is heterogeneous across education levels (Taniguchi, 1999) and occupations (Landivar, 2020). These studies do not identify the presence of bias against women before child-bearing; rather, they confirm strong bias against mothers, and occasionally also a preference for fathers (e.g., Yu and Hara, 2021).
Finally, demographers have provided voluminous evidence of the reverse link, i.e., that economic conditions and gender equality affect fertility decisions. Job instability and uncertainty have been linked to fertility postponement and foregoing (e.g., Vignoli et al., 2012; Wood and Neels, 2017; Vignoli et al., 2020). Okun and Raz-Yurovich (2019), Goldscheider et al. (2013), and González et al. (2019) study the role of gender equality within households and subsequent fertility, with mixed evidence.
To the best of our knowledge, this is the first study to focus on changes in fertility timing and gender inequality in wages among young workers using a comparative approach. While economists are concerned with causal identification of the effects of parenthood once it occurs, demographers and sociologists devote attention to households and couples and their fertility decisions, which has left this question of paramount policy relevance somewhat orphaned. In addition, data limitations typically imply that studies cover a single country, which impedes deriving general conclusions, whereas methodological differences make cross-country comparisons of the estimates available in the literature a challenge.
Our study aims to fill the existing gap in several ways. First, we provide a novel and comprehensive collection of AGWG among young workers. We have harmonized nearly 1,200 individual-level data sets and obtained comparable estimates of AGWG among youth. This large collection of estimates allows us to purposefully ignore time-invariant country-specificity, such as culture, legal context, or social norms. We discuss the details in the next section. Second, we explore the link which has so far slipped from the radar of social scientists of many disciplines: we study whether AGWG among young workers declines, as employers receive sufficiently informative signals about delayed fertility of young women. We utilize several instruments to address the issue of endogeneity and thus provide causal estimates. Our results are discussed in detail in sections 4 and 4.2.
3. Data and methods
The research question at hand requires estimates of the AGWG among young workers, across countries, and for subsequent birth cohorts. These estimates constitute the dependent variable of the main analysis. Given that no such data set exists, we collected individual-level data sets, harmonized them, and obtained over 1,200 comparable estimates of adjusted gender wage gaps among youth. In this section, we describe the availability of individual-level data across countries and years. In section 3.1, we discuss the harmonizing of the acquired data, while in section 3.2 we describe the method used to measure the AGWG. The key explanatory variable of interest is mean maternal age at first birth. We discuss in detail the sources for this variable in section 3.3. We describe instruments for fertility timing in section 3.4. Section 3.6 presents descriptive statistics for the sample used in the study. Finally, section 3.5 describes the estimation strategy.
3.1. Data on gender wage gaps among youth
We collected a large number of individual-level databases. We introduced only two restrictions on the data sets to be included in this study. First, the data set has to comprise sufficient information to compute an hourly wage. Second, the data have to report individual-level characteristics, at least gender, age, and education. We relied on Eurostat, the Integrated Public Use Microdata Series (IPUMS) from the University of Minnesota, and the LISSY service provided by the Luxembourg Income Study. These data sources provide comparable samples across numerous countries based on censuses (IPUMS) or on large representative samples (Eurostat and LISSY). In addition, we utilized data from the International Social Survey Program, which is based on representative, albeit smaller, samples. These cross-country sources were subsequently complemented by individual-level data obtained from central statistical offices or analogous institutions around the world. We obtain panel data for Canada, Germany, Korea, Russia, Sweden, Ukraine, the UK, and the US. We obtain labor force survey data or household budget survey data from Albania, Argentina, Armenia, Belarus, Chile, Croatia, France, Italy, Poland, Serbia, the UK, and Uruguay. This selection of countries was driven by the availability of hourly wage data. Finally, the World Bank, in cooperation with local statistical offices, provides the Living Standards Measurement Survey for several countries around the world, including Albania, Bosnia and Herzegovina, Bulgaria, Kazakhstan, Kyrgyzstan, Serbia, and Tajikistan. Appendix A discusses each of the data sources in detail.
Overall, we collected data for 56 countries spanning the last four decades. These databases were harmonized in order to obtain comparable estimates of adjusted gender wage gaps. The dependent variable in the decomposition is the hourly wage, which is derived from usual hours worked and total pay without bonuses. The sample is restricted to individuals aged 20–30 years. Education was harmonized to three levels: primary or less, secondary, and tertiary or more. In most of our data sets, we are able to identify household structure. The harmonized variables include indicators for the presence of small children,Footnote 9 partnership status,Footnote 10 and degree of urbanization.Footnote 11
In addition to these basic controls, we harmonized industry and occupation whenever they were available. The industry variable was converted into a categorical variable with six levels: agriculture, construction, manufacturing, market services, non-market services, and utilities. The occupation variable was re-coded to match the one-digit International Standard Classification of Occupations (ISCO). For consistency with databases collected by LISSY, these categories were aggregated to three broad levels: managers/professionals (ISCO levels 1–2), laborers and elementary workers (ISCO levels 6–9), and the residual category of occupations.
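The aggregation of one-digit ISCO codes into three broad levels can be sketched as follows. This is only an illustration of the mapping described above: the function name and the category labels are hypothetical and are not taken from the harmonization code used in the study.

```python
def aggregate_occupation(isco_one_digit):
    """Map a one-digit ISCO major group to one of three broad levels.

    The labels are illustrative assumptions; only the grouping
    (1-2, 6-9, residual) follows the text.
    """
    if isco_one_digit in (1, 2):
        return "managers_professionals"
    if isco_one_digit in (6, 7, 8, 9):
        return "laborers_elementary"
    # Every other major group falls into the residual category.
    return "other"


levels = [aggregate_occupation(code) for code in (2, 4, 7)]
```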
3.2. Measuring the adjusted gender wage gaps
We decompose wage differences using the approach proposed by Nopo (2008). This method is based on exact matching, and it is not affected by gaps attributable to non-matchable individuals. Moreover, given that the approach is non-parametric, the resulting estimates are less sensitive to inaccurate model specification than regression-based decompositions. This feature is particularly important given that we apply the decomposition to a large collection of highly heterogeneous countries. Prior research found that estimates of the adjusted gap obtained using the Nopo (2008) decomposition prove robust to the inclusion of additional control variables (Goraus et al., 2017), which is useful given that the full set of controls is not always available for each country and data source.
The measures of the AGWG account for three sets of variables: personal (age and education), household (urban residence, marital status, and having children), and job characteristics (industry and occupation). This set of controls is common in the literature (Weichselbaumer and Winter-Ebmer, 2005). It is also the largest set of controls feasible jointly in our collection of the individual data sets. With age and education we can adjust for human capital. With household characteristics we adjust for the type of the labor market as well as possible flexibility constraints and incentives (children, marital status). Finally, occupation and industry help us to account for job characteristics.
For each harmonized data set, we identify the availability of control variables and obtain the AGWG for the most comprehensive set of controls, but also for subsets of controls, excluding the third set, job characteristics. On the one hand, increasing the number of controls raises the internal validity of our estimates (young men and women, for whom we compute the gap, are indeed comparable). On the other hand, the Nopo (2008) decomposition can only be estimated within the common support, i.e., the procedure excludes men and women without statistical twins from the estimation. A large proportion of individuals outside the common support limits the external validity of the AGWG estimate, because the matched sample is then less representative of the population. One major advantage of the Nopo (2008) decomposition is that one can compute the share of (un)matched men and women. We use this information to determine which of the battery of estimates for each country and year is preferable.
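To make the roles of exact matching and common support concrete, the following is a simplified sketch of the unexplained component of a matching-based decomposition, together with the shares of matched men and women. It is not the estimation code used in the study. The record layout (keys `female` and `wage`), the function name, and the normalization by the average wage of all women are assumptions made for illustration.

```python
from collections import defaultdict


def matched_gap(records, controls):
    """Unexplained gap within common support, plus matched shares.

    Each record is a dict with 'female' (bool), 'wage' (float), and one
    key per control. Cells are defined by exact values of the controls.
    """
    cells = defaultdict(lambda: {"m": [], "f": []})
    for r in records:
        key = tuple(r[c] for c in controls)
        cells[key]["f" if r["female"] else "m"].append(r["wage"])

    # Common support: cells that contain at least one man and one woman.
    common = [c for c in cells.values() if c["m"] and c["f"]]
    n_m = sum(len(c["m"]) for c in cells.values())
    n_f = sum(len(c["f"]) for c in cells.values())
    n_m_common = sum(len(c["m"]) for c in common)
    n_f_common = sum(len(c["f"]) for c in common)
    if n_f_common == 0:
        raise ValueError("no common support for these controls")

    # Counterfactual: mean male wage per cell, weighted by women's counts.
    counterfactual = sum(
        len(c["f"]) * sum(c["m"]) / len(c["m"]) for c in common
    ) / n_f_common
    women_matched = sum(sum(c["f"]) for c in common) / n_f_common
    women_all = sum(sum(c["f"]) for c in cells.values()) / n_f

    return {
        "adjusted_gap": (counterfactual - women_matched) / women_all,
        "share_men_matched": n_m_common / n_m,
        "share_women_matched": n_f_common / n_f,
    }


# Made-up records: one man ("pri") has no female statistical twin.
sample = [
    {"female": False, "wage": 10.0, "edu": "sec"},
    {"female": False, "wage": 12.0, "edu": "sec"},
    {"female": True, "wage": 8.0, "edu": "sec"},
    {"female": True, "wage": 10.0, "edu": "sec"},
    {"female": False, "wage": 20.0, "edu": "ter"},
    {"female": True, "wage": 18.0, "edu": "ter"},
    {"female": False, "wage": 30.0, "edu": "pri"},
]
result = matched_gap(sample, ["edu"])
```

Adding controls makes the cells finer, which tends to lower the matched shares; this is the trade-off between internal and external validity described in the text.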
In order to strike a balance between the comprehensiveness of the AGWG measure and the comprehensiveness of the sample on which it was computed, we compare several estimates in each sample, one for each combination of control variables. We utilize the estimate with the highest number of controls subject to the constraint that at least 75% of men and 75% of women are matched.Footnote 12 As a robustness check, we replicate all the estimates with the additional restriction that the set of controls must include occupation and industry.Footnote 13
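The selection rule can be sketched in a few lines. The dictionary layout and the function name are hypothetical, and the handling of ties (the first qualifying candidate with the maximum number of controls is kept) is an assumption, since the text does not specify it.

```python
def select_estimate(candidates, min_share=0.75):
    """Pick the estimate with the most controls among those in which at
    least `min_share` of men and of women are matched.

    Returns None when no candidate satisfies the matching constraint.
    """
    eligible = [
        c for c in candidates
        if c["share_men_matched"] >= min_share
        and c["share_women_matched"] >= min_share
    ]
    if not eligible:
        return None
    # max() keeps the first candidate in case of ties (an assumption).
    return max(eligible, key=lambda c: len(c["controls"]))


# Made-up candidates for one country-year.
candidates = [
    {"controls": ["age", "edu"],
     "share_men_matched": 0.98, "share_women_matched": 0.97},
    {"controls": ["age", "edu", "urban", "married"],
     "share_men_matched": 0.81, "share_women_matched": 0.79},
    {"controls": ["age", "edu", "urban", "married", "industry", "occupation"],
     "share_men_matched": 0.62, "share_women_matched": 0.70},
]
chosen = select_estimate(candidates)
```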
3.3. Data on fertility timing
Our primary explanatory variable in this study is the decline in the probability of child-bearing by young female workers, that is, fertility timing. We measure this process using data on mean maternal age at first birth. In any given year, increases in this variable serve as an indication that women postponed child-bearing, and employers would believe that it is less likely that women below the former mean age at first birth would bear children.
We combine multiple sources to collect data on fertility timing for the countries and years covered by the individual-level data. The data for most European countries are provided by Eurostat (variable AGEMOTH1) and the United Nations Economic Commission for Europe (UNECE). The Organization for Economic Cooperation and Development (OECD) extends these sources to include some non-EU members and reports a full time series from 1960 onward.Footnote 14 In addition, the Human Fertility Database and the Human Fertility Collection report maternal age at first birth for some developing countries around the world.Footnote 15 Bongaarts and Blanc (2015) provide data for a large collection of countries using the Demographic and Health Surveys program. Data for China come from He et al. (2019). The Australian Bureau of Statistics provides full first-birth data by the age of the mother spanning 1975–2019, which we use to calculate the means for each year. The central statistical office of South Africa provides extensive birth data based on the 2011 census. Last, the Population Bulletin of the United Nations reports data for selected years in the case of Brazil.
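Where only counts of first births by age of the mother are available, the yearly mean can be computed as a weighted average. The sketch below illustrates that calculation under stated assumptions: the function name is hypothetical, the counts are made up, and the optional `midpoint` shift (often set to 0.5 when age is recorded in completed years) is an assumption rather than a documented choice of the study.

```python
def mean_age_at_first_birth(first_births_by_age, midpoint=0.0):
    """Weighted mean of mother's age, with first-birth counts as weights.

    `first_births_by_age` maps age (in years) to the number of first
    births in a given year. `midpoint` shifts every age by a constant.
    """
    total = sum(first_births_by_age.values())
    if total == 0:
        raise ValueError("no first births recorded")
    weighted = sum(
        (age + midpoint) * count
        for age, count in first_births_by_age.items()
    )
    return weighted / total


# Made-up counts for a single year.
counts = {24: 100, 26: 300, 30: 100}
mab1 = mean_age_at_first_birth(counts)
mab1_mid = mean_age_at_first_birth(counts, midpoint=0.5)
```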
Mean maternal age at first birth (MAB1, see OECD, 2024, for a definition) is especially well suited for our study. It is obtained from vital statistics and thus it is less frequently available across countries and years than the TFR. However, this measure isolates first-time mothers, and so does not confound the postponement of the first childbirth with the spacing of children at older ages. The advantage of using MAB1 data is that it focuses exactly on fertility timing. For example, two countries can report the same TFR or CFR, but have highly differentiated patterns of mothers’ age at the first birth. Also, it is independent of the number of children per woman. Age-specific fertility rates are scarcely available across countries and years. Moreover, age-specific fertility rates include higher parity births, for which the relation to the gender wage gap is less intuitive than for the first parity.
3.4. Instrumenting for fertility measures
We instrument for mean maternal age at first birth using the authorization of contraceptive pills. We complement this instrument with three sets of variables. We rely on drivers of family formation: (i) the length of compulsory schooling, and (ii) military conscription. In addition, we use (iii) the fertility rate in the generation of mothers, that is, the total fertility rate lagged by 20 years. We describe these instruments and data sources below.
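The logic of instrumenting can be illustrated with the simplest just-identified case: one instrument, one endogenous regressor, and no further controls, where the instrumental-variable slope is the ratio of two covariances. This is a deliberately reduced sketch and not the specification estimated in the paper, which uses several instruments and additional controls. All names and numbers are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def _covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a, mean_b = _mean(a), _mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def iv_slope(z, x, y):
    """Just-identified IV estimate: cov(z, y) / cov(z, x).

    z: instrument, x: endogenous regressor, y: outcome.
    """
    cov_zx = _covariance(z, x)
    if cov_zx == 0:
        raise ValueError("instrument is uncorrelated with the regressor")
    return _covariance(z, y) / cov_zx


# Made-up, noise-free data in which y = 10 - 0.5 * x and x = 1 + 2 * z.
z = [0.0, 1.0, 2.0, 3.0]
x = [1.0, 3.0, 5.0, 7.0]
y = [9.5, 8.5, 7.5, 6.5]
slope = iv_slope(z, x, y)
```

The denominator is the first-stage relationship between the instrument and fertility timing; a weak first stage makes the ratio unstable.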
Pill authorization
As the main source of identification we use the authorization of contraceptive pills. Authorization is a purely administrative procedure. It does not automatically imply access to the drug. Authorization stems from a procedure verifying whether a given product fulfills the public health criteria established by regulatory authorities. These criteria are determined by the authorities independently across countries.Footnote 16 Likewise, authorization as a medication does not guarantee that the pill is dispensed at all.Footnote 17
Thus, authorization does not necessitate adoption or utilization, but lack of it almost certainly precludes them. Specifically, in many countries the pill was authorized but not adopted. Finlay et al. (2012) report wide dispersion in the channels of distribution and administration. For example, in some countries, once authorized, contraceptive pills were initially distributed solely as a treatment for hormonal disorders, whereas in other countries they were available only to married women.Footnote 18
The literature on the pill and women's labor supply decisions is rich.Footnote 19 To the best of our knowledge, there is no literature studying the role of the pill in forming employers’ beliefs. Most of our AGWG estimates correspond to cohorts joining the labor market in the mid to late 1990s rather than cohorts directly exposed to the introduction of the pill. This feature distinguishes our approach from the earlier literature: for the cohorts covered by AGWG estimates the link between the pill and fertility timing is mediated by a number of channels. First, there appears to be improvement in the quality of parents, which may strengthen the shift in general social norms toward gender equality in reaching professional aspirations (Ananat and Hungerman, 2012). Second, inter-generational transmission of norms between mothers and daughters is a recognized phenomenon (Booth and Kee, 2009; Kolk, 2014; Boelmann et al., 2020), thus earlier authorization of the pill in a given country raises the odds that the younger generations, which we analyze in this study, had mothers with access to the pill. Third, prevalence of the pill hints that fertility is more likely to be timed in line with the professional career, which may be viewed by employers as potential for bargaining (even if only indirect).
With this instrument, concerns for reverse causality are negligible. The pill was authorized (the focus of our instrument, rather than adoption) in most countries of our sample in the early to late 1970s. The birth cohorts studied in our estimation entered the procreative age and the labor market in the 1990s at the earliest. There is no reason to expect that gender wage gaps for youth active as of the 1990s influenced the timing of the technical authorization procedure for the pill several decades earlier. A violation of the exclusion restriction would require that, conditional on fertility timing, employers varied their expectations concerning other characteristics of women, and that this variation is linked to when the pill became authorized.
Compulsory education
The reforms in compulsory education have been previously demonstrated to causally affect both fertility timing and fertility level (Black et al., 2008; Cygan-Rehm and Maeder, 2013). We use data on the number of years in compulsory education. These data are provided by UNESCO for all countries as of 1998. For the years before 1998 we infer the years in compulsory education from the available literature (e.g., Brunello et al., 2009; Murtin and Viarengo, 2011; Fenoll and Kuehn, 2017). The data do not include pre-primary education. To fill the gaps for several countries missing from the data sources listed above, we utilize country legislation as reported by the Right to Education Initiative. In Canada, where the compulsory length of education is set at the level of provinces, we use estimates by Oreopoulos (2005).
The use of this instrument is not without weaknesses. Across countries there are substantial differences in the meaning of compulsory education. In some cases, the relevant metric would have been the legal school leaving age, whereas in others the length of compulsory education is just as informative. For example, compulsory education in Mexico formally lasts 14 years, whereas it lasts 9 years in the Czech Republic, with the school entry age at 6. However, in the latter case, parents are legally bound to provide for the child's education until the age of maturity, that is, the 18th birthday, which makes education de facto mandatory for 12 years, and high school dropout rates are much lower in the Czech Republic than in Mexico. The imperfection of this measure introduces noise to our first-stage estimates.
The exogeneity of the instrument hinges on the lack of gender-specific consequences from these reforms. The exclusion restriction would be violated if the marginal increase in the years of compulsory education was differentiated between boys and girls. The evidence is scarce. Devereux and Hart (2010) study the reform in Great Britain, showing similar take-up rates. They show some minor differences in terms of returns (positive for boys, imprecise estimates for girls). Pischke and von Wachter (2008) find no evidence of returns to a marginal increase in compulsory education in Germany, neither for boys nor for girls.
Military conscription
The third instrument corresponds to the length of military conscription in months. The length of military conscription can drive the mean age of women at first birth through several channels. On the one hand, longer military conscription could lead to postponing fertility. First, longer periods of conscription could lead men to postpone family formation until the conscription is over. Second, even if individuals find their partners before enlisting, they could be deployed or relocated, thus facing obvious obstacles to conceiving children. Accordingly, compulsory military service may raise mean maternal age at first birth. On the other hand, military service provides stable, guaranteed income, which may reduce earnings uncertainty and thus encourage child-bearing. Similarly, military service can provide skills relevant for future employers, thus raising earnings potential among men and encouraging child-bearing. Given that military conscription can work in either direction, we do not hypothesize on the sign of the coefficient in the first-stage regression. We use data on the number of months as provided by Mulligan and Shleifer (2005) and extend it over time and countries using the same data source, that is, the Military Balance, which is published annually. We supplement this database with the records of War Resisters’ International and the World Factbook.Footnote 20 This variable exhibits variation over time and countries, ranging from zero, i.e., no conscription, to 48 months in Israel. Given that in some countries the duration is established as a range, models include two instruments: one for the lower bound and one for the upper bound.
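The construction of the two conscription instruments can be sketched as below. The input convention (a single number for a fixed duration, a pair for a range) and the choice to set both bounds equal when the duration is fixed are assumptions made for illustration; the text only states that one instrument holds the lower bound and the other the upper bound.

```python
def conscription_bounds(duration_months):
    """Return (lower, upper) length of conscription in months.

    Accepts either a single number (fixed duration, 0 meaning no
    conscription) or a (lower, upper) tuple for a legislated range.
    """
    if isinstance(duration_months, tuple):
        lower, upper = duration_months
    else:
        # Assumption: a fixed duration enters both instruments equally.
        lower = upper = duration_months
    if lower < 0 or upper < lower:
        raise ValueError("invalid conscription duration")
    return lower, upper


no_service = conscription_bounds(0)
ranged = conscription_bounds((12, 18))
```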
Empirical evidence tentatively suggests that conscription does not affect the gender earnings gap, because its effects on earnings remain small. This makes compulsory military service a suitable instrument for fertility timing in AGWG regressions. Mouganie (2019) finds that lifting conscription in France did not have a significant impact on wages or employment later on, even though eligibility increases the years of education (as people studied to avoid conscription). Similarly, Grenet et al. (2011) fail to find significant returns to conscription in another regression discontinuity design. Card and Cardoso (2012) find no wage effects of military conscription in Portugal (except for men with only primary education).Footnote 21
Fertility among mothers’ generation
The fourth instrument utilizes data on the total fertility rate in the generation of mothers of the individuals in our sample. For example, if a sample for a given country comes from 2000, and we restrict the individuals used in the estimation to between 20 and 30 years of age, we take the data for 1980 (=2000 − 20) for that country. The data on the total fertility rate of the mothers’ generation come from the World Bank. There is broad evidence for the inter-generational transmission of fertility norms covering both the demographic transition of the 19th century (Pearson and Lee, 1899) and the current demographic changes (Steenhof and Liefbroer, 2008; Kolk, 2015), which makes this instrument plausibly correlated with mean maternal age at first birth measured contemporaneously. Clearly, there is no room for reverse causality with this instrument (the fertility of the mothers’ generation has already been realized and is well known). Likewise, there is little reason to expect that it has effects on gender wage gaps other than via forming the employers’ beliefs about the contemporaneous timing of fertility.
Among the four instruments in our study, military conscription, compulsory schooling, and lagged fertility have country-by-year variation, whereas pill authorization is essentially a single year for each country. For each country–year sample in our data we construct an indicator measuring how many years have elapsed since the introduction of the pill. In our preferred specifications, we use the Baltagi (1981) estimator, which is efficient in a setup with both time-invariant and time-varying instruments. We adjust the standard F-statistic validating the instruments to account for the time-varying and time-invariant components of the instruments.
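The construction of the two derived instruments described above can be illustrated with a minimal sketch. The function names, the example authorization year, and the choice to code samples collected before authorization as zero are assumptions made for illustration; they are not taken from the data set itself.

```python
def years_since_pill(sample_year, authorization_year):
    """Years elapsed between pill authorization and the sample year.

    Assumption: samples collected before authorization are coded as zero.
    """
    return max(sample_year - authorization_year, 0)


def mothers_generation_year(sample_year, youngest_age=20):
    """Year whose total fertility rate is matched to a sample of young workers.

    Mirrors the example in the text: a sample from 2000 with workers
    aged 20 to 30 is matched to 1980 (= 2000 - 20).
    """
    return sample_year - youngest_age


# Hypothetical country with authorization in 1960 and a sample from 2000.
example_pill = years_since_pill(2000, 1960)
example_tfr_year = mothers_generation_year(2000)
```

Because the authorization year is fixed within a country, the first function varies only through the sample year, which is why this instrument carries mostly cross-sectional information.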
3.5. The estimation method
We model the link between fertility timing (FT) and adjusted gender wage gap among youth (AGWG):
where i denotes country, s denotes data source, and t denotes time. A negative and statistically significant coefficient implies that AGWG follows adjustments in fertility timing. In other words, objective risks associated with the costs of child-bearing and child-rearing to the employer transmit to the wage gaps imposed by those employers. Finding evidence for statistical discrimination does not imply that it is the only reason for unequal pay for equal work.
The variable FT is potentially endogenous in the presence of time-varying unobserved variables that affect both the AGWG and fertility choices. If that is the case, β is biased as an estimate of the link between fertility timing and gender inequality. Hence, we also estimate the relationship using an IV approach. Our two-stage IV model takes the form:
where PILL denotes the instrument obtained from contraceptive pill authorization, EDU corresponds to compulsory schooling duration, and CONSCR is based on military conscription data. Finally, $M\_FERT$ utilizes variation in total fertility rates from the period when the mothers’ generation was of reproductive age. Due to the nature of our instruments, we utilize the random-effects estimator proposed by Baltagi (1981) with fixed effects for data source and for the specification used to obtain $AGWG_{i,s,t}$. As is conventional in the literature (Heckman and Navarro-Lozano, 2004; Mogstad and Torgovitsky, 2018), θ, ϱ, μ, and $\varsigma$ are vectors of coefficients: for each instrument they cover its cross-sectional and time-varying components, entered in levels and as basis functions up to a fourth-order polynomial. Our preferred specification utilizes all available instruments, but for robustness we provide estimates that use subsets of instruments. The time trend is estimated as common across countries; otherwise we would not be able to identify the $\beta_{IV}$ parameter. Estimating the model given by equations (2) and (3), we cluster model disturbances at the level of country, data source, and AGWG specification controls.Footnote 22
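The logic of the two-stage approach can be sketched on simulated data. This is a plain two-stage least squares illustration, not the Baltagi (1981) random-effects estimator used in the paper; all variable names, coefficient values, and the data-generating process are hypothetical.

```python
import numpy as np


def ols_slope(y, x, controls):
    """Coefficient on x from a least-squares regression of y on x and controls."""
    design = np.column_stack([x, controls])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]


def two_stage_least_squares(y, x_endog, instruments, controls):
    """Plain 2SLS: project x on instruments and controls, then regress y on the fit."""
    first_stage = np.column_stack([instruments, controls])
    gamma, *_ = np.linalg.lstsq(first_stage, x_endog, rcond=None)
    x_hat = first_stage @ gamma
    return ols_slope(y, x_hat, controls)


# Simulated data (hypothetical): an unobserved factor u moves both
# fertility timing and the wage gap, which biases OLS.
rng = np.random.default_rng(0)
n = 5000
const = np.ones((n, 1))
z = rng.normal(size=(n, 2))                 # two stand-in instruments
u = rng.normal(size=n)                      # unobserved confounder
ft = 27 + z @ np.array([0.8, -0.5]) + u + rng.normal(size=n)
true_beta = -0.025
agwg = 0.8 + true_beta * ft + 0.05 * u + 0.02 * rng.normal(size=n)

beta_ols = ols_slope(agwg, ft, const)
beta_iv = two_stage_least_squares(agwg, ft, z, const)
```

In this simulation the OLS slope is pulled toward zero by the confounder, while the IV slope recovers the assumed coefficient; the standard errors from a naive second stage would not be valid and are deliberately omitted.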
3.6. Descriptive statistics
In total, we obtained 1,233 unique estimates of adjusted gender wage gaps computed for individuals aged 18–30 years old.Footnote 23 This collection of estimates covers 55 countries for 37 years. The specific number of estimates for each country and data source is reported in Table A1 in Appendix A. This number of estimates reflects the unique combinations of country, data source, and year. The final sample of AGWG estimates among youth includes 1,106 combinations of country, year, and data source, because for some countries and years fertility timing and/or our instruments are missing.
The average AGWG suggests that at the beginning of their careers men earn 12.6% more than women (7.9% without adjustment). The gender gaps are widely dispersed. In some cases, we observe negative values, which indicate that in a given country × year × data source women earn more than men. More broadly, adjusted estimates tend to be larger, as seen also in the last percentile. This is also illustrated by Fig. B1 in Appendix B, where we plot adjusted and unadjusted estimates against each other. Most estimates lie above the 45-degree line, which indicates that the adjusted gaps are larger than the unadjusted ones.
Note that the levels of AGWG in our study differ from those reported in reviews by, e.g., Blau and Kahn (2017). This is because our study reports gaps for youth, whereas most other studies cover all individuals of working age. Given that we have data for individuals across all age groups, we additionally compute the gaps for all working-age individuals, and these prove to be comparable to estimates reported in earlier studies. They also exhibit similar time trends. Table B3 in Appendix B reports the time evolution of both gender wage gaps as well as mean maternal age at first birth. The portrayed time trends adjust for data source and country composition and thus are not driven by the differentiated data availability across countries. In columns (1) and (2) we report the time trend for the working-age populations, whereas in columns (3) and (4) we report analogous estimates for the youth. Earlier literature reports essentially no time trend for the AGWG at the country level (Weichselbaumer and Winter-Ebmer, 2005), which we replicate in our data spanning nearly two decades more. We add evidence on time trends for AGWG among young individuals. This trend is negative, but the decline is slow: 0.16 of a percentage point each year. Given an average adjusted gap of roughly 12%, it would take about 75 years (12/0.16), that is, most of a century, for the gap to disappear. By contrast, the time trend on mean maternal age at first birth is large and positive. On average, it rises by a full year every twelve years. Nevertheless, the trend toward a higher mean maternal age at first birth is not universal. For numerous countries in our sample this statistic falls over time. We portray the full distribution of our youth AGWG data and mean maternal age at first birth in Fig. B2 in Appendix B.
As discussed earlier, our measure of fertility timing is the mean maternal age at first birth (MAB1). Table 1 reports the average value in our sample at around 27 years old (see also Fig. B2 in Appendix B). Figure 1 displays two measures of association between fertility timing and the AGWG for young workers. The panel on the left presents raw data, and as such is a measure of the unconditional correlation. The panel on the right is obtained after detrending both MAB1 and the youth AGWG.
Table 1 reports the unadjusted descriptive statistics for the instruments for all observations. We report the relationship between fertility timing (MAB1) and each of the instrumental variables in Fig. B3, also residualizing the time trends. All proposed instruments display a strong correlation with fertility timing. Most links tend to be non-linear. We report variation in the year in which the pill was authorized in each country. Authorization, as we emphasized, is not evidence of adoption in any given country. It only states that the pill was legally admissible in a given country. The time since the introduction of the pill is associated with lower MAB1 for most of the distribution. We also report the descriptive statistics for the duration of compulsory education, the duration of military conscription, and the mothers’ generation fertility rate. More years of education are accompanied by an increase in the mean maternal age at first birth. By contrast, higher values of the past fertility rate and of months of mandatory conscription are associated with lower mean maternal age at first birth.
4. Results
We report the results in two substantive parts. First, we focus on our empirical exercise, reporting the estimates from panel regressions and instrumental variables estimation in section 4.1. These results are indicative of the strength of the statistical relationship between fertility timing and the estimates of the AGWG. The second substantive part of our analysis is presented in section 4.2, where we provide a theory-disciplined data-driven benchmark for the obtained coefficients.
4.1. The effects of gradually delayed fertility on AGWG
Delayed fertility implies lower AGWG among young workers. Our estimates indicate that a rise in mean maternal age at first birth by one year leads to a reduction of the AGWG among young workers of roughly 2–3 percentage points. We provide estimates for a broad array of specifications in Table 2. In columns (1)–(3) we report panel IV estimations, whereas column (4) reports analogous results from a panel OLS estimation. Column (1) reports the coefficient for mean maternal age at first birth when all four instruments are utilized. The subsequent two columns report estimates with the pill authorization as the only instrument (column 2) and with the remaining three instruments (column 3).
Notes: AGWG stands for adjusted gender wage gap. IV specifications use the Baltagi (1981) random-effects estimator with time-varying and time-invariant components and include year, specification, and data source fixed effects. Column (1) with all instruments jointly. Column (2) with the pill authorization as the only instrument. Column (3) with all instruments but the pill authorization. In all IV specifications, we include a linear term and basis functions up to a fourth-order polynomial. We report the reduced form estimates in Fig. B4.
In panel A, for each country, year, and data source we utilize one estimate, that with the maximum number of available control factors subject to the constraint that 75% of individuals find a match among the opposite gender. In panel B, for each country and year we apply the same restriction as in A, but only among estimates without controls for industry and occupation. In panel C, for each country and year we impose an additional restriction that estimates should adjust for occupation and industry.
All specifications include a time trend, data source fixed effects, and adjust for the AGWG model specification. The OLS specifications use weights equal to $1/N_{c,y}$, where $N_{c,y}$ denotes the number of data sources for a given country in a given year. All columns report estimates for young workers (between 20 and 30 years of age).
Standard errors clustered at country–data source–controls level. Asterisks ***, **, and * denote significance at 1%, 5%, and 10%, respectively. Full set of estimates from the first and the second stage regressions is available upon request.
The estimated effect of 1.9–3.3 percentage points implies that the observed delay in fertility amounts to a decline of between 20% and 30% of the inequality in wages adjusted for differences in individual characteristics. This magnitude is robust to the inclusion of multiple control variables. The F-statistics reported in Table 2 are large, well above the conventionally assumed thresholds (Lee et al., 2020), which speaks to the relevance of the utilized instruments. Admittedly, with the Baltagi (1981) estimator, the F-statistic exploits both the cross-sectional and the time variation of the instrumental variables.Footnote 24
Table 2 displays three panels. Panel A reports the results for the best AGWG estimates, as defined earlier. Panel B reports estimates analogous to panel A, but on a sample restricted to AGWG estimates that do not adjust for occupation and industry. This sample selection reflects the premise that the process behind industry and occupation choice is potentially not gender neutral, so the returns to those characteristics should be included in the unexplained component. Finally, panel C reports estimates obtained from a sample where all AGWG measures adjust for occupation and industry, which corresponds to a narrow definition. Recall that for some data points (country × year) no estimates for panel C may be available, but all countries and years included in panel C are also included in panels A and B. Because panels B and C retain only AGWG estimates that meet additional criteria, the sample size drops relative to panel A. The sample size decreases more in panel C, which reflects both the fact that some databases do not collect information on occupation or industry, and that the inclusion of these variables reduces the common support, i.e., in some databases the percentage of matched individuals of each gender drops below the 75% threshold. The restriction in panel C implies a drop from 1,100+ samples (country × year × data source) to 800+ samples. In spite of differences across samples, estimates of the relationship between fertility timing and the AGWG remain essentially unaffected.
4.1.1. Robustness
While the Baltagi (1981) estimator is particularly well suited to our data, we study the sensitivity of the overall conclusions to alternative instrumental variable estimators. The OLS estimate of β reported in column (4) of Table 2 adjusts for country fixed effects, which we replicate in an IV setting in columns (1) and (2) of Table 3. We employ a conventional fixed-effects IV estimator in column (1) of Table 3 and a high-dimensional fixed-effects IV estimator in column (2). Both deliver highly statistically significant estimates of $\beta_{IV}$, of similar magnitude to the IV estimates reported in Table 2.
Notes: IV estimates of the relation between adjusted gender wage gaps (AGWG) and fertility timing. Standard errors clustered at the country–data source level in parentheses in columns (1)–(5). Confidence intervals (95%) in brackets in columns (6) and (7). We report estimations analogous to column (1) from Table 2. In column (1) we use the fixed-effects IV estimator (FE 2SLS). In column (2) we use the high-dimensional fixed-effects IV estimator (HDFE). In columns (3)–(5) we apply the Firpo et al. (2009) recentered influence function transformation to the estimator from column (1) in Table 2; the transformation is for the 25th, 50th, and 75th percentiles of AGWG. In columns (6a) and (6b) we provide IV estimates accounting for the quantiles of mean maternal age at first birth (intercepts and slopes).
For each country, year, and data source we utilize one estimate, that with the maximum number of available control factors subject to the constraint that 75% of individuals find a match among the opposite gender.
All specifications include time trends, data source fixed effects, and adjust for the AGWG model specification. Asterisks ***, **, and * denote significance at 1%, 5%, and 10%, respectively.
Analogous set of estimates for sample restricted as in panel C of Table 2 is reported in Table C1 in Appendix C. Full set of estimates from first and second stage regressions is available upon request.
Indeed, whether we use the Baltagi (1981) estimator or its alternatives, the estimates prove similar to the OLS in economic terms. In statistical terms, the IV specifications yield estimates roughly 25% higher than in the OLS specifications, which corresponds to an increase from roughly 2 to 2.5–3 percentage points. Ishimaru (2022) proposes to decompose the OLS-IV gap into the weights implied by covariates, the weights implied by treatment, and, as a residual, the endogeneity bias. When we apply this approach to the estimates obtained in column (1) of Table 3, each of the three components proves insignificant for the full sample of AGWG estimates. In the restricted sample, there is some role for the weights implied by treatment levels, whereas the endogeneity bias remains insignificant. In other words, the reverse causality bias is not large in statistical terms: in a sample covering over fifty countries and spanning four decades, the timing of the first child does not appear to be driven by the prevailing AGWG, whereas the weights implied by treatment levels exhibit an effect roughly 25% higher than the OLS estimates.
Table 3 also tests for heterogeneous effects along two dimensions. First, we employ a quantile IV estimator to study the magnitude of the effects of fertility along the distribution of AGWG. The results are presented in columns (3)–(5) for the 25th percentile, median, and 75th percentile, respectively. This estimation uses unconditional quantiles of AGWG via the recentered influence function (Firpo et al., 2009) and a model specification analogous to column (1) from Table 2. The estimated effect of fertility timing on AGWG amounts to roughly −0.02 for the 25th and 50th percentiles. It appears to be somewhat higher for the 75th percentile, but still within the ballpark of the Baltagi (1981) estimates reported in Table 2.
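For readers unfamiliar with the recentered influence function (RIF), the transformation of the dependent variable for a quantile can be sketched as follows. The formula RIF(y; q) = q + (τ − 1{y ≤ q}) / f(q) is the standard one for unconditional quantiles; the Gaussian kernel and the rule-of-thumb bandwidth used to estimate the density f(q) are assumptions made here for illustration and need not match the implementation behind Table 3.

```python
import numpy as np


def rif_quantile(y, tau):
    """Recentered influence function of the tau-th unconditional quantile of y."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, tau)
    # Density of y at q: Gaussian kernel with a rule-of-thumb bandwidth
    # (an illustrative choice, not taken from the paper).
    bandwidth = 1.06 * y.std(ddof=1) * y.size ** (-1 / 5)
    kernel = np.exp(-0.5 * ((y - q) / bandwidth) ** 2)
    density = kernel.mean() / (bandwidth * np.sqrt(2 * np.pi))
    below = (y <= q).astype(float)
    return q + (tau - below) / density


# Simulated stand-in for a vector of AGWG estimates (hypothetical values).
rng = np.random.default_rng(1)
simulated_agwg = rng.normal(loc=0.12, scale=0.08, size=2000)
rif_p25 = rif_quantile(simulated_agwg, 0.25)
```

The transformed variable takes only two values, one for observations at or below the quantile and one for those above it, and its mean equals the quantile itself. It then replaces AGWG as the dependent variable in the regression.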
Next, we explore the heterogeneity along the distribution of fertility timing. We conduct this analysis for the intercepts of the link between fertility timing and AGWG and for the slopes. We split the sample into low (below the 25th percentile), medium, and high (above the 75th percentile) mean maternal age at first birth. For the intercepts, we take a set of dummies for medium and high mean maternal age at first birth as endogenous variables. For the slopes, there are three endogenous variables: the first takes on the value of mean maternal age at first birth if it is low and zero otherwise, whereas the second and the third are defined analogously for the medium and high values, respectively. In all other respects, these specifications are analogous to column (1) from Table 2. The level effect appears to be the highest when mean maternal age at first birth is within the first quartile, but this observation may partly be a consequence of the fact that the fourth quartile in terms of fertility timing is very close to the upper boundary on the age in the estimation of adjusted gender wage gaps.Footnote 25
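The construction of the endogenous variables for the intercept and slope specifications can be sketched as follows. The function name and the toy values are hypothetical, and the treatment of observations exactly at the percentile cutoffs (assigned to the medium group) is an assumption.

```python
import numpy as np


def split_by_fertility_timing(mab1):
    """Build intercept dummies and slope variables for low/medium/high MAB1."""
    mab1 = np.asarray(mab1, dtype=float)
    p25, p75 = np.percentile(mab1, [25, 75])
    low = mab1 < p25
    high = mab1 > p75
    medium = ~(low | high)          # values at the cutoffs fall here (assumption)
    # Intercepts: dummies for medium and high, with low as the reference group.
    intercepts = np.column_stack([medium, high]).astype(float)
    # Slopes: MAB1 within the group, zero otherwise.
    slopes = np.column_stack([
        np.where(low, mab1, 0.0),
        np.where(medium, mab1, 0.0),
        np.where(high, mab1, 0.0),
    ])
    return intercepts, slopes


# Toy values of mean maternal age at first birth (hypothetical).
toy_mab1 = [24, 25, 26, 27, 28, 29, 30, 31, 32]
toy_intercepts, toy_slopes = split_by_fertility_timing(toy_mab1)
```

Each observation contributes to exactly one slope variable, so the three columns sum back to the original MAB1 series.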
The OLS and the IV estimates of delayed fertility on gender wage gaps at young age reveal an effect of roughly 2–3 percentage points. Given that mean maternal age at first birth in our sample increased from 25.96 in the 1990s to 28.16 in the 2010s, these estimates imply that essentially a third of the decline in AGWG among young workers can be attributed to delayed fertility. Note that this delay in fertility is of paramount importance from the perspective of the employer–employee relationship: Cazes and Tonin (2010) show that the mean employment tenure of European workers below 30 years old in 2006 was between 1.5 and 2.5 years. A year of fertility delay is a sizable chunk of the average tenure in these countries.Footnote 26
4.1.2. Discussion
The instruments used in our study have been previously demonstrated to affect various margins of fertility, labor supply, and human capital in individual-level investigations. It is thus not guaranteed that fertility timing is the only channel through which our instruments affect the AGWG.Footnote 27 We study if and to what extent our instruments could operate through other channels, that is, in addition to instrumenting for fertility timing. We explore three distinct channels: employment, education, and (contemporaneous) fertility rates. In addition, we account for GDP per capita as a measure of labor productivity (akin to the opportunity cost of not working). Table C2 in Appendix C shows that once we account for these direct channels, the IV estimates remain in the same ballpark (despite a considerable reduction in the countries and years due to data limitations). Unfortunately, time-varying information concerning gender norms is not available for a large group of countries in our sample, so we cannot study if they are affected by our instruments and at the same time could affect both women's and men's wages.
Further, we study if the effects on youth, as identified in Table 2, are relevant for the labor market in general. To this end, Table C3 in Appendix C reports estimations analogous to Table 2, where the dependent variable is AGWG for all age groups. To run this analysis, we re-estimate AGWG from individual-level data without age restriction (including controls for age groups in the AGWG estimation). We then re-estimate the specifications to obtain analogous results for all working-age groups. The point estimates are around half the size, but they remain negative and statistically significant. This corroborates the intuition that fertility timing has a persistent effect on gender inequality across age groups.
In the main results, fertility timing is captured by the mean maternal age at first birth. Table C4 in Appendix C explores the relation between gender wage gaps and fertility level (as measured by TFR). This check amounts to replacing MAB1 with TFR as the endogenous variable in equations (2) and (3), and re-estimating the IV regressions. Consistent with our earlier discussion on how to capture fertility timing, there is no evidence of a relationship between TFR and the estimated gender wage gaps.Footnote 28 These estimates should be taken with caution, as one cannot prove the lack of a relationship. Moreover, delayed fertility can in principle imply fewer children and hence lower fertility rates. However, these patterns are not visible in the data, whereas the patterns for fertility timing are strong and robust.
Our approach rests upon using several instruments simultaneously. In practice this requires partial monotonicity, that is, conditional on all other instruments no remaining instrument affects the endogenous variable in opposite directions across countries or years (see Mogstad et al., 2021). Our results remain robust to including one instrument at a time as well as a combination of instruments, which is somewhat reassuring that partial monotonicity is not violated in our case. These estimates are available in Tables C5 and C6 in Appendix C. Table C5 explores changes related to the inclusion of each set of instruments one by one, whereas Table C6 includes only one instrument at a time.
Our estimates of AGWG adjust for the fact of having children: when comparing the wages of men and women, child-rearing is taken into account. An alternative way to think about our results would be to focus only on young individuals with children. However, such an exercise could be misleading in the sense that estimates of AGWG among parents alone ignore the marginal parent, whereas those obtained for parents and non-parents average over the marginal parent. Furthermore, this exercise would reduce sample sizes for many countries. This would be especially acute for the sample coming from ISSP, thus reducing substantially the non-European variation in our sample.
Nopo (2008) ensures that gender wage gaps are computed as a weighted average of differences among men and women of identical characteristics. It is possible that the relative importance of high wage gap groups declines and the share of low wage gap groups increases. For example, as a higher fraction of individuals have a university degree, the contribution of the gap between men and women with this level of education to the total AGWG becomes larger. This could be particularly relevant in our case if changes in mean maternal age at first birth were also driven primarily by composition changes. For example, groups with early fertility would become less numerous whereas groups with delayed fertility would become more numerous in a given country in a given year. One should expect no adjustment in the beliefs of the employers among their potential pool of employees, but the weighted average of wage gaps and the weighted average of mean maternal age at first birth would display co-movement. In such a case, our estimates would reflect composition changes rather than adjustment of the beliefs of specific employers. We cannot rule this channel out in our estimations. However, in Table C2 in Appendix C we adjust the estimates for a broad array of measures of the composition of workers and their variation over time. Our results remain unaffected.
4.2. Benchmarking statistical gender discrimination
While statistical discrimination may be viewed as an injustice, as much as any form of group responsibility, it is also conceived as economically rational under information asymmetry when no credible separating equilibrium exists. The upside of statistical discrimination is that the mechanism relies on rationality of employers: it ought to reflect the expected productivity gap between young men and women. To establish an empirical analog of the productivity gap, we consider the following framework.
Young adults draw a parental type $c_i$ from a distribution, with $E_m(c_i) = c_m < E_w(c_i) = c_w$, which denotes how involved they will be in the care of the child.Footnote 29 In principle, $c_i$ can take any real value. While for most parents $c_i$ would be positive, some parents might have no cost of caring, and some might even see positive productivity spillovers from having children, e.g., if having a child improves their motivation to work. The assumption of $c_m < c_w$ states that costs are on average higher for women (see Becker et al., 2019). Parental type is private information, and it cannot be communicated to the employer ex ante (before becoming a parent), whereas ex post (after becoming a parent) credible communication is costly, e.g., it depends on child characteristics which may vary over time.Footnote 30
Individuals draw a procreation type: with probability $\pi_i$ an individual becomes a parent, bearing the associated productivity costs within the window of contract duration. With the complementary probability they remain without children for the duration of the contract. Assume that the probability of becoming a parent ($\pi_i$) is independent of the productivity change associated with parenting costs ($c_i$) and it is drawn from a distribution common for both genders.
Conditional on human capital $h$, workers’ productivity is equal to $h - c_i$ if they bear the costs of parenthood $c_i$, which occurs with probability $\pi_i$, or $h$ if they have not become parents. The employer can observe human capital $h$, but before parenting the employer cannot know the individual $\pi_i$. Moreover, even after a worker becomes a parent, employers cannot fully observe $c_i$, hence they average over workers. The expected productivity of the worker and thus the wage $w$ is given by
$$w = E(h - \pi_i c_i \mid h) = h - E(\pi_i c_i) = h - E(\pi_i)\,E(c_i), \tag{4}$$
where the last equality follows from the fact that $\mathrm{cov}(\pi_i, c_i) = 0$.
Under statistical discrimination, the employers offer wages in expectation of individual productivity, averaging over groups. With probability $\pi_i$ of bearing the parenthood costs, the AGWG becomes:
$$E(w_m \mid h) - E(w_w \mid h) = E(\pi_i)\,(c_w - c_m). \tag{5}$$
Thus, if statistical discrimination stands behind (adjusted) gender wage gaps, a decline in $E(\pi_i)$ ceteris paribus should imply narrowing of the AGWG. Note that even if $\pi_i$ is the same across all individuals, there is still averaging due to unobservable differences in productivity ($c_i$) between parents and non-parents as well as between mothers and fathers.Footnote 31
Equation (5) takes the employers’ perspective and portrays the link between objective differences in productivity across genders and the (adjusted) gender wage gap. We aim to obtain analogs of $c_w - c_m$ and $E(\pi_i)$ from observational data. We then juxtapose $E(\pi_i) \times (c_w - c_m)$ with the empirical estimates of $E(w_m \mid h) - E(w_w \mid h)$. Rationality requires that the employers form expectations based on the observed probability $E(\pi_i) = \pi$ of incurring the cost $c$.
4.2.1. Implementation
Probability of child-bearing
Denote p(a) to signify age (a) specific fertility rates for the first parity. Then, the probability of becoming a parent during a contract window (conditional on not being a parent at the moment of hiring) is given by
Effectively, equation (6) is an upper-bound expectation in the sense that the contract duration with a given employee can be shorter than the age brackets a ∈ [20, 30] (in other words: the actual probability faced by the employer cannot exceed this value).Footnote 32 The age brackets of between 20 and 30 are set to be consistent with the age groups for which the adjusted gender wage gap was estimated. We construct this indicator using information about the number of first births to women of a given age in a given year. The data come from Eurostat for European countries and from the Human Fertility Database for the US.
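The upper-bound property can be illustrated with a minimal sketch. It assumes that the age-specific rates p(a) can be treated as probabilities of a first birth at age a conditional on being childless at that age, so that the probability of becoming a parent over a window equals one minus the product of the survival terms. This functional form is an assumption made for illustration and may differ from equation (6); the rates below are hypothetical.

```python
def prob_parent_in_window(first_birth_rates, start_age=20, end_age=30):
    """Probability of a first birth between start_age and end_age (inclusive).

    first_birth_rates maps age -> assumed conditional probability of a first
    birth at that age, given no children so far (illustrative assumption).
    """
    prob_childless = 1.0
    for age in range(start_age, end_age + 1):
        prob_childless *= 1.0 - first_birth_rates[age]
    return 1.0 - prob_childless


# Hypothetical flat rate of 5% per year for ages 20 to 30.
flat_rates = {age: 0.05 for age in range(20, 31)}
pi_full_window = prob_parent_in_window(flat_rates)            # ages 20-30
pi_short_contract = prob_parent_in_window(flat_rates, 24, 25)  # two-year contract
```

Under this reading, any contract shorter than the full age bracket yields a smaller probability, which is the sense in which the full-window value bounds the risk faced by the employer from above.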
Productivity cost
Productivity loss is akin to a reduction in productivity endowment. This endowment is identical for both men and women, regardless of whether they have children or not. To capture $c_w - c_m$, we resort to the time-use data. We obtain time-use surveys from the Center on Time Use Research at University College London. The center provides the Multinational Time Use Study, which is an effort to harmonize the available time-use surveys.Footnote 33
In the time-use surveys, household members report the time spent on caring. Child-bearing is not the only reduction to the time endowment. Individuals may have other caring responsibilities and social norms may be driving the gender distribution of those functions. We distinguish between parents in households with children and independent adults in households without children. For each person we obtain the measure of time spent on caring. We then apply formula (7) to obtain the measures of $c_w - c_m$. We aggregate these individual-level data to recover information on caring by men and women, with and without child-rearing responsibilities, aged between 20 and 30 years old. We construct four mean (median) measures: for men without kids, men with kids, women without kids, and women with kids. Based on these data, we compute the reduction in the time endowment of $T$ hours per week by the mean (median) number of hours spent on caring and we obtain $c_w - c_m$ as:
where $t$ denotes time spent caring, $w$ and $m$ denote women and men, respectively, whereas $k$ and $\sim k$ denote with and without children, respectively. Conventionally, we set $T = 80$ hours per week. The reduction in time endowment proxies the reduction in production capacity. A potential concern is related to possibly disproportionate allocation of household chores among mothers, when compared to childless men and men with children. To address this concern, we provide a set of analogous results with the $c_w - c_m$ measure augmented with household chores.
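One plausible reading of this measure, stated here as an assumption rather than as formula (7) itself, is a difference-in-differences of caring time scaled by the weekly time endowment: the extra caring time of mothers relative to childless women, minus the extra caring time of fathers relative to childless men, divided by T. The sketch below uses hypothetical hours.

```python
def cost_gap(t_w_k, t_w_nok, t_m_k, t_m_nok, endowment_hours=80.0):
    """Assumed proxy for c_w - c_m from weekly caring hours.

    t_w_k / t_w_nok: caring hours of women with / without children.
    t_m_k / t_m_nok: caring hours of men with / without children.
    The functional form is an illustrative assumption, not formula (7).
    """
    extra_women = t_w_k - t_w_nok
    extra_men = t_m_k - t_m_nok
    return (extra_women - extra_men) / endowment_hours


# Hypothetical mean weekly caring hours for workers aged 20-30.
example_cost_gap = cost_gap(t_w_k=30.0, t_w_nok=5.0, t_m_k=12.0, t_m_nok=3.0)
```

Subtracting the caring time of childless adults nets out caring responsibilities unrelated to children, in line with the distinction between households with and without children drawn in the text.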
We combine the data-driven proxy of $c_w - c_m$ from the time-use data with the value of π computed using the Human Fertility Database and Eurostat data. The obtained measures of $c_w - c_m$ and π are available only for a small selection of the countries included in our sample: these are wealthy countries with several decades of rolling out gender equality policies. Thus, we use this exercise to test whether the estimates obtained from our main results in Table 2 (panel A, column 1) lie in the same ballpark as the simulated $(c_w - c_m) \times \pi$ from observational data. If that is the case, then the extent of AGWG is consistent with accurate statistical discrimination. If, however, $(c_w - c_m) \times \pi$ falls short of the estimated AGWG, then either statistical discrimination is inaccurate or additional stereotyping is involved. The results are portrayed in Fig. 2.
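The benchmarking logic can be sketched in a few lines of Python: multiply the productivity cost gap by the probability of first birth and check where the product falls relative to a confidence interval for the estimated AGWG. The function name `benchmark_agwg`, the verdict labels, and every number below are hypothetical illustrations, not values from Table 2 or Fig. 2.

```python
def benchmark_agwg(agwg_ci, cost_gap, pi):
    """Compare the implied productivity gap, cost_gap * pi, with an AGWG interval.

    agwg_ci: (lower, upper) bounds of the AGWG estimate, as shares of wages
    cost_gap: proxy of c_w - c_m, as a share of the time endowment
    pi: probability of a first birth within the age bracket, between 0 and 1
    """
    lower, upper = agwg_ci
    if lower > upper:
        raise ValueError("agwg_ci must be (lower, upper)")
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    implied = cost_gap * pi
    if implied < lower:
        # Wage gap larger than what the productivity gap can justify
        verdict = "AGWG exceeds implied gap"
    elif implied > upper:
        verdict = "AGWG below implied gap"
    else:
        verdict = "consistent"
    return implied, verdict


# Hypothetical inputs: the same implied gap against two different intervals
print(benchmark_agwg((0.06, 0.12), cost_gap=0.2125, pi=0.4))
print(benchmark_agwg((0.12, 0.18), cost_gap=0.2125, pi=0.4))
```

Treating "in the same ballpark" as "inside the confidence interval" is a simplification chosen for the sketch; a looser tolerance could be substituted without changing the structure.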
In the case of the US and the UK, the estimates of AGWG are higher than most of the simulated values implied by $(c_w - c_m) \times \pi$. In these two countries, the availability of multiple waves of time-use surveys permits tracing dynamics over time, and higher values of $(c_w - c_m) \times \pi$ tend to be associated with higher values of AGWG estimates, ceteris paribus.Footnote 34 We infer that the existing AGWG among youth far exceeds the productivity gap implied by the data on the timing of fertility and the actual reduction in time endowment due to caring obligations. The gap is rather small for Spain and in the ballpark of the productivity gap estimates for Austria.
In Appendix E, we present analogous estimates using data from the International Social Survey Programme (ISSP). Using the ISSP data allows us to expand the list of countries studied (in total: 14 out of 55 countries in our estimations). This replication of the benchmarking exercise for a broader group of countries yields a similar conclusion: for all countries, at least one of the measures falls within the confidence interval of the AGWG estimates.Footnote 35 Our results are consistent with the findings of Gallen (Reference Gallen2023) for Denmark, who shows that the pay gap for mothers is entirely explained by productivity, whereas the gap for non-mothers is not.
5. Conclusions
Statistical discrimination – regardless of its legal status and ethical consequences – stems from the idea that rational employers internalize productivity gaps when maximizing the expected payoff from hiring a worker. Consequently, when hiring workers who are expected to deliver lower productivity, employers discount that fact in wages. For statistical discrimination to be consistent with the data, employers need to adjust their expectations concerning the productivity gaps in line with the data. The delay in fertility observed around the world over the past decades provides a convenient context for evaluating whether (adjusted) gender wage gaps among young workers are consistent with the hypothesis of statistical discrimination. In this study, we provide estimates of adjusted wage gaps between young men and women from 56 countries around the world, spanning four decades, and compare those estimates with the evolution of mean maternal age at first birth.
We find a significant effect of delayed fertility on adjusted gender wage gaps among youth. This result proves robust to the estimation method. The effect estimated through instrumental variables amounts to a decline of roughly 2 percentage points in AGWG per one-year delay in the first parity. This effect is sizable, amounting to 15% of the overall youth AGWG and about 30% of the observed decline in AGWG among young workers over the past decades.
The fact that AGWG for young workers declines with delayed fertility does not prove that the entire gap is due to statistical discrimination: employers could adjust their expectations slowly to changes in fertility timing, or they may view the productivity costs implied by motherhood as higher than they actually are. Both inaccuracies would imply stereotypes and heuristics inconsistent with the hypothesis of statistical discrimination. To address this issue, we provide simulations for the productivity gap of young women relative to men. For some countries, the implied productivity cost of parenting is well aligned with the range of AGWG implied by our model. We also illustrate that for a group of European countries the range of AGWG estimates largely exceeds the productivity costs implied by the data-driven probability of first parity among young women active in the labor market.
Our study contributes by demonstrating that, in general, employers correctly receive the signal about changes to the probability of child-bearing and adjust AGWG downwards in light of delayed fertility. This adjustment is accurate in terms of magnitude for some countries, whereas in others we are able to show that the estimates of AGWG are in excess of what would be justifiable given the observed distribution of age at first birth and the costs associated with motherhood. This may explain why audit studies on the motherhood penalty find such conflicting evidence: from strong discrimination against would-be mothers in some countries to virtually no differences in call-back rates.
While our study is able to bridge several gaps in the existing literature, caution is needed in interpreting the results. In terms of data, our study covers 56 countries, some of which have yet to undergo the second demographic transition. Extending the study to other countries is currently impossible, and our estimates need not apply to employers in countries where individuals aged 20–30 years have children with near certainty. Data limitations in terms of individual-level wage data and demographic data on first parity constrained our ability to study countries with high fertility rates and low age at first birth.
In terms of methodology, we introduce four instruments to identify the causal effect of delayed fertility on adjusted gender wage gaps. The statistical properties of the first-stage regression appear satisfactory, yet the IV estimates are qualitatively very similar to those from the linear model. More research is called for to determine the magnitude of the reverse causality bias, that is, to study the role of gendered labor market inequality in the timing of childbirth.
In terms of policy implications, our study shows that the probability of child-bearing is reflected in wage offers for young women relative to men. This hints that more equal sharing of care between mothers and fathers can help the labor market position of young mothers and especially would-be mothers. Exploring further the role of sharing care is a promising avenue for future research.
Given our results, future research could study whether and to what extent the shifts in the gender wage gap by age follow the same pattern as the shifts in the age profile of delivering a first child. To the best of our knowledge, such analysis could be performed only for a few advanced economies, of which one – the United States – exhibits patterns very different from the rest of the world. More research is needed to understand the origins of these country-specific developments and to extract further insights common across countries.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/dem.2024.21.
Acknowledgements
The authors express gratitude for insightful comments and suggestions by Michele Belot, Shelly Lundberg, Anna Matysiak, Michal Myck, Filip Pertold, Barbara Petrongolo, Olga Popova, Nuria Rodriguez-Planas, Andrea Weber, and participants of EEA 2019, Gender Workshop in Prague 2019, EACES 2021, LABFAM seminar as well as seminars at GRAPE, WSE, and University of Warsaw. We have received great help from John Bongaarts, Janos Kollo, Mathias Nathan, and Yaer Zhuang. The assistance of LISSY data facility by LISER is gratefully acknowledged. Kamil Sekut provided wonderful research assistance. Financial support by National Science Center grant no. #2017/27/L/HS4/03219 is acknowledged. This paper is an outcome of a joint NCN-LNT DAINA initiative. The authors do not have any conflicts of interest to disclose.
Competing interests
Joanna Tyrowicz and Lucas van der Velde acknowledge the support received from the National Science Centre (Narodowe Centrum Nauki) through the project grant #2017/27/L/HS4/03219. The authors do not hold any position that could create a conflict of interest.