Statistical power, an integral part of the research process, underpins the design and evaluation of empirical research. Beyond just good science, it is a near universal expectation from granting agencies (Chow et al., Reference Chow, Shao, Wang and Lokhnygina2017; Descôteaux, Reference Descôteaux2007). These agencies typically expect researchers to determine adequate sample sizes through a priori power calculations (Cohen, Reference Cohen1988; Jackson et al., Reference Jackson, Gillaspy and Purc-Stephenson2009; Maxwell et al., Reference Maxwell, Kelley and Rausch2008). Post hoc calculations are equally important — they facilitate evaluating the conclusions drawn from any given study (Levine & Ensom, Reference Levine and Ensom2001). Within the field of behavior genetics, particularly in the context of ACE models, power refers to the probability of correctly rejecting the null hypothesis — that genetic or common environmental effects have no impact on the outcome traits (Verhulst, Reference Verhulst2017; Visscher, Reference Visscher2004). The ACE model is a statistical model that posits that additive genetic factors (A), common environmental factors (C), and specific (or nonshared) environmental factors plus measurement error (E) account for individual differences in a phenotype. Previous studies have thoroughly discussed how sample size, variance components, and the ratio of twin types impact the power of parameter estimation via mathematical derivations (e.g., Visscher, Reference Visscher2004; Visscher et al., Reference Visscher, Gordon and Neale2008), computer simulations (e.g., Verhulst, Reference Verhulst2017), or a combination of the two (Martin et al., Reference Martin, Eaves, Kearsey and Davies1978; Sham et al., Reference Sham, Purcell, Cherny, Neale and Neale2020). Notably, these studies all use monozygotic (MZ) and dizygotic (DZ) twins. However, an ACE model is not exclusively identifiable with MZ and DZ twins. In fact, any two groups of kin pairs with different genetic-relatedness parameters (H) are mathematically sufficient to fit an ACE model (Hunter et al., Reference Hunter, Garrison, Burt and Rodgers2021).
Most simulation research has focused on classical twin designs, which set the genetic relatedness parameter (H) at 1.0 for MZ twins and 0.5 for DZ twins. However, alternate family designs employing different H parameters do exist. One such example is the same-sex (SS) and opposite-sex (OS) DZ twin pair design, often called the SS-OS design. This design becomes particularly useful when zygosity is unavailable as researchers can distinguish OS DZ twins from SS twins based on birth date and biological sex. The remaining SS twin pairs are a mixture of MZ and SS DZ twins. Historically, this design was a staple in earlier twin studies such as the Scottish Mental Surveys conducted in 1932 and 1947 (Deary et al., Reference Deary, Whiteman, Starr, Whalley and Fox2004), before the widespread use of genotyping. Despite technological advancements, this design remains relevant, particularly when genotyping is not feasible. For example, a series of studies by Figlio and colleagues (Figlio, Guryan et al., Reference Figlio, Guryan, Karbownik and Roth2014; Figlio, Freese et al., Reference Figlio, Freese, Karbownik and Roth2017) used the SS-OS twin design on administrative data to analyze all twins born in Florida from 1994 to 2004. The authors relied on these records to increase the representation of twins from disadvantaged backgrounds, thereby mitigating selection effects commonly found in twin studies (Hagenbeek et al., Reference Hagenbeek, Hirzinger, Breunig, Bruins, Kuznetsov, Schut, Odintsova and Boomsma2023; Holden et al., Reference Holden, Haughbrook and Hart2022). In doing so, they ensured a more representative sample, but they had to forgo determining zygosity — a design trade-off the authors argued was worthwhile (Figlio et al., Reference Figlio, Freese, Karbownik and Roth2017).
These design considerations are not unique to Figlio and colleagues (Figlio, Guryan et al., Reference Figlio, Guryan, Karbownik and Roth2014; Figlio, Freese et al., Reference Figlio, Freese, Karbownik and Roth2017), but reflect a widespread limitation, as most surveys lack zygosity data. Yet, many large-scale social surveysFootnote 1 collect family data without being specifically tailored for twin studies. These surveys usually focus on social, economic, educational, geographical, and political topics (e.g., China Family Panel Study, by Xie & Hu, Reference Xie and Hu2014; The National Longitudinal Survey of Youth, by Rodgers et al., Reference Rodgers, Beasley, Bard, Meredith, D Hunter, Johnson, Buster, Li, May, Garrison, Miller, van den Oord and Rowe2016), and employ household sampling methods for efficiency (Parsaeian et al., Reference Parsaeian, Mahdavi, Saadati, Mehdipour, Sheidaei, Khatibzadeh, Farzadfar and Shahraz2021; United Nations, 2008). By deploying the SS-OS design on these public datasets, we enable these datasets to yield not just individual-level information but also rich genetic insights from twin studies, adding depth to our analyses. Compared to twin registries and genomewide association studies (GWASs), these public datasets often cover a wider range of research topics and contain more diverse populations (Hagenbeek et al., Reference Hagenbeek, Hirzinger, Breunig, Bruins, Kuznetsov, Schut, Odintsova and Boomsma2023; Holden et al., Reference Holden, Haughbrook and Hart2022), allowing us to move beyond typical WEIRD (Western, Educated, Industrialized, Rich, and Democratic) samples (Henrich et al., Reference Henrich, Heine and Norenzayan2010; Popejoy & Fullerton, Reference Popejoy and Fullerton2016; see Holden et al., Reference Holden, Haughbrook and Hart2022; Milhollen et al., Reference Milhollen, Lyu and Garrison2022 for additional discussion of these samples for behavior geneticists). Compared to individual-level analysis, another notable advantage of the SS-OS design, such as the one used by Figlio and colleagues (Figlio, Guryan et al., Reference Figlio, Guryan, Karbownik and Roth2014; Figlio, Freese et al., Reference Figlio, Freese, Karbownik and Roth2017), is its capacity to meet the equal environments assumption, without exclusively relying on MZ versus DZ twins.Footnote 2
The genetic relatedness patterns for twins (HMZ = 1.0 and HDZ = .5) are a byproduct of their development (Beck et al., Reference Beck, Bruins, Mbarek, Davies, Boomsma, Khalil, Lewi and Lopriore2021). DZ twins, on average, share 50% of their segregating genes, a percentage that arises from the random segregation of each chromosome pair within the gametes. MZ twins, conversely, share 100% of their genes as they originate from the same zygote. Consequently, MZ twins always share the same biological sex, whereas DZ twins can be either the same or different sex. Therefore, the genetic similarity within a group of SS twin pairs constitutes a weighted average of the 50% and 100% shared by DZ and MZ twins respectively. This proportion (HSS) can be inferred from population twinning rates, as long as the following two assumptions are met: (1) the sample is representative of the corresponding population; and (2) the specific population’s twinning rate is known and well-established.
Numerous studies have established reliable population twinning rates, which vary across countries (Monden et al., Reference Monden, Pison and Smits2021; Pison et al., Reference Pison, Monden and Smits2015), ethnicities (Pollard, Reference Pollard1995), social classes (Gómez et al., Reference Gómez, Sosa, Corte and Otta2019; Walle et al., Reference Walle, Pison and Sala-Diakanda1992), and eras (Esposito et al., Reference Esposito, Dalmartello, Franchi, Mauri, Cipriani, Corrao and Parazzini2022; Gómez et al., Reference Gómez, Sosa, Corte and Otta2019), among many other factors (Beck et al, Reference Beck, Bruins, Mbarek, Davies, Boomsma, Khalil, Lewi and Lopriore2021; Nylander, Reference Nylander1981). When both sample characteristics and the population rates are available, a weighted HSS for a particular sample can be calculated to fit the ACE model. Without population rates, ‘local’ estimation methods, such as Weinberg’s (Reference Weinberg1901) differential rule, the mixture distribution model (Neale, Reference Neale2003), and latent class analysis (Heath et al., Reference Heath, Nyholt, Neuman, Madden, Bucholz, Todd, Nelson, Montgomery and Martin2003), can be used to derive HSS strictly from the sample attributes. Regardless of the approach taken to derive H for SS twins, the expected value of HSS for the group will fall in the 0.5 < HSS < 1.0 range. Consider, for example, a univariate ACE model applied to SS and OS twins. In this case, the expected variance structure, shown in equations 1 and 2, has off-diagonal values (representing the covariance between twin pairs) in SS twins’ covariance matrix for additive genetics (A) equal to HSS.
By fitting the two groups of kin pairs to the (co)variance structure displayed in equations 1 and 2, we can decompose the total variance into additive genetic variance (A), common environmental variance (C), and unique environmental variance (E). The resulting variance estimates will be identical to the ACE model from the classical twin design (Rijsdijk & Sham, Reference Rijsdijk and Sham2002). The respective covariance matrix of any two groups of kin pairs with different H (ΔH > 0) can be used to estimate all three variance components. For example, Rodgers et al. (Reference Rodgers, Garrison, O’Keefe, Bard, Hunter, Beasley and van den Oord2019) used cousins (H = .125) and half cousins (H = .0625) from the National Longitudinal Survey of Youth to fit a series of ACE models to estimate the heritability of height. Similarly, kinship links identified in the China Family Panel Study, including twins, full siblings and cousins, also present opportunities for the application of nonclassical ACE models (Lyu & Garrison, Reference Lyu and Garrison2022b).
Power in Designs of ΔH < .5
A power analysis on SS-OS design can enhance our understanding of their statistical properties and evaluate their feasibility. In classical twin designs, the relatedness difference (ΔH) between MZ and DZ twins is .5. However, in nonclassical models, like the SS-OS design, this difference is generally less than .5. Consequently, the implied covariance matrices for these nonclassical kin groups (as represented by the 2 × 2 matrices in equations 1 and 2) tend to be more similar than in classical twin designs. The implication is that we need a narrower estimated standard error to ensure that the two covariance matrices generated from empirical data can fit the implied structure of the univariate ACE model. Put simply, researchers may need larger sample sizes to achieve the same level of statistical power as a classical twin design.
In their commentary on Scarr-Salapatek (Reference Scarr-Salapatek1971), Eaves and Jinks (Reference Eaves and Jinks1972) investigated the power of standard proportion of additive genetic variance (a2; Computed by A/(A + C + E)) estimation under a weighted least-squares approach when using SS and OS twins. Specifically, they considered the case where a2 = .6 and ΔH = .2, and they found that the SS-OS design needed a sample size that was approximately three times larger to achieve comparable power as the classical twin design. However, the statistical power of a2 estimation is heavily associated with the variance combination and the relative proportion of MZ and DZ twins (Verhulst, Reference Verhulst2017). As a result, the ‘three times sample size’ rule of thumb may not be universally applicable, and arguably should not be treated as a rule without a more systematic exploration of power in such designs.
Mathematically, we adapted Visscher’s (Reference Visscher2004) paradigm by using least-squares (LS) estimation to evaluate the power of the univariate ACE model as a function of the genetic relatedness of the SS twins (HSS). Equation 3 illustrates the relation between power and sample size and R. A detailed mathematical derivation of equation 3 is provided in Appendix A.
In equation 3, a2 is the standardized proportion of the additive genetic variance of the measured trait, c2 is standardized proportion of the common environment variance of the trait, and n is the number of kin pairs in each kin group. Z1-α and Z1-β denote corresponding Z values in an N(0,1) for the assigned type-I error rate (α), and is power (1-β), respectively. Power is positively associated with sample size and genetic relatedness of the SS twins (.5 < HSS < 1), provided that the SS and OS twins have the same sample size n and a specified type-I error rate.
The power for detecting additive genetic variance varies as a function of the relatedness of SS twins. Illustrated in Figure 1, when a2 = .3/ .5/ .7, c2 = .2, e2 = .5/ .3/ .1, n = 500, α = .05, power increased as the HSS twin relatedness deviated from .5. As the A component increases its share in total variance, the same level of power could be maintained when using kin pairs with smaller ΔH. Although most recent research estimates variance components with the maximum likelihood (ML) approach, the results of the univariate ACE model fit with LS and ML are very similar (Visscher, Reference Visscher2004). This striking similarity allows for extrapolating the association found with LS to the general pattern of univariate ACE model fitting. However, ML estimation generally has greater power than LS estimation (Visscher, Reference Visscher2004), leading to differences in sample size requirements for satisfactory power. Relying solely on LS analytic results does not offer researchers sufficient accuracy to establish a priori power estimation for empirical analysis. Moreover, simulations facilitate establishing different levels of c2 to investigate their effects on the power of the a2 estimate. Thus, we will also examine the power of the a2 estimate in a more comprehensive set of simulations in addition to deriving power using least squares estimation.
The Challenge of Non-Positive Parameter Estimates
Besides the issue of power, using the nonclassical kin models may result in estimated variance components that are zero (Cholesky decomposition) or negative (correlated factors models; Carey, Reference Carey2005). Although it is possible that these nonpositive estimates reflect biological mechanisms that arise from genetics or the environment (Steinsaltz et al., Reference Steinsaltz, Dahl and Wachter2020), such biologically justified estimates are rare (Verhulst et al., Reference Verhulst, Prom-Wormley, Keller, Medland and Neale2019). More often, these parameters reflect something statistical — related to modeling or measurement. Negative estimates can occur due to model misspecification. For example, misspecification between ACE and ADE models that can appear as C and D cannot be estimated simultaneously or appear when using a too simplified model (e.g., ACE) to reveal a more complex structure (Ozaki et al., Reference Ozaki, Toyoda, Iwama, Kubo and Ando2011; Verhulst et al., Reference Verhulst, Prom-Wormley, Keller, Medland and Neale2019; see Hunter et al., Reference Hunter, Garrison, Burt and Rodgers2021 for a mathematical approach discussing models with more complex structures.).
Furthermore, negative estimates can simply arise from sampling error (Tabachnick et al., Reference Tabachnick, Fidell and Ullman2019). This can occur when a biased observed variance or covariance in either the MZ or DZ sample reduces one of three components to zero or a negative value. Although larger samples in modern twin studies can mitigate these sampling errors, concerns persist in designs using SS and OS twins or other nonclassical designs. For example, a study examining the heritability of height used 116 SS twins and 61 OS twins from the China Family Panel Study to fit an ACE model (Lyu & Garrison, Reference Lyu and Garrison2022b). This resulted in a slight negative E estimate. Similarly, a study using cousins (Rodgers et al., Reference Rodgers, Garrison, O’Keefe, Bard, Hunter, Beasley and van den Oord2019) encountered several instances of zero-value estimates. In such designs, using kin pairs with genetic relatedness differences of less than .5 (ΔH < .5) requires larger sample sizes to ensure that the observed covariance pattern adequately represents the population parameters. However, obtaining larger sample sizes may not always be feasible for researchers working with public datasets. At present, there is no established guidance for suitable sample sizes in an SS-OS design, leaving researchers without the ability to determine whether their specific combinations of kin groups and sample size is sufficient for their desired level of power.
Sex-Limitation Models
Another potential issue when using SS and OS twins instead of MZ and DZ twins is the effect of sex limitation (Neale & Cardon, Reference Neale and Cardon2013). In this context, ‘sex limitation’ refers to models that account for differences in genetic and environmental influences on a trait by biological sex. The difference may be scalar, indicating that all sexes are influenced by the same factor but to varying degrees, or nonscalar, indicating that specific factors influence only one of the sexes (Neale et al., Reference Neale, Røysamb and Jacobson2006). Traditional twin designs often exclude OS DZ twins to avoid potential confounds introduced by sex differences (Polderman et al., Reference Polderman, Benyamin, de Leeuw, Sullivan, van Bochoven, Visscher and Posthuma2015). However, the challenges associated with sex limitation effects become unavoidable when fitting ACE models with SS twins and OS twins. Beyond just the obvious methodological necessity, there are substantive implications. For example, past research has found that within an OS sibling pair, the male sibling often receives more parental resources than his female sibling, especially in nations with limited social resources (Blau et al., Reference Blau, Kahn, Brummund, Cook and Larson-Koester2020; Das Gupta et al., Reference Das Gupta, Zhenghua, Bohua, Zhenming, Chung and Hwa-Ok2003; Hesketh & Xing, Reference Hesketh and Xing2006). In the case of OS twins, one study found that family background effects were stronger for the male twin compared to the female twin, though the genetic effects were comparable for both sexes (Miller et al., Reference Miller, Mulvey and Martin1997). The assumption in classical twin design that the common environment (C) component is identical between twin pairs is not substantiated under these circumstances (Felson, Reference Felson2014; Kendler et al., Reference Kendler, Neale, Kessler, Heath and Eaves1993; Loehlin & Nichols, Reference Loehlin and Nichols1976; Richardson & Norgate, Reference Richardson and Norgate2005). One commonly suggested modeling solution is to set the common environment correlation at a value less than 1 (Neale et al., Reference Neale, Røysamb and Jacobson2006). This adjustment aims to partially account for the impact of sex differences by implementing a sex-limited scalar in a univariate ACE model (Neale et al. Reference Neale, Røysamb and Jacobson2006). Equation 4 illustrated the assumed variance structure for OS twins with sex limitations,
where values of off-diagonals in the common environment (C) covariance matrix are the presumed common environment correlation (r c). However, it is unknown how this approach will affect the power and performance of the ACE model fitting with SS twins and OS twins, or any other two groups of kin pairs where the H difference is less than .5.
Hence, the current study aims to better understand the complexities of utilizing SS twins and OS twins in genetically informed designs. Specifically, we developed a series of simulations to investigate (1) the power of heritability estimation, (2) ACE model performance in AIC-based model selection, and (3) the frequency of the negative estimates as a function of H and sample sizes under the maximum likelihood theory. In addition, we analyzed the impact of sex-limitation models within this framework.
Methods
We conducted a simulation with a 10 × 10 × 4 design (see Table 1) with 1000 replications per condition. Given that our primary objective is to illustrate the impact of HSS and sample size on the fitting of univariate ACE models, we have established 10 conditions for HSS ranging from .55 to 1.00 in increments of .05. These 10 progressive conditions encompass the potential range of the SS-OS design. Furthermore, we have set 10 conditions for sample sizes, ranging from 30 to 1950, to cover most scenarios in empirical studies using the SS-OS design (Polderman et al. Reference Polderman, Benyamin, de Leeuw, Sullivan, van Bochoven, Visscher and Posthuma2015). As a result, we will simulate 100 conditions varying in HSS and sample sizes, providing a robust guideline for practical applications of the SS-OS design.
Note:
* HSS is the expected genetic relatedness of same-sex twins.
** Sample sizes are the number of twin pairs in each group. If the sample size is 30, there will be 30 pairs of SS twins and 30 pairs of OS twins, totaling 120 individuals.
Previous research indicated that the power of the A estimation in a univariate ACE model highly depends on the relative scale between A and C (Verhulst, Reference Verhulst2017). In reality, different traits have a broad range of patterns for A and C (Polderman et al., Reference Polderman, Benyamin, de Leeuw, Sullivan, van Bochoven, Visscher and Posthuma2015). Hence, four conditions of A, C, and E variance patterns were set to cover different traits with different variance component structures. All four variance patterns have a total variance of 3. The proportion of A variance ranges from 16.7% to 80%, emulating traits subject to low, medium, and high additive genetic variance. Standardized proportions of each component in four conditions are also displayed in Table 1.
Data Generation
MZ and DZ data were simulated by generating random numbers under multivariate normal distribution by functions in ACEsimFit package 0.0.0.9 (Lyu & Garrison, Reference Lyu and Garrison2022a). Based on the HSS, a certain proportion of MZ and DZ twins were generated separately to form a group of SS twins, and then another group of DZ twins was generated as the group of OS twins. The simulated data were fitted with a univariate ACE model using the correlated factor approach and, for each condition, the simulation was repeated 1000 times. All simulations were performed in R version 4.1.3 (R Core Team, 2022). The univariate ACE models were fit using OpenMx 2.20.6 (Neale et al., Reference Neale, Hunter, Pritikin, Zahery, Brick, Kirkpatrick, Estabrook, Bates, Maes and Boker2016) with the NPSOL 5.0 optimization algorithm.
As for the investigation of modeling sex limitations, data were simulated under the same conditions, using a variance pattern of A = 1.5, C = .6, and E = .9. Notably, to emulate the sex-limited effect in the common environment, the correlation of C (rc) between OS twins was set at .95 instead of 1.00. However, for simplicity, we did not misspecify the common environment correlation.
The framework suggested by Satorra and Saris (Reference Satorra and Saris1985) formed the basis for deriving the power of heritability estimation. We calculated the mean noncentrality parameter (NCP), by comparing the values of log-likelihood ratio tests (-2 log likelihood) for the ACE and CE models, for the 1000 models in each condition. Next, we derived the power for each condition from a comparison between the null chi-square distribution and the alternative chi-square distribution with a given NCP. For a more detailed description of this approach, refer to Satorra and Saris (Reference Satorra and Saris1985), Verhulst (Reference Verhulst2017), and Visscher (Reference Visscher2004).
To evaluate how effectively the model correctly identified the assumed ACE variance structure, we employed the Akaike Information Criterion (AIC; Akaike Reference Akaike, Parzen, Tanabe and Kitagawa1998) to compare the relative performances of the ACE, AE, and CE models. We used the proportion of 1000 models where the ACE model has the lowest AIC among three models under each condition as an indicator of correct model selection.Footnote 3 Lower AIC values suggest one model’s superiority in explaining the data relative to other models. AIC has been a long-used approach to evaluating the relative performance among ACE, CE, and AE models in univariate twin designs and yields adequately accurate decisions for continuous traits (Sullivan & Eaves, Reference Sullivan and Eaves2002). Furthermore, we also calculated the proportion of the 1000 models where at least one of A, C, and E estimates has a negative value to evaluate the influence of sample sizes and H on model fitting.
At the recommendation of a reviewer, we also investigated A parameter bias in the absence of HSS misspecification. We computed average parameter A estimates across 1000 fitted models in every single condition under the variance combinations A = 2.4, C = .3, E = .3; A = 1.5, C = .6, E = .9 and A = 1.0, C = 1.0, E = 1.0.
Results
We ran a series of simulations to investigate the impact of HSS on model performance. We summarized the simulation results of the power of heritability estimation, ACE model performance in AIC Based Model Selection and the frequency of negative estimates of the 1000 fitted models in each condition. We presented the results in a series of matrices where the x-axis displays the 10 conditions of sample sizes and the y-axis is 10 conditions of HSS. Interpretation and insights from the results were discussed for the three criteria separately. Because many of the result patterns were similar, we primarily presented the results of A = 2.4, C = .3, E = .3 (80%, 10%, 10%) and A = 1.5, C = .6, E = .9 (50%, 20%, 30%). The results and corresponding figures for the other two combinations are available in Appendix B.
Power of Heritability Estimation
Generally, we found that the power of a univariate ACE model to detect A is positively associated with sample size and HSS. This finding was consistent with our mathematical derivation from the LS approach. As shown in both Figure 2-1 and Figure 2-2, the positive association between power and HSS suggests that a higher proportion of MZ twins in the SS twins will require a smaller sample size to reach a power of .8. For example, when the variance combination is A = 1.5, C = .6, E = .9 (Figure 2-1), a sample with HSS = .75 needs about 450 pairs of SS and OS twins to reach a power of .8, whereas a sample with HSS = .90 only needs 150 pairs to reach .8. As the covariance structures of SS and OS twins become more dissimilar, smaller samples will be sufficient to distinguish the covariance structures. Conversely, as they grow more similar, a larger sample is needed to have the same effect. Although the positive association is similar for two combinations of variance components, each condition for the combination of A = 2.4, C = .3, E = .3 (Figure 2-2) demonstrated higher power compared to the corresponding condition for the combination of A = 1.5, C = .6, E = .9 (Figure 2-1). For example, in a condition HSS = .75 and N = 300, the power for the variance combination of A = 2.4, C = .3, E = .3 is .734, which is lower than a power of .997 for the variance structure of A = 2.4, C = .3, E = .3 in the same condition. The proportions can be interpreted as we have power of .734 to detect a significant difference between the estimated value of A and 0 for the variance combination of A = 2.4, C = .3, E = .3. In other words, out of the 1000 models, 734 of them found the expected significant effect. Comparing all four variance combinations, we find that a greater share of A in total variance is associated with higher power in each condition. This finding is consistent with our mathematical derivation and Verhulst’s (Reference Verhulst2017) results that the power of the ACE model is higher when both the proportion of A and C in the total variance increase.
Model Performance: AIC-Based Model Selection
Regarding model performance in AIC-based model selection, we found that HSS and sample sizes generally but not exclusively have a positive association with the overall model performance. Model performance was operationalized by counting the percentage of the 1000 models where the ACE models have lower AIC values compared to AE and CE models. For example, with the variance combination of A = 1.5, C = .6, E = .9 (Figure 3), a sample with HSS = .75 needs about 1200 pairs of SS and OS twins to reach a power of .8, whereas an HSS = .90 sample only needs 450 pairs to reach .8. More specifically, in both variance combinations of A = 1.5, C = .6, E=.9 (Figure 3) and A = 2.4, C = .3, E = .3 (Supplementary Figure S2-1 in Appendix B), the worst conditions occur in the middle of the grid, where neither the sample size nor HSS are extremely small. Although the conditions on the upper left of the grid are better than the middle ones, the overall power at that range is far from acceptable. For all the variance combinations, acceptable overall power only exists when sample sizes and HSS were relatively large, which is consistent with our prediction.
We noticed that the association pattern between HSS and sample sizes and model performance fluctuated across the different variance component combinations. A relatively vague trend shows that as the C variance component dwindles, the model performance in each condition deteriorates. One intuitive explanation is that when the share of C decreases, more information (larger sample size) is needed to distinguish the covariance structure of the ACE model from the AE model. Consequently, the variance combination of A = .5, C = 2.0, E = .5 (Supplementary Figure S2-3 in Appendix B) had the best model performance results among the four combinations. An alternative possible explanation might be that an increase in E leads to a decline in model performance. Another interesting pattern emerges when the share of the C component increases: the ‘gorge’ in the fittings results moves towards the upper left, along with improved model performance.
Frequency of Negative Estimates
Our results indicate that negative estimates for A, C or E are less frequent with increasing HSS and sample sizes. For example, given a variance combination of A = 1.5, C = .6, E = .9 (Figure 4), a sample with HSS = .75 needs about 300 pairs of SS and another 300 pairs of OS twins to reduce the frequency of negative estimates to a 10% level. In contrast, a sample with HSS = .90 only needs 150 pairs. The 10% frequency indicates that out of the 1000 simulated models at least one negative parameter estimate occurs in 10% of the models. It appears that larger sample sizes and higher HSS values reduce the likelihood of negative estimates due to sampling error. Additionally, the negative estimates seem to be rather sensitive to different combinations of variance components. For example, there are distinctly fewer negative estimates for a variance combination of A = 1.5, C = .6, E = .9 (Figure 4) than for A = 2.4, C = .3, E = .3 (Figure S3-1). For example, under the conditions HSS = .75 and N = 300, the frequency of negative estimates is 10.1% for the variance combination of A = 1.5, C = .6, E = .9, as compared to 23.6% for the variance structure of A = 2.4, C = .3, E = .3. Given the smaller proportion of C and E components in the total variance for the latter combination, the chance of obtaining a negative variance estimate due to sampling error increases compared to A = 1.5, C = .6, E = .9.
Sex-Limited Effects (rc = .95; A = 1.5; C = .6; E = .9)
Addressing the potential for sex-limited effects, our results suggested that the general positive association between HSS and sample sizes and three criteria of model performance was broadly consistent with the standard model without sex-limited effects. The power to detect A diminished slightly when we set the correlation between OS twins to .95 (Figure 5), compared to the results with the same variance components (A = 1.5, C = .6, E = .9) without considering sex-limitation (Figure 2-1). A decrease in rc corresponded with a reduction in the proportion of C in the total variance. In turn, to achieve the same power level, larger sample sizes are required, which is consistent with Verhulst’s (Reference Verhulst2017) findings. Additionally, the models that factored in sex-limited effects (Figure 5) yielded more negative estimates than in the standard models (Supplementary Figure S4-3). A comparison of Figure 5 and Supplementary Figure S4-2 revealed an interesting pattern in the overall model fitting. When the sample sizes are relatively small, models incorporating sex-limited effects suggested a worse overall fit compared to models that did not. However, when sample sizes exceeded 450 pairs per group, the models with sex-limitation outperformed the standard models.
Parameter Bias
Following a reviewer’s suggestion, we investigated the bias of the ‘A’ parameter, computing average ‘A’ estimates across 1000 simulated models for each condition. Because the pattern of results was similar across conditions, we present one condition in Figure 6, with others in the Supplementary Appendix B. This figure depicts the ‘A’ parameter bias under the variance distribution A = 1.5, C = .6, E = .9 (50%, 20%, 30% respectively), showing that A estimates are not biased from 1.5 drastically in all conditions. A estimates are slightly biased (i.e., A estimates deviate from 1.5 upwards or downwards by 1% of the total variance, which equates to .03 in our study) when HSS is small or when sample sizes are restricted. More specifically, A is inclined to be underestimated when HSS is below .65 and the sample size falls short of 300, as illustrated in the upper-left part of Figure 6. In contrast, A tends to be overestimated when HSS exceeds .65 but the sample size is less than 210 (lower-left part of Figure 6), or when HSS is below .65, but the sample size exceeds 300 (upper-right part of Figure 6). As expected, the estimation bias for the A parameter gradually diminishes with higher HSS values and larger sample sizes. In general, a sample size above 300 and an HSS value greater than .65 can help to avoid the presence of unbiased estimates in the lower-right triangle of Figure 6.
Discussion
In the current study, we investigated how well univariate ACE models perform to correctly estimate the variance structure of A, C and E as a function of the expected relatedness of the SS twins (HSS) and sample size. We adopted Visscher’s (Reference Visscher2004) LS paradigm to mathematically derive the positive relationship among power, HSS, and sample sizes. We conducted simulations to further explore how heritability power, AIC-based model performance, and reduction of negative estimates are positively associated with larger HSS and larger sample sizes. In addition, we examined whether the simple solution of changing the common environment correlation to .95 for addressing sex-limited effects impacted model performance. We found that the solution causes slightly worse model performance under most circumstances.
Both the algebraic derivations and simulations illustrated a positive relationship between HSS and the power of correctly detecting the additive genetic effects (A) in an ACE model. A larger difference between the genetic correlations (ΔH) will require less information to distinguish the covariance structure of the SS twins from the covariance structure of the OS twins, as the only difference in the implied covariance structure between SS twins and OS twins is the correlation for additive genetics. The difference can also be understood as the significant difference between a model where additive genetics plays a role in affecting the phenotype from a model where additive genetics does not. We also found that traits subject to more additive genetic influence would have higher power under all conditions of HSS and sample sizes. Our results are consistent with previous findings (Verhulst, Reference Verhulst2017). Mathematically, an increase of standardized additive genetic component (approximately a decrease in the proportion of error variance) would lead to a greater difference between the intraclass correlations for OS and SS twins, which would eventually contribute to a higher power (see a more detailed math derivation in Visscher Reference Visscher2004).
We found a similar positive association between HSS and AIC-based model selection. Model performance, distinct from the power to detect the A parameter, evaluates whether the ACE model has the lowest AIC value. Typically, a high AIC-based model performance requires a proper fit of A, C and E components concurrently. This model performance offers a more conservative model selection criteria than a single estimate’s power. The variance components’ structure also influences the model performance. We observed that a higher proportion of C in the total variance suggested higher power across all conditions. Our results suggested that an adequate amount of C is vital for the model to correctly distinguish between ACE and AE models, because in our results the correct models (ACE) were more often misspecified as AE models than AC models. Nevertheless, further algebraic and simulation research is needed to identify factors impacting AIC comparison approaches.
For negative estimates, our study demonstrated that in general when the relatedness difference between two modeled groups (ΔH) is less than .5, negative estimated parameters are not unusual, even when samples were relatively large. Although the conventional wisdom is that the estimated error variance should always be non-negative, that reasoning is based on the idea that within-pair variance can never be eliminated. Our study highlighted that negative E estimates can occur simply due to sampling errors in some special circumstances. For example, suppose we fit an ACE model with a small number of kin pairs to the target trait predominantly affected by genes and shared environment. In that case, the E parameter will have a wide confidence interval. Therefore, it is not unusual that the model will estimate a negative E. Although we could force the estimate to be non-zero, that results in more problems. Indeed, ACE models with explicit or implicit constraints on estimates can cause deviations from the assumed type-I error rates and lead to biased estimates (Verhulst et al., Reference Verhulst, Prom-Wormley, Keller, Medland and Neale2019). Therefore, we do not recommend forcing the negative estimates to be greater than zero, especially in circumstances where they are not unusual. In our study, the negative estimates were entirely the result of sampling error, and occurred when variance components were relatively close to zero. Under those circumstances, estimates are more likely to be negative. As a corollary, in empirical studies, encountering negative estimates is not synonymous with a failed model. Rather, negative estimates can be an indicator of low power, small effect sizes, or general model misspecification. Therefore, we recommend checking other criteria given the specific conditions before continuing to analyze the results, adjusting model specifications, or discarding the data entirely.
We found that A estimates were slightly biased when the sample sizes were small or ΔH was low. Much like other analyses, greater ΔH and larger sample sizes contribute to reduced bias of A estimates, reaffirming the ideal situation for the SS-OS design: a sample size exceeding 300 pairs per group. Further, we found no systematic bias in this design, meaning that any biased conditions are likely the result of randomness in the simulation and model-fitting processes. An interesting future direction to explore is the sensitivity of this design to HSS misspecification. Given that HSS is usually an estimated value derived from population twinning rates or local estimating algorithms rather than a population parameter, we suspect that various degrees of HSS misspecification could substantially affect parameter bias.
Although our study focused primarily on the SS-OS design, these results are applicable to other research designs where the difference in relatedness (ΔH) is less than .5. Another scenario where H can diverge from .5 arises when researchers intend to fit covariance structure models, like the ACE model, with nontwin kin pairs. Such datasets can also support fitting an ACE model with MZ twins, siblings, or distant cousins. These configurations also result in an H difference not equal to .5. For example, past studies have employed full siblings and cousins to estimate heritability for specific phenotypic outcomes (Chakraborty et al., Reference Chakraborty, Schull, Harburg, Schork and Roeper1977; Rodgers et al., Reference Rodgers, Garrison, O’Keefe, Bard, Hunter, Beasley and van den Oord2019; Souto et al., Reference Souto, Almasy, Borrell, Garí, Martínez, Mateo, Stone, Blangero and Fontcuberta2000). The difference in H between cousins, siblings or twins is invariably less than .5, given that the relatedness coefficient for cousins does not exceed .125. These nontwin designs can serve as a valuable resource for researchers investigating the environmental and genetic influence on various traits.
Future researchers planning to use two groups of kin pairs with a ΔH less than .5 should at a minimum avoid scenarios with a ΔH less than .1 and sample sizes smaller than 60 pairs per group. Since we found the association between model performance and H and sample sizes varied a lot along with the variance component structure of the targeted trait, proposing a single guideline for all circumstances will be inappropriate. Indeed, numerous studies have warned against overreliance on rules of thumb in structural equation models (Chen et al., Reference Chen, Curran, Bollen, Kirby and Paxton2008; Heene et al., Reference Heene, Hilbert, Draxler, Ziegler and Bühner2011; Kyriazos, Reference Kyriazos2018; Montoya & Edwards, Reference Montoya and Edwards2021), including within behavior genetics (Garrison & Rodgers, Reference Garrison and Rodgers2021). Instead, using available parameters to calculate the power of the heritability estimation before using empirical data to fit the ACE model will be preferable. If one study does not have a specific focus on A or C but is designed to illustrate the multiple sources of effects, an overall model-fit indicator like AIC in our study would be a more appropriate reference. Nevertheless, as the criteria like AIC could only be evaluated using simulations, researchers could look up the supplementary tables mentioned in the Supplementary Appendix B to find an approximate power rate corresponding to the parameter setting in their own study. Alternatively, we encourage researchers to run their own simulations using the expected parameters and covariance structure. That simulation will lead to a tailored recommendation indicating what proportion of the nested comparisons suggest the ACE structure is the best-fit model. We developed the ACEsimFit package to assist such encouraged researchers. It contains several R functions and vignettes demonstrating how to simulate and fit the models (Lyu & Garrison, Reference Lyu and Garrison2022a).
Our results indicated less robust models when addressing sex-limited effects by slightly decreasing the assumed common environmental correlation between OS twins. However, sex-limited effects are far more complicated than a reduction of common environmental correlations. For example, in a study using SS and OS twins, OS twins may not have exactly the same family environment as the classical twin study assumed due to gender inequality (Blau et al., Reference Blau, Kahn, Brummund, Cook and Larson-Koester2020; Das Gupta et al., Reference Das Gupta, Zhenghua, Bohua, Zhenming, Chung and Hwa-Ok2003; Hesketh & Xing, Reference Hesketh and Xing2006). From a modeling perspective, both genetic and environmental differences between sexes can take different forms, such as scalar and nonscalar sex limitations (Neale et al., Reference Neale, Røysamb and Jacobson2006). From an empirical perspective, different traits may be susceptible to sex-limited effects. For example, height and BMI are traits that have substantial sex differences in their heritability (Hesketh & Xing, Reference Hesketh and Xing2006; Schousboe et al., Reference Schousboe, Willemsen, Kyvik, Mortensen, Boomsma, Cornes, Davis, Fagnani, Hjelmborg, Kaprio, Lange, Luciano, Martin, Pedersen, Pietiläinen, Rissanen, Saarni, Sørensen, van Baal and Harris2003; Silventoinen et al., Reference Silventoinen, Kaprio, Lahelma, Viken and Rose2001) but personality traits such as the Big 5 do not (South et al., Reference South, Jarnecke and Vize2018). Hence, before using the SS and OS twins to fit univariate ACE models, we recommend carefully considering the specific potential impacts of sex-limited effects. We also recommend addressing them by either modifying the assumed component structure or considering alternative models (e.g., a G × E model or a model assigning different covariance structures by biological sex; Neale et al., Reference Neale, Røysamb and Jacobson2006).
Our study assessed the feasibility and risks of using twin pairs with smaller genetic relatedness differences in univariate ACE models. However, like all simulations, we had to keep the simulation scope narrow. First, we only evaluated univariate ACE models. Some research questions can only be addressed with the multivariate models, such as examining covariance between multiple traits and estimating A, C, D and E simultaneously (Maes et al., Reference Maes, Neale, Kirkpatrick and Kendler2021). The increased complexity of multivariate models likely demands larger sample sizes or ΔH for comparable power, but further investigation is needed. Second, our derivations and simulations assume that HSS is not misspecified and that the observed phenotype is normally distributed, conditions that may not always be met in empirical settings. Approaches such as population twinning rates, Weinberg’s differential rule (Weinberg, Reference Weinberg1901), mixture distribution models (Neale, Reference Neale2003), and latent class analysis (Heath et al., Reference Heath, Nyholt, Neuman, Madden, Bucholz, Todd, Nelson, Montgomery and Martin2003) give an approximation, not a direct observation, of MZ twins’ proportion in SS twins, potentially biasing the HSS. Previous research has suggested that the misspecification of HSS and non-normal distributions could bias estimated parameters (Benyamin et al., Reference Benyamin, Deary and Visscher2006), indicating a potential avenue for future research. Therefore, future efforts should be made to investigate the impact of parameter misspecification and non-normal distributions on the associations between HSS and model performance. Third, although AIC has been widely used in behavior genetics to determine the ‘best model’ (Sullivan & Eaves, Reference Sullivan and Eaves2002), its accuracy as a selection approach remains under-examined. A lengthy appendix in Garrison and Rodgers (Reference Garrison and Rodgers2021) hints at potential issues with AIC as a selection criterion, and the worst-fitting ‘gorge’ seen across all AIC result matrices further point to potential shortcomings of this approach. Therefore, more comprehensive research should be done to investigate this model selection approach.
Conclusion
In the current study, we have identified several factors that impact the performance of the ACE model. To begin with, we found that the power to detect a significant additive genetic (A) component was positively associated with the difference in genetic relatedness of two kin groups (ΔH) and sample size. Similarly, we noted a positive association between the ACE model’s performance — evaluated using the Akaike Information Criterion (AIC) and the lower frequency of negative estimates of ACE variance components — and both the difference in genetic relatedness (ΔH) between two kin groups and the sample size.
We observed that while different combinations of A, C and E variance followed a similar overall pattern — in that, for instance, a higher A parameter would consistently exhibit higher power at larger sample sizes — the absolute performance varied considerably. We also found that factoring in sex differences by reducing the assumed correlation of the common environment to .95 resulted in a model performance slightly inferior to the raw ACE model. Researchers using kin groups with ΔH of less than .5 should carefully consider the performance implications for their specific ACE model. It is crucial to conduct a comprehensive power analysis before delving into the interpretation of model outcomes.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/thg.2023.40.
Availability of data and material
This research only involves computer-simulated data. The source code can be found at https://github.com/R-Computing-Lab/Code-Relatedness-ACE
Funding
The current study is supported by the National Institute on Aging (NIA), RF1-AG073189.
Competing interests
We declare no conflict of interest.
Ethics approval
Not applicable.