EpiPen: An R Package to Investigate Two-Locus Epistatic Models

Raymond K. Walters; Charles Laurin; Gitta H. Lubke

doi:10.1017/thg.2014.25

EpiPen: An R Package to Investigate Two-Locus Epistatic Models

Published online by Cambridge University Press: 28 April 2014

Raymond K. Walters ,

Charles Laurin and

Gitta H. Lubke

Show author details

Raymond K. Walters*: Affiliation:
Department of Psychology, University of Notre Dame, Notre Dame, IN, USA
Charles Laurin: Affiliation:
Department of Psychology, University of Notre Dame, Notre Dame, IN, USA
Gitta H. Lubke: Affiliation:
Department of Psychology, University of Notre Dame, Notre Dame, IN, USA Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands
*: address for correspondence: Raymond K. Walters, 118 Haggar Hall, Notre Dame, IN 46628. E-mail: [email protected]

Article contents

Abstract
Methods
Results and Discussion
Conclusion
Supplementary Material
References

Abstract

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.

Keywords

epistasis SNP-SNP interactions penetrance variance explained

Type: Retraction
Information: Twin Research and Human Genetics , Volume 17 , Issue 4 , August 2014 , e1

DOI: https://doi.org/10.1017/thg.2014.25 [Opens in a new window]
Copyright: Copyright © The Authors 2014

Genome-wide association studies (GWAS) have conventionally focused on identifying associations between individual single-nucleotide polymorphisms (SNPs) and the phenotype, but there is growing interest in modeling more complex effects such as interactions. Genome-wide studies of pairwise interactions between SNPs have shown promising results (e.g., Hu et al., Reference Hu, Liu, Zhang, Li, Wang, He and Shi2010; Lippert et al., Reference Lippert, Listgarten, Davidson, Baxter, Poon, Kadie and Heckerman2013; Wan et al., Reference Wan, Yang, Yang, Xue, Fan, Tang and Yu2010a, Reference Wan, Yang, Yang, Xue, Tang and Yu2010b). For instance, Hemani et al. (Reference Hemani, Shakhbazov, Westra, Esko, Henders, McRae and Powell2014) recently identified and replicated 30 pairwise interactions associated with gene expression levels. Identifying SNP-SNP interactions could help close the gap of ‘missing heritability’ from GWAS by reducing estimates of narrow-sense heritability inflated by ‘phantom heritability’ from interactions (Zuk et al., Reference Zuk, Hechter, Sunyaev and Lander2012). Episitatic effects could also contribute large components of broad-sense heritability that would not be detected by univariate tests of association (Culverhouse et al., Reference Culverhouse, Suarez, Lin and Reich2002). In either case, however, the abundance of competing models for these interactions can lead to confusion and slow research efforts.

One primary source of confusion is the distinction between the biological definition of epistasis as a masking effect (Bateson, Reference Bateson1909) and the statistical definition involving deviation from additivity for the effects of genetic factors on quantitative outcomes (Fisher, Reference Fisher1918). As a further complication, the statistical definition of epistasis is scale-dependent. Cordell (Reference Cordell2009) provides an excellent review of these historical issues.

As a result of the diversity of definitions of epistasis in the literature, there are a multitude of statistical models for epistasis and, more recently, interactions between SNPs. Efforts to estimate all possible patterns of epistasis have identified numerous interpretable models (Li & Reich, Reference Li and Reich2000; Niu et al., Reference Niu, Zhang and Sha2009). The different software packages intended to test SNP-SNP interactions mirror this diversity (e.g., Herold et al., Reference Herold, Steffens, Brockschmidt, Baur and Becker2009; Purcell et al., Reference Purcell, Neale, Todd-Brown, Thomas, Ferreira, Bender and Sham2007; Ueki & Cordell, Reference Ueki and Cordell2012).

Although the implemented epistatic models are useful, the variety of models can make it difficult for researchers to meaningfully compare the results from different studies of epistasis. In particular, comparison of effect sizes and accumulation of studies of epistasis in meta-analysis are complicated by the use of differing models.

Therefore, in order to facilitate comparisons between differing studies of epistasis we have developed EpiPen, an R package for Epistatic and Penetrance models, to easily convert between common models of pairwise epistasis for both dichotomous and continuous outcomes. Additional tools for power analysis and simulation studies using each available model are also included in the package.

Methods

Epistasis between two loci can be modeled using penetrance models, variance components, or the generalized linear model (GLM). Although each of these approaches models the same set of possible two-locus interactions, they rely on very different frameworks for describing the interaction. The models implemented in the EpiPen package are briefly introduced here.

Penetrance Models

Following the biological approach to epistasis, the first approach is to present the model as a table of conditional outcomes relative to the genotypes at two bi-allelic loci (Table 1). For dichotomous phenotypes, the outcome of interest is the penetrance. The penetrance is the probability of being affected (i.e., the probability of being in the ‘case’ group), conditional on genotype. Each entry in the table denotes the penetrance conditional on the corresponding genotypes; for example, f ₅ = P(Y = 1|AaBb). Alternatively these risk models may be stated using log odds ($\log \frac{{P( {Y = 1|AaBb} )}}{{P( {Y = 0|AaBb} )}} = u\left( {f_5 } \right)$; see Marchini et al., Reference Marchini, Donnelly and Cardon2005).

TABLE 1 Saturated Model of Conditional Phenotypes at Two Bi-allelic Loci

As this model has a separate parameter for each cell it is saturated and fits any observable pattern of penetrances. Certain patterns of penetrances are biologically or statistically meaningful, such as patterns showing dominance of one locus over another. Such useful patterns can be modeled by imposing constraints on the cells of the penetrance table (Hallgrimsdottir & Yuster, Reference Hallgrimsdottir and Yuster2008; Li & Reich, Reference Li and Reich2000; Neuman & Rice, Reference Neuman and Rice1992; Niu et al., Reference Niu, Zhang and Sha2009; Todorov et al., Reference Todorov, Borecki and Rao1997; Vieland & Huang, Reference Vieland and Huang2003).

However, while the pattern of penetrances may be biologically meaningful, the individual parameters do not correspond to effects that have an easy conceptual interpretation. In order to get more interpretable parameters, it is common to define effects within the GLM framework.

Generalized Linear Models

The GLM relates the expected value of the phenotype to a linear function of the genotypes. Most commonly, this takes the form of linear regression, for a continuous phenotype, or logistic regression, for dichotomous phenotypes. For example, in the general case, logistic regression defines the probability of being affected conditional on the genotype as

(1)

$$\begin{equation} P\left( {Y = 1{\rm |}X = x} \right) = \frac{{e^u }}{{1 + e^u }},\vspace*{-10pt} \end{equation}$$

(2)

$$\begin{equation} u = \alpha + \sum \limits_{i = 1}^p {\rm \beta }_i x_i ,\end{equation}$$

where x_i are suitably coded variables for the genotypes of the two loci, and where u is the linear predictor. Linear regression similarly uses Eq. (2) with the link E(Y|X = x) = u in place of Eq. (1).

Using this structure, x_i can be coded in a variety of ways to model the desired effects. The parameterization of x_i is the key feature that distinguishes between many models for two-locus interactions. For instance, for a single locus let x ₁ = ( − 1, 0, 1) and x ₂ = ( − 0.5, 0.5, −0.5) code the genotypes (AA,Aa,aa) in order to reflect the additive trend and dominance deviation, respectively, at that locus. Define z ₁and z ₂ similarly for the second locus. Then the linear model containing these variables and their cross products is

(3)

$$\begin{eqnarray} u &=& {\rm \alpha } + {\rm \beta }_1 x_1 + {\rm \beta }_2 x_2 + {\rm \beta }_3 z_1 + {\rm \beta }_4 z_2 + {\rm \beta }_5 x_1 z_1 \nonumber\\ && +\, {\rm \beta }_6 x_2 z_1 + {\rm \beta }_7 x_1 z_2 + {\rm \beta }_8 x_2 z_2 ,\end{eqnarray}$$

which is the F ₂ model described by Anderson and Kempthorne (Reference Anderson and Kempthorne1954). If x ₂and z ₂ are instead coded (0,1,0) for the three genotypes at each locus, then Eq. (3) corresponds to the F _∞ model (Hayman & Mather, Reference Hayman and Mather1955).

In either case, the model contains 9 degrees of freedom, providing a saturated model for the nine possible haplotypes for two bi-allelic loci. The resulting regression coefficients correspond to the additive effects of each locus (β₁and β₃, respectively), the dominance effects of each locus (β₂and β₄, respectively), and their interactions (β₅, β₆, β₇, and β₈).

Other noteworthy parameterizations of the GLM include the Natural and Orthogonal Interactions (NOIA) model (Alvarez-Castro & Carlborg, Reference Alvarez-Castro and Carlborg2007), the General Two-Allele model (Zeng et al., Reference Zeng, Wang and Zou2005), and the Unweighted model (Cheverud & Routman, Reference Cheverud and Routman1995). Each of these models, as well as the F ₂ and F _∞ models, are implemented in the EpiPen package, allowing simple conversion between these formulations. The reasons for this variety of parameterizations are highlighted by comparing the GLM framework to the variance components model.

Variance Components

The third common approach is to formulate epistasis in terms of a decomposition of the phenotypic variance. The total genetic effect of two loci V_gis partitioned into components for the additive (V_A), dominance (V_D), and interaction (V_I) effects (Falconer & Mackay, Reference Falconer and Mackay1996).

(4)

$$\begin{equation} V_g = {\rm }V_A + V_D + V_I .\end{equation}$$

Note that this is equivalent to a decomposition of the (broad sense) heritability since

(5)

$$\begin{equation} H^2 = \frac{{V_g }}{{V_g + V_\epsilon }},\end{equation}$$

where V _εis the remaining phenotypic variance due to environmental factors. The additive and dominance variance components can each be decomposed into the univariate effects of each locus (i.e., V_A = V _{A_a} + V _{A_b} and V_D = V _{D_a} + V _{D_b}). Optionally, the phenotypic variance due to the interaction can also be further partitioned into an additive-by-additive effect, an additive-by-dominant effect, and a dominant-by-dominant effect:

(6)

$$\begin{equation} V_I = {\rm }V_{AA} + V_{AD} + V_{DD} ,\end{equation}$$

with V_AD further divisible to V_AD + V_DA to indicate whether the first or second locus is dominant in the interaction.

This complete decomposition loosely corresponds to the GLM parameters described for Eq. (3), with V_A corresponding to β₁ and β₃, V_D corresponding to β₂ and β₄, and the components of V_I corresponding to the remaining four βs. The key distinction, however, is that the variance components are defined to be orthogonal whereas the GLM parameters are not necessarily independent. Due to the orthogonality of components, each component has a clear intuitive interpretation. For instance, V _{A_a} = V _{A_b} = V _{D_a} = V _{D_b} = 0 indicates an absence of marginal univariate effects for the two loci, but depending on the selected parameterization, β₁ = β₂ = β₃ = β₄ = 0 in the GLM framework (Eq. 3) may not necessarily imply a lack of univariate effects if the remaining coefficients are non-zero. Furthermore, the variance components provide an intuitively meaningful scale for the magnitude of the effects, placing the observed effect on the same scale as broader heritability estimates.

To obtain independent GLM parameters that correspond to the classical variance components, it is necessary to consider the genotype frequencies for the two loci. The NOIA statistical model weights its variables in Eq. (3) to ensure that the parameters are independent as long as the loci are uncorrelated (Alvarez-Castro & Carlborg, Reference Alvarez-Castro and Carlborg2007). This model allows direct conversion between GLM models and the variance components model. Other GLM parameterizations may similarly maintain orthogonal components under stronger restrictions. For instance, the G2A model corresponds to the variance component decomposition if the loci are uncorrelated and Hardy–Weinberg equilibrium holds for both loci (Zeng et al., Reference Zeng, Wang and Zou2005).

It should be noted, however, that a given set of variance components does not uniquely define a two-locus GLM model. The variance components do not indicate the sign of any of the component effects, nor do they indicate the population mean. Only the magnitude of the effects is determined, with the resulting GLM parameters conditional on the allele frequencies. In addition, treating the variance components as a two-locus model only models the genetic effects for two loci, so any variance explained by covariates or other genetic effects must be modeled separately (i.e., by defining the components of V_g as only the contribution of two loci, with any other genetic effects treated as independent and included in V _ε).

Package Utilities

The EpiPen package provides functions for converting between the above models, generating data under a selected model, and performing power analysis. The methodology for accomplishing these tasks is summarized here. The EpiPen package is available from the Comprehensive R Archive Network (CRAN; http://cran.us.r-project.org/index.html).

Model conversions

Because the full model for each of the parameterizations described above is a saturated model for the nine possible two-locus genotype combinations, it is possible to equate any pair of models and convert to a different set of model parameters. In most cases, we rely on the GLM framework, rewriting Eq. 2 in matrix form

(7)

$$\begin{equation} {\bf u} = {\bf D}\beta ,\end{equation}$$

where u is the vector (u_AABB, u_AaBB, u_aaBB, u_AABb, (u_AaBb, u_aaBb, u_AAbb, u_Aabb, u_aabb) and D is an appropriate design matrix. As before, the elements of u correspond to the conditional expected value of the phenotype such that g ^{− 1}(u_AABB) = E(y|AABB) for the selected link function g( · ). Then, given a model with design matrix D₁, known parameters β₁, and link function g ₁( · ), in order to compute the unknown model parameters for a model with design matrix D₂ and link function g ₂( · ), we solve

(8)

$$\begin{equation} g_2^{ - 1} \left( {{\bf D}_2 \beta _2 } \right) = g_1^{ - 1} \left( {{\bf D}_1 \beta _1 } \right),\vspace*{-10pt}\end{equation}$$

(9)

$$\begin{equation} \beta _2 = {\bf D}_2^{ - 1} g_2 \left[ {g_1^{ - 1} \left( {{\bf D}_1 \beta _1 } \right)} \right].\end{equation}$$

For GLM models, the parameters and design matrices are given by the selected model parameterization (see Supplementary Materials, Section S1). Penetrance models can be specified in this form by setting β = (f ₁, . . ., f ₉) with D as a identity matrix and an identity link function g(x) = x. Log odds models similarly define D and β along with a log odds link function.

For conversions involving variance components, closed-form solutions exist relating each variance component to a parameter in the NOIA statistical model. For instance,

(10)

$$\begin{equation} V_{A_a } = {\rm \alpha }_a^2 \left[ {p_{Aa} \left( {1 - p_{Aa} } \right) + 4p_{aa} \left( {1 - p_{Aa} } \right) - 4p_{aa}^2 } \right],\end{equation}$$

where p_aa and p_Aa are frequencies of the indicated genotype at the first locus. The full set of solutions for the NOIA statistical model are provided in Supplementary Materials (Section S2). Further conversions beyond the NOIA model can then be performed as described above. Utilizing the NOIA statistical model in this way ensures that the defined variance components will be orthogonal regardless of the minor allele frequency or violations of Hardy–Weinberg equilibrium at each locus. The only required assumption is that the two loci are uncorrelated (Alvarez-Castro & Carlborg, Reference Alvarez-Castro and Carlborg2007).

These conversions between models are useful to aid the interpretation of results from studying epistatic effects. Although the saturated model and the 4-df test of all interaction parameters are equivalent for all models, the individual parameters in each parameterization correspond to different effects under different assumptions. For example, the parameters for univariate effects estimate average marginal effects in the NOIA statistical model, average marginal effects under Hardy–Weinberg equilibrium in the G2A model, marginal effects of allele substitution in the NOIA functional model, and marginal effects of genotype substitution in the genotype model (for further discussion of interpretation for these models, see Alvarez-Castro & Carlborg, Reference Alvarez-Castro and Carlborg2007; Zeng et al., Reference Zeng, Wang and Zou2005). In addition, the effects are estimated conditional on the remaining effects in the model; therefore, if the parameters are not orthogonal the estimates may vary depending on which other effects are included. In sum, it is important to select a model that estimates the genetic effects that are the focus of the investigation.

Data generation

Data generation from a selected two-locus model proceeds by constructing the appropriate N × 9 design matrix D as described above, with rows corresponding to simulated or observed genotypes at two loci (see Supplementary Materials, Section S1). The design matrix and model parameters are then used to compute the conditional expected value of the phenotype for each individual as E(y|D) = g ^{− 1}(Dβ). For dichotomous phenotypes, the phenotype is then generated from a Bernoulli distribution with the defined probability. For continuous phenotypes, normally distributed random error is added to the expected value for each individual to obtain the desired phenotype.

Power analysis

The EpiPen package provides a power analysis tool for tests of parameters in a two-locus GLM model. Unsurprisingly, the power for a given hypothesis test in the two-locus model will depend on the effect to be tested based on the selected model. For linear regression with continuous phenotypes, power may be computed based on Cohen's f ² effect size (Cohen, Reference Cohen1988). Specifically,

(11)

$$\begin{equation} f^2 = \frac{{R_F^2 - R_R^2 }}{{1 - R_F^2 }},\end{equation}$$

where R ²_F and R ²_R are the multiple R ² for the full model (i.e., most often the saturated model) and the restricted model, respectively. The restricted model is defined by constraining one or more parameters β to zero. This effect size is then used to estimate the non-centrality parameter for the distribution of the F test corresponding to this hypothesis test, providing an estimate of power at a given sample size as described by Cohen (Reference Cohen1988).

For dichotomous outcomes, power may instead be computed based on the asymptotic power of the likelihood ratio test comparing the full and restricted models. Briefly, the saturated model and the genotype frequencies of the two loci are used to compute the expected value of the information matrix for logistic or probit regression. The EpiPen package then estimates the power for the test by computing the non-centrality parameter for the likelihood ratio test of the full and reduced models based on the expected information matrix (Cox & Hinkley, Reference Cox and Hinkley1974), using an implementation of this approach in the R package asypow (Halvorsen et al., Reference Halvorsen, Brown, Lovato and Russel2013).

Results and Discussion

Sample Analysis

To demonstrate the use of the EpiPen package, we consider data on two SNPs associated with rheumatoid arthritis. In analysis of the transcriptional regulatory network of NF-κB using multifactor dimensionality reduction (MDR; Hahn et al., Reference Hahn, Ritchie and Moore2003), Julia et al. (Reference Julia, Moore, Miquel, Alegre, Barcelo, Ritchie and Marsal2007) identified an interaction between rs1290754 and rs1800797 associated with risk for rheumatoid arthritis. The reported penetrance and genotype frequencies for these two loci are summarized in Table 2. The genotype frequencies suggest the two loci are uncorrelated (r = -0.06, p = .19).

TABLE 2 Interaction of rs1290754 and rs1800797 Associated with Rheumatoid Arthritis

Note: ^aConditional penetrances of rheumatoid arthritis reported by Julia et al. (Reference Julia, Moore, Miquel, Alegre, Barcelo, Ritchie and Marsal2007).

^bReported genotype frequencies. Total N = 439.

Because this interaction was identified using MDR, parametric effect sizes were not computed. To aid in understanding the magnitude of the effect for these two loci, the EpiPen package may be used to convert the reported penetrances to a variance components model. The resulting variance components indicate that the two loci jointly explain 7.3% of the phenotypic variance, with the largest effects attributable to an additive by dominance interaction explaining 3.5% of the phenotypic variance and followed by a dominance effect of rs1290754 explaining 1.8% of the variance.

The EpiPen package may also be used to construct a design matrix to test the significance of specific GLM model parameters for this data. Using the NOIA statistical model, the Wald test for the additive (rs1290754) by dominant (rs1800797) interaction is significant (p = .001). There is also strong evidence of dominance deviation for rs1290754, but the effect is not significant after correcting for multiple testing of eight coefficients (uncorrected p = .015).

Lastly, treating the observed penetrances as the population values, we can investigate the power to detect the additive by dominance interaction and the dominance effect of rs1290754. As shown by Figure 1, the sample of N = 439 provides sufficient power to detect the interaction explaining 3.5% of the phenotypic variance at the α = 0.05 level, but a larger sample would be required to reliably detect the dominance effect.

FIGURE 1 Post-hoc power analysis for the interaction between the additive effect of rs1290754 and the dominance deviation of rs1800797 estimated from data reported by Julia et al. (Reference Julia, Moore, Miquel, Alegre, Barcelo, Ritchie and Marsal2007). Note: Vertical reference line indicates observed sample size N = 439.

These power estimates are likely to be optimistic, however, compared to many genome-wide studies. Notably, the effect sizes observed by Julia et al. (Reference Julia, Moore, Miquel, Alegre, Barcelo, Ritchie and Marsal2007) are quite large; in comparison, the epistatic interactions among 238 SNPs identified by Hemani et al. (Reference Hemani, Shakhbazov, Westra, Esko, Henders, McRae and Powell2014) jointly explained only 0.22% of the phenotypic variance in the discovery sample. In addition, considering power at α = 0.05 does not account for the multiple testing burden in the likely scenario of testing pairwise interactions for large sets of genome-wide SNPs. For example, achieving power of 0.8 to detect an additive by additive interaction explaining 0.25% of the variance with α = 10 ^{− 8} requires over 17,000 individuals.

Existing Software

Existing software offers some support for studying epistatic effects, but does not offer the same flexibility of the EpiPen package for working with a wide range of model parameterizations. For instance, the epistatic model implemented in PLINK (Purcell et al., Reference Purcell, Neale, Todd-Brown, Thomas, Ferreira, Bender and Sham2007) by default only considers additive by additive interactions (V_AA) with limited parameterizations of the single locus effects. CASSI (Ueki & Cordell, Reference Ueki and Cordell2012) similarly limits testing of epistasis to the 1-df test of the additive by additive interaction. Since epistatic variance components may be evenly split among V_AA, V_AD, V_DA, and V_DD (Hemani et al., Reference Hemani, Shakhbazov, Westra, Esko, Henders, McRae and Powell2014), this testing approach may not identify a large proportion of existing epistatic effects. More options for considering these additional interaction components are available in INTERSNP (Herold et al., Reference Herold, Steffens, Brockschmidt, Baur and Becker2009), but GLM model parameterization in INTERSNP is restricted to the F ₂ model (Anderson & Kempthorne, Reference Anderson and Kempthorne1954). In comparison, the EpiPen package offers support for seven GLM model parameterizations, as well as penetrance and variance component models.

Additional software packages have pursued a model-free heuristic approach to identifying epistasis (e.g., GWIS – Goudey et al., Reference Goudey, Rawlinson, Wang, Shi, Ferra, Campbell and Kowalczyk2013; MDR – Hahn et al., Reference Hahn, Ritchie and Moore2003). Although these methods are useful, the lack of effect sizes for these models can make the interpretation of results and comparisons across studies challenging. In contrast, the EpiPen package supports computation of variance components from any included model to simplify interpretation and comparison.

As a result, the EpiPen package offers a useful supplement to the existing software in order to interpret the results of genome-wide interaction studies. For instance, a genome-wide scan using saturated two-locus models may be performed using INTERSNP, and the EpiPen package may then be used to compute variance components for the resulting F ₂model parameters. Alternatively, the EpiPen package could be used to perform the complete analysis, but computing all pairwise interactions genome-wide is computationally expensive and likely to benefit from optimization in stand-alone software. Therefore, use of the EpiPen package is recommended for interpreting results, as well as preparing for genome-wide interaction analyses through simulations and power analysis.

Conclusion

The EpiPen package provides a convenient utility for converting, comparing, and interpreting epistatic models with more flexibility than existing software. The included utilities are useful for performing simulation studies of two-locus interactions and planning future analyses of epistasis in genome-wide SNP data. This tool should facilitate future efforts to uncover the contribution of SNP-SNP interactions to the genetics of complex phenotypes.

Supplementary Material

To view supplementary material for this article, please visit http://dx.doi.org/10.1017/thg.2014.25.

Acknowledgments

The authors would like to thank Patrick Miller, Daniel McArtor, and two anonymous reviewers for their helpful comments on earlier versions of this manuscript. This work was supported by the National Institute on Drug Abuse (G.L., grant number R37DA018673).

References

Alvarez-Castro, J. M., & Carlborg, O. (2007). A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics, 176, 1151–1167.Google Scholar

Anderson, V. L., & Kempthorne, O. (1954). A model for the study of quantitative inheritance. Genetics, 39, 883–898.Google Scholar

Bateson, W. (1909). Mendel's principles of heredity. Cambridge: Cambridge University Press.Google Scholar

Cheverud, J. M., & Routman, E. J. (1995). Epistasis and its contribution to genetic variance components. Genetics, 139, 1455–1461.Google Scholar

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.Google Scholar

Cordell, H. J. (2009). Detecting gene-gene interactions that underlie human diseases. Nature Reviews Genetics, 10, 392–404.Google Scholar

Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman and Hall.Google Scholar

Culverhouse, R., Suarez, B. K., Lin, J., & Reich, T. (2002). A perspective on epistasis: Limits of models displaying no main effect. American Journal of Human Genetics, 70, 461–471.Google Scholar

FalconerD.,S. D.,S., & Mackay, T. F. C. (1996). Introduction to quantitative genetics (4th ed.). Harlow: Pearson.Google Scholar

Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52, 399–433.Google Scholar

Goudey, B., Rawlinson, D., Wang, Q., Shi, F., Ferra, H., Campbell, R. M., . . . Kowalczyk, A. (2013). GWIS – model-free, fast and exhaustive search for epistatic interactions in case-control GWAS. BMC Genomics, 14 (Suppl. 3), S10.Google Scholar

Hahn, L. W., Ritchie, M. D., & Moore, J. H. (2003). Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics, 19, 376–382.Google Scholar

Hallgrimsdottir, I., & Yuster, D. S. (2008). A complete classification of epistatic two-locus models. BMC Genetics, 9, 17.Google Scholar

Halvorsen, K. B., Brown, B. W., Lovato, J., & Russel, R. K. (2013). asypow: Calculate power utilizing asymptotic likelihood ratio methods. R package version 2013.9-1.Google Scholar

Hayman, B. & Mather, K. (1955). The description of genic interactions in continuous variation. Biometrics, 11, 69–82.Google Scholar

Hemani, G., Shakhbazov, K., Westra, H.-J., Esko, T., Henders, A. K., McRae, A. F., . . . Powell, J. E. (2014). Detection and replication of epistasis influencing transcription in humans. Nature. Advance online publication. doi: 10.1038/ nature13005.Google Scholar

Herold, C., Steffens, M., Brockschmidt, F. F., Baur, M. P., & Becker, T. (2009). INTERSNP: Genome-wide interaction analysis guided by a priori information. Bioinformatics, 25, 3275–3281.Google Scholar

Hu, X., Liu, Q., Zhang, Z., Li, Z., Wang, S., He, L., & Shi, Y. (2010). SHEsisEpi, a GPU-enhanced genome-wide SNP-SNP interaction scanning algorithm, efficiently reveals the risk genetic epistasis in bipolar disorder. Cell Research, 20, 854–857.Google Scholar

Julia, A., Moore, J. H., Miquel, L., Alegre, C., Barcelo, P., Ritchie, M. D., & Marsal, S. (2007). Identification of a two-loci epistatic interaction associated with susceptibility to rheumatoid arthritis through reverse engineering and multifactor dimensionality reduction. Genomics, 90, 6–13.Google Scholar

Li, W., & Reich, J. (2000). A complete enumeration and classification of two-locus disease models. Human Heredity, 50, 334–349.Google Scholar

Lippert, C., Listgarten, J., Davidson, R. I., Baxter, J., Poon, H., Kadie, C. M., & Heckerman, D. (2013). An exhaustive epistatic SNP association analysis on expanded Wellcome Trust data. Scientific Reports, 3, 1099.Google Scholar

Marchini, J., Donnelly, P., & Cardon, L. R. (2005). Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics, 37, 413–417.Google Scholar

Neuman, R. J., & Rice, J. P. (1992). Two-locus models of disease. Genetic Epidemiology, 9, 347–365.Google Scholar

Niu, A., Zhang, Z., & Sha, Q. (2009). Application of seventeen two-locus models in genome-wide association studies by two-stage strategy. BMC Proceedings, 3 (Suppl. 7), S26.Google Scholar

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M., Bender, D., . . . Sham, P. C. (2007). PLINK: A toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81, 559–575.Google Scholar

Todorov, A. A., Borecki, I. B., & Rao, D. C. (1997). Linkage analysis of complex traits using affected sibpairs: Effects of single-locus approximations on estimates of the required sample size. Genetic Epidemiology, 14, 389–401.Google Scholar

Ueki, M., & CordellH.,J. H.,J. (2012). Improved statistics for genome-wide interaction analysis. PLoS Genetics, 8, e1002625.Google Scholar

Vieland, V. J., & Huang, J. (2003). Two-locus heterogeneity cannot be distinguished from two-locus epistasis on the basis of affected-sib-pair data. American Journal of Human Genetics, 73, 223–232.Google Scholar

Wan, X., Yang, C., Yang, Q., Xue, H., Fan, X., Tang, N. L. S., & Yu, W. (2010a). BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies. American Journal of Human Genetics, 87, 325–340.Google Scholar

Wan, X., Yang, C., Yang, Q., Xue, H., Tang, N. L. S., & Yu, W. (2010b). Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics, 26, 30–37.Google Scholar

Zeng, Z.-B., Wang, T., & Zou, W. (2005). Modeling quantitative trait loci and interpretation of models. Genetics, 169, 1711–1725.Google Scholar

Zuk, O., Hechter, E., Sunyaev, S. R., & Lander, E. S. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences, 109, 1193–1198.Google Scholar

TABLE 1 Saturated Model of Conditional Phenotypes at Two Bi-allelic Loci

TABLE 2 Interaction of rs1290754 and rs1800797 Associated with Rheumatoid Arthritis

FIGURE 1 Post-hoc power analysis for the interaction between the additive effect of rs1290754 and the dominance deviation of rs1800797 estimated from data reported by Julia et al. (2007). Note: Vertical reference line indicates observed sample size N = 439.

Walters Supplementary Material

Supplementary Material

PDF 144 KB

EpiPen: An R Package to Investigate Two-Locus Epistatic Models – RETRACTION

Raymond K. Walters , Charles Laurin and Gitta H. Lubke

Twin Research and Human Genetics , Volume 17 , Issue 4

Article contents

EpiPen: An R Package to Investigate Two-Locus Epistatic Models

Abstract

Keywords