1. Introduction
The X chromosome has been found to play an important role in many complex diseases (Ober et al., 2008; Wise et al., 2013). However, the development of methods for detecting associations with X-linked markers has lagged behind that for autosomal markers due to the complexity of the inheritance patterns of the X chromosome (Wise et al., 2013; Schurz et al., 2019). One primary characteristic of the X chromosome in mammals is that females have two copies of the X chromosome while males have only one, which increases the difficulty of X-linked association studies (Clayton, 2009; Ziegler, 2009; Loley et al., 2011). In addition, X-chromosome inactivation (XCI) in females, defined as the silencing of expression of one of the two copies of the X chromosome, may constitute a risk factor for diseases. Under XCI, the X-chromosome gene dosage in female XX cells equals that in male XY cells, a phenomenon known as dosage compensation (Chow et al., 2005; Payer & Lee, 2008; Pessia et al., 2012). As such, the genetic effect of homozygous females can be regarded as the same as that of hemizygous males under XCI. It has been reported that most of the genes on the X chromosome are subject to XCI, while only about 15% of X-linked genes escape from inactivation (XCI-E) (Carrel & Willard, 2005). Random X-chromosome inactivation (XCI-R) is the general process of XCI by which one of the two copies of the X chromosome in each cell is randomly inactivated.
However, the XCI pattern in some females may be skewed away from XCI-R in an age- and tissue-dependent manner, and the same allele can be inactivated in more than 75% of cells in some cases (Migeon, 1998; Minks et al., 2008; Starmer & Magnuson, 2009; Wang et al., 2014); we denote this skewed pattern by XCI-S for convenience.
At present, several association tests are available for single-nucleotide polymorphisms (SNPs) on the X chromosome. Zheng et al. (2007) proposed six methods for testing associations on the X chromosome by combining the genetic effects in females and males. Among them, the allele-based tests $Z_A$ and $Z_{mfA}$ require the assumption of Hardy–Weinberg equilibrium (HWE), while the genotype-based methods $Z_C$ and $Z_{mfG}$ are robust to departures from HWE. Furthermore, all four of these methods rely on the assumption that females and males have the same risk allele. Thus, two other methods ($\tilde{Z}_{mfA}$ and $\tilde{Z}_{mfG}$) were developed for the situation in which females and males have different risk alleles. On the other hand, the six methods of Zheng et al. (2007) only consider XCI-E and do not take account of XCI, which may lead to loss of power if XCI is present. Clayton (2008) was the first to suggest that XCI should be considered in X-chromosome association studies. Clayton's methods ($T_A$ and $T_{AD}$) are equivalent to the score tests of generalized linear models accounting for XCI-R and give the same codes for homozygous females and hemizygous males. For the case in which the frequency of the same allele differs between the sexes, the sex-stratified test statistics $T_A^s$, $T_{AD}^s$ and $S_A$ have been proposed by Loley et al. (2011) and König et al. (2014). In addition, the software toolset XWAS (Gao et al., 2015) includes four tests ($FM_{01}$, $FM_{02}$, $FM_F$ and $FM_S$) based on logistic regressions. However, those approaches only consider XCI-R and ignore XCI-S. In order to simultaneously incorporate three biological patterns on the X chromosome (XCI-E, XCI-R and XCI-S), Wang et al.
(2014) developed a maximum likelihood ratio method. However, this method is time-consuming because it relies on a permutation procedure to obtain an empirical P-value. Meanwhile, Chen et al. (2017) proposed a robust method (Xcat) based on a generalized genetic model, for which an approximate P-value is easily obtained. Recently, Wang et al. (2019) proposed a robust test, $Z_{max}$, that takes account of different dosage compensation patterns and requires neither the assumption of HWE nor the specification of the underlying genetic model.
Imprinting is an epigenetic phenomenon that results in the differential expression of paternal and maternal alleles (Falls et al., 1999). Researchers have found evidence for the existence of imprinting effects on some diseases, such as Angelman, Beckwith–Wiedemann and Prader–Willi syndromes (Falls et al., 1999; Dong et al., 2005; Ziegler & König, 2006; Wallace et al., 2010). On the other hand, it is likely that imprinted genes on the X chromosome are crucial to some diseases, such as Turner's syndrome (Donnelly et al., 2002; Loesch et al., 2005). For some sex-specific diseases, such as autism, alleles on the paternal chromosome seem to be preferentially expressed, which is likely to explain why females are always less susceptible than males (Skuse, 2000). Imprinting is generally detected through testing for parent-of-origin effects (Hager et al., 2008). Thus, we use the term ‘parent-of-origin effects’ instead of ‘imprinting effects’ in the following sections. However, there is no method available for taking parent-of-origin effects into account when conducting association tests on the X chromosome.
Therefore, in this paper, we propose a robust method, $Z_{XCII}$, which extends Xcat to a generalized linear model and simultaneously incorporates imprinting and three biological patterns (XCI-E, XCI-R and XCI-S) into X-chromosome association tests without the need to specify the genetic model on the X chromosome. We investigate the performance of the proposed method and compare it with several existing tests through extensive simulation studies. Simulation results show that the proposed method controls the size well under all of the scenarios considered when there is no association. Moreover, with regard to power, $Z_{XCII}$ is robust in all of the situations considered and generally outperforms most of the existing methods in the presence of imprinting effects, especially under complete imprinting effects.
2. Materials and methods
For a candidate SNP on the X chromosome with the mutant allele A and the normal allele a, there are four ordered genotypes for female offspring: a/a, a/A, A/a and A/A, where the allele to the left (right) of the slash is paternal (maternal). To distinguish the parent of origin of the mutant allele A in heterozygous female offspring, information on their parental genotypes is required. With regard to male offspring, there are only two genotypes, a and A, both of which are maternally inherited. Thus, we do not need to collect their parental genotypes. Assume that $G_{f1}$ and $G_{f2}$ are the numbers of allele A on the paternal and maternal X chromosomes in female offspring, respectively, and $G_m$ is the number of allele A on the X chromosome in male offspring. The values of $G_{f1}$, $G_{f2}$ and $G_m$ for different genotypes in the offspring generation are shown in Table 1. The disease status of an individual (female or male) in the offspring generation is denoted by $Y$, with 1 (0) representing being affected (unaffected). In this paper, an affected daughter together with her parents is called a case–parent trio and an unaffected daughter together with her parents is considered as a control–parent trio (Deng & Chen, 2001; Li et al., 2016). Table 2 gives the genotype counts for the female offspring, where $n_f$ is the total number of daughter–parent trios consisting of $r_f$ case–parent trios and $s_f$ control–parent trios. The genotype counts for the male offspring are also listed in Table 2, where $n_m$ is the total number of males including $r_m$ cases and $s_m$ controls. As such, there are $n_r = r_f + r_m$ cases and $n_s = s_f + s_m$ controls in total. Therefore, the sample size is $N = n_r + n_s = n_f + n_m$. Let $\phi_{f0}$, $\phi_{f01}$, $\phi_{f10}$ and $\phi_{f2}$ be the penetrances of genotypes a/a, a/A, A/a and A/A in female offspring, respectively, and let $\phi_{m0}$ and $\phi_{m1}$ be the penetrances of genotypes a and A in male offspring, respectively.
To test the association between the disease status $Y$ and the SNP under study, we make the following two assumptions, just like Xcat (Chen et al., 2017): (1) in the presence of association between the disease and the SNP, the generalized genetic model is assumed to hold in female offspring with ordered penetrances, either increasing ($\phi_{f0} \le \phi_{f01}, \phi_{f10} \le \phi_{f2}$) or decreasing ($\phi_{f0} \ge \phi_{f01}, \phi_{f10} \ge \phi_{f2}$); and (2) the mutant allele in female offspring is the same as that in male offspring.
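A logistic regression model is proposed to describe the association between the disease and the SNP in female offspring:
$$\mathrm{logit}(\Pr(Y = 1 \mid G_{f1}, G_{f2}, \mathbf{X}_f)) = \beta_{f0} + \beta_{f1} G_{f1} + \beta_{f2} G_{f2} + \beta_{f3} G_{f1} G_{f2} + \mathbf{b}_f^T \mathbf{X}_f, \tag{1}$$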
where $\beta_{f0}$ is the intercept, $\beta_{f1}$, $\beta_{f2}$ and $\beta_{f3}$ are the respective regression coefficients for $G_{f1}$, $G_{f2}$ and the interaction term $G_{f1}G_{f2}$, $\mathbf{X}_f$ is a vector of covariates and $\mathbf{b}_f$ is a vector of the regression coefficients for $\mathbf{X}_f$. The estimates of these coefficients can be obtained with the iteratively reweighted least squares method (Wood, 2006) using the glm function in the R language (http://www.r-project.org). The null hypothesis of no association between the disease and the SNP in female offspring is $H_{f0}\colon \beta_{f1} = \beta_{f2} = \beta_{f3} = 0$. If at least one of these equalities fails, an association exists, which constitutes the alternative hypothesis ($H_{f1}$). The values of $\mathrm{logit}(\Pr(Y = 1 \mid G_{f1}, G_{f2}, \mathbf{X}_f))$ for the different genotypes in female offspring are presented in the fourth column of Table 1. Thus, under $H_{f1}$, the parent-of-origin effects at the SNP locus can be expressed by:
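when $\mathbf{X}_f$ is fixed at the same level. For example, $\beta_{f1} = \beta_{f2}$ represents no parent-of-origin effects, while $\beta_{f2} = \beta_{f3} = 0$ denotes a complete maternal parent-of-origin effect and $\beta_{f1} = \beta_{f3} = 0$ indicates a complete paternal parent-of-origin effect. Moreover, we can use
$$\gamma = \frac{\beta_{f1} + \beta_{f2}}{\beta_{f1} + \beta_{f2} + \beta_{f3}} \tag{3}$$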
to measure the degree of inactivation under XCI in a similar way to Wang et al. (Reference Wang, Zhang and Wang2019). On the other hand, the difference between β f3 and 0 can be interpreted as the deviation of the genetic model from the additive one under XCI-E. To be specific, Table 3 gives the explanations of the regression coefficients for several situations of XCI and XCI-E under no parent-of-origin effects (β f1 = β f2 = β). β f1 = β f2 = −β f3 means XCI-S with γ = 2 representing 100% of the cells having the mutant allele active or a dominant model under XCI-E. β f1 = β f2 = β and $\beta _{f3} ={-}{2 \over 3}\beta$ stand for XCI-S with γ = 1.5, where 75% of the cells have the mutant allele active. β f1 = β f2 ≠ 0 and β f3 = 0 correspond to XCI-R with γ = 1 or an additive model under XCI-E. β f1 = β f2 = β and β f3 = 2β imply XCI-S with γ = 0.5, where 25% of the cells have the mutant allele active. β f1 = β f2 = 0 and β f3 ≠ 0 indicate XCI-S with γ = 0 representing that 100% of the cells have the normal allele active or a recessive model under XCI-E. However, in the presence of parent-of-origin effects, the explanation of the regression coefficients is more complicated, since parent-of-origin effects may contribute to the XCI. For example, β f1 = 0.5 and β f2 = β f3 = 0 are indicative of the complete maternal parent-of-origin effect, whereas γ is obtained to be 1 (suggesting XCI-R) in this case. Therefore, XCI-R may be also caused by the complete maternal parent-of-origin effect.
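The coefficient patterns above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the genotype coding described for Table 1 (paternal allele count first, maternal second) and assumes that γ takes the ratio form $(\beta_{f1}+\beta_{f2})/(\beta_{f1}+\beta_{f2}+\beta_{f3})$, which reproduces every case listed in the paragraph. The names `CODING`, `linear_predictor` and `gamma` are hypothetical.

```python
# Ordered genotype "paternal/maternal" -> (G_f1, G_f2); A is the mutant allele.
CODING = {"a/a": (0, 0), "a/A": (0, 1), "A/a": (1, 0), "A/A": (1, 1)}


def linear_predictor(genotype, b0, b1, b2, b3):
    """Logit of the disease probability for one ordered genotype (no covariates)."""
    g1, g2 = CODING[genotype]
    return b0 + b1 * g1 + b2 * g2 + b3 * g1 * g2


def gamma(b1, b2, b3):
    """Assumed form: summed heterozygote effects divided by the A/A effect."""
    return (b1 + b2) / (b1 + b2 + b3)


beta = 0.4  # arbitrary illustrative effect size
cases = {
    "XCI-S, mutant allele active in 100% of cells": (beta, beta, -beta),
    "XCI-S, mutant allele active in 75% of cells": (beta, beta, -2 * beta / 3),
    "XCI-R": (beta, beta, 0.0),
    "XCI-S, mutant allele active in 25% of cells": (beta, beta, 2 * beta),
    "XCI-S, normal allele active in 100% of cells": (0.0, 0.0, beta),
    "complete maternal parent-of-origin effect": (0.5, 0.0, 0.0),
}
for label, coefs in cases.items():
    print(f"{label}: gamma = {gamma(*coefs):.3f}")
```

The last entry illustrates the point made in the text: a complete maternal parent-of-origin effect yields γ = 1 and is therefore indistinguishable from XCI-R on the basis of γ alone.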
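Recall that when the disease is associated with the SNP, the generalized genetic model with ordered penetrances is assumed to hold in female offspring. As such, with A as the mutant allele, we have
$$\mathrm{logit}(\Pr(Y = 1 \mid G_{f1} = 0, G_{f2} = 0, \mathbf{X}_f)) \le \mathrm{logit}(\Pr(Y = 1 \mid G_{f1} = 1, G_{f2} = 0, \mathbf{X}_f)) \le \mathrm{logit}(\Pr(Y = 1 \mid G_{f1} = 1, G_{f2} = 1, \mathbf{X}_f))$$
and
$$\mathrm{logit}(\Pr(Y = 1 \mid G_{f1} = 0, G_{f2} = 0, \mathbf{X}_f)) \le \mathrm{logit}(\Pr(Y = 1 \mid G_{f1} = 0, G_{f2} = 1, \mathbf{X}_f)) \le \mathrm{logit}(\Pr(Y = 1 \mid G_{f1} = 1, G_{f2} = 1, \mathbf{X}_f)),$$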
which are equivalent to 0 ⩽ β f1 ⩽ β f1 + β f2 + β f3 and 0 ⩽ β f2 ⩽ β f1 + β f2 + β f3, respectively, with at least one inequality being strict. Adding these two inequalities together, we get 0 ⩽ β f1 + β f2 ⩽ 2(β f1 + β f2 + β f3) and thus β f1 + β f2 + 2β f3 ⩾ 0. Therefore, the alternative hypothesis becomes H f1∶β f1 ⩾ 0, β f2 ⩾ 0, β f1 + β f2 + 2β f3 ⩾ 0, with at least one inequality being strict, which can be expressed in matrix form as follows:
$$H_{f1}\colon \mathbf{C}\boldsymbol{\beta}_f \ge \mathbf{0}, \tag{4}$$
where $\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix}$, $\boldsymbol{\beta}_f = (\beta_{f1}, \beta_{f2}, \beta_{f3})^T$ and $\mathbf{0}$ is a vector with all elements equal to 0. To test for the association, we first consider the following test statistics:
where $\hat{\boldsymbol{\beta}}_f = (\hat{\beta}_{f1}, \hat{\beta}_{f2}, \hat{\beta}_{f3})^T$, with $\hat{\beta}_{f1}$, $\hat{\beta}_{f2}$ and $\hat{\beta}_{f3}$ being the maximum likelihood estimates of $\beta_{f1}$, $\beta_{f2}$ and $\beta_{f3}$, respectively, and $\hat{\mathbf{I}}$ is the empirical Fisher information matrix (Wood, 2006).
Under the null hypothesis of no association, $Z_1$, $Z_2$ and $Z_3$ are independent of one another and asymptotically follow standard normal distributions. Note that $\mathbf{C}\boldsymbol{\beta}_f \ge \mathbf{0}$ leads to $\mathbf{Z} = (Z_1, Z_2, Z_3)^T \ge \mathbf{0}$ under $H_{f1}$, and we thus calculate only the right-sided P-values for $Z_1$, $Z_2$ and $Z_3$. Then, we combine them using Fisher's method (Fisher, 1954). Thus, the test statistic for female offspring can be constructed as:
$$Q_f^R = -2\ln\left[(1 - \Phi(Z_1))(1 - \Phi(Z_2))(1 - \Phi(Z_3))\right], \tag{6}$$
where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Under the null hypothesis, $Q_f^R$ asymptotically follows a $\chi^2$ distribution with 6 degrees of freedom (df). As such, the P-value of $Q_f^R$ is $P_f^R = 1 - \chi_6^2(Q_f^R)$, where $\chi_6^2(\cdot)$ is the cumulative distribution function of the $\chi^2$ distribution with 6 df.
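As an illustration of this combination step, the sketch below applies Fisher's method to the right-sided P-values of three Z-scores. It is a minimal example under stated assumptions rather than the authors' implementation: the Z-scores are taken as given, the function names are hypothetical, and the χ² tail with 6 df is evaluated with the closed form $e^{-q/2}(1 + q/2 + q^2/8)$ that holds for even degrees of freedom.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def chi2_sf_6df(q):
    """Upper-tail probability of a chi-square variable with 6 df (closed form)."""
    h = q / 2.0
    return math.exp(-h) * (1.0 + h + h * h / 2.0)


def fisher_right(z1, z2, z3):
    """Fisher combination of the right-sided P-values of three Z-scores.

    Returns (Q, P): the combined statistic and its chi-square(6) P-value.
    """
    # 1 - Phi(z) is computed as Phi(-z) to keep precision for large z.
    p_right = [_PHI(-z) for z in (z1, z2, z3)]
    q = -2.0 * sum(math.log(p) for p in p_right)
    return q, chi2_sf_6df(q)


q, p = fisher_right(1.2, 0.4, 2.1)  # hypothetical Z-scores
print(f"Q = {q:.3f}, P = {p:.4f}")
```

The left-sided version used when the mutant allele is a is obtained by calling `fisher_right(-z1, -z2, -z3)`, since $\Phi(z) = 1 - \Phi(-z)$.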
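For male offspring, we model the relationship between the disease and the SNP using a logistic regression as:
$$\mathrm{logit}(\Pr(Y = 1 \mid G_m, \mathbf{X}_m)) = \beta_{m0} + \beta_m G_m + \mathbf{b}_m^T \mathbf{X}_m, \tag{7}$$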
where β m0 is the intercept, β m is the regression coefficient for Gm, Xm is a vector of covariates and bm is a vector of the regression coefficients for Xm. When there is no association between the disease and the SNP, the null hypothesis for male offspring is H m0:β m = 0. Then, the test statistic for male offspring is
$$Z_m = \frac{\hat{\beta}_m}{S_{\hat{\beta}_m}}, \tag{8}$$
where $\hat{\beta}_m$ is the maximum likelihood estimate of $\beta_m$ and $S_{\hat{\beta}_m}$ is the standard error of $\hat{\beta}_m$. $Z_m$ asymptotically follows a standard normal distribution under $H_{m0}$. When there are no covariates, Eq. (8) can be simplified to
as in Zheng et al. (2007) and Chen et al. (2017).
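For intuition about the covariate-free case, a logistic regression on a single binary predictor has a closed-form fit: the maximum likelihood estimate is the log odds ratio of the 2 × 2 table and its standard error is the square root of the sum of the reciprocal cell counts. The sketch below computes the resulting Wald statistic. The argument names are hypothetical (they are not the count notation of Table 2), and the simplified expression used by Zheng et al. (2007) may take a different algebraic form.

```python
import math


def male_wald_z(case_A, case_a, control_A, control_a):
    """Wald Z for the effect of allele A in males, without covariates.

    Arguments are the numbers of male cases and controls carrying A or a;
    all four counts must be positive.
    """
    log_or = math.log((case_A * control_a) / (case_a * control_A))
    se = math.sqrt(1 / case_A + 1 / case_a + 1 / control_A + 1 / control_a)
    return log_or / se


# Hypothetical counts: 30 of 100 cases and 20 of 100 controls carry A.
print(round(male_wald_z(30, 70, 20, 80), 3))
```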
To combine the test statistics of female and male offspring, we need to turn the P-value for female offspring ($P_f^R$) into a Z-score, which is $Z_f^R = -\Phi^{-1}(P_f^R)$. Then, under the assumption that the mutant allele in female offspring is the same as that in male offspring, the combined test statistic $Z_R$ can be constructed as follows:
$$Z_R = \sqrt{\frac{n_f}{N}}\,Z_f^R + \sqrt{\frac{n_m}{N}}\,Z_m, \tag{10}$$
where $Z_f^R$ and $Z_m$ are weighted by their respective proportions of the sample size. Under the overall null hypothesis that there is no association between the disease and the SNP in both female and male offspring ($H_0\colon \mathbf{C}\boldsymbol{\beta}_f = \mathbf{0}$ and $\beta_m = 0$), $Z_R$ is asymptotically distributed as N(0, 1). Since the mutant allele is assumed to be A, with the overall one-sided alternative hypothesis $H_1\colon \mathbf{C}\boldsymbol{\beta}_f \ge \mathbf{0}$ (with at least one inequality being strict) or $\beta_m > 0$, we only need to calculate the right-sided P-value of $Z_R$ when the mutant allele is known in advance.
So far, we have only considered the situation in which the mutant allele is A. When the mutant allele is a, the overall alternative hypothesis becomes $H_1\colon \mathbf{C}\boldsymbol{\beta}_f \le \mathbf{0}$ (with at least one inequality being strict) or $\beta_m < 0$. Therefore, the corresponding test statistic for female offspring is $Q_f^L = -2\ln(\Phi(Z_1)\Phi(Z_2)\Phi(Z_3))$, which combines the left-sided P-values of $Z_1$, $Z_2$ and $Z_3$, and the P-value of $Q_f^L$ is $P_f^L = 1 - \chi_6^2(Q_f^L)$. Again, we combine the transformed Z-score ($Z_f^L = -\Phi^{-1}(P_f^L)$) for female offspring and $Z_m$ for male offspring to obtain the overall test statistic as:
$$Z_L = \sqrt{\frac{n_f}{N}}\,Z_f^L - \sqrt{\frac{n_m}{N}}\,Z_m. \tag{11}$$
$Z_L$ is asymptotically distributed as N(0, 1) under the overall null hypothesis. Under this $H_1$, as with $Z_R$, only the right-sided P-value of $Z_L$ is needed when the mutant allele is known in advance to be a.
However, we generally have no information on the mutant allele before conducting the association studies. In this case, we propose the test statistic
$$Z_{XCII} = \max(Z_L, Z_R). \tag{12}$$
Although $Z_L$ and $Z_R$ are obviously dependent on each other, note that the components of $\mathbf{Z}_t = (Z_1, Z_2, Z_3, Z_m)^T$ are independent of each other, and that $-Z_L$ and $Z_R$ are non-decreasing functions of $\mathbf{Z}_t$. Thus, the P-value of $Z_{XCII}$ can be approximately bounded by
$$2\xi - \xi^2 \le \Pr(Z_{XCII} \ge z) \le 2\xi, \tag{13}$$
where $z$ is the observed value of $Z_{XCII}$ and $\xi = 1 - \Phi(z)$, according to Owen (2009) and Esary et al. (1967). Therefore, we can simply approximate the P-value of $Z_{XCII}$ by $2\xi$.
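The final combination step can be sketched as follows. This is an illustrative example under explicit assumptions, not the authors' implementation: the weights are taken to be $\sqrt{n_f/N}$ and $\sqrt{n_m/N}$ (so that each combined statistic has unit variance under the null), $Z_m$ enters $Z_L$ with a negative sign (so that $-Z_L$ is non-decreasing in $Z_m$), and $Z_{XCII}$ is taken as the larger of $Z_L$ and $Z_R$. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

_N01 = NormalDist()  # standard normal distribution


def z_xcii(p_f_right, p_f_left, z_m, n_f, n_m):
    """Combine female P-values and the male Z-score into Z_XCII.

    p_f_right, p_f_left: female Fisher P-values from the right- and
    left-sided combinations (each strictly between 0 and 1).
    """
    n = n_f + n_m
    w_f, w_m = math.sqrt(n_f / n), math.sqrt(n_m / n)  # assumed weights
    z_f_right = -_N01.inv_cdf(p_f_right)
    z_f_left = -_N01.inv_cdf(p_f_left)
    z_r = w_f * z_f_right + w_m * z_m  # mutant allele A
    z_l = w_f * z_f_left - w_m * z_m   # mutant allele a
    z = max(z_l, z_r)
    xi = _N01.cdf(-z)                  # 1 - Phi(z)
    return {
        "Z_R": z_r,
        "Z_L": z_l,
        "Z_XCII": z,
        "p_lower": 2 * xi - xi * xi,
        "p_approx": min(1.0, 2 * xi),  # capped at 1 when z is negative
    }


result = z_xcii(p_f_right=0.001, p_f_left=0.9, z_m=2.0, n_f=600, n_m=400)
print({name: round(value, 5) for name, value in result.items()})
```

At small significance levels, ξ² is negligible relative to 2ξ, which is why 2ξ serves as a practical approximation to the P-value.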
3. Simulation study
3.1. Settings
We conduct a simulation study to investigate the size and power of the proposed $Z_{XCII}$ method and compare it with the existing ones. Notice that in Zheng et al. (2007), $\tilde{Z}_{mfA}$ and $\tilde{Z}_{mfG}$ are less powerful than the other four test statistics ($Z_A$, $Z_C$, $Z_{mfA}$ and $Z_{mfG}$) under the assumption that the mutant allele in females is the same as that in males. Thus, in this simulation study, $\tilde{Z}_{mfA}$ and $\tilde{Z}_{mfG}$ are excluded. $T_A^s$ and $FM_S$ are also excluded because they are asymptotically equivalent to $Z_C$ (Loley et al., 2011) and $Z_{mfG}$ (Zheng et al., 2007; Gao et al., 2015; Wang et al., 2019), respectively. On the other hand, the permutation-based method in Wang et al. (2014) is excluded due to the intensive computations involved. Finally, we choose 14 methods ($Z_{XCII}$, $Z_{max}$, Xcat, $S_A$, $FM_{02}$, $Z_C$, $Z_{mfG}$, $T_A$, $T_{AD}$, $T_{AD}^s$, $FM_{01}$, $FM_F$, $Z_{mfA}$ and $Z_A$) for the comparison. The references for the selected methods are listed in Table S1.
Note that most of the methods we compare, such as Xcat, $S_A$, $Z_C$, $Z_{mfG}$, $T_A$, $T_{AD}$, $Z_{mfA}$ and $Z_A$, do not consider covariates. Thus, for simplicity, we do not include any covariate in this simulation study and directly generate the genotype counts in Table 2. Let $p_F$ and $p_M$ denote the frequencies of the mutant allele A for females and males in the parental generation, respectively. Under random mating, the genotype frequencies of a/a, a/A, A/a and A/A for female offspring are $g_{f0} = (1 - p_M)(1 - p_F)$, $g_{f01} = (1 - p_M)p_F$, $g_{f10} = p_M(1 - p_F)$ and $g_{f2} = p_M p_F$, respectively, and the genotype frequencies of a and A for male offspring are $g_{m0} = 1 - p_F$ and $g_{m1} = p_F$, respectively. Note that if random mating holds in the parental generation, HWE holds in the offspring generation only under the assumption that the frequency of the same allele in females and that in males are equal (Puig et al., 2017). On the other hand, we consider the situation where $p_F = p_M = p$ but HWE does not hold in the female offspring. The corresponding frequencies of the four genotypes are $g_{f0} = (1 - p)^2 + \rho p(1 - p)$, $g_{f01} = (1 - \rho)p(1 - p)$, $g_{f10} = (1 - \rho)p(1 - p)$ and $g_{f2} = p^2 + \rho p(1 - p)$, respectively, when the inbreeding coefficient $\rho \ne 0$. Furthermore, the genotype frequencies for male offspring are $g_{m0} = 1 - p$ and $g_{m1} = p$, respectively.
Note that the relationships among the penetrances and the regression coefficients are $\frac{\phi_{f01}(1 - \phi_{f0})}{(1 - \phi_{f01})\phi_{f0}} = e^{\beta_{f2}}$, $\frac{\phi_{f10}(1 - \phi_{f0})}{(1 - \phi_{f10})\phi_{f0}} = e^{\beta_{f1}}$ and $\frac{\phi_{f2}(1 - \phi_{f0})}{(1 - \phi_{f2})\phi_{f0}} = e^{\beta_{f1} + \beta_{f2} + \beta_{f3}}$ for a/A, A/a and A/A, respectively, for female offspring and $\frac{\phi_{m1}(1 - \phi_{m0})}{(1 - \phi_{m1})\phi_{m0}} = e^{\beta_m}$ for male offspring. Thus, genotype counts for female offspring in Table 2 can be generated according to a multinomial distribution with four categories, with probabilities $\left(\frac{g_{f0}\phi_{f0}}{\phi_f}, \frac{g_{f01}\phi_{f01}}{\phi_f}, \frac{g_{f10}\phi_{f10}}{\phi_f}, \frac{g_{f2}\phi_{f2}}{\phi_f}\right)$ for cases and $\left(\frac{g_{f0}(1 - \phi_{f0})}{1 - \phi_f}, \frac{g_{f01}(1 - \phi_{f01})}{1 - \phi_f}, \frac{g_{f10}(1 - \phi_{f10})}{1 - \phi_f}, \frac{g_{f2}(1 - \phi_{f2})}{1 - \phi_f}\right)$ for controls, where $\phi_f = g_{f0}\phi_{f0} + g_{f01}\phi_{f01} + g_{f10}\phi_{f10} + g_{f2}\phi_{f2}$ is the disease prevalence of females. Similarly, we can obtain genotype counts for male offspring through a binomial distribution with probabilities $\left(\frac{g_{m0}\phi_{m0}}{\phi_m}, \frac{g_{m1}\phi_{m1}}{\phi_m}\right)$ for cases and $\left(\frac{g_{m0}(1 - \phi_{m0})}{1 - \phi_m}, \frac{g_{m1}(1 - \phi_{m1})}{1 - \phi_m}\right)$ for controls, where $\phi_m = g_{m0}\phi_{m0} + g_{m1}\phi_{m1}$ is the disease prevalence of males.
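The data-generating scheme described above can be sketched in a few lines. This is a minimal illustration, not the simulation code used in the study: the function names, the random seed and the numbers of cases and controls drawn are assumptions, while the genotype frequencies and case/control probabilities follow the formulas in the text.

```python
import random


def female_genotype_freqs(p_f, p_m):
    """Frequencies of a/a, a/A, A/a, A/A in daughters under random mating.

    p_f, p_m: mutant-allele frequencies in mothers and fathers.
    """
    return [(1 - p_m) * (1 - p_f), (1 - p_m) * p_f, p_m * (1 - p_f), p_m * p_f]


def case_control_probs(g, phi):
    """Genotype distributions among cases and among controls (Bayes' rule)."""
    prevalence = sum(gi * ph for gi, ph in zip(g, phi))
    cases = [gi * ph / prevalence for gi, ph in zip(g, phi)]
    controls = [gi * (1 - ph) / (1 - prevalence) for gi, ph in zip(g, phi)]
    return cases, controls


def draw_counts(probs, n, rng):
    """One multinomial draw of n individuals over the genotype categories."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return [draws.count(i) for i in range(len(probs))]


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
g = female_genotype_freqs(p_f=0.15, p_m=0.25)
# phi_f0, phi_f01, phi_f10, phi_f2: complete maternal parent-of-origin effect
phi = [0.120, 0.120, 0.240, 0.240]
p_case, p_ctrl = case_control_probs(g, phi)
case_counts = draw_counts(p_case, 300, rng)     # r_f = 300 (assumed)
control_counts = draw_counts(p_ctrl, 250, rng)  # s_f = 250 (assumed)
print(case_counts, control_counts)
```

Male counts can be drawn in the same way by passing the two-category frequencies $(g_{m0}, g_{m1})$ and penetrances $(\phi_{m0}, \phi_{m1})$ to `case_control_probs`.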
We consider various simulation settings. $\lpar {p_{_F }\comma \;p_{_M }} \rpar$ is taken to be (0.15, 0.25), (0.20, 0.20), (0.25, 0.15), (0.25, 0.35), (0.30, 0.30) and (0.35, 0.25). Then, under random mating, the corresponding allele frequencies for females and males in the offspring generation are (0.20, 0.15), (0.20, 0.20), (0.20, 0.25), (0.30, 0.25), (0.30, 0.30) and (0.30, 0.35), respectively. When $p_{_F } = p_{_M } = p = 0.2$ and 0.3, we set ρ = −0.05 and ρ = 0.05 for simulating the departure from HWE. ϕ f0 and ϕ m0 are set to be 0.120. For simulating the size, let all of the other penetrances be 0.120. When XCI exists, we suppose ϕ f2 = ϕ m1 = 0.240. The values of γ under XCI with different values of ϕ f01 and ϕ f10 are shown in Table S2. To investigate the power, we first consider the situations where there are both XCI and parent-of-origin effects: (1) (ϕ f01, ϕ f10) = (0.120, 0.240) (XCI with γ = 1 and complete maternal parent-of-origin effect); (2) (ϕ f01, ϕ f10) = (0.192, 0.216) (XCI with γ = 1.499 and incomplete maternal parent-of-origin effect); (3) (ϕ f01, ϕ f10) = (0.144, 0.204) (XCI with γ = 1.001 and incomplete maternal parent-of-origin effect); (4) (ϕ f01, ϕ f10) = (0.132, 0.156) (XCI with γ = 0.492 and incomplete maternal parent-of-origin effect); (5) (ϕ f01, ϕ f10) = (0.240, 0.120) (XCI with γ = 1 and complete paternal parent-of-origin effect); (6) (ϕ f01, ϕ f10) = (0.216, 0.192) (XCI with γ = 1.499 and incomplete paternal parent-of-origin effect); (7) (ϕ f01, ϕ f10) = (0.204, 0.144) (XCI with γ = 1.001 and incomplete paternal parent-of-origin effect); and (8) (ϕ f01, ϕ f10) = (0.156, 0.132) (XCI with γ = 0.492 and incomplete paternal parent-of-origin effect). 
Next, we consider the scenarios where XCI exists but there are no parent-of-origin effects, with $\phi_{f01} = \phi_{f10} = \phi$: (1) $\phi = 0.240$ (XCI with γ = 2); (2) $\phi = 0.204$ (XCI with γ = 1.503); (3) $\phi = 0.168$ (XCI with γ = 0.935); (4) $\phi = 0.144$ (XCI with γ = 0.500); and (5) $\phi = 0.120$ (XCI with γ = 0). Furthermore, we consider the situation where there is neither XCI nor parent-of-origin effects, which is $(\phi_{f01}, \phi_{f10}, \phi_{f2}, \phi_{m1}) = (0.180, 0.180, 0.240, 0.180)$. The sample size $N$ for each replication is selected to be 1000, including $n_r = 500$ cases and $n_s = 500$ controls. To investigate the effect of sex ratio, we fix the sex ratio in the control group as $s_f : s_m = 1:1$, while it varies in the case group as $r_f : r_m = 3:2$, 1:1 and 2:3. We use the significance level $\alpha = 10^{-5}$, and the number of replications is fixed at $10^6$ and $10^4$ for estimating the size and power, respectively. The definitions of these parameters and the detailed biological meanings of the situations we consider are provided in Tables S3 and S4, respectively.
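The γ values quoted for these settings can be recovered from the penetrances through the odds-ratio relationships given earlier. The sketch below is a hedged check rather than the authors' calculation: it assumes that γ is the ratio of the summed heterozygote log odds ratios to the A/A log odds ratio (all relative to a/a), and the function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def gamma_from_penetrances(phi_f0, phi_f01, phi_f10, phi_f2):
    """Assumed definition: (beta_f1 + beta_f2) / (beta_f1 + beta_f2 + beta_f3)."""
    b2 = logit(phi_f01) - logit(phi_f0)    # maternal A (a/A versus a/a)
    b1 = logit(phi_f10) - logit(phi_f0)    # paternal A (A/a versus a/a)
    total = logit(phi_f2) - logit(phi_f0)  # A/A versus a/a
    return (b1 + b2) / total


settings = {
    "no POE, phi = 0.204": (0.120, 0.204, 0.204, 0.240),
    "no POE, phi = 0.168": (0.120, 0.168, 0.168, 0.240),
    "no POE, phi = 0.144": (0.120, 0.144, 0.144, 0.240),
    "incomplete maternal POE (0.192, 0.216)": (0.120, 0.192, 0.216, 0.240),
    "incomplete maternal POE (0.132, 0.156)": (0.120, 0.132, 0.156, 0.240),
    "complete maternal POE (0.120, 0.240)": (0.120, 0.120, 0.240, 0.240),
}
for label, phis in settings.items():
    print(f"{label}: gamma = {gamma_from_penetrances(*phis):.3f}")
```

Under this assumed definition the computed values agree with the γ values stated for the corresponding settings to three decimal places.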
3.2. Size
Table 4 gives the estimated sizes of $Z_{XCII}$, $Z_{max}$, Xcat, $S_A$, $FM_{02}$, $Z_C$, $Z_{mfG}$, $T_A$, $T_{AD}$, $T_{AD}^s$, $FM_{01}$, $FM_F$, $Z_{mfA}$ and $Z_A$ under different simulation settings when random mating holds in the parental generation. From Table 4, we can see that $Z_{XCII}$, $Z_{max}$, Xcat, $FM_{02}$, $Z_C$, $T_{AD}^s$, $FM_{01}$, $FM_F$, $Z_{mfA}$ and $Z_A$ generally control the size well, except that some of them produce a slightly conservative size under some situations. The sizes of $S_A$ and $Z_{mfG}$ are inflated when $(p_F, p_M) = (0.35, 0.25)$ and the sex ratio is 3:2, and they stay close to the nominal level $10^{-5}$ in all of the other situations. $T_A$ and $T_{AD}$ can have inflated size when $(p_F, p_M)$ is equal to (0.25, 0.15) and (0.35, 0.25), which may be caused by the different allele frequencies between females and males in the offspring generation. However, they have a well-controlled size under the other situations. Table S5 reports the estimated sizes of different methods when $p_F = p_M = p$ but HWE does not hold in female offspring. In this case, $Z_{XCII}$, $Z_{max}$, Xcat, $S_A$, $FM_{02}$, $Z_C$, $Z_{mfG}$, $T_A$, $T_{AD}$, $T_{AD}^s$, $FM_{01}$ and $FM_F$ generally control the size well. $Z_{mfA}$ and $Z_A$ can have inflated size when $\rho = 0.05$ and $p = 0.30$, since the allele-based tests rely on the assumption of HWE in females.
a Numbers that are outside of the 95% confidence interval ($0.38 \times 10^{-5}$, $1.62 \times 10^{-5}$) are highlighted in bold.
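The interval in this note is consistent with a normal approximation to the binomial proportion at the nominal level with $10^6$ replications. The short sketch below reproduces it under that assumption; the source does not state how the interval was derived, so the method shown here is an inference.

```python
import math

alpha = 1e-5          # nominal significance level
replicates = 10 ** 6  # replications used to estimate the size
se = math.sqrt(alpha * (1 - alpha) / replicates)  # binomial standard error
lower, upper = alpha - 1.96 * se, alpha + 1.96 * se
print(f"({lower * 1e5:.2f} x 10^-5, {upper * 1e5:.2f} x 10^-5)")
```

An estimated size outside this range differs from the nominal level by more than would be expected from Monte Carlo error alone.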
3.3. Power
To clearly illustrate the power results, we show the estimated powers of $Z_{XCII}$, $Z_{max}$, Xcat, $S_A$, $FM_{02}$, $Z_C$ and $Z_{mfG}$, which have relatively better performance, in Figures 1–6 and Figures S1–S22, while those of $T_A$, $T_{AD}$, $T_{AD}^s$, $FM_{01}$, $FM_F$, $Z_{mfA}$ and $Z_A$, which have inflated size or lower powers, are displayed in Figures S23–S50. Figure 1 gives the estimated powers of $Z_{XCII}$, $Z_{max}$, Xcat, $S_A$, $FM_{02}$, $Z_C$ and $Z_{mfG}$ against sex ratio under random mating when there is XCI with γ = 1 and a complete maternal parent-of-origin effect. Figure 1 shows that $Z_{XCII}$ has the highest power among all seven methods. The powers of $Z_{max}$, $FM_{02}$ and $Z_{mfG}$ are similar to each other and are generally higher than those of Xcat, $S_A$ and $Z_C$. On the other hand, the powers are influenced by the sex ratio. When the proportion of males in the case group gets larger ($r_f : r_m$ changing from 3:2 to 2:3), the power of $Z_{XCII}$ becomes smaller in Figure 1(a), while it remains nearly unchanged in the other subplots of Figure 1, and the powers of $Z_{max}$, Xcat, $FM_{02}$, $Z_C$ and $Z_{mfG}$ are almost unchanged in Figure 1(a), while they are larger in the other subplots. However, as the number of males in the case group increases, $S_A$ becomes less powerful. It is also found that all of the methods have higher powers with increasing allele frequency (comparing the first row with the second row). Figure 2 displays the corresponding estimated powers when there is XCI with γ = 1.001 and an incomplete maternal parent-of-origin effect. From Figure 2, we can see that the powers of $Z_{XCII}$, $Z_{max}$, $FM_{02}$ and $Z_{mfG}$ are very close to each other and are generally larger than those of Xcat, $S_A$ and $Z_C$. Compared to Figure 1, the effect of the sex ratio on $Z_{XCII}$ is greater, as the power of $Z_{XCII}$ increases with a larger male proportion in the case group in the second and third columns of Figure 2.
When there are XCI and no parent-of-origin effects under random mating, the estimated powers of Z XCII, Z max, Xcat, S A, FM 02, Z C and Z mfG with γ = 2, 0.935 and 0 are shown in Figures 3–5, respectively. From Figure 3, Z mfG has the highest power in the first row of Figure 3, while Z XCII is the most powerful in the second row. In fact, the powers of Z XCII, Z max, Xcat and Z mfG are very close to each other, which are larger than those of FM 02 and Z C. S A has relatively good performance in the first row of Figure 3, while it performs worse in the second row. In Figure 4, we find that Z XCII generally has higher power than Xcat, S A and Z C, although it has less power than Z max, FM 02 and Z mfG. Xcat is always the most powerful in all of the subplots of Figure 5. In the first row of Figure 5, Z XCII, Z max, FM 02 and Z C have similar powers, which perform much better than S A and Z mfG. In the second row of Figure 5, Z XCII is more powerful than the other five methods, except for Xcat. Furthermore, by comparing Figures 3–5, we find that the powers get larger with increasing γ-value. By comparing Figure 1 (complete maternal parent-of-origin effect), Figure 2 (incomplete maternal parent-of-origin effect) and Figure 4 (no parent-of-origin effects) with γ being fixed close to 1 (XCI-R), the power of Z XCII becomes smaller and smaller. Figure 6 plots the estimated powers of Z XCII, Z max, Xcat, S A, FM 02, Z C and Z mfG against the sex ratio under random mating when there is neither XCI nor parent-of-origin effects. Z XCII has similar power to Xcat and FM 02 in most situations. Z max, S A and Z mfG always outperform the other methods, while the power of Z C is always the lowest among those methods. The relatively low power of Z XCII is due to no XCI and no parent-of-origin effects.
The power results of Z XCII, Z max, Xcat, S A, FM 02, Z C and Z mfG with γ = 1.499 and 0.492 under random mating and incomplete maternal parent-of-origin effect are given in Figures S1 and S2, respectively. When there are no parent-of-origin effects, Figures S3 and S4 plot the estimated powers under XCI with γ = 1.503 and 0.500, respectively. The powers of these seven methods under random mating and paternal parent-of-origin effects are shown in Figures S5–S8. The results are similar to those under maternal parent-of-origin effects, except that the powers of Z XCII seem to be more strongly affected by the difference between $p_{_F }$ and $p_{_M }$ under paternal parent-of-origin effects. For example, the difference in power between Figure S5(c) and Figure S5(a) is much larger than that between Figure 1(c) and Figure 1(a).
Figures S9–S22 present the powers under the simulation settings where $p_{_F } = p_{_M } = p$ but HWE does not hold in female offspring. The left column of each figure represents the powers when ρ = −0.05, while the right column denotes the powers when ρ = 0.05. When comparing the two columns of each figure with the middle column in the corresponding figure under random mating (ρ = 0), we find that the powers with ρ = −0.05, 0 and 0.05 have similar trends, while the powers slightly increase as ρ changes from –0.05 to 0.05. This is probably due to the increase of genotype frequency of A/A. Finally, Figures S23–S50 display the powers of the other seven methods (T A, T AD, $T_{AD}^s$, FM 01, FM F, Z mfA and Z A), which control the size less well or have relatively low powers.
4. Discussion
In this paper, we propose a robust test, Z XCII, for testing associations between certain diseases and an X-linked SNP by simultaneously accounting for XCI and parent-of-origin effects. Our proposed method is an extension of Xcat for the situation where parent-of-origin effects influence the process of XCI. As with Xcat (Chen et al., Reference Chen, Ng, Li, Liu and Huang2017), two reasonable assumptions are made for Z XCII: the generalized genetic model is hypothesized for female offspring, and the mutant allele in female offspring is the same as that in male offspring. An important feature of the proposed method is that there is no need to specify the patterns of XCI or parent-of-origin effects. Simulation studies were conducted to investigate the validity and performance of Z XCII under various scenarios of parameter values. The simulation results demonstrate that Z XCII is robust in all of the situations considered. It controls the size well and generally outperforms most of the 13 existing methods in power in the presence of parent-of-origin effects, especially complete parent-of-origin effects, although it suffers from a slight loss in power when there are no parent-of-origin effects. Thus, the proposed method is a preferred choice when it is unclear in practice whether or not there are parent-of-origin effects.
Although Z XCII is an extension of Xcat, the two methods combine test statistics differently. We first use Fisher's method to combine Z 1, Z 2 and Z 3 in female offspring (denoted by Z f) and then obtain the proposed Z XCII by weighting Z f in female offspring and Z m in male offspring, whereas Xcat applies Fisher's method directly to combine the test statistics for females and males (Chen et al., Reference Chen, Ng, Li, Liu and Huang2017). We also tried other ways of directly combining the test statistics for females and males, such as Fisher's approach used in Chen et al. (Reference Chen, Ng, Li, Liu and Huang2017) and Stouffer's method (Owen, Reference Owen2009). However, Z XCII performed best in most of the situations considered. In addition, unlike Xcat, the proposed method is regression-based and therefore allows adjustment for covariates, which is another potential advantage. According to the simulation results (omitted here for brevity), we also found that Z XCII and the other methods are not applicable to association studies for rare alleles. We may need to use SKAT (Wu et al., Reference Wu, Lee, Cai, Li, Boehnke and Lin2011) or the extensions of SKAT (Larson et al., Reference Larson, Chen and Schaid2019) to deal with this situation, which will be the subject of our subsequent work. Note also that the proposed Z XCII is only suitable for qualitative traits. To analyse quantitative traits, we would need to replace the logistic regression with multiple linear regression and conduct simulations to compare the resulting test with existing methods for quantitative traits. Finally, as in Wang et al. (Reference Wang, Yu and Shete2014), in order to simplify our model, we treated XCI-E as a binary variable indicating whether or not XCI is present. However, many genes have been observed to be of ‘variable escape’, with the levels of escape varying between individuals, cells and tissues or over time. How to incorporate these variable levels of XCI-E into our model will be addressed in future work.
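For readers unfamiliar with the two combination rules mentioned above, the following Python sketch shows the generic forms of Fisher's method and Stouffer's method for combining p-values. It is only an illustration under stated assumptions: the p-values are taken to be independent and strictly between 0 and 1, the function names `fisher_combine` and `stouffer_combine` and the optional `weights` argument are hypothetical, and the sketch does not reproduce the specific weighting of Z f and Z m used to construct Z XCII.

```python
import math
from statistics import NormalDist


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) follows a chi-square distribution
    with 2k degrees of freedom under H0 (assuming independent p-values)."""
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Closed-form chi-square survival function for even degrees of freedom.
    p_comb = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, p_comb


def stouffer_combine(p_values, weights=None):
    """Stouffer's method: weighted sum of one-sided z-scores, rescaled to
    a standard normal under H0 (assuming independent p-values)."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights by default
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    z_comb = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return z_comb, 1.0 - nd.cdf(z_comb)


if __name__ == "__main__":
    # Hypothetical p-values, chosen only for illustration.
    example = [0.01, 0.02, 0.03]
    print(fisher_combine(example))
    print(stouffer_combine(example))
```

The `weights` argument of `stouffer_combine` could, for example, reflect the female and male sample sizes; that choice is an assumption of this sketch rather than the weighting described in this paper.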
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/S0016672320000026.
Author contributions
Yu Zhang helped design the study, drafted the article and conducted the simulation study. Si-Qi Xu helped design the study and drafted the article. Wei Liu revised the article critically. Wing Kam Fung reviewed the whole paper and revised the article. Ji-Yuan Zhou helped design the study, supervised the field activities and directed their implementation, including quality assurance and control. All authors read and approved this version of the manuscript.
Acknowledgements
The authors thank the two reviewers for their helpful comments, which greatly improved the presentation of this paper.
Financial support
This work was supported by the National Natural Science Foundation of China (grant number 81773544) and the Hong Kong RGC GRF grant (grant number 17302919).
Conflict of interest
None.
Ethical standards
None.