1. Introduction
In genome-wide mapping of quantitative trait loci (QTLs), the goal is to characterize the genetic architecture of a trait by simultaneously identifying the entire subset of chromosomal intervals that affect the trait (Lynch & Walsh, 1998). Observed data consist of phenotypic values and marker genotypes for a mapped population. For sparse marker maps, pseudomarkers are routinely inserted at evenly spaced intervals, e.g. 1 cM (Sen & Churchill, 2001). Genotypes are unobservable except at completely informative markers, but genotype probabilities can be inferred. In a QTL mapping analysis, performing hypothesis tests of linkage to a phenotype of interest across an entire genome induces a multiple testing problem. Furthermore, linkage among loci induces correlation among tests. With respect to the multiplicity problem, Bonferroni corrections are conservative and sacrifice power in order to maintain a low genome-wide false positive error rate. With respect to the correlation problem, Bonferroni corrections can become more conservative as correlation among tests increases.
In order to gain power, a less stringent threshold for claiming significance can be used, provided that some small number of false positive claims is acceptable. The false discovery rate (FDR) controls the proportion of hypothesis tests claimed to be positive that are falsely positive. Several groups have implemented various definitions of FDR control in the context of linkage analysis (Weller et al., 1998; Heyen et al., 1999; Weller, 2000; Zaykin et al., 2000; Beh et al., 2001; Mosig et al., 2001; Yekutieli, 2001; Lee et al., 2002; Sabatti et al., 2003; Varga et al., 2003; Bennewitz et al., 2004; Bernardo, 2004; Cnaani et al., 2004; Fernando et al., 2004; Simonsen & McIntyre, 2004; Zhang et al., 2004; Benjamini & Yekutieli, 2005).
Chen & Storey (2006) discussed two issues with linkage statistics. First, under the global null hypothesis, linkage statistics randomly fluctuate across the genome. Both theoretical and resampling methods have been developed to address this noise (e.g. Lander & Botstein, 1989; Lander & Kruglyak, 1995). Second, there is dependence in the signal due to linkage between markers. Under certain types of dependence structures, the FDR can be controlled, though control may be conservative compared with the desired nominal FDR level (Sabatti et al., 2003; Benjamini & Yekutieli, 2005; Benjamini et al., 2006).
In the case of sparse marker maps, it may be appropriate to treat the number of markers as the effective number of tests, because the distance between markers may be large enough for the markers to be effectively uncorrelated. However, the FDR can then be manipulated by marker placement (Fernando et al., 2004; Chen & Storey, 2006). A routine practice in QTL mapping, as already mentioned, is to insert pseudomarkers and impute missing genotype data (Sen & Churchill, 2001). Inserting pseudomarkers has the advantage of ensuring that the marker map is nearly balanced, in the sense that regions of the genome that do and do not contain QTL are marked in proportion to their lengths, even if the observed markers are unevenly distributed. The availability of dense maps consisting of only observed markers, resulting from high-throughput sequencing projects, will likely eliminate the need for pseudomarkers while preserving the balance of marker maps.
In the case of dense marker maps, whether the map consists of markers only or markers supplemented with pseudomarkers, correlation between markers (and pseudomarkers) must be taken into account. Chen & Storey (2006) proceeded under the simplification that, under the null hypothesis of no linkage to the phenotype, the chromosome is the unit of testing, implicitly assuming that linkage extends across an entire chromosome. Similarly, Benjamini & Yekutieli (2005) based their work on the simplification that ‘for each trait all the hypotheses on a chromosome are either true or false depending on the presence of a QTL on the chromosome’. This choice ignores information provided by recombination fractions, which precludes localization of QTL on the sub-chromosomal scale, and is potentially biased if there are multiple QTLs on one chromosome (Kao et al., 1999). Because meioses occur, linkage does not necessarily extend across an entire chromosome, particularly for long chromosomes or when recombination rates are high. For example, human chromosome 1 is ~225 cM, corresponding to a recombination fraction of 0·50 under the Kosambi mapping function, which indicates that the ends of this chromosome are unlinked. A more general definition of the proper unit of testing, appropriate for any recombination rate, is a linkage region, i.e. a region of a chromosome in which the set of all markers (and pseudomarkers) are in tight linkage, such that test statistics are positively correlated and reflect only one underlying outcome.
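As a check of that figure, the Kosambi mapping function gives the recombination fraction r in terms of the map distance d (in Morgans) as r = ½ tanh(2d); for d = 2·25 M, r = ½ tanh(4·5) ≈ 0·4999, which rounds to 0·50.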
In this study, an expected loss function is developed for multiple QTL mapping, taking into account dependence among markers as measured by recombination fractions. Using this method, the numbers of truly significant, falsely significant, truly non-significant and falsely non-significant hypothesis tests can all be estimated. In the Bayesian framework, these quantities can be directly estimated from posterior probabilities. Then, the FDR, which is the proportion of significant tests that are falsely significant, and the false non-discovery rate (FNR), which is the proportion of non-significant tests that are falsely non-significant, can both be estimated. Finally, these two error rates can be combined into a single value, the Bayes error, and minimizing this expected loss quantity can be used as an optimality criterion for determining significance thresholds. Bayes error-based mapping procedures are presented for main effects, gene–gene interactions and gene–environment interactions.
2. Materials and methods
(i) The multiple interacting QTL model
For continuous traits, consider the linear regression framework of Yi et al. (2007b) for multiple QTL models. The phenotypes of n individuals, y = (y_1, …, y_n)^T, are expressed as

y = μ + X_G β_G + X_GG β_GG + X_E β_E + X_GE β_GE + e,   (1)

in which μ = (μ, …, μ)^T is the overall mean; X_G, X_GG, X_E and X_GE are the design matrices for main effects, gene–gene interactions, environmental effects and gene–environment interactions, respectively; β_G, β_GG, β_E and β_GE are vectors of main effects, gene–gene interactions, environmental effects and gene–environment interactions, respectively; and e is a vector of independent normal errors with mean zero and variance σ²I_n. Under prior independence, any one effect or interaction is independent of all other effects and interactions. Constraints can be imposed to make interactions dependent on main effects (Yi et al., 2007b). With appropriate modifications to eqn (1), binary, categorical and ordinal traits can also be analysed (Yandell et al., 2007; Yi et al., 2007a).
Suppose the genome is partitioned into H loci, ζ = {ζ_1, …, ζ_H}, corresponding to the set of all markers and pseudomarkers. From the perspective of model selection, a subset of ζ explains phenotypic variation. Let λ = {λ_1, …, λ_L} ⊆ {ζ_1, …, ζ_H} be the positions of L QTL among the H loci. A vector of latent indicator variables γ is used to indicate which effects (main effects, gene–gene interactions and gene–environment interactions) are included in (γ=1) or excluded from (γ=0) the model (Yi, 2004; Yi et al., 2005, 2007b). Effect sizes, conditional on γ, are distributed according to a mixture of a point mass at zero and a normal distribution (Yi & Shriner, 2008). In the Bayesian framework, the goal is to infer the posterior distribution of (γ, λ_γ). A full description of the prior distributions and the Markov chain Monte Carlo (MCMC) algorithms can be found elsewhere (Yi, 2004; Yi et al., 2005, 2007b).
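To sketch that point–normal form (the variance notation σ²_β here is illustrative rather than taken from the original prior specification), each effect β_j with indicator γ_j is modelled as

β_j | γ_j ~ γ_j N(0, σ²_β) + (1 − γ_j) δ_0,

where δ_0 denotes a point mass at zero; loci excluded from the model thus contribute exactly zero effect.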
(ii) Discrete definitions of operating characteristics
Consider the possible outcomes when performing m hypothesis tests (Table 1). The number of true negatives is given by U, the number of false negatives is given by T, the number of true positives is given by S and the number of false positives is given by V. All four of these random numbers are unobservable. The marginal sums m_0 = U + V, which is the number of true null hypotheses, and m_1 = T + S, which is the number of true alternative hypotheses, are likewise unobservable random numbers. The marginal sums W = U + T, which is the number of non-rejected hypotheses, and R = V + S, which is the number of rejected hypotheses, are observable random numbers. The positive FDR (pFDR) is defined as

pFDR = E[V/R | R > 0],   (2)

which is the expected proportion of discoveries (i.e. rejected null hypotheses) that are false, given at least one discovery (Storey, 2003). The marginal FDR (mFDR) is defined as

mFDR = E[V]/E[R]   (3)

(Benjamini & Hochberg, 1995; Tsai et al., 2003). The pFDR and mFDR are asymptotically equivalent as the number of tests increases (Storey, 2003). Similarly, the positive FNR (pFNR) is defined as

pFNR = E[T/W | W > 0],   (4)

which is the expected proportion of non-discoveries (i.e. accepted null hypotheses) that are false, given at least one non-discovery (Genovese & Wasserman, 2002; Storey, 2003). The marginal FNR (mFNR) is defined as

mFNR = E[T]/E[W].   (5)
(iii) The number of tests for main effects
Under the global null hypothesis of no segregating QTL anywhere in the genome, equations have been derived to estimate the effective number of tests, m, for the sparse and dense map cases (Lander & Botstein, 1989; Lander & Kruglyak, 1995). For sparse maps, the effective number of tests can be estimated simply by the number of markers, M. For dense maps, solve the equation (based on an Ornstein–Uhlenbeck diffusion process)

α = [C + 2Gt_α] χ²(t_α)   (6)

for t_α, in which α is the family-wide significance level (typically α = 0·05), C is the number of chromosomes and G is the length of the genome in Morgans. For a recombinant inbred line design with selfing or with full-sib mating, G should be replaced with 2G or 4G, respectively. χ²(t_α) is the one-tailed probability (i.e. the area under the curve to the right of t_α) from the distribution function for the central χ² distribution. The χ² distribution has one degree of freedom for backcross or recombinant inbred line designs (for one effect size parameter) and two degrees of freedom for an F2 design (for two effect size parameters). The effective number of tests is given by m = C + 2Gt_α (Lander & Botstein, 1989; Lander & Kruglyak, 1995). If one has a marker map sufficiently dense to test linkage efficiently, additional markers will be tightly linked with existing markers. Additional markers therefore yield diminishing returns, increasing the correlation among tests more than the information content (Darvasi et al., 1993). The average width of a test interval in cM is 100G/(C + 2Gt_α). Note that this interval defines the average linkage region, which is the proper unit of testing for dense marker maps.
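To make the computation concrete, the following base R sketch (the helper name and the root-bracketing interval are illustrative choices, not part of R/qtlbim) solves eqn (6) numerically for t_α and returns the effective number of tests, the average test width and the comparable LOD threshold.

# Minimal sketch: solve alpha = [C + 2*G*t] * Pr(chi^2_df > t) for t_alpha,
# then derive the effective number of tests m = C + 2*G*t_alpha.
effective_tests <- function(alpha, C, G, df) {
  f <- function(t) (C + 2 * G * t) * pchisq(t, df = df, lower.tail = FALSE) - alpha
  t_alpha <- uniroot(f, interval = c(1, 100))$root
  m <- C + 2 * G * t_alpha
  list(t_alpha      = t_alpha,
       m            = m,
       avg_width_cM = 100 * G / m,               # average width of a test interval
       lod          = t_alpha / (2 * log(10)))   # comparable LOD threshold
}

# Values matching the simulation study: four chromosomes totalling 400 cM
# (C = 4, G = 4 Morgans), an F2 design (2 d.f.) and alpha = 0.05.
effective_tests(alpha = 0.05, C = 4, G = 4, df = 2)
# gives roughly 130 effective tests, ~3.1 cM per test and a LOD threshold of ~3.41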
(iv) The direct posterior probability approach to FDR control
Suppose there is a test statistic t_i for each of i = 1, …, m tests. Define a binary indicator variable H_i = 0 if the null hypothesis is true for the ith test and H_i = 1 if the alternative hypothesis is true for the ith test. In the context of linkage analysis, the null hypothesis is that the tested locus is not linked to the phenotype and the alternative hypothesis is that the tested locus is linked to the phenotype. Assume that the (t_i, H_i) are independently and identically distributed random variables. Suppose the rejection region Γ is held constant for testing all hypotheses. A test statistic t_i is either an element of the rejection region, t_i ∈ Γ, or it is not, t_i ∉ Γ. Let Pr(⋅) denote a probability measure. Then, the pFDR for the ith test is

pFDR(Γ) = Pr(H_i = 0 | t_i ∈ Γ)   (7)

(Table 2) (Storey, 2003).
a H=0 indicates that the null hypothesis is true. H=1 indicates that the alternative hypothesis is true. t ∈ Γ indicates that the test statistic is an element of the rejection region, Γ. t ∉ Γ indicates that the test statistic is not an element of the rejection region, Γ. Pr(⋅) denotes the probability of an event. Pr(⋅,⋅) denotes the joint probability of two events.
Each locus in ζ may affect a trait through its main effects and/or interactions with other loci (epistasis) or environmental effects (Yi et al., 2005). The posterior inclusion probability of the hth locus ζ_h, Pr(γ_h = 1 | y), is estimated by the frequency with which the locus ζ_h appears in the posterior sample given the data y. Furthermore, Pr(γ = 1 | y) is a direct estimate of Pr(H = 1). For a given rejection region Γ, t_i ∈ Γ if t_i is greater than or equal to a defined threshold of Pr(γ = 1 | y). The test statistic t_i is the cumulative posterior inclusion probability for an interval. The cumulative posterior inclusion probability over a chromosomal interval bracketed by loci h_L and h_R, in which all loci have posterior probabilities in the rejection region, is given by the sum of Pr(γ_h = 1 | y) over h = h_L, …, h_R. (Care must be taken because the cumulative posterior inclusion probability may exceed one for a sufficiently large interval, indicating the presence of more than one QTL in that interval.) The length of the interval is given by ζ_{h_R} − ζ_{h_L}, which, divided by the average test width 100G/(C + 2Gt_α), corresponds to the number of tests spanned by the interval. The probability that an interval declared positive is truly positive, Pr(H_i = 1 | t_i ∈ Γ), is estimated by its cumulative posterior inclusion probability. The probability that an interval declared positive is falsely positive, Pr(H_i = 0 | t_i ∈ Γ), is given by 1 − Pr(H_i = 1 | t_i ∈ Γ), or one minus its cumulative posterior inclusion probability. The pFDR can then be estimated by the sum, over all intervals declared significant, of Pr(H_i = 0 | t_i ∈ Γ) divided by the number of intervals declared significant (Newton et al., 2004; Storey et al., 2005).
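As an illustration of this direct posterior probability estimate, the base R sketch below computes an estimated pFDR from a vector of per-locus posterior inclusion probabilities; the function name, the grouping of contiguous significant loci into intervals and the capping of cumulative probabilities at one are illustrative assumptions rather than R/qtlbim code.

# Minimal sketch: estimate the pFDR from per-locus posterior inclusion
# probabilities Pr(gamma_h = 1 | y) ordered along the genome.
estimate_pFDR <- function(post, threshold) {
  sig <- post >= threshold                    # loci in the rejection region
  if (!any(sig)) return(NA_real_)             # pFDR undefined without discoveries
  runs <- rle(sig)                            # contiguous runs of significant loci
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- runs$values                         # runs forming significant intervals
  # cumulative posterior inclusion probability per interval, capped at one
  p_true <- mapply(function(s, e) min(1, sum(post[s:e])),
                   starts[keep], ends[keep])
  mean(1 - p_true)                            # average false positive probability
}

# Toy usage: one well-supported peak plus one isolated, weakly supported locus
post <- c(rep(0.01, 30), 0.30,
          rep(0.01, 9),
          0.10, 0.30, 0.60, 0.30, 0.10,
          rep(0.01, 40))
estimate_pFDR(post, threshold = 0.25)   # ~0.35: one solid and one doubtful interval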
(v) The direct posterior probability approach to FNR control
For a given significance threshold Γ, t_i ∉ Γ if t_i is less than the defined threshold of Pr(γ = 1 | y). The probability that an interval declared negative is falsely negative, Pr(H_i = 1 | t_i ∉ Γ) (Table 2), is given directly by the cumulative posterior inclusion probability of that interval, for an interval in which all loci have posterior probabilities not in the rejection region. Thus, the pFNR is estimated by the sum of Pr(H_i = 1 | t_i ∉ Γ) for all tests not declared significant divided by the number of tests not declared significant.
(vi) Probabilistic definitions of operating characteristics
The eight operating characteristics for testing a hypothesis can be written in probabilistic form as follows (Table 2):
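Six of these can be written, in the notation of Table 2, in their standard conditional-probability forms:

sensitivity = Pr(t ∈ Γ | H = 1),   specificity = Pr(t ∉ Γ | H = 0),
positive predictive value = Pr(H = 1 | t ∈ Γ),   negative predictive value = Pr(H = 0 | t ∉ Γ),
pFDR = Pr(H = 0 | t ∈ Γ),   pFNR = Pr(H = 1 | t ∉ Γ).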
Sensitivity is equivalent to average power. Note that the positive predictive value and the FDR are complementary, as are the negative predictive value and the FNR. Coverage is the proportion of replicates for which an interval contains the true QTL location. Accuracy is a function of the difference between the estimated peak location and the true QTL location. Precision is an inverse measure of the width of an interval. Estimates of each of these operating characteristics exist for each of the m tests. Averaging over all tests can be done either marginally (the ratio of averages) or jointly (the average of ratios); in this study, averaging was done marginally.
(vii) Bayes error
One possible optimality criterion is to minimize the expected loss, BE, defined as the weighted average of the pFDR and the pFNR, BE = w·pFDR + (1 − w)·pFNR, for a user-defined cost w (Genovese & Wasserman, 2002; Storey, 2003; Chen & Sarkar, 2006). Under this scheme, the significance threshold, which in turn defines the rejection region, is objectively determined by the data rather than being subjectively set by the user. Because of the complementarities noted above, minimizing a weighted average of the FDR and FNR is equivalent to maximizing a weighted average of the positive and negative predictive values.
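Continuing the illustrative helpers above (again a sketch, not R/qtlbim code), the threshold minimizing this expected loss can be found by a simple grid search:

# Minimal sketch: pFNR taken as the average posterior inclusion probability
# over non-significant loci, then a grid search for the threshold that
# minimizes the Bayes error w * pFDR + (1 - w) * pFNR.
estimate_pFNR <- function(post, threshold) {
  nonsig <- post < threshold
  if (!any(nonsig)) return(NA_real_)
  mean(post[nonsig])          # probability that a non-discovery is a false negative
}

bayes_error_threshold <- function(post, w = 0.5,
                                  grid = seq(0.01, 0.99, by = 0.01)) {
  be <- sapply(grid, function(thr) {
    w * estimate_pFDR(post, thr) + (1 - w) * estimate_pFNR(post, thr)
  })
  grid[which.min(be)]         # data-driven significance threshold
}

bayes_error_threshold(post, w = 0.5)   # 'post' and 'estimate_pFDR' from the previous sketch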
(viii) Extensions to gene–gene and gene–environment interactions
Whereas mapping main effects involves optimization over a two-dimensional curve of posterior probability as a function of genomic location, mapping gene–gene interactions involves optimization over a three-dimensional surface of posterior probability as a function of two genomic locations. Under backcross or recombinant inbred line designs, there is one type of gene–gene interaction effect. Under an F2 design, there are four types of gene–gene interaction effects, referred to as additive–additive, additive–dominance, dominance–additive and dominance–dominance effects. The effective number of tests increases correspondingly, to the number of distinct pairs of test intervals. Storey et al. (2005) developed a sequential approach for detecting gene–gene interactions that detects interactions only if the main effects for both loci are significant. The implementation described herein allows for interaction effects independent of main effect sizes, which increases the type I error rate but also increases power for pairs of loci with primarily interaction effects. Mapping gene–environment interactions is analogous to mapping main effects, with one two-dimensional posterior probability curve for each interacting environmental effect.
(ix) Implementation in R/qtlbim
The freely available QTL mapping package R/qtlbim is an extensible, interactive environment for Bayesian analysis of multiple interacting QTL in experimental crosses (Yandell et al., 2007). It provides several efficient MCMC algorithms for evaluating posterior probabilities of genetic architectures, i.e. the number and locations of QTLs, main effects and gene–gene and gene–environment interactions. R/qtlbim provides tools to monitor mixing behaviour and convergence of the simulated Markov chain, and provides extensive informative graphical and numerical summaries of the MCMC output to infer and interpret the genetic architecture of complex traits. Code implementing Bayes error calculations will be included in R/qtlbim, which facilitates the general usage of Bayesian methodologies for genome-wide interacting QTL analysis.
(x) Simulation study
Data simulation and Bayesian QTL mapping were performed using the R package qtlbim (Yandell et al., 2007). The phenotypic trait was assumed to be normally distributed. An F2 design was simulated. The simulated genome consisted of four autosomal chromosomes with a total length of 400 cM. Using the approximation for dense maps (Lander & Botstein, 1989; Lander & Kruglyak, 1995) presented in eqn (6) for α = 0·05, these choices yielded 130 effective tests, for which the comparable LOD (logarithm of odds) threshold, equal to t_α/(2 ln 10), was 3·41. Under the global null hypothesis, the average test spanned ~3·1 cM. Pseudomarkers were inserted at 1 cM intervals using the Kosambi mapping function and genotype probabilities were estimated using the multipoint method. The experimental conditions are presented in Table 3. For every set of conditions, 1000 independent replicate data sets were generated and analysed.
a The notation for marker maps indicates the number of markers on each chromosome. For Experiment 4, the 50 markers are evenly distributed across chromosome 1. For Experiment 6, 40 markers are evenly distributed between 32·5 and 57·5 cM and the other 10 markers are evenly distributed across the rest of the chromosome.
b The notation for main effects is (chromosome, location, additive effect and dominance effect).
c The notation for gene–gene interactions is (chromosome 1, location 1, chromosome 2, location 2, additive–additive effect, additive–dominance effect, dominance–additive effect and dominance–dominance effect).
d The notation for gene–environment interactions is (chromosome, location, additive effect and dominance effect).
e In Experiment 10, the prior number of QTLs was misspecified as five.
f In Experiment 21, the estimated number of tests is given separately for main effects, gene–gene interactions and gene–environment interactions.
The prior number of QTLs in the MCMC analysis was set to the true value, except for Experiment 10, in which the prior number was misspecified as five (greater than the number of chromosomes). For a correctly constructed MCMC algorithm, the Markov chain will converge to a unique stationary distribution that is the target posterior distribution, regardless of the initialization values, provided that a sufficiently long chain is run (Geyer, 1992). The main effect of misspecified priors is to increase the burn-in period before the Markov chain converges to the stationary distribution. The MCMC algorithm was run for 400 000 iterations, with a thinning value of 20 and no burn-in. The cost w in the Bayes error function was 0·5.
3. Results
Seven sets of simulations were performed. The first set was designed to investigate robustness to chromosome lengths and marker maps (Table 4). In Experiment 2 compared with Experiment 1, the lengths of the chromosomes differed but the number of markers on each chromosome was the same. This increased the intermarker distance on the chromosome carrying the true QTL, so that estimation of the QTL location was less precise. In Experiment 3 compared with Experiment 1, the lengths of the chromosomes differed and the numbers of markers differed such that the average intermarker distance was the same. In Experiment 4 compared with Experiment 1, chromosome 1 (which carried the single QTL) was saturated with markers. In Experiment 5 compared with Experiment 1, marker density was increased evenly across all chromosomes. In Experiment 6 compared with Experiment 1, only the interval of chromosome 1 surrounding the true QTL was saturated. The total number of markers in Experiments 4–6 was the same. The significance threshold, the pFDR and the pFNR were not sensitive to chromosome lengths or marker placement.
The second set of simulations was designed to investigate mapping main effects as a function of QTL location, effect size and sample size (Table 5). When a QTL was located close to the edge of a chromosome (Experiment 7), the interval for the QTL location was truncated, leading to smaller widths and posterior probabilities. Estimation of the true QTL location was less accurate, with the bias directed away from the edge of the chromosome. It was also more difficult to differentiate false from true positive QTLs, as can be seen from the larger pFDR estimates and smaller positive predictive values. There were more false positive QTLs, although the average posterior probability for a false positive QTL was smaller. With a larger effect size (Experiment 8), the significance threshold was markedly increased, implying a more stringent test. The pFDR was much smaller and the positive predictive value was much larger for a QTL with a larger effect. Sensitivity was reduced but specificity improved, consistent with increased stringency. Coverage and the true positive posterior probability both increased and the true positive width decreased with larger effects. The number of false positive QTLs decreased but the false positive posterior probability and the false positive width both increased. With a larger sample size (Experiment 9), similar to increasing the effect size, the significance threshold was increased, implying a more stringent test. The pFDR was much smaller and the positive predictive value was much larger for the larger sample size. Sensitivity was reduced but specificity increased, consistent with a more stringent test. Coverage and the true positive posterior probability both increased and the true positive width decreased with the larger sample size. The number of false positive QTLs decreased but the false positive posterior probability and the false positive width both increased. A misspecified prior (Experiment 10) had no effect.
The third set of simulations was designed to investigate mapping multiple QTLs (Table 6). Increases in the amount of true positive signal due to multiple true QTLs had effects as expected on all of the operating characteristics (Experiments 11 and 12). The pFDR and the number of false positive intervals were smaller with multiple QTLs. Power increased as the number of QTLs increased, or equivalently, as the proportion of true alternative hypotheses increased.
The fourth set of simulations was designed to investigate mapping gene–gene interactions (Table 7). Not surprisingly, the pFDR was much larger for gene–gene interactions than for main effects (Experiment 13). The vast majority of claims of significance were in fact false positives, but with vanishingly small posterior probabilities. On average, it was ~100 times easier to distinguish false positives from true positives for gene–gene interactions than for main effects. Increasing the effect size (Experiment 14), the number of gene–gene interactions (Experiment 15) or the sample size (Experiment 16) had the same consequences as for main effects. As expected, large sample sizes were critical for obtaining high levels of predictive values and posterior probabilities for true hypotheses (Experiment 16).
The fifth set of simulations was designed to investigate mapping gene–environment interactions (Table 8). Under the specific conditions tested, gene–environment interactions were easier to map than main effects (Experiments 17–20). Otherwise, all of the trends observed for main effects and gene–gene interactions also held for gene–environment interactions. The sixth set of simulations was designed to investigate simultaneous mapping of main effects, gene–gene interactions and gene–environment interactions (Table 9). Error estimates were slightly larger in the presence of all three types of genetic effects, but there was no impediment to this type of joint analysis (Experiment 21).
The seventh set of simulations was designed to investigate the effect of the estimated effective number of tests (Table 10). This was accomplished by solving eqn (6) for α=0·05 (Experiment 1), α=0·01 (Experiment 22) and α=0·001 (Experiment 23). As the estimated effective number of tests increased, the estimated values of the various operating characteristics converged to their respective asymptotic limits.
4. Discussion
For Mendelian traits, statistical hypothesis testing in QTL mapping is complicated by the fact that genome-wide testing involves multiple comparisons. For complex traits with multiple QTLs, a second problem arises because some unknown proportion of tests, usually many more than one, is expected to be rejected. In the latter case, traditional corrections for multiple comparisons, such as Bonferroni corrections, are generally conservative with respect to both multiplicity and correlation. More recently developed methods, such as FDR-controlling methods, retain more power, ideally gain power as the number of QTLs increases, and account for dependence among tests. In this study, optimizing predictive value led to more conservative control (1 − specificity < 0·05) and lower coverage (coverage < 0·95) than expected under a traditional experiment-wide significance level of 0·05.
Insertion of pseudomarkers at even intervals yields a nearly balanced marker map, such that the false nulls and true nulls are proportionally represented. If the map of markers and pseudomarkers is sufficiently dense, then tests of markers or intervals between markers are correlated through linkage. Thus, it is crucial to account for linkage by estimating the effective number of tests. The ‘width’ of a test defines a proper unit of testing that accounts for all markers and pseudomarkers in tight linkage. This test definition prevents the manipulation of the FDR by marker placement (Fernando et al., 2004; Chen & Storey, 2006). Herein, I describe a new implementation of a Bayesian FDR-controlling method. The proposed method extends previous ones by controlling the overall error rate by simultaneously controlling the FDR and the FNR. The proposed method accounts for the effective number of tests in a genome-wide linkage scan, although the relevant parameter α is merely a nuisance parameter and could be eliminated by integration. Critically, the proposed method accounts for the correlation among markers within a linkage region. The proposed method is applicable for continuous, binary and ordinal traits and for multiple interacting QTL mapping, i.e. with an arbitrary number of QTLs with arbitrary genetic effects.
For multiple-QTL mapping, it is unlikely that any single model sampled by the MCMC algorithm would correctly contain all QTLs. This problem is addressable through the use of Bayesian model averaging to estimate parameters for each QTL marginally. Also, the overall Bayes error decreased with multiple QTLs. With multiple QTLs, a greater proportion of alternative hypotheses are true, and pFDR-controlling methods tend to be more powerful in these situations (Storey, 2002). Since different types of genetic effects can be estimated with differing levels of accuracy and precision, it seems reasonable to optimize mapping for main effects, gene–gene interactions and gene–environment interactions separately.
The most commonly used summary statistic in Bayesian hypothesis testing is the Bayes factor, defined as the ratio of the posterior odds to the prior odds (Jeffreys, 1961; Kass & Raftery, 1995). Equivalently, the Bayes factor is the ratio of the marginal likelihoods of the data under the two hypotheses. The Bayes factor can be seen as the Bayesian counterpart of the likelihood ratio statistic, which is based on maximization (rather than integration) in frequentist analysis. Under certain regularity conditions, the difference in the Bayesian information criterion (Schwarz, 1978) between the two hypotheses is approximately equal to −2 ln(Bayes factor) (Kass & Wasserman, 1995). Thus, Bayes factors can be useful not only in a manner similar to LOD scores but also in the model selection framework.
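In symbols, for competing hypotheses H_1 and H_0 and data y, these relations are

BF_10 = Pr(y | H_1)/Pr(y | H_0) = [Pr(H_1 | y)/Pr(H_0 | y)] / [Pr(H_1)/Pr(H_0)],   ΔBIC = BIC_1 − BIC_0 ≈ −2 ln BF_10.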
On the other hand, there are disadvantages to using Bayes factors. First, the interpretation of Bayes factors, just as with the interpretation of likelihood ratio statistics in the frequentist framework, depends on a subjective significance threshold (Jeffreys, 1961; Kass & Raftery, 1995). Increasing the stringency of the threshold will increase specificity at the cost of reducing sensitivity. Minimizing the expected loss function provides an objective, data-driven solution to this problem. Second, Bayes factors depend critically upon the prior distributions. Whereas this dependency has not been particularly problematic for single-trait analysis, it appears to be problematic for testing pleiotropy in multiple-trait analysis (Banerjee et al., 2008). The prior probability that a locus affects none of the traits, i.e. the intersection of all of the null hypotheses, tends to become vanishingly small as the number of traits increases, thereby potentially yielding enormously large Bayes factors that are unstable and difficult to calibrate. In this context, analysing the posterior probability profile appears to be more tractable than analysing Bayes factors (Banerjee et al., 2008).
False positive error rates in Bayesian analyses can be affected by the specification of prior distributions. The posterior distribution of the number of QTLs is influenced by the prior distribution of the number of QTLs, whereas the Bayes factor tends to be relatively insensitive to the prior distribution of the number of QTLs (Yi & Shriner, 2008). In contrast, the prior distribution of effect sizes tends to more strongly affect false positive error rates. The point-normal prior induces shrinkage of effect sizes towards zero, which has the advantages of reducing model space by removing most loci from the model and reducing the false positive error rate, independent of whether posterior inference is made through Bayes factors or the Bayes error.
The merit of optimizing prediction rather than classification remains an open question. In this study, optimization of prediction tended to be conservative, as evidenced by sub-optimal coverage and large specificities. Maximizing the positive predictive value sacrifices detection of smaller effects in favour of higher positive predictive values for larger effects. Thus, optimizing prediction may have particular benefit in genome-wide analysis by improving the replicability of findings.
This research was supported by National Institutes of Health grant DK062710 and the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. I thank the Enabling Technology Laboratory in the Department of Mechanical Engineering, University of Alabama at Birmingham for assistance with high performance computing. I thank Howard Wiener and Grier Page for their suggestions, Śaunak Sen for critical discussion and the anonymous reviewers for their helpful comments.