1. Introduction
In genome-wide mapping of quantitative trait loci (QTLs), the goal is to characterize the genetic architecture of a trait by simultaneously identifying the entire subset of chromosomal intervals that affect the trait (Lynch & Walsh, 1998). Observed data consist of phenotypic values and marker genotypes for a mapped population. For sparse marker maps, pseudomarkers are routinely inserted at evenly spaced intervals, e.g. 1 cM (Sen & Churchill, 2001). Genotypes are unobservable except at completely informative markers, but genotype probabilities can be inferred. In a QTL mapping analysis, performing hypothesis tests of linkage to a phenotype of interest across an entire genome induces a multiple testing problem. Furthermore, linkage among loci induces correlation among tests. With respect to the multiplicity problem, Bonferroni corrections are conservative and sacrifice power in order to maintain a low genome-wide false positive error rate. With respect to the correlation problem, Bonferroni corrections can become more conservative as correlation among tests increases.
In order to gain power, a less stringent threshold for claiming significance can be used, provided that some small number of false positive claims is acceptable. The false discovery rate (FDR) controls the proportion of hypothesis tests claimed to be positive that are falsely positive. Several groups have implemented various definitions of FDR control in the context of linkage analysis (Weller et al., 1998; Heyen et al., 1999; Weller, 2000; Zaykin et al., 2000; Beh et al., 2001; Mosig et al., 2001; Yekutieli, 2001; Lee et al., 2002; Sabatti et al., 2003; Varga et al., 2003; Bennewitz et al., 2004; Bernardo, 2004; Cnaani et al., 2004; Fernando et al., 2004; Simonsen & McIntyre, 2004; Zhang et al., 2004; Benjamini & Yekutieli, 2005).
Chen & Storey (2006) discussed two issues with linkage statistics. First, under the global null hypothesis, linkage statistics randomly fluctuate across the genome. Both theoretical and resampling methods have been developed to address this noise (e.g. Lander & Botstein, 1989; Lander & Kruglyak, 1995). Second, there is dependence in the signal due to linkage between markers. Under certain types of dependence structures, the FDR can be controlled, though control may be conservative compared with the desired nominal FDR level (Sabatti et al., 2003; Benjamini & Yekutieli, 2005; Benjamini et al., 2006).
In the case of sparse marker maps, it may be appropriate to treat the number of markers as the effective number of tests, because the distance between markers may be large enough for the markers to be effectively uncorrelated. However, the FDR can then be manipulated by marker placement (Fernando et al., 2004; Chen & Storey, 2006). A routine practice in QTL mapping, as already mentioned, is to insert pseudomarkers and impute missing genotype data (Sen & Churchill, 2001). Inserting pseudomarkers has the advantage of ensuring that the marker map is nearly balanced, in the sense that regions of the genome that do and do not contain QTL are marked in proportion to their lengths, even if the observed markers are unevenly distributed. The availability of dense maps consisting of only observed markers, resulting from high-throughput sequencing projects, will likely eliminate the need for pseudomarkers while preserving the balance of marker maps.
In the case of dense marker maps, whether the map consists of markers only or markers supplemented with pseudomarkers, correlation between markers (and pseudomarkers) must be taken into account. Chen & Storey (2006) proceeded under the simplification that, under the null hypothesis of no linkage to the phenotype, the chromosome is the unit of testing, implicitly assuming that linkage extends across an entire chromosome. Similarly, Benjamini & Yekutieli (2005) based their work on the simplification that ‘for each trait all the hypotheses on a chromosome are either true or false depending on the presence of a QTL on the chromosome’. This choice ignores information provided by recombination fractions, which precludes localization of QTL on the sub-chromosomal scale, and is potentially biased if there are multiple QTLs on one chromosome (Kao et al., 1999). Because meioses occur, linkage does not necessarily extend across an entire chromosome, particularly for long chromosomes or when recombination rates are high. For example, human chromosome 1 is ~225 cM, corresponding to a recombination fraction of 0·50 under the Kosambi mapping function, which indicates that the ends of this chromosome are unlinked. A more general definition of the proper unit of testing, appropriate for any recombination rate, is a linkage region, i.e. a region of a chromosome in which the set of all markers (and pseudomarkers) are in tight linkage, such that test statistics are positively correlated and reflect only one underlying outcome.
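As a check of that figure, the Kosambi mapping function gives the recombination fraction r in terms of the map distance d (in Morgans) as r = ½ tanh(2d); for d = 2·25 M, r = ½ tanh(4·5) ≈ 0·4999, which rounds to 0·50.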
In this study, an expected loss function is developed for multiple QTL mapping, taking into account dependence among markers as measured by recombination fractions. Using this method, the numbers of truly significant, falsely significant, truly non-significant and falsely non-significant hypothesis tests can all be estimated. In the Bayesian framework, these quantities can be directly estimated from posterior probabilities. Then, the FDR, which is the proportion of significant tests that are falsely significant, and the false non-discovery rate (FNR), which is the proportion of non-significant tests that are falsely non-significant, can both be estimated. Finally, these two error rates can be combined into a single value, the Bayes error, and minimizing this expected loss quantity can be used as an optimality criterion for determining significance thresholds. Bayes error-based mapping procedures are presented for main effects, gene–gene interactions and gene–environment interactions.
2. Materials and methods
(i) The multiple interacting QTL model
For continuous traits, consider the linear regression framework of Yi et al. (2007b) for multiple QTL models. The phenotypes of n individuals, y = (y_1, …, y_n)^T, are expressed as

y = μ + X_G β_G + X_GG β_GG + X_E β_E + X_GE β_GE + e,   (1)

in which μ = (μ, …, μ)^T is the overall mean; X_G, X_GG, X_E and X_GE are the design matrices for main effects, gene–gene interactions, environmental effects and gene–environment interactions, respectively; β_G, β_GG, β_E and β_GE are vectors of main effects, gene–gene interactions, environmental effects and gene–environment interactions, respectively; and e is a vector of independent normal errors with mean zero and variance σ²I_n. Under prior independence, any one effect or interaction is independent of all other effects and interactions. Constraints can be imposed to make interactions dependent on main effects (Yi et al., 2007b). With appropriate modifications to eqn (1), binary, categorical and ordinal traits can also be analysed (Yandell et al., 2007; Yi et al., 2007a).
Suppose the genome is partitioned into H loci, ζ = {ζ_1, …, ζ_H}, corresponding to the set of all markers and pseudomarkers. From the perspective of model selection, a subset of ζ explains phenotypic variation. Let λ = {λ_1, …, λ_L} ⊆ {ζ_1, …, ζ_H} be the positions of L QTL among the H loci. A vector of latent indicator variables γ is used to indicate which effects (main effects, gene–gene interactions and gene–environment interactions) are included in (γ=1) or excluded from (γ=0) the model (Yi, 2004; Yi et al., 2005, 2007b). Effect sizes, conditional on γ, are distributed according to a mixture of a point mass at zero and a normal distribution (Yi & Shriner, 2008). In the Bayesian framework, the goal is to infer the posterior distribution of (γ, λ_γ). A full description of the prior distributions and the Markov chain Monte Carlo (MCMC) algorithms can be found elsewhere (Yi, 2004; Yi et al., 2005, 2007b).
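To sketch that point–normal form (the variance notation σ²_β here is illustrative rather than taken from the original prior specification), each effect β_j with indicator γ_j is modelled as

β_j | γ_j ~ γ_j N(0, σ²_β) + (1 − γ_j) δ_0,

where δ_0 denotes a point mass at zero; loci excluded from the model thus contribute exactly zero effect.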
(ii) Discrete definitions of operating characteristics
Consider the possible outcomes when performing m hypothesis tests (Table 1). The number of true negatives is given by U, the number of false negatives is given by T, the number of true positives is given by S and the number of false positives is given by V. All four of these random numbers are unobservable. The marginal sums m_0 = U + V, which is the number of true null hypotheses, and m_1 = T + S, which is the number of true alternative hypotheses, are likewise unobservable random numbers. The marginal sums W = U + T, which is the number of non-rejected hypotheses, and R = V + S, which is the number of rejected hypotheses, are observable random numbers. The positive FDR (pFDR) is defined as

pFDR = E[V/R | R > 0],   (2)

which is the expected proportion of discoveries (i.e. rejected null hypotheses) that are false, given at least one discovery (Storey, 2003). The marginal FDR (mFDR) is defined as

mFDR = E[V]/E[R]   (3)

(Benjamini & Hochberg, 1995; Tsai et al., 2003). The pFDR and mFDR are asymptotically equivalent as the number of tests increases (Storey, 2003). Similarly, the positive FNR (pFNR) is defined as

pFNR = E[T/W | W > 0],   (4)

which is the expected proportion of non-discoveries (i.e. accepted null hypotheses) that are false, given at least one non-discovery (Genovese & Wasserman, 2002; Storey, 2003). The marginal FNR (mFNR) is defined as

mFNR = E[T]/E[W].   (5)
(iii) The number of tests for main effects
Under the global null hypothesis of no segregating QTL anywhere in the genome, equations have been derived to estimate the effective number of tests, m, for the sparse and dense map cases (Lander & Botstein, 1989; Lander & Kruglyak, 1995). For sparse maps, the effective number of tests can be estimated simply by the number of markers, M. For dense maps, solve the equation (based on an Ornstein–Uhlenbeck diffusion process)

α = [C + 2Gt_α] χ²(t_α)   (6)

for t_α, in which α is the family-wide significance level (typically α = 0·05), C is the number of chromosomes and G is the length of the genome in Morgans. For a recombinant inbred line design with selfing or with full-sib mating, G should be replaced with 2G or 4G, respectively. χ²(t_α) is the one-tailed probability (i.e. the area under the curve to the right of t_α) from the distribution function for the central χ² distribution. The χ² distribution has one degree of freedom for backcross or recombinant inbred line designs (for one effect size parameter) and two degrees of freedom for an F2 design (for two effect size parameters). The effective number of tests is given by m = C + 2Gt_α (Lander & Botstein, 1989; Lander & Kruglyak, 1995). If one has a marker map sufficiently dense to test linkage efficiently, additional markers will be tightly linked with existing markers. Additional markers therefore yield diminishing returns, increasing the correlation among tests more than the information content (Darvasi et al., 1993). The average width of a test interval in cM is 100G/(C + 2Gt_α). Note that this interval defines the average linkage region, which is the proper unit of testing for dense marker maps.
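To make the computation concrete, the following base R sketch (the helper name and the root-bracketing interval are illustrative choices, not part of R/qtlbim) solves eqn (6) numerically for t_α and returns the effective number of tests, the average test width and the comparable LOD threshold.

# Minimal sketch: solve alpha = [C + 2*G*t] * Pr(chi^2_df > t) for t_alpha,
# then derive the effective number of tests m = C + 2*G*t_alpha.
effective_tests <- function(alpha, C, G, df) {
  f <- function(t) (C + 2 * G * t) * pchisq(t, df = df, lower.tail = FALSE) - alpha
  t_alpha <- uniroot(f, interval = c(1, 100))$root
  m <- C + 2 * G * t_alpha
  list(t_alpha      = t_alpha,
       m            = m,
       avg_width_cM = 100 * G / m,               # average width of a test interval
       lod          = t_alpha / (2 * log(10)))   # comparable LOD threshold
}

# Values matching the simulation study: four chromosomes totalling 400 cM
# (C = 4, G = 4 Morgans), an F2 design (2 d.f.) and alpha = 0.05.
effective_tests(alpha = 0.05, C = 4, G = 4, df = 2)
# gives roughly 130 effective tests, ~3.1 cM per test and a LOD threshold of ~3.41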
(iv) The direct posterior probability approach to FDR control
Suppose there is a test statistic t_i for each of i = 1, …, m tests. Define a binary indicator variable H_i = 0 if the null hypothesis is true for the ith test and H_i = 1 if the alternative hypothesis is true for the ith test. In the context of linkage analysis, the null hypothesis is that the tested locus is not linked to the phenotype and the alternative hypothesis is that the tested locus is linked to the phenotype. Assume that the (t_i, H_i) are independently and identically distributed random variables. Suppose the rejection region Γ is held constant for testing all hypotheses. A test statistic t_i is either an element of the rejection region, t_i ∈ Γ, or it is not, t_i ∉ Γ. Let Pr(⋅) denote a probability measure. Then, the pFDR for the ith test is

pFDR(Γ) = Pr(H_i = 0 | t_i ∈ Γ)   (7)

(Table 2) (Storey, 2003).
a H=0 indicates that the null hypothesis is true. H=1 indicates that the alternative hypothesis is true. t ∈ Γ indicates that the test statistic is an element of the rejection region, Γ. t ∉ Γ indicates that the test statistic is not an element of the rejection region, Γ. Pr(⋅) denotes the probability of an event. Pr(⋅,⋅) denotes the joint probability of two events.
Each locus in ζ may affect a trait through its main effects and/or interactions with other loci (epistasis) or environmental effects (Yi et al., 2005). The posterior inclusion probability of the hth locus ζ_h, Pr(γ_h = 1 | y), is estimated by the frequency with which the locus ζ_h appears in the posterior sample given the data y. Furthermore, Pr(γ = 1 | y) is a direct estimate of Pr(H = 1). For a given rejection region Γ, t_i ∈ Γ if t_i is greater than or equal to a defined threshold of Pr(γ = 1 | y). The test statistic t_i is the cumulative posterior inclusion probability for an interval. The cumulative posterior inclusion probability over a chromosomal interval bracketed by loci h_L and h_R, in which all loci have posterior probabilities in the rejection region, is given by the sum of Pr(γ_h = 1 | y) over h = h_L, …, h_R. (Care must be taken because the cumulative posterior inclusion probability may exceed one for a sufficiently large interval, indicating the presence of more than one QTL in that interval.) The length of the interval is given by ζ_{h_R} − ζ_{h_L}, which, divided by the average test width 100G/(C + 2Gt_α), corresponds to the number of tests spanned by the interval. The probability that an interval declared positive is truly positive, Pr(H_i = 1 | t_i ∈ Γ), is estimated by its cumulative posterior inclusion probability. The probability that an interval declared positive is falsely positive, Pr(H_i = 0 | t_i ∈ Γ), is given by 1 − Pr(H_i = 1 | t_i ∈ Γ), or one minus its cumulative posterior inclusion probability. The pFDR can then be estimated by the sum, over all intervals declared significant, of Pr(H_i = 0 | t_i ∈ Γ) divided by the number of intervals declared significant (Newton et al., 2004; Storey et al., 2005).
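As an illustration of this direct posterior probability estimate, the base R sketch below computes an estimated pFDR from a vector of per-locus posterior inclusion probabilities; the function name, the grouping of contiguous significant loci into intervals and the capping of cumulative probabilities at one are illustrative assumptions rather than R/qtlbim code.

# Minimal sketch: estimate the pFDR from per-locus posterior inclusion
# probabilities Pr(gamma_h = 1 | y) ordered along the genome.
estimate_pFDR <- function(post, threshold) {
  sig <- post >= threshold                    # loci in the rejection region
  if (!any(sig)) return(NA_real_)             # pFDR undefined without discoveries
  runs <- rle(sig)                            # contiguous runs of significant loci
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- runs$values                         # runs forming significant intervals
  # cumulative posterior inclusion probability per interval, capped at one
  p_true <- mapply(function(s, e) min(1, sum(post[s:e])),
                   starts[keep], ends[keep])
  mean(1 - p_true)                            # average false positive probability
}

# Toy usage: one well-supported peak plus one isolated, weakly supported locus
post <- c(rep(0.01, 30), 0.30,
          rep(0.01, 9),
          0.10, 0.30, 0.60, 0.30, 0.10,
          rep(0.01, 40))
estimate_pFDR(post, threshold = 0.25)   # ~0.35: one solid and one doubtful interval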
(v) The direct posterior probability approach to FNR control
For a given significance threshold Γ, t_i ∉ Γ if t_i is less than the defined threshold of Pr(γ = 1 | y). The probability that an interval declared negative is falsely negative, Pr(H_i = 1 | t_i ∉ Γ) (Table 2), is given directly by the cumulative posterior inclusion probability of that interval, for an interval in which all loci have posterior probabilities not in the rejection region. Thus, the pFNR is estimated by the sum of Pr(H_i = 1 | t_i ∉ Γ) for all tests not declared significant divided by the number of tests not declared significant.
(vi) Probabilistic definitions of operating characteristics
The eight operating characteristics for testing a hypothesis can be written in probabilistic form as follows (Table 2):
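Six of these can be written, in the notation of Table 2, in their standard conditional-probability forms:

sensitivity = Pr(t ∈ Γ | H = 1),   specificity = Pr(t ∉ Γ | H = 0),
positive predictive value = Pr(H = 1 | t ∈ Γ),   negative predictive value = Pr(H = 0 | t ∉ Γ),
pFDR = Pr(H = 0 | t ∈ Γ),   pFNR = Pr(H = 1 | t ∉ Γ).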
Sensitivity is equivalent to average power. Note that the positive predictive value and the FDR are complementary, as are the negative predictive value and the FNR. Coverage is the proportion of replicates for which an interval contains the true QTL location. Accuracy is a function of the difference between the estimated peak location and the true QTL location. Precision is an inverse measure of the width of an interval. Estimates of each of these operating characteristics exist for each of the m tests. Averaging over all tests can be done either marginally (the ratio of averages) or jointly (the average of ratios); in this study, averaging was done marginally.
(vii) Bayes error
One possible optimality criterion is to minimize the expected loss, BE, defined as the weighted average of the pFDR and the pFNR, BE = w·pFDR + (1 − w)·pFNR, for a user-defined cost w (Genovese & Wasserman, 2002; Storey, 2003; Chen & Sarkar, 2006). Under this scheme, the significance threshold, which in turn defines the rejection region, is objectively determined by the data rather than being subjectively set by the user. Because of the complementarities noted above, minimizing a weighted average of the FDR and FNR is equivalent to maximizing a weighted average of the positive and negative predictive values.
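Continuing the illustrative helpers above (again a sketch, not R/qtlbim code), the threshold minimizing this expected loss can be found by a simple grid search:

# Minimal sketch: pFNR taken as the average posterior inclusion probability
# over non-significant loci, then a grid search for the threshold that
# minimizes the Bayes error w * pFDR + (1 - w) * pFNR.
estimate_pFNR <- function(post, threshold) {
  nonsig <- post < threshold
  if (!any(nonsig)) return(NA_real_)
  mean(post[nonsig])          # probability that a non-discovery is a false negative
}

bayes_error_threshold <- function(post, w = 0.5,
                                  grid = seq(0.01, 0.99, by = 0.01)) {
  be <- sapply(grid, function(thr) {
    w * estimate_pFDR(post, thr) + (1 - w) * estimate_pFNR(post, thr)
  })
  grid[which.min(be)]         # data-driven significance threshold
}

bayes_error_threshold(post, w = 0.5)   # 'post' and 'estimate_pFDR' from the previous sketch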
(viii) Extensions to gene–gene and gene–environment interactions
Whereas mapping main effects involves optimization over a two-dimensional curve of posterior probability as a function of genomic location, mapping gene–gene interactions involves optimization over a three-dimensional surface of posterior probability as a function of two genomic locations. Under backcross or recombinant inbred line designs, there is one type of gene–gene interaction effect. Under an F2 design, there are four types of gene–gene interaction effects, referred to as additive–additive, additive–dominance, dominance–additive and dominance–dominance effects. The effective number of tests increases correspondingly, to the number of distinct pairs of test intervals. Storey et al. (2005) developed a sequential approach for detecting gene–gene interactions that detects interactions only if the main effects for both loci are significant. The implementation described herein allows for interaction effects independent of main effect sizes, which increases the type I error rate but also increases power for pairs of loci with primarily interaction effects. Mapping gene–environment interactions is analogous to mapping main effects, with one two-dimensional posterior probability curve for each interacting environmental effect.
(ix) Implementation in R/qtlbim
The freely available QTL mapping package R/qtlbim is an extensible, interactive environment for Bayesian analysis of multiple interacting QTL in experimental crosses (Yandell et al., 2007). It provides several efficient MCMC algorithms for evaluating posterior probabilities of genetic architectures, i.e. the number and locations of QTLs, main effects and gene–gene and gene–environment interactions. R/qtlbim provides tools to monitor mixing behaviour and convergence of the simulated Markov chain, and provides extensive informative graphical and numerical summaries of the MCMC output to infer and interpret the genetic architecture of complex traits. Code implementing Bayes error calculations will be included in R/qtlbim, which facilitates the general usage of Bayesian methodologies for genome-wide interacting QTL analysis.
(x) Simulation study
Data simulation and Bayesian QTL mapping were performed using the R package qtlbim (Yandell et al., 2007). The phenotypic trait was assumed to be normally distributed. An F2 design was simulated. The simulated genome consisted of four autosomal chromosomes with a total length of 400 cM. Using the approximation for dense maps (Lander & Botstein, 1989; Lander & Kruglyak, 1995) presented in eqn (6) for α = 0·05, these choices yielded 130 effective tests, for which the comparable LOD (logarithm of odds) threshold, equal to t_α/(2 ln 10), was 3·41. Under the global null hypothesis, the average test spanned ~3·1 cM. Pseudomarkers were inserted at 1 cM intervals using the Kosambi mapping function and genotype probabilities were estimated using the multipoint method. The experimental conditions are presented in Table 3. For every set of conditions, 1000 independent replicate data sets were generated and analysed.
a The notation for marker maps indicates the number of markers on each chromosome. For Experiment 4, the 50 markers are evenly distributed across chromosome 1. For Experiment 6, 40 markers are evenly distributed between 32·5 and 57·5 cM and the other 10 markers are evenly distributed across the rest of the chromosome.
b The notation for main effects is (chromosome, location, additive effect and dominance effect).
c The notation for gene–gene interactions is (chromosome 1, location 1, chromosome 2, location 2, additive–additive effect, additive–dominance effect, dominance–additive effect and dominance–dominance effect).
d The notation for gene–environment interactions is (chromosome, location, additive effect and dominance effect).
e In Experiment 10, the prior number of QTLs was misspecified as five.
f In Experiment 21, the estimated number of tests is given separately for main effects, gene–gene interactions and gene–environment interactions.
The prior number of QTLs in the MCMC analysis was set to the true value, except for Experiment 10, in which the prior number was misspecified as five (greater than the number of chromosomes). For a correctly constructed MCMC algorithm, the Markov chain will converge to a unique stationary distribution that is the target posterior distribution, regardless of the initialization values, provided that a sufficiently long chain is run (Geyer, 1992). The main effect of misspecified priors is to increase the burn-in period before the Markov chain converges to the stationary distribution. The MCMC algorithm was run for 400 000 iterations, with a thinning value of 20 and no burn-in. The cost w in the Bayes error function was 0·5.
3. Results
Seven sets of simulations were performed. The first set was designed to investigate robustness to chromosome lengths and marker maps (Table 4). In Experiment 2 compared with Experiment 1, the lengths of the chromosomes differed but the number of markers on each chromosome was the same. This increased the intermarker distance on the chromosome carrying the true QTL, so that estimation of the QTL location was less precise. In Experiment 3 compared with Experiment 1, the lengths of the chromosomes differed and the numbers of markers differed such that the average intermarker distance was the same. In Experiment 4 compared with Experiment 1, chromosome 1 (which carried the single QTL) was saturated with markers. In Experiment 5 compared with Experiment 1, marker density was increased evenly across all chromosomes. In Experiment 6 compared with Experiment 1, only the interval of chromosome 1 surrounding the true QTL was saturated. The total number of markers in Experiments 4–6 was the same. The significance threshold, the pFDR and the pFNR were not sensitive to chromosome lengths or marker placement.
The second set of simulations was designed to investigate mapping main effects as a function of QTL location, effect size and sample size (Table 5). When a QTL was located close to the edge of a chromosome (Experiment 7), the interval for the QTL location was truncated, leading to smaller widths and posterior probabilities. Estimation of the true QTL location was less accurate, with the bias directed away from the edge of the chromosome. It was also more difficult to differentiate false from true positive QTLs, as can be seen from the larger pFDR estimates and smaller positive predictive values. There were more false positive QTLs, although the average posterior probability for a false positive QTL was smaller. With a larger effect size (Experiment 8), the significance threshold was markedly increased, implying a more stringent test. The pFDR was much smaller and the positive predictive value was much larger for a QTL with a larger effect. Sensitivity was reduced but specificity improved, consistent with increased stringency. Coverage and the true positive posterior probability both increased and the true positive width decreased with larger effects. The number of false positive QTLs decreased but the false positive posterior probability and the false positive width both increased. With a larger sample size (Experiment 9), similar to increasing the effect size, the significance threshold was increased, implying a more stringent test. The pFDR was much smaller and the positive predictive value was much larger for the larger sample size. Sensitivity was reduced but specificity increased, consistent with a more stringent test. Coverage and the true positive posterior probability both increased and the true positive width decreased with the larger sample size. The number of false positive QTLs decreased but the false positive posterior probability and the false positive width both increased. A misspecified prior (Experiment 10) had no effect.
The third set of simulations was designed to investigate mapping multiple QTLs (Table 6). Increases in the amount of true positive signal due to multiple true QTLs had effects as expected on all of the operating characteristics (Experiments 11 and 12). The pFDR and the number of false positive intervals were smaller with multiple QTLs. Power increased as the number of QTLs increased, or equivalently, as the proportion of true alternative hypotheses increased.
The fourth set of simulations was designed to investigate mapping gene–gene interactions (Table 7). Not surprisingly, the pFDR was much larger for gene–gene interactions than for main effects (Experiment 13). The vast majority of claims of significance were in fact false positives, but with vanishingly small posterior probabilities. On average, it was ~100 times easier to distinguish false positives from true positives for gene–gene interactions than for main effects. Increasing the effect size (Experiment 14), the number of gene–gene interactions (Experiment 15) or the sample size (Experiment 16) had the same consequences as for main effects. As expected, large sample sizes were critical for obtaining high levels of predictive values and posterior probabilities for true hypotheses (Experiment 16).
The fifth set of simulations was designed to investigate mapping gene–environment interactions (Table 8). Under the specific conditions tested, gene–environment interactions were easier to map than main effects (Experiments 17–20). Otherwise, all of the trends observed for main effects and gene–gene interactions also held for gene–environment interactions. The sixth set of simulations was designed to investigate simultaneous mapping of main effects, gene–gene interactions and gene–environment interactions (Table 9). Error estimates were slightly larger in the presence of all three types of genetic effects, but there was no impediment to this type of joint analysis (Experiment 21).
The seventh set of simulations was designed to investigate the effect of the estimated effective number of tests (Table 10). This was accomplished by solving eqn (6) for α=0·05 (Experiment 1), α=0·01 (Experiment 22) and α=0·001 (Experiment 23). As the estimated effective number of tests increased, the estimated values of the various operating characteristics converged to their respective asymptotic limits.
4. Discussion
For Mendelian traits, statistical hypothesis testing in QTL mapping is complicated by the fact that genome-wide testing involves multiple comparisons. For complex traits with multiple QTLs, a second problem arises because some unknown proportion of tests, usually many more than one, is expected to be rejected. In the latter case, traditional corrections for multiple comparisons, such as Bonferroni corrections, are generally conservative with respect to both multiplicity and correlation. More recently developed methods, such as FDR-controlling methods, retain more power, ideally gain power as the number of QTLs increases, and account for dependence among tests. In this study, optimizing predictive value led to more conservative control (1 − specificity < 0·05) and lower coverage (coverage < 0·95) than expected under a traditional experiment-wide significance level of 0·05.
Insertion of pseudomarkers at even intervals yields a nearly balanced marker map, such that the false nulls and true nulls are proportionally represented. If the map of markers and pseudomarkers is sufficiently dense, then tests of markers or intervals between markers are correlated through linkage. Thus, it is crucial to account for linkage by estimating the effective number of tests. The ‘width’ of a test defines a proper unit of testing that accounts for all markers and pseudomarkers in tight linkage. This test definition prevents the manipulation of the FDR by marker placement (Fernando et al., 2004; Chen & Storey, 2006). Herein, I describe a new implementation of a Bayesian FDR-controlling method. The proposed method extends previous ones by controlling the overall error rate by simultaneously controlling the FDR and the FNR. The proposed method accounts for the effective number of tests in a genome-wide linkage scan, although the relevant parameter α is merely a nuisance parameter and could be eliminated by integration. Critically, the proposed method accounts for the correlation among markers within a linkage region. The proposed method is applicable for continuous, binary and ordinal traits and for multiple interacting QTL mapping, i.e. with an arbitrary number of QTLs with arbitrary genetic effects.
For multiple-QTL mapping, it is unlikely that any single model sampled by the MCMC algorithm would correctly contain all QTLs. This problem is addressable through the use of Bayesian model averaging to estimate parameters for each QTL marginally. Also, the overall Bayes error decreased with multiple QTLs. With multiple QTLs, a greater proportion of alternative hypotheses are true, and pFDR-controlling methods tend to be more powerful in these situations (Storey, 2002). Since different types of genetic effects can be estimated with differing levels of accuracy and precision, it seems reasonable to optimize mapping for main effects, gene–gene interactions and gene–environment interactions separately.
The most commonly used summary statistic in Bayesian hypothesis testing is the Bayes factor, defined as the ratio of the posterior odds to the prior odds (Jeffreys, 1961; Kass & Raftery, 1995). Equivalently, the Bayes factor is the ratio of the marginal likelihoods of the data under the two hypotheses. The Bayes factor can be seen as the Bayesian counterpart of the likelihood ratio statistic, which is based on maximization (rather than integration) in frequentist analysis. Under certain regularity conditions, the difference in the Bayesian information criterion (Schwarz, 1978) between the two hypotheses is approximately equal to −2 ln(Bayes factor) (Kass & Wasserman, 1995). Thus, Bayes factors can be useful not only in a manner similar to LOD scores but also in the model selection framework.
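In symbols, for competing hypotheses H_1 and H_0 and data y, these relations are

BF_10 = Pr(y | H_1)/Pr(y | H_0) = [Pr(H_1 | y)/Pr(H_0 | y)] / [Pr(H_1)/Pr(H_0)],   ΔBIC = BIC_1 − BIC_0 ≈ −2 ln BF_10.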
On the other hand, there are disadvantages to using Bayes factors. First, the interpretation of Bayes factors, just as with the interpretation of likelihood ratio statistics in the frequentist framework, depends on a subjective significance threshold (Jeffreys, 1961; Kass & Raftery, 1995). Increasing the stringency of the threshold will increase specificity at the cost of reducing sensitivity. Minimizing the expected loss function provides an objective, data-driven solution to this problem. Second, Bayes factors depend critically upon the prior distributions. Whereas this dependency has not been particularly problematic for single-trait analysis, it appears to be problematic for testing pleiotropy in multiple-trait analysis (Banerjee et al., 2008). The prior probability that a locus affects none of the traits, i.e. the intersection of all of the null hypotheses, tends to become vanishingly small as the number of traits increases, thereby potentially yielding enormously large Bayes factors that are unstable and difficult to calibrate. In this context, analysing the posterior probability profile appears to be more tractable than analysing Bayes factors (Banerjee et al., 2008).
False positive error rates in Bayesian analyses can be affected by the specification of prior distributions. The posterior distribution of the number of QTLs is influenced by the prior distribution of the number of QTLs, whereas the Bayes factor tends to be relatively insensitive to the prior distribution of the number of QTLs (Yi & Shriner, 2008). In contrast, the prior distribution of effect sizes tends to more strongly affect false positive error rates. The point-normal prior induces shrinkage of effect sizes towards zero, which has the advantages of reducing model space by removing most loci from the model and reducing the false positive error rate, independent of whether posterior inference is made through Bayes factors or the Bayes error.
The merit of optimizing prediction rather than classification remains an open question. In this study, optimization of prediction tended to be conservative, as evidenced by sub-optimal coverage and large specificities. Maximizing the positive predictive value sacrifices detection of smaller effects in favour of higher positive predictive values for larger effects. Thus, optimizing prediction may have particular benefit in genome-wide analysis by improving the replicability of findings.
This research was supported by National Institutes of Health grant DK062710 and the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. I thank the Enabling Technology Laboratory in the Department of Mechanical Engineering, University of Alabama at Birmingham for assistance with high performance computing. I thank Howard Wiener and Grier Page for their suggestions, Śaunak Sen for critical discussion and the anonymous reviewers for their helpful comments.