Linear and generalized linear models for the detection of QTL effects on within-subject variability

Dörte Wittenburg; Volker Guiard; Friedrich Liese; Norbert Reinsch

doi:10.1017/S0016672307008968

Published online by Cambridge University Press: 21 January 2008

Dörte Wittenburg: Affiliation:
Forschungsinstitut für die Biologie landwirtschaftlicher Nutztiere, Wilhelm-Stahl-Allee 2, 18196 Dummerstorf, Germany
Volker Guiard: Affiliation:
Forschungsinstitut für die Biologie landwirtschaftlicher Nutztiere, Wilhelm-Stahl-Allee 2, 18196 Dummerstorf, Germany
Friedrich Liese: Affiliation:
Universität Rostock, Institut für Mathematik, Universitätsplatz 1, 18051 Rostock, Germany
Norbert Reinsch*: Affiliation:
Forschungsinstitut für die Biologie landwirtschaftlicher Nutztiere, Wilhelm-Stahl-Allee 2, 18196 Dummerstorf, Germany
*Corresponding author. e-mail: [email protected]

Summary

Quantitative trait loci (QTLs) may affect not only the mean of a trait but also its variability. A special aspect is the variability between multiple measured traits of genotyped animals, such as the within-litter variance of piglet birth weights. The sample variance of repeated measurements is assigned as an observation for every genotyped individual. It is shown that the conditional distribution of the non-normally distributed trait can be approximated by a gamma distribution. To detect QTL effects in the daughter design, a generalized linear model with the identity link function is applied. Suitable test statistics are constructed to test the null hypothesis H0: No QTL with effect on the within-litter variance is segregating versus HA: There is a QTL with effect on the variability of birth weight within litter. Furthermore, estimates of the QTL effect and the QTL position are introduced and discussed. The efficiency of the presented tests is compared with a test based on weighted regression. The error probability of the first type as well as the power of QTL detection are discussed and compared for the different tests.

Type: Research Article
Information: Genetics Research, Volume 89, Issue 4, August 2007, pp. 245-257

DOI: https://doi.org/10.1017/S0016672307008968
Copyright: Copyright © Cambridge University Press 2007

1. Introduction

Quantitative genetic analyses of body weight data in snails (Ros et al., 2004) suggest genetic differences in variability. Therefore, QTLs (quantitative trait loci) may affect not only the mean of a certain character but also its variability. The analysis of QTL effects on between-subject variability of a normally distributed trait was investigated by Weller & Wyler (1992). They mentioned uniformity of flowering time of plants as an example of potential economic importance, in particular when crops are harvested mechanically. Some phenotypes are expressed repeatedly by the same individual, such as the size and weight of tomatoes from the same panicle of a tomato plant. Uniformity of such repeated phenotypes may also be genetically controlled and affected by the individual's genotype.

In multiparous species, birth weight of newborns from the same litter may be regarded as a special case of a repeated phenotype of the mother. The difference compared with the tomato example is that the phenotype of the newborns is not only under maternal control but is also affected by the father's genetic contribution. Högberg & Rydhmer (2000) and Damgaard et al. (2003) considered the within-litter standard deviation of piglet birth weight and attributed it to the dam of each litter as a maternal trait. Low within-litter uniformity was considered unfavourable for sow productivity. Heritability estimates for the character were 10% (Högberg & Rydhmer, 2000) and 8% (Damgaard et al., 2003). The same trait has also been studied in rabbits (Bolet et al., 2005).

In this article it is assumed that the QTL affects the within-litter variability of a mother's progeny, i.e. in contrast to Weller & Wyler (1992) the focus is on within-subject variability. This offers the opportunity to construct a test for H₀: No QTL with effect on the within-litter variance is segregating versus H_A: There is a QTL with effect on the variability of birth weight within litter. A daughter design is considered, where genotyped females are paternal half-sibs. The sample variances of birth weights within litter are the traits to which our model is fitted. First, the QTL effect on the within-litter variance is described. Then it is shown that a generalized linear model (GLM) can be applied for QTL mapping. This GLM is contrasted with a weighted regression approach in terms of power of QTL detection by numerical simulation. Inclusion of sex effects, different experimental designs and further fields of application are part of the discussion.

2. Methods

(i) QTL effect on the within-litter variance

It is assumed that a population of pigs has two alleles at the QTL denoted by Q and q. We consider a fixed number N of sires in our study, which are drawn by chance from the population. Every sire is mated with n unrelated dams. We pick out one daughter per mating and consider her offspring's birth weight as a multiple measurement. We assume that piglet birth weights are independently and identically distributed within one litter. The birth weight consists of a fixed litter mean, the normally distributed mendelian sampling effect N(0, ½σ_polygene²) and the additive QTL effect, which is dependent on the piglet's genotype, with variance σ_QTL² and the normally distributed random deviation N(0, σ_e²).

The sample variance of weights at birth within one litter, that is the secondary observation, is taken as a trait for every daughter amounting to Nn observations. Daughters having inherited the QTL allele Q from the presumed heterozygous sire feature uniformity of birth weights. Daughters with a paternal q allele show an increased variability of birth weight. In this case, the residual deviation of piglet birth weight is multiplicatively inflated by the factor c _*∊(0, ∞). Thus, from the breeder's perspective, the positive effect of the QTL (the lower within-litter variance) is inherited with the QTL allele Q. A detailed description of the model for piglet birth weight and a further outline on the distribution of the traits are given in Appendix A.

The within-litter variance, i.e. the sample variance S _{i, j}² of birth weights within one litter, depends on the paternal QTL allele of the daughter i∊{1,…,N}, j∊{1, …,n}. The indicator function 1_{{Q},i, j} takes the value 1 if the daughter i, j has inherited the allele Q and 0 otherwise. Later, in Sections 2(ii) and 2(iv), the probability Pr(1_{{Q},i, j}=1) is determined conditional on the observed flanking marker alleles. The conditional expectation of S _{i, j}² given the inherited paternal QTL allele is

(1)

$\begin{aligned} \mathbb{E}(S_{i,j}^{2} \mid \mathbf{1}_{\{Q\},i,j} = 1) &= \tfrac{1}{2}\sigma_{polygene}^{2} + \sigma_{QTL}^{2} + \sigma_{e}^{2}, \\ \mathbb{E}(S_{i,j}^{2} \mid \mathbf{1}_{\{Q\},i,j} = 0) &= \tfrac{1}{2}\sigma_{polygene}^{2} + \sigma_{QTL}^{2} + c_{*}^{2}\sigma_{e}^{2}. \end{aligned}$

The value τ²≔½σ_polygene²+σ_e² summarizes the variance of the normally distributed effects of piglet birth weight under the condition that the sow has inherited the QTL allele Q. Similarly, τ_*²≔½σ_polygene²+(c_*σ_e)² includes the modified residual variance component, in agreement with (1). Set

(2)

$c^{2} := \frac{\mathbb{E}(S_{i,j}^{2} \mid \mathbf{1}_{\{Q\},i,j} = 0)}{\mathbb{E}(S_{i,j}^{2} \mid \mathbf{1}_{\{Q\},i,j} = 1)} = \frac{\tau_{*}^{2} + \sigma_{QTL}^{2}}{\tau^{2} + \sigma_{QTL}^{2}}.$

The parameter c² is the ratio of the expected within-litter variance if the daughter i, j has inherited the QTL allele q to the expected within-litter variance if the daughter has inherited the allele Q. If the QTL effect on the within-litter variance actually exists, then the sample variance depends on the inherited paternal QTL allele and the ratio c² is different from 1. Otherwise c² is equal to 1.
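As a small numerical illustration of (1) and (2), the following Python sketch computes the ratio c² from the variance components. The function name and the value used for σ_QTL² are assumptions made purely for illustration; the residual and polygenic variances are the ones quoted later in Section 3.

```python
def variance_ratio(c_star, sigma_e2, sigma_polygene2, sigma_qtl2):
    """Return c^2 = (tau_*^2 + sigma_QTL^2) / (tau^2 + sigma_QTL^2), as in (2)."""
    # Daughter inherited Q: residual variance is not inflated.
    tau2 = 0.5 * sigma_polygene2 + sigma_e2
    # Daughter inherited q: residual standard deviation is inflated by c_star.
    tau_star2 = 0.5 * sigma_polygene2 + (c_star ** 2) * sigma_e2
    return (tau_star2 + sigma_qtl2) / (tau2 + sigma_qtl2)


# sigma_qtl2 = 1000.0 is a hypothetical placeholder, not a value from the article.
for c_star in (1.0, 1.1, 1.2, 1.3, 1.4):
    c2 = variance_ratio(c_star, sigma_e2=200.0 ** 2,
                        sigma_polygene2=96.8 ** 2, sigma_qtl2=1000.0)
    print(c_star, round(c2 ** 0.5, 3))
```

Because the polygenic and QTL components are not inflated, c is always closer to 1 than c_* itself.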

(ii) Generalized linear model

The sires may have the marker genotype m_{l,1}m_{l,2}, where l∊{0, 1, …, κ} denotes the marker position on the chromosome. The sire's two marker alleles are denoted by m_{l,1} on his paternal chromosome and m_{l,2} on his maternal chromosome for every marker position. It is not possible to determine a priori which sire is heterozygous or homozygous at the QTL. After the sires are genotyped, we suppose that all daughters are fully informative. Therefore, we need only consider the paternal allele of the daughters. The recombination rates are calculated by Haldane's mapping function. We consider intervals flanked by markers M_l and M_{l+1} with realizations m_{l,r}m_{l+1,s}, where the subscripts r, s∊{1, 2} specify the sire's flanking marker alleles transmitted to the daughter. The transmission probability of the QTL allele Q at position d∊{0, 1, …, δ} is a function of the flanking markers M_lM_{l+1} and the paternal QTL allele. Let T_{i,j} denote the random variable that is realized by the respective transmission probability t_{i,j,d}, depending on the observed flanking marker alleles per daughter i, j at position d.
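The transmission probabilities can be sketched in Python as follows. This is a minimal illustration of standard interval-mapping arithmetic under Haldane's mapping function, not the authors' code. The function names are hypothetical, and the sketch assumes that the sire carries Q on the chromosome whose marker alleles are indexed 1 and that the putative QTL lies between two distinct flanking markers.

```python
import math


def haldane(distance_cm):
    """Recombination rate for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def transmission_prob(pos_cm, left_cm, right_cm, r, s):
    """P(daughter inherited Q | transmitted flanking marker alleles r, s in {1, 2}).

    Assumption: Q sits on the sire's chromosome carrying the alleles indexed 1.
    """
    r1 = haldane(pos_cm - left_cm)      # left marker to putative QTL
    r2 = haldane(right_cm - pos_cm)     # putative QTL to right marker
    r12 = haldane(right_cm - left_cm)   # between the two flanking markers
    if (r, s) == (1, 1):
        return (1.0 - r1) * (1.0 - r2) / (1.0 - r12)
    if (r, s) == (1, 2):
        return (1.0 - r1) * r2 / r12
    if (r, s) == (2, 1):
        return r1 * (1.0 - r2) / r12
    return r1 * r2 / (1.0 - r12)


# Putative QTL at 25 cM inside the marker interval from 20 cM to 30 cM.
for alleles in ((1, 1), (1, 2), (2, 1), (2, 2)):
    print(alleles, transmission_prob(25.0, 20.0, 30.0, *alleles))
```

If the sire instead carries Q on the chromosome indexed 2, each probability t is replaced by 1−t.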

First, one presumed QTL position d∊{0, 1, …, δ} is investigated. The observed value per daughter i, j is the realized sample variance s_{i,j}² of the piglet birth weights within one litter, i=1, …, N, j=1, …, n, and s²=(s_{1,1}², s_{1,2}², …, s_{N,n}²)^T. As shown in Appendix A, the distribution of the sample variance S_{i,j}² is approximated by a gamma distribution. Note that a gamma distributed random variable has the expectation μ and variance $\mu^{2}\frac{\phi}{w}$ with dispersion parameter φ and weight w.
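The moment structure of the gamma model can be checked with a short simulation. The sketch below relies on the textbook fact that for independent normal observations (n−1)S²/σ² is χ²-distributed with n−1 degrees of freedom, so that S² is gamma distributed with mean σ² and variance σ⁴·2/(n−1), i.e. φ=1 and w=(n−1)/2. In the article the birth weights are not exactly normal, so the gamma distribution is only an approximation there. All function names are hypothetical.

```python
import random


def gamma_shape_scale(mu, phi, w):
    """Shape and scale of a gamma law with mean mu and variance mu^2 * phi / w."""
    shape = w / phi
    scale = mu * phi / w
    return shape, scale


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def simulate_sample_variances(n_litters, litter_size, sigma, seed=1):
    """Within-litter sample variances of normal deviations with s.d. sigma."""
    rng = random.Random(seed)
    return [
        sample_variance([rng.gauss(0.0, sigma) for _ in range(litter_size)])
        for _ in range(n_litters)
    ]


# Litters of size 10 with unit variance: w = (10 - 1) / 2 and phi = 1.
shape, scale = gamma_shape_scale(mu=1.0, phi=1.0, w=4.5)
s2 = simulate_sample_variances(n_litters=5000, litter_size=10, sigma=1.0)
mean_s2 = sum(s2) / len(s2)
var_s2 = sample_variance(s2)
print("gamma mean", shape * scale, "simulated mean", mean_s2)
print("gamma variance", shape * scale ** 2, "simulated variance", var_s2)
```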

Our aim is to fit a GLM (McCullagh & Nelder, 1989) to the sample variances. To this end we introduce a multiplicative model. If the sire's genotype is Qq, then for r, s∊{1, 2} the conditional expectation of S_{i,j}² given the observed marker alleles {M_l=m_{l,r}, M_{l+1}=m_{l+1,s}} at position d is

(3)

$\begin{aligned} \mu_{i,j,d} &= \mathbb{E}(S_{i,j}^{2} \mid M_{l} = m_{l,r}, M_{l+1} = m_{l+1,s};\, d) \\ &= t_{i,j,d}(\sigma_{QTL}^{2} + \tau^{2}) + (1 - t_{i,j,d})(\sigma_{QTL}^{2} + \tau_{*}^{2}) \\ &= \sigma_{QTL}^{2} + \tau_{*}^{2} + t_{i,j,d}(\tau^{2} - \tau_{*}^{2}) \\ &=: u_{i} + b_{i} t_{i,j,d}. \end{aligned}$

The mean value u _i per sire i∊{1, …, N} is

(4)

$u_{i} = \sigma_{QTL}^{2} + \tau_{*}^{2}$

and the parameter b _i describes the relation between the observed trait s _{i, j}² per daughter and the inherited paternal QTL allele, i.e.

(5)

$\begin{aligned} b_{i} &= (\tau^{2} + \sigma_{QTL}^{2}) - (\tau_{*}^{2} + \sigma_{QTL}^{2}) \\ &= (1 - c^{2})\,(\tau^{2} + \sigma_{QTL}^{2}). \end{aligned}$

In view of (3) the sample variance is described by the model

(6)

$S_{i,j}^{2} = \{u_{i} + b_{i} T_{i,j}\} \cdot \varepsilon_{i,j}.$

The ε_{i,j} are independently gamma distributed random variables with expectation 1. The weights are defined by $w_{i,j} = \frac{n_{i,j} - 1}{2}$, where n_{i,j} denotes the litter size of daughter i, j. The identity link function is used to obtain the linear predictor η_{i,j,d}=μ_{i,j,d}. The parameter vector β consists of the regression coefficients

(7)

$\beta = (u_{1}, \ldots, u_{N}, b_{1}, \ldots, b_{N})^{T}.$

The application of GLM theory leads to estimates of the expectations μ_d=(μ_{1,1,d}, μ_{1,2,d}, …, μ_{N,n,d})^T as well as the vector β in (7) at position d∊{0, 1, …, δ}. Consequently, it is possible to construct an appropriate test statistic to check the local null hypothesis H_{0,d}: There exists no QTL at position d affecting the within-litter variance, which is equivalent to

(8)

$\begin{aligned} &H_{0,d}: \quad b_{1} = \ldots = b_{N} = 0 \quad \text{or equivalently} \quad \mu_{d} = \mu^{0} \\ \text{vs} \ &H_{A,d}: \quad b_{k} \ne b_{l} \quad \text{for some } k \ne l. \end{aligned}$

With (3) the log-likelihood function ℓ of the modelled gamma distributed random vector S²=(S_{1,1}², …, S_{N,n}²)^T can be expressed in terms of μ_d at position d. It holds that

(9)

$\begin{aligned} \ell(s^{2}, \mu_{d}, \phi_{d}) &= \sum_{i=1,j=1}^{N,n} \left\{ \frac{w_{i,j}}{\phi_{d}} \left[ -\frac{s_{i,j}^{2}}{u_{i} + b_{i} t_{i,j,d}} - \ln(u_{i} + b_{i} t_{i,j,d}) \right] + \zeta(s_{i,j}^{2}, \phi_{d}) \right\} \\ &=: l(s^{2}, \beta, \phi_{d}). \end{aligned}$

The ζ(s_{i,j}², φ_d) summarizes those components where μ_{i,j,d} does not appear and β is the vector (7). The estimate $\widehat\beta_{N,n,d} = (\widehat u_{n,1,d}, \ldots, \widehat u_{n,N,d}, \widehat b_{n,1,d}, \ldots, \widehat b_{n,N,d})^{T}$ may be obtained by iterative procedures (McCullagh & Nelder, 1989) as implemented in the ‘glm’ function of the R program (R Development Core Team, 2005). Using (4) and (5) the parameter c⁻¹ is estimated for each sire i∊{1, …, N} at the detected QTL position $\widehat d$ by

$\widehat{c_{n,i}^{-1}} = \sqrt{\frac{\widehat b_{n,i,\widehat d}}{\widehat u_{n,i,\widehat d}} + 1}.$
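The iterative procedure can be illustrated for a single sire family. The following Python sketch uses iteratively reweighted least squares (IRLS), the generic fitting scheme for GLMs: with a gamma response and identity link the working response equals the observation and the working weight is w/μ². It is a simplified stand-in for R's ‘glm’, not a reproduction of it; the function names, the starting values and the toy data are assumptions.

```python
import math


def weighted_line_fit(t, y, omega):
    """Weighted least squares fit of y = u + b * t with weights omega."""
    sw = sum(omega)
    swt = sum(o * ti for o, ti in zip(omega, t))
    swy = sum(o * yi for o, yi in zip(omega, y))
    swtt = sum(o * ti * ti for o, ti in zip(omega, t))
    swty = sum(o * ti * yi for o, ti, yi in zip(omega, t, y))
    b = (sw * swty - swt * swy) / (sw * swtt - swt ** 2)
    u = (swy - b * swt) / sw
    return u, b


def fit_gamma_identity(t, s2, w, n_iter=50, tol=1e-10):
    """IRLS for the gamma model s2 ~ u + b * t with identity link (one sire)."""
    mu = list(s2)          # start from the (positive) observations
    u = b = 0.0
    for _ in range(n_iter):
        # Working weights: prior weight divided by the variance function mu^2.
        omega = [wi / (m * m) for wi, m in zip(w, mu)]
        u_new, b_new = weighted_line_fit(t, s2, omega)
        mu = [u_new + b_new * ti for ti in t]
        if min(mu) <= 0.0:
            raise ValueError("fitted mean left the positive range")
        converged = abs(u_new - u) + abs(b_new - b) < tol
        u, b = u_new, b_new
        if converged:
            break
    return u, b


def c_inverse(u_hat, b_hat):
    """Estimate of 1/c from the fitted coefficients of one sire."""
    return math.sqrt(b_hat / u_hat + 1.0)


# Toy data lying exactly on the line 3 - t (hypothetical values).
t_demo = [0.1, 0.9, 0.5, 0.2, 0.8]
s2_demo = [3.0 - ti for ti in t_demo]
w_demo = [4.5] * 5
u_hat, b_hat = fit_gamma_identity(t_demo, s2_demo, w_demo)
print(u_hat, b_hat, c_inverse(u_hat, b_hat))
```

The identity link does not guarantee positive fitted means, which is why the sketch stops when a fitted value becomes non-positive.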

For asymptotic investigations of the estimator $\widehat\beta_{N,n,d}$, some special matrices are needed. The design matrix $\mathcal{X}_{d}$, which contains the transmission probabilities at position d, is

(10)

$\mathcal{X}_{d} = \begin{pmatrix} 1 & 0 & \cdots & t_{1,1,d} & 0 & \cdots \\ 1 & 0 & \cdots & t_{1,2,d} & 0 & \cdots \\ \vdots & \vdots & \cdots & \vdots & \vdots & \cdots \\ 1 & 0 & \cdots & t_{1,n,d} & 0 & \cdots \\ 0 & 1 & \cdots & 0 & t_{2,1,d} & \cdots \\ \vdots & \vdots & \cdots & \vdots & \vdots & \cdots \\ 0 & 1 & \cdots & 0 & t_{2,n,d} & \cdots \\ \vdots & \vdots & \cdots & \vdots & \vdots & \cdots \end{pmatrix}.$

Let $\mathcal{W}$ be the diagonal weight matrix with elements $w_{i,j} = \frac{n_{i,j} - 1}{2}$. Then it follows from (9) that the Fisher information matrix $\mathcal{I}(\beta, d)$ of the conditional distribution at position d is

$\mathcal{I}(\beta, d) = -\mathbb{E}[\nabla\nabla^{T} l(S^{2}, \beta, \phi_{d}) \mid M_{l} = m_{l,r}, M_{l+1} = m_{l+1,s};\, d] = \frac{1}{\phi_{d}} \mathcal{X}_{d}^{T} \mathcal{W} \mathcal{X}_{d}.$

Moreover, let $\mathcal{G}_{d} \in \mathbb{R}^{2N \times 2N}$ be the root of $\mathcal{I}^{-1}(\beta, d)$ defined by $\mathcal{G}_{d}^{T}\mathcal{G}_{d} = \mathcal{I}^{-1}(\beta, d)$. Under some conditions (Fahrmeir & Kaufmann, 1985), which can be shown to be satisfied (see Supplementary Appendixes), the ML estimator $\widehat\beta_{N,n,d}$ is asymptotically normal, i.e.

(11)

$\mathcal{G}_{d}^{-T}(\widehat\beta_{N,n,d} - \beta) \to_{\mathcal{D}} N(0, \mathcal{I}_{2N}) \quad \text{as } n \to \infty,$

where $\to_{\mathcal{D}}$ denotes convergence in distribution.

Under the null hypothesis (8) the model reduces to S _{i, j}²=u _i⋅ε_{i, j}. It can be shown that the dispersion parameter φ_d⁰=φ⁰ is approximately 1 under the null hypothesis for every position d (see Supplementary Appendixes). Thus, the likelihood function of S ² at φ⁰=1 is constant for every position d under H _{0, d}.

(iii) Test statistics in the GLM

Four different types of test statistics, differing among other things in their treatment of the dispersion parameter, are described in detail below. Three of them are later compared via simulation (Section 3).

The estimate of the expectation vector μ_d is defined by $\widehat\mu_{N,n,d} = \mathcal{X}_{d}\widehat\beta_{N,n,d}$. With the log-likelihood function $\ell(s^{2}, \widehat\mu_{N,n,d}, \phi_{d})$ in (9) the scaled deviance D* for a fixed value of the dispersion parameter φ_d is defined by (McCullagh & Nelder, 1989)

$D^{*}(s^{2}, \widehat\mu_{N,n,d}, \phi_{d}) := 2[\ell(s^{2}, s^{2}, \phi_{d}) - \ell(s^{2}, \widehat\mu_{N,n,d}, \phi_{d})].$

The deviance D is characterized by $D(s^{2}, \widehat\mu_{N,n,d}) := \phi_{d} D^{*}(s^{2}, \widehat\mu_{N,n,d}, \phi_{d})$. Under the null hypothesis H_{0,d} and for a fixed value φ_d the likelihood ratio statistic is asymptotically χ²-distributed with N degrees of freedom (Fahrmeir & Tutz, 1994)

(12)

$2[l(S^{2}, \widehat\beta_{N,n,d}, \phi_{d}) - l(S^{2}, \widehat\beta_{N,n}^{0}, \phi_{d})] \overset{H_{0,d}}{\longrightarrow}_{\mathcal{D}} \chi_{N}^{2} \quad \text{as } n \to \infty,$

where $\widehat\beta_{N,n}^{0}$ is the ML estimate under the null hypothesis. With the log-likelihood function ℓ(s², μ_d, φ_d)=l(s², β, φ_d) in (9), the statement (12) is equivalent to

(13)

$D^{*}(S^{2}, \widehat\mu_{N,n}^{0}, \phi_{d}) - D^{*}(S^{2}, \widehat\mu_{N,n,d}, \phi_{d}) \overset{H_{0,d}}{\longrightarrow}_{\mathcal{D}} \chi_{N}^{2} \quad \text{as } n \to \infty.$

If φ_d=1 is satisfied, then

(14)

$D(S^{2}, \widehat\mu_{N,n}^{0}) - D(S^{2}, \widehat\mu_{N,n,d}) \overset{H_{0,d}}{\longrightarrow}_{\mathcal{D}} \chi_{N}^{2} \quad \text{as } n \to \infty.$

The generalized Pearson estimator for the dispersion parameter φ_d at position d∊{0, 1, …, δ} is defined by (e.g. Fahrmeir & Tutz, 1994)

$\widehat\phi_{N,n,d} = \frac{1}{Nn - 2N} \sum_{i=1,j=1}^{N,n} w_{i,j} \left( \frac{s_{i,j}^{2} - \widehat\mu_{N,n,i,j,d}}{\widehat\mu_{N,n,i,j,d}} \right)^{2}.$

This estimator is consistent and approximately χ²-distributed (Fahrmeir & Tutz, 1994). If φ_d is replaced by a consistent estimator in (12), then this statement remains valid. According to Jørgensen (1987) it holds that

(15)

$\frac{1}{\widehat\phi_{N,n,d}}[D(S^{2}, \widehat\mu_{N,n}^{0}) - D(S^{2}, \widehat\mu_{N,n,d})] \overset{H_{0,d}}{\longrightarrow}_{\mathcal{D}} \chi_{N}^{2} \quad \text{as } n \to \infty.$

Similarly, the deviance estimator $\tilde{\phi}_{N,n,d}$ is usually applied to estimate the dispersion parameter,

$\tilde{\phi}_{N,n,d} = \frac{D(s^{2}, \widehat\mu_{N,n,d})}{Nn - 2N}.$

Note that $\tilde{\phi}_{N,n,d}$ is not necessarily consistent. Using this deviance estimator the distribution of the left-hand term in (13) is approximated by the F-distribution with N and Nn−2N degrees of freedom (Jørgensen, 1987),

(16)

$\frac{D(S^{2}, \widehat\mu_{N,n}^{0}) - D(S^{2}, \widehat\mu_{N,n,d})}{D(S^{2}, \widehat\mu_{N,n,d})} \cdot \frac{Nn - 2N}{N} \overset{H_{0,d}}{\approx}_{\mathcal{D}} F_{N,Nn-2N}.$

To test the local null hypothesis H_{0,d}: μ_d=μ⁰ in (8) there are at least four natural test statistics:

$\begin{aligned} \text{according to (13)} \quad & L_{d}^{*} = D^{*}(s^{2}, \widehat\mu_{N,n}^{0}, \phi_{d}) - D^{*}(s^{2}, \widehat\mu_{N,n,d}, \phi_{d}), \\ \text{according to (14)} \quad & L_{d} = D(s^{2}, \widehat\mu_{N,n}^{0}) - D(s^{2}, \widehat\mu_{N,n,d}), \\ \text{according to (15)} \quad & \widehat L_{d} = \frac{1}{\widehat\phi_{N,n,d}}[D(s^{2}, \widehat\mu_{N,n}^{0}) - D(s^{2}, \widehat\mu_{N,n,d})], \\ \text{according to (16)} \quad & \tilde{F}_{d} = \frac{D(s^{2}, \widehat\mu_{N,n}^{0}) - D(s^{2}, \widehat\mu_{N,n,d})}{D(s^{2}, \widehat\mu_{N,n,d})} \cdot \frac{Nn - 2N}{N}. \end{aligned}$

If φ_d=1 is not fulfilled, then the tests based on $\widehat L_{d}$ and $\tilde{F}_{d}$ can be expected to have more QTL detection power. The threshold value under local investigations is given by the 95% quantile of the corresponding distribution in (13), (14), (15) and (16) of the test statistics L_d*, L_d, $\widehat L_{d}$ and $\tilde{F}_{d}$, respectively.
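The three statistics that do not require a known dispersion parameter can be sketched in Python as follows. Inserting μ=s² for the saturated model in (9) gives the gamma deviance D = 2 Σ w_{i,j}[(s_{i,j}² − μ_{i,j})/μ_{i,j} − ln(s_{i,j}²/μ_{i,j})], which the sketch uses directly. The function names and the toy numbers are assumptions; the fitted means under the null and the full model are taken as given inputs.

```python
import math


def gamma_deviance(s2, mu, w):
    """Deviance D = 2 * sum w * [(s2 - mu) / mu - ln(s2 / mu)] of the gamma model."""
    return 2.0 * sum(
        wi * ((y - m) / m - math.log(y / m)) for y, m, wi in zip(s2, mu, w)
    )


def pearson_dispersion(s2, mu, w, n_params):
    """Generalized Pearson estimate of the dispersion parameter."""
    total = sum(wi * ((y - m) / m) ** 2 for y, m, wi in zip(s2, mu, w))
    return total / (len(s2) - n_params)


def local_test_statistics(s2, mu_null, mu_full, w, n_sires):
    """Return (L_d, L_hat_d, F_tilde_d) for one putative QTL position d."""
    d_null = gamma_deviance(s2, mu_null, w)
    d_full = gamma_deviance(s2, mu_full, w)
    df_resid = len(s2) - 2 * n_sires            # Nn - 2N
    l_d = d_null - d_full
    l_hat_d = l_d / pearson_dispersion(s2, mu_full, w, 2 * n_sires)
    f_tilde_d = (l_d / d_full) * df_resid / n_sires
    return l_d, l_hat_d, f_tilde_d


# Toy example with one sire and six daughters (hypothetical numbers).
s2_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w_toy = [1.0] * 6
mu_null_toy = [3.5] * 6                          # common family mean under H_{0,d}
mu_full_toy = [1.2, 2.1, 2.9, 4.2, 4.8, 6.1]     # stand-in for fitted u + b * t
print(local_test_statistics(s2_toy, mu_null_toy, mu_full_toy, w_toy, n_sires=1))
```

Under H_{0,d} each sire family has a single mean parameter, and for this intercept-only gamma model the ML estimate is the weighted family mean Σ w s² / Σ w, which equals 3·5 for the toy data.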

We now consider the global hypothesis testing problem H ₀: There exists no QTL on the chromosome with effect on the within-litter variance, which is equivalent to

(17)

$\begin{aligned} &H_{0}: \quad b_{1} = \ldots = b_{N} = 0 \quad \text{or equivalently} \quad \mu_{d} = \mu^{0} \quad \text{for } d = 0, 1, \ldots, \delta \\ \text{vs} \ &H_{A}: \quad b_{k} \ne b_{l} \quad \text{for some } k \ne l. \end{aligned}$

The following test statistics are appropriate:

(18)

$L^{*} = \max_{d \in \{0, 1, \ldots, \delta\}} L_{d}^{*},$

(19)

$L = \max_{d \in \{0, 1, \ldots, \delta\}} L_{d},$

(20)

$\widehat L = \max_{d \in \{0, 1, \ldots, \delta\}} \widehat L_{d},$

(21)

$\tilde{F} = \max_{d \in \{0, 1, \ldots, \delta\}} \tilde{F}_{d}.$

The null hypothesis (17) is rejected for large values of the corresponding test statistic. The theoretical distribution of the presented test statistics is unknown because of marker dependencies. Thus, to find the threshold we use the permutation test approach (Churchill & Doerge, 1994). Properties of these tests will be given in Section 3.
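A permutation threshold for such a maximum statistic can be sketched generically in Python. The sketch shuffles the sample variances relative to the fixed marker data and takes an empirical quantile of the recomputed statistic. Shuffling within sire families is an assumption made here; the article does not state the permutation scheme in this section. The function names and the stand-in statistic are hypothetical.

```python
import math
import random


def permutation_threshold(families, statistic, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of `statistic` under within-family shuffling.

    families  : list of lists with the sample variances of each sire's daughters
    statistic : callable mapping such a list of lists to the global test
                statistic (the maximum of the local statistics over positions);
                the marker information is held fixed inside this callable
    """
    rng = random.Random(seed)
    null_values = []
    for _ in range(n_perm):
        shuffled = []
        for fam in families:
            fam_copy = list(fam)
            rng.shuffle(fam_copy)     # breaks the trait-marker association
            shuffled.append(fam_copy)
        null_values.append(statistic(shuffled))
    null_values.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return null_values[index]


# Stand-in statistic: absolute weighted contrast with fixed "transmission
# probabilities" (hypothetical; a real analysis would refit the model at each position).
t_fixed = [[0.9, 0.1, 0.8, 0.2], [0.7, 0.3, 0.95, 0.05]]
families_demo = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]


def demo_statistic(fams):
    return abs(sum((ti - 0.5) * y
                   for fam_t, fam in zip(t_fixed, fams)
                   for ti, y in zip(fam_t, fam)))


print(permutation_threshold(families_demo, demo_statistic, n_perm=200))
```

The observed global statistic is then compared with the returned threshold, and H₀ is rejected if it is larger.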

Furthermore, the QTL position is estimated by maximum likelihood as the position at which the value of the test statistic is maximal, e.g.

(22)

$\widehat d \in \arg\max_{d \in \{0, 1, \ldots, \delta\}} L_{d}^{*}.$

(iv) Weighted regression

Using ideas from Haley & Knott (1992) we introduce a weighted regression model and construct the test statistic for the global null hypothesis in (17). Later, in Section 3, the two approaches will be compared by computer simulations. Ros et al. (2004) recommended the use of a log-transformation on skew distributed traits (Box & Cox, 1964). After taking the logarithm of the sample variances, the data are approximated by a normal distribution (see Supplementary Appendixes) and a linear model (LM) is constructed. Note that under the log-transformation the multiplicative effect on the within-litter variance becomes additive.

Considering the observed marker alleles m _{l, r}m _{l+1, s} of individual i, j and the parameter c ² in (2) the conditional expectation given the flanking marker alleles is

(23)

$\mathbb{E}(\ln S_{i,j}^{2} \mid M_{l} = m_{l,r}, M_{l+1} = m_{l+1,s};\, d) \approx \ln(\tau_{*}^{2} + \sigma_{QTL}^{2}) - t_{i,j,d} \ln c^{2}.$

Thus, the LM at a fixed position d∊{0,1, …,δ} is defined by

(24)

$\ln S_{i,j}^{2} = u_{i} + b_{i} T_{i,j} + \varepsilon_{i,j}.$

Here the ε_{i,j} are normally distributed random variables with expectation zero and the T_{i,j} are the transmission probabilities as explained in Section 2(ii). The parameter u_i is the mean value per family and b_i describes the linear connection between the observations ln s_{i,j}² and the inherited paternal QTL allele expressed by the individual transmission probabilities. A standard (weighted) regression analysis was carried out in order to estimate the parameter vector β, similar to (7), for every position d on the chromosome. For weighted regression we refer to Seber (1977). To check the local null hypothesis H_{0,d} in (8), a test statistic F_d is constructed, which is a function of the residual sums of squares of the full and reduced models. This statistic is approximately F-distributed under H_{0,d} with N and Nn−2N degrees of freedom (Seber, 1977). To test the global null hypothesis H₀ in (17) the permutation test is used again to determine the threshold value. The suitable test statistic is

(25)

$F = \max_{d \in \{0, 1, \ldots, \delta\}} F_{d}.$

Under the assumption that the sire i∊{1, …, N} has the genotype Qq, it follows from (23) that b_i=−ln c². Thus, the parameter c⁻¹ is estimated at the detected QTL position by

$\widehat{c_{n,i}^{-1}} = \sqrt{\exp\{\widehat b_{n,i,\widehat d}\}} \quad \text{for } i = 1, \ldots, N.$
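For comparison with the GLM, the LM estimate for one sire family can be sketched in Python as a weighted least squares fit of ln s² on the transmission probabilities, followed by the back-transformation above. The choice of weights is an assumption (the weights are not specified in this section), as are the function names and the toy data.

```python
import math


def fit_log_variance(t, s2, w):
    """Weighted least squares fit of ln(s2) = u + b * t for one sire family."""
    y = [math.log(v) for v in s2]
    sw = sum(w)
    t_bar = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (ti - t_bar) * (yi - y_bar) for wi, ti, yi in zip(w, t, y))
    sxx = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, t))
    b = sxy / sxx
    u = y_bar - b * t_bar
    return u, b


def c_inverse_lm(b_hat):
    """Estimate of 1/c from the slope, using b = -ln(c^2)."""
    return math.sqrt(math.exp(b_hat))


# Noise-free toy data generated with c = 1.2 (hypothetical values).
c_true = 1.2
t_lm = [0.05, 0.95, 0.5, 0.1, 0.9, 0.3]
w_lm = [4.5, 5.0, 3.5, 4.0, 5.5, 4.5]       # assumed weights (n_ij - 1) / 2
s2_lm = [math.exp(11.0 - ti * math.log(c_true ** 2)) for ti in t_lm]
u_lm, b_lm = fit_log_variance(t_lm, s2_lm, w_lm)
print(u_lm, b_lm, c_inverse_lm(b_lm))
```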

3. Simulation studies

When genotyping the individuals of the population, we set markers at intervals of 10 centiMorgan (cM) on a chromosome of length 100 cM (δ=99). We thus have 11 markers at our disposal (κ=10) and the local test statistics are evaluated in steps of 1 cM. Under the null hypothesis H₀, no QTL is segregating in the population. To model the alternative hypothesis we placed a single QTL at position 25 cM (between the third and fourth marker). In the simulation study we used N=4 sires and n=200 daughters per sire. The litter size is Poisson distributed (Thomson, 2003) with a mean value of 10. The transformed weight at birth X_{i,j,k} in (A.4) is simulated. The standard deviation of piglet birth weight is assumed to be 320 g (e.g. Roehe, 1999). Similar to Roehe (1999) the residual variance is about 40% of the phenotypic variance, σ_e²=(200 g)². The direct polygenic variance is about 9% of the phenotypic variance, σ_polygene²=(96·8 g)². The value of the transformed additive QTL effect of piglet i, j, k is listed in Table 2 and depends on the piglet's genotype. Because the variance of the additive QTL effect accounts for about 1–3% of the phenotypic variance (e.g. Bidanel et al., 2001), the additive value is a=61 g. The factor c_* varies from 1 to 1·4 in steps of 0·1. The gene frequency is assumed to be one-half. Covariances between the maternal effects and the direct effects of the piglet are neglected. The marker alleles m_{l,r}, l=0, 1, …, 10, are drawn by chance according to the recombination rates. The simulation was repeated 100 times for every investigated factor c_*. Ten thousand permutations of the first simulated sample variances were used to determine the chromosome-wise threshold value. This critical value was also applied to the following 99 repeated simulations of sample variances. The simulations and tests were carried out with ‘glm’ and self-written functions of the R program (R Development Core Team, 2005).
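The quoted variance shares can be checked with a few lines of Python. The sketch assumes that the QTL share refers to the usual additive variance 2p(1−p)a² of a biallelic locus without dominance; this reading, like the variable names, is an assumption and not stated explicitly in the text.

```python
phenotypic_sd = 320.0       # g, assumed standard deviation of piglet birth weight
sigma_e = 200.0             # g, residual standard deviation
sigma_polygene = 96.8       # g, direct polygenic standard deviation
additive_value = 61.0       # g, additive QTL value a
p_q = 0.5                   # gene frequency


def additive_qtl_variance(p, a):
    """Additive variance 2 p (1 - p) a^2 of a biallelic locus without dominance."""
    return 2.0 * p * (1.0 - p) * a * a


phenotypic_var = phenotypic_sd ** 2
shares = {
    "residual": sigma_e ** 2 / phenotypic_var,
    "polygenic": sigma_polygene ** 2 / phenotypic_var,
    "qtl": additive_qtl_variance(p_q, additive_value) / phenotypic_var,
}
for name, share in shares.items():
    print(name, round(100.0 * share, 1), "% of the phenotypic variance")
```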

Table 1. Calculation of genotype frequencies within litter; presumed gene frequency p _Q=0·5; dominance effect is omitted; a denotes the additive value

Table 2. Calculation of various expectations of the transformed QTL effect _i,j,k,m=G _i,j,k−(G _i,j,k|B=B _m); dominance effect is omitted; a denotes the additive value and σ_QTL² denotes the variance of the QTL effect G _i,j,k

Fig. 1 a shows the density estimates of the conditional distributions of S_{i,j}² obtained with the ‘density’ function with a Gaussian kernel in R. This figure shows how the densities depend on the inherited paternal QTL allele, as pointed out in Appendix A.

Fig. 1. QTL simulated at 25 cM, c _*=1·2. (a) Estimation of densities separated by paternal QTL allele Q and q; (b) average values of test statistic based on the GLM and 100 repetitions.

(i) Results based on the GLM

Examples of the results of analysing the simulated sample variances with the GLM are shown for the factor c_*=1·2 (c=1·177) in Figs. 1 b and 2 a, b. Fig. 1 b displays the average values of the test statistic in (20). The maximum of these values is attained at about 25 cM, where the QTL was actually simulated. Fig. 2 a is a histogram of the detected QTL positions (22) in those repetitions in which the null hypothesis was rejected. The estimated positions closely surround the correct position and deviate by only about 5–10 cM. In Fig. 2 b it is conspicuous that the estimates $\widehat{c_{n,i}^{-1}}$ split into three groups. Depending on the linkage of marker and QTL alleles, the values fluctuate around the parameters c or c⁻¹ for heterozygous sires with genotype qQ and Qq, respectively. Otherwise, in the case of homozygous sires, the estimates vary around 1. The other GLM-based test statistics give similar results, as shown in Table 3.

Fig. 2. QTL simulated at 25 cM, N=4 sires, n=200 daughters, c_*=1·2 (c=1·177). (a) Detected QTL positions based on a GLM test statistic; (b) histogram of the estimator $\widehat{c_{n,i}^{-1}}$ based on the GLM; (c) detected QTL positions based on the LM with test statistic F; (d) histogram of the estimator $\widehat{c_{n,i}^{-1}}$ based on the LM.

Table 3. Summary of simulation results (10% of repetitions with exclusively homozygous sires); power_p_emp denotes the empirical pointwise power evaluated at the simulated QTL position at 25 cM using tabulated quantiles of the χ²- and F-distributions; power_g_emp is the empirical global power; mean_detec is the average of the detected QTL positions and variance_detec is the sample variance of the estimated positions; the statistic F in (25) is based on the LM, and L in (19) and the statistics in (20) and (21) are based on the GLM

(ii) Results based on the LM

Evaluating the simulated sample variances with the test statistic F in (25) gives results that differ slightly from those obtained with the GLM-based test statistics. For the weighted regression model we used a maximization as in (22) to estimate the QTL position. Fig. 2 c gives a histogram of the estimated QTL positions in those repetitions in which H₀ was rejected. The fluctuation around the simulated QTL position at 25 cM is similar to that for the GLM. Fig. 2 d displays a histogram of the estimator $\widehat{c_{n,i}^{-1}}$ and shows again that the estimator has a mixed distribution.

Table 3 summarizes the results achieved using the test statistic L in (19), the statistics in (20) and (21), and F in (25). The empirical QTL detection power is determined by the relative frequency of rejecting the null hypothesis. As expected, the empirical power increases with increasing factor c_*. The empirical global power should not exceed 90%, because 10% of the repeated simulations happened to create exclusively homozygous sires (all N=4 sires homozygous), in which case a QTL effect on the within-litter variance cannot be detected. Under the null hypothesis H₀ (c=1) and chromosome-wise testing, the α level of 5% holds for all investigated test statistics except one of the GLM-based statistics. This may be due to the approximate permutation test with only 10 000 re-samples.
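As a rough plausibility check on the share of uninformative repetitions, the probability that all sires are homozygous can be computed directly. The sketch below assumes Hardy–Weinberg genotype proportions for the sires, which the text does not state explicitly; the function name is hypothetical.

```python
def prob_all_sires_homozygous(p_Q, n_sires):
    """Probability that every one of n_sires is QQ or qq (Hardy-Weinberg assumed)."""
    p_homozygous = p_Q ** 2 + (1.0 - p_Q) ** 2
    return p_homozygous ** n_sires

prob = prob_all_sires_homozygous(0.5, 4)   # 0.5 ** 4 = 0.0625
expected_in_100 = 100 * prob               # 6.25 repetitions expected out of 100
```

Under this assumption about 6 of 100 repetitions would be expected to contain only homozygous sires, so the observed 10 repetitions would be a somewhat high but not implausible realization.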

For c_* values of 1·2 and larger, where the QTL detection power is already very high, all tests perform equally well. However, for c_*=1·1 or, equivalently, c=1·088 the GLM clearly outperforms the weighted regression approach and provides an extra gain of 12% in empirical global power. Table 3 shows that the tests based on the GLM provide a higher empirical QTL detection power than the test based on weighted regression. Similar results are obtained for the local tests at d=25 cM (see Table 3).

4. Discussion

(i) LM versus GLM

Applying the test based on the LM requires far less computing time than the tests based on the GLM (2 hours vs 17 hours for simulating the dataset and running the tests on a PC with a 3 GHz Intel processor). The additional computing time for the GLM is, however, small compared with the total time and costs usually required for QTL experiments. The benefit of the GLM with respect to empirical global power is obvious when c_*=1·1; the GLM-based test statistics should be favoured. The presented test statistics remain suitable when the number of daughters per sire varies. To avoid a loss of QTL detection power, a smaller number of daughters per sire may be compensated for by considering more half-sib families. It may be worth mentioning that using the log link instead of the identity link function leads to slightly lower QTL detection power (Wittenburg, 2005).

Thomson (2003) and Lange & Whittaker (2001) addressed the detection of QTLs for non-normal traits. Thomson (2003) proposed a model for non-normal data types using a normal-based profile log-likelihood and solving generalized estimating equations. These methods of parameter estimation are comparable to the techniques described above. The essential difference is that the present work employs the approximate gamma distribution of S_{i,j}² and develops the appropriate profile log-likelihood for the estimation of the QTL position parameter d.

Standard regression interval mapping has been shown to be a robust method for non-normally distributed continuous traits in comparison with non-parametric approaches (Rebaï, 1997). Kadarmideen et al. (2000) found similar QTL detection power with LM and GLM for binary traits. However, Yin & Zhang (2006) demonstrated for ordinal data that the GLM outperformed the LM in terms of QTL detection power. Therefore, our results provide another example in which a GLM should be preferred for QTL mapping.

(ii) Analysis of heterogeneous variances

Several authors have dealt with heterogeneity of variances. Mixed Gaussian models for the within-litter standard deviation have already been mentioned (Högberg & Rydhmer, 2000; Damgaard et al., 2003). Foulley et al. (1990) put forward a log-linear model for residual variances in order to identify sources of heterogeneity, an idea which has been further extended by SanCristobal et al. (1993), Foulley & Quaas (1995) and SanCristobal-Gaudy et al. (1998) by including heterogeneous genetic components of variance and random factors affecting variances. A Bayesian approach jointly considering genetic effects on mean and residual variance was developed by Sorensen & Waagepetersen (2003). Other extensions comprise mean–variance relationships (Foulley, 2004) in models that also allow for effects of explanatory variables on variance components. The latter approach in particular would presumably be worth investigating as an alternative to the methods presented.

(iii) Ambiguity of the parameter c ²

The parameter c ² denotes the ratio of within-litter variance of q-daughters compared with Q-daughters. If the parameter c is significantly different from 1, it is ambiguous whether the within-litter variance is affected by a raised residual variance, by an enlarged polygenic variance or by an increased QTL variance (see equation (1)). When a non-significant result is observed, a constant within-litter variance could also be generated by, for example, an increased residual variance and decreased polygenic variance. Therefore, it should be kept in mind that the parameter c ² is just a cumulative effect for any changes in the components of the within-litter variance.

(iv) Granddaughter design

The applied model for the daughter design can be extended, with some modifications, to the granddaughter design. Consider a fixed number of grandsires, which are mated with unrelated granddams of the population. We select one son per mating and these sons are mated with unrelated dams. One daughter per mating is chosen to analyse the within-litter variance. For each granddaughter we have to calculate the sample variance of observed birth weights within one litter. We may assign the sample variances pooled over all daughters as observation for every sire and apply the techniques of Section 2 with adjusted degrees of freedom (number of total piglets minus number of daughters) in the matrix of weights.

(v) Sex effect

Up to now it has been assumed that no sex effect occurs on the piglet's mean value or on the variability of phenotypes. But the expected phenotypic value of male piglets may be larger than for female piglets. To consider such an effect, the model (A.1) in Appendix A has to be adjusted. Three different scenarios are possible: (i) No sex effect on the mean and variability of the phenotypic value exists. Therefore, the observed value per daughter consists of the sample variance of all weights at birth within one litter (degrees of freedom: litter size minus 1). (ii) A sex effect acts on the expected phenotypic value but not on its variability. Thus, the observed value per daughter is the sample variance of birth weights pooled over male and female progeny (degrees of freedom: litter size minus 2). (iii) The sex affects the variability of the phenotypic value. Thus, there are two observed values per litter: the respective sample variances of male and female progeny. In this case it is possible to test a QTL effect as well as a sex by QTL interaction.
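The three scenarios differ only in how the observation per litter is formed. The following Python sketch computes each variant for one litter; the function name and the example weights are hypothetical, and the pooled variance uses the standard degrees-of-freedom weighting, which is assumed to be what the text intends.

```python
import statistics

def litter_observations(male_weights, female_weights):
    """Observed values per litter under scenarios (i), (ii) and (iii).

    Each sex needs at least two piglets for its own sample variance.
    """
    n_m, n_f = len(male_weights), len(female_weights)
    var_all = statistics.variance(male_weights + female_weights)   # (i): df = n - 1
    var_m = statistics.variance(male_weights)                       # (iii): df = n_m - 1
    var_f = statistics.variance(female_weights)                     # (iii): df = n_f - 1
    # (ii): pooled over sexes, so a sex difference in the mean is removed; df = n - 2
    pooled = ((n_m - 1) * var_m + (n_f - 1) * var_f) / (n_m + n_f - 2)
    return {
        "no_sex_effect": (var_all, n_m + n_f - 1),
        "sex_effect_on_mean": (pooled, n_m + n_f - 2),
        "sex_effect_on_variance": ((var_m, n_m - 1), (var_f, n_f - 1)),
    }

# Hypothetical birth weights in g: males are heavier on average
obs = litter_observations([1500.0, 1600.0, 1700.0], [1300.0, 1400.0, 1500.0])
```

In this example the unpooled variance (20 000 g²) is twice the pooled one (10 000 g²), because the former also absorbs the difference between the sex means.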

(vi) Other fields of application

A second series of simulations was started with multiple measurements taken from the genotyped individuals themselves (i.e. the daughters), such as the withers height of cows in a daughter design. In this case the theory simplifies, because the phenotypic value depends only on the individual's own genotype and not on any paternal QTL allele in progeny. Thus, the model for the primary trait consists only of the mean value within daughter i, j and the normally distributed residual deviation, which is modified by c∊(0, ∞) if the daughter has inherited the QTL allele q. The value c appears directly in this model. Therefore, the parameter c ² still denotes the ratio of variance within individual of q-daughters to Q-daughters. Note that in this case the distribution of S _{i, j}² is exactly gamma.
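The exactness claimed here follows from the standard result for normal samples. As a brief worked statement, assuming the n repeated measurements within a daughter are independent and normal with common variance σ² (σ_e² for Q-daughters and c²σ_e² for q-daughters), and using the parametrization of the gamma distribution by expectation μ and variance μ²/ν from Appendix A:

```latex
% Assumption: X_{i,j,1},...,X_{i,j,n} i.i.d. N(m, \sigma^2) within daughter (i,j)
\frac{(n-1)\,S_{i,j}^{2}}{\sigma^{2}} \sim \chi^{2}_{n-1}
\quad\Longrightarrow\quad
S_{i,j}^{2} \sim \Gamma_{\mu,\nu}
\ \text{with}\ \mu = \sigma^{2},\quad \nu = \frac{n-1}{2},\quad
\mathbb{V}\bigl(S_{i,j}^{2}\bigr) = \frac{\mu^{2}}{\nu} = \frac{2\sigma^{4}}{n-1}.
```

This is the special case of (A.6) with σ_QTL²=0, for which ν reduces to (n−1)/2.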

In the simulated example, the withers height of cows was measured 10 times in a daughter design with N=4 sires and n=200 daughters per sire. The simulated QTL effect of c=1·2 was detected in 94% of the repetitions with both the test statistic F and the GLM-based statistic (six repetitions created exclusively homozygous sires). When the withers height was measured three times, the simulated QTL effect of c=1·3 was detected in 70% of the repetitions based on the test statistic F and in 85% based on the GLM-based statistic (seven repetitions with only homozygous sires).

In plants and laboratory animals, a panel of recombinant inbred lines (RILs; e.g. Broman, 2005) can be produced for QTL mapping purposes. As all members of a certain RIL share the same genotype but may vary in their phenotype, RILs can serve as a well-suited tool for mapping QTL effects on within-genotype variability. Essentially, a panel of RILs can be treated with the methods presented in this article when the data are analysed as a single half-sib family (or backcross) in the same way as in the cow example.

There are possible applications in plants which closely resemble the repeated measurements of withers height of cows. For example, one could examine some characteristic of tomato fruits as a multiple measurement of a tomato plant. Again, in this application the theory simplifies because of lack of a paternal genetic effect on the fruits (tomatoes are almost purely maternal tissue). Moreover, the sample variances per panicle may be pooled over all panicles to generate one observation per tomato plant and to consider some effect of panicle on the phenotypic mean of the fruits.

(vii) Gene frequency

The GLM (6) and LM (24) do not include a parameter for the gene frequency p_Q of the QTL allele Q; in the simulations the gene frequency was assumed to be one-half. Among the components affected by p_Q, the ratio c² in (2) is the obvious one. The parameter c² depends on the variance of the transformed QTL effect $\tilde{G}_{i,j,k}$ in (A.2) of Appendix A, which is calculated on the basis of known genotype frequencies within one litter (Table 2). Thus, if p_Q differs substantially from one-half, the condition $\mathbb{V}(\tilde{G}_{i,j,k}\mid \mathbf{1}_{\{Q\},i,j}=1)=\mathbb{V}(\tilde{G}_{i,j,k}\mid \mathbf{1}_{\{Q\},i,j}=0)=\sigma_{QTL}^{2}$ is no longer satisfied. Consequently, the ratio c² deviates from 1 even though a QTL effect on the within-litter variance does not exist. To consider the consequences of a gene frequency different from one-half, the simulations under the null hypothesis (c_*=1) were repeated for the GLM statistic in (20) and F in (25) using gene frequencies of 0·10 and 0·25. Without giving the detailed simulation results, we remark that the α level of 5% was always maintained in both tests. This can be attributed to the relatively small variance of the additive QTL effect in comparison with the other variance components (see equation (1)). Hence, in most practical applications with a possible paternal QTL effect on progeny (piglets), a gene frequency different from 0·5 can be neglected and the theory provided may still serve as a good approximation. In all other cases (tomatoes, withers height of cows), where repeated measurements are taken from either the genotyped individuals themselves or from purely maternal tissues, our theory is exact and unaffected by the allele frequency at the QTL.

The research project was financially supported by the H. Wilhelm Schaumann Stiftung.

5. Appendix A. Distribution of the sample variance

The phenotypic value Y _{i, j, k} of piglet within one litter is described by the following model consisting of independent components:

(A.1)

$Y_{i,j,k} = \mu_{i,j} + A_{i,j,k} + G_{i,j,k} + \mathbf{1}_{\{Q\},i,j}\,E_{i,j,k} + c_{*}\,(1-\mathbf{1}_{\{Q\},i,j})\,E_{i,j,k},$

where i indicates the sire, j the daughter per sire and k the piglet. The constant litter mean is denoted by μ_{i,j}. The random components are the Mendelian sampling effect $A_{i,j,k} \sim N(0, \tfrac{1}{2}\sigma_{polygene}^{2})$, the additive QTL effect G_{i,j,k} and the random deviation E_{i,j,k} ~ N(0, σ_e²).

The indicator function 1_{{Q}, i, j} takes the value 1 if the daughter i, j inherits the QTL allele Q at the unknown QTL position from the sire i. In the case of inheriting q, the random deviation E _{i, j, k} of model (A.1) is modified by the factor c _*∊(0, ∞). Let 1_{{Qq, qQ}, i, j, k} be the indicator function with value 1 if the piglet has a heterozygous genotype, 1_{{QQ}, i, j, k} and 1_{{qq}, i, j, k} in the case of genotype QQ and qq, respectively. The additive QTL effect G _{i, j, k} depends on the piglet's genotype and has a three-point distribution. In the absence of a dominance effect its probabilities are

$\begin{aligned}
\Pr(G_{i,j,k} = -a) &= \Pr(\mathbf{1}_{\{qq\},i,j,k} = 1), \\
\Pr(G_{i,j,k} = 0) &= \Pr(\mathbf{1}_{\{Qq,qQ\},i,j,k} = 1), \\
\Pr(G_{i,j,k} = a) &= \Pr(\mathbf{1}_{\{QQ\},i,j,k} = 1),
\end{aligned}$

where a is some unknown constant which, in view of (A.1), is called the additive value. The probability that the piglet has a particular genotype, e.g. $\Pr(\mathbf{1}_{\{QQ\},i,j,k}=1)$, can only be determined in combination with the unobservable parental genotypes. Therefore, an additional random variable B is required, which denotes the random combination of QTL alleles of the daughter i, j and her mate (mating types) as described in columns 3 and 4 of Table 1. The realizations of B are denoted by b_m, m=1, ..., 12 (column 2 of Table 1). The conditional distributions of G_{i,j,k} given B lead to the distribution of the additive QTL effect G_{i,j,k}. For a fixed index m the phenotypic values of the offspring within one litter are independently and identically distributed. Because $\mu_{QTL,m}=\mathbb{E}(G_{i,j,k}\mid B=b_m)\neq 0$ (for some exceptions see Table 1), we set

(A.2)

$\tilde{G}_{i,j,k} = G_{i,j,k} - \mathbb{E}(G_{i,j,k}\mid B),$

which now satisfies $\mathbb{E}(\tilde{G}_{i,j,k}\mid B=b_m)=0$ for all m. If $A_{\theta}=\{\mathbf{1}_{\{Q\},i,j}=\theta\}$, $\theta\in\{0,1\}$, then

(A.3)

$\begin{aligned}
\mathbb{E}(\tilde{G}_{i,j,k}^{2}\mid A_{1}) &= \sum_{m=1}^{6} \Pr(B=b_{m}\mid A_{1})\,\mathbb{E}(\tilde{G}_{i,j,k}^{2}\mid A_{1}\cap\{B=b_{m}\}) \\
&= \sum_{m=1}^{6} \Pr(B=b_{m}\mid A_{1})\,\tilde{\sigma}_{QTL,m}^{2} = \tfrac{1}{4}a^{2}, \\
\mathbb{E}(\tilde{G}_{i,j,k}^{2}\mid A_{0}) &= \sum_{m=7}^{12} \Pr(B=b_{m}\mid A_{0})\,\tilde{\sigma}_{QTL,m}^{2} = \tfrac{1}{4}a^{2},
\end{aligned}$

where the different values of $\tilde{\sigma}_{QTL,m}^{2} = \mathbb{E}(\tilde{G}_{i,j,k}^{2}\mid A_{\theta}\cap\{B=b_{m}\})$ are listed in Table 2, column 4 and the corresponding row b_m. The variance of the additive QTL effect is defined by $\sigma_{QTL}^{2} := \tfrac{1}{4}a^{2}$. We see from (A.3) that $\mathbb{V}(\tilde{G}_{i,j,k}\mid A_{\theta})=\sigma_{QTL}^{2}$, $\theta\in\{0,1\}$.
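The value a²/4 in (A.3) can be checked by enumerating the mating types. Tables 1 and 2 are not reproduced here, so the Python sketch below is a reconstruction under stated assumptions: the daughter's maternal allele and her mate's genotype follow Hardy–Weinberg proportions, the two alleles passed to a piglet are independent, and G equals a times (number of Q alleles minus 1). Function names are hypothetical, and results are in units of a².

```python
from fractions import Fraction

def transmit_prob_Q(genotype):
    """Probability that a parent with this genotype passes on allele Q."""
    return Fraction(genotype.count("Q"), 2)

def mean_within_litter_qtl_variance(paternal_allele, p_Q=Fraction(1, 2)):
    """E(G~^2 | daughter inherited `paternal_allele`), in units of a^2."""
    q = 1 - p_Q
    allele_freq = {"Q": p_Q, "q": q}                             # daughter's maternal allele
    mate_freq = {"QQ": p_Q * p_Q, "Qq": 2 * p_Q * q, "qq": q * q}
    total = Fraction(0)
    for maternal_allele, w_dam in allele_freq.items():
        dam = paternal_allele + maternal_allele
        for mate, w_mate in mate_freq.items():                   # 2 x 3 = 6 mating types
            t_dam, t_mate = transmit_prob_Q(dam), transmit_prob_Q(mate)
            # within-litter variance of G for this mating type: two Bernoulli draws
            variance = t_dam * (1 - t_dam) + t_mate * (1 - t_mate)
            total += w_dam * w_mate * variance
    return total

var_Q = mean_within_litter_qtl_variance("Q")                     # Fraction(1, 4)
var_q = mean_within_litter_qtl_variance("q")                     # Fraction(1, 4)
var_Q_skewed = mean_within_litter_qtl_variance("Q", Fraction(1, 4))
var_q_skewed = mean_within_litter_qtl_variance("q", Fraction(1, 4))
```

With p_Q=0·5 both conditional values equal 1/4, in agreement with (A.3). With p_Q=0·25 they differ (9/32 versus 5/32 under these assumptions), which illustrates why the equality of the two conditional variances relies on a gene frequency of one-half.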

To eliminate μ_{i, j} that appears in (A.1), we introduce X _{i, j, k} by

(A.4)

$\begin{aligned}
X_{i,j,k} &= Y_{i,j,k} - \mu_{i,j} - \mathbb{E}(G_{i,j,k}\mid B) \\
&= A_{i,j,k} + \tilde{G}_{i,j,k} + \mathbf{1}_{\{Q\},i,j}\,E_{i,j,k} + c_{*}\,(1-\mathbf{1}_{\{Q\},i,j})\,E_{i,j,k}.
\end{aligned}$

If n _{i, j} denotes the litter size of daughter i,j, then the sample variance is

(A.5)

$S_{i,j}^{2} = \frac{1}{n_{i,j}-1}\sum_{k=1}^{n_{i,j}} (X_{i,j,k} - \overline{X}_{i,j,\cdot})^{2} \quad \text{with } \overline{X}_{i,j,\cdot} = \frac{1}{n_{i,j}}\sum_{k=1}^{n_{i,j}} X_{i,j,k}.$

The within-litter variance depends on the paternal QTL allele of the daughter i, j. The conditional expectation of X _{i, j, k} and S _{i, j}² given the events A _θ, θ∊{0, 1}, are

$\begin{aligned}
\mathbb{E}(X_{i,j,k}\mid A_{\theta}) &= \mathbb{E}\bigl(A_{i,j,k} + \tilde{G}_{i,j,k} + \mathbf{1}_{\{Q\},i,j}E_{i,j,k} + c_{*}(1-\mathbf{1}_{\{Q\},i,j})E_{i,j,k} \mid A_{\theta}\bigr) = 0, \\
\mathbb{E}(S_{i,j}^{2}\mid A_{1}) &= \mathbb{V}(X_{i,j,k}\mid A_{1}) \\
&= \mathbb{E}\bigl([A_{i,j,k} + \tilde{G}_{i,j,k} + \mathbf{1}_{\{Q\},i,j}E_{i,j,k} + c_{*}(1-\mathbf{1}_{\{Q\},i,j})E_{i,j,k}]^{2} \mid A_{1}\bigr) \\
&= \tfrac{1}{2}\sigma_{polygene}^{2} + \sigma_{QTL}^{2} + \sigma_{e}^{2} =: \tau^{2} + \sigma_{QTL}^{2}, \\
\mathbb{E}(S_{i,j}^{2}\mid A_{0}) &= \mathbb{V}(X_{i,j,k}\mid A_{0}) \\
&= \tfrac{1}{2}\sigma_{polygene}^{2} + \sigma_{QTL}^{2} + c_{*}^{2}\sigma_{e}^{2} =: \tau_{*}^{2} + \sigma_{QTL}^{2}.
\end{aligned}$
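These two expectations link the simulated factor c_* to the parameter c. The Python sketch below assumes that c² is the ratio of the conditional expectation for q-daughters to that for Q-daughters, as the description of c² in the Discussion suggests; the function name is hypothetical and the default values are the simulation settings quoted in the text.

```python
import math

def c_from_c_star(c_star, sigma_e=200.0, sigma_polygene=96.8, a=61.0):
    """Ratio parameter c implied by the residual scaling factor c_star."""
    sigma2_qtl = a ** 2 / 4.0                                  # sigma_QTL^2 = a^2 / 4
    tau2 = 0.5 * sigma_polygene ** 2 + sigma_e ** 2            # Q-daughters
    tau2_star = 0.5 * sigma_polygene ** 2 + (c_star * sigma_e) ** 2   # q-daughters
    return math.sqrt((tau2_star + sigma2_qtl) / (tau2 + sigma2_qtl))

c_for_1_2 = round(c_from_c_star(1.2), 3)
c_for_1_1 = round(c_from_c_star(1.1), 3)
```

Under this assumption the sketch gives 1.177 and 1.088, matching the values of c reported alongside c_*=1·2 and c_*=1·1 in the simulation results.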

Because the investigated sample variance includes the non-normally distributed variable $\tilde{G}_{i,j,k}$, the conditional distribution of S_{i,j}² is not a χ² distribution. Therefore, an approximation by a gamma distribution $\Gamma_{\mu,\nu_{i,j}}$ with expectation μ and variance $\mu^{2}/\nu_{i,j}$ will be considered.

Assertion 1. The conditional variance of S_{i,j}² in (A.5) given $\{\mathbf{1}_{\{Q\},i,j}=1\}$ is

$\mathbb{V}(S_{i,j}^{2}\mid \mathbf{1}_{\{Q\},i,j}=1) = \frac{2}{n_{i,j}-1}\left[(\tau^{2} + 2\sigma_{QTL}^{2})\,\tau^{2} + \sigma_{QTL}^{4}\left(\frac{n_{i,j}}{4} + \frac{1}{4} + \frac{1}{n_{i,j}}\right)\right].$

An analogous statement holds under the condition $\{\mathbf{1}_{\{Q\},i,j}=0\}$ if τ_*² is used instead of τ².

Proof: See Supplementary Appendixes.

Fig. 3 a and b show simulated histograms of S_{i,j}², distinguished by the paternal QTL alleles Q and q. The figures suggest approximating the conditional distribution of S_{i,j}² in (A.5) by a two-parameter gamma distribution $\Gamma_{\mu,\nu_{i,j}}$ with

(A.6)

$\begin{aligned}
\mu &= \tau^{2} + \sigma_{QTL}^{2}, \\
\nu_{i,j} &= \frac{n_{i,j}-1}{2}\,\frac{(\tau^{2} + \sigma_{QTL}^{2})^{2}}{(\tau^{2} + 2\sigma_{QTL}^{2})\,\tau^{2} + \sigma_{QTL}^{4}\left(\frac{n_{i,j}}{4} + \frac{1}{4} + \frac{1}{n_{i,j}}\right)}.
\end{aligned}$
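For a numerical feel, the parameters in (A.6) can be evaluated for given variance components. The Python sketch below transcribes (A.6) directly; the function name is hypothetical, and the example values (litter size 10, σ_e=200 g, σ_polygene=96·8 g, a=61 g) are the simulation settings from the text.

```python
def gamma_parameters(litter_size, tau2, sigma2_qtl):
    """Return (mu, nu, variance) of the approximating gamma distribution in (A.6)."""
    n = litter_size
    mu = tau2 + sigma2_qtl
    bracket = (tau2 + 2.0 * sigma2_qtl) * tau2 + sigma2_qtl ** 2 * (n / 4.0 + 0.25 + 1.0 / n)
    nu = (n - 1) / 2.0 * mu ** 2 / bracket
    variance = mu ** 2 / nu            # equals the expression in Assertion 1
    return mu, nu, variance

tau2 = 0.5 * 96.8 ** 2 + 200.0 ** 2    # tau^2 for Q-daughters
sigma2_qtl = 61.0 ** 2 / 4.0           # sigma_QTL^2 = a^2 / 4
mu, nu, variance = gamma_parameters(10, tau2, sigma2_qtl)
mu0, nu0, variance0 = gamma_parameters(10, tau2, 0.0)   # no QTL effect on the mean
```

With these settings ν is only slightly below (n−1)/2=4·5, which suggests that the gamma approximation stays close to the exact scaled χ² case when the QTL variance is small relative to τ².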

Note that the presented model (A.1) is an all-purpose model that covers two assumptions. First, similar to the ideas of Hill & Zhang (2004), the QTL may affect both the phenotypic mean and its variability. In this case, the distribution of the daughter's trait is approximated by a gamma distribution as shown above. Second, the QTL affects only the variability of the phenotype. Then the general model (A.1) simplifies and the distribution of the within-subject sample variance is exactly gamma.

Fig. 3. (a) Histogram of simulated values s _{i, j}² when inheriting Q versus density of a gamma distribution with parameters in (A.6); (b) histogram of simulated values s _{i, j}² when inheriting q versus density of a gamma distribution with parameters in (A.6), where τ² is replaced by τ_*² and c _*=1·2.

References

Bidanel, J.-P., Milan, D., Iannuccelli, N., Amigues, Y., Boscher, M.-Y., Bourgeois, F., Caritez, J.-C., Gruand, J., le Roy, P., Lagant, H., Quintanilla, R., Renard, C., Gellin, J., Ollivier, L. & Chevalet, C. (2001). Detection of quantitative trait loci for growth and fatness in pigs. Genetics Selection Evolution 33, 289–309.

Bolet, G., Garreau, H., Joly, T., Theau-Clement, M., Hurtaud, J. & Bodin, L. (2005). Genetic homogenization of birth weights in rabbits: evolution of the characteristics of the genital tract after two generations of selection. In Book of Abstracts of the 56th Annual Meeting of the EAAP. Wageningen Academic Publishers, p. 80.

Box, G. E. P. & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society B 26, 211–252.

Broman, K. W. (2005). The genomes of recombinant inbred lines. Genetics 169, 1133–1146.

Churchill, G. A. & Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics 138, 963–971.

Damgaard, L. H., Rydhmer, L., Løvendahl, P. & Grandinson, K. (2003). Genetic parameters for within-litter variation in piglet birth weight and change in within-litter variation during suckling. Journal of Animal Science 81, 604–610.

Fahrmeir, L. & Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. The Annals of Statistics 13, 342–368.

Fahrmeir, L. & Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. New York: Springer.

Foulley, J.-L. (2004). Including mean–variance relationships in heteroskedastic mixed linear models: theory and application. URL http://interstat.statjournals.net (retrieved 30 May 2007).

Foulley, J. L., Gianola, D., SanCristobal, M. & Im, S. (1990). A method for assessing extent and sources of heterogeneity of residual variances in mixed linear models. Journal of Dairy Science 73, 1612–1624.

Foulley, J. L. & Quaas, R. L. (1995). Heterogeneous variances in gaussian linear mixed models. Genetics Selection Evolution 27, 211–228.

Haley, C. S. & Knott, S. A. (1992). A simple regression method for mapping quantitative trait loci in lines crosses using flanking markers. Heredity 69, 315–324.

Hill, W. G. & Zhang, X.-S. (2004). Effects on phenotypic variability of directional selection arising through genetic differences in residual variability. Genetical Research 83, 121–132.

Högberg, A. & Rydhmer, L. (2000). A genetic study of piglet growth and survival. Acta Agriculturæ Scandinavica, Section A, Animal Science 50, 300–303.

Jørgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society B 49, 127–162.

Kadarmideen, H. N., Janss, L. L. G. & Dekkers, J. C. M. (2000). Power of quantitative trait locus mapping for polygenic binary traits using generalized and regression interval mapping in multi-family half-sib designs. Genetical Research 76, 305–317.

Lange, C. & Whittaker, J. C. (2001). Mapping quantitative trait loci using generalized estimating equations. Genetics 159, 1325–1337.

McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd edn. London: Chapman and Hall.

R Development Core Team (2005). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. URL http://www.R-project.org (retrieved 24 October 2005).

Rebaï, A. (1997). Comparison of methods for regression interval mapping in QTL analysis with non-normal traits. Genetical Research 69, 69–74.

Roehe, R. (1999). Genetic determination of individual birth weight and its association with sow productivity traits using Bayesian analyses. Journal of Animal Science 77, 330–343.

Ros, M., Sorensen, D., Waagepetersen, R., Dupont-Nivet, M., SanCristobal, M., Bonnet, J. C. & Mallard, J. (2004). Evidence for genetic control of adult weight plasticity in the snail Helix aspersa. Genetics 168, 2089–2097.

SanCristobal, M., Foulley, J. L. & Manfredi, E. (1993). Inference about multiplicative heteroskedastic components of variance in a mixed linear gaussian model with an application to beef cattle breeding. Genetics Selection Evolution 25, 3–30.

SanCristobal-Gaudy, M., Elsen, J. M., Bodin, L. & Chevalet, C. (1998). Prediction of the response to a selection for canalisation of a continuous trait in animal breeding. Genetics Selection Evolution 30, 423–451.

Seber, G. A. F. (1977). Linear Regression Analysis. New York: Wiley.

Sorensen, D. & Waagepetersen, R. (2003). Normal linear models with genetically structured residual variance heterogeneity: a case study. Genetical Research 82, 207–222.

Thomson, P. C. (2003). A generalized estimating equations approach to quantitative trait locus detection of non-normal traits. Genetics Selection Evolution 35, 257–280.

Weller, J. I. & Wyler, A. (1992). Power of different sampling strategies to detect quantitative trait loci variance effects. Theoretical and Applied Genetics 83, 582–588.

Wittenburg, D. (2005). Lineare und verallgemeinerte lineare Modelle zum Nachweis von QTL-Effekten auf die Varianz wiederholter Messungen. Diploma thesis, Universität Rostock.

Yin, Z.-J. & Zhang, Q. (2006). Mapping quantitative trait loci for ordinal traits using the generalized linear model in half-sib designs. Animal Research 55, 245–255.

