The genetics of environmental variation of dry matter grain yield in maize

YE YANG; CHRIS-CAROLIN SCHÖN; DANIEL SORENSEN

doi:10.1017/S0016672312000304

The genetics of environmental variation of dry matter grain yield in maize

Published online by Cambridge University Press: 28 May 2012

YE YANG ,

CHRIS-CAROLIN SCHÖN and

DANIEL SORENSEN

Show author details

YE YANG: Affiliation:
Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, DK-8830 Tjele, Denmark
CHRIS-CAROLIN SCHÖN: Affiliation:
Department of Plant Breeding, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Emil-Ramann-Street 4, 85350 Freising, Germany
DANIEL SORENSEN*: Affiliation:
Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, DK-8830 Tjele, Denmark
*: *Corresponding author: Daniel Sorensen. Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, DK-8830 Tjele, Denmark. E-mail: [email protected]

Article contents

Summary
Introduction
Data and preliminary calculations
Non-parametric analysis
Parametric Bayesian analysis
Results of the non-parametric analyses
Results of the Bayesian analysis
Discussion
References

Rights & Permissions

Summary

Dry matter grain yield per plot from three genetically homogeneous single-cross maize hybrids were analysed to investigate whether environmental variance depends on genotype. Three genotypes were tested at 20 locations in 3 years. The data were analysed using a non-parametric approach and fully parametric Bayesian models. Both analyses reveal effects of genotype on environmental variation. The Bayesian analyses indicate that genotype by location–year interactions are the most important effects acting at the level of the mean. The best-fitting Bayesian model is one postulating genotype by location–year interactions acting on the mean and main effects of genotype and of location–year on the variance. Despite the detection of genotypic effects acting on the variance, location–year effects constitute the biggest relative source of variance heterogeneity.

Type: Research Papers
Information: Genetics Research , Volume 94 , Issue 3 , June 2012 , pp. 113 - 119

DOI: https://doi.org/10.1017/S0016672312000304 [Opens in a new window]
Copyright: Copyright © Cambridge University Press 2012

1. Introduction

Phenotypic plasticity, developmental homoeostasis and canalization are concepts related to environmental sensitivity and are importantly linked to mechanisms of adaptation and evolution. Environmental sensitivity can be defined either as mean phenotypic changes of a given genotype in different environments or as differences in the residual variance of different genotypes in the same environment (Jinks & Pooni, Reference Jinks and Pooni1988). The first definition gives rise to genotype by environment interactions acting at the level of the mean, a topic that is not the focus of the present work.

The subject of this study is related to the second definition, which gives rise to genetically structured heterogeneous variance models. In a given macro-environment, for a particular genotype a certain phenotypic value is expected. Typically, the observed phenotype differs from its expectation, and this deviation, which is the focus of the modelling exercise, may include a genetic component. This model has interesting implications in animal and plant improvement and in evolutionary biology. From a breeding point of view, it offers the possibility to decrease variation by selection, leading to more homogeneous products. Particularly in plant breeding this model is used to study stability of genotypes (Cotes et al., Reference Cotes, Crossa, Sanches and Cornelius2006). In evolutionary biology, a fundamental problem is to understand the forces that maintain phenotypic variation. Most of the models assume that environmental variance is constant and explain the observed levels of variation by invoking a balance between a gain of genetic variance by mutation and a loss by different forms of selection and drift. Recently, Zhang & Hill (Reference Zhang and Hill2005) discussed models where environmental variance is partly under genetic control and study conditions for its maintenance under stabilizing selection. Considerable experimental support for the presence of genetic factors affecting environmental variance has been provided in recent years. The majority of this evidence stems from analyses of experimental or field data fitting models of different levels of complexity that account for genetic variance in environmental variance. The topic has been comprehensibly reviewed by Hill & Mulder (Reference Hill and Mulder2010).

Genotype by environment interaction and heterogeneity of variance are topics that received considerable attention in the plant breeding literature (Piepho, Reference Piepho1999; Cotes et al., Reference Cotes, Crossa, Sanches and Cornelius2006), partly because yield trials are conducted in multiple environments and years. Recently, Edwards & Jannink (Reference Edwards and Jannink2006) used a Bayesian hierarchical model to study whether heterogeneity of residual and genotype by interaction variances could be assigned to genetic and genotype×environment effects. A similar model was implemented by Sorensen & Waagepetersen (Reference Sorensen and Waagepetersen2003) to investigate genetic heterogeneity of environmental variance for litter size in pigs. Inferences from such highly complex and parameterized models can be artefacts of the lack of normality of the data (Yang et al., Reference Yang, Christensen and Sorensen2011).

The objective of the present work is to use relatively simple statistical models to test whether environmental variation of dry matter grain yield in maize is heterogeneous, and whether this heterogeneity can be specifically attributed to genotypic effects. The focus is on the general environmental variance (Falconer & Mackay, Reference Falconer and Mackay1996) which is referred here simply as environmental variance. Part of the present data, consisting of replicates of the same genotype over a given environment, are particularly well suited to investigate this problem. The variance between observations within a genotype reflects environmental variation, and a different variance across genotypes indicates that a genetic component operates at the level of the environmental variance. This type of evidence can be compelling under experimentally controlled conditions and the analysis is in principle straightforward: it requires comparing sample variances across genotypes. Such an analysis, using abdominal and sternopleural bristle number in Drosophila was reported by Mackay & Lyman (Reference Mackay and Lyman2005). The present data include genotypes tested in multiple locations; this also allows studying the contribution to variance heterogeneity due to location–years and due to their interaction with genotype.

The statistical evidence for a dependence of environmental variance on genotype is produced using two different types of analysis. The first is based on a non-parametric approach and the second on a parametric Bayesian approach. The latter allows, in principle, a formal analysis accounting for and quantifying all known sources of variation. Flexibility of inferences comes at the cost of complexity and model dependence.

2. Data and preliminary calculations

The data consist of dry matter grain yield per plot (in kg, hereinafter referred to as yield, units omitted in the rest of the text for simplicity) from trials of a German Maize Breeding Programme. Plants were grown in 9 m² plots (1·5 m×6 m), 2 rows per plot. At each location, the same number of seeds were planted per plot producing 100 plants per plot, with very little variation in this number across plots. Three genetically homogeneous single-cross maize hybrids (all individuals from each of the three genotypes were genetically identical) were tested at 20 locations in 3 years. There were 27 location–year subclasses and 61 genotype–location–year subclasses, with a number of records (plots) ranging from 21 to 226 and a median of 64. Genotype 1 was represented in 26 of the 61 subclasses, genotype 2 in 19 and genotype 3 in 16. Preliminary analyses led to the elimination of extreme observations that had marked effects on inferences and after the editing process the number of observations for genotypes 1, 2 and 3 is 2046, 1595 and 1183, respectively, resulting in a total of 4824 records. The 20 locations represent a broad range of agroecological conditions in Germany.

The raw means $\mathop {\overline{y} }\nolimits_{i}$ , i=1, 2, 3, for genotypes 1, 2 and 3 are 9·17, 9·92 and 9·85, respectively. The corresponding sample variances $S_{i}^{\setnum{2}} \equals {\textstyle{1 \over {n_{i} \minus 1}}}\sum\nolimits_{j \equals \setnum{1}}^{i \equals n_{i} } {\mathop {\lpar {y_{ij} \minus \mathop {\overline{y} }\nolimits_{i} } \rpar}\nolimits^{\setnum{2}} }$ are 2·36, 2·48 and 2·87, where n _i is the number of observations in genotype i. With values of n _i larger than 1000 as in the present data, simple F-tests reveal significant differences at a level smaller than 0·01%, and indicate that the hypothesis of equality of environmental variances across genotypes must be rejected.

This naive analysis is deficient in two ways. Firstly, the presence of location–years violate the assumption of common mean among observations of the same genotype. The second is the assumption of normality of residuals (discussed further below and illustrated in Fig. 1). The two approaches implemented here, that differ in the degree of parameterization and formality, attempt to overcome these shortcomings.

Fig. 1. Histograms of Monte Carlo estimates of the distribution of the coefficient of skewness of residuals (discrepancy measure (4)) for the four chosen models fitted to untransformed (top) and to transformed (bottom) data (see subsection Likelihoods, for definition of the model labels). A distributional mean of zero indicates absence of skewness.

3. Non-parametric analysis

A first approach is to test for variance heterogeneity among the 61 genotype–location–year subclasses by means of Bartlett's statistic (Bartlett, Reference Bartlett1937). Along the same lines as in Ordas et al. (Reference Ordas, Malvar and Hill2008), the sampling distribution of Bartlett's statistic under the null hypothesis of variance homogeneity is approximated using 10 000 Monte Carlo replications, where in each, the indexes for the subclasses labelling the 4824 records are permuted and the 61 sample variances are obtained from the permuted records.

Repeating the exercise using as input the sample variances of the three genotypes, or the 27 location– year subclasses would lead to tests of heterogeneity associated with genotypes or location–years that are difficult to interpret, due to the very unbalanced structure of the data. Instead, heterogeneity due to genotype was studied using five independent subsets of the original 4824 yield data. The five subsets correspond to five location–year classes where the sampling variances of the three genotypes are computed using a minimum of 110 records per genotype (Table 2). This allows investigating the presence of genetic variance heterogeneity without the noise contributed by location–year effects.

4. Parametric Bayesian analysis

The other analysis is based on fitting a fully parametric Bayesian normal linear model with heterogeneous variance, to transformed data using the Box–Cox non-linear transformation (Box & Cox, Reference Box and Cox1964). The data consist of the original 4824 yield records, rather than the 61 sample variances used previously. The non-linear transformation parameter is first estimated using maximum likelihood and in a second step, the Bayesian model is implemented, conditional on this estimate. The transformation is used because a simple least squares analysis (using genotype–location–year location parameters) of the 4824 residuals revealed that the sample skewness and kurtosis are equal to −1·16 and 2·92, respectively. This signals the presence of asymmetry in the data but kurtosis is not detected. Skewness can lead to erroneous inferences, especially in the present work where the focus of inference is the variance.

(i) Likelihoods

The general form of the six Bayesian models fitted to the data is (see Table 1 for the detailed description of the six models)

(1)

${\bf y}\vert {\bf b}\comma \lpar \sigma _{i\comma M}^{\setnum{2}} \rpar _{i \equals \setnum{1}\comma \ldots \comma n} \sim N\lpar {{\bf Xb}\comma {\rm diag}\lpar \sigma _{i\comma M}^{\setnum{2}} \comma i \equals 1\comma \ldots \comma n\rpar } \rpar\comma$

where the vector of fixed effects b contains either main effects of genotype and of location–year, or interaction effects of genotype by location–year. The term σ_i,M ² is the conditional or residual variance for the ith observation under the Mth model, and X is an observed incidence matrix. The residual variance is interpreted as the environmental variance.

Table 1. Description of the six models

ME, main effect parameters; INT, interaction parameters; G, main effect of genotypes (3 levels); LY, main effects of location–years (27 levels); GLY, interaction of genotype by location–years (61 levels).

Of the six models, two assume that the residual variance is homogeneous (Models ME and INT, with main effects and interaction effects, respectively, acting on the mean). In this case,

$\sigma _{i\comma M}^{\setnum{2}} \equals \exp\!\lpar \mu \ast \rpar \comma$

where μ* is an unknown scalar parameter in the real line. The term exp(μ*) can be interpreted as an unobserved average environmental variance. The remaining four models assume heterogeneity of the residual variance and the general model for this variance is

$\sigma _{i\comma M}^{\setnum{2}} \equals \exp\!\lpar \mu \ast \plus {\bf w_{\it i}{\!\prime }\, {\bf b} \ast } \rpar.$

Here w _i′ is the ith row in the observed incidence matrix W and b* is a parameter vector with the same classification variables as in b. The data vector y is of length n=4824. All the models assume normality of the unobserved conditional distribution of the observations (1).

The distribution of the fitted residuals from all the models displayed left skewness, violating the assumption of normality in (1). Therefore, the six models were also fitted to transformed data using the Box–Cox non-linear transformation parameter λ. The general form of the models under the transformation is

(2)

$\eqalign {\tab{\bf y}^{\lpar \lambda \rpar } \vert {\bf b}\comma \lpar \sigma _{i\comma M}^{\setnum{2}} \rpar _{i \equals \setnum{1}\comma \ldots \comma n} \sim N\lpar {{\bf Xb}\comma {\rm diag}\lpar \sigma _{i\comma M}^{\setnum{2}} \comma i \equals 1\comma \ldots \comma n\rpar } \rpar\comma \cr \tab {\bf y}^{\lpar \lambda \rpar } \equals \left\{ {y_{i}^{\lpar \lambda \rpar } } \right\}$

and y _i ^(λ) is defined as (Box & Cox, Reference Box and Cox1964)

(3)

$y^{\lpar \lambda \rpar } \equals \left\{ {\matrix{ {{{y^{\lambda } \minus 1} \over \lambda }} \tab {\lpar \lambda \ne 0\rpar \comma } \cr {\log y} \tab {\lpar \lambda \equals 0\rpar \comma } \cr} } \right.$

for some unobserved transformation parameter λ. The transformation parameter λ is introduced to induce normality of the conditional distribution and linearity of the conditional expectation, but not to achieve constant variance.

The Bayesian analyses are based on fitting the above linear models, and slight variations of them, conditional on either λ=1 (corresponding to an analysis on the original scale) or on the maximum likelihood estimate of λ (corresponding to an analysis under the transformed scale). Given λ, posterior distributions have known closed forms for all the models but due to availability of software we choose a Markov chain Monte Carlo (McMC) implementation. This has the advantage of providing a flexible environment for reporting inferences. Details of the Bayesian models and their McMC implementation have been reported in Yang et al. (Reference Yang, Christensen and Sorensen2011).

(ii) Bayesian measures of model comparison and model fit

The models are compared using the pseudo log-marginal probability of the data, which is a commonly used quantity to compare the relative overall fit of a model (Gelfand, Reference Gelfand, Gilks, Richardson and Spiegelhalter1996). A larger value of the log-marginal probability indicates a better overall fit.

Departure from normality of the conditional distribution of the data is studied by means of posterior predictive model checking (Gelman et al., Reference Gelman, Carlin, Stern and Rubin2004). A Monte Carlo estimate of the posterior distribution of the sample coefficient of skewness is computed under λ=1 or under $\lambda \equals \hats\lambda$ , where $\hats\lambda$ is the maximum likelihood estimate of λ. The idea is to check whether residuals under λ=1 or under $\lambda \equals \hats\lambda$ , satisfy the assumption of normality in (1) using the discrepancy measure

(4)

$T\lpar {{\bf z}\comma \bimtheta ^{\lpar j\rpar } } \rpar \equals {{{\textstyle{1 \over n}}\sum\nolimits_{i \equals \setnum{1}}^{n} {z_{i}^{\setnum{3}} } \,} \over {\mathop {\left( {{\textstyle{1 \over n}}\sum\nolimits_{i \equals \setnum{1}}^{n} {z_{i}^{\setnum{2}} \minus } \mathop {\left( {{\textstyle{1 \over n}}\sum\nolimits_{i \equals \setnum{1}}^{n} {z_{i} } \,} \right)}\nolimits^{\setnum{2}} } \right)}\nolimits^{\setnum{3}/ \setnum{2}} }}\comma$

where

$z_{i} \equals {{y_{i}^{\lpar \lambda \rpar } \minus {\bf x}_{i}{\!\prime }\, {\bf b}^{\lpar j\rpar } } \over {\exp\!\mathop {\left( {\mu { \ast } \plus {\bf w}_{i}{\!\prime }\, {\bf b}\ast ^{\lpar j\rpar } } \right)}\nolimits^{{\setnum{1}/ \setnum{2}}} }}$

and θ ^(j)′=(b ^(j)′, μ*^(j), b*^(j)′) represent draws of the parameter vectors of a given model at the jth McMC iteration. Depending on whether the null model or the transformed model is under scrutiny, λ=1 or λ is equal to its maximum likelihood estimate. A graphical display involves plotting histograms of T(z, θ ^(j)) for both models.

The discrepancy measure T(z, θ ^(j)) represents the jth draw from the estimate of the posterior distribution of the sample skewness. The plot constructed using all the draws, displays the Monte Carlo estimate of the posterior distribution of the coefficient of skewness of the conditional distribution (1). Under normality the posterior distribution of the coefficient of skewness should be centred around a posterior mean of zero.

5. Results of the non-parametric analyses

The non-parametric analysis of the 61 sampling variances corresponding to the 61 genotype–location– year interaction subclasses, based on the Monte Carlo estimate of the sampling distribution of Bartlett's statistic, resulted in a p-value of the order of 10⁻⁵ which provides strong evidence against the null hypothesis of variance homogeneity.

To concentrate specifically on the effects of genotype on the variance free from the effects of location– years, the non-parametric analysis of the sampling variances was also performed within the five location– years where the three genotypes are represented with a minimum of 110 records per genotype. This resulted in p-values equal to 0·001, 0·0003, 0·25, 0·27 and 0·90 (ordered from left to right matching the order of the five locations in Table 2), revealing significant effects of genotype in the first two location–years.

Table 2. Average yield ( $\bars{Y}$ ), sample variance (S²) and number of records (n) per genotype (G, one row per genotype), for the five year–location classes where the number of records per genotype is greater than 110

To give an idea of the size of effects, Table 2 shows average yield, sample variance and number of records per genotype, for the five location–year classes, following the same order as the p-values above. The differences in sample variance within year–locations are small, relative to differences between year– locations (there are many locations with less than 100 records per genotype, with sample variance well over 1·3). The figures do not show association between mean yield and variance.

6. Results of the Bayesian analysis

(i) The need for transforming the data

Maximum likelihood estimates of the transformation parameter λ for the six models are as follows. Model ME: 2·5, model INT: 2·5, model ME–ME: 1·8, model ME–INT: 1·9, model INT–ME: 1·8, model INT– INT: 1·8. Under variance homogeneity (models ME and INT) a larger value of λ is necessary to induce normality of residuals.

Four out of the six models were chosen to illustrate the effect of the transformation on the degree of skewness of the data (results for models INT and ME, and for models ME–INT and INT–INT, are very similar). Figure 1, top panel, shows posterior distributions of the coefficient of skewness (4) from the four chosen models fitted to untransformed data. The bottom panel shows the same distributions obtained from analyses of the transformed data. The first figure of the top panel indicates very severe left skewness under the ME model (the distribution of the coefficient of skewness has a posterior mean less than zero), which is only partly accommodated by fitting either the ME–ME model (second figure in top panel), or other heterogeneous variance models. The figures in the bottom panel show that the Monte Carlo estimate of the posterior distributions of the coefficient of skewness from analyses of transformed data are symmetrically distributed around a mean of zero, indicating absence of skewness. This provides strong evidence for the need of transforming the data.

(ii) The effect of genotype on environmental variance: analysis of the complete data

Table 3 shows the measure of global fit of the models when data are analysed in the original metric or in the transformed scale. The figures indicate that in all cases, the quality of fit improves when the analysis is performed on the transformed scale, and when variance heterogeneity is accounted for. The best fitting model is INT–ME followed closely by INT–INT. Two other models with interaction effects acting on the mean were also fitted, one with only location–year effects acting on the variance, and the other with only genotype effects. The differences in log-marginal probabilities of the data between these models and model INT–ME are −32 and −266, respectively (in favour of the INT–ME model). The model with only location–year effects acting on the variance produced a considerably better fit than the one with only genotype effects, and both were outperformed by model INT–ME. This indicates that the effect of genotype should not be ignored as a source of environmental variance heterogeneity. However, the major source of environmental variance heterogeneity in these data is due to location–year effects.

Table 3. Pseudo log-marginal probability of the data for the six models using untransformed data (λ=1) and data analysed at $\lambda \equals \hats\lambda$ , the ML of λ. The figure for a particular model is expressed as the log-marginal probability for model ME under λ=1 minus the log-marginal probability for the particular model (a more extreme negative value indicates a better overall fit)

Posterior means of the ratio of variances of different genotypes G _i, i=1, 2, 3 (95% posterior intervals in parentheses) obtained under the best fitting model, (INT–ME) are shown in Table 4. The analysis in the transformed scale provides support for an effect of genotype on environmental variance (the value of 1 is at the extreme of the posterior interval in the first two cases). As an illustration, the table also shows that an analysis in the original scale (λ=1), without correcting for skewness, can lead to different results.

Table 4. Ratios of environmental variance involving the three genotypic classes (95% posterior intervals in parentheses) under model INT–ME for analyses in the untransformed (λ=1) and transformed ( $\lambda \equals \hats\lambda$ ) scales

(iii) The effect of genotype on environmental variance: analysis within location–years

In order to avoid the overwhelming effect of location– years on variance heterogeneity, five independent analyses were undertaken within the same five location–years in which the number of records per genotype is greater than 110 (see also Table 2). In agreement with the previous two approaches, no support for genetic heterogeneity was detected from the three right-most locations in the table and results are presented in Table 5 for the first two, labelled location–year 1 and 2, respectively. The models fitted are labelled G-HOM, with genotype effects on the mean and homogeneous environmental variance, and G-HET with genotype effects on both mean and variance. For location–year 1, the maximum likelihood estimates of λ using both models is equal to 2·6. For location–year 2, the ML estimate of λ is 2·4 under G-HOM and 1·5 under G-HET. For both models, the distribution of residuals in the original scale is left-skewed and symmetry is restored incorporating the transformation parameter in the analysis (not shown).

Table 5. Pseudo log-marginal probability of the data for models G-HOM and G-HET fitted to subsets of data from location–years 1 and 2, in the original scale (λ=1) and in the transformed scale ( $\lambda \equals \hats\lambda$ ). The log-marginal probabilities for each model within each location–year are expressed as differences from model G-HOM under λ=1 minus the log-marginal probability for the particular model (comparisons are valid within location–years and a more extreme negative value indicates a better overall fit)

Table 5 shows the measure of global fit for data analysed in the original and transformed scales. For both location–years, the genetically heterogeneous variance model G-HET is favoured under both scales.

Table 6 shows posterior means and 95% posterior intervals of the ratios of variances for the 3 genotypic classes, for the two location–years, with data analysed in the transformed scale under model G-HET. There is a very strong evidence for genetic variance heterogeneity in both locations (the value of 1 is excluded from the 95% posterior interval in two out of three cases in both locations). The evidence for genetic variance heterogeneity is stronger in location–year 2, where relative differences of the order of 50% are reported. In these two locations there is no evidence of location–year by genotype interaction at the level of the variance.

Table 6. Ratios of environmental variance involving the three genotypic classes (95% posterior intervals in parentheses) under model G-HET for analyses in the transformed ( $\lambda \equals \hats{\lambda }$ ) scale. First row: location–year 1; bottom row: location–year 2

7. Discussion

The objective of this work was to provide experimental support for the presence of genetic factors operating at the level of the environmental variance. The data available seem at first sight well suited to study this problem, allowing an analysis free from the assumptions inherent of complex models. However, despite the availability of replicated individuals of the same genotype, the absence of a well-defined homogeneous environment created the need to account for a non-genetic source of variation, that could act at the level of mean and variance, and could interact with genotype. This partly interfered with the goal of simplicity, and was dealt with using a non-parametric approach and more formally, applying a Bayesian analysis incorporating a Box–Cox transformation parameter to induce normality. The latter is particularly important because inferences on variances based on normality assumptions can be misleading if the data display skewness, as shown by Yang et al. (Reference Yang, Christensen and Sorensen2011).

The non-parametric analysis of the 61 genotype– location–year subclasses led to a rejection of the null hypothesis of equality of variances. To study heterogeneity due to genotypic effects only, analyses were carried out within the largest five location–year classes where the number of records per genotype exceed 110. Out of five such subsets of data marked differences in residual variance between genotypes were detected in two out of the five location–years. The same conclusion was reached using the Bayesian analyses. The latter allowed a proper quantification of the effect of genotypes on variance, leading to estimates of differences of the order of 30 and 50% in environmental variance between genotypes. The best-fitting Bayesian model postulates interaction of genotype by location–year effects on the mean, and main effects of genotype and of location–year at the level of the environmental variance. The Bayesian analysis also allowed establishing that in the current data, the biggest source of variance heterogeneity is due to location–year effects.

The analyses of dry matter grain yield in these data indicate that differences in the effects of genotypes on environmental variance is location–year dependent. This result fits well with recent work in yeast, reporting that levels of noise in the expression of a gene varies across environments and genetic background (Ansel et al., Reference Ansel, Bottin, Rodriguez-Beltran, Damon, Nagarajan, Fehrmann, François and Yvert2008).

Identical number of seeds were planted per plot at each location and germination rate is known to be greater than 95%. Therefore, the number of plants harvested per plot varies little across plots, and this is expected to have a minor effect on plot yield. Any variation between plots that may arise due to this factor contributes to the macro-environmental variation. If variability in number of plants harvested per plot within genotypes were to be associated with genotype-dependent variability in germination rate, this would contribute positively to differences in within genotype variance between genotypes. An approximate idea of the size of this effect can be obtained as follows (Hozo et al., Reference Hozo, Djulbegovic and Hozo2005). Chebyshev's inequality (Mood et al., Reference Mood, Graybill and Boes1974) states that for a random variable X with mean μ and variance σ² following an arbitrary distribution,

${\rm Pr}\left( { \minus k\sigma \lt X \minus \mu \lt k\sigma } \right)\ges 1 \minus {1 \over {k^{\setnum{2}} }}.$

Setting k=3 yields 0·89 in the right-hand side. Therefore, the range R covers approximately 6σ and σ≈R/6. The average yield per plot is of the order of 10 kg, and with 100 plants per plot, the average contribution to plot yield per plant is 0·1 kg. Assuming that the minimum and maximum number of plants per plot is 90 and 100 plants, the range in yield is 1 kg, which results in a SD from this source equal to 1/6 kg and a variance of around 0·03 kg². Using the figures of sampling variances S ² in Table 2, this only amounts to between 2 and 6% of within genotype variability.

In conclusion, this work has provided strong experimental evidence for the presence of genetic effects operating at the level of the residual variance of dry matter grain yield in maize. However, for a given genotype, its effect on the variance is environment (i.e. location–year) specific. Further, in the present analysis restricted to three genotypes, the biggest contributing source of heterogeneity are location–year effects. An assessment of the possibility of reducing residual variance by selecting specific genotypes would require an analysis involving a larger number of genotypes.

We are grateful to KWS SAAT AG for providing the maize data, and to Professor W. G. Hill for comments on an earlier draft of this manuscript.

References

Ansel, J., Bottin, H., Rodriguez-Beltran, C., Damon, C., Nagarajan, M., Fehrmann, S., François, J. & Yvert, G. (2008). Cell-to-cell stochastic variation in gene expression is a complex genetic trait. PLoS Genetics 4. doi:10.1371/journal.pgen.1000049.CrossRef Google Scholar PubMed

Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Statistical Society, A 160, 268–282.Google Scholar

Box, G. E. P. & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society Series B 26, 211–252.Google Scholar

Cotes, J. M., Crossa, J., Sanches, A. & Cornelius, P. L. (2006). A Bayesian approach for assessing the stability of genotypes. Crop Science 46, 2654–2665.CrossRef Google Scholar

Edwards, J. W. & Jannink, J. L. (2006). Bayesian modeling of heterogeneous error and genotype×environment interaction variances. Crop Science 2006, 820–833.CrossRef Google Scholar

Falconer, D. S. & Mackay, T. F. C. (1996). Introduction to Quantitative Genetics. Longman.Google Scholar

Gelfand, A. E. (1996). Model determination using sampling-based methods. In Markov Chain Monte Carlo in Practice ( ed. Gilks, W. R., Richardson, S. & Spiegelhalter, D. J.), pp. 145–161. London: Chapman and Hall.Google Scholar

Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B. (2004). Bayesian Data Analysis. London: Chapman and Hall.Google Scholar

Hill, W. G. & Mulder, H. A. (2010). Genetic analysis of environmental variation. Genetics Research 92, 381–395.CrossRef Google Scholar PubMed

Hozo, S. P., Djulbegovic, B. & Hozo, I. (2005). Estimating the mean and variance from the median, range, and the size of a sample. BMC Medical Research Methodology 5. doi:10.1186/1471-2288-5-13.CrossRef Google Scholar PubMed

Jinks, J. L. & Pooni, H. S. (1988). The genetic basis of environmental sensitivity. In Proceedings of the Second International Conference on Quantitative Genetics. North Carolina State University, NC, USA: Sinauer Associates, Inc., pp. 505–522.Google Scholar

Mackay, T. F. C. & Lyman, R. F. (2005). Drosophila bristles and the nature of quantitative genetic variation. Philosophical Transactions of the Royal Society, B 360, 1513–1527.CrossRef Google Scholar PubMed

Mood, A. M., Graybill, F. A. & Boes, D. C. (1974). Introduction to the Theory of Statistics. New York: McGraw-Hill.Google Scholar

Ordas, B., Malvar, R. A. & Hill, W. G. (2008). Genetic variation and quantitative trait loci associated developmental stability and the environmental correlation between traits in maize. Genetical Research 90, 385–395.CrossRef Google Scholar PubMed

Piepho, H. P. (1999). Stability analysis using the SAS system. Agronomy Journal 91, 154–160.CrossRef Google Scholar

Sorensen, D. & Waagepetersen, R. (2003). Normal linear models with genetically structured residual variance heterogeneity: a case study. Genetical Research 82, 207–222.CrossRef Google Scholar PubMed

Yang, Y., Christensen, O. F. & Sorensen, D. (2011). Analysis of a genetically structured variance heterogeneity model using the Box–Cox transformation. Genetics Research 93, 33–46.CrossRef Google Scholar PubMed

Zhang, X. S. & Hill, W. G. (2005). Evolution of the environmental component of phenotypic variance: Stabilizing selection in changing environments and the cost of homogeneity. Evolution 59, 1237–1244.Google Scholar PubMed

Table 1. Description of the six models

Table 2. Average yield (\bars{Y}), sample variance (S2) and number of records (n) per genotype (G, one row per genotype), for the five year–location classes where the number of records per genotype is greater than 110

Table 3. Pseudo log-marginal probability of the data for the six models using untransformed data (λ=1) and data analysed at \lambda \equals \hats\lambda, the ML of λ. The figure for a particular model is expressed as the log-marginal probability for model ME under λ=1 minus the log-marginal probability for the particular model (a more extreme negative value indicates a better overall fit)

Table 4. Ratios of environmental variance involving the three genotypic classes (95% posterior intervals in parentheses) under model INT–ME for analyses in the untransformed (λ=1) and transformed (\lambda \equals \hats\lambda) scales

Table 5. Pseudo log-marginal probability of the data for models G-HOM and G-HET fitted to subsets of data from location–years 1 and 2, in the original scale (λ=1) and in the transformed scale (\lambda \equals \hats\lambda). The log-marginal probabilities for each model within each location–year are expressed as differences from model G-HOM under λ=1 minus the log-marginal probability for the particular model (comparisons are valid within location–years and a more extreme negative value indicates a better overall fit)

Table 6. Ratios of environmental variance involving the three genotypic classes (95% posterior intervals in parentheses) under model G-HET for analyses in the transformed (\lambda \equals \hats{\lambda }) scale. First row: location–year 1; bottom row: location–year 2

Article contents

The genetics of environmental variation of dry matter grain yield in maize

Summary

1. Introduction

2. Data and preliminary calculations

3. Non-parametric analysis

4. Parametric Bayesian analysis

(i) Likelihoods

(ii) Bayesian measures of model comparison and model fit

5. Results of the non-parametric analyses

6. Results of the Bayesian analysis

(i) The need for transforming the data

(ii) The effect of genotype on environmental variance: analysis of the complete data

(iii) The effect of genotype on environmental variance: analysis within location–years

7. Discussion

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests