Introduction
Age at first calving (AFC) is an important indicator trait of reproductive efficiency that is genetically monitored by the Milk-Recording Buffalo Program in Brazil (Aspilcueta-Borquis et al., Reference Aspilcueta-Borquis, Seno, Araujo Neto, Santos, Hurtado-Lugo and Tonhati2022). It is well known, however, that the genetic progress obtained by direct selection of AFC can be very slow due to the low heritability magnitude of this trait in Buffaloes (Camargo et al., Reference Camargo, Aspilcueta-Borquis, Fortes, Porto-Neto, Cardoso, Santos, Lehnert, Reverter, Moore and Tonhati2015). Since genomic selection is more accurate than traditional selection, especially for low-heritability traits (Calus et al., Reference Calus, Meuwissen, De Roos and Veerkamp2008), genomic evaluation is the best strategy for predicting breeding values for AFC. Furthermore, Camargo et al. (Reference Camargo, Aspilcueta-Borquis, Fortes, Porto-Neto, Cardoso, Santos, Lehnert, Reverter, Moore and Tonhati2015) and Araujo Neto et al. (Reference Araujo Neto, Takada, Santos, Aspilcueta-Borquis, Cardoso, Nascimento, Leão, Oliveira and Tonhati2020a) have reported some specific genomic regions highly associated to this trait in GWAS studies in this population, indicating that differential shrinkage models, such as Bayesian alphabet and weighted single-step GBLUP models, would be a reasonable alternative to be explored.
The single-step GBLUP (ssGBLUP) method has become the main methodology used for genomic evaluations in dairy buffaloes (Araujo Neto et al., Reference Araujo Neto, Takada, Santos, Aspilcueta-Borquis, Cardoso, Nascimento, Leão, Oliveira and Tonhati2020a, Reference Araujo Neto, Santos, Fernandes Júnior, Aspilcueta-Borquis, Nascimento, Seno, Tonhati and Oliveira2020b; Cesarani et al., Reference Cesarani, Biffani, Garcia, Lourenco, Bertolini, Neglia, Misztal and Macciotta2021; Lázaro et al., Reference Lázaro, Tonhati, Oliveira, Silva, Nascimento, Santos, Stefani and Brito2021; Araujo Neto et al., Reference Araujo Neto, Santos, Silva Arce, Borquis, Santos, Guimarães, Nascimento, Oliveira and Tonhati2022) given the ease of integrating the relationship matrices, based on pedigree (A) and genomic (G) information, without major changes in the mixed model equations (Misztal et al., Reference Misztal, Lourenco and Legarra2020). Moreover, this methodology allows assigning different weights for each marker, originating the weighted single-step GBLUP method – WssGBLUP (Wang et al., Reference Wang, Misztal, Aguilar, Legarra and Muir2012; Zhang et al., Reference Zhang, Lourenco, Aguilar, Legarra and Misztal2016).
Subsequently, Fernando et al. (Reference Fernando, Dekkers and Garrick2014) proposed single-step methodologies combined to Bayesian regression methods (ssBR), as an option to ssGBLUP. Nonetheless, few studies have been developed comparing the results of the different single-step methodologies (Lee et al., Reference Lee, Cheng, Garrick, Golden, Dekkers, Park, Lee and Fernando2017; Gao et al., Reference Gao, Koivula, Jensen, Strandén, Madsen, Pitkänen, Aamand and Mäntysaari2018; Zhou et al., Reference Zhou, Mrode, Zhang, Zhang, Li and Liu2018). Based on this, it becomes interesting to perform similar analyses in buffalo databases to assist in decision making in breeding programmes that employ genomic technology. Thus, this work was designed with the aim of comparing the predictive ability of different single-step methods using AFC information from Murrah buffaloes.
Materials and methods
Dataset
The phenotypic dataset consisted of 2,314AFC records of Murrah buffalo cows born between 1995 and 2017. Contemporary groups (CG) were formed considering animals born on the same farm, year and season of birth, which was divided into two (dry and rainy season). Animals with records outside the range between ±3 standard deviations from their CG averages and animals belonging to GCs with less than five individuals were removed.
From a pedigree file containing 3320 buffaloes, 553 animals (539 dams and 14 sires) were genotyped with the Axiom Buffalo Genotyping Array 90 K (Iamartino et al., Reference Iamartino, Nicolazzi, Van Tassell, Reecy, Fritz-Waters, Koltes, Biffani, Sonstegard, Schroeder, Ajmone-Marsan, Negrini, Pasquariello, Ramelli, Coletta, Garcia, Ali, Ramunno, Cosenza, Oliveira, Drummond, Bastianetto, Davassi, Pirani, Brew and Williams2017). Only SNPs present on autosomal chromosomes (BBU1-BBU24 referenced by UOA_WB_1 genome assembly), as well as those with call rate > 95%, MAF > 3% and significance level for Hardy Weinberg equilibrium test was 10−5 , were remained in the analysis. All samples had a call rate >90%. The database description (phenotypic and genotypic information) after the consistency step is in Table 1.
PBLUP, ssGBLUP and WssGBLUP models
This set of methodologies is based on the use of relationship matrices between individuals in mixed model equations, with the differences consisting of the type of information used (pedigree or genomic) and the way these matrices are constructed. The model using pedigree based BLUP (PBLUP) can be described as:
where, y is the vector of phenotypic records (AFC); β is the fixed effect vector (CG), X is the incidence matrix that associates β with y; α is the vector of additive genetic effects, Z is an incidence matrix associating α with y and, e is the vector of residuals. The following assumptions were made:
where $\sigma _\alpha ^2$ and $\sigma _\varepsilon ^2$ represent the additive genetic and residual variances; A and In are the pedigree relationship and an identity matrices, respectively.
The single step GBLUP (ssGBLUP) method is an extension of the PBLUP model, in which the pedigree-based (A) and genomic (G) relationship matrices are combined into a single matrix (H), as described by Aguilar et al. (Reference Aguilar, Misztal, Johnson, Legarra, Tsuruta and Lawlor2010). So, its inverse can be obtained as:
where A−1 and G−1 are the inverse matrices of A and G respectively, and ${\boldsymbol A}_{22}^{{-}1}$ is the inverse of the section of A related to the genotyped animals. The G matrix was obtained according to VanRaden (Reference VanRaden2008) as:
where p i is the minor allele frequency (MAF) of SNP i, Z is a matrix relating genotypes of each locus centred by allele frequencies, and D is a diagonal matrix of weights for SNP variances, (with elements d i = SNP i weight). In ssGBLUP model is set D = I, so that the weight of all SNPs is equal to 1.
In WssGBLUP method (Wang et al., Reference Wang, Misztal, Aguilar, Legarra and Muir2012) markers are assigned with different weights using an iterative process described with the following steps:
1. Set D (t) = I, when t = 1
2. The genomic relationship matrix is setup for t as ${\boldsymbol G}^{( t ) } = {\boldsymbol Z}{\boldsymbol D}^{( {\boldsymbol t} ) }{\boldsymbol {Z}^{\prime}}/( 2\;\sum p_i( {1-p_i} )$
3. The Genomic Breeding Value (GEBV or $\hat{\alpha }$) for t are obtained;
4. 4.The GEBV $\hat{\alpha }^{( t ) }$are converted to SNP effects ($\hat{u}$) as $\hat{u}^t = {\boldsymbol D}^{( t ) }{\boldsymbol {Z}^{\prime}}( {\boldsymbol G}^{( t ) }) ^{{-}1}\hat{\alpha }^{( t ) }$
5. The weight of the ith SNP (the ith element of D or d i) is calculated;
6. SNP weights are normalized in D(t+1) to have constant genetic variances of SNP effects;
7. Loop to step 2 three times and exit.
For the methods mentioned above, the program BLUPF90+ (Misztal et al., Reference Misztal, Tsuruta, Lourenço, Masuda, Aguilar, Legarra and Vitezica2014) was used for the estimation of the variance components and for the validation step. The estimation of the effects of the markers and the calculation of the weighting (WssGBLUP) were performed with the program POSTGSF90 (Misztal et al., Reference Misztal, Tsuruta, Lourenço, Masuda, Aguilar, Legarra and Vitezica2014). The heritability estimates for AFC were obtained from these estimated variance components.
Single-step Bayesian regression models (ssBR)
The ssBR models combine all available data from genotyped and non-genotyped animals and use imputed marker covariates for animals that are not genotyped (Fernando et al., Reference Fernando, Dekkers and Garrick2014), can be described as:
where vectors and matrices for non-genotyped animals are denoted with the subscript n and for genotyped animals with the subscript g. Thus, y n and y g represent the vectors of phenotypic observations; β is the vector of systematic effects of GC (equivalent to the fixed effect in frequentist methods); Xn and Xg design matrices for the fixed effects; Zn and Zg represent incidence matrices associated with the genomic values of the animals; α, $\epsilon$ and e, represent in this order, the vectors of marker effects, imputation residual and model residual; Mg is the marker matrix for the genotyped animals and Mn is the imputed marker matrix for the non-genotyped animals. The matrix Mn can be obtained as:
where Ang represents the pedigree-based relationship matrix between genotyped and non-genotyped animals and; ${\boldsymbol A}_{2g2g}^{{-}1}$ represents the inverse of the pedigree-based relationship matrix between genotyped animals. The following ssBR methods were used: BayesA (ssBA- Meuwissen et al., Reference Meuwissen, Hayes and Goddard2001), BayesBπ (ssBBπ- Meuwissen et al., Reference Meuwissen, Hayes and Goddard2001), BayesCπ (ssBCπ- Habier et al., Reference Habier, Fernando, Kizilkaya and Garrick2011), Lasso Bayesian (ssBL- Yi and Xu, Reference Yi and Xu2008) and Bayesian ridge regression (ssBRR- Campos et al., Reference Campos, Hickey, Pong-Wong, Daetwyler and Calus2013), where the assumptions for markers effects are shown in Table 2. The hibayes package (Yin et al., Reference Yin, Zhang, Li, Zhao and Liu2022), available for the R program (R Core Team, 2021), was used to perform the ssBR analyses. A total of 350 000 samples and a burn-in period of 150 000 samples were generated. The convergence was evaluated using graphical analysis. The heritability estimates for AFC were also obtained using these models.
Validation of (G)EBVs
The predictive ability of the methods was assessed using accuracy and dispersion statistics proposed by Legarra and Reverter (Reference Legarra and Reverter2018). This methodology is based on predicting (G)EBV using full ($\hat{u} _w$) and a partial $( \hat{u} _p)$ datasets. The partial dataset (reference group) was obtained by omitting phenotypic records of genotyped animals born after 2010 (validation group). Thus, to assess the prediction ability of the models, only genotyped animals from the validation group were used to calculate the following statistics:
where $\bar{F}$ is the average inbreeding coefficient of the validation individuals and $\hat{\sigma }_{g, i}^2$ is the estimate of additive genetic variance.
Results
In this study, the estimated heritability for AFC was 0.17 using BLUP-based methods and 0.16 to 0.18 using ssBR methods and, the estimates of the effects of the markers can be seen in Figure 1. Accuracy estimates for the genotyped animals ranged from 0.30 (ssBRR) to 0.39 (WssGBLUP) (Fig. 2). Estimates of accuracy with low magnitude were observed for non-genotyped animals, with values ranging from 0.29 (ssGBLUP) to 0.33 (ssBA). The PBLUP model presented accuracies of 0.30 and 0.29 for genotyped and non-genotyped animals, respectively.
Regarding the dispersion of the GEBV prediction (Fig. 2), ssBCπ (0.99) and WssGBLUP (1.00) obtained the best results considering only genotyped animals, while for non-genotyped animals only WssGBLUP (1.02) showed the best value, although it was also the higher than 1. The PBLUP model had the worst result with values straying 35% below from the reference value (1.0).
In the full dataset, Pearson's correlation between GEBVs and EBVs (predicted via PBLUP) ranged from 0.92 (ssBRR) to 0.93 (ssGBLUP), while for animals with both genotypic and phenotypic information, the correlation between GEBVs and EBVs ranged from 0.71 (ssBRR) to 0.77 (WssGBLUP). The correlations between all GEBVs predicted with the single-step genomic models were higher than 0.95.
Discussion
In this study, a heritability estimate obtained for AFC, had magnitude similar to those described in several studies with the buffalo species, with values ranging from 0.13 to 0.19 (Agudelo et al., Reference Agudelo-Gomez, Pineda-Sierra and Cerón-Muñoz2015; Thiruvenkadan et al., Reference Thiruvenkadan, Panneerselvam and Murali2015; Fernandes et al., Reference Fernandes, Marques, de Araujo Neto, de Oliveira, Hurtado-Lugo, Aspilcueta-Borquis and Tonhati2016; Araujo Neto et al., Reference Araujo Neto, Takada, Santos, Aspilcueta-Borquis, Cardoso, Nascimento, Leão, Oliveira and Tonhati2020a, Reference Araujo Neto, Santos, Fernandes Júnior, Aspilcueta-Borquis, Nascimento, Seno, Tonhati and Oliveira2020b). These results show that despite the greater information provided by the markers in the analysis, there was very little influence on the estimation of the genetic parameters, possibly due to the polygenic nature of the trait. There are few reports in the literature on comparison of genomic prediction for reproductive traits, and it is not yet possible to find studies that compared the predictive ability between different single-step methods for AFC. Moreover, several accuracy measures have been reported by different authors, which makes the comparison between their values even more difficult. For instance, Costa et al. (Reference Costa, Irano, Diaz, Takada, Hermisdorff, Carvalheiro, Baldi, Oliveira, Tonhati and Albuquerque2019), working only with genotyped Nellore breed animals and considering the correlation between GEBV and adjusted AFC (r GEBV,Y*), estimated values with magnitudes similar to those found in our study with BayesCπ (0.31) and Bayes-Lasso (0.31), while Toghiani et al. (Reference Toghiani, Hay, Sumreddee, Geary, Rekaya and Roberts2017) reported lower values with BayesA (0.148) and BayesCπ (0.15), studying the same breed and using the same type of accuracy. In both cited studies, accuracy estimates were not presented for PBLUP model, which does not allow us to fully contrast with our findings. Possibly, for comparison purposes, checking the accuracy increments in relation to PBLUP is the best approach to discuss our results. Thus, all literature discussed about predictability were made based on the difference in performance between methods also used in our study.
Considering the percentage of accuracy increment in relation to PBLUP predictions in only genotyped animals, we observed three groups of models: increments less than 5% – ssBL and ssBRR; increments close 10% – ssBBπ, ssBCπ and ssGBLUP and; increments greater than 25% – ssBA and WssGBLUP. Estimates of accuracy increments using SNP markers with magnitudes similar to those we have found for AFC, have been described in the literature for productive traits in buffaloes (Cesarani et al., Reference Cesarani, Biffani, Garcia, Lourenco, Bertolini, Neglia, Misztal and Macciotta2021; Herrera et al., Reference Herrera, Flores, Duijvesteijn, Moghaddar and van der Werf2021).
For milk, fat and protein yield, greater increases in accuracy (r GEBV,Y*) are presented by Herrera et al. (Reference Herrera, Flores, Duijvesteijn, Moghaddar and van der Werf2021) (13.04 to 76.47%) and Cesarani et al. (Reference Cesarani, Biffani, Garcia, Lourenco, Bertolini, Neglia, Misztal and Macciotta2021) (15.28 to 33.33%). Aspilcueta-Borquis et al. (Reference Aspilcueta-Borquis, Araujo Neto, Santos, Hurtado-Lugo, Silva and Tonhati2015), studying the same population of this study and ssGBLUP model, reported similar increase in the average accuracy (based on prediction error variance) of (G)EBVs, which ranged from 8.52 to 12.05% for several dairy traits. This gain in accuracy observed with the use of genomic information is due to the additional capture of both Mendelian sampling variations and relationships between animals (Christensen et al., Reference Christensen, Madsen, Nielsen, Ostersen and Su2012; Cesarani et al., Reference Cesarani, Garcia, Hidalgo, Degano, Vicario, Macciotta and Lourenco2020). The similarity in predictive performance between the different single-step genomic methods, however, could be related to the reduced number of animals evaluated, the small subset of genotyped animals analysed, and the polygenic nature of the AFC (Calus, Reference Calus2010; Campos et al., Reference Campos, Hickey, Pong-Wong, Daetwyler and Calus2013).
The proximity between ssBA and GRM-based model observed in this study (by the magnitude of estimated accuracies) may be explained by the polygenic nature of the trait and the robustness of ssBA method for different genetic architectures. Zhou et al. (Reference Zhou, Mrode, Zhang, Zhang, Li and Liu2018), studying the effects of QTL number on the accuracy prediction with simulated data, also verified that ssBA method was robust and performed similarly to ssGBLUP in scenarios with a large number of QTL.
The dispersion criterion is related to the degree of inflation or deflation of predictions, which was measured in our validation set as the slope of the regression of the GEBV obtained with the full dataset on GEBV from the partial dataset. Our results showed that the inclusion of genomic information reduces the dispersion of prediction compared to the traditional evaluations (PBLUP), regardless of the single-step model considered. The same improvement in prediction slopes by using genomic information was reported by Lázaro et al. (Reference Lázaro, Tonhati, Oliveira, Silva, Nascimento, Santos, Stefani and Brito2021) studying several milk-related traits with random regression models in the same buffalo population, while other authors, such as Gao et al. (Reference Gao, Koivula, Jensen, Strandén, Madsen, Pitkänen, Aamand and Mäntysaari2018) also analysing dairy traits in Finnish red dairy cattle, report no benefit. Moreover, the GEBVs predicted with WssGBPUP did not disperse in relation to what was expected for the subset of genotyped animals and presented little deflation for the prediction of non-genotyped animals. The low dispersion of GEBVs predicted with WssGBLUP compared to the predictions from other genomic models is probably due to the self-fitting of the SNP weights to the dataset (weighting steps), which regulates the scale of the additive effect to the full dataset previously (a priori).
The results indicate that the use of genomic information can improve the genetic gain for AFC by increasing the accuracy and reducing inflation/deflation of predictions of this trait compared to the traditional pedigree-based model. In addition, among all genomic single-step models studied, WssGBLUP and ssBA were the most advantageous models to be used for the genomic evaluation of AFC of buffaloes from this population.
Authors’ contributions
Jessica Cristina Gonçalves dos Santos: Methodology, Analysis , Writing – original draft.; Francisco Ribeiro de Araujo Neto: Conceptualization, Methodology, Statistical analysis, Writing – review; Gabriela Stefani Fernandez: Database; Daniel Jordan de Abreu Santos: Writing – review & editing; Felipe Pereira Cunha: Writing editing; Rusbel Raul Aspilcueta-Borquis: Statistical analysis; Humberto Tonhati: Conceptualization, Database, Supervision.
Funding statement
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interests
None.
Ethical standards
Not applicable (the work was developed with a database).