Hostname: page-component-586b7cd67f-l7hp2 Total loading time: 0 Render date: 2024-11-22T05:59:50.210Z Has data issue: false hasContentIssue false

Genome-wide association study identifies significant SNP and related genes associated with body size in Yorkshire pigs using latent variable modelling

Published online by Cambridge University Press:  07 September 2023

Elahe Sanjari Banestani
Affiliation:
Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran
Ali Esmailizadeh
Affiliation:
Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran
Mehdi Momen
Affiliation:
Department of Surgical Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI 53706, USA
Ahmad Ayatollahi Mehrgardi
Affiliation:
Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran
Morteza Mokhtari*
Affiliation:
Department of Animal Science, Faculty of Agriculture, University of Jiroft, P.O. Box 7867155311, Jiroft, Iran
*
Corresponding author: Morteza Mokhtari; Email: [email protected]
Rights & Permissions [Opens in a new window]

Abstract

This study aimed to quantify a latent variable for body size (BS) in pigs by using five linear body measurements including body length (BL), body height (BH), chest width (CW), chest girth (CG) and tube girth (TG), and also to identify the most associated single nucleotide polymorphisms (SNP) and related genes with BS by using the genomic best linear unbiased prediction (GBLUP) based genome-wide association study (GWAS) or GBLUP-GWAS methodology. To perform a GWAS on the BS latent trait, we used a mixed linear model and identified a total of 53 significant SNPs. Additionally, we found that nine genes, including Rho GTPase activating protein 12 (ARHGAP12), transmembrane protein 108 (TMEM108), T-cell lymphoma invasion and metastasis inducing factor 1 (TIAM1), ras homologue gene family member B (RHOB), POU class 4 homeobox 1 (POU4F1), follistatin-related protein 4 (FSTL4), cellular communication network factor 2 (CCN2), beaded filament structural protein 2 (BFSP2) and attractin-like protein 1 (ATRNL1) were associated with the BS trait in pigs. These genes are involved in several biological processes, including the regulation of anatomical structure, morphogenesis, the regulation of cell size and growth. The results suggest that the identified SNP and related genes may play important roles in regulating the growth and development of pigs. The results imply that these genes could be promising candidates for further exploration of the underlying mechanisms of body size variation. Furthermore, the findings have significant practical implications for enhancing the efficiency and profitability of pig farming through genetic selection.

Type
Animal Research Paper
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Introduction

Multivariate models are of great importance in quantitative genetics. In animal breeding, the value of a candidate for selection as a prospective parent of the next generation often is a function of several traits (Gianola and Sorensen, Reference Gianola and Sorensen2004). A typical characteristic of the multi-trait mixed models is the increase in the number of parameters with the increase in the number of the traits involved in genetic analysis, which may compromise the feasibility of genetic analysis (Silva et al., Reference Silva, Paiva, Botelho, Carrara, Lopes, Silva, Veroneze, Sterman Ferraz, Eler, Mattos and Gaya2021). Structural equation modelling (SEM) is an advanced statistical technique developed by Wright (Reference Wright1921), which allows for the exploration of causal relationships among phenotypes in a multivariate setting. SEM employs diverse models to depict the relationships among observed variables and enables a quantitative evaluation of a theoretical or measurement model for these relationships (Schumacker and Lomax, Reference Schumacker and Lomax1996). This kind of modelling assesses the functional connections between the traits, allowing inferences to be made about causal relationships and the strength of their associations (Rosa et al., Reference Rosa, Valente, de los Campos, Wu, Gianola and Silva2011). Latent variable modelling is a primary application of SEM in a range of scientific disciplines (Schumacker and Lomax, Reference Schumacker and Lomax1996).

As a dimension reduction technique, latent variable modelling can be applied to phenotypic data for reducing dataset dimensionality and minimizing computational complexity. For example, Silva et al. (Reference Silva, Paiva, Botelho, Carrara, Lopes, Silva, Veroneze, Sterman Ferraz, Eler, Mattos and Gaya2021) demonstrated that a small set of latent variables can effectively reduce the dimensionality of phenotypic data, thereby alleviating the complexity that arises from the over-parameterization of the model. Statistically, latent variables are defined as variables that are not directly measurable but can be characterized by several observed phenotypes. A latent variable is constructed or measured using observed phenotypes (Schumacker and Lomax, Reference Schumacker and Lomax1996).

Latent variable modelling provides the opportunity to investigate biologically complex phenomena by reducing the data dimensionality in such a way that many phenotypes are combined to represent a few underlying concepts of interest (Leal-Gutierrez et al., Reference Leal-Gutiérrez, Rezende, Elzo, Johnson, Penagaricano and Mateescu2018). The latent traits are assumed to be unobserved, but they are believed to explain the covariation among the observed variables. Confirmatory factor analysis (CFA) is one of the methods that can be used in SEM to reduce the dimensionality of phenotypic data in genomic studies (Yu et al., Reference Yu, Campbell, Zhang, Walia and Morota2019). CFA is usually used to test the hypothesis that a set of observed variables are related to a smaller number of latent factors, also known as latent variables or latent traits (Momen et al., Reference Momen, Bhatta, Hussain, Yu and Morota2021).

Genome-wide association study (GWAS) is a well-known method for detecting causal genes or genomic regions which are related to economic traits in livestock populations. This approach uses many single nucleotide polymorphisms (SNP) to screen the whole genome for target genes that are associated with a trait of interest. So, GWAS has become a valuable and effective method to identify candidate genes for important economic traits in livestock. In recent years, GWAS have been included in genetic breeding programmes for livestock species such as pigs and have identified many genes or molecular markers that could regulate important economic traits (Le et al., Reference Le, Christensen, Nielsen and Sahana2017; Liu et al., Reference Liu, Song, Jiang, Jiang, Zhang, Liu, Shi, Ding and Wang2021; Wang et al., Reference Wang, Wang, Li, Sun, Chen, Yan, Dong, Pan and Lu2022).

In production animals, body size (BS) is an important trait that reflects the overall appearance of animals (Liu et al., Reference Liu, Song, Jiang, Jiang, Zhang, Liu, Shi, Ding and Wang2021). Body size is a typical quantitative (or complex) trait and understanding the genetic mechanisms controlling body size differences among individuals can effectively help control the growth and production of animals (Niu et al., Reference Niu, Kim, Choi, Kim, Kim and Kim2013). For example in pigs, body size can objectively reflect the response of pigs to their environment (Ohnishi and Satoh, Reference Ohnishi and Satoh2018). In pig breeding, the body shape character index is often used as the most direct production index of a pig (Liu et al., Reference Liu, Song, Jiang, Jiang, Zhang, Liu, Shi, Ding and Wang2021). Various body measurements and morphological traits such as body length, body height, chest width, rump width and heart girth have been considered as measures of body size in pigs (Song et al., Reference Song, Zhang, Zhang and Ding2019; Liu et al., Reference Liu, Song, Jiang, Jiang, Zhang, Liu, Shi, Ding and Wang2021) species. Leal-Gutierrez et al. (Reference Leal-Gutiérrez, Rezende, Elzo, Johnson, Penagaricano and Mateescu2018) identified latent variables underlying carcass and meat quality traits in beef cattle and performed whole-genome scans for these latent variables to identify associated genomic regions and individual genes, the application of defining a latent trait for body size measurement in genomic analysis has yet to be studied in pigs.

This study aimed to model and identify the latent trait of body size (BS) using five measurable body dimensions, conduct a GWAS for BS, and identify candidate genes and genomic regions associated with BS in Yorkshire pigs.

Materials and methods

Data source and phenotypes

The data used in the present study originated from an elite Chinese pig breeding farm in Beijing that is a descendant of American Yorkshire populations (http://figshare.com/articles/single-step-strategies/7434203). In the present study, the phenotypic records of five linear body measurements of 5573 Yorkshire pigs including body length (BL), body height (BH), chest width (CW), chest girth (CG) and tube girth (TG) were used. Descriptive statistics for the measured body dimension traits used in the present study are shown in Supplementary Table 1. More detailed information about the animals and phenotypes was presented by Song et al. (Reference Song, Zhang, Zhang and Ding2019) and Liu et al. (Reference Liu, Song, Jiang, Jiang, Zhang, Liu, Shi, Ding and Wang2021).

Genotype data, quality control and imputation

In the present study, genotype data on 592 out of 5572 Yorkshire pigs (Song et al., Reference Song, Zhang, Zhang and Ding2019) were used. Animals were genotyped using the PorcineSNP80 Bead Chip (Illumina, San Diego, CA, United States), which includes 68 528 SNP across the whole pig genome. PLINK software version 2.0 (Purcell et al., Reference Purcell, Neale, Todd-Brown, Thomas, Ferreira, Bender, Maller, Sklar, De Bakker, Daly and Sham2007) was used for quality control of genotype data and SNP with a maximum missing rate of 0.10, a maximum individual missing rate of 0.10, a minor allele frequency (MAF) ⩽0.01, and Hardy-Weinberg equilibrium (HWE) with a P value < 10−7 were excluded. Missing genotypes were imputed using Beagle software version 5.4 (Browning and Browning, Reference Browning and Browning2007). Among 68 528 SNPs, we removed 6197 SNPs related to those unmapped and those located on mitochondrial DNA and sex chromosomes; 9621 SNP were discarded by applying the thresholds considered for maximum missing rate, maximum individual missing rate, minor allele frequency and Hardy-Weinberg equilibrium. We hypothesized that BS architecture may be influenced by multiple genetic loci across the genome, and this highlights the critical role of quality control measures in ensuring the reliability of GWAS results for this analysis. After quality control, 583 genotyped individuals remained and 52 710 SNPs were finally used for further analysis.

Latent variable modelling by CFA

The theoretical model hypothesized for constructing the BS latent variable from the considered five linear body dimensions is shown in Fig. 1. The CFA technique is used to extract the underlying latent factors that contribute to variations in body size measurements. To perform a CFA for extracting a latent factor related to body size, it is necessary to establish a measurement model that specifies the connections between the observed variables and the latent factor(s). The following CFA model was considered in this study:

$$X = {\bf \Lambda }\xi + \delta $$

Where, X is the matrix of measured variables (BL, BH, CW, CG and TG), ξ is the vector of latent factors. Λ matrix contains the factor loadings that associate these factors to the measured variables, and δ is the residuals vector. The model was fitted by applying the maximum likelihood estimation method. The lavaan R package (Rosseel, Reference Rosseel2012; R Core Team, 2021) was used to fit the above CFA model.

Figure 1. Representation of latent variable of BS and the respective relationships with body length (BL), body height (BH), chest width (CW), chest girth (CG) and tube girth (TG). For each observed variable, the estimation of loadings is presented.

The overall fit of the hypothesized model for constructing BS was evaluated by using fit indices, including standardized root mean square residual (SRMR) (Bentler, Reference Bentler1990), root mean square error of approximation (RMSEA) (Steiger, Reference Steiger1990), Tucker-Lewis Index (TLI) (Bentler, Reference Bentler1990) and comparative fit index (CFI) (Bentler, Reference Bentler1990).

Univariate genome-wide association analysis

A univariate GWAS model using the SNP Snappy strategy (Meyer and Tier, Reference Meyer and Tier2012) was applied by fitting the following model:

$$y = {\bf W}s + {\bf Xb} + {\bf Zg} + {\bf e}$$

where y is the vector of latent variable BS, s is the additive effect for SNP marker j, b is the vector of fixed effects including overall mean and herd-year-season-sex (the combination of the effects of herd, year, season, and sex of pigs) effect, g is the vector of genomic value effects and e is the vector of residual effects. Furthermore, the body weight of animals was considered a covariate. W is the matrix of genotype codes of SNP marker j (AA = 0, AB = 1 and BB = 2). X and Z are the incidence matrices associating b and g to y, g following a normal distribution of N (0, G$\sigma _g^2$), in which $\sigma _g^2$ is the variance of additive genetic effects. Here, G is the marker-based genomic relationship matrix (VanRaden, Reference VanRaden2008) to account for the effects of population structure and reduce the likelihood of false positives and increase the power to detect true associations between genetic variants and traits of interest. e following a normal distribution of N (0, I$\sigma _e^2$), in which $\sigma _e^2 \;$is the residual variance. The WOMBAT program (Meyer, Reference Meyer2007) was used for model fitting and estimating SNP solutions.

The genomic inflation factor was estimated as the median of the χ 2 test statistics of the nominal P values, divided by the expected median of the χ 2 distribution. If the data follow the standard χ 2 distribution, the expected genomic inflation factor value would be 1. If the genomic inflation factor value is greater than 1, it provides evidence for some systematic bias (Iterson et al., Reference Iterson, van Zwet and Heijmans2017). In the present study, the method proposed by Iterson et al. (Reference Iterson, van Zwet and Heijmans2017) was used to correct P values for the genomic inflation factor. In this method, test statistics (the ratio of SNP solution to the corresponding standard error) are divided by the genomic inflation factor. In the present study, the genomic inflation factor was obtained at 1.01.

Significance testing of SNP effects was done by applying a two-sided t test and the P value of each SNP effect was calculated accordingly. The Manhattan plot of –log10 (P values) of SNP effects was obtained by using the ‘qqman’ R package (Turner, Reference Turner2018).

To account for multiple testing that resulted in false positives, the false discovery rate (FDR) (Weller et al., Reference Weller, Song, Heyen, Lewi and Ron1998) was controlled at 5% genome-wide levels to identify significant associations. Significant thresholds for genome-wide (P < 0.05) and suggestive (that is, one false positive per genome scan) levels were 9.48 × 10−7 and 1.90 × 10−5, corresponding to the –log 10 (P value) of 6.02 and 4.72, respectively.

Identification of candidate genes and gene ontology enrichment analysis

In the present study, Ensembl Variant Effect Predictor (VEP) web interface (http://www.ensembl.org/ vep.) is used for the analysis, annotation and prioritization of genomic variants in the coding and non-coding regions (McLaren et al., Reference McLaren, Gil, Hunt, Riat, Ritchie, Thormann, Flicek and Cunningham2016). After identifying significant SNPs by GWAS, the genes located in the 50 Kb downstream and 50 Kb upstream region of the significant SNPs were annotated with reference to the Sus scrofa 11.1 genome assembly (http://www.ensembl.org/Sus_scrofa/Info/Index/). The gene ontology (GO) enrichment analysis is performed via the web-based Gene Ontology Resources interface (http://geneontology.org/) by considering the biological process of the identified genes.

Results

Extracting BS latent trait through factor analysis

Figure 1 provides a comprehensive representation of the hypothesized model considered for BS by underlying relationships within the system under investigation and incorporates BS as a latent variable in the centre of the five measured variables. The goodness of fit measures including CFI, TLI, RMSEA and SRMR were 0.96, 0.93, 0.09 and 0.03, respectively. The factor loadings for the observed variables of BL, CG, TG, BH and CW were 0.62, 0.84, 0.57, 0.54 and 0.63, respectively and statistically significant (P < 0.01). Positive factor loadings indicate that an increase in the observed variable is associated with an increase in the latent variable. In this study, the CG variable had the strongest relationship with the BS latent variable based on the estimated factors loading.

Genome-wide association analysis and selecting significant SNP

Figure 2 represents the Manhattan plot of the GWAS result for BS. By considering a threshold of 6.02 for genome-wide significant level (P < 0.05), a total of 53 significant SNPs was identified. As shown in Supplementary Table 2, 10 out of the 53 significant SNPs were located in chromosome 11.

Figure 2. Manhattan plot representing chromosome-wide association with the body size latent trait in Yorkshire pig using GBLUP-GWAS. –log10(P values) of SNPs (Y-axis) are shown across the 18 autosomal chromosomes (X-axis). Significant thresholds for genome-wide (P < 0.05) and suggestive levels were 9.48 × 10−7 and 1.90 × 10−5, corresponding to –log 10 (P value) of 6.02 and 4.72, are shown with red and blue lines, respectively.

GO enrichment analysis

The results obtained from GO revealed that 24 genes were in the flanking regions of the top SNP (Table 1). Some information about the characteristics of SNP related to the biological functions of the genes is presented in Supplementary Table 3. Among them, nine genes including Rho GTPase activating protein 12 (ARHGAP12), transmembrane protein 108 (TMEM108), T-cell lymphoma invasion and metastasis inducing factor 1 (TIAM1), ras homologue gene family member B (RHOB), POU class 4 homeobox 1 (POU4F1), follistatin-related protein 4 (FSTL4), cellular communication network factor 2 (CCN2), beaded filament structural protein 2 (BFSP2) and attractin-like protein 1 (ATRNL1) were found to be related to the BS. A summary of the biological processes related to these candidate genes is shown in Table 2.

Table 1. Significant single nucleotide polymorphisms (SNP) and associated genes for body size latent trait in pigs

SCA, Sus scrofa autosome chromosomes.

a False discovery rate (FDR) adjusted P values.

b Genes in ±50-kb flanking regions of significant SNP.

Table 2. Biological processes associated with candidate genes for body size in pig

Discussion

The goodness of fit measures appropriately indicates the adequacy of the confirmatory factor model proposed for the latent variable of body size (BS) in the present study. Leal-Gutierrez et al. (Reference Leal-Gutiérrez, Rezende, Elzo, Johnson, Penagaricano and Mateescu2018) applied a similar methodology for defining latent variable of carcass quality by applying observed variables of quality grade, fat over ribeye and marbling in beef cattle.

All loading factors were positive and implying that an increase in the observed variable is associated with an increase in the latent variable of BS. In this study, the CG variable had the strongest relationship with the BS latent variable based on the estimated loading factors. The directions of the loading factors with BS were positive and agree with the theoretical model where the considered observed variables are positively related to BS. The positive direction was expected given that the BS construct reflects mainly body measurement traits (Kominakis et al., Reference Kominakis, Hager-Theodorides, Zoidis, Saridaki, Antonakos and Tsiamis2017). The results suggest that the unobserved variable of BS plays a significant role in explaining the interrelationships among the observed variables. These findings underscore the necessity of incorporating latent variables when modelling intricate systems. Identifying genomic regions affecting BS can provide a suitable opportunity for improving this latent trait and correspondingly the associated observed and measurable traits in pigs. Liu et al. (Reference Liu, Song, Jiang, Jiang, Zhang, Liu, Shi, Ding and Wang2021) pointed out that SNPs controlling body size characteristics are widely distributed on the genome, implying that fitting the infinitesimal model is appropriate. Boyle et al. (Reference Boyle, Li and Pritchard2017) remembered that for complex traits such as body height, action sites are widely distributed across the entire genome, denoting that approximately all genes are involved in the regulation of this trait.

In the present study, nine candidate genes were detected which were potentially associated with BS in pigs. The relevant biological processes included anatomical structure development, homoeostasis, morphogenesis and maturation; regulation of anatomical structure size, regulation of anatomical morphogenesis, response to growth factor, developmental cell growth, developmental growth involved in morphogenesis, regulation of cell growth, regulation of cell size, regulation of anatomical structure size, cell morphogenesis, skeletal system morphogenesis, animal organ morphogenesis, anatomical structure formation involved in morphogenesis and tissue morphogenesis.

Cellular Communication Network-2 (CCN2), formerly termed Connective Tissue Growth Factor (CTGF), is a matricellular protein with four distinct structural modules that can exert a dual function as a matricellular protein and as a growth factor (Rodrigues-Diez et al., Reference Rodrigues-Diez, Tejera-Munoz, Esteban, Steffensen, Rodrigues-Díez, Orejudo, Rayego-Mateos, Falke, Cannata-Ortiz, Ortiz, Egido, Mallat, Briones, Auxiliadora Bajo, Goldschmeding and Ruiz-Ortega2022). CCN2 is a protein belonging to the Cellular Communication Network (CCN) family of secreted extracellular matrix-associated proteins (Leguit et al., Reference Leguit, Raymakers, Hebeda and Goldschmeding2021). Besides, CCN2 and its fragments have been involved in the regulation of several biological processes, including cell proliferation, differentiation, adhesion, migration, cell survival, apoptosis and the making of extracellular matrix products, as well as in more complex processes such as embryonic development, angiogenesis, chondrogenesis, osteogenesis, fibrosis, mechanotransduction and inflammation (Leguit et al., Reference Leguit, Raymakers, Hebeda and Goldschmeding2021). Rebolledo et al. (Reference Rebolledo, Acuna and Brandan2021) pointed out that in the skeletal muscle of mammals, most of the CCN family members are expressed during embryonic development or in adulthood. Their roles during the adult stage are related to the regulation of muscle mass and regeneration, maintaining vascularization and the modulation of skeletal muscle fibrosis. POU4F1, also known as Brn3a, plays a crucial role in normal cardiac development during embryonic stages (Maskell et al., Reference Maskell, Qamar, Babakr and Hawkin2017). In addition, studies in mice have shown that POU4F1 is associated with the development of spiral ganglion neurons (Maskell et al., Reference Maskell, Qamar, Babakr and Hawkin2017). RHOB, Ras Homologue family member B, is a GTP-binding protein involved in vesicular trafficking in anterior pituitary cells (Cussac et al., Reference Cussac, Leblanc, LHeritier, Bertoglio, Lang, Kordon, Enjalbert and Saltarelli1996) and influences endothelial cell morphogenesis through regulating growth factor receptor trafficking and signalling (Howe and Addison, Reference Howe and Addison2012). RHOB has proven to have specific and pleiotropic functions in organisms from mammalian development to DNA damage survival responses (Vega and Ridley, Reference Vega and Ridley2018).

TMEM108 or Transmembrane protein 108, also named Retrolinkin plays an important role in the central nervous system (Xu et al., Reference Xu, Ji, Zhang, Zhang, Nie, Xu, Zhang and Zhang2016). A GWAS found that TMEM108 is a susceptibility gene for psychiatric disorders, including schizophrenia and bipolar disorder (Neale et al., Reference Neale, Medland, Ripke, Asherson, Franke, Lesch, Faraone, Nguyen, Schafer, Holmans, Daly, Steinhausen, Freitag, Reif, Renner, Romanos, Romanos, Walitza, Warnke, Meyer, Palmason, Buitelaar, Vasquez, Lambregts-Rommelse, Gill, Anney, Langely, O'Donovan, Williams, Owen, Thapar, Kent, Sergeant, Roeyers, Mick, Biederman, Doyle, Smalley, Loo, Hakonarson, Elia, Todorov, Miranda, Mulas, Ebstein, Rothenberger, Banaschewski, Oades, Sonuga-Barke, McGough, Nisenbaum, Middleton, Hu and Nelson2010). TIAM1 regulates signalling pathways that affect the control of neuronal morphogenesis and neurite outgrowth by modulating the actin cytoskeletal network (Lu et al., Reference Lu, Hernan, Marcogliese, Huang, Gertler, Akcaboy, Liu, Chung, Pan, Sun, Oguz, Oztoprak, de Baaij, Ivanisevic, McGinnis, Guillen Sacoto, Chung and Bellen2022). FSTL4 regulates cell size and is considered as a candidate gene that might affect rump angle in Chinese Holstein cows (Lu et al., Reference Lu, Mohamed Abdalla, Nazar, Fan, Zhang, Wu, Xu and Yang2021). Intermediate filaments, which integrate individual cells into their respective tissues, are key components of the cytoskeleton in all vertebrate cells. BFSP2 is one of them (Song et al., Reference Song, Landsbury, Dahm, Liu, Zhang and Quinlan2009). ATRNL1 or Attractin-Like protein 1 is a 1379 amino acid single-pass type I membrane protein that may play a role in melanocortin signalling pathways that regulate energy homoeostasis. The ATRNL1 gene is conserved in canine, bovine, mouse, rat, chicken and zebrafish (Stark et al., Reference Stark, Bruno, Mountford, Lockhart and Amor2010). ARHGAP12 is specifically expressed in the CA1 region of the hippocampus, where it localizes to the post-synaptic compartment of excitatory synapses. ARHGAP12 negatively controls spine size. ARGHAP12 knockdown results in precocious maturation of excitatory synapses, as indicated by a reduction in the proportion of silent synapses. ARHGAP12 regulates excitatory synaptic structure and function during development (Ba et al., Reference Ba, Selten, van der Raadt, van Veen, Li, Benevento, Oudak er, Lasabuda, Letteboer, Roepman, van Wezel, Courtney, van Bokhoven and Nadif Kasri2016).

The results imply that these genes could be promising candidates for further exploration of the underlying mechanisms of body size variation. Furthermore, the findings have significant practical implications for enhancing the efficiency and profitability of pig farming through genetic selection and may advance our understanding of body size regulation in other organisms.

Conclusion

In the present study, we implemented appropriate latent variable modelling by applying the CFA technique for constructing the latent variable of BS from five body measurement traits in Yorkshire pigs. Genomic best linear unbiased prediction (GBLUP)-GWAS was used for identifying the SNPs and candidate genes associated with latent trait BS. In total, 53 significant SNPs were identified; widely distributed on all chromosomes except for chromosomes 9 and 18. Nine genes were related to BS in Yorkshire pigs which involved several biological processes like regulation of anatomical structure, morphogenesis and development, regulation of cell size, growth and morphogenesis.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0021859623000424

Acknowledgments

We thank Mahmoud Amiri Roudbar for helping with genotype imputation.

Authors’ contributions

Ali Esmailizadeh, Mehdi Momen and Morteza Mokhtari conceptualized the study. Elahe Sanjari Banestani and Morteza Mokhtari performed statistical analyses and wrote the preliminary version of the manuscript. Ali Esmailizadeh, Ahmad Ayatollahi Mehrgardi and Mehdi Momen edited the initial version of the manuscript.

Funding statement

This research received no specific grant from any funding agency, commercial or not-for-profit sectors.

Competing interest

None.

Ethical standards

Not applicable.

References

Ba, W, Selten, MM, van der Raadt, J, van Veen, H, Li, LL, Benevento, M, Oudak er, AR, Lasabuda, RSE, Letteboer, SJ, Roepman, R, van Wezel, RJA, Courtney, MJ, van Bokhoven, H and Nadif Kasri, N (2016) ARHGAP12 functions as a developmental brake on excitatory synapse function. Cell Reports 14, 13551368.10.1016/j.celrep.2016.01.037CrossRefGoogle ScholarPubMed
Bentler, PM (1990) Comparative fit indexes in structural models. Psychological Bulletin 107, 238246.CrossRefGoogle ScholarPubMed
Boyle, EA, Li, YI and Pritchard, JK (2017) An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 7117771186.CrossRefGoogle ScholarPubMed
Browning, SR and Browning, BL (2007) Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. The American Journal of Human Genetics 81, 10841097.CrossRefGoogle ScholarPubMed
Cussac, D, Leblanc, P, LHeritier, A, Bertoglio, J, Lang, P, Kordon, C, Enjalbert, A and Saltarelli, D (1996) Rho proteins are localized with different membrane compartments involved in vesicular trafficking in anterior pituitary cells. Molecular and Cellular Endocrinology 119, 195206.CrossRefGoogle ScholarPubMed
Gianola, D and Sorensen, D (2004) Quantitative genetic models for describing simultaneous and recursive relationships between phenotypes. Genetics 167, 14071424.CrossRefGoogle ScholarPubMed
Howe, GA and Addison, CL (2012) RhoB controls endothelial cell morphogenesis in part via negative regulation of RhoA. Vascular Cell 4, 111.CrossRefGoogle ScholarPubMed
Iterson, MV, van Zwet, EW and Heijmans, BT (2017) Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biology 18, 19.CrossRefGoogle ScholarPubMed
Kominakis, A, Hager-Theodorides, AL, Zoidis, E, Saridaki, A, Antonakos, G and Tsiamis, G (2017) Combined GWAS and ‘guilt by association-based prioritization analysis identifies functional candidate genes for body size in sheep. Genetics Selection Evolution 49, 41.CrossRefGoogle ScholarPubMed
Le, TH, Christensen, OF, Nielsen, B and Sahana, G (2017) Genome-wide association study for conformation traits in three Danish pig breeds. Genetics Selection Evolution 49, 12.CrossRefGoogle ScholarPubMed
Leal-Gutiérrez, JD, Rezende, FM, Elzo, MA, Johnson, D, Penagaricano, F and Mateescu, RG (2018) Structural equation modeling and whole-genome scans uncover chromosome regions and enriched pathways for carcass and meat quality in beef. Frontiers in Genetics 9, 113.CrossRefGoogle ScholarPubMed
Leguit, RJ, Raymakers, RAP, Hebeda, KM and Goldschmeding, R (2021) CCN2 (cellular communication network factor 2) in the bone marrow microenvironment, normal and malignant hematopoiesis. Journal of Cell Communication and Signaling 15, 2556.CrossRefGoogle ScholarPubMed
Liu, H, Song, H, Jiang, Y, Jiang, Y, Zhang, F, Liu, Y, Shi, Y, Ding, X and Wang, C (2021) A single-step genome wide association study on body size traits using imputation-based whole-genome sequence data in Yorkshire pigs. Frontiers in Genetics 12, 114.Google ScholarPubMed
Lu, X, Mohamed Abdalla, I, Nazar, M, Fan, Y, Zhang, Z, Wu, X, Xu, T and Yang, Z (2021) Genome-wide association study on reproduction-related body-shape traits of Chinese Holstein cows. Animals 11, 1927.CrossRefGoogle ScholarPubMed
Lu, S, Hernan, R, Marcogliese, PC, Huang, Y, Gertler, TS, Akcaboy, M, Liu, S, Chung, HL, Pan, X, Sun, X, Oguz, MM, Oztoprak, U, de Baaij, JHF, Ivanisevic, J, McGinnis, E, Guillen Sacoto, MJ, Chung, WK and Bellen, HJ (2022) Loss-of-function variants in TIAM1 are associated with developmental delay, intellectual disability, and seizures. The American Journal of Human Genetics 109, 571586.CrossRefGoogle ScholarPubMed
Maskell, LJ, Qamar, K, Babakr, AA and Hawkin, TA (2017) Essential but partially redundant roles for POU4F1/Brn-3a and POU4F2/Brn-3b transcription factors in the developing heart. Cell Death and Disease 8, 112.CrossRefGoogle ScholarPubMed
McLaren, W, Gil, L, Hunt, SE, Riat, HS, Ritchie, GRS, Thormann, A, Flicek, P and Cunningham, F (2016) The ensembl variant effect predictor. Genome Biology 17, 122.CrossRefGoogle ScholarPubMed
Meyer, K (2007) WOMBAT – a tool for mixed model analyses in quantitative genetics by restricted maximum likelihood (REML). Journal of Zhejiang University Science B 8, 815821.CrossRefGoogle ScholarPubMed
Meyer, K and Tier, B (2012) “snp snappy”: a strategy for fast genome-wide association studies fitting a full mixed model. Genetics 190, 275277.CrossRefGoogle Scholar
Momen, M, Bhatta, M, Hussain, W, Yu, H and Morota, G (2021) Modeling multiple phenotypes in wheat using data-driven genomic exploratory factor analysis and Bayesian network learning. Plant Direct 5, e00304.CrossRefGoogle ScholarPubMed
Neale, BM, Medland, SE, Ripke, S, Asherson, P, Franke, B, Lesch, K-P, Faraone, SV, Nguyen, TT, Schafer, H, Holmans, P, Daly, M, Steinhausen, H-C, Freitag, C, Reif, A, Renner, TJ, Romanos, M, Romanos, J, Walitza, S, Warnke, A, Meyer, J, Palmason, H, Buitelaar, J, Vasquez, AA, Lambregts-Rommelse, N, Gill, M, Anney, RJL, Langely, K, O'Donovan, M, Williams, N, Owen, M, Thapar, A, Kent, L, Sergeant, J, Roeyers, H, Mick, E, Biederman, J, Doyle, A, Smalley, S, Loo, S, Hakonarson, H, Elia, J, Todorov, A, Miranda, A, Mulas, F, Ebstein, RP, Rothenberger, A, Banaschewski, T, Oades, RD, Sonuga-Barke, E, McGough, J, Nisenbaum, L, Middleton, F, Hu, X and Nelson, S (2010) Meta-analysis of genome-wide association studies of attention-deficit/hyperactivity disorder. Journal of the American Academy of Child and Adolescent Psychiatry 49, 884897.CrossRefGoogle ScholarPubMed
Niu, P, Kim, S, Choi, B, Kim, T, Kim, J and Kim, K (2013) Porcine insulinlike growth factor 1 (IGF1) gene polymorphisms are associated with body size variation. Genes and Genomics 35, 523528.CrossRefGoogle Scholar
Ohnishi, C and Satoh, M (2018) Estimation of genetic parameters for performance and body measurement traits in Duroc pigs selected for average daily gain, loin muscle area, and backfat thickness. Livestock Science 214, 161166.CrossRefGoogle Scholar
Purcell, S, Neale, B, Todd-Brown, K, Thomas, L, Ferreira, MA, Bender, D, Maller, J, Sklar, P, De Bakker, PI, Daly, MJ and Sham, PC (2007) Plink: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 81, 559575.CrossRefGoogle ScholarPubMed
R Development Core Team (2021) R: A Language and Environment for Statistical Computing. Vienna, Austria: R foundation for statistical computing.Google Scholar
Rebolledo, DL, Acuna, MJ and Brandan, E (2021) Role of matricellular CCN proteins in skeletal muscle: focus on CCN2/CTGF and its regulation by vasoactive peptides. International Journal of Molecular Sciences 22, 5234.CrossRefGoogle ScholarPubMed
Rodrigues-Diez, RR, Tejera-Munoz, A, Esteban, V, Steffensen, LB, Rodrigues-Díez, R, Orejudo, M, Rayego-Mateos, S, Falke, LL, Cannata-Ortiz, P, Ortiz, A, Egido, J, Mallat, Z, Briones, AM, Auxiliadora Bajo, M, Goldschmeding, R and Ruiz-Ortega, M (2022) CCN2 (cellular communication network factor 2) deletion alters vascular integrity and function predisposing to aneurysm formation. Hypertension 79, 4255.CrossRefGoogle ScholarPubMed
Rosa, GJM, Valente, BD, de los Campos, G, Wu, XL, Gianola, D and Silva, MA (2011) Inferring causal phenotype networks using structural equation models. Genetics Selection Evolution 43, 6.CrossRefGoogle ScholarPubMed
Rosseel, Y (2012) lavaan: an R package for structural equation modeling. Journal of Statistical Software 48, 136.CrossRefGoogle Scholar
Schumacker, ER and Lomax, GR (1996) A Beginner's Guide to Structural Equation Modeling. Mahwah, NJ: Erlbaum.Google Scholar
Silva, HT, Paiva, JT, Botelho, ME, Carrara, ER, Lopes, PS, Silva, FF, Veroneze, R, Sterman Ferraz, JB, Eler, JP, Mattos, EC and Gaya, LG (2021) Searching for causal relationships among latent variables concerning performance, carcass, and meat quality traits in broilers. Journal of Animal Breeding and Genetics 139, 181192.CrossRefGoogle ScholarPubMed
Song, S, Landsbury, A, Dahm, R, Liu, Y, Zhang, Q and Quinlan, RA (2009) Functions of the intermediate filament cytoskeleton in the eye lens. The Journal of Clinical Investigation 119, 18371848.CrossRefGoogle ScholarPubMed
Song, H, Zhang, J, Zhang, Q and Ding, X (2019) Using different single-step strategies to improve the efficiency of genomic prediction on body measurement traits in pig. Frontiers in Genetics 9, 110.CrossRefGoogle ScholarPubMed
Stark, Z, Bruno, DL, Mountford, H, Lockhart, PJ and Amor, DJ (2010) De novo 325 kb microdeletion in chromosome band 10q25.3 including ATRNL1 in a boy with cognitive impairment, autism and dysmorphic features. European Journal of Medical Genetics 53, 337339.CrossRefGoogle Scholar
Steiger, JH (1990) Structural model evaluation and modification: an interval estimation approach. Multivariate Behavioral Research 25, 173180.CrossRefGoogle ScholarPubMed
Turner, SD (2018) qqman: an R package for visualizing GWAS results using Q-Q and Manhattan plots. The Journal of Open Source Software 3, 12.CrossRefGoogle Scholar
VanRaden, PM (2008) Efficient methods to compute genomic predictions. Journal of Dairy Science 91, 44144423.CrossRefGoogle ScholarPubMed
Vega, FM and Ridley, AJ (2018) The RhoB small GTPase in physiology and disease. Small GTPases 9, 384393.CrossRefGoogle ScholarPubMed
Wang, H, Wang, X, Li, M, Sun, H, Chen, Q, Yan, D, Dong, X, Pan, Y and Lu, S (2022) Genome-wide association study of growth traits in a four-way crossbred pig population. Genes 13, 1990.CrossRefGoogle Scholar
Weller, JI, Song, JZ, Heyen, DW, Lewi, HA and Ron, M (1998) A new approach to the problem of multiple comparisons in the genetic dissection of complex traits. Genetics 150, 16991706.CrossRefGoogle Scholar
Wright, S (1921) Correlation and causation. Journal of Agricultural Research 20, 557585.Google Scholar
Xu, Z, Ji, C, Zhang, Y, Zhang, Z, Nie, Q, Xu, J, Zhang, D and Zhang, X (2016) Combination analysis of genome-wide association and transcriptome sequencing of residual feed intake in quality chickens. BMC Genomics 17, 594.CrossRefGoogle ScholarPubMed
Yu, H, Campbell, MT, Zhang, Q, Walia, H and Morota, G (2019) Genomic Bayesian confirmatory factor analysis and Bayesian network to characterize a wide spectrum of rice phenotypes. Genes. Genomes Genetics 9, 19751986.Google Scholar
Figure 0

Figure 1. Representation of latent variable of BS and the respective relationships with body length (BL), body height (BH), chest width (CW), chest girth (CG) and tube girth (TG). For each observed variable, the estimation of loadings is presented.

Figure 1

Figure 2. Manhattan plot representing chromosome-wide association with the body size latent trait in Yorkshire pig using GBLUP-GWAS. –log10(P values) of SNPs (Y-axis) are shown across the 18 autosomal chromosomes (X-axis). Significant thresholds for genome-wide (P < 0.05) and suggestive levels were 9.48 × 10−7 and 1.90 × 10−5, corresponding to –log 10 (P value) of 6.02 and 4.72, are shown with red and blue lines, respectively.

Figure 2

Table 1. Significant single nucleotide polymorphisms (SNP) and associated genes for body size latent trait in pigs

Figure 3

Table 2. Biological processes associated with candidate genes for body size in pig

Supplementary material: File

Sanjari Banestani et al. supplementary material 1

Sanjari Banestani et al. supplementary material
Download Sanjari Banestani et al. supplementary material 1(File)
File 14.3 KB
Supplementary material: File

Sanjari Banestani et al. supplementary material 2

Sanjari Banestani et al. supplementary material
Download Sanjari Banestani et al. supplementary material 2(File)
File 10.6 KB
Supplementary material: File

Sanjari Banestani et al. supplementary material 3

Sanjari Banestani et al. supplementary material
Download Sanjari Banestani et al. supplementary material 3(File)
File 14.1 KB