
Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds

Published online by Cambridge University Press: 23 June 2017

F. Bertolini
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127 Bologna, Italy
G. Galimberti
Affiliation:
Department of Statistical Sciences “Paolo Fortunati”, University of Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy
G. Schiavo
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127 Bologna, Italy
S. Mastrangelo
Affiliation:
Department of Agricultural and Forestry Sciences, University of Palermo, Viale delle Scienze, 90128 Palermo, Italy
R. Di Gerlando
Affiliation:
Department of Agricultural and Forestry Sciences, University of Palermo, Viale delle Scienze, 90128 Palermo, Italy
M. G. Strillacci
Affiliation:
Department of Veterinary Medicine, Università degli Studi di Milano, Via Celoria 10, 20133 Milano, Italy
A. Bagnato
Affiliation:
Department of Veterinary Medicine, Università degli Studi di Milano, Via Celoria 10, 20133 Milano, Italy
B. Portolano
Affiliation:
Department of Agricultural and Forestry Sciences, University of Palermo, Viale delle Scienze, 90128 Palermo, Italy
L. Fontanesi*
Affiliation:
Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127 Bologna, Italy

Abstract

Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers that differentiate breeds or populations for several downstream applications. A few statistical approaches have been proposed to identify the most discriminating genetic markers among thousands of genotyped SNPs. In this work, we compared several SNP preselection methods (Delta, Fst and principal component analysis (PCA)), followed by Random Forest classification, to analyse SNP data from six dairy cattle breeds. These included the cosmopolitan Holstein, Brown and Simmental breeds and three autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes: Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna. From these classifications, two panels of 96 and 48 SNPs containing the most discriminant SNPs were created for each preselection method. These panels were evaluated for their ability to discriminate the breeds, both overall and breed by breed, and for the linkage disequilibrium within each panel. The results showed that, with the 48-SNP panels, the error rate increased mainly for the autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally better able to discriminate all breeds. The panel derived by PCA-chrom (obtained by preselecting SNPs chromosome by chromosome) identified informative SNPs that were particularly useful for assigning the minor breeds: it reached the lowest Out Of Bag error for these breeds, even for Cinisara, whose error was quite high with all other panels. Moreover, this panel also contained the lowest number of SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits.
In general, our results demonstrated the usefulness of Random Forest in combination with other data reduction techniques to identify population-informative SNPs.
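The abstract names Delta and Fst as per-SNP preselection statistics and evaluates the final panels for linkage disequilibrium. The following is a minimal plain-Python sketch of those two steps, not the authors' pipeline, under several assumptions: genotypes are coded as 0/1/2 allele dosages; Delta is taken as the mean absolute pairwise allele-frequency difference across breeds (a multi-breed generalisation assumed here); Fst is computed from expected heterozygosities (a Nei G_ST-style estimator, not necessarily the one used in the paper); and LD is approximated by the squared correlation of genotype dosages rather than phased haplotype r². PCA preselection and the Random Forest step are not reproduced, and all function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def allele_freq(dosages):
    """Frequency of the counted allele from 0/1/2 dosages (None = missing call)."""
    calls = [g for g in dosages if g is not None]
    return sum(calls) / (2 * len(calls))


def delta(freqs):
    """Assumed multi-breed Delta: mean |p_i - p_j| over all pairs of breeds."""
    return mean(abs(p - q) for p, q in combinations(freqs, 2))


def fst(freqs):
    """Heterozygosity-based Fst = (H_T - H_S) / H_T, breeds weighted equally."""
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-breed expected heterozygosity
    p_bar = mean(freqs)                           # unweighted mean allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                 # expected heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def preselect(data, stat, k):
    """Score every SNP with `stat` and return the IDs of the k highest-scoring SNPs.

    data maps snp_id -> {breed: [dosages]}.
    """
    scores = {snp: stat([allele_freq(d) for d in by_breed.values()])
              for snp, by_breed in data.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def genotype_r2(x, y):
    """Squared correlation of two SNPs' dosages: a genotype-based proxy for LD r^2."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return 0.0 if sxx == 0 or syy == 0 else sxy ** 2 / (sxx * syy)


# Hypothetical toy data: 4 animals per breed, three SNPs.
toy = {
    "snp_a": {"Holstein": [2, 2, 2, 2], "Reggiana": [0, 0, 0, 0], "Cinisara": [1, 1, 1, 1]},
    "snp_b": {"Holstein": [1, 1, 1, 1], "Reggiana": [1, 1, 1, 1], "Cinisara": [1, 1, 1, 1]},
    "snp_c": {"Holstein": [2, 2, 1, 1], "Reggiana": [1, 1, 0, 0], "Cinisara": [1, 1, 1, 1]},
}

panel = preselect(toy, fst, k=2)  # ['snp_a', 'snp_c']

# Pool animals across breeds and check LD between the two retained SNPs.
pooled = {snp: sum(toy[snp].values(), []) for snp in panel}
ld = genotype_r2(pooled["snp_a"], pooled["snp_c"])  # 0.5
```

In a full analysis, the preselected SNPs would then be passed to a Random Forest classifier (for example, the R randomForest package), whose Out Of Bag error and variable importance would guide the choice of the final 96- or 48-SNP panel. A high pairwise r² flags SNPs that carry redundant information within a panel.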

Type
Research Article
Copyright
© The Animal Consortium 2017 


Supplementary material: Bertolini supplementary material (file, 477.2 KB)