An approach to the development of a core set of germplasm using a mixture of qualitative and quantitative data

Published online by Cambridge University Press:  26 June 2014

Rupam Kumar Sarkar
Affiliation:
Indian Agricultural Statistics Research Institute, New Delhi 110012, India
Prabina Kumar Meher
Affiliation:
Indian Agricultural Statistics Research Institute, New Delhi 110012, India
S. D. Wahi
Affiliation:
Indian Agricultural Statistics Research Institute, New Delhi 110012, India
T. Mohapatra
Affiliation:
Central Rice Research Institute, Cuttack, Odisha 753006, India
A. R. Rao*
Affiliation:
Indian Agricultural Statistics Research Institute, New Delhi 110012, India
* Corresponding authors: E-mail: [email protected]; [email protected]

Abstract

Developing a representative, well-diversified core set with minimal duplication and maximal diversity from a larger germplasm collection is essential for breeders involved in crop improvement programmes. Most existing methodologies for identifying a core set are based on either qualitative or quantitative data alone. In this study, an approach to the identification of a core set of germplasm based on a mixture of qualitative (single nucleotide polymorphism genotyping) and quantitative data was proposed. For this purpose, six combined distance measures, each pairing one of three distance measures for quantitative data with one of two for qualitative data, were proposed and evaluated. The combined distance matrices were used as inputs to seven clustering procedures for classifying the germplasm into homogeneous groups. Subsequently, the optimum number of clusters for the clustering methodologies under the different combined distance measures was identified on a consensus basis. Average cluster robustness values across the identified optimum numbers of clusters were calculated for each clustering methodology. Three allocation methods were then applied to sample accessions from the clusters identified under the clustering methodology with the highest average cluster robustness value, in order to formulate a core set. Furthermore, an index was proposed for evaluating diversity in the core set. The results reveal that the combined distance measure A1B2 (the average of the range-standardized absolute difference for quantitative data, combined with the rescaled average absolute difference for qualitative data), together with the three clusters identified by the k-means clustering algorithm and the proportional allocation method, was suitable for identifying a core set from a collection of rice germplasm.
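
To make the two ingredients of the selected procedure more concrete, the following is a minimal Python sketch of a combined distance in the spirit of A1B2 and of proportional allocation. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: SNP genotypes are coded 0/1/2, the qualitative distance is rescaled to [0, 1] by dividing by the maximum code difference, the two components are combined by a simple average, and rounding in the allocation uses the largest-remainder rule. All function names are hypothetical. The clustering step is omitted.

```python
import math
from typing import List, Sequence


def column_ranges(rows: Sequence[Sequence[float]]) -> List[float]:
    """Range (max - min) of each quantitative trait across all accessions."""
    return [max(col) - min(col) for col in zip(*rows)]


def quantitative_distance(x: Sequence[float], y: Sequence[float],
                          ranges: Sequence[float]) -> float:
    """Average range-standardized absolute difference over traits."""
    # A trait with zero range carries no information and contributes 0.
    terms = [abs(a - b) / r if r > 0 else 0.0
             for a, b, r in zip(x, y, ranges)]
    return sum(terms) / len(terms)


def qualitative_distance(g: Sequence[int], h: Sequence[int],
                         max_code: int = 2) -> float:
    """Average absolute difference of genotype codes, rescaled to [0, 1].

    Assumption: genotypes are coded 0/1/2, so max_code = 2.
    """
    return sum(abs(a - b) for a, b in zip(g, h)) / (max_code * len(g))


def combined_distance(x, y, g, h, ranges) -> float:
    """Assumed combination rule: simple average of the two components."""
    return 0.5 * (quantitative_distance(x, y, ranges)
                  + qualitative_distance(g, h))


def proportional_allocation(cluster_sizes: Sequence[int],
                            core_size: int) -> List[int]:
    """Accessions to sample per cluster, proportional to cluster size.

    Assumption: fractional quotas are resolved by the largest-remainder rule
    so that the allocations sum exactly to core_size.
    """
    total = sum(cluster_sizes)
    quotas = [core_size * n / total for n in cluster_sizes]
    alloc = [math.floor(q) for q in quotas]
    leftover = core_size - sum(alloc)
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


# Toy data (made up for illustration): 3 accessions, 2 traits, 4 SNPs.
traits = [[100.0, 5.0], [120.0, 7.0], [140.0, 9.0]]
snps = [[0, 0, 2, 1], [2, 0, 2, 1], [2, 2, 0, 1]]
ranges = column_ranges(traits)
d01 = combined_distance(traits[0], traits[1], snps[0], snps[1], ranges)
allocation = proportional_allocation([70, 20, 10], core_size=7)
```

With these toy inputs, `d01` is the combined distance between the first two accessions and `allocation` gives the number of accessions to draw from each of three hypothetical clusters of sizes 70, 20 and 10.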

Type
Research Article
Copyright
Copyright © NIAB 2014 
