Published online by Cambridge University Press: 26 June 2014
Development of a representative and well-diversified core with minimum duplicate accessions and maximum diversity from a larger population of germplasm is highly essential for breeders involved in crop improvement programmes. Most of the existing methodologies for the identification of a core set are either based on qualitative or quantitative data. In this study, an approach to the identification of a core set of germplasm based on the response from a mixture of qualitative (single nucleotide polymorphism genotyping) and quantitative data was proposed. For this purpose, six different combined distance measures, three for quantitative data and two for qualitative data, were proposed and evaluated. The combined distance matrices were used as inputs to seven different clustering procedures for classifying the population of germplasm into homogeneous groups. Subsequently, an optimum number of clusters based on all clustering methodologies using different combined distance measures were identified on a consensus basis. Average cluster robustness values across all the identified optimum number of clusters under each clustering methodology were calculated. Overall, three different allocation methods were applied to sample the accessions that were selected from the clusters identified under each clustering methodology, with the highest average cluster robustness value being used to formulate a core set. Furthermore, an index was proposed for the evaluation of diversity in the core set. The results reveal that the combined distance measure A 1 B 2 – the distance based on the average of the range-standardized absolute difference for quantitative data with the rescaled distance based on the average absolute difference for qualitative data – from which three clusters that were identified by using the k-means clustering algorithm along with the proportional allocation method was suitable for the identification of a core set from a collection of rice germplasm.