Introduction
Soybean (Glycine max L.) is one of the most important oilseed crops worldwide. It belongs to the family Fabaceae and subfamily Faboideae (Ali et al., 2015). Soybean seeds are rich in protein (40–42%) and oil (18–22%). The oil contains two important fatty acids, linoleic and linolenic acid (85%), that are not normally produced by the human body (Antalina, 2000; Balasubramaniyan and Palaniappan, 2003). Soybean oil is also highly recommended for the human diet because it is cholesterol-free. Owing to its multipurpose nature and excellent nutritional value, soybean is known by names such as ‘meat without bones’, ‘golden bean’, ‘wonder crop’, ‘queen of pulses’, ‘agriculture's Cinderella’, ‘meat of the field’ and ‘farmer's friend’ (Kumar and Sharma, 2018; Akram and Ahmad, 2019).
Globally, oilseed production in 2019–20 was recorded as 554.61 million metric tons (MMT), of which soybean contributed 341.76 MMT. Brazil led production with 126 MMT, followed by the United States with 96.841 MMT. Argentina ranked third in the world with 54 MMT (USDA, 2020).
Despite favourable climatic and soil conditions in Pakistan, soybean cultivation is limited by the lack of diverse germplasm, coherent policy, disease-resistant genotypes and photo-insensitive soybean lines for different regions of the country. These are the major factors that limit the popularity of this crop among farmers (Asad et al., 2020). Only 7–9 soybean cultivars are generally cultivated in the country, and they produce an economical yield only under short-day conditions (autumn season). Pakistan faces a severe shortage of edible oil, since local production is less than 20% of the country's requirements (GOP, 2019). Most recently, edible oil imports amounted to 2.68 million tons with a value of US$3.56 billion, which negatively affected the economy (Pakistan, 2022–2023). There is therefore a need to meet local demand by increasing domestic cultivation of soybean in both the spring and autumn seasons.
Morphological and genetic dissection is a prerequisite of crop improvement programmes. Pre-breeding morphological characterization provides information about endogenous genetic diversity that is helpful for selecting parental combinations. This ultimately leads to introgression of desirable genes or chromosome segments from diverse sources into elite germplasm (Thompson et al., 1998; Das et al., 2001; Iqbal et al., 2008). It is crucial for plant breeders to characterize genotypes accurately at every point of the breeding process, from selecting parents to applying genotypes in breeding programmes (Haussmann et al., 2004; Haider et al., 2015; Mehmood et al., 2016).
Endogenous genetic variation in a population can be estimated statistically through multivariate analysis, which is used to assess the genetic diversity of germplasm. The genotypes and data are plotted in a two-dimensional graph to represent their dissociation, variability and correlation (Chakravorty et al., 2013). Agro-morphological trait information speeds up strategic trait-based breeding and increases the probability of pyramiding positive alleles compared with the use of uncharacterized parents (Reynolds and Langridge, 2016).
The prime focus of the current research is to detect genetic variation in advanced soybean germplasm through dissociation of agro-morphological traits, in order to select germplasm for future spring-season cultivation in Pakistan.
Materials and methods
Germplasm collection and phenotyping
A total of 123 soybean (G. max L.) accessions from diverse maturity groups (000-IV) were collected from the United States Department of Agriculture (USDA); Agricultural Research Institute, Mingora, Swat; Ayub Agricultural Research Institute (AARI), Faisalabad; and National Agricultural Research Center (NARC), Islamabad (online Supplementary Table S1). Healthy seeds of all genotypes were sown in early and mid-March at CABB, University of Agriculture Faisalabad (31°26′N, 73°6′E) in two years, 2020 and 2021, on two plantation dates each year. Each entry was planted in a 1.5 m long row with a row-to-row distance of 45 cm, one seed per hole and a plant-to-plant distance of 8 cm. The crop was raised following all standard agronomic practices. The experiment was conducted following the augmented RCBD (Federer, 2002) with three replications. The whole material was divided into four blocks, each comprising 29 entries. A set of seven check varieties (Ajmeri, Faisal soya, Malakand, NARCII, Rawal, Swat18 and William) was randomized in each block.
One healthy plant of each genotype was randomly selected and tagged in each replication for morphological data collection. The agro-morphological traits recorded for each genotype (Chen et al., 2007; Dubey et al., 2018a) were plant height (PH; cm), number of pods per plant (Num), number of seeds per plant (Num), 100-seed weight (HSW; g), yield per plant (YPP; g) and days to maturity (DM; days).
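The block layout described above can be illustrated with a short sketch. It is not the authors' randomization procedure. It assumes that the 29 entries per block are unreplicated test entries (4 × 29 = 116) and that the seven checks are added to every block; the entry labels (T1 to T116), the seed and the function name are hypothetical.

```python
import random

# Check varieties named in the text; every block receives all of them.
CHECKS = ["Ajmeri", "Faisal soya", "Malakand", "NARCII", "Rawal", "Swat18", "William"]

def augmented_layout(test_entries, checks, n_blocks, seed=0):
    """Return one randomized plot order per block for an augmented block design."""
    if len(test_entries) % n_blocks != 0:
        raise ValueError("test entries must divide evenly among blocks")
    rng = random.Random(seed)
    shuffled = list(test_entries)
    rng.shuffle(shuffled)                      # assign test entries to blocks at random
    per_block = len(shuffled) // n_blocks
    layout = []
    for b in range(n_blocks):
        plots = shuffled[b * per_block:(b + 1) * per_block] + list(checks)
        rng.shuffle(plots)                     # randomize plot order within the block
        layout.append(plots)
    return layout

# Hypothetical labels for the 116 unreplicated test entries (assumption).
test_entries = [f"T{i}" for i in range(1, 117)]
layout = augmented_layout(test_entries, CHECKS, n_blocks=4)
print([len(block) for block in layout])
```

Under these assumptions each block holds 29 test entries plus the seven checks, so only the checks are replicated across blocks and provide the estimate of block effects.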
Statistical data analysis
Statistical parameters including standard deviation, minimum, maximum, mean, variance and coefficient of variation, together with analysis of variance (ANOVA) at a significance level of P < 0.001, were calculated for the agronomic traits using OriginPro (2021) statistical analysis software (Mukul and Akter, 2021). Genetic diversity in the agro-morphological data was determined using multivariate analysis (Zafar et al., 2008). Dendrograms based on the agro-morphological data for both cropping years were constructed using Ward's clustering method.
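The descriptive statistics listed above can be reproduced for a single trait with a few lines of code. The sketch below is only illustrative: the study used OriginPro, the sample (n − 1) form of the variance is an assumption, and the 100-seed weights are hypothetical values, not data from this experiment.

```python
import statistics

def describe(values):
    """Summary statistics for one trait; sample (n - 1) variance is assumed."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "variance": statistics.variance(values),
        "sd": sd,
        "cv_percent": 100 * sd / mean,   # coefficient of variation (CV%)
    }

hsw = [7.5, 10.2, 12.8, 14.1, 17.6]      # hypothetical 100-seed weights (g)
summary = describe(hsw)
print({name: round(value, 2) for name, value in summary.items()})
```

The coefficient of variation expresses the standard deviation as a percentage of the mean, which allows traits measured in different units (cm, g, days) to be compared on one scale.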
Results
A statistical summary of the 123 soybean genotypes was prepared to describe the behaviour of six agro-morphological traits during 2020 and 2021 (online Supplementary Table S2). Genotype A62 showed the highest PH, while genotype A88 showed the minimum PH. Minimum YPP was observed in A30, while in the first year genotype A99 had a higher yield. Furthermore, genotype A54 had the maximum HSW (17.63 g) among all 123 genotypes (online Supplementary Table S2).
The range, standard deviation (n), maximum, mean, minimum, variance and coefficient of variation (CV%) of the six agro-morphological traits provided a precise view of the population data. The ranges of the morphological data for 2020 were: PH (10.9–72 cm), pods (9–119), seeds (17–249), HSW (7.5–17.6 g), YPP (1.6–38.7 g) and DM (96–133 days). In the second year, genotype A110 had the maximum HSW, while genotype A99 had the maximum YPP (online Supplementary Table S2). In both years, ANOVA of the six yield-related traits showed significant differences among genotypes (Table 1).
Correlation analysis
The relationships among the six agro-morphological traits were determined using the Pearson correlation coefficient. The correlation matrix of the traits for the two cropping seasons is given in Table 2. PH showed a positive correlation with pods and seeds, but a non-significant correlation with HSW and YPP. Furthermore, PH showed a negative correlation with DM in both years (Table 2). Seeds showed a positive correlation with YPP and a non-significant but positive correlation with HSW and DM in both years. Pods showed a significant and positive correlation with number of seeds and YPP in both years, and a non-significant correlation with HSW and DM. HSW showed a positive correlation with YPP and DM in both years. YPP showed a negative correlation with DM in both years. Moreover, DM showed a positive but non-significant correlation with pods and seeds, and a negative non-significant correlation with PH and YPP in both cropping years (Table 2).
**Significant at 0.01; *significant at 0.05. Top value of each box from the year 2020 and bottom value from the year 2021.
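For readers who wish to reproduce this kind of analysis, the Pearson coefficient and the t statistic commonly used to test its significance can be computed as in the sketch below. The function names and the pod and yield values are hypothetical, and the test shown (t with n − 2 degrees of freedom) is the standard one, assumed rather than confirmed to be the procedure applied in the software used here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length with at least 3 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.

    Undefined for |r| = 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1 - r * r))

pods = [20, 35, 50, 64, 80]            # hypothetical pods per plant
ypp = [4.1, 7.9, 9.8, 14.2, 15.5]      # hypothetical yield per plant (g)
r = pearson_r(pods, ypp)
print(round(r, 3), round(t_statistic(r, len(pods)), 3))
```

The t value is then compared with the critical value at the chosen level (0.05 or 0.01, as in the table note) to decide whether a coefficient is marked as significant.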
Multivariate analysis based on cumulative variability of soybean agronomical traits
In this study, based on eigenvalue and variability, the morphological data of the two consecutive years were resolved into six principal factors (Fs). In both cropping years, 2020 and 2021, the first three principal factors, with eigenvalue >1, contributed most to variability (Table 3). An eigenvalue of 1 was used as the cut-off for selecting principal factors for further analysis. The first three principal factors together accounted for 84 and 88% of variability in 2020 and 2021, respectively. The first two factors, F1 and F2, accounted for 48 and 20% of variability in 2020 and 47 and 26% in 2021, indicating their importance in the construction of the biplot (Table 3). In 2021, factor F2 contributed 44.6% of variability, followed by F1 (34%) and F3 (10%). These three principal factors made the greatest contribution to the construction of the biplot.
In the principal component analysis (PCA), the scree plot is a line plot of eigenvalue and cumulative variability that shows the six principal factors and their contributions (Fig. 1). The first three principal factors in the scree plot (F1, F2, F3) distributed the genotypes based on agronomical traits. The remaining three factors, with eigenvalue <1, played a minimal role, accounting for 15% of total variability in 2020 and 12% in 2021, which was considered too small for further analysis (Fig. 1).
Pods (32%) and seeds (33%), followed by YPP (31.9%), were the major contributors to the variability of principal factor F1 in the first year, 2020 (Table 3), whereas DM and PH contributed least (0.3 and 1%, respectively). In principal factor F2, DM and HSW contributed most, with 49 and 47%, respectively. In 2021, the variables with the greatest contribution to the variability of F1 were pods (33%), seeds (32%) and YPP (30%), while HSW showed the minimum contribution (0.12%) (Table 3). HSW, PH and DM contributed most to the variability of F2 in 2021, with 44, 37.9 and 15.3%, respectively, whereas pods, seeds and YPP contributed only 0.2–1.2%.
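The eigenvalue cut-off and the cumulative variability shown in a scree plot follow from simple arithmetic on the eigenvalues. The sketch below uses hypothetical eigenvalues, not those of Table 3; for six standardized traits the eigenvalues of the correlation matrix sum to 6, which the example values respect.

```python
def explained_variability(eigenvalues):
    """Return (factor, eigenvalue, percent, cumulative percent) for each factor."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for i, ev in enumerate(eigenvalues, start=1):
        pct = 100 * ev / total
        cumulative += pct
        rows.append((f"F{i}", ev, pct, cumulative))
    return rows

def retained(eigenvalues, cutoff=1.0):
    """Factors kept under the eigenvalue > cutoff rule."""
    return [f"F{i}" for i, ev in enumerate(eigenvalues, start=1) if ev > cutoff]

# Hypothetical eigenvalues for six traits (assumption, not the study's values).
eigenvalues = [2.7, 1.3, 1.1, 0.5, 0.3, 0.1]
for name, ev, pct, cum in explained_variability(eigenvalues):
    print(f"{name}: eigenvalue={ev:.2f}  variability={pct:.1f}%  cumulative={cum:.1f}%")
print("retained:", retained(eigenvalues))
```

A factor with eigenvalue below 1 explains less variance than a single standardized trait, which is the usual rationale for discarding it.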
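Percentage contributions of this kind are commonly obtained from the squared loadings of the variables on a factor, scaled so that they sum to 100. The sketch below assumes that definition, which the text does not state explicitly, and uses hypothetical loadings rather than the values behind Table 3.

```python
def contributions(loadings):
    """Percent contribution of each trait to one factor.

    loadings: dict mapping trait name to its loading on the factor.
    Contribution = 100 * loading^2 / sum of squared loadings (assumed definition).
    """
    total = sum(value * value for value in loadings.values())
    return {trait: 100 * value * value / total for trait, value in loadings.items()}

# Hypothetical loadings on one factor (assumption, not the study's values).
f1_loadings = {"PH": 0.15, "pods": 0.95, "seeds": 0.96, "HSW": 0.10, "YPP": 0.93, "DM": -0.08}
for trait, pct in contributions(f1_loadings).items():
    print(f"{trait}: {pct:.1f}%")
```

Because the loadings are squared, the sign of a loading does not affect the contribution; it only indicates the direction of the trait vector in the biplot.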
Variability analysis based on correlation, principal factors and variability
For 2020, the first two principal components (PCs), F1 (48%) and F2 (20%) (Table 3), were used to construct the biplot. Variables were superimposed as vectors whose length across both principal factors reflects their combined variability and their major effects on yield (Yan and Tinker, 2005). The positive and negative signs of the factors describe the direction of correlation among variables. The factor loading values divided the biplot into four groups. The variable PH, lying closest to the origin of the biplot, showed less variability than YPP, pods, seeds, DM and HSW in 2020 (Table 3).
In 2021, factors F1 and F2, with variability of 47.7 and 26.1%, respectively, were used to construct the variable biplot (Table 3). The vectors for number of seeds, number of pods, PH and HSW contributed more to variability, based on their combined variability and length. In contrast, DM and YPP contributed less to variability in both principal factors. The correlation and variability of agronomical traits revealed by multivariate analysis can be utilized for further selection of genotypes.
Variable plot
The graphical representation of diversity along two PCs (F1 and F2) disperses the 123 soybean accessions on a two-dimensional scale. The biplot revealed the association of the trait vectors and the genotypes with one another. In the biplot for 2020, built from the maximum variability of the first two principal factors, the accessions were distributed into four diverse groups on the basis of the x-y plane. The accessions in the second group (positive axis) performed well in 2020 and are more likely to be linked with HSW, DM and YPP. The genotypes lying away from the origin on the negative axes were negatively associated with that trait. The genotypes common to the second group on both plantation dates in 2020 were A110, A95, A109, A49, A51, A41, A67, A34 and A62 (Fig. 2). In 2021, the genotypes common to the first group on both plantation dates were A112, A116, A89, A91 and A60, which were linked with HSW and DM. In group 2 of both years (2020 and 2021), the common genotypes were A99, A49, A95, A107, A44, A108, A97, A62, A10, A67, A104, A114, A37, A51, A54, A41 and A4 (Fig. 3).
Mean comparison revealed maximum PH attained in early plantation for the years 2020 and 2021 (online Supplementary Fig. S1). The common genotypes in both years were A1, A34, A62, A97 and A123. On the other hand, maximum number of pods were attained in early plantation of first year and mid-March plantation of second year. The common genotypes in both years were A51, A52, A109, A48, A49, A60, A62, A63, A64, A65, A67, A70, A76, A79, A80, A85, A90, A99, A114, A119, A120, A121 and A123 (online Supplementary Fig. S1).
Maximum number of seeds were acquired in early plantation in 2020 and 2021 and common genotypes in both years were A4, A14, A16, A23, A27, A34, A37, A41, A46, A49, A51, A81, A85, A95, A103, A109, A114, A117, A118, A119, A120, A121, A122 and A123 (online Supplementary Fig. S2). The maximum HSW was obtained in early plantation of both years. The common genotypes for the years 2020 and 2021 were A2, A3, A13, A14, A19, A20, A21, A22, A24, A31, A34, A36, A44, A49, A51, A54, A55, A56, A57, A59, A62, A66, A67, A72, A75, A76, A77, A79, A81, A83, A89, A93, A95, A101, A102, A104, A106, A107, A108, A110, A113, A116 and A118 (online Supplementary Fig. S2).
Maximum YPP was obtained in the early plantation of the first year and the mid-March plantation of the second year. The common genotypes for the years 2020 and 2021 were A1, A23, A34, A41, A51, A52, A62, A95, A110, A118, A120, A121 and A123 (online Supplementary Fig. S3).
Combined agro-morphological clustering of advanced soybean germplasm
Ward's method was used for construction of dendrogram. All soybean accessions were divided into two major groups (C1 and C2) for the year of 2020 including 45 genotypes in C1 and 201 in C2 (online Supplementary Fig. S5). On the other hand, for the year of 2021, 83 and 163 genotypes were present in C1 and C2, respectively (online Supplementary Fig. S6).
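Ward's method merges, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squares. The sketch below is a minimal pure-Python illustration of that criterion, stopping when a requested number of groups remains. It is not the software routine used in this study; the trait vectors are hypothetical and are assumed to be standardized beforehand.

```python
def ward_clusters(points, k):
    """Agglomerate points until k clusters remain, merging by minimum Ward cost."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                na, nb = len(a["members"]), len(b["members"])
                d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
                cost = na * nb / (na + nb) * d2   # increase in within-cluster sum of squares
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]

# Hypothetical standardized trait vectors for six entries (assumption).
entries = [(-1.2, -0.9), (-1.0, -1.1), (-0.8, -1.0), (1.1, 0.9), (0.9, 1.2), (1.0, 1.0)]
print(ward_clusters(entries, k=2))
```

Cutting the merge sequence at two clusters corresponds to reading the dendrogram at its highest split, which is how two major groups such as C1 and C2 are obtained.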
Discussion
The present research was carried out for the genetic characterization of advanced soybean genotypes under local conditions using PCA. The mean values of all phenotypic traits indicated enormous variation among the genotypes for PH, number of pods, number of seeds, HSW, YPP and DM. This wide variability provides a scientific basis for improving these traits with respect to diverse biotic and abiotic stresses. Statistical analysis has become a standard tool for the dissection of agro-morphological traits (Yaqoob, 2016; Iqbal et al., 2017; Shahid et al., 2017; Din et al., 2018; Pooja et al., 2018; Gulnaz et al., 2019).
PCA, a form of multivariate analysis, has become an exceptional data reduction technique for identifying a smaller number of traits that provide maximum variability and for prioritizing genotypes based on PC scores (Dubey et al., 2018b).
The data showed that the first two principal factors, with eigenvalue >1, contributed most to the variability used for biplot construction in both 2020 and 2021 (Zafar et al., 2008). In the first year, principal factor F1 accounted for 48% and F2 for 20% of variability; in the second year, F1 accounted for 47.7% and F2 for 26.1% (Wang et al., 2013; Hashash, 2016). PH has positive correlation with HSW and YPP. Number of pods showed positive correlation with number of seeds and YPP. Number of seeds showed positive correlation with YPP. Mean comparison of both years revealed that significant yield-enhancing traits of the advanced soybean lines responded better in the early-March plantation of both years. In the combined cluster analysis of both years, the genotypes were distributed into two groups, C1 and C2. The maximum number of genotypes was present in subclusters of C2 with the same geographical background.
Temperature is one of the most important parameters that can adversely affect the yield and quality of soybean. Dornbos and Mullen (1991) reported that the optimum day/night temperature (26/20°C) helps to increase soybean yield, whereas a significant yield decline was observed when the temperature increased (29/20°C) at the seed filling stage. In our study, the population was exposed to high temperatures of >35°C at the seed filling stage on both plantation dates. However, the early-March plantation provided fine seed quality with better yield than the mid-March plantation, which gave shrivelled seed due to high temperature. In the spring season under local Pakistani conditions, temperature generally remains above the optimum required for soybean growth during the grain filling stage, as indicated for both growing seasons, 2020 and 2021 (online Supplementary Fig. S4). In view of our experimental results, it is suggested that the second half of February would be the best time for spring plantation under local conditions. Plantation after mid-February would improve not only seed germination and quality but also yield, because early plantation protects the crop from the harmful effects of high temperature that lead to seed shrivelling and quality deterioration.
Broschat (1979) evaluated a diverse soybean panel and reported that PCA is one of the most powerful techniques for data reduction, as it removes interrelationships among components. Different studies have reported that PCA is a valid approach for dealing with diverse germplasm. Smith et al. (1995) characterized soybean germplasm, conducted linkage cluster analysis and PCA, and reported the value of these results for the utilization and preservation of germplasm. Ghafoor et al. (2001) evaluated morphological genetic diversity in black gram accessions using multivariate analysis and concluded that the first four PCs explained 79% of the genetic variability in the data. Another study by Ghafoor et al. (2003) reported that the first three PCs explained 83.3% of the genetic variability in the data.
Clustering of soybean diversity panels based on morphological traits has also been reported (Cui et al., 2001; Iqbal et al., 2008; Ojo et al., 2012), with genotypes that have similar morphological traits falling in the same cluster. Similar results were also reported by Yu et al. (2005) and Iqbal et al. (2008), and in oil palm by Abdullah et al. (2011). Jha et al. (2016) evaluated 50 advanced soybean genotypes using PCA to rank genotypes on the basis of combinations of different phenotypic traits. Five principal factors with eigenvalue >1 were considered to be more significant. The traits HSW, number of seeds per plant, number of pods per plant, PH, DM and biological YPP are important yield-contributing traits. These results indicate that genetic selection and wide hybridization among selected genotypes from distantly related clusters could be a promising strategy for improving soybean yield under domestic Pakistani conditions.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1479262123001120
Acknowledgements
The authors are thankful to the Center for Advance Studies (CAS), UAF, for providing the laboratory facility.