Introduction
Soybean (Glycine max (L.) Merr.) seeds contain about 40% protein and 20% oil, making this crop one of the most important sources of protein and oil for human consumption. Currently, soybean provides >71% of the total vegetable protein and >29% of oil worldwide (http://www.soystats.com). Soybean protein is considered a complete protein, as it contains well-balanced essential amino acids necessary for human nutrition (Qin et al., 2022). Soybean oil contains unsaturated fatty acids, particularly linoleic and oleic acids. It is also low in saturated fat and contains no cholesterol, making it a healthier alternative to other oils sourced from vegetable and animal fat. However, soybean protein content is negatively correlated with oil content (Wilcox, 1998; Clemente and Cahoon, 2009; Kambhampati et al., 2019). Breeding a soybean variety for higher protein content therefore generally lowers the oil content of the resulting variety. Understanding the genetic mechanisms controlling protein and oil contents might enable the development of novel soybean varieties with both higher protein and higher oil contents.
Using DNA markers, quantitative trait loci (QTLs) related to protein and oil contents in soybean have been reported. In SoyBase (https://soybase.org), 255 seed protein-content QTLs and 322 seed oil-content QTLs were reported in various populations and environments. These QTLs were identified using bi-parental mapping populations or natural populations. Although numerous QTLs related to protein and oil contents were reported, most of these QTLs were either population and environment dependent, duplicated or not validated.
Wild soybean (Glycine soja Sieb. & Zucc.), the ancestor of cultivated soybean, has been shown to have higher genetic variation than cultivated soybean (Zhou et al., 2015), making it a valuable genetic resource for the improvement of cultivated soybean. Wild soybean has proved to be a potential genetic resource for improving seed yield (Concibido et al., 2003), salinity tolerance (Hamwieh and Xu, 2008), disease resistance (Zhang et al., 2016) and seed nutritional components (Wang et al., 2007). Wild soybean is known to possess higher seed protein content than its domesticated counterpart (Sebolt et al., 2000). It might therefore be used as an important genetic resource for improving protein content in soybean breeding.
QTLs associated with protein content have been identified using mapping populations involving wild soybean. Diers et al. (1992) identified two major QTLs associated with protein and oil contents on chromosomes (Chr) 15 and 20 using RFLP markers in an F2 population derived from a cross between the G. max experimental line A81-356022 and the G. soja accession PI 468916. Sebolt et al. (2000) also identified two protein and one oil content-related QTLs in a BC3 population of A81-356022 and the G. soja accession PI 468916. The analysis indicated that the wild soybean allele of the QTL on Chr 20 was associated with higher protein and lower oil content. Patil et al. (2018) studied a recombinant inbred line (RIL) population derived from a cross between the cultivar ‘Williams 82’ and the G. soja accession PI483460B and, using composite interval mapping, identified 5 QTLs for seed protein content on Chr 6, 8, 13, 19 and 20 and 9 QTLs for seed oil content on Chr 2, 7, 8, 9, 14, 15, 17, 19 and 20. The major QTLs for protein and oil contents were mapped on Chr 20. Using an RIL population derived from the G. max line Osage and the G. soja accession PI593983, Yang et al. (2022) identified two significant QTLs for oil content on Chr 8 and 20 with log of odds (LOD) values of 9.8–25.9, and four significant QTLs for protein content on Chr 14 and 20 with LOD values of 5.3–31.7. The results showed that the wild soybean alleles at these QTLs could enhance protein content, suggesting that wild soybean is a potential genetic resource for improving soybean protein content.
On the other hand, wild soybean possesses undesirable agronomic traits, such as small seed size, twining stems and pod shattering, that make its direct use in soybean breeding programmes difficult. To address this problem, backcrossing has been used as an efficient way to eliminate the undesirable traits of wild soybean while retaining the target traits in the progenies of crosses between cultivated and wild soybean. With the backcrossing strategy, advanced backcross populations, such as chromosome segment substitution lines (CSSLs), have been developed to identify favourable genes (alleles) in wild soybean (Wang et al., 2013; He et al., 2015). In our previous study, a BC3F5 wild soybean CSSL population was created and used to identify seed weight and flowering time QTLs (Liu et al., 2018a, 2018b). In the present study, we created a BC4F6 wild soybean CSSL population. The population was cultivated under field conditions for 3 years to identify favourable protein QTL alleles from wild soybean that could be used in soybean breeding to improve seed quality.
Materials and methods
Plant materials
A CSSL population with 113 lines was derived from a cross between the cultivated soybean ‘Jackson’ and the wild soybean accession JWS156-1. The cultivated soybean variety ‘Jackson’ (PI548657) was obtained from the US National Plant Germplasm System (NPGS), and the wild soybean accession JWS156-1, which originated from the Kinki area of Japan, was provided by the National BioResource Project (Lotus japonicus and G. max) (https://legumebase.nbrp.jp/legumebase/top.jsp). The CSSLs were developed by crossing ‘Jackson’ with JWS156-1 and backcrossing to ‘Jackson’ for four generations, followed by successive self-pollination until the BC4F6 generation. The protein content, determined by the Kjeldahl method (Nozawa et al., 2005), and the oil content, determined by the Soxhlet method (Rodrigues et al., 2017), were 35.3% and 21.8%, respectively, for ‘Jackson’ and 44.0% and 10.6%, respectively, for JWS156-1.
Field experiment and measurement of protein and oil contents
The 113 BC4F6 CSSLs and their recurrent parent ‘Jackson’ were cultivated in 2018, 2019 and 2020 in the experimental farm of the Japan International Research Center for Agricultural Sciences, Japan (36.05°N, 140.08°E). All CSSLs were randomly arranged in the field with one replication (2018 and 2019) or two replications (2020). Each line was planted in a single-row plot 6 m long, with 60 cm spacing between rows and 20 cm spacing between plants. Seeds were harvested as a plot bulk and dried naturally, and seed protein and oil contents were measured using an InfraTec Nova instrument (FOSS Analytics, Hillerød, Denmark). The broad-sense heritability of protein and oil contents was calculated using the R package ‘inti’ (Lozano-Isla, 2021), which was developed to analyse multi-environment trials using linear mixed models.
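As a rough illustration of the heritability calculation, the sketch below estimates entry-mean broad-sense heritability from a lines × years table of plot means using a two-way ANOVA without replication. It is a simplified stand-in and not the linear mixed model fitted by the ‘inti’ package; the function name and any input values are hypothetical.

```python
from typing import Sequence


def entry_mean_heritability(table: Sequence[Sequence[float]]) -> float:
    """Estimate H2 on an entry-mean basis from a lines x years table.

    Assumption: one value per line per year, with the line-by-year
    interaction used as the error term (two-way ANOVA, no replication).
    """
    n_lines = len(table)
    n_years = len(table[0])
    grand = sum(sum(row) for row in table) / (n_lines * n_years)
    line_means = [sum(row) / n_years for row in table]
    year_means = [sum(row[j] for row in table) / n_lines for j in range(n_years)]

    ss_lines = n_years * sum((m - grand) ** 2 for m in line_means)
    ss_error = sum(
        (table[i][j] - line_means[i] - year_means[j] + grand) ** 2
        for i in range(n_lines)
        for j in range(n_years)
    )
    ms_lines = ss_lines / (n_lines - 1)
    ms_error = ss_error / ((n_lines - 1) * (n_years - 1))

    # Genotypic variance from expected mean squares, truncated at zero.
    var_g = max((ms_lines - ms_error) / n_years, 0.0)
    # Variance of a line mean over years = var_g + error variance / n_years.
    denom = var_g + ms_error / n_years
    return var_g / denom if denom > 0 else 0.0
```

When the genotypic variance estimate is positive, the returned value equals 1 − MS_error / MS_lines, so lines that rank consistently across years give values close to 1.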
Simple sequence repeats (SSR) marker analysis
Total DNA was extracted from leaf tissue using a modified CTAB method. A total of 243 SSR markers that showed polymorphism between the two original parents, ‘Jackson’ and JWS156-1, were analyzed in the CSSL population. All SSR markers were selected from the genetic maps of Song et al. (2004, 2010) and Cregan et al. (1999). The PCR mixture comprised 3 μl (10–50 ng) template DNA, 2 μl (10 pmol) of each primer, and 10 μl Quick Taq™ HS DyeMix (Toyobo, Osaka, Japan) in a total volume of 20 μl. PCR amplification was performed in a TAdvanced 384 thermal cycler (Biometra, Göttingen, Germany) with the following programme: 94 °C for 30 s, followed by 30 cycles of 30 s at 94 °C, 30 s at 57 °C and 40 s at 72 °C, and a final extension at 72 °C for 10 min. After amplification, 10 μl of each PCR product was separated on an 8% denaturing polyacrylamide gel in 1 × TBE running buffer for ~240 min at 200 V and stained with ethidium bromide. The gel was scanned using the PharosFX Molecular Imager (Bio-Rad Laboratories, Hercules, CA, USA) to detect PCR fragment polymorphism.
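The reaction set-up and cycling programme described above can be written down as a small data structure, which makes the arithmetic easy to check. This is only a sketch: it assumes two primers (forward and reverse) per reaction and that the volume not accounted for by the listed components is made up with water, neither of which is stated explicitly in the text.

```python
# Reaction components in microlitres, taken from the description above.
TEMPLATE_UL = 3.0
PRIMER_UL_EACH = 2.0  # assumed: one forward and one reverse primer
DYEMIX_UL = 10.0
TOTAL_UL = 20.0

# Cycling programme as (step, temperature in deg C, seconds, repeats).
PROGRAMME = [
    ("initial denaturation", 94, 30, 1),
    ("denaturation", 94, 30, 30),
    ("annealing", 57, 30, 30),
    ("extension", 72, 40, 30),
    ("final extension", 72, 600, 1),
]


def makeup_volume_ul() -> float:
    """Volume needed to reach the total; assumed here to be water."""
    return TOTAL_UL - (TEMPLATE_UL + 2 * PRIMER_UL_EACH + DYEMIX_UL)


def hold_time_minutes() -> float:
    """Sum of programmed hold times, ignoring ramping between steps."""
    return sum(seconds * repeats for _, _, seconds, repeats in PROGRAMME) / 60
```

Under these assumptions the listed components add up to 17 μl, leaving 3 μl of make-up volume, and the programmed holds total about one hour.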
QTL analysis
The QTL IciMapping software (Wang et al., 2016) was used to identify QTLs associated with protein content, oil content and the sum of protein and oil (protein + oil) contents in the CSSL population using the RSTEP-LRT-ADD (stepwise regression for additive QTL) method. A LOD score threshold of >3 was set to declare the existence of a QTL.
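To make the LOD, PVE and additive-effect statistics concrete, the sketch below tests a single marker by comparing a model with one overall mean against a model with separate means for the two homozygous genotype classes. It is a deliberately simplified single-marker analogue, not the stepwise regression implemented in QTL IciMapping; the function name and the 0/1 genotype coding are hypothetical.

```python
import math
from typing import Sequence, Tuple


def single_marker_scan(
    genotypes: Sequence[int], phenotypes: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (LOD, PVE, additive effect) for one marker.

    Hypothetical coding: 0 = homozygous recurrent-parent allele,
    1 = homozygous donor allele. Both classes must be present and the
    within-class residual variation must be non-zero.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    donor = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    recurrent = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    mean_d = sum(donor) / len(donor)
    mean_r = sum(recurrent) / len(recurrent)
    rss_marker = sum((y - mean_d) ** 2 for y in donor) + sum(
        (y - mean_r) ** 2 for y in recurrent
    )

    # Likelihood ratio of the two normal models, expressed on a log10 scale.
    lod = (n / 2) * math.log10(rss_null / rss_marker)
    # Share of the phenotypic sum of squares explained by the marker.
    pve = 1 - rss_marker / rss_null
    # Half the difference between the two homozygous class means.
    additive = (mean_d - mean_r) / 2
    return lod, pve, additive
```

With the threshold used here, a marker would be declared as linked to a QTL only if the returned LOD exceeds 3.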
Validation of the major protein QTL qPro19 on Chr 19
To validate a major protein QTL on Chr 19 (qPro19) detected in the present study, near-isogenic lines (NILs) were developed by crossing the BC4F6 CSSL JJ4-188, which harboured the JWS156-1 allele at qPro19, with ‘Jackson’. BC5F2 plants were genotyped using the SSR markers BARCSOYSSR_19_0773, BARCSOYSSR_19_0800, BARCSOYSSR_19_0826 and Satt463. Plants homozygous for the JWS156-1 allele (BC5F2-W) and plants homozygous for the ‘Jackson’ allele (BC5F2-C) at qPro19 were selected. These two lines (BC5F2-W and BC5F2-C) had similar genetic backgrounds but differed at the protein QTL qPro19 and thus could be regarded as NILs. The NILs were cultivated under field conditions with three replicates in 2022. After reaching maturity, 4 or 5 plants each of BC5F2-W and BC5F2-C per replicate were harvested in bulk and subjected to protein and oil content measurements. The field cultivation methods and the protein and oil content measurements were the same as those described above for the CSSL population.
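The marker-assisted selection step can be sketched as a simple classification of each BC5F2 plant by its calls at the four qPro19 markers. The genotype codes and the rule of discarding heterozygous, recombinant or incompletely genotyped plants are assumptions made for illustration; the text does not describe how such plants were handled.

```python
from typing import Dict

QPRO19_MARKERS = (
    "BARCSOYSSR_19_0773",
    "BARCSOYSSR_19_0800",
    "BARCSOYSSR_19_0826",
    "Satt463",
)


def classify_plant(calls: Dict[str, str]) -> str:
    """Classify one BC5F2 plant from its marker calls.

    Hypothetical coding: 'W' = homozygous JWS156-1, 'C' = homozygous
    'Jackson', 'H' = heterozygous. Returns 'BC5F2-W', 'BC5F2-C' or
    'discard' (any heterozygous, recombinant or missing call).
    """
    alleles = {calls.get(marker) for marker in QPRO19_MARKERS}
    if alleles == {"W"}:
        return "BC5F2-W"
    if alleles == {"C"}:
        return "BC5F2-C"
    return "discard"
```

For example, a plant scored 'W' at all four markers is assigned to BC5F2-W, whereas a plant with a single 'H' call is discarded.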
Introducing the wild soybean allele of qPro19 into a soybean variety ‘Tachiyutaka’
To validate the effect of qPro19 in a different genetic background, the wild soybean allele of qPro19 was introduced into the soybean variety ‘Tachiyutaka’ (PI594289) by crossing ‘Tachiyutaka’ with JWS156-1 and backcrossing to ‘Tachiyutaka’ for four generations, followed by successive self-pollination until the BC4F6 generation. A BC4F6 line (T-678) homozygous for the JWS156-1 allele at qPro19 was selected with the assistance of DNA markers (BARCSOYSSR_19_0773, BARCSOYSSR_19_0800, BARCSOYSSR_19_0826 and Satt463). The T-678 line was cultivated under field conditions for 4 years (2018, 2019, 2020 and 2022). Seed protein and oil contents were measured using the same method as described above.
Results
Genotypic characterization of the BC4F6 wild soybean CSSL population
The graphical genotypes of the 113 CSSLs are shown in Fig. S1. Each of the 243 markers carried the JWS156-1 allele in at least one of the 113 CSSLs. The genetic background of the CSSLs was largely recovered to that of the recurrent parent ‘Jackson’ after four backcrosses, and no lines with abnormal growth were observed among the 113 CSSLs. The proportion of recurrent parent ‘Jackson’ alleles in each CSSL ranged from 79.4% to 99.9%, with an average of 94.2%, slightly lower than the expected value of 96.9%.
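The expected value of 96.9% follows from the standard backcross expectation, which can be sketched as below. The calculation assumes no selection and no segregation distortion; the function name is hypothetical.

```python
def expected_recurrent_fraction(n_backcrosses: int) -> float:
    """Expected recurrent-parent genome fraction after an F1 plus n backcrosses.

    The donor share is 1/2 in the F1 and halves with each backcross, while
    selfing leaves the expected share unchanged, giving 1 - (1/2) ** (n + 1).
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 1 - 0.5 ** (n_backcrosses + 1)
```

For the BC4 population used here, `expected_recurrent_fraction(4)` gives 0.96875, which rounds to the 96.9% quoted above.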
Phenotypic variations of protein, oil and sum of protein and oil contents in the CSSL population
As shown in Fig. 1, the distribution of seed protein content in the CSSLs varied among years (2018, 2019 and 2020). The protein content in the CSSL population ranged from 39.3% to 46.2% (average: 41.6%), 39.3% to 45.6% (average: 41.9%) and 38.1% to 43.8% (average: 40.4%) in 2018, 2019 and 2020, respectively. The oil and protein + oil contents also showed phenotypic variation among years (Supplementary Figs. S2, S3). These results indicated that the seed protein, oil and protein + oil contents were affected by environmental conditions. On the other hand, the protein, oil and protein + oil contents of the 113 CSSLs showed significantly positive correlations among years (Supplementary Fig. S4). The broad-sense heritability (H²) values for protein, oil and protein + oil contents were 64.3%, 82.8% and 41.2%, respectively.
As depicted in Fig. 1, the CSSLs showed both higher and lower values than the recurrent parent ‘Jackson’ for the seed protein, oil and protein + oil contents, suggesting transgressive segregation. Of the 113 CSSLs, 85 (75.2%) showed higher protein content than the recurrent parent ‘Jackson’ in the aggregate data of the 3 years, suggesting that wild soybean is a potential genetic resource for enhancing protein content in cultivated soybean. In contrast, only 49 CSSLs (43.4%) showed higher oil content than ‘Jackson’. For the protein + oil content, 89 CSSLs (78.8%) showed higher values than ‘Jackson’.
The correlations among the protein, oil and protein + oil contents were consistent across the 3 years and also in the aggregate data of the 3 years (Fig. 2). The correlations between protein and oil contents were significantly negative, while those between the protein and protein + oil contents were significantly positive. The smallest correlation (negative) was observed between oil and protein + oil contents.
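The three pairwise correlations can be computed as in the sketch below. It also helps to see why oil and protein + oil can be negatively correlated: the covariance of oil with the sum equals the variance of oil plus the covariance of protein with oil, so it becomes negative when the negative protein–oil covariance outweighs the oil variance. The function names are hypothetical, and any input values are for illustration only.

```python
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def trait_correlations(
    protein: Sequence[float], oil: Sequence[float]
) -> Dict[str, float]:
    """Correlations among protein, oil and their per-line sum."""
    total = [p + o for p, o in zip(protein, oil)]
    return {
        "protein_vs_oil": pearson_r(protein, oil),
        "protein_vs_sum": pearson_r(protein, total),
        "oil_vs_sum": pearson_r(oil, total),
    }
```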
QTL analysis for protein and oil contents
Five QTLs associated with protein content were identified on Chr 1, 4, 8, 13 and 19 (Table 1). The QTLs detected at the markers BARCSOYSSR_19_0773 and Satt462, which are located close to each other, were regarded as the same QTL. The phenotypic variance explained (PVE) by these QTLs ranged from 7.27% to 19.31%. The wild soybean alleles at all the protein-content QTLs increased protein content, except at qPro1 on Chr 1. The protein QTLs qPro8 and qPro19 were identified in the data from all 3 years and in the aggregate data of the 3 years and thus were regarded as major and stable QTLs. The other protein QTLs were detected in only 1 year, i.e. 2020.
a The QTL name is defined by the trait name, chromosome number and its order on the chromosome.
b Starting point of the marker.
c The log of odds (LOD) value calculated from RSTEP-LRT-ADD method.
d Phenotypic variance explained by the QTL.
e The aggregate data of the 3 years.
Four QTLs associated with oil content were identified on Chr 8, 12 and 16 (Table 1). The PVE values ranged from 7.24% to 28.47%. The wild soybean alleles at all the oil-content QTLs reduced oil content, except at qOil16.1, which was detected in the data from 2019. The oil-content QTL qOil8 was identified in all 3 years and in the aggregate data of the 3 years, and thus was regarded as a major and stable QTL. The other QTLs for oil content were detected in only 1 or 2 years. Interestingly, qOil8 was located at the same position as the protein-content QTL qPro8 (Table 1, Fig. 3).
Three QTLs associated with protein + oil content were detected on Chr 8, 18 and 19, with PVE values ranging from 9.26% to 20.10% (Table 1). Of these, a QTL on Chr 19 (qP + O19), which was located at the same position as the protein content QTL qPro19, was also identified across the data from the 3 years of experiments and also in the aggregate data of the 3 years.
Validation of the major protein content QTL qPro19
To verify the effect of the protein content QTL qPro19 on Chr 19, two qPro19 NILs (BC5F2-W and BC5F2-C) were evaluated under field conditions. NIR analysis revealed that BC5F2-W had a protein content of 45.11% ± 0.14%, which was significantly (P < 0.001) higher than that of BC5F2-C (44.43% ± 0.23%) (Fig. 4). In contrast, no significant difference in oil content was observed between BC5F2-W (18.13% ± 0.05%) and BC5F2-C (18.18% ± 0.20%) (Fig. 4). This result confirmed that qPro19 enhances protein content without reducing oil content in soybean seeds.
Introducing the wild soybean allele of qPro19 into a soybean variety ‘Tachiyutaka’
To validate the effect of qPro19 in a different genetic background, the wild soybean allele of qPro19 was introduced into a soybean variety ‘Tachiyutaka’, and a BC4F6 line (T-678) with homozygous JWS156-1 allele at qPro19 was obtained based on genotypes of SSR markers BARCSOYSSR_19_0773, BARCSOYSSR_19_0800, BARCSOYSSR_19_0826 and Satt463. Field evaluations revealed that T-678 showed significantly higher protein content in comparison with its original parent ‘Tachiyutaka’. In contrast, no significant difference in oil content was observed between T-678 and ‘Tachiyutaka’ (Fig. 5). The effect of qPro19 was thus validated in a different genetic background.
Discussions
In soybean, protein and oil contents generally show a negative relationship. This might be because the same gene affects both the protein and oil synthesis pathways, in which case the QTLs for protein and oil contents would map to the same position. For instance, Chung et al. (2003) identified a QTL associated with protein, oil and yield on Chr 20 using an RIL population derived from a cross between the high-protein G. max accession PI 437088A and the high-yield cultivar ‘Asgrow A3733’. This QTL was flanked by the SSR markers Satt496 and Satt239, with the allele from PI 437088A increasing protein content but decreasing oil content and yield. Additionally, Wang et al. (2020) and Peng et al. (2021) revealed that GmSWEET10a (Glyma.15G049200) caused simultaneous changes in seed size, oil content and protein content. In the present study, a chromosome segment (QTL) around the SSR marker Sat_212 on Chr 8 was found to be associated with protein, oil and protein + oil contents across 3 consecutive years (2018, 2019 and 2020) and in the aggregate data of the 3 years. The PVE values of these QTLs ranged from 11.48% to 19.31% for protein content and from 23.17% to 28.47% for oil content, and the PVE was 11.61% (2018) for protein + oil content, indicating that these were major and stable QTLs controlling seed protein and oil contents. Several QTLs for protein or oil content have already been reported around this region. Warrington et al. (2015) reported a QTL associated with the lysine/crude protein ratio on Chr 8 near the QTL identified in the present study. Zhang et al. (2021) reported a QTL for water-soluble protein content associated with the SNP marker AX-93930669, located at physical position 8,276,381 bp on Chr 8. In addition, Lu et al. (2013), Pathan et al. (2013), Reinprecht et al. (2006) and Zhang et al. (2017) also reported protein-related QTLs close to the QTLs identified on Chr 8 in the present study.
The QTLs qPro8, qOil8 and qP + O8 were located in the same region, and thus the same gene might control them. Shook et al. (2021) and Zhang et al. (2021) identified Glyma.08G107800, which belongs to the aspartokinase group, as a causal gene for the protein and oil content QTLs on Chr 8. Glyma.08G107800 is located approximately 0.74 Mb from the QTLs detected in the present study (qPro8, qOil8 and qP + O8). Therefore, Glyma.08G107800 is most likely the candidate gene underlying qPro8, qOil8 and qP + O8.
A major stable QTL (qPro19) for protein content was identified on Chr 19. The wild soybean allele increased protein content, with additive effects ranging from 0.81% to 1.03%. A QTL for protein + oil content (qP + O19) was also identified in the region of qPro19, with additive effects ranging from 0.84% to 0.92%. This might be due to the close relationship between protein and protein + oil contents. However, no QTL for oil content was detected in the region of qPro19, indicating that the wild soybean allele of this QTL enhanced protein content but did not reduce oil content. Previous studies have reported several QTLs on Chr 19. Tajuddin et al. (2003) reported a QTL for protein content on Chr 19 flanked by the SSR marker Satt156, approximately 6.4 Mb from qPro19 (BARCSOYSSR_19_0773). Orf et al. (1999) also reported a QTL for protein content on Chr 19 flanked by the SSR marker Satt166, around 8.4 Mb from qPro19 (BARCSOYSSR_19_0773). In addition, Chapman et al. (2003) reported a QTL for protein content on Chr 19 flanked by the SSR marker Satt373, at a distance of 15.1 Mb from qPro19 (BARCSOYSSR_19_0773). Given these distances, these QTLs may not be identical to qPro19 identified in the present study.
The qPro19 region contains Glyma.19G102100, a homologue of Glyma.08G107800 (the candidate gene of qPro8, qOil8 and qP + O8). Glyma.19G102100 might therefore be considered a candidate gene for qPro19. However, Glyma.08G107800 is speculated to affect both protein and oil contents, which is not consistent with the result that qPro19 has no effect on oil content.
In soybean, genes controlling protein content generally have pleiotropic effects, in particular a negative effect on oil content (Chung et al., 2003; Wang et al., 2020; Peng et al., 2021; Xu et al., 2022). To date, nine candidate genes associated with seed protein content have been identified in soybean, as reviewed by Liu et al. (2023). Most of these genes affect both protein and oil contents. The candidate gene underlying qPro19 appears to be functionally different from these genes involved in controlling protein biosynthesis. Since oil content remained unaffected, the gene underlying qPro19 might not be involved in the biosynthesis of soybean seed oil.
Some QTLs associated with protein and oil contents were identified in the data from only 1 or 2 years. For example, protein content QTLs of qPro1, qPro4 and qPro13 were detected only in the data from 2020. Oil content QTLs of qOil16.1 and qOil16.2 were identified in 2019, and qOil12 was identified in 2018 and 2020. Further validations are required for such unstable QTLs.
Soybean breeders have made efforts to balance the increased seed protein content with oil content in a soybean variety. As demonstrated in the present study, by introducing the wild soybean allele of qPro19 into a Japanese variety, ‘Tachiyutaka,’ a backcrossed line with enhanced protein content without decreased oil content was obtained. Identifying the QTL allele from wild soybean that increased seed protein content without reducing the oil content might open a new approach to improving soybean seed quality through breeding practices.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1479262123000850.
Acknowledgements
This research was funded in part by the Japan International Research Center for Agricultural Sciences (JIRCAS) under the research project ‘Resilient crops’.
Author contributions
Xu D. H. and Wang Q. Y. conceived the study. Park C., Nguyen T. T., and Liu D. Q. performed the experiments and data analyses. Park C. and Xu D. H. wrote the manuscript. All authors read and approved the manuscript.
Competing interests
The authors declare that they have no competing interests.