Introduction
Legumes are a large and diverse family of plants that provide us with sustainable sources of food, feed, fuel and feedstocks for industry (Foyer et al. 2016; Semba et al. 2021). Well-known legumes include the common garden pea, which was instrumental in the work of Mendel that established the field of genetics; soybean, the most important legume on earth in terms of total seed production; other food legumes such as peanut, various types of bean, chickpea, lentil, liquorice and carob; forage species such as alfalfa (lucerne) and clover; and many species of trees, such as the acacias of Australia. Soybean is an important annual oilseed crop, but another legume, the perennial tree Pongamia pinnata, is emerging as a potential source of oil for biodiesel and sustainable aviation fuel, especially as it can be grown on land unsuitable for food crops (Scott et al. 2008).
Legumes play key roles in natural ecosystems because of their ability to form nitrogen-fixing symbioses with bacteria called rhizobia; these symbioses are a major source of biologically available nitrogen (N) in terrestrial ecosystems (Canfield et al. 2010; Gou et al. 2023). For the same reason, legumes were crucial to the emergence and/or persistence of agriculture on otherwise N-depauperate soils. Indeed, legumes remain essential for N-supply in sustainable agricultural systems, even though they have been somewhat sidelined in conventional agriculture by the massive use of synthetic N-fertilizers. Amounting to well over 100 million tonnes of N per year, fertilizer-N helped fuel the Green Revolution and the subsequent massive growth in the human population over the past 60 years (Smil 2001). Synthetic fertilizer-N is a double-edged sword, however: its production, distribution and application generate large amounts of carbon dioxide, and its metabolism by soil microbes releases nitrous oxide, a greenhouse gas roughly 300 times more potent than carbon dioxide (Snyder et al. 2009; Pan et al. 2022). Additionally, about half of the N-fertilizer applied to agricultural fields globally is not used by the target crop and is lost to the environment via leaching, erosion and gaseous emissions, with damaging consequences for natural ecosystems and human health (Sutton et al. 2011; Steffen et al. 2015).
This brings us back to legumes, which, by virtue of symbiotic nitrogen fixation (SNF), have much higher intrinsic nitrogen use efficiency (NUE: the ratio of N captured in plant products to N inputs) in cropping systems, e.g., 80% for soybean compared with around 40% for cereals (Zhang et al. 2015). Ultimately, this means that less nitrogen is lost from agricultural soils under legume cultivation than under other crops such as cereals (Udvardi et al. 2021).
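As a simple illustration of what these efficiencies mean, the sketch below computes NUE and the N not recovered in harvested products for a hypothetical input of 100 kg N/ha. The input figure is an assumption, and unrecovered N is treated only as an upper bound on losses, since some of it may remain in the soil.

```python
def nue(n_in_products_kg: float, n_inputs_kg: float) -> float:
    """Nitrogen use efficiency: N captured in plant products divided by N inputs."""
    return n_in_products_kg / n_inputs_kg


def unrecovered_n(n_inputs_kg: float, efficiency: float) -> float:
    """N inputs not captured in products (an upper bound on N lost to the environment)."""
    return n_inputs_kg * (1.0 - efficiency)


# Hypothetical input of 100 kg N/ha, using the NUE values quoted in the text
N_INPUT = 100.0
for crop, efficiency in [("soybean", 0.80), ("cereal", 0.40)]:
    print(f"{crop}: {unrecovered_n(N_INPUT, efficiency):.0f} kg N/ha not recovered")
```

Under these assumptions, a cereal leaves three times as much input N unaccounted for in its products as soybean does.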
Despite their ability to use atmospheric nitrogen for growth, legumes prefer soil nitrogen, which is energetically cheaper to acquire; soil-N typically supplies about 30–60% of total plant-N (Herridge et al. 2008). Thus, legumes in agriculture derive only about 40–70% of their N from the atmosphere, well short of the theoretical maximum of 100% that can be approached under experimental conditions when no N is provided in the growth substrate. Whatever soil-N is removed with harvested plant parts is no longer available to subsequent crops. Of course, the same is true of nitrogen derived from the atmosphere (Ndfa), although some of this N remains in the soil system in crop residues. Where the Ndfa retained in residues exceeds the soil-N removed at harvest, legume cultivation results in a net gain of soil-N, as it does in many systems (Herridge et al. 2008). Nonetheless, given the gap between potential and actual %Ndfa achieved by legumes in agriculture, there are opportunities to increase this net gain further by improving SNF in ways we explore below.
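The arithmetic behind these percentages, and behind the net soil-N balance, can be sketched as follows. The simplified balance used here (Ndfa minus N removed in harvested parts) ignores belowground N, losses and other inputs, and all input values are hypothetical.

```python
def percent_ndfa_from_soil_share(percent_soil_n: float) -> float:
    """%Ndfa is whatever share of plant N does not come from the soil."""
    return 100.0 - percent_soil_n


def net_soil_n_balance(total_plant_n: float, percent_ndfa: float,
                       n_harvest_index: float) -> float:
    """Simplified balance (kg N/ha): N fixed minus N removed at harvest.

    n_harvest_index is the fraction of total plant N exported in harvested parts.
    A positive result indicates a net gain of soil N.
    """
    n_fixed = total_plant_n * percent_ndfa / 100.0
    n_removed = total_plant_n * n_harvest_index
    return n_fixed - n_removed


# Soil supplying 30-60% of plant N implies 70-40% Ndfa
print(percent_ndfa_from_soil_share(30), percent_ndfa_from_soil_share(60))

# Hypothetical crop: 200 kg N/ha in total, 60% of plant N exported in seed
for ndfa in (40.0, 70.0):
    print(ndfa, net_soil_n_balance(200.0, ndfa, 0.6))
```

With these assumed values, the balance moves from a net soil-N deficit at 40% Ndfa to a net gain at 70% Ndfa, which illustrates why closing the gap between actual and potential %Ndfa matters.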
Genetics and Genomics of SNF
Symbiotic nitrogen fixation (SNF) in legumes is a complex trait that involves the development of specialized organs called nodules, which generally form on roots (Brewin 1991; Ferguson et al. 2010). During nodule development, rhizobia typically enter the plant via ‘infection threads’ formed in epidermal root hair cells; these threads provide access to underlying cortical cells, which accommodate and feed the bacteria as they multiply and eventually differentiate into nitrogen-fixing ‘bacteroids’. In exchange for ammonium derived from nitrogen fixation, the plant provides its microsymbionts with all the nutrients they require for growth, including organic compounds used for energy and biosynthesis, and inorganic elements such as iron, sulphur and molybdenum that are needed to make enzymes such as nitrogenase, which catalyses nitrogen fixation (Udvardi and Poole 2013). Ultimately, ammonium produced by bacteroids is assimilated into amino acids and other compounds by host plant cells before export from nodules to the rest of the plant.
Thousands of plant and bacterial genes are expressed during nodule development and differentiation and are presumably required for SNF (Colebatch et al. 2004; Benedito et al. 2008). Indeed, genetic studies have shown hundreds of plant genes to be necessary for SNF (Roy et al. 2020). These studies mostly rely on loss of gene function, caused either by mutations in the DNA or by interference with gene expression at the RNA level (RNA interference, RNAi), to infer the roles of genes from the resulting aberrant phenotypes (Szczyglowski et al. 1998; Penmetsa and Cook 2000; Ott et al. 2005; Tadege et al. 2008; Fukai et al. 2012; Zhang et al. 2022).
Genes involved in various aspects of nodule development and function have been characterized genetically, including: (i) initial chemical signalling between the plant and rhizobia, which determines partner compatibility and subsequently triggers gene expression and cell division in the plant, initiating nodule development; (ii) cell biological processes required for bacterial entry into, and accommodation within, plant cells; (iii) nitrogen-dependent repression and autoregulation of nodule development, which restrict and optimize the number of nodules produced; (iv) nutrient transport and nodule metabolism, which enable metabolic cooperation between the symbiotic partners; (v) nodule oxygen homeostasis, which establishes the low-oxygen environment required by oxygen-labile nitrogenase; and, finally, (vi) nodule senescence, which can shut down nitrogen fixation in response to increased soil-N, environmental stress and internal developmental cues, and enables recycling of resources to the rest of the plant (reviewed in Roy et al. 2020).
Discovery of genes involved in SNF was largely facilitated by whole-genome sequencing, assembly and annotation of the model legumes Medicago truncatula and Lotus japonicus (Sato et al. 2008; Young et al. 2011) and of crops such as soybean (Schmutz et al. 2010). Genome sequence information accelerated map-based cloning of genes, identification of mutations caused and ‘tagged’ by inserted foreign DNA sequences, and selection of genes for ‘reverse genetics’, in which individuals with defects in specific genes or their expression are isolated before their mutant phenotype is determined (Roy et al. 2020).
Given the complexity of SNF, the thousands of genes involved and the countless interactions between the products of these genes and the processes they control, it is little wonder that genetic knowledge of SNF has not yet been harnessed to increase SNF effectiveness in crop or pasture legumes. Indeed, with so many ‘players’, it is hard to know where to ‘place your bets’, at least when it comes to contemplating engineering approaches. We will come back to this later, but next consider more conventional approaches to plant improvement that use natural variation in traits of interest.
Natural Variation and Opportunities to Improve SNF
Decades of research have shown that legume SNF effectiveness (quantified as Ndfa) varies with changes in the environment (E), including edaphic (soil) factors and climatic conditions (Sulieman and Tran 2016; Santachiara et al. 2019), and with the strain of rhizobia chosen as partner (rhizobial genotype, Gr; Mendoza-Suárez et al. 2020; Westhoek et al. 2021). The resulting knowledge has guided agricultural management practices (M) to optimize SNF, or at least growth and yield of legume crops and forages (GRDC). Few studies have examined the plant genetic (Gp) contribution to SNF effectiveness, although it affects SNF effectiveness through interactions with the other factors (conceptually, SNF = Gp × Gr × E × M). Recent studies with collections of diverse genotypes of several species have found variation in SNF associated with plant genotype and have mapped genetic loci, or quantitative trait loci (QTL), for this complex trait in common bean, soybean and peanut (Kamfwa et al. 2019; Yang et al. 2019; Bazzer et al. 2020; Thilakarathna et al. 2021; Nzepang et al. 2023; Krueger et al. 2024). Interestingly, 100 years of breeding for increased seed yield of soybean in Canada has done little to increase %Ndfa or Ndfa per plant in current versus older varieties (Thilakarathna et al. 2021). Given the natural variation in SNF effectiveness available within species, there are clearly opportunities to increase %Ndfa and total Ndfa in these crops, especially if Ndfa is measured directly or estimated accurately during selection, rather than inferred from seed biomass/yield or even total seed N as a proxy for SNF.
One approach that we are taking begins with public ‘core’ or ‘mini-core’ collections of genotypes that represent much of the genetic diversity of a species, such as the international mungbean mini-core collection of 296 genotypes (Schafleitner et al. 2015). These lines are phenotyped for SNF and related traits and genotyped to identify sequence variation associated with variation in the traits of interest. ‘Genomic prediction’ models can then be built that predict the performance of individuals from the specific set of sequence variants, or haplotypes, they carry across the whole genome. These models are trained on the phenotypic and genotypic profiles of an initial ‘training set’ of lines, e.g., a mini-core collection or part thereof, and are tested by their ability to predict, from genotype alone, the performance of lines not in the training set. Ultimately, such models can be used to select parents for crossing that are likely to produce offspring with more favourable sets of haplotypes. Artificial intelligence can aid this process by planning a series of crosses over multiple generations to obtain an optimal ‘stack’ of haplotypes that increases %Ndfa, total Ndfa, etc., ideally without compromising yield (Hickey et al. 2019; Hayes et al. 2023). Yield penalties can be avoided by including yield data in prediction models and selecting for multiple desired traits in parallel (Hayes et al. 2023).
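The train-then-test logic described above can be sketched with ridge regression on allele dosages, a simple form of whole-genome marker regression similar in spirit to rrBLUP. This is an illustration under stated assumptions, not the models used by the authors or by Hayes et al.: genotypes are coded 0/1/2, marker effects, genotypes and %Ndfa values are simulated, the shrinkage parameter `lam` is arbitrary, and only the Python standard library is used.

```python
import random
from statistics import mean


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(genotypes, phenotypes, lam=1.0):
    """Estimate all marker effects jointly, shrinking them towards zero."""
    p = len(genotypes[0])
    x_means = [mean(g[j] for g in genotypes) for j in range(p)]
    y_mean = mean(phenotypes)
    xc = [[g[j] - x_means[j] for j in range(p)] for g in genotypes]
    yc = [y - y_mean for y in phenotypes]
    # Normal equations: (X'X + lam * I) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(xc, yc)) for i in range(p)]
    return x_means, y_mean, solve_linear(xtx, xty)


def predict(model, genotype):
    """Genomic estimated value: training mean plus summed marker effects."""
    x_means, y_mean, beta = model
    return y_mean + sum(b * (g - m) for b, g, m in zip(beta, genotype, x_means))


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


# Simulated (hypothetical) data: 80 lines x 10 markers with additive effects on %Ndfa
rng = random.Random(1)
true_effects = [1.0, -0.5, 2.0, 0.0, 0.8, -1.2, 0.3, 0.0, 1.5, -0.7]
lines = [[rng.choice((0, 1, 2)) for _ in true_effects] for _ in range(80)]
ndfa = [50 + sum(e * g for e, g in zip(true_effects, line)) + rng.gauss(0, 0.5)
        for line in lines]

# Train on the first 60 lines, then test on 20 lines the model has not seen
model = fit_ridge(lines[:60], ndfa[:60], lam=1.0)
predicted = [predict(model, line) for line in lines[60:]]
accuracy = pearson(predicted, ndfa[60:])

# Rank the held-out lines by prediction to nominate candidate parents
candidate_parents = sorted(range(60, 80), key=lambda i: predicted[i - 60], reverse=True)[:2]
print(f"prediction accuracy r = {accuracy:.2f}; candidate parents: {candidate_parents}")
```

Prediction accuracy is reported here as the correlation between predicted and observed values in lines held out of training. With real field data, where %Ndfa is measured with error and many loci have small effects, accuracy would likely be much lower than in this low-noise simulation.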
Attempts to harness novel variability in legume species for plant breeding are being aided by pan genome projects. A pan genome is the collective set of DNA sequences of all, or a representative subset of, individuals of a species. Much of it is shared by all individuals (the core genome), while a significant fraction, ranging from 4.8% in narrow-leafed lupin to 49.9% in soybean, for example (Garg et al. 2022; Liu et al. 2020), is present in some but not all members of the species (the ancillary genome, which reflects presence–absence variation (PAV) and gene copy number variation (CNV)). Initial attempts to assemble pan genomes relied on a single, high-quality reference sequence, which was used to identify core and ancillary DNA sequences in other individuals of the species, mainly from relatively short reads of a few hundred base pairs generated by high-throughput sequencing instruments. Such an approach leaves many ancillary genome sequences ‘orphaned’, unable to be positioned on specific chromosomes. Very recently, however, high-throughput, long-read (thousands to tens of thousands of base pairs) sequencing has made it possible to assemble, de novo, entire genomes of multiple, diverse individuals of a species (Sharma et al. 2022). For example, we recently sequenced eight diverse mungbean genomes using PacBio HiFi sequencing; each assembled into the 11 chromosomes of mungbean, with cumulative sequence lengths ranging from 512 to 577 Mbp and covering more than 99% of expected conserved genes (Mens et al., unpublished). The resulting mungbean pan genome will aid our efforts to improve SNF in this species via predictive plant breeding, as outlined above.
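The core/ancillary distinction can be made concrete with a presence–absence table. The sketch below, using hypothetical accession and gene-family names, partitions a pan genome into core and ancillary gene families and reports the ancillary fraction. Real pan-genome studies may differ in how they cluster genes into families and in the denominator they use, so this is only the basic set logic.

```python
def partition_pan_genome(pav):
    """Split gene families into core (in every accession) and ancillary (in some).

    pav maps each accession name to the set of gene families it carries.
    """
    pan = set().union(*pav.values())          # every family seen in any accession
    core = set.intersection(*pav.values())    # families shared by all accessions
    ancillary = pan - core                    # presence-absence variable families
    return pan, core, ancillary


# Hypothetical presence-absence data for three accessions
pav = {
    "accession_1": {"g1", "g2", "g3", "g4"},
    "accession_2": {"g1", "g2", "g3", "g5"},
    "accession_3": {"g1", "g2", "g4", "g5", "g6"},
}
pan, core, ancillary = partition_pan_genome(pav)
print(sorted(core), sorted(ancillary), f"{len(ancillary) / len(pan):.1%} ancillary")
```

Adding accessions can only shrink the core or enlarge the pan genome, which is one reason why sampling diverse germplasm matters when building a pan genome.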
Control of SNF and Opportunities to Engineer It
Biological nitrogen fixation, which takes place exclusively in prokaryotes (eubacteria and archaebacteria), requires biological energy (adenosine triphosphate (ATP) and reducing equivalents (e−); see the reaction below) and oxygen levels within the organism that are orders of magnitude lower than ambient levels, because nitrogenase enzyme complexes are rapidly inactivated by free oxygen.
$$\mathrm{N_2 + 8\,H^+ + 8\,e^- + 16\,ATP \longrightarrow 2\,NH_3 + H_2 + 16\,ADP + 16\,P_i}$$
where ADP is adenosine diphosphate and Pi is inorganic phosphate. The oxygen-labile nature of nitrogenase poses a conundrum for many diazotrophs (nitrogen-fixing prokaryotes), because they require oxygen for respiratory metabolism and energy production. Evolution has resolved this ‘paradox’ in multiple, interesting ways. These include barriers that reduce oxygen diffusion into cells, rapid respiration that keeps steady-state oxygen levels low enough for nitrogenase to function, and spatial or temporal separation of nitrogen fixation from oxygen-evolving photosynthesis in photoautotrophic diazotrophs (Robson and Postgate 1980). In nitrogen-fixing legume nodules, the plant and resident rhizobia work together to achieve nanomolar concentrations of oxygen in infected cells, compared with about 250 micromolar oxygen in water at equilibrium with air. This is achieved through a combination of gaseous diffusion resistance in the outer cell layers of nodules, high concentrations of the oxygen-binding and oxygen-transporting protein leghemoglobin, and high rates of respiration in plant and rhizobial cells within nodules (Ott et al. 2005; Layzell and Hunt 1990; Appleby 1984; Dakora and Aitkins 1989; Bryan et al. 1988). Leghemoglobins are the most abundant plant proteins in nodules, and the corresponding gene families in legumes have expanded relative to those in non-legumes, presumably because extra copies were selected for their crucial role in nodule oxygen homeostasis. Indeed, leghemoglobin genes are the archetypal nodulin genes, i.e., genes expressed specifically in nodules. It remains to be seen whether some or all legumes have evolved optimal levels of leghemoglobin for maximal SNF.
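To put the energy requirement of nitrogenase in numbers, the sketch below assumes the idealized textbook stoichiometry N2 + 8H+ + 8e− + 16 ATP → 2NH3 + H2 + 16 ADP + 16 Pi (8 ATP and 4 electrons per N atom fixed, counting the obligatory H2). It converts a mass of fixed N into moles of ATP and electrons consumed by nitrogenase alone, ignoring the additional respiratory costs of nodule growth, maintenance and ammonium assimilation.

```python
N_MOLAR_MASS = 14.007  # g per mol of N atoms

# Idealized nitrogenase stoichiometry per N2 reduced (2 N atoms fixed)
ATP_PER_N2 = 16
ELECTRONS_PER_N2 = 8


def nitrogenase_cost(kg_n_fixed: float) -> dict:
    """Moles of ATP and electrons consumed by nitrogenase to fix kg_n_fixed of N."""
    mol_n2 = kg_n_fixed * 1000.0 / N_MOLAR_MASS / 2
    return {
        "mol_ATP": mol_n2 * ATP_PER_N2,
        "mol_electrons": mol_n2 * ELECTRONS_PER_N2,
    }


cost = nitrogenase_cost(1.0)  # 1 kg of fixed N
print(f"{cost['mol_ATP']:.0f} mol ATP, {cost['mol_electrons']:.0f} mol electrons per kg N")
```

On this idealized basis, each kilogram of fixed N costs roughly 570 mol of ATP, part of the energy that high nodule respiration rates must supply.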
It will be interesting to determine whether leghemoglobin levels differ among genotypes of the same species and whether these differences correlate with SNF performance. If there is latitude to further optimize leghemoglobin levels in some species, this could be done by plant breeding using natural variation and/or by engineering that increases the number or activity of leghemoglobin genes, for example by adding genes to plant genomes or by editing existing genes to increase their expression.
Typically, diazotrophs control the expression of nitrogenase genes in several ways, including repression by oxygen and repression by alternative sources of N, to avoid wasting resources and energy on a process that either cannot proceed or is not needed (Roberts and Brill 1981; Merrick and Edwards 1995). The curious thing about rhizobia is that although they retain oxygen control over nitrogenase gene expression (Fischer 1994; Rutten and Poole 2019), they appear to have lost the capacity to repress nitrogenase expression in response to alternative sources of N (Udvardi et al. 1992; Udvardi and Day 1997). This works in the legume's favour: by controlling oxygen levels and restricting their captive rhizobia to a strict diet of carbon compounds, legumes can ‘corner’ rhizobia into fixing nitrogen and releasing ammonia to host cells in nodules (Ott et al. 2005; Schulte et al. 2021). Despite the apparent loss of bacterial N-control over SNF, plants impose their own layer of N-control on the process, by repressing nodule development and/or nodule persistence when sufficient N is available in the soil (Figure 1). Control of nodulation is effected both locally and systemically by rhizobia- and/or nitrate-responsive signalling peptides, called CLAVATA3/endosperm-surrounding region-related (CLE) peptides, which interact with leucine-rich repeat receptor-like kinases in the shoot during autoregulation of nodulation (AON) or in the root during nitrate-dependent regulation of nodulation (reviewed in Ferguson et al. 2019). In AON, CLE perception in the shoot lowers the abundance of the microRNA miR2111, which, under conditions permissive for nodulation, targets the mRNA of too much love (TML, a root regulator in Lotus japonicus) for degradation (Tsikou et al. 2018; Zhang et al. 2021). The TML F-box protein in turn targets an unknown regulator of cell division and nodule development for degradation, resulting in inhibition of nodulation (Takahara et al. 2013). miR2111 was first characterized in the context of phosphate starvation, suggesting a more general role in plant nutrition. In this way, the presence or addition of sufficient soil-N inhibits nodule development, sparing the associated plant resources for growth elsewhere.
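The signalling cascade just described can be summarized as a chain of inhibitions. The boolean sketch below is a deliberate simplification for illustration, not a quantitative model: it routes both the AON signal (from existing nodulation) and the nitrate signal through the same miR2111–TML module, although Figure 1 confirms that route only for AON, and the `cle_functional` flag is a hypothetical stand-in for a loss-of-function edit of the relevant CLE genes.

```python
def nodulation_permitted(rhizobia_present: bool,
                         existing_nodulation: bool,
                         high_soil_nitrate: bool,
                         cle_functional: bool = True) -> bool:
    """Boolean caricature of CLE -> miR2111 -> TML control of nodulation."""
    # CLE peptides are produced in response to prior nodulation (AON) or nitrate
    cle_signal = cle_functional and (existing_nodulation or high_soil_nitrate)
    # CLE perception (in the shoot, for AON) lowers miR2111 abundance
    mir2111_abundant = not cle_signal
    # miR2111 targets TML mRNA for degradation, so TML accumulates only when miR2111 is low
    tml_present = not mir2111_abundant
    # TML targets an unknown positive regulator of nodulation for degradation
    positive_regulator_present = not tml_present
    return rhizobia_present and positive_regulator_present


# First infections proceed; later ones are suppressed by AON or by soil nitrate
print(nodulation_permitted(True, existing_nodulation=False, high_soil_nitrate=False))
print(nodulation_permitted(True, existing_nodulation=False, high_soil_nitrate=True))
# Hypothetical CLE knockout: nodulation continues despite high nitrate
print(nodulation_permitted(True, existing_nodulation=False, high_soil_nitrate=True,
                           cle_functional=False))
```

In reality these controls act quantitatively, limiting rather than switching off further nodulation, so the on/off outputs here only trace the direction of each regulatory step.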

Figure 1. Autoregulation and nitrate repression of nodulation in legumes. CLE peptides produced in roots and/or nodules in response to rhizobia (e.g., GmRIC1/2, MtCLE13/35, LjCLE-RS1/2) or to soil nitrate (e.g., GmNIC1, MtCLE35, LjCLE-RS2) act locally within roots in the case of nitrate repression, or systemically through the autoregulation of nodulation (AON) pathway, via CLE receptors belonging to the leucine-rich repeat receptor-like kinase (LRR-RLK) family to repress further nodulation. In the case of AON at least, perception of the CLEs in the shoot results in a decrease of active miRNA (miR2111) that targets transcripts of the F-box protein, too much love (TML) for degradation in the roots. TML is part of the 26S proteasome pathway and targets an unknown positive regulator of nodulation for degradation, thereby inhibiting nodulation.
Legumes are also able to put the brakes on the metabolic activity and energy consumption of mature nodules in response to sufficient soil-N, via nitrogen-responsive genes that orchestrate nodule senescence. Senescence is a genetically controlled developmental process that allows plants to recycle cellular constituents and reuse them for growth elsewhere. Nodule senescence is triggered by several environmental and endogenous plant signals (Swaraj and Bishnoi 1996; Kazmierczak et al. 2020; Wang et al. 2023), including signals related to soil-N and to reproductive development, when the demand for N and other nutrients to support seed production is high and vegetative tissues are sacrificed to meet this demand. Recently, genes encoding NAC transcription factors that regulate nodule senescence have been described (Wang et al. 2023; Yu et al. 2023); the NAC acronym derives from three proteins (NAM, ATAF1/2 and CUC2) that share a highly conserved N-terminal domain.
Knowledge of the mechanisms and genes underpinning N-repression of nodulation and activation of nodule senescence opens the possibility of optimizing nodule numbers and/or prolonging nodule longevity to increase SNF, especially in agricultural soils with relatively high levels of available soil-N. For instance, by reducing or eliminating, via genome editing, the activity of specific genes involved in N-regulation of nodule development, such as one or more of the CLE genes, it may be possible to maintain high levels of nodulation and SNF under relatively high soil-N. Forcing the legume to use atmospheric N in this way could spare soil-N for subsequent crops. While this might be expected to reduce legume productivity slightly, the overall economic, environmental and social benefits to the system, including a reduced need for fertilizer-N and concomitant losses to the environment, might well outweigh the costs.
Although there are other potential targets for optimizing and increasing SNF, such as control of carbon import and distribution by sugar and organic-acid transporters in nodules, and control of nitrogen export from nodules via amino acid or ureide transporters (Udvardi and Poole 2013; Tegeder 2014), the quantitative nature of SNF, the complexity of the gene interaction networks underlying it and the environmental challenges that plants face in agricultural systems suggest that, in most cases, changes in one or two genes will yield only small increases in SNF in the field. A possible exception might be engineering or editing of genes controlling nodule development and longevity in response to soil-N. Conceivably, uncoupling nodule development and nitrogen fixation from control by soil-N, which would undermine the competitiveness of legumes in natural ecosystems, could lead to large increases in SNF in non-competitive cropping systems with high soil-N.
Looking Forward
To summarize, two decades of genetic and genomic research have given us deep insight into the biochemistry and the molecular and cell biology of legume nodule development and symbiotic nitrogen fixation. This, together with the recent development of legume pan genomes, the demonstration of natural variation in SNF effectiveness within species, and advances in genome editing, genome-based predictive breeding and artificial intelligence to optimize haplotype stacking, opens two complementary pathways to improve SNF in crop and pasture legumes (Figure 2). Starting with a high-quality pan genome representing diverse germplasm (seeds, plants or plant parts useful in crop breeding) available to plant breeders, one pathway to SNF enhancement (upper pathway in Figure 2) involves phenotyping hundreds of genetically diverse individuals to determine their cumulative nitrogen fixation activity. Integrative measures of SNF are key to this, such as mass-spectrometric measurements of the 15N/14N ratio and %N in tissues, combined with biomass measurements to determine total Ndfa. Together with genomic sequence information for each individual, these measures will enable associations to be made between DNA sequence and phenotypic variation across the whole genome. In particular, genome-wide association studies (GWAS) can identify the specific genes and gene variants that have the greatest impact on SNF, which will complement existing knowledge of SNF genetics and identify new targets for detailed genetic analysis and, potentially, genome editing (lower pathway in Figure 2). Association genomics also informs genomic prediction models that can guide the genomic selection of the best parents to bring together optimal haplotypes for SNF in progeny. After multiple generations of progeny genotyping, selection and crossing, individuals with the desired genotypes/haplotypes are tested for SNF effectiveness, and improved genotypes enter commercial breeding programmes.
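The integrative measures mentioned above can be turned into %Ndfa and total Ndfa in several ways. The sketch below assumes the widely used 15N natural abundance method, in which %Ndfa = 100 × (δ15N_ref − δ15N_legume) / (δ15N_ref − B), where δ15N_ref is measured in a non-fixing reference plant and B is the δ15N of the legume when fully dependent on N2 fixation. The text does not specify which method is used, and all numerical values below are hypothetical. δ15N values (‰) are calculated from measured 15N/14N ratios relative to atmospheric N2 (ratio 0.0036765).

```python
R_AIR = 0.0036765  # 15N/14N ratio of atmospheric N2, the delta-scale standard


def delta15n(ratio_15n_14n: float) -> float:
    """Convert a measured 15N/14N ratio to delta-15N in per mil versus air."""
    return (ratio_15n_14n / R_AIR - 1.0) * 1000.0


def percent_ndfa(delta_ref: float, delta_legume: float, b_value: float) -> float:
    """15N natural abundance estimate of %Ndfa."""
    return 100.0 * (delta_ref - delta_legume) / (delta_ref - b_value)


def total_ndfa(biomass_kg_ha: float, percent_n: float, pct_ndfa: float) -> float:
    """Total N fixed (kg N/ha) = biomass x tissue N concentration x %Ndfa."""
    return biomass_kg_ha * (percent_n / 100.0) * (pct_ndfa / 100.0)


# Hypothetical values for one legume line and a non-fixing reference plant
ref = 6.0      # delta-15N of reference plant (per mil)
legume = 1.5   # delta-15N of legume shoots (per mil)
b = -1.5       # assumed B value for this legume (per mil)
p = percent_ndfa(ref, legume, b)
print(f"%Ndfa = {p:.0f}; Ndfa = {total_ndfa(5000, 3.0, p):.0f} kg N/ha")
```

In practice the reference plant and B value must be chosen carefully, because %Ndfa estimates from this method are sensitive to both.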
In parallel, genome editing may produce genetic and phenotypic novelty not present in natural populations, thus complementing the genome-enabled breeding pathway. Such novel material can also enter commercial breeding pipelines, most easily if genome editing is done in genotypes that are similar, if not identical, to advanced-stage breeding material. Thus, there is work to do for both molecular biologists and molecular plant breeders, and a need for dialogue between the two to ensure that efforts are coordinated and aligned with breeding objectives. For early-career scientists just entering this area of R&D, there has never been a better time to tackle the task of increasing SNF in legumes, to help solve the wicked problem of feeding a growing population without destroying the environment upon which all life depends.

Figure 2. Pathways to increase SNF in legumes. Pan genomes reveal the full genetic potential of a species, including the complete set of genes and variation in DNA sequence and content amongst individuals. This enables discovery of genes involved in SNF, via genome-wide association studies (GWAS) of all genes and genetic studies of specific genes. The potential of such genes and their natural alleles to improve SNF can be tested by genomic selection of parents that contribute desired sets of alleles or haplotypes to offspring. In parallel, genome editing can generate novel genetic variation that may be incorporated into breeding programmes, along with optimal haplotype stacks from genomic selection, to increase SNF.
About the Authors
Michael Udvardi is Professor of Legume Genomics with the Queensland Alliance for Agriculture and Food Innovation at the University of Queensland, Australia. He earned his PhD in plant biochemistry from the Australian National University in 1989. He is primarily interested in how plants obtain nitrogen for growth, either as mineral nitrogen from the soil or from atmospheric di-nitrogen via symbiotic nitrogen fixation by bacteria. He has contributed to our understanding of symbiotic nitrogen fixation in legumes, especially of transport and metabolism in root nodules, using biochemical, molecular, genetic and genomic methods. He was amongst the first to characterize ammonium and nitrate transporters in plants. He was part of a large international team that sequenced and analysed the genome of the model legume Medicago truncatula. Currently, his group focuses on the development of pan-genomic resources to accelerate breeding of tropical pulses, including mungbean and pigeonpea. Dr Udvardi has published over 200 papers in refereed scientific journals. He was elected a Fellow of the American Association for the Advancement of Science in 2012 for his contributions to our understanding of legume biology, especially symbiotic nitrogen fixation. In 2023, he was awarded the Adam Kondorosi Academia Europaea Award for Advanced Research in recognition of the impact of his research on plant–microbe interactions and plant science and his generous service to the scientific community.
Celine Mens is a Postdoctoral Research Fellow with the Queensland Alliance for Agriculture and Food Innovation at the University of Queensland, Australia. Dr Mens obtained her PhD in Plant Molecular Biology at the University of Queensland in 2022, focusing on the characterization of molecular signals in autoregulation of nodulation and nitrate-dependent regulation of nodulation in legumes such as soybean and the model M. truncatula. She is now a postdoc in Michael Udvardi’s lab working on genome assembly and annotation, as well as quantifying symbiotic nitrogen fixation in diverse mungbean populations.
Estelle Grundy is a postdoctoral research fellow working for Professor Michael Udvardi with the Queensland Alliance for Agriculture and Food Innovation at the University of Queensland (UQ), Australia. She recently completed her PhD thesis on the functional characterization of defence genes in soybean nodulation at the Integrative Legume Research Group (School of Agriculture and Food Sustainability, UQ) headed by A/Prof. Brett Ferguson. Her current research is centred on understanding the genetic basis of symbiotic nitrogen fixation (SNF) in mungbean by exploring the diversity of SNF across diverse genotypes.