Requirement for life’s origin: persistent propagation
What was the origin of life? A pre-requisite for answering that question is to define the difference between dead and alive. Defining life has been notoriously challenging (Schrodinger, 1944; Cleland and Chyba, 2002; Popa, 2004; Benner, 2010; Machery, 2012; Pross, 2016; Plaxco and Gross, 2021). For example, the ability to metabolise, grow, and duplicate is not sufficient to distinguish life from a candle flame. Nevertheless, a good consensus definition is from NASA: ‘Life is a self-sustaining chemical system capable of Darwinian evolution’ (Joyce et al., 1994). The italics are ours; the clear implication is that life could not begin before some dynamical adaptation process of molecules was already operative.
So, in order to seek the origins of life, we ask what physico-chemical process(es) could have driven prebiotic molecules to become autocatalytic and adaptive. We call this The Day Two Problem, to distinguish it from traditional questions about ‘What Came First’, which we call The Day One Problem.
• The ‘Day One’ Question: What material came first? Think of the metaphorical ‘chicken or the egg problem’ (although in the real world, the chicken-and-egg framing is misleading because many things, including chickens and eggs, emerged in parallel, not in series). For example: Did life start as an RNA World (Gilbert, 1986; Joyce and Szostak, 2018)? Or a lipid world (Segré et al., 2001; Deamer, 2011), an amyloid world (Maury, 2009, 2015, 2018) or through metabolism first (Wächtershäuser, 1988; De Duve and De Neufville, 1991; De Duve, 1995; Cody, 2004; Shapiro, 2006; Jordan et al., 2021; Matsuo and Kurihara, 2021)? Our view is that the full origin of life required many aspects – metabolism, information, protein-like catalysts and makers, and others – to arise together.
• The ‘Day Two’ Question: What dynamical process might have driven prebiotic molecules to become self-sustaining and self-serving? How might prebiotic molecules undergo directed change from Day One to Day Two and beyond? For biology to operate, it requires an operating system. What would drive molecules to evolve further on their own?
We give here a hypothesis. We describe a mechanism by which prebiotic undirected syntheses of short peptides could plausibly become ‘makers’, that is molecules that persistently make other molecules. We give a summary and perspective of recent computer modelling of a disorder-to-order process that achieves positive feedback against the forces of degradation. We start below by positing Darwinian evolutionary dynamics as a driven machine cycle of steps, because this vantage point illuminates principles about possible molecular origins.
Description of Darwinian dynamics as a cyclic machine
Darwinian Dynamics has three well-known features: (1) Replication, moms making more moms; (2) Mutation, search and discovery of sequence → function polymers that create new molecular entities and mechanisms and (3) Selection, competition-driven upratcheting of fitness. Fig. 1 expresses these properties in the form of a machine-like, biosphere-wide nonequilibrium (NEQ) cycle that we call the Darwinian Evolution Machine (DEM). Described in detail elsewhere (Kocher and Dill, 2023), the cycle operates from left to right. (a) At time $ t $, $ X $ indicates a wild-type (status quo) population. (b) A mutation occurs in some individual. (c) That individual cell is grown up into a population $ Y $. Populations $ X $ and $ Y $ compete for resources. (a(t + 1)) The winner becomes the new status quo wild-type population, thus becoming ‘remembered’ in the population, and gains more resources. The cycle repeats, driven by a persistent external supply of resources.
The steps (b) and (c) of population growth on the resources, competition and winning can be expressed as population-resource dynamics. For example, if we have $ N $ competitors $ {A}_n $ , labelled by their phenotype, fighting for one resource $ r $ ,
where $ g(r) $ describes the NEQ input of the resource, $ {D}_r(r) $ and $ {D}_n\left({A}_n\right) $ are decay/death terms, $ {U}_n\left({A}_n,r\right) $ is the total resource use rate by moms of type $ n $ , and $ {R}_n $ is the reproduction rate of one mom of type $ n $ given that mom eats resource at a rate $ {U}_n\left({A}_n,r\right)/{A}_n $ . Mutational discovery will introduce new moms, say $ {A}_{N+1} $ , which are then selected for or against by the DEM. The most competitive moms are ‘remembered’ by the DEM because they are good enough autocatalysts to maintain persistent populations. ‘Fitness’, which is the term that describes this competition-driven selection, is non-trivial to define, and can be model-dependent; see Supplementary Material SI.2 for more discussion.
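Written out from the definitions above, the population-resource dynamics would take a form like the following. This is a reconstruction from those definitions, so the arrangement and signs of the terms should be read as an assumption and not as a quotation of Eq. (1):

$$
\begin{aligned}
\frac{dr}{dt} &= g(r) - D_r(r) - \sum_{n=1}^{N} U_n\left(A_n, r\right), \\
\frac{dA_n}{dt} &= R_n\!\left(\frac{U_n\left(A_n, r\right)}{A_n}\right) A_n - D_n\left(A_n\right), \qquad n = 1, \dots, N.
\end{aligned}
$$

The first line says that the resource is supplied, decays and is consumed by all competitors; the second says that each population grows through the per-mom reproduction rate and shrinks through death.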
Known biology constrains the mathematical form that is needed for the function $ U $. Often, population genetics modelling approximates it as linear in both resources and moms, $ {U}_n\left({A}_n,r\right) = r{k}_n{A}_n $. But, such linearity leads to ‘winner-take-all’ (WTA) dynamics (Volterra, 1928; Fisher, 1930; Gause, 1934; MacArthur, 1970; Hsu et al., 1977; Tilman, 1982; Chesson, 1990; Lifson, 1997; Pross, 2011; van Opheusden et al., 2015), a dynamics that misses important features of evolution. Instead, evolution often gives ‘peaceful coexistence’ of multiple species on a given resource (Hutchinson, 1961; Armstrong and McGehee, 1980; Chesson, 2000; Chesson and Kuang, 2008; Charlebois and Balázsi, 2016; Barabás et al., 2018; Goyal et al., 2018; Wang et al., 2022). Peaceful coexistence is captured using a saturating function (Beddington, 1975; DeAngelis et al., 1975; Novak and Stouffer, 2021; Stouffer and Novak, 2021; Kocher and Dill, 2023),
which simply expresses two natural limits, that maximum concentrations of moms are finite and that speeds of producing offspring are finite. What is novel in the present DEM perspective is the combining of this generalised form of population-genetics (Eq. (1)) with iterative mutation, competition and selection cycles (Kocher and Dill, 2023). In the following section, we extract four principles from this DEM perspective that help us formulate possible molecular precursors in the next sections.
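The contrast between linear and saturating resource use can be illustrated numerically. The sketch below is not the authors' model: it assumes a Beddington-DeAngelis-style uptake $ U_n = r k_n A_n / (1 + a r + b A_n) $ in which each type is limited only by its own density, a reproduction rate linear in uptake, constant supply $ g $, first-order decay, and arbitrary parameter values. All function and parameter names are hypothetical.

```python
def simulate(uptake, k, g=10.0, d_r=0.1, d=1.0, y=1.0, dt=0.001, steps=100_000):
    """Forward-Euler integration of one resource r shared by competitors A_n.

    Assumed toy dynamics (illustrative only):
        dr/dt   = g - d_r * r - sum_n U_n
        dA_n/dt = y * U_n - d * A_n
    """
    r = 1.0
    pops = [0.1 for _ in k]
    for _ in range(steps):
        u = [uptake(k_n, a_n, r) for k_n, a_n in zip(k, pops)]
        r += dt * (g - d_r * r - sum(u))
        pops = [a_n + dt * (y * u_n - d * a_n) for a_n, u_n in zip(pops, u)]
    return r, pops


def linear_uptake(k_n, a_n, r):
    """U_n = r * k_n * A_n, the linear form quoted in the text."""
    return r * k_n * a_n


def saturating_uptake(k_n, a_n, r, a_sat=0.1, b_sat=1.0):
    """Assumed saturating form: finite uptake speed (a_sat) and finite density (b_sat)."""
    return r * k_n * a_n / (1.0 + a_sat * r + b_sat * a_n)


K = [2.0, 1.5]  # hypothetical uptake constants for two competing types
r_lin, pops_lin = simulate(linear_uptake, K)
r_sat, pops_sat = simulate(saturating_uptake, K)
print("linear:    ", r_lin, pops_lin)
print("saturating:", r_sat, pops_sat)
```

With the linear form, the type with the larger $ k_n $ draws the resource down to a level at which the other type cannot replace itself, so the weaker type decays away. With the assumed saturating form and these parameters, setting the derivatives to zero by hand gives a steady state near $ r \approx 3.5 $ with both populations positive (about 5.7 and 3.9), which is the coexistence behaviour described above.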
Features of evolution that are relevant for origins
The DEM model perspective illuminates what is needed for the origin of life. First, the DEM is a maker of makers, a process of moms creating more moms. The DEM is an autocatalytic set, or a collection of entities, each of which can be created catalytically by other entities, such that as a whole, the set is able to catalyse its own production (Hordijk, 2019). An extensive literature describes the importance of autocatalytic sets in the origins of life (Eigen, 1971; Kauffman, 1971, 1986; Kauffman et al., 1993; Dyson, 1982, 1999; Jain and Krishna, 2002; Hordijk and Steel, 2014; Hordijk, 2019; Hordijk et al., 2022). The positive feedback of makers making makers contributes to the self-sustaining nature of evolution.
Second, environments that are unruly and fluctuating can sort winners from losers through booms and busts (Doebeli et al., 2021; Wang et al., 2022). Booms and busts drive the recycling of resources, taking resources away from the losers and giving them to the winners, thus driving the rich to get richer on the road to autocatalysis.
Third, DEM Dynamics can sustain peaceful coexistence among multiple agents at the same time. In a world of winners-taking-all (WTA), without peaceful coexistence, evolution would have been brittle, always on the edge of extinction. If $ X $ is more fit than $ Y $ in environment $ {E}_1 $ , $ X $ would be the lone survivor in a WTA model. Now, if the environment fluctuates to $ {E}_2 $ , which kills $ X $ , then the whole ecosystem dies. Instead, in a world of coexistence, diversity preserves the ecosystem. An ecosystem that has an ensemble of backup moms is more robust to unpredictable new environments. Ensembles are crucial for long-term survival and persistence of the DEM.
And fourth, the biosphere-wide DEM is a driven machine: its cycles of molecule-making are powered by uptake of out-of-equilibrium resources from the environment. Driven systems have different tendencies than equilibrium processes. Some detail is given in Supplementary Material SI.1; here we just give a few examples. (1) A fluid subject to gravity will flow down a hill and stop in the valley at the bottom. But a fluid subject to a strong force can flow beyond the valley to cross the next hill and beyond. (2) A TV set or computer performs intricate functions as long as it is ‘plugged in’. Its current flows are not predicted by principles of equilibrium, such as the Second Law. Such devices only tend to equilibrium when they are unplugged. Think about an electromagnet. A metal rod will not pick up nails, but when a current is driven around the rod, it will. An electromagnet is driven by the current input. Its action is a nonequilibrium (NEQ) force. That force goes to zero when the input current is turned off. (3) While equilibrium systems tend downhill in energy (or free energy), driven systems can also go uphill. Think of chemical reactions, like binding events or protein folding or molecular association or partitioning processes; under common circumstances, their stable states are predicted by tendencies towards minimum free energies. But living systems have biochemical cycles, where uphill steps are driven by a coupling to downhill steps. The DEM has persisted for 3.5 billion years because biology has become so capable of exploiting the out-of-equilibrium food, energy and matter in its environments. How might the DEM have arisen from prebiotic molecular processes? Below are some of the key questions.
Puzzles about the molecular origins of the DEM
• What molecules were the first ‘makers’? Today’s biological maker molecules are proteins and RNA, chain molecules that encode different functionalities as different monomer sequences. How did prebiotic polymers come to have sequence $ \to $ function relationships? What simple molecular process started producing self-sustaining maker molecules?
• How was molecule-making powered by external forces? How did molecule-making come to outcompete molecule degradation and become so sustainable?
• How did makers and catalysts become mobile, molecular-scale and editable? An enormously transformative event in the prebiotic transition from chemistry to biology was the transition from catalysts that were immutable macroscale surfaces to microscale mobile editable proteins. Current thinking is that prebiotic reactions were first catalysed by mineral or clay surfaces (Wächtershäuser, 1988), or interfaces (Holden et al., 2022), or in hot volcanic vents (Martin et al., 2008). Such catalysts are macroscopic, geographically immovable, and fixed in their single-reaction catalysis under fixed conditions. But cells need whole biochemical pathways, where multiple reactions are strung together to achieve complex chemistry. Each step has a tailored catalyst that provides precisely the right acceleration of precisely that reaction, is mobile and small enough to fit inside a cell, and functions in the same water solvent as all the other requisite catalysts at room temperature. As a metaphor, consider the importance in the Industrial Revolution of steam engines, which replaced immovable energy sources of rivers and waterfalls by power that was mobile and tailorable to circumstances. How did prebiotic catalysis ‘learn’ to become untethered from rigid macroscopics to become flexible mobile microscopic biopolymers?
• What was ‘fitness’ before there were cells? Biological organisms are self-serving. This is captured in the multi-faceted notion of fitness. In contrast, molecules are not self-serving. How would molecules start becoming selected for or against?
• Needles in haystacks and blind watchmakers: overcoming the infinitesimal probabilities. Life’s origin is often considered impossibly improbable, like finding a needle in a haystack, or finding a watch made by a blind watchmaker (Dawkins et al., 1996). But those arguments are based on models that assume many improbable steps happen independently. There is a problem with those models. Life’s originating events were surely not independent: they were correlated. The key questions are: (1) What was the nature of those correlations? and (2) In what physical process does each step build on the advantages of the preceding steps to give cumulative long-term sustainability?
The case for proteins and the folding process
On the one hand, our view is that even the earliest life requires multiple components – functional molecules like proteins, informational molecules like RNA/DNA, encapsulation like lipids, and on-board energy like ATP; see Supplementary Material SI.3 and Carter and Kraut (1974) and Frenkel-Pinter et al. (2020). On the other hand, our goal here is more modest, namely just to explain the roots of evolution-like dynamics, how molecules became makers, and how maker molecules developed sequence-to-function relationships.
In principle, the first sequence-to-function maker molecules could have been either RNA or proteins. The pros and cons of the RNA world hypothesis, that RNA came first, are discussed elsewhere (Joyce et al., 1994; Atkins et al., 2011; Robertson and Joyce, 2012; Joyce and Szostak, 2018; Wills and Carter, 2018). Here, we postulate that proteins came first, both for reasons discussed in those references and because of the need to first establish some form of propagation dynamics. Here is a short summary. (1) Proteins are most of a cell’s mass, so the differential growth rates of cell evolution are largely a matter of differential protein production. (2) Proteins are today’s main maker molecules, catalysing the reactions of cell growth. (3) Proteins are unique in having sequence $ \to $ structure $ \to $ function relationships. Most other polymers, including most RNAs, do not. Proteins achieve their actions, functions and mechanisms by virtue of their native molecular structures. The folding code is primarily a hydrophobic (H) and polar (P) code, which other linear biomolecules do not have. Consequently, while some RNA molecules do fold uniquely and are catalysts, they are driven by different forces. Proteins are compact because they are dominated by hydrophobic tertiary interactions, whereas RNA molecules tend to be stringy because they are dominated by secondary-structure interactions of hydrogen bonding and base stacking. Moreover, because hydrogen bonds and base stacking are relatively sequence-independent, where chain slippage leads to many local minima in free energy, RNA folding landscapes are bumpier and less funnelled than protein landscapes (Chen and Dill, 2000).
Thus, even RNAs that actually have folded structures tend to have multiple ones, and those structures are only weakly specified by RNA sequences. (4) Proteins’ unique folded states make proteins good catalysts. Folded proteins are miniature solids. Being a solid is exactly what is needed to catalyse chemical reactions, because catalyst atoms need to hold their places long enough to assist the reaction. (5) A 20 amino acid alphabet spans a range of chemistries, so proteins can catalyse a range of reactions. For these purposes, RNA molecules are not as good as proteins. Even where a given reaction can be catalysed by either proteins or RNAs, proteins are often better (Plaxco and Gross, 2021). (6) While some RNA molecules can self-copy, those molecules would need to have very low error rates in order to persist (Eigen, 1971; Jeancolas et al., 2020). The first copying machines would have to have had near-perfect fidelities. However, exact copying would be too brittle, for the same reasons we explained above that winner-takes-all (WTA) competitions are: without a way of generating diversity, exact copying is too prone to extinction in the face of environmental changes. In our view, prebiotic forces did not aim at self-copying; they aimed instead towards becoming autocatalytic sets, not strict autocatalysts. Variance is crucial. Progeny must not be identical to moms. The origins process must have some aspects of replication that are also to some degree unfaithful.
In terms of a dynamical process, protein folding has pertinent features. Protein folding entails a probabilistic needle-in-a-haystack search challenge through a disorder-to-order transformation. The folding search problem is now well understood in terms of funnel-shaped energy landscapes (Chan and Dill, 1991; Dill and Shortle, 1991; Wolynes et al., 1995; Onuchic et al., 1997; Wolynes, 1997; Dill et al., 2007, 2008; Thirumalai et al., 2010; Rollins and Dill, 2014; Nassar et al., 2021). Fig. 2 compares a funnel landscape to a ‘golf-course’ landscape, which is premised on the assumption of uncorrelated independent events. ‘Funnel’ refers to the coarsest level of kinetic features, and not the potentially many finer-grained kinetic traps. Protein folding occurs so rapidly and towards such a unique ordered state because small random local steps combine together to lead effectively to the native state. In short, many proteins fold by rapidly finding needles in haystacks and creating complex watchmaker-like structures through small random correlated actions following combinatorially many microscopic routes via opportunistic chemical preferences. Protein folding gives both a metaphor for needle-in-a-haystack searching and a specific physical process, as described below, that could have become evolutionary dynamics.
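The difference between a ‘golf-course’ search and a funnel-like search can be made concrete with a toy counting argument. The sketch below is only an illustration under simple assumptions, not a model of folding kinetics: a target requires `n_steps` correct choices, each correct with probability `p` per attempt. In the independent case all choices are redrawn together until every one is right at once; in the cumulative case choices are attempted one at a time and each success is kept. The function names are hypothetical.

```python
def trials_independent(n_steps, p):
    """Mean number of trials when all steps must succeed simultaneously.

    Each trial succeeds with probability p**n_steps, so the mean of the
    resulting geometric distribution is (1/p)**n_steps.
    """
    return (1.0 / p) ** n_steps


def trials_cumulative(n_steps, p):
    """Mean number of trials when steps are tried one by one and successes are retained.

    Each step takes 1/p trials on average, and the steps add up.
    """
    return n_steps / p


n, p = 20, 0.5
print(trials_independent(n, p))  # 1048576.0
print(trials_cumulative(n, p))   # 40.0
```

Even for this small example, retaining partial successes turns an exponential search into a linear one, which is the sense in which correlated steps remove the needle-in-a-haystack problem.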
Emergent autocatalysis from HP foldcats
Here is our hypothesis, first in overview, then in more detail. We postulate that prebiotic syntheses could produce short peptide chains, some of which collapse into compact structures in water because of their hydrophobic content. A fraction of those collapsed chains will have exposed hydrophobic surfaces, active as a primitive catalytic site, slightly accelerating the binding and elongation of other peptides. Computer simulations show that this mechanism leads to autocatalytic sets. The premise that amino acids could be produced and could polymerise into short random peptide chains under plausible prebiotic conditions is well-established (Miller and Urey, 1959; Wächtershäuser, 1988; Botta and Bada, 2002; Johnson et al., 2008; Lambert, 2008; Ikehara, 2014; Foden et al., 2020; Frenkel-Pinter et al., 2020; Muchowska et al., 2020; Holden et al., 2022; Krasnokutski et al., 2022).
However, existing peptide synthesis experiments do not explain how chains could have become long enough to fold and function like proteins; how they could become catalysts and makers; how the process could become autocatalytic; how they could give non-random sequence $ \to $ structure relationships; how catalysis became mobile; or what are the molecular origins of fitness. We address these below.
Fig. 3 illustrates the chain elongation challenge. Typical polymer syntheses give mostly only short chains that are not long enough to fold and function as today’s proteins do. However, it has been found in computer modelling that some heteropolymers behave differently (Guseva et al., 2017). Chains that have particular sequences of hydrophobic (H) and polar (P) types of monomers, called HP polymers, collapse in water into compact states due to the hydrophobic effect. Even some relatively short sequences can collapse. Here, we call those chains foldamers. Furthermore, a small fraction of foldamer sequences can act as primitive catalysts, described below.
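Why undirected synthesis yields mostly short chains can be seen from a simple counting argument. As a sketch, assume (this is an illustrative assumption, not taken from Fig. 3) that every growing chain adds another monomer with the same probability $ p $ and otherwise stops. Chain lengths $ n $ then follow a geometric (Flory-type) distribution:

$$
P(n) = (1 - p)\,p^{\,n-1}, \qquad P(\text{length} \ge L) = p^{\,L-1}.
$$

For example, with $ p = 0.5 $, the fraction of chains of length 20 or more is $ 0.5^{19} \approx 2 \times 10^{-6} $. Under this assumption, long chains are exponentially rare unless some mechanism raises the effective $ p $ for particular sequences.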
HP chains can fold, catalyse and elongate
Fig. 4 shows that HP chain molecules have three general classes of behaviour in water, depending on their sequence of H and P monomers. (1) Some chains do not fold at all (think of the all-P sequence, for example). (2) Some HP sequences are foldamers, compact with hydrophobic cores. And (3) a fraction of HP foldamers happen to have surface patches that are concentrated in hydrophobic monomers; we call these surface regions ‘landing pads’, because these are regions that are sticky for other hydrophobic molecules floating in solution. Landing pads can be regions of catalysis. We call collapsed chains having landing pads foldcats, short for foldamer-catalysts.
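The distinction between non-folders and foldamers can be sketched with a small exhaustive enumeration. The code below assumes a two-dimensional square-lattice HP model in which each non-bonded H-H nearest-neighbour contact is favourable, and it calls a sequence a foldamer when a single conformation has the most contacts. Whether this matches the exact settings of the cited simulations is an assumption, landing-pad detection is not attempted, and all names are hypothetical.

```python
MOVES = ((1, 0), (0, 1), (-1, 0), (0, -1))


def conformations(n):
    """Yield self-avoiding walks of n >= 2 sites on the square lattice.

    The first bond is fixed along +x and the first turn is forced towards +y,
    so each shape appears once instead of once per rotation or reflection.
    """
    if n < 2:
        raise ValueError("need at least two monomers")

    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy < 0:
                continue  # mirror image of a walk that turns towards +y
            site = (x + dx, y + dy)
            if site in path:
                continue  # self-avoidance
            path.append(site)
            yield from extend(path, turned or dy != 0)
            path.pop()

    yield from extend([(0, 0), (1, 0)], False)


def hh_contacts(seq, conf):
    """Count H-H lattice contacts between monomers that are not chain neighbours."""
    index_at = {site: i for i, site in enumerate(conf)}
    contacts = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is counted once
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                contacts += 1
    return contacts


def ground_state(seq):
    """Return (maximum H-H contacts, number of conformations reaching that maximum)."""
    best, degeneracy = -1, 0
    for conf in conformations(len(seq)):
        c = hh_contacts(seq, conf)
        if c > best:
            best, degeneracy = c, 1
        elif c == best:
            degeneracy += 1
    return best, degeneracy


def is_foldamer(seq):
    """A sequence counts as a foldamer here if its best-contact conformation is unique."""
    best, degeneracy = ground_state(seq)
    return best > 0 and degeneracy == 1


print(ground_state("HPPH"))  # (1, 1)
print(ground_state("PPPP"))  # (0, 5)
```

The four-monomer sequence HPPH has exactly one conformation (the U-shape) that brings its two H ends into contact, whereas the all-P sequence has no preferred shape among its five distinct conformations. Looping `is_foldamer` over all sequences of a given length would give a rough foldamer fraction for this toy model.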
These landing pads on foldcats could catalyse the covalent elongation of other ‘client’ HP sequences. The mechanism of this catalysis process is shown in Fig. 5. Each foldcat sequence balls up, leaving a sticky spot (clustered H monomers) on its surface. A different peptide chain, call it a client, lands with its H monomers binding hydrophobically to the landing pad of the foldcat. A free H monomer from solution also lands on the landing pad. The spatial colocalization of the H monomer adjacent to the client chain can reduce the kinetic barrier to elongation of the client chain. The foldcat’s job is to keep all the required pieces for elongation (the growing chain and a free monomer) in the same place. Peptide bond formation has a transition state barrier of 18 kcal mol$^{-1}$ (Gindulyte et al., 2006). Spatial localization of two reactants, often called proximity effects or enhanced effective concentrations, is known to accelerate covalent bond formation reactions by as much as $ {10}^8 $ (Menger and Nome, 2019). For illustration, we have supposed only a binary code and only hydrophobicity-based landing pads. More realistically, a code will have more than two amino acid types, and more diverse interactions. The expectation that prebiotic peptides would have had both H and P amino acids is supported by the Miller–Urey experiment and recent variants of it (Miller and Urey, 1959; Botta and Bada, 2002; Johnson et al., 2008). A proposed minimal set would be GADV peptides (Ikehara, 2014), although there would be value in including cysteine (Foden et al., 2020), and lysine or arginine for breadth of chemistry and control of aggregation.
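To relate rate enhancements to barrier heights, one can use the standard transition-state relation that a rate scales as $ \exp(-\Delta G^{\ddagger}/RT) $. The sketch below assumes a temperature of 298.15 K (an assumption, not stated in the text) and converts between a rate enhancement and the equivalent lowering of the barrier; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def barrier_reduction(rate_enhancement, temp_k=298.15):
    """Barrier lowering (kcal/mol) equivalent to a given fold rate enhancement."""
    return R_KCAL * temp_k * math.log(rate_enhancement)


def enhancement_from_kt(delta_in_kt):
    """Fold rate enhancement produced by lowering the barrier by delta_in_kt units of kT."""
    return math.exp(delta_in_kt)


print(round(barrier_reduction(1e8), 2))   # about 10.91 kcal/mol
print(round(enhancement_from_kt(3.0), 1)) # about 20.1-fold
```

On these assumptions, a $ {10}^8 $-fold acceleration corresponds to removing roughly 11 of the 18 kcal mol$^{-1}$ barrier, while a much more modest lowering of 3 kT would still give about a 20-fold speed-up.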
This HP foldcat mechanism has recently been observed and explored in computer simulations (Guseva et al., 2017). First, note that all these effects would likely have been almost negligibly small at first. Foldamers constitute only a fraction of all HP sequences; foldcats constitute an even smaller fraction; and colocalization-based rate enhancements are unlikely to be greater than a few $ \mathrm{kT} $ in free energy (based on hydrophobicity estimates). But, it is not the smallness of populations or actions that matters. Rather, it is whether one step to the next entails some form of systematic positive cooperativity. What matters for origins (as well as for evolution in general) is whether some sub-population, even a very small one, is capable of some action – call it emergent behaviour – of positive feedback, so that it grows relative to other sub-populations, ultimately overcoming the relatively fixed forces of degradation. In general, a big challenge in origins-of-life research is that the initial seeding event is likely to be a very small signal in very large noise – precisely the sort of event for which devising a good experiment is difficult. Below, we describe how the HP foldcat mechanism predicts such emergent behaviours.
How the folding process leads to the evolution process
Here are the emergent behaviours of the HP folding and catalysis mechanism.
• Emergence of makers, catalysts and molecular functionalities. From the short random peptides that are plausibly synthesised prebiotically, the HP foldcat mechanism produces longer chains; see Fig. 6a. On average, longer HP chains are more stably folded and more protein-like (because they bury more hydrophobic surface). So, as long as amino acids are input, the HP foldamer mechanism pushes from peptides towards proteins, creating more catalytic power and functional diversity. These foldcats are makers that make makers. An alternative mechanism proposed for chain elongation is templated ligation, but it requires enzyme assistance (Tkachenko and Maslov, 2015; Kudella et al., 2021).
• Emergence of sequence-to-function relationships. This mechanism amplifies the populations of foldamers and foldcats, simply because foldcats are a larger proportion of longer-chain sequences. Foldcats form an autocatalytic set; Fig. 6b. Such situations, where some sequences are populated selectively relative to other sequences based on their functionalities, are the basis for sequence-to-function relationships.
• Emergence of programmable mobile molecular machines. Presumably, the first prebiotic peptides were synthesised on macroscale catalysts, fixed in space and inflexible in their actions. But, the foldcat mechanism then produces its own catalysts, poor at first and better later. This untethers the peptide catalysis process from fixed spaces. Now, catalysts are at the microscale: they are mobile; and they are diverse and programmable by virtue of the sampling of sequence space. We regard this untethering, from macro to micro, from fixed to mobile, to have been a transformative step from prebiotic chemistry to biology.
• Emergence of adaptation. Arguably, evolution’s central principle is that organisms adapt to environments. Evolution’s great power of innovation and resourcefulness comes from its mutational search, competition, and fitness-based selection. The HP foldamer perspective posits that such adaptivity could have originated from a disorder-to-order process, in which chain molecules sample different sequences; molecules compete for limited resources; and winners are those that are more stable and get more resources.
• What is ‘fitness’ among molecules? First, just persistence. Darwinian evolution chooses winners and losers based on fitness ratcheting. What are winners and losers in a prebiotic world of molecules? HP chains persist in stably folded states for longer or shorter times, based on their sequences. Longer chains are more stable because they bury more hydrophobic residues upon folding, and because compactness limits access to chemical agents that hydrolyze proteins. In unruly environments of booms and busts, molecules that are more stable persist by scavenging the recycled monomers and peptides from molecules that are less stable.
• Emergence of a tipping point from error catastrophes to success catastrophe. Prebiotic molecules are subject to degradation. Error catastrophes are unavoidable in direct-replicator mechanisms (Eigen, 1971; Jeancolas et al., 2020). Short peptides will hydrolyze to monomers. The origin of life was a tipping point from error catastrophes (where degradation dominates), to a ‘success catastrophe’, where maker molecules establish persistent populations. Beyond this point, evolution and growth then prevail over degradation. Three factors explain this tipping point in the HP foldamer model: (1) Autocatalysis. As noted above, peptides grow longer, more stably folded, and form an autocatalytic set. This contributes positive cooperativity towards self-sustainability. (2) A driven machine. Like a TV set that is ‘plugged in’, the HP foldamer mechanism is driven by a persistent input. The input is amino acids (and at early stages, also a catalyst of peptide synthesis). It does not matter that most product peptides fail and degrade; being ‘plugged in’ means that the system keeps pumping to push the chain lengths higher. (3) Adaptivity. Autocatalysis and input power alone are not sufficient. Environments are unruly. Biology would not have survived without adaptability to changing environments. The combination of these factors contributes to a drive to ratchet up persistence over time; see Fig. 7.
To summarise, the HP foldamer mechanism explains how peptide synthesis and folding could result in the emergence of evolution-like propagation; see Table 1. But to be clear, we regard this not as the origin of life itself, but rather only as a precursor to it. Origins surely required much more than this: cell-like encapsulation, information and heritability, and more (some further discussion is given in the Supplementary Material).
Evidence supporting the HP foldcat mechanism
Although there is no direct experiment testing this foldcat mechanism, several of its components are supported by experiments. HP chains can fold and function. The binary HP code dominates protein folding (Lim and Sauer, 1989; Bowie et al., 1990; Kamtekar et al., 1993; Dill et al., 1995; Dill and MacCallum, 2012; Koga et al., 2020). But also, a biomolecule backbone is not required; HP peptoids can fold and function too (Lee et al., 2005; Yoo and Kirshenbaum, 2008). Some random peptide sequences can fold. It is not an infinitely dilute space of sequences that can fold. As discussed in Guseva et al. (2017), for HP chains up to length 25, 2.3% fold to unique structures and 12.7% of those foldamers, or 0.3% of all sequences, have the foldcat catalytic surface. Peptide syntheses occur naturally. Even in interstellar space, 6–8-mer peptides have been found (Kebukawa et al., 2022; Krasnokutski et al., 2022). Sea spray or air-water surfaces could catalyse small peptide formation (Griffith and Vaida, 2012; Deal et al., 2021; Holden et al., 2022). Some short peptides can catalyse reactions (Adamala and Szostak, 2013; Rufo et al., 2014).
Hydrophobic patches are common on proteins (Lijnzaad et al., Reference Lijnzaad, Berendsen and Argos1996; Tonddast-Navaei and Skolnick, Reference Tonddast-Navaei and Skolnick2015), which we call landing pads in the foldamer mechanism. Some proteins are synthesised without ribosomes (Finking and Marahiel, Reference Finking and Marahiel2004; Miller and Gulick, Reference Miller and Gulick2016).
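The idea that a non-negligible fraction of HP sequences fold to a unique structure can be illustrated by brute-force enumeration. The following is a minimal sketch, not the procedure of Guseva et al. (2017): it assumes a 2D square lattice, an energy of -1 per contact between H beads that are lattice neighbours but not chain neighbours, and it defines a "folder" as a sequence with exactly one lowest-energy conformation of negative energy. The function names are hypothetical, and the sketch is only practical for short chains, so it is not expected to reproduce the percentages quoted above for chains up to length 25.

```python
from itertools import product

STEPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def conformations(n):
    """Yield self-avoiding walks of n >= 2 beads on the 2D square lattice.

    Rotations are removed by fixing the first bond along +x; mirror images
    are removed by requiring the first step off the x-axis to go towards +y.
    """
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            if not turned and dy == -1:
                continue  # mirror image of a walk that turns towards +y
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # site already occupied: not self-avoiding
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    yield from extend([(0, 0), (1, 0)], False)


def energy(seq, conf):
    """Return -1 for every H-H lattice contact between non-bonded beads."""
    site = {pos: i for i, pos in enumerate(conf)}
    e = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = site.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                e -= 1
    return e


def is_folder(seq, confs):
    """True if seq has a single lowest-energy conformation with energy < 0."""
    energies = [energy(seq, c) for c in confs]
    e_min = min(energies)
    return e_min < 0 and energies.count(e_min) == 1


def folding_fraction(n):
    """Fraction of all 2**n HP sequences of length n that are folders."""
    confs = list(conformations(n))
    seqs = ["".join(s) for s in product("HP", repeat=n)]
    return sum(is_folder(s, confs) for s in seqs) / len(seqs)
```

For the smallest non-trivial case, `folding_fraction(4)` evaluates to 0.25: a 4-bead chain has five distinct conformations, only one of which brings beads 1 and 4 into contact, so exactly the four sequences with H at both ends fold uniquely. Longer chains can be explored the same way, at a cost that grows exponentially with length.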
Outlook: from protein folding to evolution
We have posited that life could not have originated without a Darwin-like propagation mechanism already in place. We believe function came before information, because we know of no driving force for the reverse. Rather than genes using proteins to make new genes (as in the Selfish Gene hypothesis; Dawkins, 1976), our view is that proteins use genes to make new proteins. Moreover, the foldcat mechanism indicates a way in which the middleman, the gene, was simply not needed at first. This mechanism is based on solution physics: the oil–water and hydrogen-bonding forces of protein folding, the ability of miniature solids having different chemical moieties to catalyse reactions, and the ability of random syntheses to find and retain useful sequences based on their persistence. The foldcat mechanism addresses an important problem of origins research: it does not require the guiding hand of a researcher who chooses molecules, systems or processes. Instead, the foldcat mechanism is a disorder-to-order transition that bootstraps the functional advantages that it finds by random search.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2023.2.
Supplementary materials
To view supplementary material for this article, please visit http://doi.org/10.1017/qrd.2023.2.
Acknowledgements
We thank Luca Agozzino and Gabor Balazsi for early discussions and Charlie Carter for extensive insightful comments. We are grateful to the Templeton Foundation and the Laufer Center for their support.
Author contribution
All authors contributed equally to this work.
Financial support
This work was supported by the John Templeton Foundation (grant ID 62564).
Competing interest
The authors declare none.
Comments
Comments to Author: The authors present their work “Origins of life: first came evolutionary dynamics”, in which they describe a Darwinian Evolution Machine (DEM) that explains the evolution of a population where mutations are possible. These mutations in turn create differential populations that compete for resources, leading the system to a new equilibrium in which one winner population becomes dominant. However, it is important to note that their DEM takes into account the possibility of peaceful coexistence between different populations instead of a winner-takes-all model. Peaceful coexistence would provide evolution with the robustness needed to be possible; otherwise extinction would most likely have been the rule. In addition, the DEM is not a closed system. Instead, the DEM self-sustains through the uptake of external resources.
Given the previous definitions, the authors go on to describe which types of molecules would have had the necessary properties to be the protagonists in such a DEM model.
I think that the introduction is very well structured, as are the initial explanations that lead the reader to an understanding of the properties that the DEM's first makers need to have.
I think that the logical steps explaining how certain molecules become makers, and how these in turn develop sequence-to-function relationships, are really appealing. It is also very interesting that, in this hypothesis, the important property for the initial emergence of makers is not RNA and its ability to replicate, but rather the ability of autocatalytic molecules to propagate dynamically. Furthermore, the way in which the authors link the previously introduced terms and concepts with funneled energy landscape theory is very nice.
To be honest, I have liked the paper a lot, and since it is more of a hypothesis that builds on several of the authors’ previous publications, I do not have much to correct.
I do have some questions though.
The authors say that the evolutionary landscape that led to the initial pre-protein molecules was somehow funneled and not a golf course. What I understand from this analogy is that the landscape, because of the biophysical properties of the early makers, was already funneled, and that success was therefore nearly inevitable (the authors say in the Fig. 2 legend that they believe evolution is more funnel-like). If this is true, do you think that the emergence of protein-like systems on planets like Earth is almost certain, given that this would already set up the conditions to funnel evolutionary landscapes? What would the consequences of this theoretical framework be for the likely development of life, or at least of early makers, outside of Earth?
The authors provide an explanation of the emergence of foldamers and foldcats based on a hydrophobic and polar set of amino acids. Can the authors hypothesize or reflect on what kind of alphabet would be needed and compatible with the emergence of foldamers and foldcats? Do they have a preference for any study that has hypothesized about the size of the primitive alphabet when proteins emerged?
Minor comments
There is a mention of Fig. 3, which has panels a and b, but there is no mention of the individual panels. Panel b is difficult to understand. I found that a similar figure is reported in Guseva et al., PNAS 2017. There the figure is better explained (there is also another panel) and the colors are meaningful. I would improve that figure panel to make it easier to understand. I would also place the y-axis label next to the y-axis instead of in the corner. I was confused for a bit about whether it was a title or an axis label.
In Fig. 4, there is no mention of the non-folders in the legend, as there is for the others.
Both Figs 4 and 5 would benefit from saying which color is the H and which one is the P type of amino acid. I know it is a conceptual figure, but I still tried to make sense of it mentally, and it was hard for a while.
Regarding Fig. 5 and the explanation from lines 186 to 193: I did not fully understand it. I more or less understand the mechanism, but some details escape me. In particular, I don’t understand how “These landing pads on foldcats could catalyze the COVALENT elongation of other client HP sequences.” Maybe a better description in the conceptual schema from Fig. 5 would help, or a modification of Fig. 5 to make it clearer.
In Fig. 6, there is no explanation of the labels in panel b (Agr, Au, Af…, etc.). What are they?
This is just a suggestion that, in my opinion, would close the paper in a nicer way. I am missing a connection between the title and the last part of the OUTLOOK section. The title states that “First Came evolutionary dynamics”. Although I understand what the authors refer to throughout the manuscript, a final punch going back to that concept in a more explicit way would be a really nice way to close the paper.
There is a repeated “the” at line 117 in page 4.
Fig. 8 appears in the supplementary material. Maybe it should be named Fig. S1?
The rest of the paper after Fig. 5 and its reference reads very smoothly and is nicely structured, making its interpretation really straightforward.