Hostname: page-component-586b7cd67f-tf8b9 Total loading time: 0 Render date: 2024-11-21T23:54:04.192Z Has data issue: false hasContentIssue false

Resolving the fine structure in the energy landscapes of repeat proteins

Published online by Cambridge University Press:  10 June 2022

Murilo N. Sanches
Affiliation:
Department of Physics, Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil
R. Gonzalo Parra
Affiliation:
Barcelona Supercomputing Center (BSC), Barcelona, Spain
Rafael G. Viegas
Affiliation:
Department of Physics, Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil Federal Institute of Education, Science and Technology of São Paulo (IFSP), Catanduva, SP 15.808-305, Brazil
Antonio B. Oliveira Jr.
Affiliation:
Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
Peter G. Wolynes
Affiliation:
Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
Diego U. Ferreiro*
Affiliation:
Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, C1428EGA Buenos Aires, Argentina
Vitor B.P. Leite*
Affiliation:
Department of Physics, Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil
*
*Author for correspondence: Diego U. Ferreiro, E-mail: [email protected] *Vitor B.P. Leite, E-mail: [email protected]
*Author for correspondence: Diego U. Ferreiro, E-mail: [email protected] *Vitor B.P. Leite, E-mail: [email protected]
Rights & Permissions [Opens in a new window]

Abstract

Ankyrin (ANK) repeat proteins are coded by tandem occurrences of patterns with around 33 amino acids. They often mediate protein–protein interactions in a diversity of biological systems. These proteins have an elongated non-globular shape and often display complex folding mechanisms. This work investigates the energy landscape of representative proteins of this class made up of 3, 4 and 6 ANK repeats using the energy-landscape visualisation method (ELViM). By combining biased and unbiased coarse-grained molecular dynamics AWSEM simulations that sample conformations along the folding trajectories with the ELViM structure-based phase space, one finds a three-dimensional representation of the globally funnelled energy surface. In this representation, it is possible to delineate distinct folding pathways. We show that ELViMs can project, in a natural way, the intricacies of the highly dimensional energy landscapes encoded by the highly symmetric ankyrin repeat proteins into useful low-dimensional representations. These projections can discriminate between multiplicities of specific parallel folding mechanisms that otherwise can be hidden in oversimplified depictions.

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Introduction

The structural domains that are visible in X-ray crystal structures of proteins are often thought of as modules that can fold, function and evolve independently. Nevertheless, large proteins made up of tandem repetitions of apparently modular structure do not fold by independently organizing those modules, but rather the modules cooperate in stabilizing structural intermediates that comprise several repeat units (Paladin et al., Reference Paladin, Bevilacqua, Errigo, Piovesan, Mičetić, Necci, Monzon, Fabre, Lopez, Nilsson, Rios, Menna, Cabrera, Buitron, Kulik, Fernandez-Alberti, Fornasari, Parisi, Lagares, Hirsh, Andrade-Navarro, Kajava and Tosatto2020). Repeat proteins can be classified into many different categories, based on the length of their repeating units, the type of secondary structure elements of which they are composed and their overall architecture. A specific class of repeat proteins, the so-called solenoids, is constructed from 20 to 40 similar amino acid stretches that fold up into elongated architectures of stacked repeating structural motifs. In Fig. 1, we show three examples of alpha solenoids, from the ankyrin (ANK) repeat protein family. Proteins in this family are made up of a variable number of repetitions of a 33-residue-length structural motif (Parra et al., Reference Parra, Espada, Verstraete and Ferreiro2015). For these proteins, exactly how to separate the structure into ‘domains’ is not obvious in a mechanistically correct way (Parra et al., Reference Parra, Espada, Sánchez, Sippl and Ferreiro2013; Espada et al., Reference Espada, Parra, Sippl, Mora, Walczak and Ferreiro2015). A coarse representation of ankyrin repeat proteins as quasi-1D objects has, however, yielded surprisingly rich insights into their folding dynamics (Petersen and Barrick, Reference Petersen and Barrick2021). The one-dimensional representation of the stabilization mechanism of this class of repeat proteins arises because they are stabilized only by interactions within each repeat and between neighbouring repeats, there being no obvious contacts between residues much more distant in sequence. While ankyrin repeat proteins can be pictured as elongated objects that can be broken down to repeat units, the precise mechanism of the folding of the array reflects subtle balances and imbalances between the energetics within the repeats and the interaction between repeats (Ferreiro et al., Reference Ferreiro, Walczak, Komives and Wolynes2008). For many natural repeat proteins, it has been shown that weakening the energetic links between repeats leads to the breakdown of cooperativity and the appearance of folding subdomains within an apparently regular repeat array (Aksel et al., Reference Aksel, Majumdar and Barrick2011). In general, the folding mechanisms are defined by an initial nucleation in some region of the repeating array and the propagation of structure to their near neighbours. When the local energetics are similar along the assemblage, parallel folding routes can be identified (Werbeck and Itzhaki, Reference Werbeck and Itzhaki2007; Aksel and Barrick, Reference Aksel and Barrick2014) and the routes can be switched by (de)stabilising regions along the array (Tripp and Barrick, Reference Tripp and Barrick2008; Werbeck et al., Reference Werbeck, Rowling, Chellamuthu and Itzhaki2008). Thus, the energy landscapes of repeat proteins can be very rich and amenable to design (Galpern et al., Reference Galpern, Marchi, Mora, Walczak and Ferreiro2022). Most importantly, in various cases, the detailed folding mechanism of repeat arrays has been identified to play a major role in their biological function (Löw et al., Reference Löw, Weininger, Neumann, Klepsch, Lilie, Stubbs and Balbach2008; Barrick, Reference Barrick2009; Ferreiro and Komives, Reference Ferreiro and Komives2010; Kumar and Balbach, Reference Kumar and Balbach2021). How can we picture the energy landscapes of these systems?

Fig. 1. Representation of the tertiary structures of Ankyrin repeat proteins. The high-resolution structures of the proteins studied are shown in ribbons coloured in blue to red from N to C termini. 3ANK corresponds to PDB 1N0QA, 4ANK with PDB 1N0R and 6ANK with PDB 1NFIE.

The notion of protein folding Energy Landscapes has illuminated many of the mysteries of how proteins fold. Based on solid theoretical arguments, the synergy of experiments and simulations has shown that the folding of globular proteins occurs on landscapes that can be described as rough funnels (Wolynes et al., Reference Wolynes, Eaton and Fersht2012), where both the solvent averaged free energy of configurations and their entropy decrease nearly in parallel as structures ever more closely approximate the native state. This correlation arises from the ‘Principle of Minimal Frustration’ (Bryngelson and Wolynes, Reference Bryngelson and Wolynes1987), which argues that the native interactions of evolved proteins are on average stronger than competing possible non-native interactions. The fact that simple structure-based models of protein folding can recapitulate even detailed features of protein folding mechanisms, such as φ-values, further supports this view (Clementi, Reference Clementi2008). Still, both topological frustration and energetic frustration that accompany functional constraints, such as binding and catalysis, play roles in the folding mechanism of several globular protein systems (Ferreiro et al., Reference Ferreiro, Komives and Wolynes2018).

It seems the energy landscapes of repeat proteins follow the funnelling criterion, both overall (Mello and Barrick, Reference Mello and Barrick2004; Ferreiro et al., Reference Ferreiro, Walczak, Komives and Wolynes2008) and in the folding of consecutive units and individual units, thus suggesting a landscape of funnels within funnels (Ferreiro and Komives, Reference Ferreiro and Komives2007). The fact that repeat proteins can be treated as quasi-one-dimensional objects, however, weakens the necessity for a deeply funnelled landscape, as conflicting interactions that may arise from the frustration of interactions far distant in sequence are not as predominant as in globular domains. Simple structure-based models of repeat-protein folding have been shown to predict folding behaviour consistent with the overall behaviour of repeat arrays seen in the laboratory (Ferreiro et al., Reference Ferreiro, Cho, Komives and Wolynes2005; Barrick et al., Reference Barrick, Ferreiro and Komives2008). More complex models, such as all-atom simulations, have been applied to study the folding of ankyrin repeat proteins under force (Serquera et al., Reference Serquera, Lee, Settanni, Marszalek, Paci and Itzhaki2010). In these, parallel routes, intermediates, and partial folding of repeats have been described and characterised, and these may be stabilised by non-native interactions. Moreover, high-temperature unfolding of designed ankyrin repeat proteins has been analysed and used to redesign the terminal repeats (Interlandi et al., Reference Interlandi, Wetzel, Settanni, Plückthun and Caflisch2008). Since we want to visualise a large fraction of the energy landscape, we use here the coarse-grained AWSEM model (Davtyan et al., Reference Davtyan, Schafer, Zheng, Clementi, Wolynes and Papoian2012), and we make use of a novel energy-landscape visualisation method (ELViM; Oliveira et al., Reference Oliveira, Yang, Whitford and Leite2019) to analyse the folding of repeat arrays in the ankyrin family.

Given a dataset of sampled conformations classified using an appropriate metric, ELViM first calculates a matrix comprised of the internal distances between every pair of sampled conformations. Ideally, these distances should correlate with energetic changes on the landscape. This matrix, which represents the dataset in the high-dimensional phase space, is then projected onto an effective 2D or 3D phase space, which preserves in an optimal way these distances. Since folding is most facile between similar structures, ELViM provides a meaningful visualisation of the folding mechanism. In this article, we apply ELViM to explore the energy landscapes of three Ankyrin Repeat Proteins having each 3, 4 and 6 repeats (Fig. 1) and document ELViM’s ability to capture the dynamic energetic behaviour of the different repeats and the interactions among them as the protein navigates its conformational space. ELViM by faithfully visualising the vast information in the content of molecular simulations offers a powerful tool to study repeat-protein folding mechanisms.

Methods

Simulations details

We performed coarse-grained molecular dynamics simulations using the AWSEM-MD suite (Davtyan et al., Reference Davtyan, Schafer, Zheng, Clementi, Wolynes and Papoian2012). We used a structure-based model called AMH-Go that includes a nonadditivity term that allows for a more realistic simulation of the cooperativity among native interactions (Eastwood and Wolynes, Reference Eastwood and Wolynes2001). By virtue of being a structure-based model, this landscape is perfectly funnelled. The measure of similarity between structures k and l used here is based on internal distances between amino acids, and it is given by

(1)$$ {q}_{k,l}\hskip0.35em =\hskip0.35em \frac{1}{N_p}{\sum}_{i,j\in \mathrm{pairs}}\exp \left[\frac{-({r}_{i,j}^k-{r}_{i,j}^l{)}^2}{2{\sigma}_{i,j}^2}\right], $$

where Np is the total number of pairs of residues, $ {r}_{i,j}^k $ ($ {r}_{i,j}^l $) is the distance between the residues i and j in the conformation k(l) and σ i,j is the Gaussian standard deviation and it accounts for the increasing variances when one considers residues far from each other along the primary sequence. It is defined as σ i,j = σ 0|i − j|ε, with σ 0 = 1 Å and ε = 0.15, as used in previous studies (Lätzer et al., Reference Lätzer, Papoian, Prentiss, Komives and Wolynes2007). qk,l is normalised and unitless. The energy landscape will locally correlate with this relative q distance measure. The particular similarity between a structure k and the native state n of the entire protein, qk,n, can be used as an approximate reaction coordinate, and it is referred to as the global reaction coordinate or Qw. The sum in Eq. (1) when carried out over a small subset of residues, such as a single repeat, defines a local coordinate $ {Q}_W^{\mathrm{Rn}} $, and when the sum is associated with only a single residue i, $ {Q}_W^i $. Another coordinate that is well established and often used as a reaction coordinate is the fraction of native contacts, Qo (Best et al., Reference Best, Hummer and Eaton2013), defined as

(2)$$ {Q}_o(X)\hskip0.35em =\hskip0.35em \frac{1}{N_s}{\sum}_{i,j\in S}\frac{1}{1+\exp \left[{\beta}^0\left({r}_{ij}(X)-\lambda {r}_{ij}^0\right)\right]}, $$

where rij(X) is the distance between the residues (i, j) in a conformation X, $ {r}_{ij}^0 $ is the distance between the amino acid in the corresponding pair in the native state, S is the set of all pairs of native contacts (i, j) from the native structure, NS is the number of pairs in S, β 0 is a smoothing parameter and λ is a factor that takes into account the fluctuations of the contacts. In this work, we consider any pair of atoms as being in contact when the partners are more than three residues apart along the chain and if $ {r}_{ij}^0 $ < 1.2 nm. The parameters in Eq. (2) were taken to be β 0 = 50 nm−1 and λ = 1.2 nm, as suggested for AWSEM model (Habibi et al., Reference Habibi, Rottler and Plotkin2016).

To estimate the folding temperature TF, we performed melting simulations heating the systems from a low temperature (300 K) to sufficiently high temperatures to ensure unfolding (800 K). The temperature variations were made along 10 million steps, with a time step of 3 fs. During this process, due to the cooperativity among native interactions, a sharp transition occurs in the reaction coordinate QW as a function of temperature, where the system goes from largely folded to largely unfolded states. TF is estimated to be near the midpoint of the transition between these states.

We then performed constant temperature simulations at the estimated TF using the Umbrella Sampling method (Torrie and Valleau, Reference Torrie and Valleau1977), and using QW as the global reaction coordinate. In this method, the reaction coordinate range is divided in a number of consecutive sampling windows, which are centred at different values of the reaction coordinate. In each window, a biasing potential is added to maintain the protein inside the window, enhancing the sampling near specific values of the reaction coordinate. By using this method, it is possible to sample at low Qw values, which would be unreachable by standard (unbiased) simulations. In this work, we divided the Qw sampling interval in 40 equally spaced windows and explored the conformational space with simulations of 10 million steps in each of the windows with a time step of 3 fs. For an intuitive understanding of its meaning, Qw = 0.25 corresponds to already random looking structures, whereas Qw = 0.7 corresponds to structures that typically are just a few angstroms RMSD from the native structure (Schafer et al., Reference Schafer, Kim, Zheng and Wolynes2014). Finally, the 40 simulations are integrated using the weighted histogram analysis method (Kumar et al., Reference Kumar, Rosenberg, Bouzida, Swendsen and Kollman1992) to obtain the thermodynamic parameters as the Free-energy values, projected in different coordinates like Qw, Qo and radius of gyration (Rg).

To investigate the most preferred folding routes, we also performed unbiased simulations that were carried out using the AWSEM-MD (Davtyan et al., Reference Davtyan, Schafer, Zheng, Clementi, Wolynes and Papoian2012), with a time step of 3 fs at the TF. For the 3ANK and 4ANK, three replicas of the dynamics starting from the folded structure were performed over 3 × 109 steps.

Energy-landscape visualisation method

ELViM uses qk,l to describe the distance between all pairs of conformations in the real multidimensional phase space, which is given by δk,l = 1 − qk,l (Oliveira et al., Reference Oliveira, Yang, Whitford and Leite2019). Starting with an ensemble of structures obtained from the simulations, ELViM applies this metric for each pair of conformations, in order to build a dissimilarity matrix $ \overline{\overline{\unicode{x03B4}}} $. Next, these data may also be processed so that sufficiently similar conformations (classified by the metric) are clustered into a single representative conformation. The goal in this step is to decrease the number of conformations to be visualised. Finally, a multidimensional projection is obtained that represents each conformation in a reduced 2D effective phase space, obtaining distances given by a new distance matrix dk,l. The method aims to minimise |δk,l − dk,l|, for all pairs of conformations k, l. In other words, the computation of the optimal projection consists of a minimisation procedure in which all the distances between clusters in the real multidimensional phase space δk,l are made to correspond to equal distances dk,l in a 2D phase space. This type of method is generally known as a multidimensional scaling method (Cox and Cox, Reference Cox and Cox2000; France and Carroll, Reference France and Carroll2011). In this analysis, it is possible to visualise and identify the routes and transition states between free-energy barriers, mapping the trajectories without the need for a reference conformation or reaction coordinates.

Results and discussion

3ANK: The simplest and well-behaved funnel

The results for the 3ANK (PDB ID = 1n0q) protein, as shown in Fig. 2a, are consistent with a two-state folding process, with a free-energy barrier of 19.3 kT at Qo ≈ 0.35, with the folded state at Qo ≈ 0.85. In order to investigate the folding mechanism, we have analysed how the value of Qo for each residue evolves as a function of the reaction coordinate, the global Qo (Fig. 2b). In this figure, the ordinate is the global Qo, whereas the colourmap shows the local Qo, which is calculated by averaging the local Qo for each residue over the entire set of conformations with a given global Qo. It is possible to see that the native interactions are not symmetrically distributed. Native contacts begin to be made by some residues in repeats R1 and R2 as the Qo approaches the transition state value of around 0.3. As the global Qo increases, the repeat R3 starts to fold at Qo ≈ 0.6. Fig. 2a,b shows that the protein undergoes an apparent two-state transition, with the nucleation starting in some regions of R2 and in parts of the N-terminal region. It should be noted, however, that the local Qo of some residues in R1 slightly decreases when the global Qo varies over the range of 0.6–0.8. Different mechanisms could be compatible with these observations, for example, multiple, parallel pathways, backtracking or kinetic traps. From the free-energy profile with respect to Qo alone, however, it is not possible to discriminate among these mechanisms.

Fig. 2. Analysis of the biased trajectory of 3ANK. (a) Two-dimensional free-energy profile as a function of Qo for 3ANK. (b) Degree of folding of each residue as a function of Qo. The colour indicates the average local Qo for a specific residue over the entire set of structures of a given global Qo for 3ANK. (c) Conformational phase space of 3ANK visualised through energy-landscape visualisation method (ELViM) as a function of Qo. (d) Three-dimensional ELViM projection of the 3ANK biased trajectories in function of the dynamic total energy with the unbiased trajectory structures highlighted in black and the Z-axis corresponding to the Qo coordinate.

The energy landscape of this protein analysed using ELViM results in an effective 2D phase space, as shown in Fig. 2c. Each point in this projection corresponds to a sampled conformation and is coloured according to its Qo value (see also Supplementary Fig. 1). It can be seen that the unfolded conformations populate the left upper region of the projection (dark blue), whereas the folded basin is made up of configurations clustered in the red right central portion of the projection (region I). According to the free-energy profile in Fig. 2a, transition states have a Qo value of about 0.3 and are represented by light blue dots. These transition conformations now divide up into two distinct paths, linking the folded and unfolded basins by the upper (region III) or alternatively through the lower part (region II) of the projection. We can see that there is a greater density of states in the lower part of the projection (region II), suggesting this is the more favourable folding route. For the purpose of discriminating these regions, we calculated mean contact maps for selected conformations that belong to each of these regions (available in Supplementary Figs 2 and 3). These contact maps show that region III is made up of structures in which repeats 2 and 3 are partially folded, whereas region II is composed of structures in which repeats 1 and 2 are partially folded. This analysis also reveals that these transition states have nearly the same global Qo in spite of being quite structurally different (Fig. 2c and Supplementary Fig. 3).

Next, we examined whether region II is indeed the kinetically more favourable folding pathway by running a set of unbiased simulations with AWSEM-MD and reconstructing the 2D projection containing structures from both the biased and the unbiased simulations (available in Supplementary Fig. 1c). Although the biased simulation shows a broad exploration of the structural phase space, the unbiased sampling shows a clear kinetically preferred pathway towards the native state using region II.

In addition, we have plotted a 3D representation of this effective phase space by adding a third axis, corresponding to the global Qo, to the ELViM projection. In Fig. 2d, the colours of the dots now correspond to the total energy of each conformation, with the structures corresponding to the unbiased simulation highlighted in black, and the native state at the global minimum. This 3D projection has a funnel-like shape, and it is shown in three different views. It is possible to distinguish the main route leading to the native state, which is composed by structures of unbiased simulation and those from region II (Fig. 2c) clustered in its midst.

It is important to note that the conformations from region III (Fig. 2c) were not sampled in the unbiased simulation, suggesting that they are high-energy transition states. This result is consistent with the discussion of Cho et al. (Reference Cho, Levy and Wolynes2006). The fact that conformations from these two parallel routes have similar global Qo but may nucleate starting either from the C- or N-terminal leads to the appearance of backtracking that was observed when ordering structures by its global Qo alone, as shown in Fig. 2b. Thus, the ELViM demonstrates that relying only on one-dimensional analysis is not always sufficient to distinguish between these two parallel sets of routes.

Lastly, we also note that there are multiple stripes of points that emerge from the global minimum, which correspond to the ‘fraying’ of terminal parts of repeats from the fully folded array, as previously reported by Cortajarena et al. (Reference Cortajarena, Mochrie and Regan2008).

4ANK: Preferential path ensembles

4ANK folds in a similar way as 3ANK folds, with an apparent free-energy barrier of 19.5 kT for Qo ≈ 0.4, with the folded state located at Qo ≈ 0.8 (Fig. 3a). Analysing the local Qo as a function of global Qo (Fig. 3b), we can see that the nucleation occurs also in an asymmetric fashion, starting near the C-terminal repeats and propagating to the N-terminal repeats in a cooperative way. At the transition state (Qo ≈ 0.3), R3 and R4 are already folded. By the time some residues in R2 have folded (Qo ≈ 0.4), R4 and some residues of R3 appear to undergo an unfolding process that will be reversed again for Qo > 0.6. This is very similar to what was seen for R1 in 3ANK and once more is explained by the analysis of the contact map (Supplementary Fig. 4) as an overlap of distinct families of structures with the same global Qo. The 2D ELViM projection for 4ANK (Fig. 3c) resembles the ELViM projection for 3ANK – Qo varies smoothly from the outer left unfolded states to the central right folded basin. An important difference is that the intermediate states (shown in light blue with Qo ≈ 0.5) do not seem to connect the folded/unfolded region in dense pathways. Instead, they form almost two separate blocks: (I) one block that is densely populated (above), and (II) another block that is densely populated below the folded basin. The mean contact maps (available in Supplementary Figs 5 and 6) indicate that region I is populated by structures which have their N-terminal region partially folded, whereas region II is populated by structures which have their C-terminal region partially folded. Thus, the ELViM energy landscape again suggests two competing parallel folding pathways. Few structures seem to join the unfolded and folded basins through the pathway I, suggesting that it represents high-energy transition structures. To test this idea, we ran a set of unbiased simulations with the AWSEM-MD and made a new 2D projection that now contains all biased and unbiased conformations (available in Supplementary Fig. 4). This new projection shows that all unbiased folding trajectories proceeded by region II, demonstrating the existence of a favourable folding pathway.

Fig. 3. Analysis of the biased trajectory of 4ANK. (a) Two-dimensional free-energy profiles as a function of Qo for 4ANK. (b) Degree of folding of each residue as a function of Qo. The colour indicates the average local Qo for a specific residue over the entire set of structures of a given global Qo for 4ANK. (c) Energy-landscape visualisation method (ELViM) 2D projection of the conformational phase space of 4ANK as a function of the Qo coordinate; the transition regions I and II are indicated. (d) Three-dimensional ELViM projection of the 4ANK biased trajectories in function of the dynamic total energy with the unbiased trajectory structures highlighted in black and the Z-axis corresponding to the Qo coordinate.

This pathway is better seen in the 3D energy surface projection shown from three different points of view in Fig. 3d, with the unbiased sampling represented in black. In this representation, the cluster of intermediate energy is clearly seen almost disconnected from the surface’s minimum (I). To determine whether this cluster corresponds to structures that are also capable of folding, we ran several dynamic runs at different temperatures that started from structures from region I and from region II. The results show that the conformations from region II were capable of reaching the native state 76% of the time, whereas those started from region I succeeded only 37% of the time. This indicates that although the folding can occur through either region, structures in region II are more prone to reach the folded state, defining the pathway highlighted by the unbiased trajectory as being preferred over the route with region I. Supplementary Fig. 7 shows the fraction of folding events for both regions at the different temperatures. The stripes emerging from the native basin correspond to fraying of the terminal repeats, as also seen for 3ANK.

When we try to find similarities between 3ANK and 4ANK, we observe that both indicate two possible very distinctly folding modes. We note that the most effective folding mode for them is the same, through the initial folding of R2. From Fig. 3b, one may have the impression that for 4ANK, R3 is the first region to fold. However, when we see the details of ELViM projection for Qo < 0.5, the folded R3 conformations occur in region I, which is in the least favourable path (see Supplementary Fig. 12). Therefore, the most likely folding path is the same, through the initial folding of R2.

6ANK (IκBα): The most complex with multiple folding paths

Finally, we analysed the folding of a larger array, corresponding to the IκBα protein, which consists of six ankyrin repeats. In contrast to 3ANK and 4ANK which are synthetic constructs made of identical repeats yielding highly symmetric structures, IκBα is a naturally occurring protein with a very complex energy landscape because of its larger repeat array, but also because of the presence of insertions and deletions within and between some repeats, and thus the energy distribution along the array is not symmetric. Due to 6ANK complexity and asymmetries, it presents a very distinct energy landscape when compared with the other two simpler constructs.

The free-energy profile (Fig. 4a) presents a much lower barrier at Qo ≈ 0.6 that separates the folded state at Qo ≈ 0.8 from an intermediary state at Qo ≈ 0.35, with this state corresponding to two repeats folded (R2 and R3) and four unfolded. Once again, the analysis of local Qo as a function of global Qo (Fig. 4b) shows that an apparent backtracking occurs in the folding of region R1 between Qo 0.6 and 0.8, that is explained by the analysis of the mean contact maps present in the support information (Supplementary Figs 8 and 9), revealing once again that the apparent backtracking is perhaps caused by there being different folding routes having structures with similar global Qo.

Fig. 4. Analysis of the biased trajectory of 6ANK. (a) Two-dimensional free-energy profiles as a function of Qo for 6ANK. (b) Degree of folding of each residue as a function of Qo. The colour indicates the average local Qo for a specific residue over the entire set of structures of a given global Qo for 6ANK. (c) Conformational phase space of 6ANK visualised through energy-landscape visualisation method (ELViM) as a function of Qo. (d) Three-dimensional ELViM projection of the 6ANK biased trajectories in function of the dynamic total energy, with the Z-axis corresponding to the Qo coordinate.

Fig. 4c shows the complex landscape that drives this folding, with multiple paths with intermediate clusters leading to the folded state. It is possible to see four different regions based on the coordinate Qo: (1) the dark blue (between 0 and 0.2) at the bottom, corresponding to the unfolded states; (2) the light blue (between 0.3 and 0.4) at the left, corresponding to the intermediate states; (3) the light green (around 0.5) at the upper middle, corresponding to the transition states and (4) the orange to red (0.7–1.0) at the right, corresponding to the peak of the free-energy barrier to the totally folded conformation, showing the process which the protein goes through. The complexity of the energy landscape is better illustrated by the 3D projection in Fig. 4d, which shows that the high Qo region is almost detached from the rest of the funnel, with few paths connecting them.

Notably, even though this protein appears highly symmetrical, the folding routes are not equally populated, revealing an asymmetry of folding. It should be noticed that the IκBα is found to be only partially folded in its free state, but it becomes fully folded once it interacts with NFκB (Ferreiro and Komives, Reference Ferreiro and Komives2010), its cognate binding partner. As in the previous cases, fraying of the terminal repeats can be visualised near the native basin.

The formation of each repeat with and without bias for the 3ANK, 4ANK and 6ANK through the ELViM are available in Supplementary Figs 10, 12, and 14 in the Supplementary Material, respectively, as a function of Qo and in Supplementary Figs 11, 13, and 15, respectively, as a function of Qw.

Conclusion

Although the overall funnel-like landscape of the AWSEM structure-based models is expected from the model construction, we see the occurrence of intermediates and transition state locations is ruled by the detailed topology of the proteins. Yet, even when the topology appears to be highly symmetrical, the population of folding routes turn out to be strongly asymmetrical. We found that all three systems appear to backtrack during the formation of some elements when monitored only using the average local Qo, but this effect can be understood as arising from there being different structures from parallel folding routes that have the same global Qo, as demonstrated in Supplementary Figs 11 and 13. We also note that conspicuous ‘fraying’ of terminal repeats arises from the fully folded conformation in the native basin, as it is expected, and has been reported for repeating arrays (Cortajarena et al., Reference Cortajarena, Mochrie and Regan2008).

It is likely that as repeat arrays grow longer, the cooperativity between repeats breaks down, as the energy of the finite-size interfaces must get stronger such to overcome the entropy of incorporating a broken defect (Galpern et al., Reference Galpern, Freiberger and Ferreiro2020). This is observed for the largest system we analysed, IκBα that has a rather flat region of the landscape that consists of a folded subdomain of about three consecutive repeat units, which is consistent with the structural ensemble proposed for this protein in the free unbound state and that is strongly related to its functional mechanism (Truhlar et al., Reference Truhlar, Mathes, Cervantes, Ghosh and Komives2008).

Supplementary Materials

To view supplementary material for this article, please visit http://doi.org/10.1017/qrd.2022.4.

Acknowledgements

M.N.S. was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq; Grant 130147/2020-6). This research was supported by the Center for Theoretical Biological Physics sponsored by the NSF (Grant PHY-2019745). A.B.O. acknowledges the Robert A. Welch Postdoctoral Fellow program. P.G.W. is also supported by the D.R. Bullard-Welch Chair at Rice University (Grant C-0016). V.B.P.L. was supported by CNPq (Grant 310017/2020-3) and FAPESP (Grants 2019/22540-3 and 2018/18668-1). D.U.F. is a CONICET researcher and is supported by Grant PICT2016/1467, UBACYT, and NASA Astrobiology Institute-Enigma (Grant 80NSSC18M0093).

Footnotes

M.N.S. and R.G.P. equally contributed to this work.

References

Aksel, T and Barrick, D (2014) Direct observation of parallel folding pathways revealed using a symmetric repeat protein system. Biophysical Journal 107, 220232.CrossRefGoogle ScholarPubMed
Aksel, T, Majumdar, A and Barrick, D (2011) The contribution of entropy, enthalpy, and hydrophobic desolvation to cooperativity in repeat-protein folding. Structure 19, 349360.CrossRefGoogle ScholarPubMed
Barrick, D (2009) Biological regulation via ankyrin repeat folding. ACS Chemical Biology 4, 1922.CrossRefGoogle ScholarPubMed
Barrick, D, Ferreiro, DU and Komives, EA (2008) Folding landscapes of ankyrin repeat proteins: experiments meet theory. Current Opinion in Structural Biology 18, 2734.CrossRefGoogle ScholarPubMed
Best, RB, Hummer, G and Eaton, WA (2013) Native contacts determine protein folding mechanisms in atomistic simulations. Proceedings of the National Academy of Sciences 110, 1787417879.CrossRefGoogle ScholarPubMed
Bryngelson, JD and Wolynes, PG (1987) Spin glasses and the statistical mechanics of protein folding. Proceedings of the National Academy of Sciences 84, 75247528.CrossRefGoogle ScholarPubMed
Cho, SS, Levy, Y and Wolynes, PG (2006) P versus Q: structural reaction coordinates capture protein folding on smooth landscapes. Proceedings of the National Academy of Sciences 103, 586591.CrossRefGoogle ScholarPubMed
Clementi, C (2008) Coarse-grained models of protein folding: toy models or predictive tools? Current Opinion in Structural Biology 18, 1015.CrossRefGoogle ScholarPubMed
Cortajarena, AL, Mochrie, SG and Regan, L (2008) Mapping the energy landscape of repeat proteins using NMR-detected hydrogen exchange. Journal of Molecular Biology 379, 617626.CrossRefGoogle ScholarPubMed
Cox, TF and Cox, MAA (2000) Multidimensional Scaling. London: Taylor & Francis.CrossRefGoogle Scholar
Davtyan, A, Schafer, NP, Zheng, W, Clementi, C, Wolynes, PG and Papoian, GA (2012) AWSEM-MD: protein structure prediction using coarse grained physical potentials and bioinformatically based local structure biasing. The Journal of Physical Chemistry B 116, 84948503.CrossRefGoogle ScholarPubMed
Eastwood, MP and Wolynes, PG (2001) Role of explicitly cooperative interactions in protein folding funnels: a simulation study. The Journal of Chemical Physics 114, 47024716.CrossRefGoogle Scholar
Espada, R, Parra, RG, Sippl, MJ, Mora, T, Walczak, AM and Ferreiro, DU (2015) Repeat proteins challenge the concept of structural domains. Biochemical Society Transactions 43, 844849.CrossRefGoogle ScholarPubMed
Ferreiro, DU, Cho, SS, Komives, EA and Wolynes, PG (2005) The energy landscape of modular repeat proteins: topology determines folding mechanism in the ankyrin family. Journal of Molecular Biology 354, 679692.CrossRefGoogle ScholarPubMed
Ferreiro, DU and Komives, EA (2007) The plastic landscape of repeat proteins. Proceedings of the National Academy of Sciences 104, 77357736.CrossRefGoogle ScholarPubMed
Ferreiro, DU and Komives, EA (2010) Molecular mechanisms of system control of NF-κB signaling by IκBα. Biochemistry 49, 15601567.CrossRefGoogle ScholarPubMed
Ferreiro, DU, Komives, EA and Wolynes, PG (2018) Frustration, function and folding. Current Opinion in Structural Biology 48, 6873.CrossRefGoogle ScholarPubMed
Ferreiro, DU, Walczak, AM, Komives, EA and Wolynes, PG (2008) The energy landscapes of repeat-containing proteins: topology, cooperativity, and the folding funnels of one-dimensional architectures. PLoS Computational Biology 4, 113.CrossRefGoogle ScholarPubMed
France, SL and Carroll, JD (2011) Two-way multidimensional scaling: a review. IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews 41, 644661.CrossRefGoogle Scholar
Galpern, EA, Freiberger, MI and Ferreiro, DU (2020) Large Ankyrin repeat proteins are formed with similar and energetically favorable units. PLoS One 15, e0233865.CrossRefGoogle ScholarPubMed
Galpern, EA, Marchi, J, Mora, T, Walczak, AM and Ferreiro, DU (2022) From evolution to folding of repeat proteins. Preprint, arXiv:2202.12223.Google Scholar
Habibi, M, Rottler, J and Plotkin, SS (2016) As simple as possible, but not simpler: exploring the fidelity of coarse-grained protein models for simulated force spectroscopy. PLoS Computational Biology 12, e1005211.CrossRefGoogle Scholar
Interlandi, G, Wetzel, SK, Settanni, G, Plückthun, A and Caflisch, A (2008) Characterization and further stabilization of designed ankyrin repeat proteins by combining molecular dynamics simulations and experiments. Journal of Molecular Biology 375, 837854.CrossRefGoogle ScholarPubMed
Kumar, A and Balbach, J (2021) Folding and stability of Ankyrin repeats control biological protein function. Biomolecules 11, 840.CrossRefGoogle ScholarPubMed
Kumar, S, Rosenberg, J, Bouzida, D, Swendsen, R and Kollman, P (1992) The weighted histogram analysis method for free-energy calculations on biomolecules I. The method. Journal of Computational Chemistry 13, 10111021.CrossRefGoogle Scholar
Lätzer, J, Papoian, GA, Prentiss, MC, Komives, EA and Wolynes, PG (2007) Induced fit, folding, and recognition of the NF-κB-nuclear localization signals by IκBα and IκBβ. Journal of Molecular Biology 367, 262274.CrossRefGoogle ScholarPubMed
Löw, C, Weininger, U, Neumann, P, Klepsch, M, Lilie, H, Stubbs, MT and Balbach, J (2008) Structural insights into an equilibrium folding intermediate of an archaeal ankyrin repeat protein. Proceedings of the National Academy of Sciences 105, 37793784.CrossRefGoogle ScholarPubMed
Mello, CC and Barrick, D (2004) An experimentally determined protein folding energy landscape. Proceedings of the National Academy of Sciences 101, 1410214107.CrossRefGoogle ScholarPubMed
Oliveira, AB, Yang, H, Whitford, PC and Leite, VBP (2019) Distinguishing biomolecular pathways and metastable states. Journal of Chemical Theory and Computation 15, 64826490.CrossRefGoogle ScholarPubMed
Paladin, Lisanna, Bevilacqua, Martina, Errigo, Sara, Piovesan, Damiano, Mičetić, Ivan, Necci, Marco, Monzon, Alexander Miguel, Fabre, Maria Laura, Lopez, Jose Luis, Nilsson, Juliet F, Rios, Javier, Menna, Pablo Lorenzano, Cabrera, Maia, Buitron, Martin Gonzalez, Kulik, Mariane Goncalves, Fernandez-Alberti, Sebastian, Fornasari, Maria Silvina, Parisi, Gustavo, Lagares, Antonio, Hirsh, Layla, Andrade-Navarro, Miguel A, Kajava, Andrey V, Tosatto, Silvio C E, et al. (2020) RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures. Nucleic Acids Research 49, D452D457.CrossRefGoogle Scholar
Parra, RG, Espada, R, Sánchez, IE, Sippl, MJ and Ferreiro, DU (2013) Detecting repetitions and periodicities in proteins by tiling the structural space. The Journal of Physical Chemistry B 117, 1288712897.CrossRefGoogle ScholarPubMed
Parra, RG, Espada, R, Verstraete, N and Ferreiro, DU (2015) Structural and energetic characterization of the ankyrin repeat protein family. PLoS Computational Biology 11, e1004659.CrossRefGoogle ScholarPubMed
Petersen, M and Barrick, D (2021) Analysis of tandem repeat protein folding using nearest-neighbor models. Annual Review of Biophysics 50, 245265.CrossRefGoogle ScholarPubMed
Schafer, NP, Kim, BL, Zheng, W and Wolynes, PG (2014) Learning to fold proteins using energy landscape theory. Israel Journal of Chemistry 54, 13111337.CrossRefGoogle ScholarPubMed
Serquera, D, Lee, W, Settanni, G, Marszalek, PE, Paci, E and Itzhaki, LS (2010) Mechanical unfolding of an ankyrin repeat protein. Biophysical Journal 98, 12941301.CrossRefGoogle ScholarPubMed
Torrie, G and Valleau, J (1977) Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. Journal of Computational Physics 23, 187199.CrossRefGoogle Scholar
Tripp, KW and Barrick, D (2008) Rerouting the folding pathway of the Notch ankyrin domain by reshaping the energy landscape. Journal of the American Chemical Society 130, 56815688.CrossRefGoogle ScholarPubMed
Truhlar, SM, Mathes, E, Cervantes, CF, Ghosh, G and Komives, EA (2008) Pre-folding IκBα alters control of NF-κB signaling. Journal of Molecular Biology 380, 6782.CrossRefGoogle ScholarPubMed
Werbeck, ND and Itzhaki, LS (2007) Probing a moving target with a plastic unfolding intermediate of an ankyrin-repeat protein. Proceedings of the National Academy of Sciences 104, 78637868.CrossRefGoogle ScholarPubMed
Werbeck, ND, Rowling, PJ, Chellamuthu, VR and Itzhaki, LS (2008) Shifting transition states in the unfolding of a large ankyrin repeat protein. Proceedings of the National Academy of Sciences 105, 99829987.CrossRefGoogle ScholarPubMed
Wolynes, PG, Eaton, WA and Fersht, AR (2012) Chemical physics of protein folding. Proceedings of the National Academy of Sciences 109, 1777017771.CrossRefGoogle ScholarPubMed
Figure 0

Fig. 1. Representation of the tertiary structures of Ankyrin repeat proteins. The high-resolution structures of the proteins studied are shown in ribbons coloured in blue to red from N to C termini. 3ANK corresponds to PDB 1N0QA, 4ANK with PDB 1N0R and 6ANK with PDB 1NFIE.

Figure 1

Fig. 2. Analysis of the biased trajectory of 3ANK. (a) Two-dimensional free-energy profile as a function of Qo for 3ANK. (b) Degree of folding of each residue as a function of Qo. The colour indicates the average local Qo for a specific residue over the entire set of structures of a given global Qo for 3ANK. (c) Conformational phase space of 3ANK visualised through energy-landscape visualisation method (ELViM) as a function of Qo. (d) Three-dimensional ELViM projection of the 3ANK biased trajectories in function of the dynamic total energy with the unbiased trajectory structures highlighted in black and the Z-axis corresponding to the Qo coordinate.

Figure 2

Fig. 3. Analysis of the biased trajectory of 4ANK. (a) Two-dimensional free-energy profiles as a function of Qo for 4ANK. (b) Degree of folding of each residue as a function of Qo. The colour indicates the average local Qo for a specific residue over the entire set of structures of a given global Qo for 4ANK. (c) Energy-landscape visualisation method (ELViM) 2D projection of the conformational phase space of 4ANK as a function of the Qo coordinate; the transition regions I and II are indicated. (d) Three-dimensional ELViM projection of the 4ANK biased trajectories in function of the dynamic total energy with the unbiased trajectory structures highlighted in black and the Z-axis corresponding to the Qo coordinate.

Figure 3

Fig. 4. Analysis of the biased trajectory of 6ANK. (a) Two-dimensional free-energy profiles as a function of Qo for 6ANK. (b) Degree of folding of each residue as a function of Qo. The colour indicates the average local Qo for a specific residue over the entire set of structures of a given global Qo for 6ANK. (c) Conformational phase space of 6ANK visualised through energy-landscape visualisation method (ELViM) as a function of Qo. (d) Three-dimensional ELViM projection of the 6ANK biased trajectories in function of the dynamic total energy, with the Z-axis corresponding to the Qo coordinate.

Supplementary material: File

Sanches et al. supplementary material

Figures S1-S15

Download Sanches et al. supplementary material(File)
File 5 MB

Review: Resolving the Fine Structure in the Energy Landscapes of Repeat Proteins — R0/PR1

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: Overall Comments:

I enjoyed this paper a lot. The authors present a novel approach to the investigation of folding pathways performed on an ideal test-case protein (multiple repeating subunits along the sequence). The work is also performed without the use of machine learning, which I appreciate in today’s computational science environment! Three different systems are studied (3ANK, 4ANK and 6ANK being 3,4 and 6 subunit repeats) using the exact same simulation procedure, in principle allowing for both independent discussion of each system’s folding behaviour and comparison between each simulation suite. With their independent system analysis, their method is able to take two structural reaction coordinates and produce a multidimensional mapped representation of the folding pathways of the protein. They are able to extract free energy barriers to folding and identify multiple folding pathways. Importantly, they are able to observe which of these pathways are dominant and the reason why (in terms of their reaction coordinates). Unfortunately, the text is currently lacking in comparison between the three different systems, which is an issue given that the addition of extra subunit repeats along the protein does not just modify the folding pathways, but appears to fundamentally change them. I therefore require extra discussions (and a few other tweaks) to be included to justify publication. Nothing major and no further simulations. Revisions are given in the order they appear in the article.

Revisions:

1. Can the authors please provide explicit detail on what σ_ij is and why it is suitable as a weighting parameter in Eq.1. I appreciate that this is made clear in the references, but it is extremely important to this structural metric and has the form of a Gaussian standard deviation, so the distinction should be made explicitly clear here.

2. Can the authors please add a clarification on the umbrella sampling method for non-experts. For example "...using Qw as the global reaction coordinate to accelerate sampling at the low Qw values unreachable by standard (unbiased) simulations."

3. Each Qw value could represent a multitude of unique conformations, especially at low Qw. Can the authors please specify whether the set of conformations used represent the full conformational space available to the protein, and whether the reaction pathways / trajectories obtained are global or specific to the conformations used. (I imagine this has something to do with the ELViM multidimensional projection, but the specifics were not clear to me in the text or Reference 15).

4. On page 8 the authors claim that "Figures 2A and 2B show that the folding is highly cooperative" because different parts of the protein begin to fold at the same time. I do not believe a claim of cooperativity can be made at this stage without either explicit proof of stabilising interactions between repeats, or additional simulations of R1, R2 and R3 in isolation. I will accept, however, that cooperativity can be inferred from the folding pathways and transition states shown in the Figure 2C projection, and in the 4ANK/6ANK sections. Can the authors please relax the initial claim and instead make the claim following the discussion of transition pathways on page 9 (or explain to me in your reply why Figures 2A and 2B are sufficient).

5. In the 3ANK section, the authors find that the preferential folding pathway is when R1 and R2 fold first (in intermediate region II), implying that the N-terminal region is folded. The secondary pathway is when R2 and R3 fold first,implying that the C-terminal region is folded. In 4ANK, however, we find that the preferential intermediate region (II) is populated by partially folded C-terminal structures, and the secondary intermediate region (I) by partially folded N-terminal structures, to the extent that unbiased simulations are only observed to proceed through region II. Thus the 4ANK folding mechanism appears to be exactly opposite to that of 3ANK in terms of preferential folding, which can be seen even more clearly by comparing Figures 2B and 3B. The 6ANK system then seems to more closely resembles 3ANK than 4ANK, in that the N-terminal portion of the construct is shown to fold first in Figure 4B. Can the authors please discuss, either section by section or in the conclusions, the differences and similarities between the folding pathways of the different systems, and why they might occur.

6. In Figure 4A, we see the emergence of a second free energy barrier and thus, an intermediate (potentially stable) free energy minimum. Can the authors elucidate why this new barrier appears, and why this reduces the overall free energy barrier of 19.5kT (from the 3ANK and 4ANK systems) to just 10kT (in the 6ANK system).

7. Figure 5, the graphical abstract / contents figure, is very beautiful. I like it a lot

Minor Suggestions (just suggestions, not revisions):

1. If Figures 2A and 2B could be combined such that they share a global Qo axis that would be great. Then we could compare residue behaviour and the free energy profile directly! The same in Figures 3 and 4.

2. In Figures 2A, 2B and 2C, red implies high Qo and blue implies low Qo, whereas in Figures 2D and 2E the opposite is true, as red is used to imply high energy instead. This is visually confusing. Could you reverse the energy colour scheme? The same in Figures 3 and 4.

Typos and Grammar:

1. Pg4: in the folding of consecutive -> and in the folding of consecutive

2. Pg6: time steps -> time step

3. Pg9: In Figure 2D, the colors of the dots correspond -> In Figure 2D, the colors of the dots now correspond

4. Pg11: bellow -> below

5. Pg11: new projection show -> new projection shows

6. Pg13: The Figure 4C -> Figure 4C

7. Pg13: maybe is caused -> is perhaps caused

8. Pg15: funneldness of the -> funnel-like

Review: Resolving the Fine Structure in the Energy Landscapes of Repeat Proteins — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: This is an interesting paper. I recommend publication with the following changes:

1. I couldn’t find statements of what the three proteins were - their pdb files were listed but not the protein names. Is 6ANK IkBa (as mentioned towards the end of the paper)? The authors should also explain why they chose these particular proteins.

2. Some context is required to compare this new model with other models studied in the literature.

3. Broader referencing would be good. Many of the references are to reviews, and there are some papers on experimental studies missing - e.g. natural repeat proteins and the presence of intermediates and their relationship to function (Balbach), parallel folding pathways (Itzhaki).

Recommendation: Resolving the Fine Structure in the Energy Landscapes of Repeat Proteins — R0/PR3

Comments

Comments to Author: Reviewer #1: This is an interesting paper. I recommend publication with the following changes:

1. I couldn’t find statements of what the three proteins were - their pdb files were listed but not the protein names. Is 6ANK IkBa (as mentioned towards the end of the paper)? The authors should also explain why they chose these particular proteins.

2. Some context is required to compare this new model with other models studied in the literature.

3. Broader referencing would be good. Many of the references are to reviews, and there are some papers on experimental studies missing - e.g. natural repeat proteins and the presence of intermediates and their relationship to function (Balbach), parallel folding pathways (Itzhaki).

Reviewer #2: Overall Comments:

I enjoyed this paper a lot. The authors present a novel approach to the investigation of folding pathways performed on an ideal test-case protein (multiple repeating subunits along the sequence). The work is also performed without the use of machine learning, which I appreciate in today’s computational science environment! Three different systems are studied (3ANK, 4ANK and 6ANK being 3,4 and 6 subunit repeats) using the exact same simulation procedure, in principle allowing for both independent discussion of each system’s folding behaviour and comparison between each simulation suite. With their independent system analysis, their method is able to take two structural reaction coordinates and produce a multidimensional mapped representation of the folding pathways of the protein. They are able to extract free energy barriers to folding and identify multiple folding pathways. Importantly, they are able to observe which of these pathways are dominant and the reason why (in terms of their reaction coordinates). Unfortunately, the text is currently lacking in comparison between the three different systems, which is an issue given that the addition of extra subunit repeats along the protein does not just modify the folding pathways, but appears to fundamentally change them. I therefore require extra discussions (and a few other tweaks) to be included to justify publication. Nothing major and no further simulations. Revisions are given in the order they appear in the article.

Revisions:

1. Can the authors please provide explicit detail on what σ_ij is and why it is suitable as a weighting parameter in Eq.1. I appreciate that this is made clear in the references, but it is extremely important to this structural metric and has the form of a Gaussian standard deviation, so the distinction should be made explicitly clear here.

2. Can the authors please add a clarification on the umbrella sampling method for non-experts. For example "...using Qw as the global reaction coordinate to accelerate sampling at the low Qw values unreachable by standard (unbiased) simulations."

3. Each Qw value could represent a multitude of unique conformations, especially at low Qw. Can the authors please specify whether the set of conformations used represent the full conformational space available to the protein, and whether the reaction pathways / trajectories obtained are global or specific to the conformations used. (I imagine this has something to do with the ELViM multidimensional projection, but the specifics were not clear to me in the text or Reference 15).

4. On page 8 the authors claim that "Figures 2A and 2B show that the folding is highly cooperative" because different parts of the protein begin to fold at the same time. I do not believe a claim of cooperativity can be made at this stage without either explicit proof of stabilising interactions between repeats, or additional simulations of R1, R2 and R3 in isolation. I will accept, however, that cooperativity can be inferred from the folding pathways and transition states shown in the Figure 2C projection, and in the 4ANK/6ANK sections. Can the authors please relax the initial claim and instead make the claim following the discussion of transition pathways on page 9 (or explain to me in your reply why Figures 2A and 2B are sufficient).

5. In the 3ANK section, the authors find that the preferential folding pathway is when R1 and R2 fold first (in intermediate region II), implying that the N-terminal region is folded. The secondary pathway is when R2 and R3 fold first,implying that the C-terminal region is folded. In 4ANK, however, we find that the preferential intermediate region (II) is populated by partially folded C-terminal structures, and the secondary intermediate region (I) by partially folded N-terminal structures, to the extent that unbiased simulations are only observed to proceed through region II. Thus the 4ANK folding mechanism appears to be exactly opposite to that of 3ANK in terms of preferential folding, which can be seen even more clearly by comparing Figures 2B and 3B. The 6ANK system then seems to more closely resembles 3ANK than 4ANK, in that the N-terminal portion of the construct is shown to fold first in Figure 4B. Can the authors please discuss, either section by section or in the conclusions, the differences and similarities between the folding pathways of the different systems, and why they might occur.

6. In Figure 4A, we see the emergence of a second free energy barrier and thus, an intermediate (potentially stable) free energy minimum. Can the authors elucidate why this new barrier appears, and why this reduces the overall free energy barrier of 19.5kT (from the 3ANK and 4ANK systems) to just 10kT (in the 6ANK system).

7. Figure 5, the graphical abstract / contents figure, is very beautiful. I like it a lot

Minor Suggestions (just suggestions, not revisions):

1. If Figures 2A and 2B could be combined such that they share a global Qo axis that would be great. Then we could compare residue behaviour and the free energy profile directly! The same in Figures 3 and 4.

2. In Figures 2A, 2B and 2C, red implies high Qo and blue implies low Qo, whereas in Figures 2D and 2E the opposite is true, as red is used to imply high energy instead. This is visually confusing. Could you reverse the energy colour scheme? The same in Figures 3 and 4.

Typos and Grammar:

1. Pg4: in the folding of consecutive -> and in the folding of consecutive

2. Pg6: time steps -> time step

3. Pg9: In Figure 2D, the colors of the dots correspond -> In Figure 2D, the colors of the dots now correspond

4. Pg11: bellow -> below

5. Pg11: new projection show -> new projection shows

6. Pg13: The Figure 4C -> Figure 4C

7. Pg13: maybe is caused -> is perhaps caused

8. Pg15: funneldness of the -> funnel-like

Recommendation: Resolving the Fine Structure in the Energy Landscapes of Repeat Proteins — R1/PR4

Comments

No accompanying comment.

Recommendation: Resolving the Fine Structure in the Energy Landscapes of Repeat Proteins — R2/PR5

Comments

No accompanying comment.