Hostname: page-component-cd9895bd7-p9bg8 Total loading time: 0 Render date: 2024-12-22T21:54:19.477Z Has data issue: false hasContentIssue false

Modelling peptide–protein complexes: docking, simulations and machine learning

Published online by Cambridge University Press:  19 September 2022

Arup Mondal
Affiliation:
Department of Chemistry, University of Florida, Gainesville, FL32611, USA Quantum Theory project, University of Florida, Gainesville, FL 32611, USA
Liwei Chang
Affiliation:
Department of Chemistry, University of Florida, Gainesville, FL32611, USA Quantum Theory project, University of Florida, Gainesville, FL 32611, USA
Alberto Perez*
Affiliation:
Department of Chemistry, University of Florida, Gainesville, FL32611, USA Quantum Theory project, University of Florida, Gainesville, FL 32611, USA
*
*Author for correspondence: Alberto Perez, E-mail: [email protected]
Rights & Permissions [Opens in a new window]

Abstract

Peptides mediate up to 40% of protein interactions, their high specificity and ability to bind in places where small molecules cannot make them potential drug candidates. However, predicting peptide–protein complexes remains more challenging than protein–protein or protein–small molecule interactions, in part due to the high flexibility peptides have. In this review, we look at the advances in docking, molecular simulations and machine learning to tackle problems related to peptides such as predicting structures, binding affinities or even kinetics. We specifically focus on explaining the number of docking programmes and force fields used in molecular simulations, so a prospective user can have an educated guess as to why choose one modelling tool or another to address their scientific questions.

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Introduction

Protein–protein interactions (PPIs) are a vital component of pathways regulating the behaviour of cells. In disease, some of these pathways become aberrant; therefore, identifying ways of inhibiting them is of great therapeutic importance. Inhibiting PPI with small molecules is not always possible due to the large interface region, lack of binding cavities and specificity of the interaction. However, between 15 and 40% of PPI (London et al., Reference London, Raveh and Schueler-Furman2013) are mediated by peptide epitopes, giving rise to opportunities for peptide-based inhibition of PPIs. There is a growing market for peptide-based therapeutic agents (Martins et al., Reference Martins, Santos, Mariano, Queiroz, Bastos, Gomes, Fischer, Rocha, Silveira, Lima, Magalhães, Oliveira and Melo-Minardi2021; Wang et al., Reference Wang, Wang, Zhang, Cheng, Yan, Shao, Wang, Wang and Fu2022), and there are already more than 60 peptide drugs approved in the United States (Usmani et al., Reference Usmani, Bedi, Samuel, Singh, Kalra, Kumar, Ahuja, Sharma, Gautam and Raghava2017; Lau and Dunn, Reference Lau and Dunn2018). Peptides have well-known degradation pathways, lower toxicity than small molecules and are highly specific. Some of the challenges for peptide-based therapeutics such as the rapid degradation by proteases or limited ability to cross membranes (Fosgerau and Hoffmann, Reference Fosgerau and Hoffmann2015) can be overcome by using modified amino acids and cyclisation techniques (Bechtler and Lamers, Reference Bechtler and Lamers2021). Others, like efficient delivery strategies (e.g., oral delivery), limit broader interest (Ganesh et al., Reference Ganesh, Heusser, Garad and Sánchez-Félix2021).

The rational design of peptide-based therapeutic requires structural knowledge of peptide–protein complexes at the atomistic level. Experimental studies are challenging and expensive as peptide degrades fast, and the spectrum of target candidate is too broad to explore experimentally. While the use of computational pipelines is well-stablished in the early stages of drug discovery for small molecules, peptides present some unique challenges that have limited the success of computational pipelines for inhibitory peptide design. Structurally, peptides interact with proteins in different ways (Arkin et al., Reference Arkin, Tang and Wells2014): 1) as coils through specific amino acid interaction; 2) by adopting well-defined secondary structures (e.g. hairpins or helices) and 3) through discontinuous interactions along the peptide chain. In these interactions, peptide flexibility is important, as they are often intrinsically disordered in their free form and adopt well-defined structures upon binding – unlike small molecules where flexibility is more limited.

The reader is refered to several excellent and exhaustive reviews on peptides as therapeutics, and the role of docking to identify PPIs (Fosgerau and Hoffmann, Reference Fosgerau and Hoffmann2015; Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018; Apostolopoulos et al., Reference Apostolopoulos, Bojarska, Chai, Elnagdy, Kaczmarek, Matsoukas, New, Parang, Lopez, Parhiz, Perera, Pickholz, Remko, Saviano, Skwarczynski, Tang, Wolf, Yoshiya, Zabrocki and Toth2021). In this review, we describe three major classes of computational methods that are routinely used to elucidate different aspects of PPIs: 1) docking; 2) molecular dynamics (MD) simulations and 3) machine learning approach. Docking approaches have traditionally been the most successful at exploring the possible orientations and interaction sites between proteins and peptides. Their use for peptide systems has evolved from protein–small molecule and protein–protein docking tools and face different challenges when accounting for the highly flexible nature of peptides. Hence, these methods are described in most detail. MD methods draw from the wealth of enhanced simulation methods available in the literature. Their purpose is to confer detail either about the binding energy landscape or binding mechanism. They are generally not high-throughput methods and complement studies where the bound structure is known. Finally, machine learning approaches are rapidly evolving, thanks to AlphaFold’s recent success in protein structure prediction. These methods have rapidly been ported to other applications such as peptide–protein structure prediction with a success rate to match those of the best performing docking programmes. Although different in methodology, the three classes of methods need to account for the challenges in sampling bound conformations as well as identifying them through some scoring function. Our goal with this review is to identify the current strategies to approach these challenges and provide a broad understanding of the advances and limitations in the field.

Docking section

Docking remains an efficient approach to sample bound conformations given the receptor structure and the ligand. Their success in small molecule-protein docking and their ease-of-use through webservers and standalone software have popularised this method for virtual screening in the early stages of drug discovery (Taylor et al., Reference Taylor, Jewsbury and Essex2002). Some of this success has been translated into the protein–protein docking field, as seen from the evolution of predictions in the CAPRI (Critical Assessment of PRediction of Interaction) competition (Lensink et al., Reference Lensink, Velankar and Wodak2017, Reference Lensink, Nadzirin, Velankar and Wodak2020). Despite these successes, peptide–protein docking remains a more challenging problem (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018). The success of docking relies on the ability to sample bound conformations, and the ability to identify native-like poses using a scoring function. The flexible nature of the peptides significantly increases the sampling problem with respect to small molecule docking and limits sampling native-like poses (Rentzsch and Renard, Reference Rentzsch and Renard2015). Similarly, the flexibility challenges the ability to transfer standard scoring functions to identify peptide–protein complexes. For example, despite the poly-aminoacidic nature of peptides, a straightforward application of protein scoring functions has limited success. Thus, modifications to protein–protein scoring functions are needed to correctly identify native-like poses (Agrawal et al., Reference Agrawal, Singh, Srivastava, Singh, Kishore and Raghava2019; Weng et al., Reference Weng, Gao, Wang, Wang, Hu, Yao, Cao and Hou2020). Innovation in the peptide docking field comes from strategies for more efficient handling of flexibility and overcoming limitations in scoring. Along with these improvements, the curation of peptide–protein databases is crucial for systematic testing, benchmarking and assessing these docking methods (Hauser and Windshügel, Reference Hauser and Windshügel2016). There are already excellent reviews and benchmark studies assessing the performance of different methods (Wang et al., Reference Wang, Sun, Yao, Li, Xu, Li, Tian and Hou2016; Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018; Agrawal et al., Reference Agrawal, Singh, Srivastava, Singh, Kishore and Raghava2019; A. C.-L. Lee et al., Reference Lee, Harris, Khanna and Hong2019; Weng et al., Reference Weng, Gao, Wang, Wang, Hu, Yao, Cao and Hou2020). Hence, we will limit this section to the nature of the databases and different sampling/scoring strategies prevalent in the field.

Databases

Many peptide docking methods have evolved from either protein–protein or protein–small molecule docking tools. Their modification includes better handling of flexibility and specific scoring functions. To test their performance on peptide–protein complexes, several efforts distil the structural information from the PDB, identifying sets of peptide–protein complexes amongst the ~150,000 structures deposited in the RCSB-PDB (Berman et al., Reference Berman, Battistuz, Bhat, Bluhm, Bourne, Burkhardt, Feng, Gilliland, Iype, Jain, Fagan, Marvin, Padilla, Ravichandran, Schneider, Thanki, Weissig, Westbrook and Zardecki2002). Thus, the emergence of databases for peptide–protein complexes streamlines the process and accelerates the advancement of the field. Several databases are available, each with a specific purpose in mind – ranging from properties such as length of the peptides to types of binding motifs. Here, we provide an overview of the widely used databases. Table 1 provides a quick reference summary.

Table 1. Summary of the popular protein–peptide complexes datasets that are widely used for testing and benchmarking different docking tool

LEADS-PEP (Hauser and Windshügel, Reference Hauser and Windshügel2016) consists of 53 peptide–protein complexes with peptides ranging from 3 to 12 amino acids with resolution lower than 2 Å. The entries originate from a clustering on sequence space, retaining only complexes that are diverse in terms of sequence – excluding those that interact with DNA or RNA. Due to the short length of the peptides in this database, it is suitable for benchmarking docking tools originating from small molecule docking programmes.

PeptiDB (London et al., Reference London, Movshovitz-Attias and Schueler-Furman2010) is a non-redundant database of 103 high resolution peptide–protein complexes. These peptides are 5–15 amino acids long with diverse bound conformations (helix, β-strand and coil) and functionalities (such as signal-transduction, antibody binding, protein trafficking and transporting). The set includes complexes with a significant conformational change upon binding. These characteristics make PeptiDB appropriate for benchmarking docking tools that account for peptide flexibility.

PPDbench (Agrawal et al., Reference Agrawal, Singh, Srivastava, Singh, Kishore and Raghava2019) database has been used to benchmark 6 common docking programmes which contains 133 peptide–protein complexes with less than 40% sequence similarity. The set is diverse with respect to functionality, but the range of peptide lengths is narrower (9–15 amino acids). The benchmark study with this database by Agarwal et al. showed that different docking methods perform best on different class of peptides, classified in terms of their functionality such as enzymatic, signalling and many others (Agrawal et al., Reference Agrawal, Singh, Srivastava, Singh, Kishore and Raghava2019). In another study, Weng et al. created and used the PepSet (Weng et al., Reference Weng, Gao, Wang, Wang, Hu, Yao, Cao and Hou2020) database to benchmark 14 docking programmes. This database contains 185 high resolution peptide–protein complexes with less than 30% sequence similarity and peptide lengths ranging from 5 to 20 amino acids.

Both LEADS-PEP and PeptiDB have been widely used for benchmarking sampling and scoring ability, but they are limited by peptide length. Peptides longer than 20 amino acids are very common in nature. The PepPro (Xu and Zou, Reference Xu and Zou2020) database contains 89 non-redundant peptide–protein complexes with longer peptide sequences (5–30 resides) and diverse peptide secondary structures. As a useful feature of PepPro, the database contains 58 structures of the unbound receptor proteins – making it ideal to benchmark docking methods for predicting apo to holo conformational changes.

While the above databases have a limited number of complexes, a few databases are more inclusive in their search parameters. For example, PepX (Vanhee et al., Reference Vanhee, Reumers, Stricher, Baeten, Serrano, Schymkowitz and Rousseau2010) contains 1431 non-redundant complexes from the PDB with peptide size ranging from 5 to 35 amino acids and resolution less than 2.5 Å. There are redundancies in the database and the number of complexes can be reduced by clustering on their interaction interface reduces, resulting to 505 unique cluster centres. Similarly, the PepBind (Das et al., Reference Das, Sharma, Kumar, Krishna and Mathur2013) database is built on similar principles to PepX, without accounting for sequence/structural redundancy. PepBind contains 3100 protein peptide complexes. While larger databases may help in assessing the applicability of docking methods on predicting longer peptides, the databases are not well curated (e.g. complexes in the databases might contain non-interacting chains, small molecule ligands or ions which might lead to erroneous assessment of docking tools (Wen et al., Reference Wen, He, Tao and Huang2018)). To overcome this limitation, a curated database PepBDB (Wen et al., Reference Wen, He, Tao and Huang2018) was developed, containing peptides up to 50 amino acids, with nearly 13,000 complexes in the dataset. A more recent database, Propedia (Martins et al., Reference Martins, Santos, Mariano, Queiroz, Bastos, Gomes, Fischer, Rocha, Silveira, Lima, Magalhães, Oliveira and Melo-Minardi2021), contains over 20,000 high resolution complexes with peptides ranging from 2 to 50 residues. Propedia features a hybrid clustering based on sequence, interface structure or binding site that retains a lower number of clusters (1,845, 1,891 or 1,466, respectively), allowing the user to be flexible for benchmarking purposes. Finally, PixelDB (Frappier et al., Reference Frappier, Duran and Keating2018) contains close to 2,000 high resolution and non-redundant complexes. Unlike previous databases, this one relies on a machine learning algorithm along with a chain length cutoff to identify the receptor and peptide in a complex. This overcomes the issue of defining protein and peptide, for cases where the receptor size is smaller than its peptide binder.

Sampling

Docking methods can be classified in two major categories depending on their use of templates for modelling the complex (template-based and template-free docking).

Template-based docking

Template-based docking methods take advantage of known structures of either protein monomers or complexes and extract structural features to predict the unknown peptide–protein complex structure. We summarise these methods in Table 2 and provide further detail below. Template-based methods are grounded on the premise that the PPI interface is conserved and similar to either the PPI interface or the different interacting fragments in a protein. Based on conserved interfaces, these methods build a modelling scaffold for the target systems. Indeed, 80% of the peptide–protein interfaces can be derived from fragment interactions in monomeric proteins (Obarska-Kosinska et al., Reference Obarska-Kosinska, Iacoangeli, Lepore and Tramontano2016). The first step of these class of methods is finding suitable templates for the target system from different databases. The most popular template-based docking method is GalaxyPepDock (H. Lee et al., Reference Lee, Heo, Lee and Seok2015). It takes a peptide sequence and a receptor structure as inputs to search for structural similarity in the PepBind database. It then uses a score (S_complex) for each hit in the database, which is calculated combining the TM score of the receptor in the database with respect to the target receptor, and an interaction similarity score which is calculated based on the protein structure, peptide sequence and the interacting residue pairs (H. Lee et al., Reference Lee, Heo, Lee and Seok2015; ‘Modeling Peptide–Protein Interactions, Methods and Protocols’, 2017). The top 10 scoring templates with scores higher than 90% of the maximum score in the database are selected. These templates are then used to build models with GalaxyTBM predicting an estimated accuracy for each model. Testing on the PeptiDB database where it predicted 37 out of 57 complexes with acceptable or higher quality. In general, the GalaxyPepDock approach is quite reliably when the receptor TM score is greater than 0.7 (H. Lee and Seok, Reference Lee and Seok2017). GalaxyPepDock takes 2–3 hours to complete a prediction, making it fast compared to methods that rely on computationally expensive molecular simulations. The recently developed InterPep2 (and InterPep2-Refined) also belongs to the template-based category, with a similar performance to GalaxyPepDock when tested on unbound receptors set and slightly better when tested on the bound dataset (Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2020).

Table 2. Summary of highlighted templated based docking tools

PepComposer (Obarska-Kosinska et al., Reference Obarska-Kosinska, Iacoangeli, Lepore and Tramontano2016) is an example of tool which uses structural knowledge of complexes to design a novel peptide sequence and dock it to the given receptor. This method, in the first step, finds a structurally similar fragment based on a given binding site and retrieves continuous backbone fragments from a structural database based on contacts to the prior fragment. In the next step, it predicts novel peptide sequences and bound complex structures using Monte Carlo moves embedded in a python-based tool (pyRosetta (Chaudhury et al., Reference Chaudhury, Lyskov and Gray2010)). Testing the method on the LEADS-PEP dataset returned a 50% success rate considering only the top model. However, the caveat here is that the designed peptides are generally shorter than the native which decreases the RMSD (Root Mean Square Deviation) value (Obarska-Kosinska et al., Reference Obarska-Kosinska, Iacoangeli, Lepore and Tramontano2016).

Template-Free Docking

This class of methods samples different peptide–protein orientations and positions as well as generates a diverse set of peptide conformations. Depending on the available knowledge of the binding site, these methods can be further divided into two subclasses: local and global docking. In local docking, the binding site is known, reducing search space. In global docking, no prior knowledge of the binding site is used, and the peptides explore the whole receptor surface (see Fig. 1). We will discuss these two subgroups in two different sections.

Fig. 1. Pipeline in popular template-free docking methods. (A) Input peptide conformations are generated in 3 major ways: 1) using peptide builder to generate major 3 conformations (alpha, polyproline II, extended); 2) molecular simulations are used to generate an ensemble of peptide conformations and 3) fragment pickers are used to select peptide fragments in the structural databases based on the peptide sequence. (B) If the binding site known, peptides are guided towards the binding site (local docking); else, peptides explore the whole protein surface (global docking). (C) Ensemble of docked poses. (D) Top score docked model representing the native structure.

Local Docking

As input, local docking methods need a set of user-defined information about the binding site, restricting the search of the ligand to the vicinity of this region, so we only need to sample the peptide conformations and orientations. Success of this group of docking methods depends on the accuracy of the initial information (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018). Knowledge of the binding site can come from diverse sources such as protein–protein complex interfaces, a docked pose from another docking tool, hotspot prediction or even experimental data. Each method has its own specific input and types of information it can handle, requiring the right fit between prior information and local docking programme used. There are also limitations regarding how much sampling of the protein and peptide conformations are needed. As an example, HADDOCK uses Ambiguous Interaction Restraints based on hotspot residues (e.g. from NMR chemical shift perturbation data) on the protein surface to guide sampling – but requires different peptide conformations as input (e.g. alpha helix, extended and polyproline II) to limit sampling to the relative position of the protein/peptide without sampling the peptide conformations (Trellet et al., Reference Trellet, Melquiond and Bonvin2013, Reference Trellet, Melquiond and Bonvin2014; Geng et al., Reference Geng, Narasimhan, Rodrigues and Bonvin2017). Once the preferred bound conformation is found, HADDOCK introduces peptide backbone flexibility to sample diverse conformations. HADDOCK has a 14.5% success rate when tested on the PeptiDB database. HPEPDOCK-local, using binding site hotspot information and shape complementarity followed by energy minimisation, produces fast and accurate predictions with 33.9% success rate when tested under the same condition as of HADDOCK (P. Zhou et al., Reference Zhou, Jin, Li and Huang2018; Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2019). HPEPDOCK relies on MODPEP to generate an ensemble of unbound peptide conformations. An advantage of HADDOCK over HPEPDOCK is its ability to handle ambiguous data for the binding site (Williamson, Reference Williamson2013; Deplazes et al., Reference Deplazes, Davies, Bonvin, King and Mark2016). Finally, Rosetta FlexPepDock ab initio can also produce a diverse set of peptide conformations for binding, but this comes at a higher computationally expense (Raveh et al., Reference Raveh, London and Schueler-Furman2010, Reference Raveh, London, Zimmerman and Schueler-Furman2011).

The second group of local docking methods are derived from small molecule docking programmes (such as AutoDock Vina (Rentzsch and Renard, Reference Rentzsch and Renard2015), GOLD (Verdonk et al., Reference Verdonk, Cole, Hartshorn, Murray and Taylor2003) or Surflex-Dock (Spitzer and Jain, Reference Spitzer and Jain2012) to name a few). Although no initial conformation of the peptide is required, the accuracy rapidly decays when sampling beyond 10 flexible bonds, limiting peptide size (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018). In this approach, the peptide is placed in the binding site and peptide conformations are sampled using either Monte Carlo moves (AutoDock Vina and GOLD) or rotamer libraries (Surflex-Dock). AutoDock is most reliable with peptide lengths between 2 and 4 amino acids. Several methods have been developed to tackle longer peptides through incremental docking approaches (e.g. DINC 2.0 (Antunes et al., Reference Antunes, Moll, Devaurs, Jackson, Lizée and Kavraki2017), DLPepDock (Sun et al., Reference Sun, Fu, Zhao, Fan and Zhong2021)) (Antunes et al., Reference Antunes, Moll, Devaurs, Jackson, Lizée and Kavraki2017). The incremental pipeline in DINC 2.0 has several stages: 1) dock a small fragment (preferably 6 rotatable bonds, roughly 2 amino acids) with AutoDock 4; 2) increase the peptide size by adding 3 more rotatable bonds and freezing 3 of the 6 previous rotatable bonds and 3) dock the peptide again (Antunes et al., Reference Antunes, Moll, Devaurs, Jackson, Lizée and Kavraki2017). Using this approach DINC 2.0 has been successful with up to 25 flexible bonds. The selection of the initial fragment is done based on heuristics, while the extension of the fragment follows the potential to maximise H-bonding with the receptor. The benchmark test included a custom dataset of 73 protein peptide complexes with multiple successes, including the docking of a B2 chicken MHC class I receptor and an 8-mer chicken peptide (1.61 Å RMSD from the native structure). The Glide SP-PEP method also uses fragment-based docking with Iterative Residue Docking and Linking to dock peptides smaller than 8 amino acids (Diharce et al., Reference Diharce, Cueto, Beltramo, Aucagne and Bonnet2019). It uses Glide’s SP-PEP module to dock each residue iteratively to the binding site and then uses the covalent module to create bonds between them. The success rate for this method was high in a custom-made benchmark set with 10 out of 11 successful docking examples.

The third group of local docking methods can be termed as refinement methods (DynaDock (Antes, Reference Antes 2010), PepCrawler (Donsky and Wolfson, Reference Donsky and Wolfson2011), Rosetta FlexPepDock (Raveh et al., Reference Raveh, London and Schueler-Furman2010)) or peptide inhibitor design methods (PepCrawler) rather than strictly docking tools. These methods need input structures of either coarse peptide–protein complexes (for refinement) or protein–protein complexes (for inhibitor design) (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018). DynaDock, in the first step, generates broad sampling of the peptide conformation at the binding site by performing random rotation of backbone torsion and sidechain. In the next step, it uses an MD (OPMD) based refinement of the bound modes which allow the full flexibility to the receptor (Antes, Reference Antes2010). Rosetta FlexPepDock uses Monte Carlo moves to sample diverse peptide conformations with full receptor flexibility with on-the-fly energy minimisation (Raveh et al., Reference Raveh, London and Schueler-Furman2010). Unlike these two methods, PepCrawler can be used in two ways – refinement and inhibitor peptide design. For refinement, it uses an initial protein peptide complex structure as input and samples a diverse range of peptide conformations with a Rapidly exploring Random Trees (RRT) based algorithm. To design inhibitor peptides, it uses provided protein–protein complexes to generate the peptide fragment with lowest binding energy in the first step, followed by the RRT algorithm as refinement to dock the diverse peptide conformation. PepCrawler allows peptide and protein sidechains flexibility and only the peptide backbone flexibility (Donsky and Wolfson, Reference Donsky and Wolfson2011). This group of methods produce best results for short peptides (< 15 amino acids) and when the initial conformation of the peptide is below 5 Å RMSD from the native protein peptide complex (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018).

Some of the popular local docking tools are summarised in Table 3, together with suggested applications where each method is most successful. The first two groups of docking methods can be used when we only have binding site information and do not have any structural information of the complex. They are often used to generate initial models which can be later refined with other methods. Tools such as PeptiMap (Lavi et al., Reference Lavi, Ngan, Movshovitz-Attias, Bohnuud, Yueh, Beglov, Schueler-Furman and Kozakov2013), PepSite (Trabuco et al., Reference Trabuco, Lise, Petsalaki and Russell2012), PEP-SiteFinder (Saladin et al., Reference Saladin, Rey, Thévenet, Zacharias, Moroy and Tufféry2014), SPRINT-str (Taherzadeh et al., Reference Taherzadeh, Zhou, Liew and Yang2017), ANCHORSmap (Ben-Shimon and Eisenstein, Reference Ben-Shimon and Eisenstein2010) or InterPep (Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2019) can predict the binding site, when no other structural information is available. The most recent method, InterPep uses template-based knowledge and a machine learning-based model to predict the binding site, outperforming most the other existing tools (Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2019). Experiments such as Chemical Shift Perturbation, alanine scan mutagenesis or ligand foot printing mass spectrometry provide information about the binding site and can be used alternatively to binding site predictors.

Table 3. Summary of highlighted ‘local docking’ tools. Here, acronyms are used as follows: Pstr, protein structure; pseq, peptide sequence; pconf, initial peptide conformation; BB, backbone; SC, sidechain

Global Docking

This class of template-free docking programmes becomes specially useful when there is no information about the binding site (see Fig.1). This class of methods is the most general as it requires the least amount of information provided by the user. However, the additional computational effort required to simultaneously sample the binding site as well as peptide conformations limits the success rate when compared to the previous classes of methods. Many of these approaches use a two-step procedure composed of a fast rigid docking stage to identify the bound state followed by a refinement strategy. Thus, the local docking methods described above can be used as part of the refining strategy.

There are several strategies to generate initial peptide conformations for the rigid docking stage: 1) methods such as MDockPeP (Yan et al., Reference Yan, Xu and Zou2016), MDockPeP2 (Xu and Zou, Reference Xu and Zou2022), Cluspro PeptiDock (Porter et al., Reference Porter, Xia, Beglov, Bohnuud, Alam, Schueler-Furman and Kozakov2017), use a MODELLER (Webb and Sali, Reference Webb and Sali2014) based algorithm and PIPER-FlexPepDock (PFPD) (Alam et al., Reference Alam, Goldstein, Xia, Porter, Kozakov and Schueler-Furman2017) uses the Rosetta fragment picker to extract a fragment from an interacting partner of a protein–protein complex with similar sequence. MDockPep2 additionally considers the physiochemical environment similarity of the binding interface along with sequence similarity in the fragment picking stage; 2) pepATTRACT (Schindler et al., Reference Schindler, de Vries and Zacharias2015; Vries et al., Reference Vries, Rey, Schindler, Zacharias and Tuffery2017) threads through three major secondary conformations (i.e. alpha, beta and coil) using PeptideBuilder (Tien et al., Reference Tien, Sydykova, Meyer and Wilke2013); and 3) HPEPDOCK global uses MODPEP to generate an ensemble of peptide conformations (P. Zhou et al., Reference Zhou, Jin, Li and Huang2018). Once the peptide conformation is identified, each method relies on their own rigid docking strategies. Cluspro PeptiDock and PFPD use a PIPER-based protocol (Kozakov et al., Reference Kozakov, Brenke, Comeau and Vajda2006); MDockPeP uses a modified version of AutoDock vina whereas MDockPeP2 uses ZDock (protein–protein docking tool) to carry out the rigid docking step (Xu and Zou, Reference Xu and Zou2022; Yan et al., Reference Yan, Xu and Zou2016); pepATTRACT uses ATTRACT to carry out rigid body docking of the peptide with the ATTRAC coarse-grained representation of the protein and peptide and HPEPDOCK uses a modified version of MDock making it suitable for protein–peptide systems to perform rigid body docking. At the end of the rigid docking step, these methods have their own ways to include flexibility in the system and refine the docked structures. These strategies include using local docking methods, MD or MC simulations or other energy minimisers. For instance, PFPD uses Rosetta FlexPepDock; pepATTRACT uses iATTRACT and AMBER MD simulation for refinement and HPEPDOCK uses a SIMPLEX energy minimiser as the fully flexible refinement step (Schindler et al., Reference Schindler, de Vries and Zacharias2015; Alam et al., Reference Alam, Goldstein, Xia, Porter, Kozakov and Schueler-Furman2017; P. Zhou et al., Reference Zhou, Jin, Li and Huang2018).

Methods like CABS-dock (Kurcinski et al., Reference Kurcinski, Jamroz, Blaszczyk, Kolinski and Kmiecik2015, Reference Kurcinski, Ciemny, Oleniecki, Kuriata, Badaczewska-Dawid, Kolinski and Kmiecik2019), AnchorDock (Ben-Shimon and Niv, Reference Ben-Shimon and Niv2015) and AutoDock CrankPep (ADCP) (Zhang and Sanner, Reference Zhang and Sanner2019) allow flexibility to the peptide during the whole docking process. CABS dock generates peptide conformations in explicit solvent in the presence of the interacting partner, allowing the peptide to adopt its bound conformation. Thus, this allows full flexibility on both the receptor and peptide side. It uses a coarse-grained representation of each amino acid where backbone and sidechains are represented by two pseudo atoms each for computationally efficiency (Kurcinski et al., Reference Kurcinski, Jamroz, Blaszczyk, Kolinski and Kmiecik2015, Reference Kurcinski, Ciemny, Oleniecki, Kuriata, Badaczewska-Dawid, Kolinski and Kmiecik2019). The limitation in this case is the need for the peptide’s secondary structure, which is not available in most cases. PSIPRED (McGuffin et al., Reference McGuffin, Bryson and Jones2000) is used when the secondary structure is not known – even though it is not ideal for predicting the secondary structure of peptides, which are typically intrinsically disordered in their free form (Yan et al., Reference Yan, Xu and Zou2016). Unlike other global docking tools, AnchorDock uses the prediction tool ANCHORSmap to identify the anchoring spot and then performs an anchor-guided MD simulation (Ben-Shimon and Niv, Reference Ben-Shimon and Niv2015). This strategy combines the speed-up of the restraints with full flexibility of the peptide/protein system. A recent method, ADCP, uses Monte Carlo moves to sample peptide conformations under the influence of the potential landscape generated by the receptor which helps it to find correct fold upon binding making it highly successful allowing fully flexible docking (Zhang and Sanner, Reference Zhang and Sanner2019).

Methods which use molecular simulations (extensive MD or MC) either in the docking stage or in refinement stage generally have higher accuracy than the other methods (Agrawal et al., Reference Agrawal, Singh, Srivastava, Singh, Kishore and Raghava2019; J. Wang et al., Reference Wang, Alekseenko, Kozakov and Miao2019; Weng et al., Reference Weng, Gao, Wang, Wang, Hu, Yao, Cao and Hou2020). For example, despite the global nature of pepATTRACT, it can be as successful as some of the local docking approaches, with a local version of the method (pepATTRACT-local) having a higher success rate (Schindler et al., Reference Schindler, de Vries and Zacharias2015). PeptiDock+ Gaussian accelerated molecular dynamics (GaMD) which refines Cluspro PeptiDock results with Gaussian accelerated MD (discussed in the MD section), performs significantly better than the traditional Cluspro PeptiDock (J. Wang et al., Reference Wang, Alekseenko, Kozakov and Miao2019). AnchorDock correctly predicted 10 of 13 complexes (RMSD <2.2 Å) in a custom dataset (Ben-Shimon and Niv, Reference Ben-Shimon and Niv2015). ADCP has shown better performance than most of the other existing docking tool for peptide ranging 16–20 residues with an 87% success rate (considering 10 models) when tested on LEADS-PEP dataset (Zhang and Sanner, Reference Zhang and Sanner2019). Finally, PFPD, considered one of the state-of-the-art methods, can produce near native complex structures in 70% for bound test sets and 40% for unbound test sets (Alam et al., Reference Alam, Goldstein, Xia, Porter, Kozakov and Schueler-Furman2017). However, due to the nature of the simulation, these methods are significantly slower compared to others. The computational resource requirements increase with longer peptides while maintaining the accuracy (e.g. simulations running for hours on GPUs or longer in CPUs) (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018). These expensive simulations are needed to achieve the higher accuracy as shown by the pepATTRACT web server which removes the refinement step (Vries et al., Reference Vries, Rey, Schindler, Zacharias and Tuffery2017). When using the PeptiDB benchmark set, the webserver predicts 14 out of 80 complexes correctly (within 2 Å RMSD of the experimental structure), whereas the full pepATTRACT version correctly predicts 38 complexes (Vries et al., Reference Vries, Rey, Schindler, Zacharias and Tuffery2017).

Recently, the Furman lab introduced patchMAN, a motif search method (using the MASTER algorithm) combined with Rosetta FlexPepDock refinement, which outperforms other methods (Khramushin et al., Reference Khramushin, Ben-Aharon and Schueler-Furman2022). At the first step, it searches for the receptor surface motifs in a non-redundant protein database followed the finding peptide templates that interact with these motifs and the target peptide sequence is threaded through the models. In the end, Rosetta FlexPepDock is used to refine all the models. This method allows fully flexibility of the binding site of the receptor as well as the peptide. When tested on the custom-made PFPD database, it outperforms PFPD and even recent machine learning-based revolutionary AlphaFold (Jumper et al., Reference Jumper, Evans, Pritzel, Green, Figurnov, Ronneberger, Tunyasuvunakool, Bates, Žídek, Potapenko, Bridgland, Meyer, Kohl, Ballard, Cowie, Romera-Paredes, Nikolov, Jain, Adler and Hassabis2021), considering the success criteria as a 2 Å RMSD cutoff from the native. However, on a different dataset (LNR) its performance is comparable to AlphaFold (Khramushin et al., Reference Khramushin, Ben-Aharon and Schueler-Furman2022). Table 4 summarises the global docking tools listing their features and suitable application cases.

Table 4. Summary of highlighted ‘global docking’ tools. Here, acronyms are used as follows: Pstr, protein structure; pseq, peptide sequence; BB, backbone; SC, sidechain

Scoring

At the sampling step, docking methods obtain an ensemble of docked poses – some of them are native-like, while some are far from native. The state of the art in peptide docking is reliable at sampling the correct binding site. For example, when we only consider sampling efficiency, MDockPeP has success rate of 95% when starting from bound conformations and 93% when starting with challenging unbound structures (Yan et al., Reference Yan, Xu and Zou2016). The recent method patchMAN can sample within 5 Å RMSD from the native complex in 100% cases (Khramushin et al., Reference Khramushin, Ben-Aharon and Schueler-Furman2022). This implies that currently, the limitation and overall successes of the docking tools can be attributed to the scoring stage majorly. Thus, the next crucial step is to find the best docked model, representing the native complex, in the ensembles of the docked poses (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018; Weng et al., Reference Weng, Wang, Chen, Sun, Wang and Hou2019) (see Fig. 1). These methods can be classified in several major groups such as using a knowledge-based scoring function, energy-based method, clustering-based method and integrative or combinational approach. One important feature of a good scoring method is that it should consider entropic contribution due to conformational change as well as the interaction energy.

There are a series of scoring function used successfully in small molecule and protein–protein docking field. End point methods like MM/PBSA and MM/GBSA, mentioned in the MD section, are widely used for small molecule binding-free affinity calculation and scoring (Hou et al., Reference Hou, Wang, Li and Wang2011; Pu et al., Reference Pu, Yan, Shi and Li2017; E. Wang et al., Reference Wang, Sun, Wang, Wang, Liu, Zhang and Hou2019; Weng et al., Reference Weng, Wang, Chen, Sun, Wang and Hou2019). When applied to peptide–protein systems, if these methods are used with appropriate parameter, they outperform pepATTRACT (which uses ATTRACT scoring function) and produce similar quality as HPEPDOCK-local (which uses an iterative knowledge-based scoring function coming from protein–protein docking tool MDock) (Weng et al., Reference Weng, Wang, Chen, Sun, Wang and Hou2019). However, these methods do not consider entropic contributions due to peptide conformational changes, limiting their success to binding processes without significant changes in the peptide conformation. Ideally, these methods should be modified or combined with others for generalised use (Spiliotopoulos et al., Reference Spiliotopoulos, Kastritis, Melquiond, Bonvin, Musco, Rocchia and Spitaleri2016; Tao et al., Reference Tao, Zhang and Huang2020). As an improvement, BiPPred and HADDOCK use a dampened version of MMPBSA named dMMPBSA algorithm as a scoring function to calculate the free energy and rank docked poses (Spiliotopoulos et al., Reference Spiliotopoulos, Kastritis, Melquiond, Bonvin, Musco, Rocchia and Spitaleri2016). In this approach, they reduce the Coulombic interaction and polar solvation term by factor of 5 to compensate the overestimation of free energy due to the omission of entropy. Another recent approach by H. Tao et al. combines MM-GBSA scoring function with knowledge-based scoring function ITScorePP to consider the conformational entropy part (Tao et al., Reference Tao, Zhang and Huang2020). ITScorePP is derived from atomic distance-based energies parametrised iteratively using statistical mechanics. Their work has shown rescoring the pose clustering with this combined scoring function makes the results significantly better when compared against pepATTRACT, CABS-dock and HPEPDOCK result with LEADS-PEP dataset.

A group of methods use clustering algorithms on the ensemble of docked poses (or filtered docked poses) based on structural RMSD – but they have their own way to use clustering for selecting structure. Cluspro PeptiDock assigns highest score to the most populated cluster’s medoid, whereas CABS-dock selects the consensus medoid obtained from different clustering protocols as the best model (Kurcinski et al., Reference Kurcinski, Jamroz, Blaszczyk, Kolinski and Kmiecik2015, Reference Kurcinski, Ciemny, Oleniecki, Kuriata, Badaczewska-Dawid, Kolinski and Kmiecik2019; Porter et al., Reference Porter, Xia, Beglov, Bohnuud, Alam, Schueler-Furman and Kozakov2017). Rosetta FlexPepDock and PFPD perform clustering and score the top clusters with a modified Rosetta ab-initio energy function (Alam et al., Reference Alam, Goldstein, Xia, Porter, Kozakov and Schueler-Furman2017; Raveh et al., Reference Raveh, London and Schueler-Furman2010). The modified version of the Rosetta ab-initio energy function has been shown be successful as it combines standard all atom Rosetta energy with internal peptide energy and interaction energy. AnchorDock also uses clustering algorithm on all snapshots from molecular simulation trajectories and scores the clusters based on the average potential energies of the best 15 models (in terms of binding energy) in each cluster (Ben-Shimon and Niv, Reference Ben-Shimon and Niv2015). pepATTRACT’s ATTRACT scoring function is based on modified Lenard–Jones function to select 1000 models and those are further refined by AMBER followed by performing clustering on simulated trajectory. The clusters are ranked based on the average ATTRACT energy of four lowest energy models (Schindler et al., Reference Schindler, de Vries and Zacharias2015).

Integrative scoring methods combine external information like agreement with co-evolutionary data or mutagenesis data with energy-based or clustering-based scoring which have performed very well in the recent CAPRI competitions (Yu et al., Reference Yu, Andreani, Ochsenbein and Guerois2017; Lensink et al., Reference Lensink, Nadzirin, Velankar and Wodak2020). Machine learning is recently becoming one of the methods of choice to derive scoring functions. InterPepRank (Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2021) is such an example which uses deep graph-based neural network mapping the protein peptide complex, and DockQ (Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2020) uses a Random Forest model to score and rank the docked poses. Recently, the field of docking has started using combined multiple scoring method together to compensate each other limitation. For example, when InterPepRank is combined with PFPD pipeline as rescoring, success rate increases significantly (40% for the high-quality prediction). Simultaneously, it filters out some of the non-native dock poses using an InterPepRank score cutoff, reducing the number of hits for refinement steps which increases the computation efficiency (Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2021).

Summary

The wealth and diversity of available docking programmes and servers for peptide–protein complexes pose a barrier of entry to newcomers to the field or would-be users. It is difficult to answer a question like, ‘which is the best?’ as it will depend on the system and available information. Each method has advantages and disadvantages such as the ability to work on peptides of different size, or the ability to explore large conformational changes upon binding. In general, local docking tools outperform the global tools, but the latter does not need any information about the binding site. Most of the global docking tools can be used as local docking tools when the binding site information is provided leading to higher success rates (Schindler et al., Reference Schindler, de Vries and Zacharias2015; P. Zhou et al., Reference Zhou, Jin, Li and Huang2018). A benchmark study by Weng et al. showed that performance of local docking methods that are based on AutoDock drops significantly depending on erroneous inputs of the size of binding sites, especially for peptides longer than 10 residues (Weng et al., Reference Weng, Gao, Wang, Wang, Hu, Yao, Cao and Hou2020). In summary, for a user when the target receptor can find a template with enough homology (TM score > 0.7), template-based methods generally offer the highest success rates. Alternatively, when the TM score is lower than 0.7, template-free methods should be used. For peptide smaller than 5 residues, the docking tools coming from small molecule docking work the best (Ciemny et al., Reference Ciemny, Kurcinski, Kamel, Kolinski, Alam, Schueler-Furman and Kmiecik2018). For the longer peptides, computationally extensive molecular simulation-based tools such as PFPD, or ADCP and patchMAN become more relevant. Running short MD simulation-based refinements have shown higher successes; however, computational expenses also rise. In future, coarse graining on the non-interacting residues might be a way to reduce the computer time required for these molecular simulation-based tools. Moreover, the docking field has started to apply combined approaches like combined predictions from InterPep2-refined and PFPD which has shown better performance than the individual one (Johansson-Åkhe et al., Reference Johansson-Åkhe, Mirabello and Wallner2020).

MD section

The nature of MD simulations allows, in principle, to obtain thermodynamics, kinetics and mechanistic understanding of the peptide–protein binding and unbinding process. As in the application to other biological problems, MD is limited by the accuracy of the physics model used and the ability to sample the complex energy landscape, which typically requires computational resources beyond our current capacity.

Modelling PPIs with modern force fields

The interactions of molecules can be theoretically determined using quantum mechanics but remain unaffordable in practical terms for large biomolecules. In practice, an empirical force field is used to model such interactions along with Newton’s equations of motion to simulate the dynamics. MD simulations have been shown to accurately predict the binding potency of diverse small molecule binders (L. Wang et al., Reference Wang, Wu, Deng, Kim, Pierce, Krilov, Lupyan, Robinson, Dahlgren, Greenwood, Romero, Masse, Knight, Steinbrecher, Beuming, Damm, Harder, Sherman, Brewer and Abel2015). However, there are key differences between protein binding with small molecule and peptide that makes the latter more computationally challenging. First, while small molecules have a few hotspot interactions that dominate recognition, peptide recognition can stem from many weak interactions. Second, the structure of peptides can be highly flexible, requiring finely tuned parameters that are sensitive to the change in conformational preferences between the free conformation of peptide and its bound form. Thus, peptides have been specially affected by known biases in secondary structure preferences present in some force fields (Perez et al., Reference Perez, MacCallum, Brini, Simmerling and Dill2015; Robinson et al., Reference Robinson, Monroe and Shell2016).

Early force field development was an art, guided by great scientific insights (e.g. some parameters originated from ‘guesses’ that have remained as part of the force field for decades), and carried out by a few expert groups. One of the challenges is the unexpected consequence of parameter changes as modifying one parameter might affect the accuracy of another parameter that was not adjusted due to the coupling between different terms. Even long MD simulations on a set of systems with times series, distributions, behaviour and stability analysis might not be enough to capture all possible issues. Some issues might arise in timescales beyond those studied during development or in systems not included in the benchmark test. While some groups might have made ad hoc modifications to deal with problems in specific systems, these modifications were not often properly benchmarked and rarely made it back into the main force field branch. Such trend has dramatically changed in recent years as measured by the number of force fields as well as involvement of many groups representing these improvements. Despite this, a general problem is the lack of a golden standard benchmark set for parameter development (e.g. for proteins, nucleic acids, lipids). The availability of open sharing resources would make the preparation and dissemination of such a benchmark test an easy endeavour. Community efforts such as the OpenFF are already on their way for continuous optimisation of small molecule force fields (Qiu et al., Reference Qiu, Smith, Boothroyd, Jang, Hahn, Wagner, Bannan, Gokey, Lim, Stern, Rizzi, Tjanaka, Tresadern, Lucas, Shirts, Gilson, Chodera, Bayly, Mobley and Wang2021).

Here, we focus on recent efforts to improve the description of peptides and intrinsically disordered proteins (IDPs) while maintaining the stable properties for folded proteins (Fig. 2), more extensive surveys can be found elsewhere (Rahman et al., Reference Rahman, Rehman, Liu and Chen2020; Mu et al., Reference Mu, Liu, Zhang, Luo and Chen2021). Force field development follows three main strategies: 1) modifying the dihedral angle parameters against experiments and/or quantum mechanical calculation; 2) adjusting non-bonded protein–water interactions and 3) balancing dihedrals with grid-based energy correction maps (CMAP).

Fig. 2. Overview of protein force field development after 2000. Each protein force field is classified by the year of publication, target systems for optimisation (folded, disordered or both) and additional underscores indicating whether it is a modification version of previous force fields using strategies including dihedral parameter adjustment (blue), CMAP correction (red) or parameter modification for protein–water interaction (gold).

Most effort is concentrated on directly refining the global backbone dihedral parameters including AMBER ff99SB* and ff03* (Best and Hummer, Reference Best and Hummer2009), ff14SB (Maier et al., Reference Maier, Martinez, Kasavajhala, Wickstrom, Hauser and Simmerling2015), ff99SB-ILDN (Lindorff-Larsen et al., Reference Lindorff-Larsen, Piana, Palmo, Maragakis, Klepeis, Dror and Shaw2010), CHARMM22* (Piana et al., Reference Piana, Lindorff-Larsen and Shaw2011), OPLS-AA/M (Robertson et al., Reference Robertson, Tirado-Rives and Jorgensen2015) and OPLS3 (Harder et al., Reference Harder, Damm, Maple, Wu, Reboul, Xiang, Wang, Lupyan, Dahlgren, Knight, Kaus, Cerutti, Krilov, Jorgensen, Abel and Friesner2016), while further improvements also involve refined side-chain dihedral terms. The residue specific force field (RSFF) approach involves residue-specific dihedral parameter refitting to achieve better agreement with experimental data. RSFF1 force field (Jiang et al., Reference Jiang, Zhou and Wu2014) is developed based on OPLS/AA (Kaminski et al., Reference Kaminski, Friesner, Tirado-Rives and Jorgensen2001), while RSFF2 (C.-Y. Zhou et al., Reference Zhou, Jiang and Wu2015) is based on the ff99SB force field (Hornak et al., Reference Hornak, Abel, Okur, Strockbine, Roitberg and Simmerling2006). Protein–water interactions are actively involved in peptide–protein binding process, where the ensemble of peptide conformations in its free form is highly sensitive to the solvation model used (e.g., overall compactness as defined by the radius of gyration). Not surprisingly, the AMBER ff03ws combined with a refined TIP4P/2005 solvation model (Best et al., Reference Best, Zheng and Mittal2014) and the CHARMM36m with an optimised TIP3P model (Huang et al., Reference Huang, Rauscher, Nawrocki, Ran, Feig, Groot, Grubmüller and MacKerell2017) have shown improvement in modelling IDPs ensembles. While some combinations of solvent force fields are designed to work with specific protein force fields, others such as the TIP4P-D (Piana et al., Reference Piana, Donchev, Robustelli and Shaw2015) aim to improve general deficiencies such as underestimation of London interactions by developing larger water dispersion coefficient, resulting in improved agreement with experimental observables over a broad range of force fields. The CMAP strategy was first introduced as a grid-based correction to the CHARMM22 force field (MacKerell et al., Reference MacKerell, Bashford, Bellott, Dunbrack, Evanseck, Field, Fischer, Gao, Guo, Ha, Joseph-McCarthy, Kuchnir, Kuczera, Lau, Mattos, Michnick, Ngo, Nguyen, Prodhom and Karplus1998) to account for coupling of ψ/φ torsion angles (CHARMM22/CMAP) (Mackerell et al., Reference Mackerell, Feig and Brooks2004). The latest iteration (CHARMM36) (Huang and MacKerell, Reference Huang and MacKerell2013) has been the starting point for new CMAP potentials that better balance between folded proteins and IDPs (CHARMM36m). The CMAP approximation has been adopted in other families of force fields such as AMBER (Perez et al., Reference Perez, MacCallum, Brini, Simmerling and Dill2015; Tian et al., Reference Tian, Kasavajhala, Belfon, Raguette, Huang, Migues, Bickel, Wang, Pincay, Wu and Simmerling2019) or OPLS (Yang et al., Reference Yang, Liu, Zhang, Lu and Chen2019). Over recent years, the CMAP strategy has led amino acid specific potentials, rather than using a few transferable potentials for all (e.g., non-glycine or proline). Thus, ff99IDPs (W. Wang et al., Reference Wang, Ye, Jiang, Luo and Chen2014) and ff14IDPs (Song et al., Reference Song, Luo and Chen2017) develop specific CMAP parameters for eight disordered-promoting amino acids. More recently, ff14IDPSFF (Song et al., Reference Song, Luo and Chen2017) and CHARMM36IDPSFF (Liu et al., Reference Liu, Song, Lu, Luo and Chen2018) add a different CMAP correction to each of the 20 amino acids.

Although such optimisations can better describe the more extended conformation of disordered peptides, some modified force fields generate unstable structures for folded proteins. Ideally, a force field that allows accurate descriptions of both folded and unfolded ensembles is preferable because it would better simulate transitions of peptides between disordered state to ordered state. The a99SB-disp force field developed by Robustelli et al. modified the ff99SB-ILDN parameters and adjusted TIP4P-D water model against experimental measurements, and the resulting force field has shown great improvement for modelling disordered ensembles and still maintains the accuracy for folded proteins (Robustelli et al., Reference Robustelli, Piana and Shaw2018). Another environment specific force field (ESFF1) was recently developed based on CMAP corrections of 71 different sequence environments (Song et al., Reference Song, Liu, Luo and Chen2020). These force fields have demonstrated an improved balance between modelling IDPs and folded proteins. With the number of choices available, it might be daunting to choose the right force field for your system. Many MD packages such as AMBER (Case et al., Reference Case, Cheatham, Darden, Gohlke, Luo, Merz, Onufriev, Simmerling, Wang and Woods2005), CHARMM (Brooks et al., Reference Brooks, Brooks, Mackerell, Nilsson, Petrella, Roux, Won, Archontis, Bartels, Boresch, Caflisch, Caves, Cui, Dinner, Feig, Fischer, Gao, Hodoscek, Im and Karplus2009), NAMD (Phillips et al., Reference Phillips, Braun, Wang, Gumbart, Tajkhorshid, Villa, Chipot, Skeel, Kalé and Schulten2005), Gromacs (Abraham et al., Reference Abraham, Murtola, Schulz, Páll, Smith, Hess and Lindahl2015), Tinker (Lagardère et al., Reference Lagardère, Jolly, Lipparini, Aviat, Stamm, Jing, Harger, Torabifard, Cisneros, Schnieders, Gresh, Maday, Ren, Ponder and Piquemal2017) or OpenMM (Eastman et al., Reference Eastman, Swails, Chodera, McGibbon, Zhao, Beauchamp, Wang, Simmonett, Harrigan, Stern, Wiewiora, Brooks and Pande2017) provide user a wide selection of force fields, even originating from different force field families (e.g., AMBER or CHARMM family of force fields). It is important that for whichever force field is selected the compatible solvent model and ion parameters tested by benchmark studies should also be used correspondingly.

Characterising peptide binding poses and affinities by MD simulations

MD is not high-throughput enough in its own to routinely determine the structures of peptide–protein complexes when the bound state is unknown. The three major applications for MD are:

1) Refinement of docking results; 2) estimating binding affinities based on known bound complexes and 3) use of integrative modelling strategies to determine structures of the complexes.

Docking approaches described in the previous section favour speed at the expense of accuracy, while MD approaches are accurate, but inefficient at identifying where and how a peptide binds from conventional MD simulations. Thus, short MD simulations are often the last step of docking calculations to eliminate steric overlap, account for local conformational changes and identify structures based on physico-chemical principles rather than relying on a scoring function. While this application is standard, it does not leverage the full potential of MD such as calculating binding affinities. Thus, recent integration of docking and MD-based techniques such as the combination of ClusPro PeptiDock with GaMD goes beyond refining the structure to provide free energy profiles (J. Wang et al., Reference Wang, Alekseenko, Kozakov and Miao2019). In this work, the authors benchmarked their method on three distinct model peptides achieving 0.6–2.7 Å improvement in peptide backbone structures. Moreover, the unbiased free energy profiles help identify key residues involved in significant conformational changes upon binding that can later be used for peptide sequence optimisation and design.

When the experimental structure of the complex is known, end-point methods based on MD are typically used to determine binding affinities. MM/GBSA and MM/PBSA are among the most popular methods in this category, introduced by the Kollman group two decades ago, the method is grounded on robust physico-chemical principles (E. Wang et al., Reference Wang, Sun, Wang, Wang, Liu, Zhang and Hou2019). The method has been well received by the community and still favoured over empirical and semi-empirical scoring functions designed for protein–peptide docking (see discussion in docking section). Despite its robust theoretical framework, its practical implementation results in approximations (such as flexibility, solvation and entropy) that limit the accuracy of the results. For example, prediction of peptide binding affinities for peptide–MHC complexes is highly desirable for vaccine design, but the flexible nature of the peptides make routine affinity prediction using bioinformatic pipelines insufficient. The inclusion of structural information is crucial to explore relevant molecular conformations of the peptide–protein complex, and therefore, key to understand its dynamic behaviour. Wan et al. combined the MM/PBSA and the conformational entropy method to compute peptide–MHC binding affinities from MD simulations where both the bound and unbound peptide were simulated (Wan et al., Reference Wan, Knapp, Wright, Deane and Coveney2015). The method achieves highly correlated binding affinity rankings with experimental estimates after normalising ΔGMM/PBSA with the hydrophobicity of peptides. Ochoa et al. generated conformations of the complexes from MD simulation, then using a scoring function to predict binding affinities in better agreement with experiments than either sequence-based predictions or single docking scoring methods (Ochoa et al., Reference Ochoa, Laio and Cossio2019). Pathway-based free energy calculation methods such as free energy perturbation (FEP) have achieved unprecedented accuracy in modelling protein binding with small molecule for a large set of ligands (L. Wang et al., Reference Wang, Wu, Deng, Kim, Pierce, Krilov, Lupyan, Robinson, Dahlgren, Greenwood, Romero, Masse, Knight, Steinbrecher, Beuming, Damm, Harder, Sherman, Brewer and Abel2015). However, directly transferring such approaches to estimate protein–peptide binding free energy is challenging due to the flexibility and size of peptides. Kilburg et al. introduced a single-decoupling alchemical method that successfully calculated the free energy for HIV1-IN binding with a series of cyclic peptides (Kilburg and Gallicchio, Reference Kilburg and Gallicchio2018). The calculation convergence is largely affected by the ladder parameters in Hamiltonian and temperature replica exchange; specifically, more dense parameters are required to increase the overlap between phase space of alchemical states in large ligand system. FEP-based simulations have also been applied to estimate the mutation effect in binding specificity change of PDZ-peptide system (Panel et al., Reference Panel, Villa, Fuentes and Simonson2018) and help select a potent blocker for Kv1.3 channel (Rashid et al., Reference Rashid, Heinzelmann, Huq, Tajhya, Chang, Chhabra, Pennington, Beeton, Norton and Kuyucak2013).

Integrative approaches combine computational modelling with experimental information to determine structures of peptide–protein complexes. Methods such as Rosetta and MODELLER are examples involving different types of modelling strategies. Other methods such as maximum entropy (Pitera and Chodera, Reference Pitera and Chodera2012) or Bayesian inference (MacCallum et al., Reference MacCallum, Perez and Dill2015) aim to identify distributions that agree with experimental data. Analysing such distributions yields the number of states that best represent the data. Our use of the modelling employing limited data approach for peptide binding has been successful for harnessing chemical shift perturbation NMR (Mondal et al., Reference Mondal, Swapna, Hao, Ma, Roth, Montelione and Perez2022)and ALA scan mutagenesis data in predicting conformations of the bound complex (Morrone et al., Reference Morrone, Perez, MacCallum and Dill2017b). Furthermore, its physico-chemical foundation allows the user to recover relative binding-free energies using a competitive binding protocol. These simulations sample peptides conformations in binding while allowing their full flexibility and accurately match experimental results for a series of peptides inhibiting the p53-MDM2 and MDMX interaction (Morrone et al., Reference Morrone, Perez, Deng, Ha, Holloway, Sawyer, Sherborne, Brown and Dill2017a,Reference Morrone, Perez, MacCallum and Dillb).

Unveiling peptide binding/unbinding kinetics through enhanced sampling

Accurate prediction of peptide–protein binding/unbinding kinetics from MD simulations requires extensive sampling of bound/unbound states, the transitions between them and possible intermediate states. However, the structural flexibility of many peptides challenges estimation of association and dissociation rates, represented by kon and koff, respectively, as complicated binding mechanism arises including folding upon binding of peptides to the receptor, and inherent structural heterogeneity arises from weak interactions in the binding interface. Recent simulation studies on PPIs employ various advanced sampling methods with critical thermodynamic and kinetic analysis. Markov state models (MSMs) have been widely applied to estimate kinetic quantities of biomolecular conformational dynamics from a set of short atomistic MD simulations (Chodera and Noé, Reference Chodera and Noé2014). Paul et al. used multi-ensemble Markov models, which combine conventional MD with Hamiltonian replica exchange enhanced sampling simulations, to characterise peptide–protein binding mechanism and kinetics beyond the seconds timescale of a nano-molar peptide inhibitor PMI to the MDM2 receptor (Paul et al., Reference Paul, Wehmeyer, Abualrous, Wu, Crabtree, Schöneberg, Clarke, Freund, Weikl and Noé2017). Zhou et al. studied the p53 binding with MDM2 by running near 1 ms unbiased simulations on a distributed computing platform (G. Zhou et al., Reference Zhou, Pantelopulos, Mukherjee and Voelz2017). Two key intermediate states were identified from a four-state kinetic model using MSM analysis and kon was predicted in good agreement with experimental estimation. Zwier et al. generated hundreds of continuous binding pathways from weighted ensemble simulations and obtained similar on-rate estimates (Zwier et al., Reference Zwier, Pratt, Adelman, Kaus, Zuckerman and Chong2016). In addition, they identified residue F19 from p53 might be a kinetically important residue for binding as the majority of conformations involve its partial or complete burial.

Metadynamics employs biasing potential as a function of collective variables by which the system is allowed to cross high-energy barriers that are conventionally difficult to sample (Bussi and Laio, Reference Bussi and Laio2020). Zou et al. investigated the folding and binding process of p53 to MDM2 using two metadynamics-based methods yielding a reasonable estimation for the on/off-rate constants and the binding-free energy profile (Zou et al., Reference Zou, Zhou, Wang, Kuang, Ågren, Wu and Tu2020). The anchor residues F19 and W23 of p53 were identified to follow the stepwise binding pattern. This finding helps explain certain mutants can be regulated by weak non-native interactions near bound state due to the disorder nature of p53. The consequence of secondary interactions on the binding mechanism was also addressed by extensive unbiased simulations combined with umbrella sampling to perform MSM analysis for a coronavirus-derived peptide, bound to a prevalent MHC receptor in humans (Abella et al., Reference Abella, Antunes, Jackson, Lizée, Clementi and Kavraki2020). The model reaffirms the major role of anchor positions in the peptide for establishing stable interactions and reveals the underestimated importance of a non-anchor position. The conclusion was confirmed by simulating the impact of specific peptide mutations and validated these predictions through competitive binding assays where stark differences in unbinding pathways were identified by comparing the MSM of the wild-type system with those of the D4A and D4P mutants.

Machine Learning

The role of machine learning in structural biology was greatly accelerated by the success of AlphaFold (AF) in the 13th installment of the Critical Assessment for Structure Prediction (CASP) event (Senior et al., Reference Senior, Evans, Jumper, Kirkpatrick, Sifre, Green, Qin, Žídek, Nelson, Bridgland, Penedones, Petersen, Simonyan, Crossan, Kohli, Jones, Silver, Kavukcuoglu and Hassabis2020). Two years later, after the field had replicated all the previous successes, a complete re-design of AF produced even higher accuracy structure predictions that surpassed any previous expectation (Baek et al., Reference Baek, DiMaio, Anishchenko, Dauparas, Ovchinnikov, Lee, Wang, Cong, Kinch, Schaeffer, Millán, Park, Adams, Glassman, DeGiovanni, Pereira, Rodrigues, Dijk, van, Ebrecht, Opperman, Sagmeister, Buhlheller, Pavkov-Keller, Rathinaswamy, Dalwadi, Yip, Burke, Garcia, Grishin, Adams, Read and Baker2021; Jumper et al., Reference Jumper, Evans, Pritzel, Green, Figurnov, Ronneberger, Tunyasuvunakool, Bates, Žídek, Potapenko, Bridgland, Meyer, Kohl, Ballard, Cowie, Romera-Paredes, Nikolov, Jain, Adler and Hassabis2021). The accuracy of such ML predictions is sometimes in better agreement with NMR data than the models generated by standard NMR pipelines (Tejero et al., Reference Tejero, Huang, Ramelot and Montelione2022). Not surprisingly, the field was soon ready to test the limits and possibilities of the AF approach. Early on, adding poly-glycine linker successfully tricked AF into predicting the structures of complexes where the linker remained unstructured. This strategy has produced a level of accuracy for peptide–protein complex structure prediction that surpasses state-of-the-art docking programmes in a recent benchmark test, especially for complexes with binding motifs. A retraining of AF for complexes was soon published online (AF-multimer), but the weights have not been as extensively refined as the ones for the original AF (Evans et al., Reference Evans, O’Neill, Pritzel, Antropova, Senior, Green, Žídek, Bates, Blackwell, Yim, Ronneberger, Bodenstein, Zielinski, Bridgland, Potapenko, Cowie, Tunyasuvunakool, Jain, Clancy and Hassabis2021).

The success of such approaches begs the question of why AF is performing well on complexes such as peptides. One observation points to the structural complementarity between peptides and proteins, where many peptides adopt well-defined secondary structures upon binding. Indeed, peptides that bind as coils are not predicted as well, although generally the binding site is still identified. The field of docking uses scoring functions to rank the different poses and compounds – a strategy that is very successful for small molecules, but which has not reached the same level of maturity for peptides (as described above in the docking section). AF’s pLDDT measure also lacks the possibility of ranking different peptides as different sets of peptides might be predicted with similar pLDDT scores despite very different binding affinities. Other ML approaches directly use structural ensembles and a measure of accuracy such as RMSD to assess the quality of the predicted structures (Townshend et al., Reference Townshend, Eismann, Watkins, Rangan, Karelina, Das and Dror2021). This begs the question of whether AF or other ML algorithms can learn something about the biophysical energy function that governs binding (or folding) and how it can be used towards predicting peptide–protein complex structures.

AlQuraishi’s group first addressed this question using a bespoke Hierarchical Statistical Modelling ML approach to learn the biophysical function that scores multiple peptides binding a receptor motif (Cunningham et al., Reference Cunningham, Koytiger, Sorger and AlQuraishi2020). Unfortunately, lack of data meant that this approach could only be used for eight protein families. Could AF capture such as biophysical function from its training? Recent work from Ovchinnikov’s group suggests that indeed AF has learnt such a function (Roney and Ovchinnikov, Reference Roney and Ovchinnikov2022). According to this study, MSAs serve the purpose of global sampling, focusing the search space in regions of interest, and the biophysical energy learned through the network is able to identify the best local structure. This is especially interesting for peptide–protein systems where the problem can be separated into two parts: (1) a template or MSA for the receptor and (2) a single sequence for the peptide. In this way, the learned function is responsible for finding where the peptide interacts, its conformation and any conformational changes required for the receptor. We put this notion to the test by using competitive binding in AF to determine which peptides had higher affinity to a series of receptors. Surprisingly, the method was very successful in ranking the strongest binders. As the differences in binding affinity become small, the method reflects this uncertainty, and the method is not suitable when both peptides are weak binders. A caveat of the above developments is the requirement to fit within the original hypothesis of peptide–protein complementarity – when the structure of the complex is not correctly predicted, then the competitive binding will also not work.

Turning this argument on its head, the Baker group uses the idealised version of proteins that neural networks learn to design new proteins. The process of deep neural network hallucination has already produced several structures that have been confirmed experimentally (Anishchenko et al., Reference Anishchenko, Pellock, Chidyausiku, Ramelot, Ovchinnikov, Hao, Bafna, Norn, Kang, Bera, DiMaio, Carter, Chow, Montelione and Baker2021). Adding constraints into the hallucination process can direct the design into areas of complementarity or desired functionality. As such, the complementary nature of peptides to receptor binding motifs can lead to the design of peptides or mini-proteins based on constrained hallucination to the known binding site (J. Wang et al., Reference Wang, Lisanza, Juergens, Tischer, Anishchenko, Baek, Watson, Chun, Milles, Dauparas, Expòsit, Yang, Saragovi, Ovchinnikov and Baker2021).

The advances in ML for structural prediction and accurate scoring give rise to the ability to query increasing large libraries of peptides (Chang and Perez, Reference Chang and Perez2022). Along these lines, PPI predictors are also starting to emerge to predict which peptides will interact with a certain protein and give insight into the peptide residues involved in the interaction (Casadio et al., Reference Casadio, Martelli and Savojardo2022; Lei et al., Reference Lei, Li, Liu, Wan, Tian, Li, Zhao and Zeng2021). Furthermore, ML is also offering ways to identify peptide sequences which are likely to have high biological activity against a particular pathology (Wu et al., Reference Wu, Gao, Zhang and Marinis2019) (e.g. anticancer (Chen et al., Reference Chen, Cheong and Siu2021) or antimicrobial (Dee, Reference Dee2022; E. Y. Lee et al., Reference Lee, Lee, Fulan, Ferguson and Wong2017; Plisson et al., Reference Plisson, Ramírez-Sánchez and Martínez-Hernández2020) peptides). Thus, we expect combining ML pipelines that act at the sequence level with those at the structural level will be able to create peptide libraries specific to a type of disease that can then be screened to predict structures and their relative binding affinities. This is a rapidly changing field, where ML is already having a big impact, and where many questions regarding the interpretability and applicability of the current technology need to be answered.

Conclusions

Our focus in this review has been to identify the different approaches that docking, molecular simulations and machine learning use to study peptide–protein systems. We expect new synergies between these three types of technologies will lead to more robust methodologies to capture peptide–protein systems. For example, docking can reliably identify the binding region and might provide good templates for ML to refine, to predict structures and screen peptides for binding-free energies. Meanwhile, the emphasis of new force fields in correctly describing intrinsically disordered peptides together with enhanced sampling can benefit from initial models to determine kinetic constants and binding affinities through an orthogonal approach (on a more limited set of systems to refine). A current limitation in machine learning is the dependence on natural amino acids. MD on the other hand has transferable potentials and can be used as an end stage in peptide optimisation for studying the effect of controlling flexibility (e.g., through chemical staples) or replacing some residues with non-natural amino acids such as peptoids.

Acknowledgement

The authors thank XSEDE resources.

Author contributions

Conceived the review: AP.; Performed research and analysed data: AM, LC.; Wrote manuscript: A.M.,L.C. and AP.

Conflict of interest

The authors declare no conflicts of interest.

Open Peer Review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2022.14.

Footnotes

A.M. and L.C. contributed equally to this work.

References

Abella, JR, Antunes, D, Jackson, K, Lizée, G, Clementi, C and Kavraki, LE (2020) Markov state modeling reveals alternative unbinding pathways for peptide–MHC complexes. Proceedings of the National Academy of Sciences 117(48), 3061030618. https://doi.org/10.1073/pnas.2007246117CrossRefGoogle ScholarPubMed
Abraham, MJ, Murtola, T, Schulz, R, Páll, S, Smith, JC, Hess, B and Lindahl, E (2015) GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1, 1925. https://doi.org/10.1016/j.softx.2015.06.001CrossRefGoogle Scholar
Agrawal, P, Singh, H, Srivastava, HK, Singh, S, Kishore, G and Raghava, GPS (2019) Benchmarking of different molecular docking methods for protein–peptide docking. BMC Bioinformatics 19(Suppl 13), 426. https://doi.org/10.1186/s12859-018-2449-yCrossRefGoogle ScholarPubMed
Alam, N, Goldstein, O, Xia, B, Porter, KA, Kozakov, D and Schueler-Furman, O (2017) High-resolution global peptide–protein docking using fragments-based PIPER-FlexPepDock. PLoS Computational Biology 13(12), e1005905. https://doi.org/10.1371/journal.pcbi.1005905CrossRefGoogle ScholarPubMed
Anishchenko, I, Pellock, SJ, Chidyausiku, TM, Ramelot, TA, Ovchinnikov, S, Hao, J, Bafna, K, Norn, C, Kang, A, Bera, A K, DiMaio, F, Carter, L, Chow, CM, Montelione, GT and Baker, D (2021) De novo protein design by deep network hallucination. Nature, 16. https://doi.org/10.1038/s41586-021-04184-wGoogle ScholarPubMed
Antes, I (2010) DynaDock: a new molecular dynamics-based algorithm for protein–peptide docking including receptor flexibility. Proteins: Structure, Function, and Bioinformatics 78(5), 10841104. https://doi.org/10.1002/prot.22629CrossRefGoogle ScholarPubMed
Antunes, DA, Moll, M, Devaurs, D, Jackson, KR, Lizée, G and Kavraki, LE (2017) DINC 2.0: a new protein–peptide docking webserver using an incremental approach. Cancer Research 77(21), e55e57. https://doi.org/10.1158/0008-5472.can-17-0511CrossRefGoogle ScholarPubMed
Apostolopoulos, V, Bojarska, J, Chai, T-T, Elnagdy, S, Kaczmarek, K, Matsoukas, J, New, R, Parang, K, Lopez, OP, Parhiz, H, Perera, CO, Pickholz, M, Remko, M, Saviano, M, Skwarczynski, M, Tang, Y, Wolf, WM, Yoshiya, T, Zabrocki, J, … Toth, I (2021) A global review on short peptides: Frontiers and perspectives. Molecules 26(2), 430. https://doi.org/10.3390/molecules26020430CrossRefGoogle Scholar
Arkin, MR, Tang, Y and Wells, JA (2014) Small-molecule inhibitors of protein-protein interactions: progressing toward the reality. Chemistry & Biology 21(9), 11021114. https://doi.org/10.1016/j.chembiol.2014.09.001CrossRefGoogle ScholarPubMed
Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G.R., Wang, J., Cong, Q., Kinch, L.N., Schaeffer, R.D., Millán, C., Park, H., Adams, C., Glassman, C.R., DeGiovanni, A., Pereira, J.H., Rodrigues, A.V., Dijk, A.A. van, Ebrecht, A.C., Opperman, D.J., Sagmeister, T., Buhlheller, C., Pavkov-Keller, T., Rathinaswamy, M.K., Dalwadi, U., Yip, C.K., Burke, J.E., Garcia, K.C., Grishin, N.V., Adams, P.D., Read, R.J. and Baker, D.(2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), pp. 871876. https://doi.org/10.1126/science.abj8754.CrossRefGoogle ScholarPubMed
Bechtler, C and Lamers, C (2021) Macrocyclization strategies for cyclic peptides and peptidomimetics. RSC Medicinal Chemistry 12(8), 13251351. https://doi.org/10.1039/d1md00083gCrossRefGoogle ScholarPubMed
Ben-Shimon, A and Eisenstein, M (2010) Computational mapping of anchoring spots on protein surfaces. Journal of Molecular Biology 402(1), 259277. https://doi.org/10.1016/j.jmb.2010.07.021CrossRefGoogle ScholarPubMed
Ben-Shimon, A and Niv, MY (2015) AnchorDock: blind and flexible anchor-driven peptide docking. Structure 23(5), 929940. https://doi.org/10.1016/j.str.2015.03.010CrossRefGoogle ScholarPubMed
Berman, HM, Battistuz, T, Bhat, TN, Bluhm, WF, Bourne, PE, Burkhardt, K, Feng, Z, Gilliland, GL, Iype, L, Jain, S, Fagan, P, Marvin, J, Padilla, D, Ravichandran, V, Schneider, B, Thanki, N, Weissig, H, Westbrook, J and Zardecki, C (2002) The protein data Bank. Acta Crystallographica Section D: Biological Crystallography 58(6), 899907. https://doi.org/10.1107/s0907444902003451CrossRefGoogle ScholarPubMed
Best, RB and Hummer, G (2009) Optimized molecular dynamics force fields applied to the helix−coil transition of polypeptides. The Journal of Physical Chemistry B 113(26), 90049015. https://doi.org/10.1021/jp901540tCrossRefGoogle Scholar
Best, RB, Zheng, W and Mittal, J (2014) Balanced protein–water interactions improve properties of disordered proteins and non-specific protein association. Journal of Chemical Theory and Computation 10(11), 51135124. https://doi.org/10.1021/ct500569bCrossRefGoogle ScholarPubMed
Brooks, BR, Brooks, CL, Mackerell, AD, Nilsson, L, Petrella, RJ, Roux, B, Won, Y, Archontis, G, Bartels, C, Boresch, S, Caflisch, A, Caves, L, Cui, Q, Dinner, AR, Feig, M, Fischer, S, Gao, J, Hodoscek, M, Im, W, … Karplus, M (2009) CHARMM: the biomolecular simulation program. Journal of Computational Chemistry 30(10), 15451614. https://doi.org/10.1002/jcc.21287CrossRefGoogle ScholarPubMed
Bussi, G and Laio, A (2020) Using metadynamics to explore complex free-energy landscapes. Nature Reviews Physics 2(4), 200212. https://doi.org/10.1038/s42254-020-0153-0CrossRefGoogle Scholar
Casadio, R, Martelli, PL and Savojardo, C (2022) Machine learning solutions for predicting protein–protein interactions. Wiley Interdisciplinary Reviews: Computational Molecular Science. https://doi.org/10.1002/wcms.1618Google Scholar
Case, DA, Cheatham, TE, Darden, T, Gohlke, H, Luo, R, Merz, KM, Onufriev, A, Simmerling, C, Wang, B and Woods, RJ (2005) The Amber biomolecular simulation programs. Journal of Computational Chemistry 26(16), 16681688. https://doi.org/10.1002/jcc.20290Google ScholarPubMed
Chang, L and Perez, A (2022) AlphaFold encodes the principles to identify high affinity peptide binders. BioRxiv. https://doi.org/10.1101/2022.03.18.484931Google Scholar
Chaudhury, S, Lyskov, S and Gray, JJ (2010) PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 26(5), 689691. https://doi.org/10.1093/bioinformatics/btq007CrossRefGoogle ScholarPubMed
Chen, J, Cheong, HH and Siu, SWI (2021) xDeep-AcPEP: deep learning method for anticancer peptide activity prediction based on convolutional neural network and multitask learning. Journal of Chemical Information and Modeling 61(8), 37893803. https://doi.org/10.1021/acs.jcim.1c00181CrossRefGoogle ScholarPubMed
Chodera, JD and Noé, F (2014) Markov state models of biomolecular conformational dynamics. Current Opinion in Structural Biology 25, 135144. https://doi.org/10.1016/j.sbi.2014.04.002CrossRefGoogle ScholarPubMed
Ciemny, M, Kurcinski, M, Kamel, K, Kolinski, A, Alam, N, Schueler-Furman, O and Kmiecik, S (2018) Protein–peptide docking: opportunities and challenges. Drug Discovery Today 23(8), 15301537. https://doi.org/10.1016/j.drudis.2018.05.006CrossRefGoogle ScholarPubMed
Cunningham, JM, Koytiger, G, Sorger, PK and AlQuraishi, M (2020) Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nature Methods 17(2), 175183. https://doi.org/10.1038/s41592-019-0687-1CrossRefGoogle ScholarPubMed
Das, AA, Sharma, OP, Kumar, MS, Krishna, R and Mathur, PP (2013) PepBind: a comprehensive database and computational tool for analysis of protein–peptide interactions. Genomics, Proteomics & Bioinformatics 11(4), 241246. https://doi.org/10.1016/j.gpb.2013.03.002CrossRefGoogle ScholarPubMed
Dee, W (2022) LMPred: predicting antimicrobial peptides using pre-trained language models and deep learning. Bioinformatics Advances 2(1). https://doi.org/10.1093/bioadv/vbac021CrossRefGoogle Scholar
Deplazes, E, Davies, J, Bonvin, AMJJ, King, GF and Mark, AE (2016) Combination of ambiguous and unambiguous data in the restraint-driven docking of flexible peptides with HADDOCK: the binding of the spider toxin PcTx1 to the acid sensing ion channel (ASIC) 1a. Journal of Chemical Information and Modeling 56(1), 127138. https://doi.org/10.1021/acs.jcim.5b00529CrossRefGoogle Scholar
Diharce, J, Cueto, M, Beltramo, M, Aucagne, V and Bonnet, P (2019) In silico peptide ligation: iterative residue docking and linking as a New approach to predict protein-peptide interactions. Molecules 24(7), 1351. https://doi.org/10.3390/molecules24071351CrossRefGoogle ScholarPubMed
Donsky, E and Wolfson, HJ (2011) PepCrawler: a fast RRT-based algorithm for high-resolution refinement and binding affinity estimation of peptide inhibitors. Bioinformatics 27(20), 28362842. https://doi.org/10.1093/bioinformatics/btr498CrossRefGoogle ScholarPubMed
Eastman, P, Swails, J, Chodera, JD, McGibbon, RT, Zhao, Y, Beauchamp, KA, Wang, L-P, Simmonett, AC, Harrigan, MP, Stern, CD, Wiewiora, RP, Brooks, BR and Pande, VS (2017) OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS Computational Biology 13(7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659CrossRefGoogle ScholarPubMed
Evans, R, O’Neill, M, Pritzel, A, Antropova, N, Senior, A, Green, T, Žídek, A, Bates, R, Blackwell, S, Yim, J, Ronneberger, O, Bodenstein, S, Zielinski, M, Bridgland, A, Potapenko, A, Cowie, A, Tunyasuvunakool, K, Jain, R, Clancy, E, … Hassabis, D (2021) Protein complex prediction with AlphaFold-Multimer. BioRxiv. https://doi.org/10.1101/2021.10.04.463034Google Scholar
Fosgerau, K and Hoffmann, T (2015) Peptide therapeutics: current status and future directions. Drug Discovery Today 20(1), 122128. https://doi.org/10.1016/j.drudis.2014.10.003Google ScholarPubMed
Frappier, V, Duran, M and Keating, AE (2018) PixelDB: protein–peptide complexes annotated with structural conservation of the peptide binding mode. Protein Science 27(1), 276285. https://doi.org/10.1002/pro.3320CrossRefGoogle ScholarPubMed
Ganesh, AN, Heusser, C, Garad, S and Sánchez-Félix, MV (2021) Patient-centric design for peptide delivery: trends in routes of administration and advancement in drug delivery technologies. Medicine in Drug Discovery 9, 100079. https://doi.org/10.1016/j.medidd.2020.100079CrossRefGoogle Scholar
Geng, C, Narasimhan, S, Rodrigues, JPGLM and Bonvin, AMJJ (2017) Modeling peptide-protein interactions, methods and protocols. Methods in Molecular Biology 1561, 109138. https://doi.org/10.1007/978-1-4939-6798-8_8CrossRefGoogle Scholar
Harder, E, Damm, W, Maple, J, Wu, C, Reboul, M, Xiang, JY, Wang, L, Lupyan, D, Dahlgren, MK, Knight, JL, Kaus, JW, Cerutti, DS, Krilov, G, Jorgensen, WL, Abel, R and Friesner, RA (2016) OPLS3: a force Field providing broad coverage of drug-like small molecules and proteins. Journal of Chemical Theory and Computation 12(1), 281296. https://doi.org/10.1021/acs.jctc.5b00864CrossRefGoogle ScholarPubMed
Hauser, AS and Windshügel, B (2016) LEADS-PEP: a benchmark data set for assessment of peptide docking performance. Journal of Chemical Information and Modeling 56(1), 188200. https://doi.org/10.1021/acs.jcim.5b00234CrossRefGoogle ScholarPubMed
Hornak, V, Abel, R, Okur, A, Strockbine, B, Roitberg, A and Simmerling, C (2006) Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins: Structure, Function, and Bioinformatics 65(3), 712725. https://doi.org/10.1002/prot.21123CrossRefGoogle ScholarPubMed
Hou, T, Wang, J, Li, Y and Wang, W (2011) Assessing the performance of the MM/PBSA and MM/GBSA methods. 1. The accuracy of binding free energy calculations based on molecular dynamics simulations. Journal of Chemical Information and Modeling 51(1), 6982. https://doi.org/10.1021/ci100275aCrossRefGoogle ScholarPubMed
Huang, J and MacKerell, AD (2013) CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. Journal of Computational Chemistry 34(25), 21352145. https://doi.org/10.1002/jcc.23354CrossRefGoogle ScholarPubMed
Huang, J, Rauscher, S, Nawrocki, G, Ran, T, Feig, M, Groot, BL d, Grubmüller, H and MacKerell, AD (2017) CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nature Methods 14(1), 7173. https://doi.org/10.1038/nmeth.4067Google ScholarPubMed
Jiang, F, Zhou, C-Y and Wu, Y-D (2014) Residue-specific force Field based on the protein coil library. RSFF1: modification of OPLS-AA/L. The Journal of Physical Chemistry B 118(25), 69836998. https://doi.org/10.1021/jp5017449CrossRefGoogle ScholarPubMed
Johansson-Åkhe, I, Mirabello, C and Wallner, B (2019) Predicting protein–peptide interaction sites using distant protein complexes as structural templates. Scientific Reports 9(1), 4267. https://doi.org/10.1038/s41598-019-38498-7CrossRefGoogle ScholarPubMed
Johansson-Åkhe, I, Mirabello, C and Wallner, B (2020) InterPep2: global peptide–protein docking using interaction surface templates. Bioinformatics 36(8), 24582465. https://doi.org/10.1093/bioinformatics/btaa005CrossRefGoogle ScholarPubMed
Johansson-Åkhe, I, Mirabello, C and Wallner, B (2021) InterPepRank: assessment of docked peptide conformations by a deep graph network. Frontiers in Bioinformatics 1, 763102. https://doi.org/10.3389/fbinf.2021.763102CrossRefGoogle Scholar
Jumper, J, Evans, R, Pritzel, A, Green, T, Figurnov, M, Ronneberger, O, Tunyasuvunakool, K, Bates, R, Žídek, A, Potapenko, A, Bridgland, A, Meyer, C, Kohl, SAA, Ballard, AJ, Cowie, A, Romera-Paredes, B, Nikolov, S, Jain, R, Adler, J, … Hassabis, D (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873), 583589. https://doi.org/10.1038/s41586-021-03819-2CrossRefGoogle ScholarPubMed
Kaminski, GA, Friesner, RA, Tirado-Rives, J and Jorgensen, WL (2001) Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. The Journal of Physical Chemistry B, 105(28), 64746487. https://doi.org/10.1021/jp003919dCrossRefGoogle Scholar
Khramushin, A, Ben-Aharon, Z and Schueler-Furman, O (2022) Matching protein surface structural patches for high-resolution blind peptide docking. PNAS.CrossRefGoogle ScholarPubMed
Kilburg, D and Gallicchio, E (2018) Assessment of a single decoupling alchemical approach for the calculation of the absolute binding free energies of protein-peptide complexes. Frontiers in Molecular Biosciences 5, 22. https://doi.org/10.3389/fmolb.2018.00022CrossRefGoogle ScholarPubMed
Kozakov, D, Brenke, R, Comeau, SR and Vajda, S (2006) PIPER: An FFT-based protein docking program with pairwise potentials. Proteins: Structure, Function, and Bioinformatics 65(2), 392406. https://doi.org/10.1002/prot.21117CrossRefGoogle ScholarPubMed
Kurcinski, M, Ciemny, MP, Oleniecki, T, Kuriata, A, Badaczewska-Dawid, AE, Kolinski, A and Kmiecik, S (2019) CABS-dock standalone: a toolbox for flexible protein–peptide docking. Bioinformatics 35(20), 41704172. https://doi.org/10.1093/bioinformatics/btz185CrossRefGoogle ScholarPubMed
Kurcinski, M, Jamroz, M, Blaszczyk, M, Kolinski, A and Kmiecik, S (2015) CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site. Nucleic Acids Research 43(Web Server issue), W419W424. https://doi.org/10.1093/nar/gkv456CrossRefGoogle ScholarPubMed
Lagardère, L, Jolly, L-H, Lipparini, F, Aviat, F, Stamm, B, Jing, ZF, Harger, M, Torabifard, H, Cisneros, GA, Schnieders, MJ, Gresh, N, Maday, Y, Ren, PY, Ponder, JW and Piquemal, J-P (2017) Tinker-HP: a massively parallel molecular dynamics package for multiscale simulations of large complex systems with advanced point dipole polarizable force fields. Chemical Science 9(4), 956972. https://doi.org/10.1039/c7sc04531jCrossRefGoogle ScholarPubMed
Lau, JL and Dunn, MK (2018) Therapeutic peptides: historical perspectives, current development trends, and future directions. Bioorganic & Medicinal Chemistry 26(10), 27002707. https://doi.org/10.1016/j.bmc.2017.06.052CrossRefGoogle ScholarPubMed
Lavi, A, Ngan, CH, Movshovitz-Attias, D, Bohnuud, T, Yueh, C, Beglov, D, Schueler-Furman, O and Kozakov, D (2013) Detection of peptide-binding sites on protein surfaces: the first step toward the modeling and targeting of peptide-mediated interactions. Proteins: Structure, Function, and Bioinformatics 81(12), 20962105. https://doi.org/10.1002/prot.24422CrossRefGoogle ScholarPubMed
Lee, AC-L, Harris, JL, Khanna, KK and Hong, J-H (2019) A comprehensive review on current advances in peptide drug development and design. International Journal of Molecular Sciences 20(10), 2383. https://doi.org/10.3390/ijms20102383CrossRefGoogle ScholarPubMed
Lee, EY, Lee, MW, Fulan, BM, Ferguson, AL and Wong, GCL (2017) What can machine learning do for antimicrobial peptides, and what can antimicrobial peptides do for machine learning? Interface Focus 7(6), 20160153. https://doi.org/10.1098/rsfs.2016.0153CrossRefGoogle ScholarPubMed
Lee, H, Heo, L, Lee, MS and Seok, C (2015) GalaxyPepDock: a protein–peptide docking tool based on interaction similarity and energy optimization. Nucleic Acids Research 43(Web Server issue), W431W435. https://doi.org/10.1093/nar/gkv495CrossRefGoogle ScholarPubMed
Lee, H and Seok, C (2017) Modeling peptide-protein interactions, methods and protocols. Methods in Molecular Biology 1561, 3747. https://doi.org/10.1007/978-1-4939-6798-8_4CrossRefGoogle Scholar
Lei, Y, Li, S, Liu, Z, Wan, F, Tian, T, Li, S, Zhao, D and Zeng, J (2021) A deep-learning framework for multi-level peptide–protein interaction prediction. Nature Communications 12(1), 5465. https://doi.org/10.1038/s41467-021-25772-4CrossRefGoogle ScholarPubMed
Lensink, MF, Nadzirin, N, Velankar, S and Wodak, SJ (2020) Modeling protein–protein, protein–peptide, and protein–oligosaccharide complexes: CAPRI 7th edition. Proteins: Structure, Function, and Bioinformatics, 88(8), 916938. https://doi.org/10.1002/prot.25870CrossRefGoogle ScholarPubMed
Lensink, MF, Velankar, S and Wodak, SJ (2017) Modeling protein–protein and protein–peptide complexes: CAPRI 6th edition. Proteins: Structure, Function, and Bioinformatics, 85(3), 359377. https://doi.org/10.1002/prot.25215Google ScholarPubMed
Lindorff-Larsen, K, Piana, S, Palmo, K, Maragakis, P, Klepeis, JL, Dror, RO and Shaw, DE (2010) Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins 78(8), 19501958. https://doi.org/10.1002/prot.22711CrossRefGoogle ScholarPubMed
Liu, H, Song, D, Lu, H, Luo, R and Chen, H (2018) Intrinsically disordered protein-specific force field CHARMM36IDPSFF. Chemical Biology & Drug Design 92(4), 17221735. https://doi.org/10.1111/cbdd.13342CrossRefGoogle ScholarPubMed
London, N, Movshovitz-Attias, D and Schueler-Furman, O (2010) The structural basis of peptide-protein binding strategies. Structure 18(2), 188199. https://doi.org/10.1016/j.str.2009.11.012CrossRefGoogle ScholarPubMed
London, N, Raveh, B and Schueler-Furman, O (2013) Druggable protein–protein interactions – from hot spots to hot segments. Current Opinion in Chemical Biology 17(6), 952959. https://doi.org/10.1016/j.cbpa.2013.10.011CrossRefGoogle ScholarPubMed
MacCallum, JL, Perez, A and Dill, KA (2015) Determining protein structures by combining semireliable data with atomistic physical models by Bayesian inference. Proceedings of the National Academy of Sciences 112(22), 69856990. https://doi.org/10.1073/pnas.1506788112CrossRefGoogle ScholarPubMed
MacKerell, AD, Bashford, D, Bellott, M, Dunbrack, RL, Evanseck, JD, Field, MJ, Fischer, S, Gao, J, Guo, H, Ha, S, Joseph-McCarthy, D, Kuchnir, L, Kuczera, K, Lau, FTK, Mattos, C, Michnick, S, Ngo, T, Nguyen, DT, Prodhom, B, … Karplus, M (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. The Journal of Physical Chemistry B, 102(18), 35863616. https://doi.org/10.1021/jp973084fCrossRefGoogle ScholarPubMed
Mackerell, AD, Feig, M and Brooks, CL (2004) Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. Journal of Computational Chemistry 25(11), 14001415. https://doi.org/10.1002/jcc.20065CrossRefGoogle ScholarPubMed
Maier, JA, Martinez, C, Kasavajhala, K, Wickstrom, L, Hauser, KE and Simmerling, C (2015) ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. Journal of Chemical Theory and Computation 11(8), 36963713. https://doi.org/10.1021/acs.jctc.5b00255CrossRefGoogle ScholarPubMed
Martins, PM, Santos, LH, Mariano, D, Queiroz, FC, Bastos, LL, Gomes, I d S, Fischer, PHC, Rocha, REO, Silveira, SA, Lima, LHF d, Magalhães, MTQ d, Oliveira, MGA and Melo-Minardi, RC d (2021) Propedia: a database for protein–peptide identification based on a hybrid clustering algorithm. BMC Bioinformatics 22(1), 1. https://doi.org/10.1186/s12859-020-03881-zCrossRefGoogle ScholarPubMed
McGuffin, LJ, Bryson, K and Jones, DT (2000) The PSIPRED protein structure prediction server. Bioinformatics 16, 404405. https://doi.org/10.1093/bioinformatics/16.4.404CrossRefGoogle ScholarPubMed
Modeling Peptide–Protein Interactions, Methods and Protocols. (2017). Methods in Molecular Biology. https://doi.org/10.1007/978-1-4939-6798-8CrossRefGoogle Scholar
Mondal, A, Swapna, GVT, Hao, J, Ma, L, Roth, MJ, Montelione, GT and Perez, A (2022) Structure determination of protein–peptide complexes from NMR chemical shift data using MELD. BioRxiv. https://doi.org/10.1101/2021.12.31.474671Google Scholar
Morrone, JA, Perez, A, Deng, Q, Ha, SN, Holloway, MK, Sawyer, TK, Sherborne, BS, Brown, FK and Dill, KA (2017a) Molecular simulations identify binding poses and approximate affinities of stapled α-helical peptides to MDM2 and MDMX. Journal of Chemical Theory and Computation 13(2), 863869. https://doi.org/10.1021/acs.jctc.6b00978Google Scholar
Morrone, JA, Perez, A, MacCallum, J and Dill, KA (2017b) Computed binding of peptides to proteins with MELD-accelerated molecular dynamics. Journal of Chemical Theory and Computation 13(2), 870876. https://doi.org/10.1021/acs.jctc.6b00977CrossRefGoogle Scholar
Mu, J, Liu, H, Zhang, J, Luo, R and Chen, H-F (2021) Recent force field strategies for intrinsically disordered proteins. Journal of Chemical Information and Modeling 61(3), 10371047. https://doi.org/10.1021/acs.jcim.0c01175CrossRefGoogle ScholarPubMed
Obarska-Kosinska, A, Iacoangeli, A, Lepore, R and Tramontano, A (2016) PepComposer: computational design of peptides binding to a given protein surface. Nucleic Acids Research 44(Web Server issue), W522W528. https://doi.org/10.1093/nar/gkw366CrossRefGoogle ScholarPubMed
Ochoa, R, Laio, A and Cossio, P (2019) Predicting the affinity of peptides to major histocompatibility complex class II by scoring molecular dynamics simulations. Journal of Chemical Information and Modeling 59(8), 34643473. https://doi.org/10.1021/acs.jcim.9b00403CrossRefGoogle ScholarPubMed
Panel, N, Villa, F, Fuentes, EJ and Simonson, T (2018) Accurate PDZ/peptide binding specificity with additive and polarizable free energy simulations. Biophysical Journal 114(5), 10911102. https://doi.org/10.1016/j.bpj.2018.01.008CrossRefGoogle ScholarPubMed
Paul, F, Wehmeyer, C, Abualrous, ET, Wu, H, Crabtree, MD, Schöneberg, J, Clarke, J, Freund, C, Weikl, TR and Noé, F (2017) Protein–peptide association kinetics beyond the seconds timescale from atomistic simulations. Nature Communications 8(1), 1095. https://doi.org/10.1038/s41467-017-01163-6CrossRefGoogle ScholarPubMed
Perez, A, MacCallum, JL, Brini, E, Simmerling, C and Dill, KA (2015) Grid-based backbone correction to the ff12SB protein force Field for implicit-solvent simulations. Journal of Chemical Theory and Computation 11(10), 47704779. https://doi.org/10.1021/acs.jctc.5b00662CrossRefGoogle Scholar
Phillips, JC, Braun, R, Wang, W, Gumbart, J, Tajkhorshid, E, Villa, E, Chipot, C, Skeel, RD, Kalé, L and Schulten, K (2005) Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26(16), 17811802. https://doi.org/10.1002/jcc.20289CrossRefGoogle ScholarPubMed
Piana, S, Donchev, AG, Robustelli, P and Shaw, DE (2015) Water dispersion interactions strongly influence simulated structural properties of disordered protein states. The Journal of Physical Chemistry B 119(16), 51135123. https://doi.org/10.1021/jp508971mCrossRefGoogle ScholarPubMed
Piana, S, Lindorff-Larsen, K and Shaw, DE (2011) How robust are protein folding simulations with respect to force Field parameterization? Biophysical Journal 100(9), L47L49. https://doi.org/10.1016/j.bpj.2011.03.051CrossRefGoogle ScholarPubMed
Pitera, JW and Chodera, JD (2012) On the use of experimental observations to bias simulated ensembles. Journal of Chemical Theory and Computation 8(10), 34453451. https://doi.org/10.1021/ct300112vCrossRefGoogle ScholarPubMed
Plisson, F, Ramírez-Sánchez, O and Martínez-Hernández, C (2020) Machine learning-guided discovery and design of non-hemolytic peptides. Scientific Reports 10(1), 16581. https://doi.org/10.1038/s41598-020-73644-6CrossRefGoogle ScholarPubMed
Porter, KA, Xia, B, Beglov, D, Bohnuud, T, Alam, N, Schueler-Furman, O and Kozakov, D (2017) ClusPro PeptiDock: efficient global docking of peptide recognition motifs using FFT. Bioinformatics 33(20), 32993301. https://doi.org/10.1093/bioinformatics/btx216CrossRefGoogle ScholarPubMed
Pu, C, Yan, G, Shi, J and Li, R (2017) Assessing the performance of docking scoring function, FEP, MM-GBSA, and QM/MM-GBSA approaches on a series of PLK1 inhibitors. MedChemComm 8, 14521458. https://doi.org/10.1039/c7md00184cCrossRefGoogle ScholarPubMed
Qiu, Y, Smith, DGA, Boothroyd, S, Jang, H, Hahn, DF, Wagner, J, Bannan, CC, Gokey, T, Lim, VT, Stern, CD, Rizzi, A, Tjanaka, B, Tresadern, G, Lucas, X, Shirts, MR, Gilson, MK, Chodera, JD, Bayly, CI, Mobley, DL and Wang, L-P (2021) Development and benchmarking of open force Field v1.0.0—the parsley small-molecule force Field. Journal of Chemical Theory and Computation 17(10), 62626280. https://doi.org/10.1021/acs.jctc.1c00571CrossRefGoogle ScholarPubMed
Rahman, MU, Rehman, AU, Liu, H and Chen, H-F (2020) Comparison and evaluation of force fields for intrinsically disordered proteins. Journal of Chemical Information and Modeling 60(10), 49124923. https://doi.org/10.1021/acs.jcim.0c00762CrossRefGoogle ScholarPubMed
Rashid, MH, Heinzelmann, G, Huq, R, Tajhya, RB, Chang, SC, Chhabra, S, Pennington, MW, Beeton, C, Norton, RS and Kuyucak, S (2013) A potent and selective peptide blocker of the Kv1.3 channel: prediction from free-energy simulations and experimental confirmation. PLoS One 8(11), e78712. https://doi.org/10.1371/journal.pone.0078712CrossRefGoogle ScholarPubMed
Raveh, B, London, N and Schueler-Furman, O (2010) Sub-angstrom modeling of complexes between flexible peptides and globular proteins. Proteins: Structure, Function, and Bioinformatics 78(9), 20292040. https://doi.org/10.1002/prot.22716CrossRefGoogle ScholarPubMed
Raveh, B, London, N, Zimmerman, L and Schueler-Furman, O (2011) Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors. PLoS One 6(4), e18934. https://doi.org/10.1371/journal.pone.0018934CrossRefGoogle ScholarPubMed
Rentzsch, R and Renard, BY (2015) Docking small peptides remains a great challenge: an assessment using AutoDock Vina. Briefings in Bioinformatics 16(6), 10451056. https://doi.org/10.1093/bib/bbv008CrossRefGoogle ScholarPubMed
Robertson, MJ, Tirado-Rives, J and Jorgensen, WL (2015) Improved peptide and protein torsional energetics with the OPLS-AA force Field. Journal of Chemical Theory and Computation 11(7), 34993509. https://doi.org/10.1021/acs.jctc.5b00356Google Scholar
Robinson, MK, Monroe, JI and Shell, MS (2016) Are AMBER force fields and implicit solvation models additive? A folding study with a balanced peptide test set. Journal of Chemical Theory and Computation 12(11), 56315642. https://doi.org/10.1021/acs.jctc.6b00788CrossRefGoogle ScholarPubMed
Robustelli, P, Piana, S and Shaw, DE (2018) Developing a molecular dynamics force field for both folded and disordered protein states. Proceedings of the National Academy of Sciences of the United States of America 115(21), E4758E4766. https://doi.org/10.1073/pnas.1800690115Google ScholarPubMed
Roney, JP and Ovchinnikov, S (2022) State-of-the-art estimation of protein model accuracy using AlphaFold. BioRxiv. https://doi.org/10.1101/2022.03.11.484043Google Scholar
Saladin, A, Rey, J, Thévenet, P, Zacharias, M, Moroy, G and Tufféry, P (2014) PEP-SiteFinder: a tool for the blind identification of peptide binding sites on protein surfaces. Nucleic Acids Research 42(W1), W221W226. https://doi.org/10.1093/nar/gku404CrossRefGoogle ScholarPubMed
Schindler, CEM, de Vries, SJ and Zacharias, M (2015) Fully blind peptide-protein docking with pepATTRACT. Structure 23(8), 15071515. https://doi.org/10.1016/j.str.2015.05.021CrossRefGoogle ScholarPubMed
Senior, AW, Evans, R, Jumper, J, Kirkpatrick, J, Sifre, L, Green, T, Qin, C, Žídek, A, Nelson, AWR, Bridgland, A, Penedones, H, Petersen, S, Simonyan, K, Crossan, S, Kohli, P, Jones, DT, Silver, D, Kavukcuoglu, K and Hassabis, D (2020) Improved protein structure prediction using potentials from deep learning. Nature 577(7792), 706710. https://doi.org/10.1038/s41586-019-1923-7CrossRefGoogle ScholarPubMed
Song, D, Liu, H, Luo, R and Chen, H-F (2020) Environment-specific force field for intrinsically disordered and ordered proteins. Journal of Chemical Information and Modeling 60(4), 22572267. https://doi.org/10.1021/acs.jcim.0c00059CrossRefGoogle ScholarPubMed
Song, D, Luo, R and Chen, H-F (2017) The IDP-specific force field ff14IDPSFF improves the conformer sampling of intrinsically disordered proteins. Journal of Chemical Information and Modeling 57(5), 11661178. https://doi.org/10.1021/acs.jcim.7b00135CrossRefGoogle ScholarPubMed
Spiliotopoulos, D, Kastritis, PL, Melquiond, ASJ, Bonvin, AMJJ, Musco, G, Rocchia, W and Spitaleri, A (2016) dMM-PBSA: a new HADDOCK scoring function for protein-peptide docking. Frontiers in Molecular Biosciences 3, 46. https://doi.org/10.3389/fmolb.2016.00046CrossRefGoogle ScholarPubMed
Spitzer, R and Jain, AN (2012) Surflex-dock: docking benchmarks and real-world application. Journal of Computer-Aided Molecular Design 26(6), 687699. https://doi.org/10.1007/s10822-011-9533-yCrossRefGoogle ScholarPubMed
Sun, L, Fu, T, Zhao, D, Fan, H and Zhong, S (2021) Divide-and-link peptide docking: a fragment-based peptide docking protocol. Physical Chemistry Chemical Physics 23, 2264722660. https://doi.org/10.1039/d1cp02098fCrossRefGoogle ScholarPubMed
Taherzadeh, G, Zhou, Y, Liew, AW-C and Yang, Y (2017) Structure-based prediction of protein–peptide binding regions using random Forest. Bioinformatics 34(3), 477484. https://doi.org/10.1093/bioinformatics/btx614CrossRefGoogle Scholar
Tao, H, Zhang, Y and Huang, S-Y (2020) Improving protein–peptide docking results via pose-clustering and rescoring with a combined knowledge-based and MM–GBSA scoring function. Journal of Chemical Information and Modeling 60(4), 23772387. https://doi.org/10.1021/acs.jcim.0c00058CrossRefGoogle ScholarPubMed
Taylor, RD, Jewsbury, PJ and Essex, JW (2002) A review of protein-small molecule docking methods. Journal of Computer-Aided Molecular Design 16(3), 151166. https://doi.org/10.1023/a:1020155510718CrossRefGoogle ScholarPubMed
Tejero, R, Huang, YJ, Ramelot, TA and Montelione, GT (2022) AlphaFold models of small proteins rival the accuracy of solution NMR structures. BioRxiv. https://doi.org/10.1101/2022.03.09.483701Google ScholarPubMed
Tian, C, Kasavajhala, K, Belfon, KAA, Raguette, L, Huang, H, Migues, AN, Bickel, J, Wang, Y, Pincay, J, Wu, Q and Simmerling, C (2019) ff19SB: amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. Journal of Chemical Theory and Computation 16(1), 528552. https://doi.org/10.1021/acs.jctc.9b00591CrossRefGoogle ScholarPubMed
Tien, MZ, Sydykova, DK, Meyer, AG and Wilke, CO (2013) PeptideBuilder: a simple python library to generate model peptides. PeerJ 1, e80. https://doi.org/10.7717/peerj.80CrossRefGoogle ScholarPubMed
Townshend, RJL, Eismann, S, Watkins, AM, Rangan, R, Karelina, M, Das, R and Dror, RO (2021) Geometric deep learning of RNA structure. Science 373(6558), 10471051. https://doi.org/10.1126/science.abe5650CrossRefGoogle ScholarPubMed
Trabuco, LG, Lise, S, Petsalaki, E and Russell, RB (2012) PepSite: prediction of peptide-binding sites from protein surfaces. Nucleic Acids Research 40(W1), W423W427. https://doi.org/10.1093/nar/gks398CrossRefGoogle ScholarPubMed
Trellet, M, Melquiond, ASJ and Bonvin, AMJJ (2013) A unified conformational selection and induced fit approach to protein-peptide docking. PLoS One 8(3), e58769. https://doi.org/10.1371/journal.pone.0058769CrossRefGoogle ScholarPubMed
Trellet, M, Melquiond, ASJ and Bonvin, AMJJ (2014) Computational peptidology. Methods in Molecular Biology 1268, 221239. https://doi.org/10.1007/978-1-4939-2285-7_10CrossRefGoogle Scholar
Usmani, SS, Bedi, G, Samuel, JS, Singh, S, Kalra, S, Kumar, P, Ahuja, AA, Sharma, M, Gautam, A and Raghava, GPS (2017) THPdb: database of FDA-approved peptide and protein therapeutics. PLoS One 12(7), e0181748. https://doi.org/10.1371/journal.pone.0181748CrossRefGoogle ScholarPubMed
Vanhee, P, Reumers, J, Stricher, F, Baeten, L, Serrano, L, Schymkowitz, J and Rousseau, F (2010) PepX: a structural database of non-redundant protein–peptide complexes. Nucleic Acids Research 38(Database issue), D545D551. https://doi.org/10.1093/nar/gkp893CrossRefGoogle ScholarPubMed
Verdonk, ML, Cole, JC, Hartshorn, MJ, Murray, CW and Taylor, RD (2003) Improved protein–ligand docking using GOLD. Proteins: Structure, Function, and Bioinformatics 52(4), 609623. https://doi.org/10.1002/prot.10465CrossRefGoogle ScholarPubMed
Vries, SJ d, Rey, J, Schindler, CEM, Zacharias, M and Tuffery, P (2017) The pepATTRACT web server for blind, large-scale peptide–protein docking. Nucleic Acids Research 45(Web Server issue), W361W364. https://doi.org/10.1093/nar/gkx335Google ScholarPubMed
Wan, S, Knapp, B, Wright, DW, Deane, CM and Coveney, PV (2015) Rapid, precise, and reproducible prediction of peptide–MHC binding affinities from molecular dynamics that correlate well with experiment. Journal of Chemical Theory and Computation 11(7), 33463356. https://doi.org/10.1021/acs.jctc.5b00179CrossRefGoogle ScholarPubMed
Wang, E, Sun, H, Wang, J, Wang, Z, Liu, H, Zhang, JZH and Hou, T (2019) End-point binding free energy calculation with MM/PBSA and MM/GBSA: strategies and applications in drug design. Chemical Reviews 119(16), 94789508. https://doi.org/10.1021/acs.chemrev.9b00055CrossRefGoogle Scholar
Wang, J, Alekseenko, A, Kozakov, D and Miao, Y (2019) Improved modeling of peptide-protein binding through global docking and accelerated molecular dynamics simulations. Frontiers in Molecular Biosciences 6, 112. https://doi.org/10.3389/fmolb.2019.00112CrossRefGoogle ScholarPubMed
Wang, J, Lisanza, S, Juergens, D, Tischer, D, Anishchenko, I, Baek, M, Watson, JL, Chun, JH, Milles, LF, Dauparas, J, Expòsit, M, Yang, W, Saragovi, A,Ovchinnikov, S and Baker, D (2021) Deep learning methods for designing proteins scaffolding functional sites. BioRxiv. https://doi.org/10.1101/2021.11.10.468128Google Scholar
Wang, L, Wang, N, Zhang, W, Cheng, X, Yan, Z, Shao, G, Wang, X, Wang, R and Fu, C (2022) Therapeutic peptides: current applications and future directions. Signal Transduction and Targeted Therapy 7(1), 48. https://doi.org/10.1038/s41392-022-00904-4CrossRefGoogle ScholarPubMed
Wang, L, Wu, Y, Deng, Y, Kim, B, Pierce, L, Krilov, G, Lupyan, D, Robinson, S, Dahlgren, MK, Greenwood, J, Romero, DL, Masse, C, Knight, JL, Steinbrecher, T, Beuming, T, Damm, W, Harder, E, Sherman, W, Brewer, M, … Abel, R (2015) Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force Field. Journal of the American Chemical Society 137(7), 26952703. https://doi.org/10.1021/ja512751qGoogle ScholarPubMed
Wang, W, Ye, W, Jiang, C, Luo, R and Chen, H (2014) New force Field on modeling intrinsically disordered proteins. Chemical Biology & Drug Design 84(3), 253269. https://doi.org/10.1111/cbdd.12314CrossRefGoogle ScholarPubMed
Wang, Z, Sun, H, Yao, X, Li, D, Xu, L, Li, Y, Tian, S and Hou, T (2016) Comprehensive evaluation of ten docking programs on a diverse set of protein–ligand complexes: the prediction accuracy of sampling power and scoring power. Physical Chemistry Chemical Physics 18(18), 1296412975. https://doi.org/10.1039/c6cp01555gCrossRefGoogle ScholarPubMed
Webb, B and Sali, A (2014) Comparative protein structure modeling using MODELLER. Current Protocols in Bioinformatics 47(1), 5.6.15.6.32. https://doi.org/10.1002/0471250953.bi0506s47CrossRefGoogle ScholarPubMed
Wen, Z, He, J, Tao, H and Huang, S-Y (2018) PepBDB: A comprehensive structural database of biological peptide–protein interactions. Bioinformatics 35(1), 175177. https://doi.org/10.1093/bioinformatics/bty579Google Scholar
Weng, G, Gao, J, Wang, Z, Wang, E, Hu, X, Yao, X, Cao, D and Hou, T (2020) Comprehensive evaluation of fourteen docking programs on protein–peptide complexes. Journal of Chemical Theory and Computation 16(6), 39593969. https://doi.org/10.1021/acs.jctc.9b01208CrossRefGoogle ScholarPubMed
Weng, G, Wang, E, Chen, F, Sun, H, Wang, Z and Hou, T (2019) Assessing the performance of MM/PBSA and MM/GBSA methods. 9. Prediction reliability of binding affinities and binding poses for protein–peptide complexes. Physical Chemistry Chemical Physics 21(19), 1013510145. https://doi.org/10.1039/c9cp01674kCrossRefGoogle ScholarPubMed
Williamson, MP (2013) Using chemical shift perturbation to characterise ligand binding. Progress in Nuclear Magnetic Resonance Spectroscopy 73, 116. https://doi.org/10.1016/j.pnmrs.2013.02.001Google ScholarPubMed
Wu, C, Gao, R, Zhang, Y and Marinis, YD (2019) PTPD: predicting therapeutic peptides by deep learning and word2vec. BMC Bioinformatics 20(1), 456. https://doi.org/10.1186/s12859-019-3006-zCrossRefGoogle ScholarPubMed
Xu, X and Zou, X (2020) PepPro: a nonredundant structure data set for benchmarking peptide–protein computational docking. Journal of Computational Chemistry 41(4), 362369. https://doi.org/10.1002/jcc.26114CrossRefGoogle ScholarPubMed
Xu, X and Zou, X (2022) Predicting protein–peptide complex structures by accounting for peptide flexibility and the physicochemical environment. Journal of Chemical Information and Modeling 62(1), 2739. https://doi.org/10.1021/acs.jcim.1c00836CrossRefGoogle ScholarPubMed
Yan, C, Xu, X and Zou, X (2016) Fully blind docking at the atomic level for protein-peptide complex structure prediction. Structure 24(10), 18421853. https://doi.org/10.1016/j.str.2016.07.021CrossRefGoogle ScholarPubMed
Yang, S, Liu, H, Zhang, Y, Lu, H and Chen, H (2019) Residue-specific force Field improving the sample of intrinsically disordered proteins and folded proteins. Journal of Chemical Information and Modeling 59(11), 47934805. https://doi.org/10.1021/acs.jcim.9b00647Google ScholarPubMed
Yu, J, Andreani, J, Ochsenbein, F and Guerois, R (2017) Lessons from (co-)evolution in the docking of proteins and peptides for CAPRI rounds 28–35. Proteins: Structure, Function, and Bioinformatics 85(3), 378390. https://doi.org/10.1002/prot.25180CrossRefGoogle ScholarPubMed
Zhang, Y and Sanner, MF (2019) AutoDock CrankPep: Combining folding and docking to predict protein–peptide complexes. Bioinformatics 35(24), 51215127. https://doi.org/10.1093/bioinformatics/btz459CrossRefGoogle ScholarPubMed
Zhou, C-Y, Jiang, F and Wu, Y-D (2015) Residue-specific force Field based on protein coil library. RSFF2: Modification of AMBER ff99SB. The Journal of Physical Chemistry B 119(3), 10351047. https://doi.org/10.1021/jp5064676Google ScholarPubMed
Zhou, G, Pantelopulos, GA, Mukherjee, S and Voelz, VA (2017) Bridging microscopic and macroscopic mechanisms of p53-MDM2 binding with kinetic network models. Biophysical Journal 113(4), 785793. https://doi.org/10.1016/j.bpj.2017.07.009CrossRefGoogle ScholarPubMed
Zhou, P, Jin, B, Li, H and Huang, S-Y (2018) HPEPDOCK: a web server for blind peptide–protein docking based on a hierarchical algorithm. Nucleic Acids Research 46(Web Server issue), W443W450. https://doi.org/10.1093/nar/gky357CrossRefGoogle ScholarPubMed
Zou, R, Zhou, Y, Wang, Y, Kuang, G, Ågren, H, Wu, J and Tu, Y (2020) Free energy profile and kinetics of coupled folding and binding of the intrinsically disordered protein p53 with MDM2. Journal of Chemical Information and Modeling 60(3), 15511558. https://doi.org/10.1021/acs.jcim.9b00920Google ScholarPubMed
Zwier, MC, Pratt, AJ, Adelman, JL, Kaus, JW, Zuckerman, DM and Chong, LT (2016) Efficient atomistic simulation of pathways and calculation of rate constants for a protein–peptide binding process: Application to the MDM2 protein and an intrinsically disordered p53 peptide. The Journal of Physical Chemistry Letters 7(17), 34403445. https://doi.org/10.1021/acs.jpclett.6b01502Google Scholar
Figure 0

Table 1. Summary of the popular protein–peptide complexes datasets that are widely used for testing and benchmarking different docking tool

Figure 1

Table 2. Summary of highlighted templated based docking tools

Figure 2

Fig. 1. Pipeline in popular template-free docking methods. (A) Input peptide conformations are generated in 3 major ways: 1) using peptide builder to generate major 3 conformations (alpha, polyproline II, extended); 2) molecular simulations are used to generate an ensemble of peptide conformations and 3) fragment pickers are used to select peptide fragments in the structural databases based on the peptide sequence. (B) If the binding site known, peptides are guided towards the binding site (local docking); else, peptides explore the whole protein surface (global docking). (C) Ensemble of docked poses. (D) Top score docked model representing the native structure.

Figure 3

Table 3. Summary of highlighted ‘local docking’ tools. Here, acronyms are used as follows: Pstr, protein structure; pseq, peptide sequence; pconf, initial peptide conformation; BB, backbone; SC, sidechain

Figure 4

Table 4. Summary of highlighted ‘global docking’ tools. Here, acronyms are used as follows: Pstr, protein structure; pseq, peptide sequence; BB, backbone; SC, sidechain

Figure 5

Fig. 2. Overview of protein force field development after 2000. Each protein force field is classified by the year of publication, target systems for optimisation (folded, disordered or both) and additional underscores indicating whether it is a modification version of previous force fields using strategies including dihedral parameter adjustment (blue), CMAP correction (red) or parameter modification for protein–water interaction (gold).

Review: Modeling peptide-protein complexes: docking, simulations and machine learning — R0/PR1

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: “Modeling peptide-protein complexes: docking, simulations, and machine learning”

This review explores three main approaches in modeling peptide-protein interactions; molecular docking, molecular dynamic simulations, and machine learning. Overall it is a comprehensive and well written review that provides prospective users with the advantages and limitations of each approach. This kind of reviews is indeed of a great importance to the filed of computational chemistry as it lowers the barrier of entry to newcomers and guide them through choosing the correct approach. I suggest considering the publication of this review after performing the following minor revisions:

1. Molecular docking section: This section is well written and the authors provided a very good literature review of protein peptide-docking tools. My sole concern is that with the large amount of information provided here, the reader may find a difficulty in extracting information easily. Adding summary tables of each section can help the reader extract information efficiently. For example, the authors can provide a table summarizing the main features of each databases and sampling method [or at least the top performing], its pros, cons, and suggested application. Based on the scientific question, the prospective user can pick a database or sampling method that matches their system characteristics and refer to the the details in the main text.

2. Molecular dynamics section:

The authors provided a summary of the evolution and limitation of the current forcefields and explained the applications of different MD simulations methods in studying protein-peptide interactions. Although the authors explained the applications of several end-point free energy methods in protein-ligand/protein- peptide interactions such as MM-PBSA and MM-GBSA,they did not mention examples of using pathway free energy methods especially the highly rigorous “from statistical mechanics point of view” alchemical free energy approaches. Alchemical free energy calculations is a pathway free energy method that compute the binding free energy difference between the bound and unbound states of the peptide-protein complex by linking them through a suitable thermodynamic non physical “alchemical” path. Although the application of this approach in protein-peptide interaction is less common and more challenging than other end-point methods, it is worth shedding the light on the few successful examples of this approach as it will make this review more comprehensive [please refer to the following examples]:

1- Kilburg, D., & Gallicchio, E. (2018). Assessment of a single decoupling alchemical approach for the calculation of the absolute binding free energies of protein-peptide complexes. Frontiers in molecular biosciences, 5, 22.

2- Rashid, M. H., Heinzelmann, G., Huq, R., Tajhya, R. B., Chang, S. C., Chhabra, S.,& Kuyucak, S. (2013). A potent and selective peptide blocker of the Kv1. 3 channel: prediction from free-energy simulations and experimental confirmation. PloS one, 8(11), e78712.

3- Panel, N., Villa, F., Fuentes, E. J., & Simonson, T. (2018). Accurate PDZ/peptide binding specificity with additive and polarizable free energy simulations. Biophysical journal, 114(5), 1091-1102.

3. Machine Learning: No revision suggested.

Review: Modeling peptide-protein complexes: docking, simulations and machine learning — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: Brief Summary:

Firstly, I would like to thank the Editorial Board of Quarterly Reviews of Biophysics, Discovery, for the opportunity to review this article.

The review focuses on current computational approaches to analyze peptide-protein interactions. Due to the highly flexible nature of peptides, the problem of determining the peptide-protein complex is nontrivial, and is crucial for subsequent analysis of interactions. As such, the review primarily discusses modeling peptide-protein complexes, starting with docking-based methods, continuing with the molecular-dynamics based refinements and integrative modeling platforms, and concluding with the recent breakthroughs in machine learning-based techniques. The authors also comment - in the 'MD Section' - on using molecular dynamics, in particular enhanced sampling methods, to study binding and unbinding kinetics. Ultimately, this review gives the reader a good foundation of the current tools available to both sample bound configurations of peptides with their receptors, and to generally investigate peptide-protein interactions.

Major Comments:

The review is extensive, demonstrating the rather large breadth of resources available for modeling peptides and peptide-protein complexes. Therefore, the reader can make informed decisions on which docking programs to use, whether to use an MD-based refinement strategy, and further which force fields to use for said refinement. Additionally, the authors do not shy away from discussing the limitations of the tools presented, and difficulties in the field of peptide studies generally. Due to these successes, my comments have largely been focused on discussing clarity of the review and presentation.

This review discusses numerous tools for studying peptide-protein complexes, especially the available docking and scoring programs. As a recommendation, if possible, it may be beneficial to add tables to list the common programs, the subcategories they fall in (e.g. template-based docking tool) and where they find the greatest success - if unambiguous. This will make it easy for the reader to organize the information and to quickly identify which tools they'd like to study further. It may also allow the main text to discuss the tools in greater detail.

The 'Summary' subsection of the 'Docking Section' was especially appreciated, as it tied together the salient points of docking and scoring. That being said, the bulk of the first paragraph (lines 484 to 491) would seem better placed in the beginning of the 'Scoring' section. The 'Summary' seems more appropriately initiated by line 495.

Minor Comments / Grammar:

Some of the paragraphs of this review could use transition sentences to switch topics in a coherent fashion.

It is highly recommended that the paper be analyzed for grammatical errors and typos.

All in all, this review would certainly benefit those who'd like to know more about the available tools to analyze peptide-protein complexes, thank you for your time.

Recommendation: Modeling peptide-protein complexes: docking, simulations and machine learning — R0/PR3

Comments

Comments to Author: Reviewer #1: "Modeling peptide-protein complexes: docking, simulations, and machine learning"

This review explores three main approaches in modeling peptide-protein interactions; molecular docking, molecular dynamic simulations, and machine learning. Overall it is a comprehensive and well written review that provides prospective users with the advantages and limitations of each approach. This kind of reviews is indeed of a great importance to the filed of computational chemistry as it lowers the barrier of entry to newcomers and guide them through choosing the correct approach. I suggest considering the publication of this review after performing the following minor revisions:

1.Molecular docking section: This section is well written and the authors provided a very good literature review of protein peptide-docking tools. My sole concern is that with the large amount of information provided here, the reader may find a difficulty in extracting information easily. Adding summary tables of each section can help the reader extract information efficiently. For example, the authors can provide a table summarizing the main features of each databases and sampling method [or at least the top performing], its pros, cons, and suggested application. Based on the scientific question, the prospective user can pick a database or sampling method that matches their system characteristics and refer to the the details in the main text.

2.Molecular dynamics section:

The authors provided a summary of the evolution and limitation of the current forcefields and explained the applications of different MD simulations methods in studying protein-peptide interactions. Although the authors explained the applications of several end-point free energy methods in protein-ligand/protein- peptide interactions such as MM-PBSA and MM-GBSA,they did not mention examples of using pathway free energy methods especially the highly rigorous “from statistical mechanics point of view” alchemical free energy approaches. Alchemical free energy calculations is a pathway free energy method that compute the binding free energy difference between the bound and unbound states of the peptide-protein complex by linking them through a suitable thermodynamic non physical “alchemical” path. Although the application of this approach in protein-peptide interaction is less common and more challenging than other end-point methods, it is worth shedding the light on the few successful examples of this approach as it will make this review more comprehensive [please refer to the following examples]:

1- Kilburg, D., & Gallicchio, E. (2018). Assessment of a single decoupling alchemical approach for the calculation of the absolute binding free energies of protein-peptide complexes. Frontiers in molecular biosciences, 5, 22.

2- Rashid, M. H., Heinzelmann, G., Huq, R., Tajhya, R. B., Chang, S. C., Chhabra, S.,& Kuyucak, S. (2013). A potent and selective peptide blocker of the Kv1. 3 channel: prediction from free-energy simulations and experimental confirmation. PloS one, 8(11), e78712.

3- Panel, N., Villa, F., Fuentes, E. J., & Simonson, T. (2018). Accurate PDZ/peptide binding specificity with additive and polarizable free energy simulations. Biophysical journal, 114(5), 1091-1102.

3. Machine Learning: No revision suggested.

Reviewer #2: Brief Summary:

Firstly, I would like to thank the Editorial Board of Quarterly Reviews of Biophysics, Discovery, for the opportunity to review this article.

The review focuses on current computational approaches to analyze peptide-protein interactions. Due to the highly flexible nature of peptides, the problem of determining the peptide-protein complex is nontrivial, and is crucial for subsequent analysis of interactions. As such, the review primarily discusses modeling peptide-protein complexes, starting with docking-based methods, continuing with the molecular-dynamics based refinements and integrative modeling platforms, and concluding with the recent breakthroughs in machine learning-based techniques. The authors also comment - in the 'MD Section' - on using molecular dynamics, in particular enhanced sampling methods, to study binding and unbinding kinetics. Ultimately, this review gives the reader a good foundation of the current tools available to both sample bound configurations of peptides with their receptors, and to generally investigate peptide-protein interactions.

Major Comments:

The review is extensive, demonstrating the rather large breadth of resources available for modeling peptides and peptide-protein complexes. Therefore, the reader can make informed decisions on which docking programs to use, whether to use an MD-based refinement strategy, and further which force fields to use for said refinement. Additionally, the authors do not shy away from discussing the limitations of the tools presented, and difficulties in the field of peptide studies generally. Due to these successes, my comments have largely been focused on discussing clarity of the review and presentation.

This review discusses numerous tools for studying peptide-protein complexes, especially the available docking and scoring programs. As a recommendation, if possible, it may be beneficial to add tables to list the common programs, the subcategories they fall in (e.g. template-based docking tool) and where they find the greatest success - if unambiguous. This will make it easy for the reader to organize the information and to quickly identify which tools they'd like to study further. It may also allow the main text to discuss the tools in greater detail.

The 'Summary' subsection of the 'Docking Section' was especially appreciated, as it tied together the salient points of docking and scoring. That being said, the bulk of the first paragraph (lines 484 to 491) would seem better placed in the beginning of the 'Scoring' section. The 'Summary' seems more appropriately initiated by line 495.

Minor Comments / Grammar:

Some of the paragraphs of this review could use transition sentences to switch topics in a coherent fashion.

It is highly recommended that the paper be analyzed for grammatical errors and typos.

All in all, this review would certainly benefit those who'd like to know more about the available tools to analyze peptide-protein complexes, thank you for your time.

Recommendation: Modeling peptide-protein complexes: docking, simulations and machine learning — R1/PR4

Comments

No accompanying comment.

Recommendation: Modeling peptide-protein complexes: docking, simulations and machine learning — R2/PR5

Comments

No accompanying comment.