Introduction
The mitochondrial megacomplex produces most of the energy in the human body (Stroud et al., 2016; Wu et al., 2016; Guo et al., 2017). The respiratory chain complexes (RCCs), namely Complex I (CI), Complex II (CII), Complex III (CIII), and Complex IV (CIV), are located in the inner mitochondrial membrane and are critical in energy conversion. Complex I (NADH:ubiquinone oxidoreductase) is the entry point for electrons into the RCCs, where two electrons from NADH are transferred to quinone (Berrisford and Sazanov, 2009; Efremov and Sazanov, 2012; Hirst, 2013). Then, Complex I, Complex III (cytochrome bc1 complex), and Complex IV (cytochrome c oxidase) couple electron transfer to proton pumping, with Complex I using the reduction potential of NADH to drive four protons across the inner membrane, leading to ATP synthesis in Complex V (CV) (Berrisford and Sazanov, 2009; Hirst, 2013; Vinothkumar et al., 2014; Zickermann et al., 2015; Zhu et al., 2016).
The L-shaped Complex I is one of the largest multi-subunit membrane protein complexes, with 45 subunits (Mimaki et al., 2012; Stroud et al., 2016; Wu et al., 2016; Zhu et al., 2016) organized into three modules (Efremov et al., 2010; Wirth et al., 2016). The NADH oxidation module (N module) and the ubiquinone (Q) reduction module (Q module) form the peripheral arm, and the proton translocation module (P module), divided into proximal (PP) and distal (PD) parts, forms the membrane arm (Sharma et al., 2009; Parey et al., 2021). The hydrophobic membrane arm, which contains the mtDNA-encoded subunits, is embedded in the inner mitochondrial membrane, where the subunits are stabilized by tightly bound lipids (Fiedorczuk et al., 2016). The membrane arm includes three highly hydrophobic subunits, ND2, ND4, and ND5, each containing around 15 transmembrane domains (Mimaki et al., 2012). These three antiporter-like subunits are largely responsible for proton pumping.
Because Complex I is an integral part of the RCCs, its dysfunction impairs oxidative phosphorylation and reduces ATP synthesis. These impairments disrupt metabolic processes and lead to diseases including Alzheimer’s and Parkinson’s diseases, Friedreich’s ataxia, amyotrophic lateral sclerosis, Hurthle cell thyroid carcinoma, Leber’s hereditary optic neuropathy, and Leigh syndrome, among others (Distelmaier et al., 2009; Guo et al., 2017; McGregor et al., 2023; Menezes et al., 2014; Rodenburg, 2016; Sharma et al., 2009). In addition, Complex I has been identified as a major source of reactive oxygen species, which can damage mitochondrial DNA and contribute to aging.
Our study focuses on the 20 inner membrane proteins of Complex I that have direct medical relevance: NDUA1, NDUA3, NDUAB, NDUAD, NDUB1, NDUB3, NDUB4, NDUB5, NDUB6, NDUB8, NDUBB, NDUC1, NDUC2, NU1M, NU2M, NU3M, NU4M, NU5M, NU6M, and NU4LM (Table 1). The non-membrane proteins in the megacomplex are outside the scope of the current study.
Table 1. The protein names, UniProt IDs, and CryoEM structures (Å) with PDB IDs

Note: The lists of tissue location, medical relevance, and function are not exhaustive; they will need updating as results from more recent studies become available.
Traditionally, researchers have used X-ray crystallography and NMR spectroscopy to study protein structures. Recently, high-resolution cryo-electron microscopy (CryoEM) has become the mainstream method for studying protein structures at near-atomic resolution by freezing the target specimen at the temperature of liquid nitrogen or liquid helium (Henderson et al., 2011; Milne et al., 2013; Vinothkumar and Henderson, 2016). In our study, the baseline native structure is the CryoEM structure of the megacomplex at 3.70Å resolution (Guo et al., 2017).
Despite these advancements, studying the structure and function of these multi-subunit membrane proteins remains challenging because detergents are needed to solubilize the proteins once they are isolated from the hydrophobic membrane environment. This process is often complicated and time-consuming, and it must be completed before a high-resolution structure can be obtained (Carpenter et al., 2008; Vinothkumar and Henderson, 2010).
Current computational efforts to design soluble proteins include ProteinMPNN, which uses message-passing neural networks to predict and design an amino acid sequence that would fold into a desired shape. ProteinMPNN yields better results than Rosetta in predicting the hydrophobic amino acids for a given protein backbone (Dauparas et al., 2022). Recently, researchers built on ProteinMPNN to devise SolubleMPNN, which is trained only on soluble proteins and was applied to engineer soluble variants of bacteriorhodopsin, successfully converting a membrane protein into a soluble one while maintaining its core function and ligand-binding ability (Nikolaev et al., 2024). A more general approach to the computational design of soluble membrane proteins was also explored by using ProteinMPNN on AlphaFold 2-generated structures, which produced soluble analogs of both the rhomboid protease fold and the seven-helix GPCR fold (Goverde et al., 2024).
Instead of taking a computational design approach, we applied the QTY code to systematically engineer water-soluble analogs of membrane proteins with reduced hydrophobicity. The QTY concept was inspired by high-resolution (1.5Å) electron density maps, which revealed structural similarities between hydrophobic and polar amino acids: leucine (L) vs glutamine (Q); isoleucine (I)/valine (V) vs threonine (T); and phenylalanine (F) vs tyrosine (Y) (Zhang et al., 2018; Tegler et al., 2020; Zhang and Egli, 2022). In our previous experiments, using the simple and straightforward QTY code, we successfully bioengineered detergent-free chemokine receptors (Zhang et al., 2018; Qing et al., 2019; Tegler et al., 2020), cytokine receptors (Hao et al., 2020), and a bacterial histidine kinase (Li et al., 2024). After these detergent-free membrane proteins were expressed and purified, the QTY analogs demonstrated structural stability, retained their ligand-binding capabilities, and kept four enzymatic activities intact, making them ideal candidates for further studies and for use as antigens to generate therapeutic monoclonal antibodies (mAbs).
Google’s DeepMind released the breakthrough AlphaFold 2 in 2021 (Jumper et al., 2021; Jumper and Hassabis, 2022) and deposited over 214 million AlphaFold 2-predicted protein structures at the European Bioinformatics Institute (EBI) (Tunyasuvunakool et al., 2021). We previously used AlphaFold 2 to predict the structures of QTY analogs of membrane proteins. The QTY code was applied to 7 chemokine receptors (Skuhersky et al., 2021), human olfactory receptors (Johnsson et al., 2024), glucose transporters (Smorodina et al., 2022b), solute carrier transporters (Smorodina et al., 2022a), ABC transporters (Pan et al., 2024), and neurological transporters including the serotonin, norepinephrine, and dopamine transporters (Karagöl, Karagöl, Smorodina and Zhang, 2024) and another synaptic vesicle protein subgroup, the glutamate transporters (VGLUTs) (Karagöl, Karagöl and Zhang, 2024).
We also designed reverse QTY analogs of human serum albumin to effectively facilitate the release of antitumor drugs in mice (Meng et al., 2023). The water-soluble chemokine receptor CXCR4QTY analog has been successfully used in biomimetic sensors (Qing et al., 2023). We also used AlphaFold 2 to predict QTY analogs of beta-sheet-rich antibody IgG (Li et al., 2023), bacterial beta-barrel proteins (Sajeev-Sheeja et al., 2023), and beta-barrel enzymes (Sajeev-Sheeja and Zhang, 2024).
In May 2024, AlphaFold was upgraded to AlphaFold 3, which features an enhanced diffusion-based architecture that enables accurate prediction of the structures of protein complexes. Additionally, AlphaFold 3 extends beyond protein structure prediction to complexes that contain DNA, RNA, and small-molecule ligands alongside proteins (Abramson et al., 2024). Notably, it can model interactions between odorants and the human olfactory receptor OR1A2, as well as between spermidine and the trace amine receptor TAAR9 (Johnsson et al., 2024).
To build on our previous studies and make use of AlphaFold 3’s advanced capabilities, we used AlphaFold 3 to test the structural stability of the QTY analog megacomplex of the human mitochondrial respiratory system. In addition, we conducted bioinformatic studies using AlphaFold 3 to predict the protein–protein interactions of the QTY analogs and compare them with those of the native structures. Here, we report the structural bioinformatic studies of the experimentally determined Complex I and its AlphaFold 3-predicted water-soluble QTY analog. We also provide the superpositions of native and QTY analog proteins, their surface hydrophobicity analyses, and finally the protein–protein interaction analyses of the hydrophobic native Complex I megacomplex and its hydrophilic QTY analog.
Results and discussion
The rationale of the QTY Code
The hydrophobic nature of membrane proteins makes it challenging to study their structure and function. We asked whether it is possible to systematically exchange the hydrophobic amino acids for hydrophilic ones to make these membrane proteins more water-soluble. The structural similarities between the electron density maps of Q and L, T and V/I, and Y and F make such a systematic replacement possible: leucine (L) is replaced with glutamine (Q), isoleucine (I) and valine (V) with threonine (T), and phenylalanine (F) with tyrosine (Y). Although the substitutions change the protein sequence and amino acid composition, the QTY analogs show reduced hydrophobic surfaces and have isoelectric points (pI) and molecular weights (MW) similar to those of the native transmembrane proteins (Table 2).
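The substitution rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the function name `apply_qty`, the 0-based half-open `tm_segments` convention, and the toy sequence are all hypothetical, and the transmembrane boundaries would have to come from an annotation or a topology predictor.

```python
# Minimal sketch of the QTY code: L -> Q, I -> T, V -> T, F -> Y,
# applied only inside the transmembrane (TM) segments supplied by the caller.
QTY_MAP = str.maketrans({"L": "Q", "I": "T", "V": "T", "F": "Y"})


def apply_qty(sequence, tm_segments):
    """Return the QTY analog of `sequence`.

    tm_segments is a list of (start, end) pairs, 0-based and half-open,
    marking the TM helices (a hypothetical input convention).
    """
    residues = list(sequence)
    for start, end in tm_segments:
        # Translate only the residues that fall inside this TM segment.
        residues[start:end] = sequence[start:end].translate(QTY_MAP)
    return "".join(residues)


# Toy example: residues 2..5 ("LIVF") are treated as a TM segment.
print(apply_qty("MKLIVFAG", [(2, 6)]))
```

Residues outside the listed segments are left unchanged, which mirrors the restriction of the substitutions to the transmembrane helices.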
Table 2. The characteristics of integral membrane protein enzymes and their QTY analogs

Note: The twenty membrane proteins are listed in the same order as in Figure 1. RMSDs were calculated after the residues missing from the native CryoEM-determined structures (unstructured loops) and the corresponding residues in the predicted QTY structures were removed. If the native protein was a dimer, one monomer was also removed. The QTY amino acid substitutions in the transmembrane (TM) domains are substantial, between 26.09% and 66.67%, whereas the overall sequence changes are between 4.90% and 37.36%.
Abbreviations: pI, isoelectric point; MW, molecular weight; TM, transmembrane; –, not applicable; and RMSD, root mean square deviation.
Protein sequence alignments and other characteristics
The protein sequences of the twenty mitochondrial proteins are aligned with their QTY analogs (Figure 1). The QTY substitution of the twenty proteins resulted in overall changes to their amino acid composition of 4.90% to 37.36% and changes in the transmembrane domains of 26.09% to 66.67%. Despite these changes to the sequence and composition, the pI changed only slightly because Q (glutamine), T (threonine), and Y (tyrosine) are uncharged; the substitutions introduced by the QTY code do not add any basic or acidic amino acids. The MW of the proteins increased slightly due to the replacement of leucine (L: 131.17 Da) with glutamine (Q: 146.14 Da), isoleucine (I: 131.17 Da) and valine (V: 117.15 Da) with threonine (T: 119.12 Da), and phenylalanine (F: 165.19 Da) with tyrosine (Y: 181.19 Da).
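The variation percentages and the mass change can be reproduced with simple arithmetic. The sketch below uses only the amino acid masses quoted in the paragraph above; the function names and the toy sequences are hypothetical and are not taken from Table 2.

```python
# Amino acid masses (Da) as quoted in the text for the residues the QTY code touches.
MASS_DA = {"L": 131.17, "Q": 146.14, "I": 131.17, "V": 117.15,
           "T": 119.12, "F": 165.19, "Y": 181.19}


def variation_percent(native, analog):
    """Percentage of aligned positions that differ between two equal-length sequences."""
    if len(native) != len(analog):
        raise ValueError("sequences must have the same length")
    changed = sum(a != b for a, b in zip(native, analog))
    return 100.0 * changed / len(native)


def mass_shift_da(native, analog):
    """Net mass change (Da) summed over the substituted positions."""
    return sum(MASS_DA[b] - MASS_DA[a]
               for a, b in zip(native, analog) if a != b)


def alignment_line(native, analog):
    """'|' for identical residues and '*' for substituted ones."""
    return "".join("|" if a == b else "*" for a, b in zip(native, analog))


native, analog = "MKLIVFAG", "MKQTTYAG"   # toy sequences
print(alignment_line(native, analog))      # ||****||
print(variation_percent(native, analog))   # 50.0
print(round(mass_shift_da(native, analog), 2))
```

With these masses, L to Q adds 14.97 Da, V to T adds 1.97 Da, and F to Y adds 16.00 Da, whereas I to T removes 12.05 Da, so the net shift per protein depends on how many of each residue are substituted.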

Figure 1. Protein sequence alignments of twenty integral membrane enzymes with their water-soluble QTY analogs. The symbols | and * indicate whether amino acids are identical or different, respectively. Please note the Q, T, and Y amino acids (red) replacing L, I/V, and F, respectively. The alpha helices (blue) are shown above the protein sequences. The characteristics listed for the native proteins and their QTY analogs are isoelectric point (pI), molecular weight (MW), total variation %, and transmembrane variation %. The alignments are: a) NDUA1 vs NDUA1QTY, b) NDUA3 vs NDUA3QTY, c) NDUAB vs NDUABQTY, d) NDUAD vs NDUADQTY, e) NDUB1 vs NDUB1QTY, f) NDUB3 vs NDUB3QTY, g) NDUB4 vs NDUB4QTY, h) NDUB5 vs NDUB5QTY, i) NDUB6 vs NDUB6QTY, j) NDUB8 vs NDUB8QTY, k) NDUBB vs NDUBBQTY, l) NDUC1 vs NDUC1QTY, m) NDUC2 vs NDUC2QTY, n) NU1M vs NU1MQTY, o) NU2M vs NU2MQTY, p) NU3M vs NU3MQTY, q) NU4M vs NU4MQTY, r) NU5M vs NU5MQTY, s) NU6M vs NU6MQTY, and t) NU4LM vs NU4LMQTY. Although there are substantial QTY changes in the TM alpha helices (26.09%–66.67%), the changes in MW and pI are minor. The alignment panels in Figure 1 are too small to read in detail; for enlarged individual panels, please see Supplementary Information.
Superpositions of native CryoEM transmembrane enzymes and their water-soluble QTY analogs
We asked whether the molecular structures of the twenty proteins in mitochondrial Complex I remain similar after the QTY substitution is applied (Figure 2). The native structures of the mitochondrial complex were determined experimentally using CryoEM (PDB: 5XTC). The structures of the QTY analogs were predicted using AlphaFold 3. The superpositions of the transmembrane enzymes and their respective QTY analogs are: NDUA1 vs NDUA1QTY, NDUA3 vs NDUA3QTY, NDUAB vs NDUABQTY, NDUAD vs NDUADQTY, NDUB1 vs NDUB1QTY, NDUB3 vs NDUB3QTY, NDUB4 vs NDUB4QTY, NDUB5 vs NDUB5QTY, NDUB6 vs NDUB6QTY, NDUB8 vs NDUB8QTY, NDUBB vs NDUBBQTY, NDUC1 vs NDUC1QTY, NDUC2 vs NDUC2QTY, NU1M vs NU1MQTY, NU2M vs NU2MQTY, NU3M vs NU3MQTY, NU4M vs NU4MQTY, NU5M vs NU5MQTY, NU6M vs NU6MQTY, and NU4LM vs NU4LMQTY (Figure 2).

Figure 2. Superpositions of twenty human CryoEM-determined structures of membrane enzymes and their AlphaFold 3-predicted water-soluble QTY analogs. The CryoEM-determined structures of the native proteins were obtained from the Protein Data Bank (PDB). The CryoEM structures (magenta) are superposed with their QTY analogs (cyan) predicted by AlphaFold 3. These superposed structures show that the membrane proteins and their QTY analogs have very similar structures. For clarity of direct comparison, the residues corresponding to unstructured loops missing from the CryoEM structures were removed from the QTY analogs. a) NDUA1 vs NDUA1QTY, b) NDUA3 vs NDUA3QTY, c) NDUAB vs NDUABQTY, d) NDUAD vs NDUADQTY, e) NDUB1 vs NDUB1QTY, f) NDUB3 vs NDUB3QTY, g) NDUB4 vs NDUB4QTY, h) NDUB5 vs NDUB5QTY, i) NDUB6 vs NDUB6QTY, j) NDUB8 vs NDUB8QTY, k) NDUBB vs NDUBBQTY, l) NDUC1 vs NDUC1QTY, m) NDUC2 vs NDUC2QTY, n) NU1M vs NU1MQTY, o) NU2M vs NU2MQTY, p) NU3M vs NU3MQTY, q) NU4M vs NU4MQTY, r) NU5M vs NU5MQTY, s) NU6M vs NU6MQTY, and t) NU4LM vs NU4LMQTY.
The structures of the native mitochondrial proteins superposed well with their QTY analogs, with root mean square deviations (RMSD) ranging from 0.315Å to 1.302Å, with the exception of NDUB1, which has a higher RMSD of 2.316Å (Table 2). Overall, the low RMSDs indicate both the capability of AlphaFold 3 to predict the structures of novel protein designs and the minimal structural change in the QTY analogs compared to their native counterparts.
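For readers who want to reproduce the RMSD values, the calculation itself is short once the two structures have been superposed. The sketch below assumes that the coordinates are already aligned (for example, by a structure viewer) and that both lists contain the same atoms in the same order; it does not perform the optimal superposition itself, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input, e.g. Å)
    between two equally long lists of already-superposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: every atom is displaced by exactly 1 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
analog = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(native, analog))
```

Because unresolved loops are absent from the CryoEM structures, the corresponding residues must be removed from the predicted model before the two coordinate lists are paired, as described in the note to Table 2.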
Superpositions of AlphaFold 3-predicted native transmembrane enzymes and their water-soluble QTY analogs
We also asked how well the AlphaFold 3-predicted native mitochondrial membrane proteins superpose with their QTY analogs (Figure 3). Most structures superposed very well, with low RMSDs (Figure 3): a) NDUA1 vs NDUA1QTY (RMSD = 0.637Å), b) NDUA3 vs NDUA3QTY (RMSD = 0.400Å), c) NDUAB vs NDUABQTY (RMSD = 0.374Å), d) NDUAD vs NDUADQTY (RMSD = 0.570Å), e) NDUB1 vs NDUB1QTY (RMSD = 2.180Å), f) NDUB3 vs NDUB3QTY (RMSD = 1.110Å), g) NDUB4 vs NDUB4QTY (RMSD = 0.687Å), h) NDUB5 vs NDUB5QTY (RMSD = 0.511Å), i) NDUB6 vs NDUB6QTY (RMSD = 3.127Å), j) NDUB8 vs NDUB8QTY (RMSD = 0.773Å), k) NDUBB vs NDUBBQTY (RMSD = 0.478Å), l) NDUC1 vs NDUC1QTY (RMSD = 1.283Å), m) NDUC2 vs NDUC2QTY (RMSD = 0.184Å), n) NU1M vs NU1MQTY (RMSD = 0.308Å), o) NU2M vs NU2MQTY (RMSD = 0.390Å), p) NU3M vs NU3MQTY (RMSD = 0.837Å), q) NU4M vs NU4MQTY (RMSD = 0.270Å), r) NU5M vs NU5MQTY (RMSD = 0.262Å), s) NU6M vs NU6MQTY (RMSD = 0.541Å), and t) NU4LM vs NU4LMQTY (RMSD = 0.528Å).

Figure 3. Superpositions of AlphaFold 3-predicted structures of native and their QTY enzyme analogs. Color code: green = AlphaFold 3-predicted native structures; cyan = AlphaFold 3-predicted water-soluble QTY analogs. a) NDUA1 vs NDUA1QTY (RMSD = 0.637Å), b) NDUA3 vs NDUA3QTY (RMSD = 0.400Å), c) NDUAB vs NDUABQTY (RMSD = 0.374Å), d) NDUAD vs NDUADQTY (RMSD = 0.570Å), e) NDUB1 vs NDUB1QTY (RMSD = 2.180Å), f) NDUB3 vs NDUB3QTY (RMSD = 1.110Å), g) NDUB4 vs NDUB4QTY (RMSD = 0.687Å), h) NDUB5 vs NDUB5QTY (RMSD = 0.511Å), i) NDUB6 vs NDUB6QTY (RMSD = 3.127Å), j) NDUB8 vs NDUB8QTY (RMSD = 0.773Å), k) NDUBB vs NDUBBQTY (RMSD = 0.478Å), l) NDUC1 vs NDUC1QTY (RMSD = 1.283Å), m) NDUC2 vs NDUC2QTY (RMSD = 0.184Å), n) NU1M vs NU1MQTY (RMSD = 0.308Å), o) NU2M vs NU2MQTY (RMSD = 0.390Å), p) NU3M vs NU3MQTY (RMSD = 0.837Å), q) NU4M vs NU4MQTY (RMSD = 0.270Å), r) NU5M vs NU5MQTY (RMSD = 0.262Å), s) NU6M vs NU6MQTY (RMSD = 0.541Å), and t) NU4LM vs NU4LMQTY (RMSD = 0.528Å).
The higher RMSDs of NDUB1 (2.180Å) and NDUB6 (3.127Å) suggest that AlphaFold 3 might not be as accurate in predicting these two proteins. The overall low RMSDs show that the AlphaFold 3-predicted water-soluble QTY analogs share very similar structures with their native transmembrane proteins.
Superpositions of CryoEM structures with AlphaFold 3-predicted native transmembrane enzymes and their water-soluble QTY analogs
To compare the CryoEM-determined native structures, the AlphaFold 3-predicted native proteins, and the AlphaFold 3-predicted QTY analogs directly, we superposed all three structures for each protein to obtain a holistic view of how similar they are. The three kinds of structures superposed very well (Figure 4).

Figure 4. Superpositions of CryoEM structures with AlphaFold 3-predicted native integral membrane enzymes and their water-soluble QTY analogs. Superposition of i) the experimentally determined CryoEM structures (magenta) with ii) AlphaFold 3-predicted native structures (green) and iii) AlphaFold 3-predicted water-soluble QTY analog structures (cyan). The three kinds of structures superpose very well, with only minor differences and variations.
a) NDUA1CryoEM/NDUA1Native/NDUA1QTY, b) NDUA3CryoEM/NDUA3Native/NDUA3QTY, c) NDUABCryoEM/NDUABNative/NDUABQTY, d) NDUADCryoEM/NDUADNative/NDUADQTY, e) NDUB1CryoEM/NDUB1Native/NDUB1QTY, f) NDUB3CryoEM/NDUB3Native/NDUB3QTY, g) NDUB4CryoEM/NDUB4Native/NDUB4QTY, h) NDUB5CryoEM/NDUB5Native/NDUB5QTY, i) NDUB6CryoEM/NDUB6Native/NDUB6QTY, j) NDUB8CryoEM/NDUB8Native/NDUB8QTY, k) NDUBBCryoEM/NDUBBNative/NDUBBQTY, l) NDUC1CryoEM/NDUC1Native/NDUC1QTY, m) NDUC2CryoEM/NDUC2Native/NDUC2QTY, n) NU1MCryoEM/NU1MNative/NU1MQTY, o) NU2MCryoEM/NU2MNative/NU2MQTY, p) NU3MCryoEM/NU3MNative/NU3MQTY, q) NU4MCryoEM/NU4MNative/NU4MQTY, r) NU5MCryoEM/NU5MNative/NU5MQTY, s) NU6MCryoEM/NU6MNative/NU6MQTY, and t) NU4LMCryoEM/NU4LMNative/NU4LMQTY.
Analysis of the hydrophobic surface of native transmembrane enzymes and their water-soluble QTY analogs
To be studied, these hydrophobic transmembrane enzymes must first be separated from their lipid bilayer membranes using detergents, which disrupt the interactions between the membrane enzymes and the surrounding lipids and solubilize the transmembrane proteins. Without a suitable detergent for isolation, the hydrophobic nature of these enzymes causes them to aggregate and precipitate, leading to a loss of biological function.
The hydrophobic surfaces are shown as yellowish patches (Figure 5). For clarity, the extramembrane regions are omitted so that the changes in the hydrophobic patches originating from the transmembrane domains can be seen. The transmembrane domains are embedded within the hydrophobic lipid bilayer, where nonpolar and hydrophobic amino acids including leucine (L), isoleucine (I), valine (V), phenylalanine (F), methionine (M), tryptophan (W), and alanine (A) exclude water by interacting with lipid molecules.

Figure 5. Hydrophobic surfaces of twenty integral membrane enzymes and their water-soluble QTY analogs. The native proteins have many hydrophobic residues (L, I, V, and F) in the transmembrane helices. After Q, T, and Y substitutions of L, I and V, and F, respectively, the hydrophobic surface patches (yellowish) in the transmembrane helices become more hydrophilic (cyan). For clarity of direct comparison, the residues corresponding to unstructured loops missing from the CryoEM structures were removed from the QTY analogs. a) NDUA1 vs NDUA1QTY, b) NDUA3 vs NDUA3QTY, c) NDUAB vs NDUABQTY, d) NDUAD vs NDUADQTY, e) NDUB1 vs NDUB1QTY, f) NDUB3 vs NDUB3QTY, g) NDUB4 vs NDUB4QTY, h) NDUB5 vs NDUB5QTY, i) NDUB6 vs NDUB6QTY, j) NDUB8 vs NDUB8QTY, k) NDUBB vs NDUBBQTY, l) NDUC1 vs NDUC1QTY, m) NDUC2 vs NDUC2QTY, n) NU1M vs NU1MQTY, o) NU2M vs NU2MQTY, p) NU3M vs NU3MQTY, q) NU4M vs NU4MQTY, r) NU5M vs NU5MQTY, s) NU6M vs NU6MQTY, and t) NU4LM vs NU4LMQTY.
After the QTY code is applied to replace the hydrophobic amino acids L, I/V, and F with the hydrophilic amino acids glutamine (Q), threonine (T), and tyrosine (Y), the hydrophobic surface areas are significantly reduced. More importantly, because the electron densities of the exchanged amino acids are similar, the alpha-helices of the QTY analogs retain their structural integrity and stability, an observation that is consistent with previous experiments performed on chemokine receptors, cytokine receptors, and a bacterial histidine kinase (Zhang et al., 2018; Hao et al., 2020; Li et al., 2024).
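The reduction in hydrophobicity can also be illustrated at the sequence level. The sketch below is not the surface analysis used for Figure 5; it substitutes a simpler, sequence-based measure, the mean Kyte-Doolittle hydropathy (often called GRAVY), and the function name and toy segment are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle value; positive means hydrophobic on average."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Toy TM-like segment before and after the L->Q, I/V->T, F->Y substitutions.
print(round(mean_hydropathy("LIVF"), 3))   # hydrophobic segment
print(round(mean_hydropathy("QTTY"), 3))   # its QTY counterpart
```

A sequence average ignores which residues are actually exposed, so it only complements, and does not replace, the surface representation shown in Figure 5.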
DockQ score of AlphaFold 3-predicted water-soluble QTY analog megacomplex
The DockQ score measures the quality of a modeled interface relative to the native structure. It combines the fraction of native contacts (Fnat), the ligand root mean square deviation (LRMS), and the interface root mean square deviation (iRMS), scaled according to the CAPRI criteria, into a single score from 0 to 1 (Basu and Wallner, 2016). DockQ can be used to evaluate the quality of protein docking models, where a value of 0.80 or above implies high accuracy, a value between 0.49 and 0.80 medium accuracy, and a value between 0.23 and 0.49 acceptable accuracy (Zhu et al., 2023).
The comparison of the native CI complex with its AlphaFold 3-predicted QTY-analog complex yielded an overall DockQ score of 0.712, which indicates medium-quality docking. DockQ analysis of the 49 individual interfaces produced a median score of 0.731, which also falls in the medium range and approaches the high-accuracy threshold. The median Fnat of 0.5 suggests that approximately half of the native contacts are preserved in the QTY-analog structure. The median LRMS of 1.965Å and median iRMS of 0.905Å indicate good overall placement of the ligand chains and highly accurate interface alignment, respectively. These results suggest that the QTY analog of CI retains a high degree of structural fidelity to the native complex (Table 3).
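To make the score concrete, the following sketch combines the three terms and applies the accuracy thresholds quoted above. The formula and the scaling constants (1.5Å for iRMS and 8.5Å for LRMS) follow the published DockQ definition as we understand it and are an assumption here, since they are not stated in this text; the function names are hypothetical, and this is not the DockQ software itself.

```python
def dockq(fnat, irms, lrms, d_irms=1.5, d_lrms=8.5):
    """DockQ as the mean of Fnat and two scaled RMS terms.

    The scaling constants d_irms and d_lrms (in Å) are assumed from the
    published DockQ definition and are not given in this article.
    """
    def scaled(rms, d):
        # Maps an RMS value in [0, inf) onto (0, 1]; 0 Å gives 1.0.
        return 1.0 / (1.0 + (rms / d) ** 2)

    return (fnat + scaled(irms, d_irms) + scaled(lrms, d_lrms)) / 3.0


def quality_class(score):
    """Accuracy classes using the thresholds quoted in the text."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


# Illustrative call with Fnat = 0.5, iRMS = 0.905 Å, LRMS = 1.965 Å.
score = dockq(fnat=0.5, irms=0.905, lrms=1.965)
print(round(score, 3), quality_class(score))
```

Note that a score computed from median component values need not equal the median of the per-interface scores, because each interface combines its own Fnat, iRMS, and LRMS.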
Table 3. The DockQ score of QTY analog of Mitochondrial Complex I and 49 native interfaces

Note: The evaluation included 50 interface regions, yielding a DockQ score of 0.712 when comparing the native CI structure and its QTY analog as predicted by AlphaFold 3, which indicates medium-quality ($0.49 \le \mathrm{DockQ} < 0.80$) docking. For the 49 additional interfaces, the median DockQ score is 0.731, the median Fnat is 0.5, the median LRMS is 1.965Å, and the median iRMS is 0.905Å. These results suggest that the QTY analog retains a high degree of structural fidelity to the native complex.
Abbreviations: Fnat, fraction of native contacts; LRMS, ligand root mean square deviation; iRMS, interface root mean square deviation; and –, not applicable.
Superpositions of CryoEM megacomplex structures with AlphaFold 3-predicted native transmembrane enzymes and their water-soluble QTY analog megacomplex
The individual enzymes in mitochondrial Complex I are shown to be suitable for QTY substitution, with their QTY analogs showing high structural similarity to their native forms. We asked whether these proteins would maintain their original interactions and form a similar complex after the QTY code is applied.
We first used AlphaFold 3 to predict the structure of mitochondrial Complex I, which contains twenty membrane proteins. We then superposed the CryoEM-determined native structure with its QTY analog (Figure 6). The complexes superposed well (RMSD = 1.647Å). The high structural similarity not only shows AlphaFold 3’s capability in predicting protein–protein interactions but also suggests that the QTY substitution can be applied systematically to an entire complex while preserving its overall architecture, a prerequisite for maintaining its original function.

Figure 6. Superpositions of CryoEM-determined structures of mitochondrial transmembrane Complex I megacomplex and its AlphaFold 3-predicted water-soluble QTY analogs. The CryoEM-determined structures of the mitochondrial complex are obtained from the Protein Data Bank (PDB). The CryoEM structure (magenta) is superposed with its QTY analog (cyan) predicted by AlphaFold 3. These superposed structures show that the membrane complex and its QTY analog have very similar structures (RMSD = 1.601Å). For clarity of direct comparisons, unstructured loops in the CryoEM structure were removed in the QTY analogs.
AlphaFold 3 predictions
DeepMind released AlphaFold 3 in May 2024, marking a significant leap in accuracy for modeling across biomolecular space. This latest iteration outperforms state-of-the-art docking tools and its predecessor, AlphaFold-Multimer v.2.3, in protein structure and protein–protein interaction predictions (Abramson et al., 2024). AlphaFold 3 reduces the reliance on multiple sequence alignment by integrating a diffusion-based model, enabling it to predict a broader spectrum of biomolecules, including ligands, ions, nucleic acids, modified residues, and large protein megacomplexes. On October 9, 2024, the Nobel Prize in Chemistry was awarded in part to Demis Hassabis and John Jumper of DeepMind for protein structure prediction, recognizing how machine learning and AI have advanced structural biology.
AlphaFold 3 is easily accessible online through the AlphaFold Server (https://alphafoldserver.com), which allows users to run 20 predictions a day. The structures of the QTY analogs were predicted using this server free of charge, and the results were produced within a few minutes.
DeepMind also collaborates with the EBI to make over 214 million predicted protein structures available through the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk). This number is continuously expanding, with the quality of predictions improving further with the advent of AlphaFold 3.
Despite its advancements, AlphaFold 3 still has limitations, which we encountered in our study. One major constraint is that the AlphaFold server accepts at most 5,000 residues per prediction. Our initial plan was to analyze the entire mitochondrial complexes CI, CII, and CIV, but we quickly found that the server could not process such large and intricate structures. Even for complexes within the 5,000-residue limit, AlphaFold 3 occasionally fails to generate predictions. Fortunately, mitochondrial Complex I fell within this limit, allowing us to use AlphaFold 3 successfully.
The integral transmembrane protein megacomplex in this study
In this study, using the advanced capability of AlphaFold 3, we extended the QTY code to megacomplex protein structures and investigated whether the resulting QTY analogs retain their native protein–protein interactions. The mitochondrial Complex I selected for this study is critical for electron transport and ATP production in the heart, skeletal muscle, brain, liver, and kidney. By reducing the hydrophobicity of this complex, we hope to gain deeper insights into the highly efficient coupling of electron transfer and proton pumping, as well as the associated conformational changes within the megacomplex.
The water-soluble QTY analogs generated in this study hold significant potential: (i) they may help validate and generalize the QTY code system for more intricate protein assemblies, (ii) some of the individual QTY code-engineered membrane protein analogs in this megacomplex could be used as water-soluble antigens to generate therapeutic mAbs, and (iii) the mitochondrial Complex I could also serve as a promising therapeutic target for the treatment of various neurodegenerative diseases.
Conclusion
Proteins can generally be classified into two groups: Class I (hydrophilic) and Class II (hydrophobic) (Branden and Tooze, Reference Branden and Tooze1999; Fersht, Reference Fersht2017; Zhang and Egli, Reference Zhang and Egli2022). More specifically, proteins often consist of three types of α-helices: (i) Type I, composed of hydrophilic amino acids (D, E, H, N, Q, K, R, S, T, and Y), which are commonly found in water-soluble globular proteins; (ii) Type II, composed of hydrophobic amino acids (L, I, V, F, M, A, W, and P), typically located in the helical transmembrane regions of membrane proteins; and (iii) Type III, amphiphilic helices, containing nearly equal proportions of hydrophilic and hydrophobic amino acids that partition into distinct hydrophobic and hydrophilic faces (Branden and Tooze, Reference Branden and Tooze1999; Fersht, Reference Fersht2017). Inspired by the exceptional water solubility of hemoglobin, a protein predominantly composed of α-helices, we developed the QTY code to systematically replace hydrophobic α-helices with hydrophilic ones. This approach leverages insights from high-resolution (1.5 Å) electron density maps of the 20 amino acids, which revealed structural similarities between hydrophobic and hydrophilic amino acid pairs: leucine (L) to glutamine (Q), isoleucine (I)/valine (V) to threonine (T), and phenylalanine (F) to tyrosine (Y).
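The substitution rule described above can be illustrated with a short Python sketch. It is a minimal sketch and not the implementation behind the QTY method website: the function name, the toy sequence, and the helix boundaries are hypothetical, and it assumes the substitutions are applied only inside transmembrane helix ranges supplied by the caller.

```python
# QTY substitution table taken from the text: L->Q, I->T, V->T, F->Y.
QTY_MAP = str.maketrans({"L": "Q", "I": "T", "V": "T", "F": "Y"})


def apply_qty(sequence, tm_regions):
    """Return a copy of `sequence` with QTY substitutions inside `tm_regions`.

    `tm_regions` is a list of (start, end) pairs, 0-based and end-exclusive,
    marking the transmembrane helices (an assumption of this sketch).
    Residues outside those ranges are left unchanged.
    """
    residues = list(sequence.upper())
    for start, end in tm_regions:
        segment = "".join(residues[start:end])
        # translate() keeps the length, so the slice is replaced one-to-one.
        residues[start:end] = segment.translate(QTY_MAP)
    return "".join(residues)


# Hypothetical toy sequence and helix boundaries, for illustration only.
toy_sequence = "MKDELLIVFAGLLVIFSRK"
toy_tm_regions = [(4, 16)]
print(apply_qty(toy_sequence, toy_tm_regions))
```

In practice the helix boundaries would come from an annotation source such as UniProt; the sketch leaves that step to the caller.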
Thus far, because of AlphaFold 3’s recent release, very few protein megacomplexes have been studied using AlphaFold 3. In this study, we applied the QTY code to mitochondrial Complex I to engineer water-soluble QTY analogs. To evaluate the structural impact of these modifications, we used AlphaFold 3 to predict the structures of the QTY analogs and superimposed them onto their respective native protein structures. Additionally, we employed a suite of in silico computational and bioinformatic tools to analyze sequence and structural features related to protein stability and water solubility. Our findings demonstrated that the QTY code effectively reduced the hydrophobic surfaces of the proteins while maintaining high structural similarity between the QTY analogs and their native counterparts. Furthermore, the QTY analogs retained their structural integrity, as they successfully assembled into a megacomplex I structure comparable to the CryoEM-determined native megacomplex. These hydrophilic proteins can now be used as water-soluble antigens for the discovery of therapeutic mAbs for the treatment of a wide range of neurodegenerative diseases.
Methods
Protein sequence alignments and other characteristics
The native protein sequences for NDUA1, NDUA3, NDUAB, NDUAD, NDUB1, NDUB3, NDUB4, NDUB5, NDUB6, NDUB8, NDUBB, NDUC1, NDUC2, NU1M, NU2M, NU3M, NU4M, NU5M, NU6M, and NU4LM were obtained from UniProt (https://www.uniprot.org). The sequences for the QTY analogs were aligned using the same methods as previously described. The MWs and pI values of the proteins were calculated using the Expasy Compute pI/Mw tool (https://web.expasy.org/compute_pi/).
AlphaFold 3 predictions
The protein structures of the QTY analogs were predicted using the AlphaFold 3 server (https://alphafoldserver.com/). PDB files for the predicted native protein structures were obtained from the EBI (https://alphafold.ebi.ac.uk), which hosts AlphaFold-predicted structures for native proteins. The UniProt website (https://www.uniprot.org) provided the protein ID, entry name, description, and FASTA sequence for each native protein. The QTY code can be applied to FASTA sequences through the QTY method website (https://pss.sjtu.edu.cn/). The website also provides MWs, pI values, TM variation, and overall variation.
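The variation values reported by the website can be approximated with a simple calculation. The sketch below assumes that "overall variation" means the percentage of all positions that differ between the native sequence and its QTY analog, and that "TM variation" is the same percentage restricted to transmembrane positions. These definitions, the function name, and the toy sequences are assumptions made for illustration and are not taken from the website.

```python
def percent_variation(native, analog, regions=None):
    """Percentage of positions that differ between two equal-length sequences.

    If `regions` is given as (start, end) pairs (0-based, end-exclusive),
    only positions inside those ranges are counted (assumed TM variation);
    otherwise every position is counted (assumed overall variation).
    """
    if len(native) != len(analog):
        raise ValueError("sequences must have the same length")
    if regions is None:
        positions = list(range(len(native)))
    else:
        positions = [i for start, end in regions for i in range(start, end)]
    if not positions:
        raise ValueError("no positions to compare")
    changed = sum(1 for i in positions if native[i] != analog[i])
    return 100.0 * changed / len(positions)


# Hypothetical toy pair: a native sequence and its QTY-substituted analog.
native_seq = "MKDELLIVFAGLLVIFSRK"
analog_seq = "MKDEQQTTYAGQQTTYSRK"
tm_regions = [(4, 16)]

print(f"overall variation: {percent_variation(native_seq, analog_seq):.1f}%")
print(f"TM variation: {percent_variation(native_seq, analog_seq, tm_regions):.1f}%")
```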
Superposed structures
PDB files for the native protein structures experimentally determined by CryoEM were taken from the PDB (entry 5XTC). Predictions for the QTY analogs were carried out using the AlphaFold 3 server (https://alphafoldserver.com/). These structures were superposed using the PyMOL “super” command (https://pymol.org), and the RMSDs were calculated based on Cα atoms. For simplicity and clarity, unstructured loops and extraneous protein monomers were removed from the figures.
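For readers unfamiliar with the metric, the Cα RMSD is the square root of the mean squared distance between paired Cα atoms. The sketch below is a minimal illustration of that formula only: it assumes the two structures have already been superposed and that the atoms are already paired, so it does not reproduce the alignment and refinement performed by the PyMOL “super” command. The function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of paired (x, y, z) coordinates.

    Assumes the structures are already superposed and that coords_a[i]
    and coords_b[i] refer to matching C-alpha atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms, for illustration only.
native_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
analog_ca = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(native_ca, analog_ca))
```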
Structure visualization
PyMOL (https://pymol.org) was used to superpose the native protein structure and the QTY analog. UCSF Chimera (https://www.rbvi.ucsf.edu/chimera) was used to render each protein model with hydrophobicity patches.
Docking evaluation
DockQ (http://github.com/bjornwallner/DockQ/) was used to assess the quality of the protein docking models of the QTY analog of mitochondrial Complex I.
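As background, the DockQ score combines three interface measures into a single value between 0 and 1: the fraction of native contacts (Fnat), the interface RMSD (iRMSD), and the ligand RMSD (LRMSD). The sketch below follows the published DockQ definition as we understand it, with scaling constants of 1.5 Å for iRMSD and 8.5 Å for LRMSD. It only combines precomputed inputs and is not a substitute for the DockQ program, which also derives these inputs from the model and reference structures. Function names and example values are hypothetical.

```python
def dockq_score(fnat, irmsd, lrmsd, d_irmsd=1.5, d_lrmsd=8.5):
    """Combine Fnat, iRMSD, and LRMSD (RMSDs in angstroms) into a DockQ score."""

    def scaled(rmsd, d):
        # Maps an RMSD onto (0, 1]; 1.0 means a perfect match.
        return 1.0 / (1.0 + (rmsd / d) ** 2)

    return (fnat + scaled(irmsd, d_irmsd) + scaled(lrmsd, d_lrmsd)) / 3.0


def dockq_quality(score):
    """Quality class using the commonly cited DockQ cutoffs."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


# Hypothetical inputs, for illustration only.
example = dockq_score(fnat=0.6, irmsd=1.5, lrmsd=8.5)
print(round(example, 3), dockq_quality(example))
```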
Data availability of AlphaFold 3-predicted water-soluble QTY analogs
EBI (https://alphafold.ebi.ac.uk) serves as a database that provides open access to more than 214 million AlphaFold-predicted protein structures. Protein characteristics used in the analysis are available on UniProt (https://www.uniprot.org/). The native CryoEM-determined six integral membrane protein enzymes are available in the RCSB PDB repository (https://www.rcsb.org/). The QTY code-designed water-soluble analogs of the human integral membrane protein enzymes are available at https://github.com/EdwardChen777/mitochondrial_complex_I. The AlphaFold 3-predicted QTY code-designed water-soluble analogs of the 20 mitochondrial CI subunits are available at https://doi.org/10.5281/zenodo.14584403. The AlphaFold 3-predicted QTY code-designed water-soluble analogs of mitochondrial CI are available at https://modelarchive.org/doi/10.5452/ma-s328f. If additional information is needed, please contact Edward Chen at [email protected].
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2025.2.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/qrd.2025.2.
Author contribution
Conceptualization: S.Z. Formal analysis: E.C. Investigation and methodology: E.C. Validation: E.C. Data curation: E.C. Writing—original draft preparation: E.C. and S.Z. Review and editing: E.C. and S.Z.
Financial support and disclosure
E.C. is a student in transition who is applying for graduate study toward a Ph.D. in computer science and bioengineering. This digital structural bioinformatic study received no financial support and used only free online tools.
Competing interest
Massachusetts Institute of Technology (MIT) filed several patent applications for the QTY code for GPCRs excluding the olfactory receptors. OH2Laboratories licensed the technology from MIT to work on water-soluble GPCR analogs. S.Z. is an inventor of the QTY code and has a minor equity in OH2Laboratories. S.Z. is a Scientific Advisor and has minor shares for a startup RealNose to develop a sensing device based on olfactory receptors. S.Z. founded a startup 511 Therapeutics to generate therapeutic monoclonal antibodies against solute carrier transporters to treat pancreatic cancer. S.Z. has majority equity in 511 Therapeutics. E.C. declares no competing interest.
Ethics statement
All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by a named institutional and licensing committee. Neither human biological samples nor human subjects were used in the study. This is a completely digital structural bioinformatic study using the publicly available AlphaFold 3 machine learning program.
Comments
Prof. Bengt Nordén
Editor in Chief
QRB Discovery
Dear Prof. Nordén,
I herewith submit a manuscript titled: “Structural bioinformatic study of human mitochondrial respiratory integral membrane megacomplex and its AlphaFold3 predicted water-soluble QTY megacomplex analog” for consideration.
Because AlphaFold3 was released only recently, in May 2024, very few protein megacomplexes have so far been studied using it. AlphaFold3 can predict protein-protein, protein-DNA, protein-RNA, and protein-small molecule interactions. This is not possible with AlphaFold2, which can only predict individual protein structures. In this study, we applied the QTY code to mitochondrial Complex I to engineer water-soluble QTY analogs. The mitochondrial Complex I selected for this study is critical for electron transport and ATP production in heart, skeletal muscle, brain, liver, and kidney.
Mitochondrial Complex I is one of the largest multi-subunit membrane protein megacomplexes and plays a critical role in oxidative phosphorylation and ATP production. However, studying its structure and the mechanisms underlying proton translocation remains challenging due to the hydrophobic nature of its transmembrane parts. In this structural bioinformatic study, we used the QTY code to reduce the hydrophobicity of megacomplex I while preserving its structure and function. We also carry out a bioinformatics analysis of twenty key enzymes in the integral membrane parts. We compare their native structures, determined through cryo-electron microscopy (CryoEM), with their water-soluble QTY analogs predicted by AlphaFold3. Leveraging AlphaFold3’s advanced capabilities in predicting protein-protein interactions, we further explore whether the QTY-modified membrane enzymes maintain the binding interactions necessary to form the functional megacomplex. Our structural bioinformatics analysis demonstrates the feasibility of engineering more water-soluble membrane proteins using the QTY code and highlights its potential to facilitate drug inhibitor design, offering promising implications for the treatment of various diseases.
If you have any questions, please contact me.
Yours sincerely,
Shuguang Zhang, Ph.D.