- 2D
two-dimensional
- IBD
inflammatory bowel disease
- LC
liquid chromatography
- MALDI
matrix-assisted laser desorption/ionisation
- MFGM
milk fat globule membrane
- MS/MS
tandem MS
- m/z
mass-to-charge
- PTM
post-translational modifications
Proteins and peptides as ingredients and biomarkers
Proteins form a major class of macronutrients and they participate in most biological processes in the body. Enzymes are those proteins that catalyse virtually every biochemical reaction involved in metabolism (e.g. the digestive enzymes pepsin and trypsin). Proteins also ensure structural and mechanical functions and they are involved in cell signalling and immune response. Therefore, the body needs relatively large amounts of proteins to ensure proper function and to cope with the continuous synthesis and degradation (protein turnover).
Proteins and peptides are composed of amino acids that are arranged in a sequential fashion to form higher structures and functions. The essential amino acids cannot be synthesised by the body and must therefore be taken up with the diet(Reference Young1) (i.e. for adults: leucine, isoleucine, valine, lysine, threonine, tryptophan, methionine, phenylalanine and histidine). Food proteins differ in their amino acid composition depending on their origin, i.e. animal or plant source. Consequently, a balanced diet covers proteins from complementary sources (e.g. meat, vegetables, cereals, grains and legumes) in order to avoid amino acid deficiencies. The nutritional quality of proteins is described by the amino acid composition, digestibility and absorptive ability(Reference Desai and Dekker2).
Proteins, by nature, are key actors in all biological processes in the human organism. Proteomics is a powerful tool for the elucidation of such molecular events related to nutrition: it can identify and quantify bioactive proteins and peptides, shed light on their effects at protein/peptide level (biomarkers) and thereby address questions of nutritional bioefficacy. In this paper, we first describe how proteins are analysed in a holistic, high-throughput fashion; in the second part, we review major applications of proteomics in the nutrition field.
Proteomic technologies
Proteomics builds on a combination of several techniques and has been propelled by technological progress in genome sequencing, biomolecule separation, MS and bioinformatics. The discipline has evolved as an analogue to genomics and has traditionally aimed at identifying all proteins present in a given sample at a given time. Over the last two decades, proteomics has been developed into an established technology for biomarker discovery(Reference Lescuyer, Hochstrasser and Rabilloud3, Reference Schrattenholz and Groebe4), clinical applications(Reference Mischak, Coon and Novak5), disease profiling and diagnostics(Reference Marko-Varga, Lindberg and Lofdahl6, Reference Vitzthum, Behrens and Anderson7), the study of protein interactions(Reference Gingras, Gstaiger and Raught8) and of the dynamics of signalling pathways(Reference Scholten, van Veen and Vos9).
The proteome is highly dynamic and constantly changing in response to environmental stimuli including nutrition. Nutritional proteomics holds great promise to (a) profile and characterise body and dietary proteins, including digestion and absorption of the latter; (b) identify biomarkers of nutritional status and health/disease conditions; and (c) understand functions of nutrients and other dietary factors in growth, reproduction and health(Reference Wang, Li and Dangott10).
The classical proteomic workflow, sometimes referred to as bottom-up approach, combines techniques of protein separation, protein digestion into peptides, mass spectrometric analysis, protein identification by comparison with protein databases and protein quantification(Reference Aebersold and Mann11). The development of gentle ionisation strategies such as electrospray ionisation(Reference Fenn, Mann and Meng12) and matrix-assisted laser desorption/ionisation (MALDI)(Reference Karas and Hillenkamp13) MS, as well as the publication of genome sequences of many organisms including human were breakthrough factors for the successful development and deployment of proteomics.
The sequence of amino acids defines the primary structure of a protein. Secondary, tertiary and quaternary structures concern the arrangement of the amino acid chain in space, resulting from covalent or non-covalent interactions. Nowadays, the primary structures of proteins are routinely analysed by MS. Post-translational modifications (PTM) and protein–ligand interactions can also be elucidated this way.
Protein separation on gels and columns
A major analytical challenge of proteomics is the dynamic range of protein concentrations (e.g. estimated 10^12 in human blood)(Reference Jacobs, Adkins and Qian14). Current MS-based proteomic platforms can deliver a dynamic range of 10^4. This means that the low-abundant proteome has to be addressed by depletion of the most abundant proteins(Reference Gong, Li and Yang15) or by selective enrichment of low-abundant proteins(Reference Mechref, Madera and Novotny16–Reference Wollscheid, Bausch-Fluck and Henderson18). After protein depletion and/or enrichment, further separation is performed at protein and/or peptide level, based on two-dimensional (2D) gels or on liquid chromatography (LC) or on hybrid approaches (Gel-LC). Fig. 1 summarises the classical proteomic workflows.
The gel strategy for protein separation offers the advantage of visualisation of proteins, and, to some extent, of their modifications and, therefore, it preserves the protein context. After protein separation on gel, the gel spots are then excised, digested with trypsin and further processed in LC-tandem MS (MS/MS) or MALDI-MS instruments for protein identification. However, major drawbacks of this method are that (i) hydrophobic proteins such as membrane proteins, very basic proteins and low-abundant proteins are not well represented on the gels and (ii) the procedure is low throughput and difficult to standardise and automate.
The ‘gel-free’ discovery approach for protein identification is referred to as ‘shotgun’ analysis: here, the sample or protein mixture is directly digested in solution. The resulting peptide mixture is then separated on an HPLC usually coupled online to a mass spectrometer for peptide mass identification. The development of ultra-HPLC delivers improved peptide separation(Reference Eschelbach and Jorgenson19) and column efficiency by means of higher mass sensitivity, analytical resolution and speed. The resolving capability of the HPLC separation can be enhanced with 2D HPLC techniques(Reference Washburn, Wolters and Yates20) that combine ion-exchange chromatography followed by reverse-phase HPLC.
Peptides can also be separated using gas-phase fractionation, which is defined as iterative mass spectrometric interrogations of a sample over multiple smaller mass-to-charge (m/z) ranges. By doing so, a higher number of unique peptides compared to the ions selected from the wide mass range scan in standard LC-MS/MS analysis can be addressed. Gas-phase fractionation is described as a means to achieve higher proteome coverage than classical LC-MS/MS analyses of a complex peptide mixture(Reference Blonder, Rodriguez-Galan and Lucas21–Reference Yi, Marelli and Lee23).
Protein identification by MS
Mass spectrometers identify proteins and peptides by determination of their exact masses and by generating information on their amino acid sequences. Today the main ionisation methods deployed are electrospray ionisation(Reference Fenn24) and MALDI(Reference Tanaka25). These ion sources are combined with various mass analysers that separate the ions by m/z. The most popular analysers in proteomics are ion traps, triple-quadrupoles, time-of-flight tubes, orbitrap and Fourier-transform ion cyclotron resonance, with their specific advantages: high sensitivity and multiple-stage fragmentation for ion traps; high selectivity for triple-quadrupoles; high sensitivity and speed for time-of-flight. Current top-end proteomic machines are orbitraps(Reference Makarov, Denisov and Kholomeev26) and Fourier-transform ion cyclotron resonance instruments(Reference Nielsen, Savitski and Zubarev27), which provide very high mass accuracy and resolution compared to the other analysers. MS/MS consists of either the fragmentation of a selected precursor peptide ion to generate specific fragment ions for sequence elucidation (data-dependent acquisition); or uncoupled acquisitions of intact and fragment masses with retrospective reconstitution of the parent–daughter ion context (data-independent acquisition; see Emerging Technologies)(Reference Panchaud, Scherl and Shaffer28).
Considering the amount of data generated in a single shotgun run, search algorithms have been developed to process the raw data automatically and to compare the measured MS/MS spectra against theoretical fragment ion spectra generated by in silico digestion of protein databases. Comparison of experimental with theoretical spectra results in a list of possible peptide matches, each with an associated score that quantifies the quality of the match. The peptide(s) with the highest score(s) is(are) generally considered for protein identification. Several of these search engines have been commercialised (e.g. Sequest, Mascot or Phenyx), but others are freely available (open source programs, e.g. X!Tandem or OMSSA). They usually agree on 80% of the identifications(Reference Deutsch, Lam and Aebersold29). The two most established applications are Sequest(Reference Eng, McCormack and Yates30) and Mascot(Reference Perkins, Pappin and Creasy31).
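The in silico digestion step that underlies such database searches can be illustrated with a minimal Python sketch. It is not the algorithm of any search engine named above: it assumes the common simplified trypsin rule (cleavage after lysine or arginine unless the next residue is proline), uses standard monoisotopic residue masses, and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum
PROTON = 1.007276  # mass of a proton, used for m/z


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for i in range(len(sites) - 1):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptides.append(sequence[sites[i]:sites[j]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass, charge):
    """Mass-to-charge ratio of a peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


# Example with a made-up sequence: tryptic_digest("AKRPGKC") gives
# ['AK', 'RPGK', 'C'] because the R-P bond is not cleaved.
```

A real search engine would additionally generate theoretical fragment ions for each peptide and score them against the measured MS/MS spectra; that scoring step is engine-specific and is not sketched here.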
Identified peptides and proteins can be further validated by software that applies statistical algorithms to calculate additional scores and probabilities, thereby distinguishing correct from incorrect assignments. PeptideProphet and ProteinProphet from the Trans-Proteomic Pipeline (ISB, Seattle)(Reference Keller, Nesvizhskii and Kolker32, Reference Nesvizhskii, Keller and Kolker33) estimate, at peptide and protein level respectively, the probability that an assignment is correct, from which a false discovery rate can be derived. These tools provide the researcher with means to assess the quality of the data in a dataset-dependent manner and to control the trade-off between false positives (specificity) and false negatives (sensitivity)(Reference Urfer, Grzegorczyk and Jung34). A second strategy to elucidate the false-positive/false-negative relationship relies on a database search using a target-decoy database(Reference Elias and Gygi35), which is typically generated by reversing the sequences of the target protein database. The search is done against the target and the decoy database and allows the estimation of false positives.
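The target-decoy idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the decoy database is built by sequence reversal as described above, and the false discovery rate at a score threshold is estimated as the number of decoy hits divided by the number of target hits, which is one commonly used estimator but not necessarily the one applied in the cited work. The names `reverse_decoys`, `estimate_fdr` and the `DECOY_` prefix are hypothetical.

```python
def reverse_decoys(target_db):
    """Build a decoy database by reversing every target sequence."""
    return {"DECOY_" + name: seq[::-1] for name, seq in target_db.items()}


def estimate_fdr(matches, threshold):
    """Estimate the FDR among matches scoring at or above `threshold`.

    `matches` is an iterable of (score, is_decoy) pairs, one per
    peptide-spectrum match. Assumed estimator: decoy hits / target hits.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Made-up scores for illustration only.
example = [(50, False), (45, False), (40, True),
           (35, False), (30, False), (20, True)]
fdr_at_30 = estimate_fdr(example, threshold=30)  # 1 decoy / 4 targets
```

In practice the threshold is lowered until the estimated rate reaches an accepted level, so the score cut-off is chosen per dataset rather than fixed in advance.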
Protein quantification
Once qualitative analysis of a sample is achieved (first discovery mode, protein catalogue), very often quantitative information is required to obtain more insight into proteome differences between conditions or over time.
Relative quantification enables the comparison of two or more biological samples/conditions and the identification of (candidate) biomarkers, i.e. proteins that are more or less abundant (up- or down-regulated) in a certain condition compared to another.
The classical method for relative quantification evolved from 2D gels: 2D difference gel electrophoresis is characterised by differential labelling of proteins with fluorescent dyes prior to separation according to isoelectric point and molecular weight(Reference Unlu, Morgan and Minden36). An internal standard is used to match the protein patterns across gels, thereby facilitating gel alignment, spot matching and quantification. Dedicated software (e.g. Progenesis SameSpots and DeCyder) allows for gel comparison and spot quantification.
Other procedures for relative quantification of proteins rely on metabolic or chemical labelling of proteins by incorporation of stable isotopes (usually 13C or 15N) in the samples to be compared and quantified at MS level. A summary of popular chemical labels used for relative protein quantification is shown in Fig. 2, which depicts a generic peptide with its side chains and modification options. Labelling of proteins and peptides is performed by in vitro chemical or enzymatic derivatisation and includes, for example: isotope-coded affinity tag(Reference Gygi, Rist and Gerber37), isobaric tags for relative and absolute quantification(Reference Ross, Ambrose and Cutler38), isotope-coded protein label(Reference Schmidt, Kellermann and Lottspeich39) or aniline and benzoic acid labelling(Reference Panchaud, Hansson and Affolter40). The differentially labelled samples are mixed and infused into an LC-MS/MS instrument. While these labelling techniques are accurate, they need specific sample preparation and involve added costs due to stable-isotope labelled reagents. In contrast to chemical labelling, stable isotope labelling by amino acids in cell culture(Reference Ong, Blagoev and Kratchmarova41) consists of labelling proteins during cell growth and division by incorporation of labelled amino acids.
Recently, methods for label-free relative quantification have been developed. In this case, each sample is separately analysed by LC-MS/MS and resulting data are processed using software designed for LC-MS/MS run alignment, extraction of peptide intensities and peptide counts (e.g. Progenesis LCMS, DeCyder MS, SuperHirn, SpecArray or MSQuant). Currently, two label-free quantification strategies can be used: (a) measuring and comparing the mass spectrometric signal intensity of peptide precursor ions belonging to a particular protein(Reference Chelius and Bondarenko64, Reference Mueller, Rinner and Schmidt65); and (b) counting and comparing the number of fragment spectra identifying peptides of a given protein(Reference Liu, Sadygov and Yates66). In comparison with stable isotopes, label-free protein quantification is simpler to perform; there is technically no limit to the number of samples/conditions to be compared (except the analytical capacity of software/computers/servers); and it can yield an improved proteome coverage with a broader dynamic range(Reference Bantscheff, Schirle and Sweetman67). However, label-free proteome comparisons are limited to proteomes of moderate complexity; otherwise, the peptide-to-peptide alignment becomes too difficult due to overlaps in LC elution and m/z.
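The two label-free strategies, (a) precursor intensity and (b) spectral counting, can be illustrated with a small Python sketch. It is deliberately naive: it assumes that peptide identifications have already been assigned to proteins and that runs are already aligned, and it omits the normalisation and statistics that the software named above performs. All function names and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def spectral_counts(identified_spectra):
    """Strategy (b): count fragment spectra identified per protein.

    `identified_spectra` is an iterable of (protein, peptide) pairs,
    one entry per identified MS/MS spectrum.
    """
    return Counter(protein for protein, _ in identified_spectra)


def summed_intensity(precursor_features):
    """Strategy (a): sum precursor ion intensities per protein.

    `precursor_features` is an iterable of (protein, peptide, intensity).
    """
    totals = defaultdict(float)
    for protein, _, intensity in precursor_features:
        totals[protein] += intensity
    return dict(totals)


def fold_change(condition_a, condition_b, protein):
    """Ratio of a protein's abundance measure between two conditions."""
    return condition_a[protein] / condition_b[protein]


# Made-up data: protein P1 is identified by two spectra in run A
# and by one spectrum in run B.
run_a = spectral_counts([("P1", "x"), ("P1", "y"), ("P2", "z")])
run_b = spectral_counts([("P1", "x")])
ratio_p1 = fold_change(run_a, run_b, "P1")
```

Because each sample is measured in a separate run, either measure would normally be normalised (for example to the total signal per run) before ratios are interpreted.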
Absolute quantification of proteins relies on the addition of an internal standard at a known concentration. The Absolute QUAntification method(Reference Gerber, Rush and Stemman68) uses synthetic stable-isotope labelled proteotypic peptides as internal standards that are otherwise identical to the peptides to be quantified. This approach requires preliminary analyses to select the peptides to be used to quantify the protein(s) of interest (proteotypic peptides). Recently, the QconCAT technology has been developed for parallel production of labelled proteotypic peptides which are then used in multiplexed quantification assays(Reference Rivers, Simpson and Robertson69). QconCAT consists of an artificial gene, inserted into a vector for expression in Escherichia coli, which is designed to express artificial proteins comprising a concatenation of proteotypic peptides. This latter technology is highly useful for repetitive, multiplexed analysis, i.e. a protein assay-like situation.
The systematic large-scale approach for absolute protein quantification has been referred to as selected-reaction monitoring or multiple-reaction monitoring(Reference Lange, Picotti and Domon70). This type of protein quantification is based on the quantification of relevant proteotypic peptides and is exclusively performed with triple-quadrupole mass spectrometers. The masses of the peptides and of their most abundant fragments are defined in the method and the mass spectrometer only scans for these as well as for the corresponding stable-isotope labelled proteotypic peptides. A chromatographic peak proportional to the peptide amount appears only if both parent and fragment masses are present (referred to as one transition). These peaks can then be integrated and the peptide and protein concentrations can be calculated by comparison with the internal peptide standard. This method enables the targeted, multiplexed, high-throughput quantification of low-abundance proteins in highly complex mixtures.
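The final calculation step, comparing the integrated peak of the endogenous peptide with that of its stable-isotope labelled standard, reduces to a simple ratio. The Python sketch below assumes a single transition per peptide, equal response of the light and heavy forms and no calibration curve; the function name and units are hypothetical.

```python
def quantify_by_isotope_dilution(light_area, heavy_area, heavy_concentration):
    """Concentration of the endogenous (light) peptide.

    light_area          -- integrated peak area of the endogenous peptide
    heavy_area          -- integrated peak area of the labelled standard
    heavy_concentration -- known spiked concentration of the standard

    Assumes the light and heavy forms co-elute and ionise identically,
    so that the area ratio equals the concentration ratio.
    """
    if heavy_area <= 0:
        raise ValueError("internal standard peak was not detected")
    return light_area / heavy_area * heavy_concentration


# Made-up example: the endogenous peak is twice as large as that of a
# standard spiked at 50 fmol/µL.
concentration = quantify_by_isotope_dilution(2.0e6, 1.0e6, 50.0)
```

In a multiplexed assay, the same calculation is repeated for every transition and proteotypic peptide, and the peptide-level values are then combined into a protein-level concentration.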
Analysis of post-translational modifications: protein functionality
PTM of amino acids give functionality to the proteins through the attachment of functional groups such as phosphate, carbohydrates, acetate or lipids. PTM play crucial roles in regulating the biology of the cell since they can change a protein's physical or chemical property, activity, localisation or stability. Some PTM can be added and removed dynamically as a mechanism for reversibly controlling protein function and cell signalling. Several proteomic techniques have been developed to identify and quantify PTM and allow the study of modifications such as phosphorylation, acetylation, glycosylation or lipid modifications.
Phosphorylation of proteins controls many cellular processes such as growth, differentiation, metabolism, signalling and cell death, and is itself regulated by enzymatic activity (i.e. by kinases and phosphatases). The challenges of phosphoproteomics lie, as for proteomics, in the complexity, dynamic range and temporal dynamics of protein isoforms(Reference Nita-Lazar, Saito-Benz and White71). Several methods have been developed to enrich phosphoproteins, including anti-phosphotyrosine antibodies(Reference Pandey, Andersen and Mann72); immobilised metal affinity chromatography(Reference Andersson and Porath73); and chemical modification and strong cation exchange chromatography(Reference Beausoleil, Jedrychowski and Schwartz74). The analysis of phosphorylated proteins has been facilitated by the development of new fragmentation techniques such as electron capture dissociation(Reference Zubarev, Horn and Fridriksson75) and electron transfer dissociation(Reference Syka, Coon and Schroeder76, Reference Wiesner, Premsler and Sickmann77) that help identify and determine the location of phosphorylations that cannot be as efficiently characterised by standard collision-induced dissociation.
Protein glycosylation is prevalent in proteins that are involved in mechanisms like cell–cell interactions, immune system (e.g. antibodies, MHC) or transport (e.g. transferrin). Also, glycoproteins account for a major proportion of milk and human blood proteomes for which it has been estimated that 70 and 50% of all proteins are glycosylated, respectively. There are two types of glycoproteins: (a) N-glycosylated proteins with carbohydrates linked to the side chain of asparagine; and (b) O-glycosylated proteins, in which carbohydrates are bound to the side chain of serine, threonine, hydroxylysine or hydroxyproline. The functions of glycoproteins are still incompletely understood(Reference Funakoshi and Suzuki78). MS can provide information on molecular mass, composition, sequence and sometimes branching of a glycan chain(Reference Dell and Morris79, Reference Tissot, North and Ceroni80). As glycoprotein forms are often minor constituents compared to the non-glycosylated proteins, enrichment methods such as cell surface-capture technology(Reference Zhang, Li and Martin42) or affinity capture with lectins(Reference Yang and Hancock81) have been set up.
Acetylation is a PTM that has been associated with several biological processes, especially gene expression regulation by histones, which pack the chromosomes so that they fit into the cell nucleus(Reference Allfrey, Faulkner and Mirsky82). Acetylation usually occurs at lysines or on N-terminal groups of peptides or proteins and is involved in the destabilisation of chromatin and recruitment of effector proteins(Reference Grant83–Reference Strahl and Allis86). In particular, the mass spectrometric deciphering of the so-called histone code, a key epigenetic mechanism, is expected to shed light on the phenomenon of metabolic programming: there is compelling evidence that the human body retains a memory of environmental impacts, such as nutritional ones, and may thereby be ‘metabolically (re)wired’. Such events have been associated with epigenetics, of which histone (de)acetylation is a central mechanism(Reference Bonenfant, Coulot and Towbin87). Recently, O-acetylated serine and threonine residues have been identified(Reference Mukherjee, Hao and Orth88). Acetylation is stable to peptide fragmentation and can be detected by its characteristic mass shift relative to the unmodified form. Trypsin cleavage at acetyllysine residues is usually blocked, so the acetylated peptides are detected as ‘missed cleavage’ products when performing database searches(Reference Witze, Old and Resing89). Enrichment of acetylated peptides is difficult and therefore studies have generally characterised protein acetylation on partially purified mixtures (e.g. histones). Immunoaffinity techniques have been developed to purify acetylated peptides: acetyllysine sites were mapped by enriching acetylated peptides using resin-coupled antibodies to acetyllysine(Reference Kim, Sprung and Chen90).
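Detection of a modification by its characteristic mass shift can be illustrated as follows. The Python sketch uses the standard monoisotopic mass additions for acetylation (+42.010565 Da) and phosphorylation (+79.966331 Da); the tolerance, the function name and the example masses are assumptions for illustration and are not taken from the cited studies.

```python
# Monoisotopic mass additions in Da (standard values).
PTM_SHIFTS = {
    "acetylation": 42.010565,      # addition of C2H2O
    "phosphorylation": 79.966331,  # addition of HPO3
}


def match_ptm(observed_mass, unmodified_mass, tolerance_da=0.005):
    """Return the PTMs whose mass shift explains the observed difference.

    `tolerance_da` is a hypothetical absolute tolerance; the value that
    is appropriate in practice depends on the mass analyser.
    """
    delta = observed_mass - unmodified_mass
    return [name for name, shift in PTM_SHIFTS.items()
            if abs(delta - shift) <= tolerance_da]


# Made-up example: a peptide observed 42.0106 Da heavier than predicted.
hits = match_ptm(1042.5106, 1000.5000)
```

A tight tolerance matters here because other modifications have similar nominal shifts (trimethylation, for instance, adds about 42.047 Da), which is one reason high mass accuracy is valuable for PTM assignment.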
Lipoproteins are lipid–protein complexes whose primary function is thought to be transport of cholesterol and other lipids and that comprise five major classes: chylomicrons, VLDL, intermediate density lipoproteins, LDL and HDL. Many lines of evidence strongly link them to the immune system and macrophage biology(Reference Barter, Nicholls and Rye91–Reference Shiflett, Bishop and Pahwa93). Several approaches have been developed for lipoprotein isolation(Reference Havel, Eder and Bragdon94–Reference Zechner, Moser and Kostner97); delipidation(Reference Folch, Lees and Sloane Stanley98, Reference Karlsson, Leanderson and Tagesson99); sample preparation (solubilisation and digestion)(Reference Farwig, Campbell and Macfarlane100–Reference Stahlman, Davidsson and Kanmert103); and the characterisation of the protein components of lipoproteins using MS(Reference Mancone, Amicone and Fimia104, Reference Vaisar, Pennathur and Green105).
Emerging proteomic technologies
The full characterisation of a given proteome remains a challenge today. This is due to factors such as the large dynamic range of protein expression, complexity of the mixture in terms of numbers of proteins, as well as lack of methods to amplify proteins. However, proteomics is a rapidly expanding field and new analytical approaches are emerging.
Usually, proteomic profiling is done in a data-dependent acquisition mode in which the most abundant ionised peptides from each MS scan are selected for subsequent MS/MS analysis. A data-independent acquisition method, referred to as Precursor Acquisition Independent From Ion Count, consists of the acquisition of MS/MS spectra at every m/z value regardless of whether a precursor ion is observed or not: precursor ion scans (MS scan) are no longer conducted(Reference Panchaud, Scherl and Shaffer28). This strategy yields better proteome coverage, higher numbers of identified proteins and an extended dynamic range compared to the classical data-dependent method(Reference Bern, Finney and Hoopmann106).
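The difference between the two acquisition modes can be sketched in Python. In the data-dependent case, precursors are chosen from each MS scan by intensity; in the data-independent case, the m/z range is simply tiled with isolation windows, independent of any observed precursor. The number of selected precursors, the window width and the function names are hypothetical parameters for illustration, not the settings of the cited method.

```python
import math


def select_precursors_dda(ms_scan, top_n=3):
    """Data-dependent: pick the `top_n` most intense peaks of one MS scan.

    `ms_scan` is a list of (mz, intensity) pairs.
    """
    ranked = sorted(ms_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


def isolation_windows_dia(mz_start, mz_end, width):
    """Data-independent: tile the m/z range with fixed-width windows."""
    n_windows = math.ceil((mz_end - mz_start) / width)
    windows = []
    for i in range(n_windows):
        low = mz_start + i * width
        high = min(low + width, mz_end)
        windows.append((low, high))
    return windows


# Made-up scan: the low-intensity peak at m/z 612.3 is never selected
# in data-dependent mode with top_n=2, but its m/z region is still
# fragmented in data-independent mode.
scan = [(455.2, 9.0e5), (612.3, 1.2e3), (733.9, 4.1e5)]
dda_targets = select_precursors_dda(scan, top_n=2)
dia_targets = isolation_windows_dia(400, 1000, 200)
```

The sketch shows why data-independent acquisition can extend the accessible dynamic range: selection no longer depends on a precursor being among the most abundant ions in a scan.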
Imaging MS is a technology enabling the direct examination of the distribution of biomolecules (e.g. proteins, peptides, etc) in cells or tissues(Reference Chaurand and Caprioli107–Reference Stoeckli, Chaurand and Hallahan109). Imaging MS is principally used for clinical applications(Reference Franck, Arafah and Elayed110, Reference Samsi, Krishnamurthy and Groseclose111) and biomarker discovery in diseased tissue(Reference Schwamborn, Krieg and Reska112, Reference Wong, Chan and Ma113). In imaging MS, frozen tissue sections are mounted on a target plate, covered with a suitable matrix, dried and inserted into a MALDI-MS for spectra acquisition. The mass spectrometer records the spatial distribution of peptides and proteins by scanning the tissue surface with consecutive laser shots. Alternatively, in situ tryptic digestion of a spot on a tissue followed by peptide sequencing of a predicted fragment by MALDI-MS/MS can be done(Reference Groseclose, Andersson and Hardesty114). Specific methods have been developed for the analysis of formalin-fixed paraffin-embedded sections(Reference Lemaire, Desmons and Tabet115). Often histological staining, either on the same section(Reference Chaurand, Schwartz and Capriolo116) or on a serial section(Reference Chaurand, Schwartz and Reyzer117), is used to guide the placement of matrix and provides the capability of focusing on areas having a high content of a cell type of interest.
The trend towards biological analysis at decreasing scale, ultimately down to an individual cell, continues, and MS with sensitivity of detecting a few to single molecules will be necessary. Recently, a prototype for a mass spectrometer with single-molecule sensitivity for single-cell proteomics has been designed(Reference Naik, Hanay and Hiebert118). Another method for the analysis of protein complexes in single cells, so-called visual proteomics, has been developed(Reference Beck, Malmstrom and Lange119) and consists of the combination of quantitative MS with cryo-electron tomography for the detection, counting and localisation of protein complexes.
Proteomics today: a paradigm shift
A recent publication reviews proteome coverage and reports on the detection of protein abundance over seven orders of magnitude with today's high-end platforms(Reference Zhang, Faca and Hanash120). This impressive power is a combined result of highly improved mass spectrometric instrumentation and data acquisition/processing as well as of highly sophisticated fractionation, enrichment and depletion techniques.
However, given the complexity and dynamics of proteomes, proteomics is currently undergoing a paradigm shift. Strategically speaking, the original hypothesis-free discovery workflow is being increasingly complemented or followed up by either hypothesis-driven analysis or even by candidate-based targeted analysis and validation: a recent review puts the discovery, directed and targeted proteomics approaches into perspective(Reference Domon and Aebersold121). Proteomics has thereby developed from a pure discovery tool into a screening and validation tool. The discovery workflow (or shotgun approach) aims at identifying large protein sets in a sample, and can include protein quantification (with or without prior protein labelling). The directed proteomics workflow consists of two successive analyses of the same sample. The first analysis is a survey scan that defines a list of target peptides; the second analysis surveys exclusively the peptides on this target list. This approach allows the quantification of less abundant proteins. Finally, targeted proteomics is a hypothesis-driven approach focusing on the detection and quantification of specific peptides associated with the proteins of interest (selected-reaction monitoring or multiple-reaction monitoring).
The other change of proteomic ‘philosophy’ is rooted in the increasing appreciation of peptides as bioactive, health-beneficial food components(Reference Minkiewicz, Dziuba and Darewicz122). The analysis of such peptides requires a different analytical approach because these entities vary much more in their chemical nature than classical tryptic peptides generated in shotgun proteomics workflows for protein biomarker identification: multiple processing parameters and digestive enzymes come into play and these generate not only a large variety in peptide length, sequence and terminal residues but also a number of peptide modifications. Moreover, there is only a single possibility to identify and quantify the native peptide of interest, as such a molecule is not one of several representatives of a parent protein, as is typically the case in biomarker research. In view of this food peptidome complexity, it becomes evident that proteomic tools must be further developed and adapted from biomarker to bioactive research(Reference Kussmann, Panchaud and Affolter123).
Proteomics and nutrition: major applications
Several groups including ours have contributed to the introduction and adaptation of proteomics to the field of nutrition and health(Reference de Roos and McArdle124–Reference Schweigert127). Numerous studies have shown the prominent role of nutrition for maintaining and improving health. In this view, proteomics has been deployed in fields such as characterisation of bioactive proteins and peptides(Reference Mamone, Picariello and Caira128); elucidation of immune-related disorders(Reference Alex, Gucek and Li129, Reference Kirsch, Fourdrilis and Dobson130); investigation of metabolism-related disorders(Reference Sundsten and Ortsater131); dietary intervention studies for recovery(Reference Fuchs, Dirscherl and Schroot132, Reference Tang, Moore and Kujbida133); and mechanistic elucidation of nutrient action(Reference Erdmann, Cheung and Schroder134). The following selected citations cover topics from characterisation of the food matter itself via investigation of health-related food benefits to understanding disease-related mechanisms.
Protein and peptides as food ingredients
Milk
Milk is an essential component for infant nutrition since it represents the major source of feeding for newborns and infants(Reference German, Freeman and Lebrilla135). It is a rich source of functional peptides and proteins beneficial for human health. Extensive characterisation of milk from different species has been reported(Reference German, Freeman and Lebrilla135–Reference Kanwar, Kanwar and Sun139) and the composition of milk in terms of major proteins, lipids and carbohydrates has been established.
Technical milk fractionation by successive centrifugation steps yields three major fractions: caseins, whey and milk fat globule membrane (MFGM)(Reference Mange, Bellet and Tuaillon140). Each of these fractions contains different protein functionalities which have been studied using proteomics.
The whey protein fraction is dominated by a small number of abundant proteins which constitute over 80% of its protein content(Reference Tremblay, Laporte, Leonil, Fox and McSweeney141). In particular, β-lactoglobulin alone constitutes 50% of whey. In consequence, further fractionation was necessary to identify less abundant proteins. A gel-based approach(Reference Fong, Norris and Palmano142) enabled the identification of a large number of minor whey proteins, for example, a cluster of osteopontin peptides suggesting novel bioactivities. Another study, based on the use of electrospray ionisation and MALDI ionisation sources in parallel, allowed enhanced protein identification(Reference Molle, Jardin and Piot143): a total of thirty-nine bovine milk proteins were identified with a high degree of confidence.
The MFGM is a milk fraction rich in bioactive proteins. A qualitative and a quantitative proteomic profiling of two MFGM enriched milk fractions, whey protein concentrate and buttermilk protein concentrate was reported by our group(Reference Affolter, Grass and Vanrobaeys144): using an LC-MS/MS-based shotgun approach, we could reveal the presence of 244 proteins in whey protein concentrate and 133 in buttermilk protein, respectively, and provided an extensive characterisation of the protein content in those two fractions. Then, a label-free profiling approach delivered semi-quantitative comparison of both fractions and yielded protein fingerprints. Finally, we performed absolute quantification by combining stable-isotope dilution and multiple-reaction monitoring in order to precisely quantify seven major MFGM proteins.
PTM of milk proteins (phosphorylation and glycosylation) have been investigated by mass spectrometric technologies. Different approaches, from 2D gels to LC separation, were used to determine the phosphorylation pattern of caseins in human, bovine, equine, goat and buffalo milk(Reference Affolter, Grass and Vanrobaeys144–Reference Roncada, Gaviraghi and Liberatori148). In particular, a study of buffalo skim milk, whey and MFGM reported phosphorylation data on caseins, providing a scientific basis for the coagulation/cheese making processes used in dairy production(Reference D'Ambrosio, Arena and Salzano149). Glycosylation of milk proteins has also been investigated by MS(Reference Harvey150, Reference Park and Lebrilla151). It is estimated that, in milk, glycoproteins may account for up to 70% of the total protein content, whereas it is about 50% for all human proteins(Reference Casado, Affolter and Kussmann136). Indeed, the most abundant proteins in milk including casein, lactoferrin and the Ig are all glycoproteins. The position and extent of glycosylation of these proteins affect their degradation, the resulting released peptides and glycopeptides and the function they provide. Indeed, there is emerging evidence for the involvement of milk glycoproteins in infant protection against pathogen infection(Reference Casado, Affolter and Kussmann136, Reference Hamosh152–Reference Newburg and Walker154). For example, the glycoproteins in MFGM are considered to operate as specific bacterial and viral ligands preventing the pathogens from binding to the intestinal mucosa of the infants(Reference Hamosh, Peterson and Henderson155–Reference Peterson, Scallan and Ceriani157). Hydrophilic interaction chromatography was used to enrich glycoproteins from human milk(Reference Picariello, Ferranti and Mamone158) and enabled the identification of thirty-two glycoproteins and sixty-three N-glycosylated sites. Immunocompetent complexes, membrane fat globule enzymes, proteins involved in lipid metabolism and specific receptors figured among these glycoproteins.
Lactoferrin is a major Fe-binding glycoprotein of mammalian milk that contributes to the defence system of the human host: for example, it can prevent microbial growth by direct interaction with the membrane of Gram-negative bacteria(Reference Farnaud and Evans159). Lactoferrin and its derived peptides are also known to influence cytokine production in cell culture experiments mimicking immune and inflammatory processes(Reference Crouch, Slater and Fletcher160).
The potential benefits of food-derived peptides in terms of reduced risk of CVD have been reviewed(Reference Erdmann, Cheung and Schroder134), including their favourable effects on blood pressure, oxidative stress, homoeostasis, appetite and lipid metabolism. The benefits of lactotripeptides on hypertension are also well established: the tripeptides VPP (valine–proline–proline) and IPP (isoleucine–proline–proline) form upon fermentation of a milk product by Lactobacillus helveticus and Saccharomyces cerevisiae. When this fermented milk was fed to rats, the animals’ blood pressure was lowered(Reference Masuda, Nakamura and Takano161).
Probiotics
Probiotics are live micro-organisms which, when administered in adequate amounts, confer a health benefit on the host(162). Probiotics are commonly consumed as part of fermented foods with specially added active live cultures, such as in yoghurt or dietary supplements. Lactic acid bacteria and bifidobacteria are the most common types of micro-organisms used as probiotics. While their health benefits have been documented in clinical trials, their mechanisms of action are still poorly understood. The benefits of probiotics include stimulation of the mucosal immunity, reduction of mucosal alterations and interaction with mediators of inflammation(Reference Ljungh and Wadstrom163, Reference Salminen, Gueimonde and Isolauri164). Numerous proteomic studies aim at characterising the microbial proteomes and at understanding how probiotics interact with the gastrointestinal tract.
Bacteria release a wide range of compounds into their environment in order to communicate and coordinate their activities. Recently, it was shown that the co-culture of two Bifidobacterium strains (Bifidobacterium longum and Bifidobacterium breve) induced changes in the proteome of each bacterium(Reference Ruiz, Sanchez and de Los Reyes-Gavilan165). 2D gel analysis followed by LC-MS/MS identified sixteen proteins whose abundances changed drastically when the bifidobacteria were grown in co-culture compared to mono-culture. The differentially regulated proteins were grouped into ribosomal proteins and proteins involved in carbohydrate metabolism, gene regulation, cell envelope biogenesis and transport.
Another study investigated the surface-associated proteins of the probiotic Lactobacillus plantarum(Reference Beck, Madsen and Glenting166). Cell surface proteins were separated on one-dimensional gels and identified using LC-MS/MS. A total of twenty-nine proteins were identified, many of which had previously been described as capable of binding components of the human intestinal mucosa. In a related investigation, three L. plantarum strains showing different adhesion rates were analysed using proteomics(Reference Izquierdo, Horvatovich and Marchioni167). Several proteins previously reported to be involved in bacterial adhesion were found to be more abundant in the cell wall proteome of the most adhesive strain: elongation factor Tu (EF-Tu), GroEL (60 kDa chaperonin), DnaK (chaperone protein DnaK) and glyceraldehyde-3-phosphate dehydrogenase. The association of proteomic profiles with particular probiotic properties opens the way for the selection of probiotics with specific, targeted benefits.
Allergens: protein and peptides as food-derived causes of hazard
Food allergies arise from the intake of allergenic food components, which can induce a response from the immune system and lead to clinical symptoms ranging from mild to life threatening(Reference Bush and Hefle168). The prevalence of food allergy is rising: 2% of adults and 5–8% of children in industrialised countries are affected(Reference Burks, Kulis and Pons169–Reference Ortolani, Ispano and Scibilia171). Over 180 protein allergens have been identified so far, the major ones occurring in common foods such as cow's milk, egg, peanut, soyabean, wheat, fish and tree nuts(Reference Poms, Klein and Anklam172). Sensitive consumers have to be protected from undesirable allergic reactions and, therefore, proteomic methods have been developed for accurate allergen identification and quantification(Reference Kirsch, Fourdrilis and Dobson130, Reference Poms, Klein and Anklam172, Reference Monaci and Visconti173).
The classical proteomic strategy to identify food allergens consists of separating food proteins by 2D-PAGE, followed by electro-transfer onto a nitrocellulose membrane and detection of IgE-reactive proteins by immunoblotting with sera from allergic patients. This method was used to study allergens in wheat(Reference Akagawa, Handoyo and Ishii174, Reference Sotkovsky, Hubalek and Hernychova175), apple(Reference Guarino, Arena and De Simone176), maize(Reference Fasoli, Pastorello and Farioli177) and sesame seeds(Reference Beyer, Bardina and Grishina178). A systematic proteomic analysis of rice (Oryza sativa) leaf, root and seed using 2D gels followed by MS/MS allowed the detection and identification of more than 2500 proteins(Reference Koller, Washburn and Lange179), including several previously characterised allergenic proteins. The 2D difference gel electrophoresis method was also used to study several peanut varieties in order to show their low content of major allergens(Reference Schmidt, Gelhaus and Latendorf180). Recently, a method based on spectral counting was developed and successfully applied to the analysis of transgenic peanut lines containing reduced levels of certain major allergens(Reference Stevenson, Chu and Ozias-Akins181).
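Spectral counting uses the number of MS/MS spectra assigned to a protein as a proxy for its abundance. The sketch below illustrates one common normalisation, the normalised spectral abundance factor (NSAF), in which each count is divided by protein length and then by the sum over all proteins in the run. It is an illustration only and not necessarily the scheme used in the cited work; the protein names, counts and lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Return the normalised spectral abundance factor (NSAF) per protein.

    spectral_counts: dict mapping protein name -> number of assigned MS/MS spectra
    lengths: dict mapping protein name -> protein length in amino acids
    """
    # Longer proteins yield more peptides, so divide counts by length first.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    # Dividing by the run total makes values comparable between runs.
    return {p: value / total for p, value in saf.items()}


# Hypothetical counts for an allergen in a wild-type and a reduced-allergen sample.
lengths = {"allergen": 200, "reference": 400}
wild_type = nsaf({"allergen": 60, "reference": 80}, lengths)
reduced = nsaf({"allergen": 10, "reference": 80}, lengths)
print(wild_type["allergen"], reduced["allergen"])
```

Because the values within one run sum to one, a lower NSAF for the allergen in the second sample reflects a reduced share of the measured proteome rather than an absolute amount.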
Biomarkers: proteins and peptides as indicators of health and disease
Biomarkers are measurable indicators of different stages in a biological process, ranging from healthy functioning via deviation from such healthy equilibrium to disease onset and development(Reference De Roos182). Proteins and peptides, which are the main effectors in the body, can be used as such biomarkers. In practice, biomarkers are used for diagnostics, for prognostics, and to measure bioefficacy of nutrients in an intervention study.
Intestinal health and disease: Inflammatory bowel disease as an example
Inflammatory bowel diseases (IBD), including ulcerative colitis and Crohn's disease, are chronic, heterogeneous and multi-factorial inflammatory disorders of the gastrointestinal tract(Reference De Roos182, Reference Podolsky183). Proteomic investigations of the intestinal tissue of patients v. controls have the potential to deliver insights into gut dysfunction and may provide disease biomarkers.
A study of protein expression in intestinal epithelial cells characterised changes in the protein profiles of patients with Crohn's disease or ulcerative colitis compared to controls(Reference Shkoda, Werner and Daniel184). 2D-PAGE followed by MALDI-time-of-flight MS identified nine proteins whose levels differed significantly in IBD patients (e.g. Rho GDI alpha (Rho GDP-dissociation inhibitor 1) and L-lactate dehydrogenase A). In ulcerative colitis patients, forty differentially expressed proteins were identified, of which thirteen were associated with energy metabolism. This is in line with chronic intestinal inflammation being characterised by energy deficiency and alteration of the oxidative metabolism of epithelial cells(Reference Fukushima and Fiocchi185, Reference Roediger186).
A related study with 120 serum samples collected from four patient groups (Crohn's disease, ulcerative colitis, inflammatory controls, healthy controls) was performed to identify serum IBD biomarkers(Reference Meuwis, Fillet and Geurts187): four new serum biomarkers were identified, namely PF4 (platelet factor 4), MRP8 (migration inhibitory factor-related protein 8), FIBA (fibrinogen alpha chain) and Hp alpha 2 (haptoglobin alpha 2). Another study focused on stool analysis: apart from S100A8 and S100A9, already associated with IBD, S100A12 was identified as a possible new IBD marker(Reference Foell, Wittkowski and Ren188).
However, the aetiology of IBD is still poorly characterised. Malnutrition is a major feature of IBD, since inflammation of the gastrointestinal tract perturbs normal food intake and nutrient absorption(Reference Shamir189, Reference Vagianos, Bector and McConnell190). The mechanisms involved in malnutrition include decreased food intake, malabsorption, increased nutrient loss, increased energy requirements and drug–nutrient interactions. In consequence, nutritional aspects play important roles in IBD and complement drug treatment. Adequate food intake is important for the treatment of IBD, for achieving and maintaining remission, and for preventing relapse and the disease itself.
Conclusions and outlook
By balancing their diet, consumers want to optimise some health aspects without compromising others. Holistic and integrative approaches are therefore essential. Proteomics is a central platform in nutrigenomics, which attempts to holistically understand how our genome is expressed as a response to diet. From a molecular perspective, nutritional proteomics covers two dimensions: the characterisation of food proteins and peptides, and the discovery and quantification of biomarkers and bioactives. Nutritional proteomic biomarkers must be interconnected with other genomic and genetic markers: nutrigenetics investigates our genetic predisposition and susceptibility towards diet; epigenetics encompasses DNA sequence-unrelated biochemical modifications of both DNA itself and DNA-binding proteins and appears to provide a format for metabolic programming. Proteomics plays a key role here, too, as it can address PTM (e.g. acetylation) of DNA-packaging proteins and thereby help decipher the so-termed histone code.
Nutrition is still an expanding field for proteomics compared to well-established clinical and medical applications. The success of proteomics in nutrition and health will depend on multiple factors. First, the proteomic technology per se will benefit from ever-improving protein/peptide separation, depletion and enrichment on the one hand and from more sensitive and specific mass spectrometers on the other. The second area of platform-related improvement is bioinformatics, with rapidly improving tools to assess data quality and to convert data into interpretable information. The third area for improvement concerns the analytical strategy: focusing on proteome subsets, be it at the level of cell organelles, protein subclasses or mass spectra (targeted proteomics, gas-phase fractionation), will provide deeper insights into molecular networks.
Apart from this expected progress at platform level, the technology will increasingly benefit from its cross-correlation with gene expression analysis and metabolite profiling. One option for addressing the interrelated timing of gene and protein expression is to investigate protein turnover at proteomic scale but with single-protein resolution, i.e. to interpret protein abundance changes as the result of both protein synthesis and degradation rather than taking proteomic snapshots.
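Why turnover adds information beyond a snapshot can be illustrated with a deliberately simple kinetic model. The sketch below assumes zero-order synthesis and first-order degradation, dP/dt = k_syn - k_deg * P, which is a textbook simplification and not a model taken from the studies reviewed here; all names and rate constants are hypothetical.

```python
import math


def protein_abundance(t, p0, k_syn, k_deg):
    """Abundance at time t for zero-order synthesis and first-order degradation.

    Solves dP/dt = k_syn - k_deg * P with P(0) = p0.
    """
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    steady_state = k_syn / k_deg
    return steady_state + (p0 - steady_state) * math.exp(-k_deg * t)


# Two hypothetical proteins start at 100 units and head for the same new
# steady state (200 units), but with tenfold different turnover rates.
slow = protein_abundance(t=10.0, p0=100.0, k_syn=2.0, k_deg=0.01)
fast = protein_abundance(t=10.0, p0=100.0, k_syn=20.0, k_deg=0.1)
print(round(slow, 1), round(fast, 1))
```

Both proteins would look identical in snapshots taken before the change and long after it; only measurements that resolve synthesis and degradation reveal that the second protein responds much faster.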
In a nutshell, proteomics in nutrition delivers both biomarkers and bioactives. In this sense, proteomics will continue to drive (nutritional) systems biology, as it not only can identify and quantify the ‘molecular robots’ that do all the work in biological systems but also can map the networks of their physical interactions, between each other and with DNA, nutrients, drugs and other small molecules.
Acknowledgements
The authors declare no conflict of interest. S. S. and M. K. drafted the paper.