The accurate measurement of dietary exposure, which is an essential component of much health-related research, offers a challenging prospect. Specifically, dietary data can be subject to participant bias and can depend heavily upon food composition tables for the estimation of intakes of energy, nutrients and other food constituents( Reference Penn, Boeing and Boushey 1 – Reference Jia, Craig and Aucott 6 ). The chemical content of body fluids is a potentially rich source of information about dietary exposure as many foods contain distinctive metabolites which give rise to further chemical diversity following food ingestion, absorption and metabolism( Reference Bingham, Gill and Welch 7 – Reference Mennen, Sapinho and Ito 10 ). However, to date, putative biochemical markers are available for only a relatively small number of specific foods and food components( Reference Penn, Boeing and Boushey 1 ). The comprehensive analysis of metabolites in biological fluids using metabolomics technology provides an objective approach for the discovery of dietary exposure biomarkers( Reference Favé, Beckmann and Draper 11 – Reference Walsh, Brennan and Pujos-Guillot 35 ). 
Non-targeted metabolite fingerprinting, using either NMR( Reference Bertram, Hoppe and Petersen 17 , Reference Edmands, Beckonert and Stella 19 , Reference Heinzmann, Merrifield and Rezzi 21 , Reference Martin, Rezzi and Peré-Trepat 28 , Reference O'Sullivan, Gibney and Brennan 29 , Reference Rasmussen, Winning and Savorani 31 , Reference Stella, Beckwith-Hall and Cloarec 33 , Reference Walsh, Brennan and Pujos-Guillot 35 ) or MS( Reference Favé, Beckmann and Lloyd 20 , Reference Lloyd, Favé and Beckmann 25 – Reference Lloyd, Beckmann and Haldar 27 ), and metabolite profiling using liquid chromatography MS( Reference Bondia-Pons, Barri and Hanhineva 18 , Reference Kahle, Kempf and Schreier 22 – Reference Llorach-Asuncio, Jauregui and Urpi-Sarda 24 , Reference Martin, Rezzi and Peré-Trepat 28 , Reference Pujos-Guillot, Hubert and Martin 30 , Reference Stalmach, Mullen and Barron 32 , Reference Tulipani, Llorach and Jáuregui 34 ) have been used successfully for biomarker-lead discovery using urine samples from various study designs (Table 1). In acute food intervention studies( Reference Favé, Beckmann and Lloyd 20 , Reference Kahle, Kempf and Schreier 22 – Reference Lloyd, Favé and Beckmann 25 , Reference Stalmach, Mullen and Barron 32 ), participants were exposed to specific foods in known amounts and either postprandial urines sampled before the next meal or overnight or 24 h urines were collected. In other studies, participants were established on a specific diet for several days/weeks( Reference Bertram, Hoppe and Petersen 17 , Reference Edmands, Beckonert and Stella 19 , Reference Heinzmann, Merrifield and Rezzi 21 , Reference Martin, Rezzi and Peré-Trepat 28 , Reference Stella, Beckwith-Hall and Cloarec 33 , Reference Walsh, Brennan and Pujos-Guillot 35 ) or longer term (>1 month)( Reference Bondia-Pons, Barri and Hanhineva 18 , Reference Rasmussen, Winning and Savorani 31 , Reference Tulipani, Llorach and Jáuregui 34 ) before urine sampling. 
More recently there have been reports of the use of cohort studies in which participants consumed a freely-chosen diet( Reference Lloyd, Beckmann and Favé 26 , Reference Lloyd, Beckmann and Haldar 27 , Reference O'Sullivan, Gibney and Brennan 29 , Reference Pujos-Guillot, Hubert and Martin 30 ). In these studies, the analysis of diet diary or FFQ information allowed classification of individuals in terms of their frequency of consumption of specific diet constituents. In the present paper, we illustrate these approaches for biomarker discovery with particular reference to two studies carried out by the authors( Reference Favé, Beckmann and Lloyd 20 , Reference Lloyd, Favé and Beckmann 25 – Reference Lloyd, Beckmann and Haldar 27 ).
Abbreviations used in Table 1: HPLC–ESI–MS/MS, HPLC–electrospray ionisation–tandem MS; HPLC-PDA-MS, HPLC-photodiode array-MS; HPLC-Q-ToF, HPLC-quadrupole time-of-flight; FIE-MS, flow infusion electrospray-ionisation MS; LC-MS/MS, liquid chromatography tandem MS.
The presence of substantial inter- and intra-individual variability in human metabolite profiles( Reference Assfalg, Bertini and Colangiuli 36 ) provides a challenge for both biofluid sampling and subsequent data normalisation in metabolomics studies seeking information on habitual diet. To address this problem, standardised methods have been validated recently both for the management of participants and for urine sampling in large-scale food interventions involving free-living individuals( Reference Brownlee, Moore and Chatfield 37 ) and also for acute postprandial studies in a controlled environment( Reference Favé, Beckmann and Lloyd 20 , Reference Lloyd, Favé and Beckmann 25 ). Key features of these study protocols include behavioural restrictions (e.g. no alcohol) and the consumption of a standardised meal on the evening before a clinic visit at which a fasting urine sample is provided. It was anticipated that the latter would provide a ‘normalised’ background against which differences in urine chemistry resulting from either habitual dietary intake before the clinic visit or acute food intake during the test day would be detectable( Reference Scalbert, Brennan and Fiehn 12 , Reference Walsh, Brennan and Malthouse 14 , Reference Favé, Beckmann and Lloyd 20 , Reference Lloyd, Favé and Beckmann 25 ).
With effective protocols in place for volunteer management and urine sampling there was now an opportunity to determine whether changes in urine chemistry could reflect dietary exposure( Reference Favé, Beckmann and Draper 11 ). In an acute feeding ‘proof of principle’ study, urine samples were analysed from individuals participating in the MEtabolomics to characterise Dietary Exposure (MEDE) research programme( Reference Favé, Beckmann and Lloyd 20 ). As part of the MEDE project( Reference Favé, Beckmann and Draper 11 ), twenty-four healthy participants consumed a ‘test’ breakfast, in which the cereal component of a standardised breakfast was replaced by one of four foods of high public health importance, followed by the collection of postprandial urine samples for metabolome analysis( Reference Favé, Beckmann and Lloyd 20 , Reference Lloyd, Favé and Beckmann 25 ). Once candidate food biomarkers had been identified( Reference Lloyd, Favé and Beckmann 25 , Reference Lloyd, Beckmann and Favé 26 ) there was then the opportunity to validate their potential usefulness to monitor habitual diet in the independent GrainMark study (http://www.ncl.ac.uk/afrd/research/project/2287)( Reference Lloyd, Beckmann and Haldar 27 ). This large-scale dietary intervention study, involving free-living individuals, aimed to discover potential biomarkers of dietary wholegrain exposure. After a washout period of 4 weeks the participants (sixty-eight in total) were asked to consume three servings of either wholegrain rye foods or wholegrain wheat foods per d for 4 weeks and subsequently doubled their intake of the same foods for a further 4 weeks. 
At baseline, and at the middle of each 4-week intervention period (washout, three servings and six servings of wholegrain rye/wheat foods per d), volunteers completed a validated FFQ (four in total) based on the European Prospective Investigation into Cancer and Nutrition FFQ( Reference Bingham, Gill and Welch 7 ), which recorded consumption of foods for a 7 d period, within 7–14 d of sampling( Reference Brownlee, Moore and Chatfield 37 ). Using data from both the MEDE and GrainMark studies it has been shown that analysis of overnight void urine samples can provide a rich source of potential biomarkers of habitual diet as reported in FFQ( Reference Lloyd, Beckmann and Favé 26 , Reference Lloyd, Beckmann and Haldar 27 ).
Postprandial urine composition reflects recent dietary exposure
Table 1 lists several recent studies in which non-targeted metabolite fingerprinting or metabolite profiling has been used to discover dietary biomarkers in both acute and short-term food intervention studies( Reference Bertram, Hoppe and Petersen 17 , Reference Edmands, Beckonert and Stella 19 – Reference Lloyd, Favé and Beckmann 25 , Reference Martin, Rezzi and Peré-Trepat 28 , Reference Stalmach, Mullen and Barron 32 , Reference Stella, Beckwith-Hall and Cloarec 33 , Reference Walsh, Brennan and Pujos-Guillot 35 ). The basic design principles are illustrated in Fig. 1 with reference to the MEDE study in which fasted participants consumed specific foods (Fig. 1(a)) as part of a standardised breakfast. Metabolome fingerprints representing postprandial urines( Reference Lloyd, Favé and Beckmann 25 ) were then generated using non-targeted, nominal mass flow injection electrospray–ionisation MS( Reference Beckmann, Parker and Enot 15 , Reference Favé, Beckmann and Lloyd 20 , Reference Lloyd, Favé and Beckmann 25 ). Figure 1(b) illustrates a typical flow injection electrospray-ionisation MS urine fingerprint of a mass range from m/z 100 to 800, which shows that urine fingerprints are both complex and information-rich. The question of whether a postprandial urine sample contains chemicals distinctive of exposure to specific foods can be evaluated by subjecting the urine fingerprint data to powerful supervised multivariate analysis including principal component-linear discriminant analysis. Figure 1(c) shows a scores plot from a typical principal component-linear discriminant analysis( Reference Enot, Lin and Beckmann 16 ) of flow injection electrospray–ionisation MS fingerprints representing urine samples from volunteers exposed to a standard breakfast or to a breakfast in which the cereal component of the standard breakfast was replaced by smoked salmon, broccoli or raspberries( Reference Lloyd, Favé and Beckmann 25 ). 
With an eigenvalue (Tw) of 2.55 in the dimension of maximal discrimination (Fig. 1(c)), it is evident that the ‘test’ foods (particularly smoked salmon) are adequately discriminated from the standard breakfast. A range of feature selection methods( Reference Enot, Lin and Beckmann 16 ) can then be employed to determine the flow injection electrospray–ionisation MS signals responsible for the discrimination of each food and, under most circumstances, it has been found that agglomerative decision trees such as random forest perform consistently well( Reference Favé, Beckmann and Lloyd 20 , Reference Lloyd, Favé and Beckmann 25 , Reference Lloyd, Beckmann and Favé 26 , Reference Beckmann, Enot and Overy 38 ). The detailed analysis of highlighted nominal masses can be undertaken by targeting them for further investigation on MS instruments capable of ultra-high mass resolution( Reference Lloyd, Favé and Beckmann 25 – Reference Lloyd, Beckmann and Favé 26 ). The Fourier transform-ion cyclotron resonance ultra–MS plot shown in Fig. 1(d) represents the analysis of the nominal mass ‘bin’ m/z 241, a signal linked with the consumption of oily fish, indicating that a signal with a mass of 241·12958 is a likely biomarker candidate for oily fish exposure( Reference Lloyd, Favé and Beckmann 25 ). The accurate mass information can be used to generate an elemental formula and to predict candidate metabolites that could yield the measured ion using annotation tools such as MZedDB( Reference Draper, Enot and Parker 39 ), which take into account anticipated ionisation behaviour. Further experiments in which the selected ion is fragmented and the resulting spectrum compared with that of a chemical standard are required to assign a putative structure. For example, in the MEDE study these subsequent targeted analyses suggested that anserine (nominal mass m/z 241) is a candidate biomarker for oily fish consumption and proline betaine for citrus foods (Fig. 1(e))( Reference Lloyd, Favé and Beckmann 25 , Reference Lloyd, Beckmann and Favé 26 ).
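The link between an accurate mass and a candidate metabolite can be illustrated with a short calculation. The sketch below is not the MZedDB procedure; it simply computes the theoretical m/z of a protonated ion, [M+H]+, from an elemental formula and compares it with the measured value of 241·12958 quoted above. The elemental formulae for anserine (C10H16N4O3) and proline betaine (C7H13NO2) are standard chemical facts rather than values taken from this paper, the function names are illustrative, and only the [M+H]+ adduct is considered, whereas real annotation tools evaluate many possible adducts and isotopes.

```python
# Monoisotopic masses (in unified atomic mass units) of the lightest stable isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # mass of a proton, i.e. a hydrogen atom minus one electron


def protonated_mz(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion.

    `formula` maps element symbols to atom counts, e.g. {"C": 10, "H": 16}.
    """
    neutral_mass = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral_mass + PROTON


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Standard elemental formulae (assumed here, not stated in the text).
anserine = {"C": 10, "H": 16, "N": 4, "O": 3}
proline_betaine = {"C": 7, "H": 13, "N": 1, "O": 2}

mz_anserine = protonated_mz(anserine)
mz_proline_betaine = protonated_mz(proline_betaine)

print(f"anserine [M+H]+        : {mz_anserine:.5f}")
print(f"error v. 241.12958     : {ppm_error(241.12958, mz_anserine):.2f} ppm")
print(f"proline betaine [M+H]+ : {mz_proline_betaine:.5f}")
```

Under these assumptions the measured signal lies within 1 ppm of the theoretical [M+H]+ of anserine, and the protonated ion of proline betaine falls in the nominal mass bin m/z 144. Agreement of this kind narrows the list of candidates but does not replace the fragmentation experiments against a chemical standard described above.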
Exposure to citrus foods has provided a paradigm for discovery of dietary biomarkers by analysis of dietary data
Citrus fruits and citrus fruit juices are a distinctive and frequently consumed (i.e. often once per d) component of the UK diet and thus represent a paradigm for validation of biomarkers of habitual dietary exposure. Studies using either NMR( Reference Heinzmann, Merrifield and Rezzi 21 ) or ESI–MS fingerprinting( Reference Favé, Beckmann and Lloyd 20 ) of postprandial urines demonstrated initially that proline betaine was a potential biomarker of acute exposure to citrus foods. There was thus an opportunity to determine whether the urinary concentration of this metabolite reflected habitual exposure to the same foods. Recent publications described a multivariate modelling strategy for habitual dietary biomarker discovery dependent on the comparison of urine metabolome fingerprints from groups of individuals reporting differential exposure to specific dietary components( Reference Lloyd, Beckmann and Favé 26 , Reference Lloyd, Beckmann and Haldar 27 , Reference O'Sullivan, Gibney and Brennan 29 , Reference Pujos-Guillot, Hubert and Martin 30 ). An initial analysis( Reference Lloyd, Beckmann and Favé 26 ) of habitual citrus exposure data in the MEDE study (n 24) indicated that there were sufficient individuals reporting citrus consumption to allow the assignment of volunteers into three broad exposure levels (High (two–three citrus portions per d), Medium (about one citrus portion per d) and Low (less than two citrus portions per week)). With the larger number of participants in the GrainMark study it was possible to assign individuals to a more quantitative scale of habitual dietary exposure (i.e. never, 1 per week, 2–4 per week, 5–6 per week, 1 per d, 2–3 per d)( Reference Lloyd, Beckmann and Haldar 27 ). Overnight void urine samples available from both studies were subjected to metabolite fingerprinting and the m/z signals responsible for discriminating higher v. lower habitual consumption levels were identified using random forest feature selection( Reference Enot, Lin and Beckmann 16 ). In both studies the majority of the top twenty highest ranked signals responsible for the classification of High v. Low habitual citrus consumption were identified as ionisation products of proline betaine( Reference Lloyd, Beckmann and Favé 26 , Reference Lloyd, Beckmann and Haldar 27 ). Although not strictly quantitative, the relative intensity of one of the major ions (m/z 144; [M+H]+) reflected the level of habitual citrus exposure reported by individual participants in both MEDE and GrainMark studies. A similar result has been reported for NMR signals in 24 h urine samples associated with recent exposure to citrus foods, which were found to derive from the presence of proline betaine( Reference Heinzmann, Merrifield and Rezzi 21 ).
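The assignment of participants to higher v. lower exposure groups from FFQ responses can be sketched as follows. The six-point frequency scale is the one quoted above for the GrainMark study; the cut-off categories (`low_max`, `high_min`), the function names and the participant identifiers are hypothetical and chosen only for illustration, since the thresholds actually used varied with the food under study.

```python
# Ordinal consumption-frequency scale quoted for the GrainMark FFQ analysis,
# ordered from lowest to highest exposure (labels transcribed with ASCII hyphens).
FFQ_SCALE = ["never", "1 per week", "2-4 per week", "5-6 per week", "1 per d", "2-3 per d"]


def exposure_rank(response):
    """Position of an FFQ response on the ordinal scale (0 = lowest exposure)."""
    return FFQ_SCALE.index(response)


def split_higher_lower(responses, low_max="1 per week", high_min="1 per d"):
    """Assign participants to 'lower' or 'higher' habitual exposure groups.

    `responses` maps participant ID to an FFQ response. Participants whose
    response lies strictly between the two cut-offs are left unassigned so
    that the two classes are well separated. The default cut-offs are
    hypothetical, not those of the original studies.
    """
    low_cut = exposure_rank(low_max)
    high_cut = exposure_rank(high_min)
    groups = {"lower": [], "higher": []}
    for participant, response in responses.items():
        rank = exposure_rank(response)
        if rank <= low_cut:
            groups["lower"].append(participant)
        elif rank >= high_cut:
            groups["higher"].append(participant)
    return groups


# Made-up responses for four hypothetical participants.
responses = {"P01": "never", "P02": "2-3 per d", "P03": "2-4 per week", "P04": "1 per d"}
groups = split_higher_lower(responses)
print(groups)
```

Leaving the intermediate categories unassigned is one possible design choice: it sharpens the contrast between classes at the cost of discarding samples, which matters most for small cohorts such as MEDE (n 24).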
Habitual consumption frequency for individual foods impacts on the ability to detect differential dietary exposure using overnight urine samples
Following the success with citrus foods there have been several recent reports in which dietary data have been analysed from studies where, in most cases, participants have eaten freely-chosen diets (Table 1)( Reference Lloyd, Beckmann and Favé 26 , Reference Lloyd, Beckmann and Haldar 27 , Reference O'Sullivan, Gibney and Brennan 29 , Reference Pujos-Guillot, Hubert and Martin 30 ). These meta-data allowed the assignment of individuals into different food consumption frequency classes and could be used to study habitual exposure for a range of other foods. In the GrainMark study, as expected, habitual consumption frequency differed greatly between individual foods, and these food-specific consumption patterns were generally consistent with the MEDE study( Reference Lloyd, Beckmann and Haldar 27 ). Overall, the patterns of intake of individual foods could be summarised in four general exposure categories (Fig. 2) ranging from foods consumed very infrequently (e.g. liver or kidneys) to those consumed, on average, more than once per d (e.g. coffee). Although habitual citrus exposure could be modelled adequately using data from the MEDE study( Reference Lloyd, Beckmann and Favé 26 ), with only twenty-four participants the present strategy was not suitable for biomarker discovery for foods which are consumed very infrequently by most people. Assignment to habitual consumption frequency ranges for the remaining foods (grouped into High skewed, Normal distribution or Low skewed consumption patterns) identified sufficient numbers of individuals within the GrainMark study to develop higher v. lower frequency exposure groups for multivariate classification( Reference Lloyd, Beckmann and Haldar 27 ). 
The likelihood of discovering potential biomarkers for each food was assessed by determining the ‘goodness’ of class discrimination using random forest margin values and area under the receiver operating characteristic curve values as robust classification statistics( Reference Enot, Lin and Beckmann 16 ). Figure 2 illustrates a clear trend that classification efficiency (as assessed by random forest margins and area under the receiver operating characteristic curve values) is generally higher in foods that are consumed more frequently. We suggest that such foods will form likely candidates for monitoring using a biomarker strategy if it can be proven that the presence of potential biomarker signals in urine can be linked to the ingestion and chemical composition of specific dietary components. An example ‘data-driven’ strategy for the discovery of potential dietary biomarkers based on the analysis of GrainMark FFQ structure and data is summarised in Fig. 3.
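The two classification statistics mentioned above can be made concrete with a minimal sketch. It assumes the usual definitions: the area under the receiver operating characteristic curve as the probability that a randomly chosen higher-exposure sample receives a larger classifier score than a randomly chosen lower-exposure sample, and the random forest margin of a sample as the proportion of trees voting for its true class minus the largest proportion voting for any other class. The exact implementation used in the cited studies may differ (for example in how margins are averaged over samples), and the scores and vote counts below are invented for illustration.

```python
def auc(scores_higher, scores_lower):
    """Area under the ROC curve from classifier scores of the two classes.

    Computed as the fraction of (higher, lower) pairs in which the
    higher-exposure sample has the larger score; ties count as one half.
    """
    wins = 0.0
    for h in scores_higher:
        for l in scores_lower:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(scores_higher) * len(scores_lower))


def rf_margin(votes, true_class):
    """Random forest margin for one sample.

    `votes` maps each class label to the number of trees voting for it.
    The margin is positive when the forest favours the true class and
    negative when another class receives more votes.
    """
    total = sum(votes.values())
    true_share = votes.get(true_class, 0) / total
    other_share = max((n for label, n in votes.items() if label != true_class), default=0) / total
    return true_share - other_share


# Invented scores: probability-like outputs for the 'higher exposure' class.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
# Invented vote counts from a forest of 100 trees.
confident = rf_margin({"High": 80, "Low": 20}, "High")
misclassified = rf_margin({"High": 30, "Low": 70}, "High")
print(example_auc, confident, misclassified)
```

An area under the curve near 0.5 and margins near zero indicate that the fingerprints carry little information about consumption of a food, which is the situation expected for items eaten too rarely to leave a signal in an overnight urine sample.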
Structural analysis of potential biomarkers representing three selected foods shows that their presence in overnight void urine reflects original food chemical composition
Metabolome models (based on overnight void urine samples) of three distinctive foods representing examples of Low skewed, Normal distribution and High skewed habitual dietary exposure groups have been investigated in more detail to determine whether the selected metabolites could be correlated with known food chemistry( Reference Mennen, Sapinho and Ito 10 , Reference Lloyd, Beckmann and Favé 26 , Reference Stalmach, Mullen and Barron 32 , Reference Stella, Beckwith-Hall and Cloarec 33 , Reference Abe 40 – Reference Visciano, Perugini and Manera 51 ). Groups of participants were identified for each dietary component to represent High, Medium and Low exposure classes within the consumption ranges described for each food in the GrainMark study. Thus for oily fish (Low skewed) the Low consumption category represents less than one portion per week, Medium is approximately one per week and High is greater than or equal to two to four portions per week. For tomato (Normal distribution), the Low consumers ate less than or equal to one portion per week, Medium consumers ate greater than two to four portions per week, while High consumers ate up to five to six portions per week. High consumers of coffee (High skewed) drank more than one cup per d while Medium consumers had greater than or equal to two to four cups per week and Low consumers less than or equal to one per week. Despite the wide range of consumption frequencies, each of these distinctive foods generated adequate classification models (Fig. 4(a)).
In the GrainMark study, dihydrocaffeic acid-3-O-glucuronide was highly ranked as a potential biomarker of habitual coffee exposure (Fig. 4(b)). Previous studies on acute exposure to coffee reported the presence in urine and/or plasma of at least ten phenolic compounds which represented metabolic endpoints of the biotransformation of chlorogenic acid and caffeic acid, which are present at high levels in this beverage( Reference Mennen, Sapinho and Ito 10 , Reference Stalmach, Mullen and Barron 32 , Reference Ito, Gonthier and Manach 43 , Reference Rechner, Spencer and Kuhnle 49 ). Although chlorogenic acid and caffeic acid are found in many fruits and vegetables, making it unlikely that dihydrocaffeic acids will prove to be unique biomarkers of coffee consumption, it is interesting that other phenolic metabolites identified in acute exposure studies were not highlighted by this analysis. Investigation of top ranked signals for oily fish exposure showed that methyl-histidine (probably derived from anserine, which was also highly ranked, by the action of carnosinase)( Reference Abe 40 – Reference Dragsted 42 ) was an excellent biomarker candidate (Fig. 4(c)). Thus, the identity of both of these potential biomarkers could be linked directly to the previous chemical analysis of each food. Although hippuric acids (presumably derived from colonic fermentation of the hydroxycinnamic acid content of tomato fruits( Reference Sánchez-Rodríguez, Ruiz and Ferreres 48 )) were indicative of tomato exposure, many highly ranked signals proved to be dihydroxyphenylvalerolactone conjugates that commonly result from the colonic fermentation of flavonoids, such as flavanols( Reference Urpi-Sarda, Monagas and Khan 52 ) (Fig. 4(d)). As tomato contains insignificant amounts of such polyphenols it is most likely that these compounds are associated with foods strongly co-consumed in meals containing tomatoes.
This observation highlights the requirement for careful validation of any potential food biomarkers in the context of the whole diet and emphasises the utility of the test meal approach (as used in the MEDE study) for the identification of putative biomarkers which are causally linked with food exposure. In addition, such data-driven strategies for putative biomarker discovery are constrained by the robustness of the original dietary intake data which may contain inherent biases. Further, both conceptual and practical challenges are likely in cases where there are strong correlations between intakes of foods with related chemistries.
Conclusion
The advent of non-targeted metabolomics technology for global chemical fingerprinting/profiling of human biofluids has offered an opportunity to accelerate research on food biomarker discovery. Recent data support the concept that metabolomics analysis of 24 h or overnight void urine samples, in particular, will provide a productive strategy for the identification of candidate dietary exposure biomarkers. The demonstration that biomarker discovery using high throughput metabolomics is feasible using urine samples derived from populations of free-living individuals( Reference Lloyd, Beckmann and Favé 26 , Reference Lloyd, Beckmann and Haldar 27 , Reference O'Sullivan, Gibney and Brennan 29 , Reference Pujos-Guillot, Hubert and Martin 30 ) supports the ultimate objective of using such biomarkers in epidemiological studies. Evidence is now accumulating that a range of relatively frequently consumed (i.e. one to two portions per week) and more distinctive foods can be considered good candidates for biomarker discovery using biofluid samples from cohort studies( Reference Lloyd, Beckmann and Favé 26 , Reference Lloyd, Beckmann and Haldar 27 , Reference O'Sullivan, Gibney and Brennan 29 , Reference Pujos-Guillot, Hubert and Martin 30 ). Many other foods forming major components of composite meals, or less frequently consumed food items, may be targets for biomarker discovery but would probably require well designed, controlled food intervention studies to identify candidate metabolites. In future studies we propose testing this hypothesis by the analysis of overnight urine samples collected at home by individuals consuming carefully constructed weekly menus designed both to provide adequate exposure to specific dietary components and to offer foods in specific combinations in order to expose any biomarker redundancy that could confound dietary exposure measurement.
Ensuring the generation of accurate dietary information in epidemiological studies requires considerable effort from both researchers and study participants. Despite the advent of increasingly sophisticated digital tools( Reference Penn, Boeing and Boushey 1 ), diet recording remains an inherent source of major uncertainty, even in studies where individuals recording food exposures are well trained and carefully monitored. Following validation in epidemiological and/or controlled dietary intervention studies, it is expected that putative food intake biomarkers can be translated into practical measurements that can complement, or in some cases replace, more traditional methods of assessing dietary exposure. We conclude that in the not too distant future, urine biomarker technology may allow objective monitoring of the levels of intake of several key foods and strengthen the evidence for causal links between dietary exposure and health outcomes.
Acknowledgements
None of the authors has a conflict of interest with respect to this manuscript. This research programme was supported by the UK Food Standards Agency Projects N05073 and N05075 and MRC Programme Grant MR/J010308/1. The authors' contributions to the work were as follows: M. B. developed urine extraction procedures, designed metabolite fingerprinting experiments, supervised MS support staff, pre-processed data for analysis and edited the manuscript; A. J. L. was involved in data analysis, produced figures, researched the literature and wrote the manuscript; G. F. and S. H. undertook volunteer recruitment, coordinated volunteer Clinical Research Facility visits and supervised Clinical Research Facility support staff and edited the manuscript; J. C. M., C. S. and K. B. coordinated the project, supervised research in Newcastle University, designed volunteer handling protocols and edited the manuscript; and J. D. coordinated the project, supervised research in Aberystwyth, designed figures and wrote the manuscript.