A high consumption of fruit and vegetables has been associated with a reduced risk of several chronic diseases, including cancer and CVD( Reference Boeing, Bechthold and Bub 1 – Reference Riboli and Norat 3 ). Therefore, intervention studies that aim to increase the consumption of fruit and vegetables using advice or counselling are often conducted. To investigate the success of an intervention, the subjects are asked to report or recall their consumption of fruits and vegetables. However, because it is highly likely that the subject is aware of the intervention (i.e. the advice or counselling), the report or recall is likely to be biased. Objective measures, such as measuring subjects' serum/plasma concentrations of carotenoids, have been used to investigate whether an intervention led to an increase in fruit and vegetable consumption as compared to the control group( Reference Macdonald, Hardcastle and Duthie 4 – Reference Rock, Moskowitz and Huizar 6 ), but these biomarkers do not quantify the increase in fruit and vegetable intake caused by the intervention.
Instruments that assess fruit and vegetable intake are currently validated against other self-reporting instruments. However, self-reported dietary intake instruments have been found to be biased and to have correlated errors when compared with recovery biomarkers, such as doubly labelled water and urinary N excretion( Reference Day, McKeown and Wong 7 – Reference Kipnis, Subar and Midthune 10 ). Therefore, if we were able to quantify fruit and vegetable intake from biomarkers rather than from self-reports, comparing self-reported intake with this biomarker-based estimate would give a better idea of the true validity of self-reporting instruments. No recovery biomarker is available for fruit and vegetable intake. It would therefore be useful to find a predictive biomarker that can be related to the true intake of fruits and vegetables( Reference Tasevska, Midthune and Potischman 11 , Reference Tasevska, Runswick and McTaggart 12 ).
It is not accurate to relate, for instance, an increase in β-carotene concentration to an exact increase in fruit and vegetable consumption. Single biomarkers and the sum of carotenoids have previously been shown to have low correlations with self-reported intakes of fruits and vegetables( Reference Andersen, Veierod and Johansson 13 – Reference Toft, Kristoffersen and Ladelund 21 ). Therefore, in order to capture the full range of fruit and vegetable intake, it is worthwhile to investigate whether a combination of biomarkers, possibly together with other factors, provides more reliable results. Baldrick et al. ( Reference Baldrick, Woodside and Elborn 22 ) found that the carotenoids and vitamin C are the most consistently responsive biomarkers of fruit and vegetable intake. In addition, serum/plasma folate may be used as a biomarker of fruit and vegetable intake, although it is a less sensitive marker, especially in countries where fortification with folate is mandatory( Reference Brevik, Vollset and Tell 23 , Reference Willett 24 ). To use biomarkers to quantify the consumption of fruits and vegetables, the dose–response relationship between fruit and vegetable intake and the respective biomarkers must be known. Because dietary intake recorded by subjects is often biased, a cross-sectional study with such data will not provide an unbiased estimate of the dose–response curve. In contrast, in diet-controlled intervention studies in which fruits and vegetables are provided to the participants, the intake data do not rely solely on self-reporting. In these studies, combining information on the amounts provided, information from supervised consumption and self-reported information on compliance may lead to a less biased estimate of fruit and vegetable intake. We therefore conducted an individual participant data meta-analysis of such studies, covering a wide range of fruit and vegetable intakes.
The first aim of the present study was to investigate the dose–response curve between fruit and vegetable consumption and multiple biomarkers, namely, serum carotenoids (α-carotene, β-carotene, β-cryptoxanthin, lycopene, lutein and zeaxanthin), serum/plasma folate and serum/plasma vitamin C. The second aim was to establish a prediction model of fruit and vegetable intake based on these biomarkers which may be used as a predictive biomarker or to estimate group-level intake.
Methods
Search strategy
The aim of the literature search was to find diet-controlled intervention studies (i.e. food provision studies or partly supervised feeding studies) conducted in adults, in which reported amounts of consumed fruits and vegetables were supported by information on the amounts provided and in which significant efforts were made to maximise compliance. The following diet-controlled intervention studies were included: (1) studies in which all foods and drinks were provided to the subjects during the intervention, and (2) studies in which all fruits and vegetables consumed were provided to the subjects. In addition, studies had to have measured blood carotenoid or folate concentrations after the intervention and had to be published in English. The search was conducted in Scopus and PubMed and by a manual search of reference lists. Search terms in titles and abstracts included ‘fruit’ and ‘vegetables’ combined with ‘intervention’, ‘trial’ and ‘feeding study’. These terms were then combined with ‘biomarkers’, ‘biological markers’, ‘carotenoids’, ‘α-carotene’, ‘beta-carotene’, ‘beta-cryptoxanthin’, ‘zeaxanthin’, ‘lycopene’, ‘lutein’, ‘folate’ and ‘bioavailability’. The search included studies published before October 2012.
Papers were first screened based on their titles and abstracts. The full texts of potentially relevant papers were then retrieved, read and judged against the inclusion and exclusion criteria. The exclusion criteria were: (1) intervention studies in which the intervention consisted of dietary advice or counselling (and therefore foods were not provided to the subjects by the investigators); (2) intervention studies in which not all fruits and vegetables were provided (i.e. the provision consisted of additional fruits and vegetables on top of normal fruit and vegetable consumption) or in which fruits and vegetables were provided as supplements (e.g. capsules), juices or extracts; (3) intervention studies in which the intervention involved a single ingestion of the intervention food(s) or an intervention period of 6 d or fewer; and (4) studies conducted in children, adolescents, institutionalised elderly or pregnant or lactating women.
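To make the screening rules concrete, the sketch below encodes the inclusion and exclusion criteria as a single Python predicate. It is a hypothetical illustration only: the `Study` record and its field names are assumptions introduced here, whereas the actual screening was done by reading titles, abstracts and full texts.

```python
from dataclasses import dataclass

# Hypothetical study record; field names are illustrative assumptions.
@dataclass
class Study:
    foods_provided: bool         # False when the intervention was advice or counselling only
    all_fv_provided: bool        # False when F&V were added on top of the usual diet
    fv_as_foods: bool            # False when F&V were given as capsules, juices or extracts
    duration_days: int           # length of the intervention period
    population: str              # e.g. "adults", "children", "pregnant"
    blood_marker_measured: bool  # carotenoids or folate measured after the intervention
    english: bool                # published in English

EXCLUDED_POPULATIONS = {"children", "adolescents", "institutionalised elderly",
                        "pregnant", "lactating"}

def is_eligible(study: Study) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    return (study.foods_provided
            and study.all_fv_provided
            and study.fv_as_foods
            and study.duration_days >= 7   # excludes single ingestions and periods of <= 6 d
            and study.population not in EXCLUDED_POPULATIONS
            and study.blood_marker_measured
            and study.english)
```

A single ingestion is covered by the duration rule, because it implies an intervention period shorter than 7 d.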
Data
The current contact details of each study's corresponding author, first author or other authors were searched for on the Internet. Authors were contacted by email and asked whether they were willing to share the original data of the study; these authors were offered co-authorship of the present paper. We requested individual participant data (where available) on subject characteristics (sex, age, height, weight (or BMI) and smoking status), serum/plasma values of biomarkers and intake of fruits and vegetables (or intervention group coding).
In addition, we collected information on: (1) the study design (whether it was a parallel or crossover study, whether a run-in period was included and, where applicable, whether a wash-out period was included); (2) the dietary intervention (the duration of the dietary intervention and the daily intake of fruits and vegetables, carotenoids or folate); and (3) the serum/plasma measurements (whether blood was drawn after a fasting period and which methods were used for sample analysis).
Statistical analysis
Outliers, which were defined as all observations above (Q3+4 × IQR) (where Q3 refers to the third quartile and IQR refers to the interquartile range), were removed from the dataset. The median number of outliers per biomarker was 1 (range: 0–7).
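The outlier rule can be sketched in Python as follows. The quartile definition is an assumption: `statistics.quantiles` with `method="inclusive"` is used here, whereas SAS's default percentile definition may place the cut-off slightly differently. The function name is hypothetical.

```python
import statistics

def remove_upper_outliers(values, k=4.0):
    """Drop observations above Q3 + k * IQR (k = 4 in the text).

    Only the upper tail is trimmed; returns the retained values and the cut-off.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    cutoff = q3 + k * (q3 - q1)
    kept = [v for v in values if v <= cutoff]
    return kept, cutoff

# Example: Q1 = 3, Q3 = 7, so the cut-off is 7 + 4 * 4 = 23 and 100 is removed.
kept, cutoff = remove_upper_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
```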
Dose–response curves
The dose–response curves between log-transformed biomarker concentrations (dependent variable) and fruit and vegetable intake (independent variable), and between biomarker concentrations and intake of the corresponding micronutrient, were estimated using fractional polynomials( Reference Royston and Altman 25 , Reference Sauerbrei and Royston 26 ). To account for the one crossover study and for between-study heterogeneity, the final parameter estimates were calculated using mixed models with study and subject as random effects. The estimated variance components therefore refer to differences between studies, differences between individuals (to account for the crossover study) and residual variance.
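As a rough illustration of how fractional polynomial terms are built, the sketch below constructs second-degree (FP2) terms from the conventional power set, where power 0 denotes the natural logarithm and a repeated power adds a log factor. Selecting the best power pair by deviance and the mixed-model fit with study and subject random effects are not shown. The shift applied to an intake of 0 g/d is an assumption, since fractional polynomials require positive values.

```python
import math
from itertools import combinations_with_replacement

# Conventional FP power set; 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(x, p):
    """Single fractional polynomial term x^(p), with x^(0) = log(x)."""
    if x <= 0:
        raise ValueError("FP terms need x > 0; shift the variable first")
    return math.log(x) if p == 0 else x ** p

def fp2_terms(x, p1, p2):
    """The two FP2 terms for powers (p1, p2); equal powers give x^p and x^p * log(x)."""
    first = fp_term(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_term(x, p2)
    return first, second

def fp2_candidates():
    """All power pairs compared in an FP2 search (36 with the set above)."""
    return list(combinations_with_replacement(FP_POWERS, 2))

# Hypothetical shift so that an intake of 0 g/d can be transformed.
intake_g_per_day = 0.0
shifted = intake_g_per_day + 1.0
terms = fp2_terms(shifted, 0.5, 0.5)  # (1.0, 0.0)
```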
To obtain predictions on the original scale rather than on the logarithmic scale, we applied the following back-transformation:
$$E(Y) = \exp\left(X\beta + \frac{\sigma^{2}}{2}\right)$$
where Y is the biomarker concentration on the original scale, E(Y) is the expectation of Y, X is the fruit and vegetable intake, β refers to the regression coefficients of the dose–response model and σ2 is the sum of the variance components estimated in the mixed model.
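Because the biomarkers were modelled on the log scale, exponentiating the linear predictor alone would give the median (geometric mean) rather than the mean. The sketch below applies the log-normal back-transformation E(Y) = exp(Xβ + σ²/2), with σ² taken as the sum of the variance components; the function name and the example numbers are hypothetical.

```python
import math

def back_transform(linear_predictor, variance_components):
    """Expected biomarker concentration on the original scale.

    linear_predictor: fitted value X*beta on the log scale.
    variance_components: e.g. (between-study, between-subject, residual) variances;
    their sum is sigma^2 in E(Y) = exp(X*beta + sigma^2 / 2).
    """
    sigma2 = sum(variance_components)
    return math.exp(linear_predictor + sigma2 / 2.0)

# Hypothetical values: log-scale prediction -1.0, variance components summing to 0.2.
mean_conc = back_transform(-1.0, (0.05, 0.10, 0.05))  # about 0.407
median_conc = math.exp(-1.0)                          # about 0.368
```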
Several covariates were tested to see whether they statistically significantly predicted the biomarker concentrations. Covariates that were tested included age, BMI, sex and smoking. In addition, the interaction between fruit and vegetable intake and these covariates was tested. The covariates and interactions were tested by including them one at a time in separate fractional polynomial regression models.
Prediction models of fruit and vegetable intake
We developed three different prediction models based on what we learned from the dose–response curves. The models were estimated using linear regression: (1) a pre-specified model in which all continuous variables were added as linear terms (referred to as the linear model), (2) a pre-specified model in which the functional form of all continuous variables was established using multivariable fractional polynomials (MFP; referred to as the MFP model), and (3) a reduced model that included only the statistically significant predictors selected using MFP (referred to as the reduced MFP model). The MFP models were fitted using STATA/SE version 11.0 for Windows. Interactions between the subject characteristics (age, BMI, sex and smoking status) and the biomarkers (α-carotene, β-carotene, lutein+zeaxanthin, lycopene and β-cryptoxanthin) were first screened in four separate models: (1) main effects+age × biomarkers; (2) main effects+BMI × biomarkers; (3) main effects+sex × biomarkers; and (4) main effects+smoking status × biomarkers. All interactions were included as linear terms. Interactions with P< 0·05 were considered relevant for inclusion in the prediction model. These interactions were then tested together in the model, and backward selection was applied until all interactions remaining in the model had P< 0·05.
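The second stage of the interaction selection, backward elimination until every remaining interaction has P < 0·05, can be sketched as below. The regression itself is abstracted into a hypothetical callback `fit_pvalues` that would refit the model with the main effects plus the given interaction terms and return their P values; the term names in the example are illustrative.

```python
from typing import Callable, Dict, List

def backward_select(terms: List[str],
                    fit_pvalues: Callable[[List[str]], Dict[str, float]],
                    alpha: float = 0.05) -> List[str]:
    """Repeatedly drop the least significant term until all have P < alpha."""
    current = list(terms)
    while current:
        pvalues = fit_pvalues(current)   # refit with the current set of terms
        worst = max(current, key=lambda t: pvalues[t])
        if pvalues[worst] < alpha:
            break                        # every remaining term is significant
        current.remove(worst)
    return current

# Toy callback with fixed P values (a real refit would change them at each step).
FIXED_P = {"age x beta-carotene": 0.01, "BMI x lycopene": 0.20, "sex x lutein": 0.04}
selected = backward_select(list(FIXED_P), lambda ts: {t: FIXED_P[t] for t in ts})
# selected == ["age x beta-carotene", "sex x lutein"]
```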
Because data on predictors and outcomes were incomplete, we used multiple imputation to create ten imputed datasets. The fractional polynomial powers and the selection of predictors were established in each of the ten imputed datasets separately, and the final model was established by majority voting( Reference Vergouwe, Royston and Moons 27 ).
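Majority voting over the imputed datasets can be sketched as follows: a predictor is kept if it was selected in more than half of the ten datasets, and for each kept variable the most frequently chosen fractional polynomial power is used. Treating a 5/10 tie as "not selected" and breaking ties between powers by first occurrence are assumptions of this sketch, not details reported in the text.

```python
from collections import Counter
from typing import List, Sequence, Set

def majority_vote(selected_per_imputation: List[Set[str]]) -> List[str]:
    """Predictors selected in more than half of the imputed datasets."""
    m = len(selected_per_imputation)
    counts = Counter(name for selected in selected_per_imputation for name in selected)
    return sorted(name for name, c in counts.items() if c > m / 2)

def modal_power(powers: Sequence[float]) -> float:
    """Most frequently chosen FP power; ties go to the power seen first."""
    return Counter(powers).most_common(1)[0][0]

# Ten hypothetical imputed datasets: lycopene chosen in 6, BMI in 5, age in all.
selections = [{"age", "lycopene", "bmi"}] * 5 + [{"age", "lycopene"}] + [{"age"}] * 4
final = majority_vote(selections)           # ["age", "lycopene"]
power = modal_power([0.5, 0.5, 1, 0.5, 2])  # 0.5
```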
The validity of the fruit, vegetable and juice intake (FVJ) and fruit and vegetable intake (excluding juices; FV) prediction models was assessed using tenfold cross-validation. First, the data were imputed as described above, and then the data were randomly divided into ten parts. One part was held out as the test set, and the prediction models were fitted using linear regression to each of the imputed versions of the training set (i.e. the remaining nine parts). The regression coefficients were combined across the imputed datasets using the usual rules for multiply imputed data (i.e. by averaging them) to obtain the coefficients applied to the test data. For each individual in the held-out test set, predicted values were calculated by multiplying these regression coefficients by the observed values of the predictors in each of the imputed test sets, and the final predicted value was the average over the ten imputed test sets. Each part was held out once, so the procedure was repeated ten times. These predicted values were compared with the observed values to estimate model performance using three measures: (1) the root mean squared error (RMSE) = $$\sqrt{\frac{1}{n}\sum (Y - \hat{Y})^{2}}$$, where Ŷ is the predicted intake, (2) the correlation between observed and predicted intake, and (3) the mean difference (observed intake minus predicted intake) with the corresponding limits of agreement at the individual level (i.e. mean difference ± 1·96 × SDdifference). Unless otherwise indicated, all analyses were performed using SAS version 9.2 (SAS Institute, Inc.).
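The performance measures, together with the averaging of coefficients and predictions over the imputed datasets, can be sketched in plain Python as below. The function names are hypothetical, and the limits of agreement use the sample SD of the differences (n − 1 denominator), which is an assumption because the text does not state which SD was used.

```python
import math
import statistics
from typing import List, Sequence

def pool(values_per_imputation: List[Sequence[float]]) -> List[float]:
    """Element-wise mean over imputations (for coefficients or for predictions)."""
    return [statistics.mean(vals) for vals in zip(*values_per_imputation)]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted intake."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def performance(observed: Sequence[float], predicted: Sequence[float]) -> dict:
    """RMSE, correlation, mean difference (observed - predicted) and 95 % limits of agreement."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return {
        "rmse": rmse,
        "r": pearson(observed, predicted),
        "mean_diff": mean_diff,
        "loa": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
    }

# Hypothetical predictions for four people from two imputed test sets, then pooled.
predicted = pool([[100.0, 200.0, 300.0, 400.0], [120.0, 180.0, 320.0, 380.0]])
result = performance([100.0, 200.0, 300.0, 400.0], predicted)
```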
Results
Search and data retrieval
A total of 1002 papers were identified, of which twenty-seven qualified for inclusion in the present meta-analysis( Reference Appel, Miller and Jee 28 – Reference Yeum, Booth and Sadowski 54 ). Of these twenty-seven papers, eight described a study population that was also described in another publication. Therefore, the authors of nineteen unique diet-controlled intervention studies were contacted to request the individual data. The flowchart of the study selection is shown in Fig. 1. Twelve authors responded positively to the request and made their data available for the present analysis. A summary of the characteristics of these studies is given in Table 1, and an overview of their data is presented in Tables 2 and 3. The data of four studies were unfortunately unavailable, and three authors did not respond to our request. Information from these studies is available in online supplementary Table SA.
F&V, fruit and vegetables; FV, fruit and vegetable intake, excluding juices; FVJ, fruit, vegetable and juice intake; FBV, fruit, berries and vegetables.
* The number of individuals used in the present analysis. In brackets, the number of individuals reported in the original publication. For several studies, specific intervention groups were not useful in the present analysis( Reference Castenmiller, West and Linssen 36 , Reference Dragsted, Pedersen and Hermetter 38 , Reference Karlsen, Svendsen and Seljeflot 41 , Reference van het Hof, Brouwer and West 49 , Reference Van Loo-Bouwman, West and Van Breemen 50 , Reference Winkels, Brouwer and Siebelink 52 ), and for one study( Reference Miller, Erlinger and Sacks 44 ), data of a subset of participants was received.
† In brackets, indication of whether the amount of fruits and vegetables reported in the table and used in the analysis was the amount provided to the subjects (indicated by ‘P’) or whether the amount relied partly on self-reporting (indicated by ‘R’).
‡ The folate data of that study were no longer available( Reference Brouwer, Van Dusseldorp and West 34 ).
* These data are taken from the original publication, but they were not available for the present analysis.
For six studies, specific groups were not useful in the present analysis( Reference Castenmiller, West and Linssen 36 , Reference Dragsted, Pedersen and Hermetter 38 , Reference Karlsen, Svendsen and Seljeflot 41 , Reference van het Hof, Brouwer and West 49 , Reference Van Loo-Bouwman, West and Van Breemen 50 , Reference Winkels, Brouwer and Siebelink 52 ), and for one study( Reference Miller, Erlinger and Sacks 44 ), data from a subset of participants were received. For the study by Miller et al. ( Reference Miller, Erlinger and Sacks 44 ), intake of fruits and vegetables in serves was converted to g/d by multiplying the number of serves by 80 g. For the study by Itsiopoulos et al. ( Reference Itsiopoulos, Brazionis and Kaimakamis 40 ), intake of fruits and vegetables was known for fifteen subjects. For the remaining twelve subjects, fruit and vegetable intake was imputed as the mean intake reported in the paper (i.e. 466 g/d vegetables and 162 g/d fruits). Where necessary, α-carotene, β-carotene and lycopene concentrations were converted from μg/ml to μmol/l.
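The harmonisation steps in this paragraph amount to simple arithmetic, sketched below. The 80 g per serve and the imputed intake of 466 + 162 = 628 g/d come from the text; the molar mass of 536.87 g/mol (α-carotene, β-carotene and lycopene are all C40H56 isomers) is an assumption about the constant used, and the function names are hypothetical.

```python
GRAMS_PER_SERVE = 80.0            # conversion used for the Miller et al. data
CAROTENE_MOLAR_MASS = 536.87      # g/mol, shared by the C40H56 carotenoids (assumed value)
IMPUTED_FV_G_PER_DAY = 466 + 162  # vegetables + fruits, Itsiopoulos et al. (628 g/d)

def serves_to_grams(serves_per_day: float) -> float:
    """Fruit and vegetable serves per day -> g/d."""
    return serves_per_day * GRAMS_PER_SERVE

def ug_per_ml_to_umol_per_l(conc: float, molar_mass: float = CAROTENE_MOLAR_MASS) -> float:
    """ug/ml -> umol/l: multiply by 1000 (ml -> l), then divide by g/mol (= ug/umol)."""
    return conc * 1000.0 / molar_mass

daily_fv = serves_to_grams(5)                 # 400.0 g/d
beta_carotene = ug_per_ml_to_umol_per_l(0.2)  # about 0.37 umol/l
```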
Dose–response analysis
The estimated dose–response curves between the different biomarkers and FVJ are shown in Fig. 2, and the dose–response curves between the biomarkers and FV are shown in Fig. 3. All biomarkers show a positive dose–response relationship with fruit and vegetable intake. The regression equations that were obtained are shown in online supplementary Table SB.
The P values of the covariate and interaction analyses are shown in online supplementary Table SC. Age and smoking were significant predictors for all carotenoids but not for plasma folate. BMI was a significant predictor for α-carotene, β-carotene, lutein, β-cryptoxanthin and lycopene. Sex was a significant predictor only for lutein, zeaxanthin and lutein+zeaxanthin. The interactions between these covariates and fruit and vegetable intake were relevant (P< 0·1) in most instances. The smoking × fruit and vegetable interaction was significant for only about half of the biomarkers, but this may be a result of the relatively low number of smokers in the present sample.
Where possible, the dose–response relationship between the biomarkers and the intake of the corresponding micronutrient was also investigated (online supplementary Fig. SA). The available sample size was largest for β-carotene (n 316) and smallest for lutein+zeaxanthin (n 35). The sample size for zeaxanthin was too small to warrant analysis. All curves showed a positive relationship between intake and serum or plasma concentrations, except for lutein at high intakes. There is no biological basis for the drop visible in the lutein curve. Because very few data were available for lutein intakes above 15 mg/d, this part of the curve is not considered reliable.
Prediction model
The regression coefficients of the final prediction models are presented in Table 4, and the performance measures in Table 5. The selection of fractional polynomial powers and variables for the MFP and reduced MFP models is shown in online supplementary Tables SD and SE. For FVJ, the reduced MFP model showed the lowest RMSE (258·0 g) and the highest correlation between observed and predicted intake (0·78) compared with the linear model and the full pre-specified MFP model. The mean difference of the reduced MFP model (−1·7 g) was slightly larger in absolute terms than those of the other two models (linear model: −1·6 g; MFP model: −1·5 g), but its limits of agreement were markedly narrower. Bland–Altman plots are presented in online supplementary Fig. SB.
FVJ, fruit, vegetable and juice intake; FV, fruit and vegetable intake, excluding juices.
* Completed datasets refers to the data after multiple imputation.
† The study of Chopra et al. ( Reference Chopra, O'Neill and Keogh 37 ) could not be used in the present analysis because of an estimation problem.
‡ Folate is scaled as folate/10.
§ Age is scaled as age/10.
FVJ, fruit, vegetable and juice intake; FV, fruit and vegetable intake, excluding juices; RMSE, root mean squared error; MFP, multivariable fractional polynomials.
For FV, the MFP model was the best model. It showed the lowest RMSE (201·1 g), the highest correlation (0·65) and the lowest mean bias (2·4 g) with the smallest limits of agreement (−368·2, 373·0 g).
The prediction model for FV showed a somewhat lower correlation and a higher absolute mean difference than the model for fruit and vegetable intake including juices. We therefore investigated whether adding a predictor variable representing juice intake (in g/d) would improve the prediction of fruit and vegetable intake excluding juices. However, this did not markedly change the results: the MFP model including juice as a predictor variable had an RMSE of 202·8 g, a correlation of 0·64 and a mean bias of 0·2 g (limits of agreement: −374·1, 374·6 g). Therefore, the simpler model without juice as a predictor variable is preferred as the prediction model for FV.
To compare the performance of the prediction models with the current practice of using the sum of carotenoids or a single biomarker, we calculated the correlation coefficients between the observed intakes and the sum of carotenoids, and between the observed intakes and each single biomarker (Table 6). For FVJ, the correlations ranged between 0·04 and 0·32, much lower than the 0·65 of the prediction model. For FV, too, the correlations (between 0·15 and 0·38) were lower than that of the prediction model (0·64).
FVJ, fruit, vegetable and juice intake; FV, fruit and vegetable intake, excluding juices.
To indicate the value of the prediction models for individual studies, an additional cross-validation was performed in which one entire study at a time was left out of the training set and used as the test set. Table 7 shows the RMSE and mean difference with the limits of agreement for the reduced MFP model for FVJ and the MFP model for FV. These results show that the performance of the prediction models varies between studies. The study by Karlsen et al. ( Reference Karlsen, Svendsen and Seljeflot 41 ) shows a worse performance for FVJ but not for FV, most likely because of the relatively high intake of fruits, vegetables and juices in that study (see Table 1).
FVJ, fruit, vegetable and juice intake; MFP, multivariable fractional polynomials; FV, fruit and vegetable intake, excluding juices; RMSE, root mean squared error.
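The tenfold and the leave-one-study-out cross-validation differ only in how the test sets are formed: the former assigns individuals to random folds, whereas the latter holds out all participants of one study at a time. A minimal sketch of both splitting schemes follows; the study labels and the seed are hypothetical, and model fitting is omitted.

```python
import random
from typing import Iterator, List, Sequence, Tuple

def random_folds(n: int, k: int = 10, seed: int = 1) -> List[int]:
    """Assign each of n individuals to one of k folds of (nearly) equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for position, index in enumerate(order):
        folds[index] = position % k
    return folds

def leave_one_study_out(study_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out study, training indices, test indices) for each study."""
    for held_out in dict.fromkeys(study_ids):  # unique ids in first-seen order
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        yield held_out, train, test

splits = list(leave_one_study_out(["A", "A", "B", "C", "C", "C"]))
# splits[1] == ("B", [0, 1, 3, 4, 5], [2])
```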
Discussion
The first part of the present research showed that all investigated biomarkers (carotenoids and folate) had a positive relationship with fruit and vegetable intake and are therefore useful for predicting it. Several covariates were significantly associated with the biomarkers. The second aim was to develop a prediction model for fruit and vegetable intake based on objective variables, such as biomarkers and subject characteristics. Among the three models investigated for predicting FVJ, the reduced MFP model showed the best performance in cross-validation, whereas for FV the MFP model performed best.
The sum of carotenoids has been used in various studies in an attempt to combine biomarkers into a single estimate of fruit and vegetable intake, and it was positively correlated with self-reported fruit and vegetable intake( Reference Bogers, Dagnelie and Westerterp 14 – Reference Toft, Kristoffersen and Ladelund 21 , Reference Crispim, Geelen and Souverein 55 , Reference Kristal, Vizenor and Patterson 56 ). In the present study, predicted values can easily be calculated in future research by multiplying the observed values of the biomarkers and subject characteristics by the corresponding β coefficients from Table 4 and adding these together. The correlations between these predicted values and the observed fruit and vegetable intake (both including and excluding juices) were markedly higher than the correlations between the observed intakes and the sum of carotenoids or any single biomarker. Despite the model's good performance on average, there was some residual variation, as well as an overestimation of low and an underestimation of high fruit and vegetable intakes. Not all fruits and vegetables contain the same concentrations of carotenoids and folate, and other foods in the diet also contain these nutrients. Therefore, the types of fruits and vegetables eaten and the diet as a whole influence the final biomarker concentrations in the blood. The present study tried to capture ‘normal’ diet effects as much as possible by excluding studies that provided only a single type of fruit or vegetable and by including intervention arms that focused on carotenoid-rich or folate-rich as well as carotenoid-poor or folate-poor fruits and vegetables. To obtain the large-sample benefits of a meta-analysis, these different study types were grouped together.
Because a number of studies were included, we assumed that the regression analysis would average out the effects of individual studies and that, at least as a first approximation, the results would not depend on the types of fruits and vegetables included. Obviously, this assumption is not true in an absolute sense, because carrots, for example, contain more carotenoids than some other vegetables; this will require further investigation.
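Predicted intakes are obtained by multiplying each observed predictor value by its β coefficient and summing the products together with the intercept. The sketch below shows the arithmetic, including the folate/10 and age/10 scaling indicated in the Table 4 footnotes. The coefficient values are placeholders invented for illustration, NOT the published Table 4 estimates, and any fractional polynomial transformation or interaction term in the final models would have to be computed before this step.

```python
# Placeholder coefficients for illustration only; these are NOT the Table 4 values.
HYPOTHETICAL_COEFS = {
    "intercept": 100.0,
    "beta_carotene": 300.0,  # per umol/l
    "folate_div10": 5.0,     # folate enters the model as folate / 10
    "age_div10": -10.0,      # age enters the model as age / 10
}

def predict_intake(subject: dict, coefs: dict) -> float:
    """Linear predictor: intercept plus the sum of coefficient * scaled predictor."""
    predictors = {
        "beta_carotene": subject["beta_carotene"],
        "folate_div10": subject["folate"] / 10.0,
        "age_div10": subject["age"] / 10.0,
    }
    return coefs["intercept"] + sum(coefs[name] * value for name, value in predictors.items())

# Hypothetical subject: 100 + 300*0.5 + 5*2.0 - 10*4.0 = 220 g/d.
predicted = predict_intake({"beta_carotene": 0.5, "folate": 20.0, "age": 40}, HYPOTHETICAL_COEFS)
```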
Another source of variability may come from the different intervention durations. We excluded studies with a duration of less than 7 d because we assumed that it would take approximately 1 week to obtain a new steady state for the carotenoids after the change in diet was induced by the intervention( Reference Chopra, McLoone, O'Neill, Kumpulainen and Salonen 57 ). The actual duration of the studies included in the prediction models was much longer (Table 1).
Differences in the analytical methods used in the different studies may be another source of residual variation. In particular, folate was analysed using different assays, e.g. immunoassays and radioassays. Laboratory variability may also arise from differences in specimen collection and storage techniques( Reference Blanck, Bowman and Cooper 58 ), among many other possible sources.
Sex, age, BMI and smoking affect serum carotenoid, serum vitamin C and plasma folate concentrations, and several other covariates, such as serum cholesterol, serum TAG and the consumption of alcohol, fat and energy, may also be related to the biomarkers( Reference Brady, Mares-Perlman and Bowen 59 – Reference van Kappel, Steghens and Zeleniuch-Jacquotte 63 ). It may be of interest to investigate whether these covariates could significantly improve the prediction model. However, the present data did not allow us to investigate this thoroughly.
Although significant efforts were made in all individual studies to encourage compliance with the study protocol (e.g. supervised consumption of meals; see Table 1), the true intake of fruits and vegetables could not always be determined with absolute certainty, because it partly relied on self-reported compliance. In quite a number of the individual studies, compliance was investigated with, e.g., questionnaires or diaries, and this self-reported compliance was usually high.
Unfortunately, no external validation data were available for the prediction model. We chose to use all of the data from the diet-controlled intervention studies that were available to us to develop the models. To perform an external validation, data from other or new diet-controlled intervention studies would have to be obtained. Because this would be very complicated, and because data from such studies would preferably be used to develop or improve the present model rather than only to validate it, we mimicked independent data by using cross-validation to calculate the measures of performance( Reference Efron 64 ).
The use of individual participant data from diet-controlled intervention studies made it possible to model the dose–response curves and the prediction models over a large range of fruit and vegetable intakes, with a relatively large number of subjects and a more objective assessment of intake. However, between-study differences may have influenced the results. In the dose–response analysis, we took clustering into account by using mixed-effects models( Reference Abo-Zaid, Guo and Deeks 65 ). For the prediction model, the marginal predictions (i.e. using only the fixed effects, because the (unknown) random effects cannot be used in predictions for new subjects) from the random intercept linear regression model performed somewhat worse in cross-validation than the predictions from the standard regression model (data not shown); we therefore chose to present the standard regression model. Bouwmeester et al. ( Reference Bouwmeester, Twisk and Kappen 66 ) found similar performance measures for a standard logistic regression model and a random intercept logistic regression model in a study of surgical patients who were clustered by anaesthesiologist. Recently, Debray et al. ( Reference Debray, Moons and Ahmed 67 ) developed an approach to risk prediction in new patients that takes the random intercept into account after the model has been developed using individual participant data meta-analysis with mixed-effects modelling. In the present study, the conditional predictions did not perform considerably better than the standard predictions in an apparent validation (i.e. an internal validation based on the entire dataset, without cross-validation) (data not shown).
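The distinction between marginal and conditional predictions can be illustrated with the textbook best linear unbiased predictor (BLUP) of a random intercept in a linear mixed model: with between-study variance τ², residual variance σ² and n observations from a study, that study's intercept is estimated as τ²/(τ² + σ²/n) times its mean residual. For a new study without outcome data, the estimate is zero, which gives the marginal prediction. This is a simplified sketch under these assumptions (fixed effects treated as known), not the approach of Debray et al. or the models fitted in the present study; names and numbers are hypothetical.

```python
from typing import Sequence

def random_intercept_blup(residuals: Sequence[float], tau2: float, sigma2: float) -> float:
    """Shrunken estimate of a study's random intercept from its residuals (y - X*beta)."""
    n = len(residuals)
    if n == 0:
        return 0.0  # no data for this study: fall back to the marginal prediction
    shrinkage = tau2 / (tau2 + sigma2 / n)
    return shrinkage * sum(residuals) / n

def predict(linear_predictor: float, study_intercept: float = 0.0) -> float:
    """Marginal prediction when study_intercept = 0, conditional otherwise."""
    return linear_predictor + study_intercept

u = random_intercept_blup([2.0, 2.0, 2.0, 2.0], tau2=1.0, sigma2=1.0)  # 0.8 * 2.0 = 1.6
marginal = predict(10.0)        # 10.0
conditional = predict(10.0, u)  # 11.6
```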
In conclusion, the relatively strong correlations between predicted and actual intake indicate that the present prediction models may be used to investigate the ranking of individuals with regard to their intake of fruits and vegetables when validating questionnaires that measure intake (e.g. FFQ or 24 h recall). Furthermore, the low mean bias shows that the models have good potential for estimating average fruit and vegetable intake at the group level. The wide limits of agreement indicate that the prediction models should not be used to estimate individual fruit and vegetable intake.
Supplementary material
To view supplementary material for the present article, please visit http://dx.doi.org/10.1017/S0007114515000355
Acknowledgements
The present research was financially supported by ZonMW (project number 200400014). ZonMW had no role in the design, analysis or writing of the present article.
The authors declare that there is no conflict of interest.
The authors' responsibilities were as follows: H. C. B. designed the research; R. F., B. W., A. B., E. R. M., J. J. M. C., W. J. P., K. v. d. H., M. C., A. K., L. O. D., R. W., C. I., L. B., K. O., C. A. v. L.-B. and T. H. J. N. provided essential data that was used for the present study; J. H. M. d. V. and H. v. d. V. provided essential advice; O. W. S. performed the statistical analysis; O. W. S. and H. C. B. wrote the paper; O. W. S. and H. C. B. had primary responsibility for final content. All authors read and approved the final manuscript.