Obtaining an accurate estimate of long-term habitual food intake remains the main challenge in diet–disease research. FFQ have been used as an epidemiological tool to assess diet for several decades (Willett, Reference Willett1990). Because short-term recalls and diet records are generally expensive, unrepresentative of usual intake and poorly suited to assessing past diet, FFQ have been the primary method of dietary assessment for most epidemiological studies. FFQ are easy to administer and relatively inexpensive to use in large populations. However, they are very sensitive to the cultural and dietary practices of the population of concern (Sharma et al. Reference Sharma, Cade, Jackson, Mbanya, Chungong, Forrester, Bennett, Wilks, Balkau and Cruickshank1996). Use of inappropriate food lists in the FFQ may result in underestimation of nutrients due to omission of key items (Sharma et al. Reference Sharma, Cade, Jackson, Mbanya, Chungong, Forrester, Bennett, Wilks, Balkau and Cruickshank1996). Thus, the validity and reliability of an FFQ need to be evaluated for the specific population of concern. In addition, the use of methods with low validity seriously attenuates the association between nutritional intake and disease in epidemiological studies, a problem known as regression dilution (Day et al. Reference Day, McKeown, Wong, Welch and Bingham2001).
Methods to assess and interpret the validity and reproducibility of FFQ have tended to rely on correlation analysis of nutrients and/or foods measured by two or more dietary assessment methods. As correlation coefficients do not measure agreement, Bland–Altman plots (Bland & Altman, Reference Bland and Altman1986) have been used in conjunction with correlation coefficients in some studies (Thompson & Margetts, Reference Thompson and Margetts1993; Ambrosini et al. Reference Ambrosini, de Klerk, Musk and Mackerras2001; Flood et al. Reference Flood, Smith, Webb and Mitchell2004). In many epidemiological studies, however, the main concern is to classify individuals into different groups according to exposure levels rather than to assess their absolute intake. Thus, comparisons of percentage agreement in quartile distributions are also often used in the evaluation of the validity and reliability of an FFQ.
In 2002, we launched a population-based cohort study of 61 582 men aged 40–74 years in Shanghai, China (Shanghai Men's Health Study (SMHS)) with a main study focus on diet and cancer risk. One of the objectives of the study is to evaluate associations between diet and chronic disease, mainly via categorising participants according to their intake of both nutrients and food groups. We used an FFQ that was developed based on a similar dietary questionnaire used in a sister study, the Shanghai Women's Health Study (SWHS). The SWHS FFQ has been validated previously (Shu et al. Reference Shu, Yang, Jin, Liu, Kushi, Wen, Gao and Zheng2004). The SMHS FFQ includes 88·78 % of all foods that were commonly consumed by men in Shanghai at the time of the baseline survey. We conducted a dietary calibration study to evaluate the validity and reliability of the FFQ in the SMHS population and we report the results of the study in this paper.
Subjects and methods
The Shanghai Men's Health Study
Recruitment for the SMHS started in April 2002 and was completed in June 2006. A total of 83 125 male residents of eight communities of urban Shanghai between the ages of 40 and 74 years were invited to participate by trained interviewers through in-person contact. A total of 61 582 participants were enrolled in the study with a response rate of 74·1 %. Reasons for non-participation were refusal (21·1 %), out of the area during enrolment (3·1 %), and other miscellaneous reasons including poor health or hearing problems (1·7 %).
Dietary calibration study
Before the SMHS began, a pilot dietary validation study was conducted among ninety-six male residents of Shanghai to estimate the sources of variation in their dietary intake and to determine the number of 24-hour dietary recalls (24-HDR) and samples needed to evaluate the validity and reliability of the FFQ in the study population (Cai et al. Reference Cai, Yang, Xiang, Hebert, Liu, Zheng and Shu2005). We administered twenty-four 24-HDR to the ninety-six men over a 1-year period. We found intra-person variations to be the main contributor to dietary intake variation and that a validation study could be adequately carried out with 12 d of dietary recalls for a study of 100 or more participants (Cai et al. Reference Cai, Yang, Xiang, Hebert, Liu, Zheng and Shu2005).
The dietary calibration study was initiated on 14 November 2004. Study participants were a random sample of SMHS participants. A total of 214 SMHS participants were recruited from two SMHS communities approximately 2–3 months after they completed the baseline survey. Fifteen interviewers conducted the study, with each being responsible for the follow-up of seventeen subjects throughout the 12-month study period. The study communities were chosen based on neighbourhood proximity to the residence of the interviewers. In total, seventeen primary and sixty-eight alternative contacts were identified for possible recruitment by each interviewer.
Among participants in this study, approximately 69·3 % were primary contacts. Study participants were contacted once a month during the 12-month period of the study to provide the name and amount of foods that they consumed over the preceding 24 h. The days that the 24-HDR were administered were chosen to ensure a balanced representation of weekdays and weekend days for each participant. All recalls were obtained by an unannounced in-person interview in the evening after dinner (around 19.00 h). At the end of the 12-month study, a second FFQ was administered. The second FFQ was completed by 196 subjects (with an average time interval between the administration of two FFQ of 1·2 years, range: 1·1–2·1 years).
Of the 196 participants with two FFQ and at least ten 24-HDR, there was one participant with an implausible average daily energy intake (9432 kcal/d). This subject was excluded from the analyses, resulting in 195 participants for the analysis.
Food-frequency questionnaire
The FFQ used in the SMHS was developed based on a similar dietary questionnaire that was used in a sister study, the SWHS. The SWHS FFQ has been validated previously (Shu et al. Reference Shu, Yang, Jin, Liu, Kushi, Wen, Gao and Zheng2004). A total of eighty-one food items were included in the FFQ used in the SMHS. For each food item or food group, subjects were asked how frequently (daily, weekly, monthly, yearly or never) they consumed the food or food group, which was followed by a question on the amount consumed in lians per unit of time. Lian is a unit of weight in China (1 lian = 50 g). The main purpose of this FFQ is to rank individuals along the distribution of dietary nutrient and food intake, so that individuals with low intake can be separated from those with high intake.
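The frequency-and-amount format described above can be reduced to a daily intake in grams. The following is a minimal sketch only: the function name and the days-per-unit factors (7, 30·4375 and 365·25) are illustrative assumptions, since the text gives only the conversion 1 lian = 50 g and the list of frequency units.

```python
# Sketch: convert an FFQ response (frequency unit + lians per unit) to g/day.
# Only LIAN_TO_GRAMS comes from the text; the day counts are assumptions.
LIAN_TO_GRAMS = 50.0

# Assumed length of each frequency unit in days (hypothetical choices).
DAYS_PER_UNIT = {
    "daily": 1.0,
    "weekly": 7.0,
    "monthly": 30.4375,
    "yearly": 365.25,
}


def grams_per_day(unit: str, lians_per_unit: float) -> float:
    """Return average daily intake in grams for one FFQ food item."""
    if unit == "never":
        return 0.0
    return lians_per_unit * LIAN_TO_GRAMS / DAYS_PER_UNIT[unit]


if __name__ == "__main__":
    # Hypothetical response: 3 lians of a food per week.
    print(round(grams_per_day("weekly", 3), 1))
```

Expressing every item in the same daily unit is what allows items to be summed into food groups and multiplied by food composition values.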
Statistical analysis
The Chinese Food Composition Tables (Yang et al. Reference Yang, Wang and Pan2002) were used to estimate the intake levels of major nutrients for study participants. Food groups were formed by combining the intake levels of selected foods with similar nutrients, phytochemicals or botanic classifications.
The validity and reliability of the FFQ were assessed by comparing median nutrient and food intakes, examining agreement in the quartile distributions of nutrients and food groups, and calculating correlations between the intakes derived from the two different dietary survey methods (FFQ versus 24-HDR) and the two different surveys (baseline and second FFQ). The nutrients and food groups were not normally distributed, and log transformation failed to normalise the distribution. Therefore, Spearman correlation coefficients were applied for the analysis. Virtually all the correlation coefficients presented herein were statistically significant (P < 0·01), so individual P values are not presented.
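For readers unfamiliar with the rank-based coefficient named above, a Spearman correlation can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. This is an illustration in plain Python, not the software used in the study; the function names and the example intakes are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical fat intakes (g/d) from an FFQ and from averaged recalls.
    ffq = [62.0, 75.5, 48.2, 90.1, 55.0]
    recalls = [70.3, 72.0, 51.9, 84.4, 66.8]
    print(spearman(ffq, recalls))
```

Because only ranks enter the calculation, the coefficient is unaffected by the skewness that log transformation failed to remove.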
We also analysed the data using Bland–Altman plots. The Bland–Altman plot assesses the agreement between two dietary assessment methods across a range of intakes. The difference between the two methods was plotted against the average of the two methods. Natural-log (ln) transformations were performed in order to narrow the limits of agreement (LOA), as recommended by Bland & Altman (Reference Bland and Altman1986). The antilogs of the LOA were calculated yielding a ratio of FFQ over 24-HDR. The ratios were multiplied by 100, with 100 % representing ideal agreement.
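The log-scale procedure described above can be sketched as follows. The sketch assumes strictly positive intakes and the conventional limits of agreement of mean ± 1·96 SD of the differences; the multiplier and the function name are assumptions, as the text does not state them.

```python
import math
from statistics import mean, stdev


def log_bland_altman(ffq, recall):
    """Return (mean agreement, lower LOA, upper LOA) as FFQ/24-HDR percentages.

    Differences are taken on the natural-log scale and back-transformed,
    so 100 % corresponds to ideal agreement. Intakes must be positive.
    """
    diffs = [math.log(f) - math.log(r) for f, r in zip(ffq, recall)]
    centre = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # assumed conventional multiplier
    bounds = (centre, centre - half_width, centre + half_width)
    return tuple(100.0 * math.exp(b) for b in bounds)


if __name__ == "__main__":
    # Hypothetical energy intakes (kcal/d) for five participants.
    ffq = [1850.0, 2100.0, 1700.0, 2400.0, 1950.0]
    recall = [1900.0, 2050.0, 1800.0, 2300.0, 2100.0]
    agreement, lower, upper = log_bland_altman(ffq, recall)
    print(round(agreement, 1), round(lower, 1), round(upper, 1))
```

A mean agreement below 100 % indicates that the FFQ gives lower values than the 24-HDR on average, and a value above 100 % indicates the reverse.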
Results
Participants in the dietary calibration study did not differ from the entire SMHS with regard to age, occupation, education, smoking status, alcohol consumption status, exercise reporting, weight, BMI, waist-to-hip ratio, waist circumference, daily energy intake or macronutrient intake (fat, protein, fibre and carbohydrates) at the baseline survey (Table 1). The dietary data shown in this table are from the information collected in the FFQ administered at study recruitment (baseline survey). Among the 195 dietary calibration study participants, three subjects completed eleven 24-HDR, while 192 subjects completed twelve 24-HDR. This resulted in a total of 2337 24-HDR. These dietary recalls represent 36 783 food entries and 655 unique food items. The FFQ accounted for 88·78 % of the total food entries (some as food groups) recorded in the 24-HDR.
SMHS, Shanghai Men's Health Study; WHR, waist-to-hip ratio.
* From the FFQ.
Table 2 presents the median nutrient intake derived from the baseline, second FFQ, average of the 24-HDR and the percentage of differences. The median nutrient intakes assessed by the two FFQ agree considerably well, especially for macronutrients, with the differences in median intakes between the two assessments being 5·2 % (P = 0·13) for fat, 3·4 % (P = 0·17) for protein and 2·5 % (P = 0·42) for carbohydrates. The median differences in consumption of micronutrients are all under 9 % (P>0·05 for all nutrients). The nutrient intakes assessed on the second FFQ are, in general, lower than those assessed at the baseline FFQ, with the exception of fibre, vitamins B1 and B2, niacin and vitamin C consumption, which were 1·7, 1·8, 1·0, 0·6 and 1 %, respectively, higher on the second FFQ. There is also good agreement between nutrient intake assessed by the second FFQ and by the average of the 24-HDR (both assessments cover the same time period). Differences in median intake were between 4·4 (protein, P = 0·04) and 20·6 % (fibre, P < 0·001) for macronutrients and between 0·4 (niacin) and 31·8 % (vitamin C) for micronutrients (P < 0·05 for micronutrients with the exception of vitamin B2 and retinol). Compared with the 24-HDR, the FFQ tends to overestimate intake of most nutrients except for the consumption of protein, fat and niacin.
24-HDR, 24-hour dietary recall.
* (FFQ1 – FFQ2)/FFQ2.
† (FFQ2 – 24-HDR)/24-HDR.
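The percentage differences defined in the footnotes above can be illustrated with a short sketch. It assumes that the difference is taken between the group medians and expressed relative to the second-named instrument; the function name and the example values are hypothetical.

```python
from statistics import median


def pct_difference_in_medians(first, reference):
    """Return 100 * (median(first) - median(reference)) / median(reference)."""
    ref = median(reference)
    return 100.0 * (median(first) - ref) / ref


if __name__ == "__main__":
    # Hypothetical protein intakes (g/d) from the baseline and second FFQ.
    ffq1 = [78.0, 82.5, 90.0, 69.4, 85.1]
    ffq2 = [75.2, 80.0, 88.3, 70.1, 79.9]
    print(round(pct_difference_in_medians(ffq1, ffq2), 1))
```

A positive value means the first instrument gives a higher median than the reference instrument.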
The nutrient intakes assessed by the two FFQ approximately 1·2 years apart correlate reasonably well; the correlation coefficient (r) for nutrient intake ranges from 0·38 to 0·53 (Table 3). The correlations of nutrient intake between the second FFQ and the 24-HDR are between 0·33 (retinol) and 0·64 (carbohydrates).
Overall agreement between nutrient intake measured by the FFQ and the 24-HDR was assessed using Bland–Altman plots. The Bland–Altman plots showed that the difference in nutrient intake derived from the FFQ and 24-HDR did not appear to depend on the ‘true’ intake as assessed by the 24-HDR for either macronutrients or micronutrients. The Bland–Altman plots for total energy intake, fat, protein and carbohydrates are presented in Figs 1–4. Anti-logging rendered mean agreement and the LOA of 96·3 % (95 % CI 64–143), 82 % (95 % CI 39–172), 95·6 % (95 % CI 61–149) and 103·9 % (95 % CI 70–155), respectively, for kcal/d, fat, protein and carbohydrates. This suggests that, on average, the FFQ underestimates intake of energy, fat and protein by 3·7, 18 and 4·4 %, respectively, and overestimates carbohydrate intake by 3·9 %, as compared with that derived from multiple 24-HDR.
When the nutrient intakes were categorised into quartiles, the range of agreement rates for same quartile classifications were 31·8–43·6 % for nutrients derived from the two FFQ and 31·8–46·7 % for nutrients derived from the second FFQ and the 24-HDR. The agreement rates for classifying nutrient intakes into the same or adjacent quartiles were between 73·9 and 84·2 % for nutrients derived from the two FFQ and between 74·4 and 87·2 % for nutrients derived from the second FFQ and the 24-HDR. Misclassification of nutrient intake into extreme quartiles was rare (1·5–7·7 %).
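The three agreement rates reported above can be sketched as a cross-classification of quartiles. The sketch makes two assumptions that the text does not specify: quartiles are assigned by rank so that the four groups are equal in size, and "extreme" misclassification means being placed in opposite quartiles (first versus fourth). All names and the example data are hypothetical.

```python
def quartile_index(values):
    """Assign each value a quartile 0-3 by rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    quartiles = [0] * n
    for position, i in enumerate(order):
        quartiles[i] = position * 4 // n
    return quartiles


def cross_classification(x, y):
    """Return proportions in the same, same-or-adjacent and opposite quartiles."""
    gaps = [abs(a - b) for a, b in zip(quartile_index(x), quartile_index(y))]
    n = len(gaps)
    return {
        "same": sum(g == 0 for g in gaps) / n,
        "same_or_adjacent": sum(g <= 1 for g in gaps) / n,
        "extreme": sum(g == 3 for g in gaps) / n,
    }


if __name__ == "__main__":
    # Hypothetical fibre intakes (g/d) from the second FFQ and the recalls.
    ffq2 = [9.1, 12.4, 15.0, 7.8, 11.2, 18.3, 10.5, 13.9]
    recalls = [8.7, 10.9, 16.2, 9.5, 12.8, 14.1, 7.9, 13.0]
    print(cross_classification(ffq2, recalls))
```

With four categories, two unrelated measures would be expected to fall in the same quartile about 25 % of the time, which gives a baseline against which the reported rates can be read.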
Similar sets of analyses were conducted for selected food groups (Tables 4 and 5). With the exception of fish intake, the differences in consumption of food groups between the two FFQ were within 10 %. Intake of poultry, red meat, fish and eggs was lower according to the FFQ as compared with the 24-HDR, while the intake of other food groups was higher according to the FFQ as compared with the 24-HDR (P>0·05 for all food groups). Relative to the 24-HDR, FFQ intake estimates were 28·9 % higher for vegetables, 77·7 % higher for soy and 57·4 % higher for fruits. Consumption was lower on the FFQ for poultry by 51·1 %, red meat by 35·4 %, fish by 21·0 % and eggs by 4·9 %. Despite the difference in median intake, the correlations for food group intake ranged from 0·39 to 0·64 when comparing the two FFQ and from 0·35 to 0·72 when comparing the second FFQ and the 24-HDR. Correlation coefficients between the two FFQ were 0·50 for soy, 0·43 for vegetables, 0·64 for fruits, 0·48 for poultry, 0·40 for red meat and 0·39 for eggs (Table 5). Correlation coefficients comparing the second FFQ and the 24-HDR were 0·54 for soy, 0·42 for vegetables, 0·72 for fruits, 0·35 for poultry, 0·45 for red meat and 0·41 for eggs. The agreement rates for food group consumption classified in the same quartile were 33·3–49·2 % for the two FFQ and 32·8–53·3 % for the second FFQ and the 24-HDR. The agreement rates were between 73·8 and 91·8 % for food group consumption classified into the same or adjacent quartiles. Only 1·5–6·2 % of subjects were misclassified into extreme quartiles of food group intake.
Discussion
This report describes the validity and reproducibility of an FFQ designed to capture the usual intake of nutrients and major foods consumed by men in urban Shanghai. The reference method was repeated monthly 24-HDR that were conducted over a 12-month period. We evaluated the performance of the FFQ by comparing intake of nutrients and selected food groups obtained from this instrument with those derived from the 24-HDR. The results suggest that the SMHS FFQ has reasonable comparative validity and reproducibility, and can categorise major nutrient and food group intakes with relative accuracy among men in urban Shanghai. The performance of the SMHS FFQ is similar to that of many FFQ that have been used in other epidemiological studies (Willett et al. Reference Willett, Sampson, Stampfer, Rosner, Bain, Witschi, Hennekens and Speizer1985; Kaaks et al. Reference Kaaks, Slimani and Riboli1997; Margetts & Pietinen, Reference Margetts and Pietinen1997; Martinez et al. Reference Martinez, Marshall, Graver, Whitacre, Woolf, Ritenbaugh and Alberts1999; Mayer-Davis et al. Reference Mayer-Davis, Vitolins, Carmichael, Hemphill, Tsaroucha, Rushing and Levin1999; Stram et al. Reference Stram, Hankin and Wilkens2000).
The validity and reproducibility of the FFQ used by the Italian arm of the European Prospective Investigation into Cancer and Nutrition (EPIC) were evaluated in a way similar to our study (Pisani et al. Reference Pisani, Faggiano, Krogh, Palli, Vineis and Berrino1997). Two FFQ were administered 1 year apart and between eight and fourteen 24-HDR interviews were administered over a 1-year period. Pearson correlation coefficients of the relationship between questionnaire measurements and the individual average 24-HDR ranged from 0·28 for fat to 0·58 for carbohydrates for men. In the German arm of the EPIC study, correlation coefficients between the FFQ and monthly 24-HDR over 1 year were 0·37 for fat, 0·41 for protein, 0·41 for carbohydrates and 0·46 for fibre (Bohlscheid-Thomas et al. Reference Bohlscheid-Thomas, Hoting, Boeing and Wahrendorf1997). Our results are very similar to those of the two studies cited above, with correlation coefficients between the FFQ and 24-HDR being 0·38, 0·49, 0·48 and 0·64 for fat, protein, fibre and carbohydrates, respectively. The reproducibility correlation coefficients for two FFQ administered 6 months apart were between 0·59 and 0·69 for macronutrients in the German arm of the EPIC study (Bohlscheid-Thomas et al. Reference Bohlscheid-Thomas, Hoting, Boeing and Wahrendorf1997), while in our study the reproducibility for macronutrients derived from two FFQ administered 1·2 years apart is between 0·39 and 0·53.
In the Male Professionals Health Study, the validity and reproducibility of the FFQ were evaluated in 127 men by two 1-week dietary records and two FFQ given 1 year apart (Rimm et al. Reference Rimm, Giovannucci, Stampfer, Colditz, Litin and Willett1992). Correlation coefficients for log-transformed nutrient intake derived from the two FFQ administered 1 year apart ranged from 0·47 to 0·72 for micronutrients (without supplements) and from 0·59 to 0·69 for macronutrients, while in our study correlation coefficients from the two FFQ ranged between 0·39 and 0·53 for macronutrients and between 0·38 and 0·53 for micronutrients. The correlation coefficients for nutrients derived from the second FFQ and the mean of the two 1-week dietary records were between 0·28 and 0·64 for micronutrients (without supplements) and between 0·25 and 0·63 for macronutrients. Our results compare well with those of the Male Professionals Health Study, with correlation coefficients between the FFQ and 24-HDR being between 0·38 and 0·52 for micronutrients and between 0·38 and 0·64 for macronutrients.
The validity and reproducibility of FFQ in multiethnic populations have also been investigated (Mayer-Davis et al. Reference Mayer-Davis, Vitolins, Carmichael, Hemphill, Tsaroucha, Rushing and Levin1999; Stram et al. Reference Stram, Hankin and Wilkens2000). In a multiethnic cohort study in Hawaii and Los Angeles, dietary information reported from a questionnaire was compared with three 24-HDR in a calibration substudy. Subjects from each of eight subgroups defined by sex and ethnic group (African-American, Japanese-American, Latino and white) were chosen randomly from among the cohort members. In males, estimates of the correlation between the questionnaire and 24-HDR ranged from 0·17 to 0·64 for absolute nutrient intake. For absolute nutrient intakes, the correlations were greatest for whites, somewhat lower for Japanese-Americans and Latinos, and lowest for African-Americans (Kaaks et al. Reference Kaaks, Slimani and Riboli1997; Stram et al. Reference Stram, Hankin and Wilkens2000).
The reproducibility of the SMHS FFQ compares well with that of the SWHS FFQ (Shu et al. Reference Shu, Yang, Jin, Liu, Kushi, Wen, Gao and Zheng2004). The validity of the SWHS FFQ is slightly better than that of the SMHS FFQ in the present study. For example, correlation coefficients between the FFQ and the 24-HDR were 0·60 for protein, 0·59 for fat, 0·66 for carbohydrates and 0·55 for fibre in the SWHS, while in the present study the correlation coefficients were 0·49, 0·38, 0·64 and 0·48 for protein, fat, carbohydrates and fibre, respectively. One possible reason for this could be that in Shanghai, women are mainly responsible for purchasing and preparing foods for the family and, thus, are more likely to estimate their dietary intake accurately than men.
A wide LOA indicates the potential for large differences between methods, and agreement is then considered poor even if the bias and dependency are small. The LOA of the SMHS FFQ as compared with multiple 24-HDR are narrower than those reported in two previous studies (Ambrosini et al. Reference Ambrosini, de Klerk, Musk and Mackerras2001; Flood et al. Reference Flood, Smith, Webb and Mitchell2004). Direct comparison with other studies was not possible because the data were not presented as log-transformed (MacIntyre et al. Reference MacIntyre, Venter and Vorster2001; Bakker et al. Reference Bakker, Twisk, van Mechelen, Mensink and Kemper2003).
In our study, the Bland–Altman plots did not show that the over- or underestimation of dietary intake of the SMHS FFQ depends on the ‘true’ nutrient intake levels. Thus, misclassification in absolute nutrient intake amount is less likely to cause systematic biases.
Finding a gold standard for measuring long-term dietary intake is the most challenging obstacle in assessing the validity of a dietary instrument. In our study, we chose monthly 24-HDR over a 1-year period as the reference method for assessing usual dietary intake. For the 214 subjects who participated in the calibration study, 93 % completed at least ten 24-HDR. The multiple recalls meant we were able to minimise the effect of daily and seasonal variation in dietary intake on the dietary assessment. In order to minimise the possibility that study participants might change their dietary intake to facilitate dietary recall, none of the dietary recall interviews was scheduled in advance (i.e. all were unannounced). Thus, we believe that the dietary recall information obtained from this study is a fairly accurate measurement of the true usual intake for this study population over a 1-year period.
However, the fact remains that multiple dietary recalls may sensitise study participants regarding their dietary intake, and thus participants may answer the FFQ more accurately, resulting in an overestimation of the true validity of the FFQ. When we calculated the correlation between the first FFQ and the mean of the 24-HDR, we found that the correlations were slightly lower than those observed for the second FFQ. On the other hand, changes in dietary intake during the year could reduce the correlation and increase the mean differences between the two instruments. Nutrient and food intake derived from the second FFQ was, in general, lower than that derived from the first FFQ, which might be due to changes in diet. To address whether a possible change in diet might have resulted in a low estimation of the correlation coefficients, we restricted the analysis to those participants who reported ‘no change’ to the question ‘Compared with 5 years ago, have you changed your dietary habits in the past year?’ No major changes in correlation coefficients were found.
Assuming the 24-HDR are close to ‘true’ intake, we found that the SMHS FFQ overestimated soy, fruit and vegetable intake, and slightly underestimated poultry, red meat, fish and egg intake. Some of the measurement error may reflect a bias of study participants seeking social approval (Hebert et al. Reference Hebert, Clemow, Pbert, Ockene and Ockene1995). The substantial overestimation of fruit and vegetable intake is not likely to be accounted for by the seasonal variation of supply of these foods, since our pilot calibration study of this FFQ found little variance (within 5 %) for day of the week or season of the year (Cai et al. Reference Cai, Yang, Xiang, Hebert, Liu, Zheng and Shu2005). However, correlation coefficients for these food groups are quite good. Thus, although the SMHS FFQ does not appear to do well in estimating the absolute amount of intake of selected food groups, it can classify subjects reasonably well in terms of their relative intake levels. The latter is particularly important for epidemiological studies, since it is categorised dietary intake rather than the absolute amount of intake that has been more commonly used in epidemiological studies of diet and chronic disease.
As the recall time interval and format of the questionnaire used by the two methods are different, the errors in these two instruments are less likely to be correlated. The FFQ asked about frequency and amount of consumption for a list of foods (close-ended questions) during the preceding 12-month period, while the 24-HDR employed an open-ended question format and inquired about intake of foods that the study participants ate during the preceding 24 h period. However, we acknowledge that both methods suffer from the social approval bias (Hebert et al. Reference Hebert, Clemow, Pbert, Ockene and Ockene1995) as they are both self-reported.
As an alternative to self-reported methods of dietary intake, biomarkers have been used for validation purposes. Advantages of biomarkers as opposed to self-reported methods are that they are objective and unbiased, and that their errors are uncorrelated with those of the FFQ. However, biomarkers are not always available, are expensive and may not necessarily reflect long-term dietary intake habits (Willett, Reference Willett1990). In the UK EPIC validation studies, the accuracy of several methods was assessed by comparison with another self-report instrument (weighed records) and the biomarkers, 24 h urine nitrogen and potassium, plasma carotenoids and plasma vitamin C (Bingham et al. Reference Bingham, Gill and Welch1997). The correlations between nitrogen from weighed records and estimated food diaries and urinary nitrogen were better than those from other methods, while the results for urinary potassium and serum carotenoids were similar among all methods. We collected four spot urine samples and quarterly blood samples during the 1-year study period from most participants of the study. These samples will be used for further validation of the FFQ when funding becomes available.
Random and uncorrelated measurement errors cause attenuation of relative risk estimates and decrease the statistical power of epidemiological studies. Willett estimated that if a true relative risk is 2·0, the observed relative risk will be attenuated to 1·62 if the correlation between the estimated and true dietary exposure is 0·7 or to 1·32 if the correlation is 0·40 (Willett, Reference Willett1998). Methods for correction of attenuation and regression dilution bias have become available (Johansson et al. Reference Johansson, Hallmans, Wikman, Biessy, Riboli and Kaaks2002). However, calibration studies require certain assumptions about the independence of measurement errors, and biomarkers of dietary intake may better fulfil this requirement (Kaaks et al. Reference Kaaks, Ferrari, Ciampi, Plummer and Riboli2002). We will apply results obtained from this analysis in conjunction with future biomarker studies to calibrate the risk assessment for diet and disease associations in our future research.
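The attenuation figures quoted above are reproduced by raising the true relative risk to the power of the correlation between estimated and true exposure. This relation is inferred here from the two quoted numbers and is not stated in the text, so the sketch below should be read as an illustration under that assumption; the function name is hypothetical.

```python
def attenuated_relative_risk(true_rr: float, correlation: float) -> float:
    """Observed relative risk under the assumed relation RR_obs = RR_true ** r.

    The exponent form is an assumption chosen because it matches the two
    figures quoted in the text; it is not a formula given by the authors.
    """
    return true_rr ** correlation


if __name__ == "__main__":
    for r in (0.7, 0.4):
        print(r, round(attenuated_relative_risk(2.0, r), 2))
```

Under the same assumption, a correlation of 1·0 leaves the relative risk unchanged, and a correlation of 0 reduces it to 1·0 (no observed association).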
In summary, this study suggests that the SMHS FFQ can reasonably categorise usual intake of major nutrients and food groups among men in Shanghai. The SMHS FFQ, however, may under- or overestimate the absolute amount of intake of some nutrients or foods.
Acknowledgements
This study was supported by research grant R01 CA82729 from the National Cancer Institute. R. V., G. Y., D. K. L., Y. B. X., W. Z. and X. O. S. drafted and provided critical revision of the manuscript; R. V., G. Y., H. C. and X. O. S. analysed and interpreted the data; G. Y., D. K. L., Y. B. X. and X. O. S. were responsible for implementation of the study and acquisition of the data; and W. Z. and X. O. S. conceived of and designed the study. The authors have no conflicts of interest to declare.