Bone health benefits derived from consuming dairy products, the largest contributors of Ca to the Western diet( 1 ), are well documented in the scientific literature( Reference Heaney 2 ). There is some evidence for the role of dairy food consumption in reducing the risk of chronic disease, i.e. obesity and hypertension, but further studies are required to confirm a causal relationship( Reference Spence 3 ). An adequate Ca intake is particularly important during childhood and adolescence to optimise peak bone mass and consequently is a readily modifiable factor in reducing osteoporosis risk( Reference Heaney 4 ). However, intake studies suggest that while younger children appear to consume appropriate amounts of dairy foods and meet the recommendations for Ca intake, older children and adolescents tend to consume fewer dairy foods than their younger counterparts and thus fail to meet recommended intakes of Ca( 5 – Reference Baird, Syrette and Hendrie 7 ).
In order to address shortfalls in Ca intake in a population, accurate assessment of Ca intake is required to identify those with suboptimal intakes, thus enabling public health strategies to be targeted appropriately. As dairy foods are the principal source of Ca in the Western diet, assessment of dairy intake can potentially provide a reasonable estimate of Ca intake. For population-wide assessment of Ca intake, or dairy as a proxy for Ca, a dietary intake tool that is short, reliable, relatively simple and inexpensive to administer is ideal.
For a tool to be recommended for use it needs to be reliable and valid. Reliable tools produce consistent results when performed under similar circumstances, e.g. at repeat administrations and/or when conducted by different researchers( Reference Lang and Secic 8 ). Criterion validity identifies the ability of a tool to accurately measure what it proposes to measure, usually determined by how closely the results match those of a reference test( Reference Lang and Secic 8 , Reference Jones 9 ). The present paper is the first of two reviews whose overall aims are to identify published tools that estimate Ca and/or dairy intake in children, adolescents and adults and to assess the testing of tool properties in order to recommend a tool(s) for use. The current paper focuses on tools developed for use with children and adolescents.
Methods
A comprehensive search was completed to identify existing tools to measure dairy and/or Ca intake. The search was conducted using the databases MEDLINE, Scopus, Ovid, Informit and Web of Knowledge, with the keywords ‘calcium’, ‘dairy’, ‘milk’, ‘diet’, ‘nutrition’ and ‘food’, combined with ‘tool’, ‘questionnaire’, ‘FFQ’, ‘measurement’, ‘assessment’, ‘evaluation’ and ‘analysis’. The search was not limited by date: each database was searched from its year of inception, the earliest being 1948 in the case of MEDLINE. The search was limited to English-language papers only. An identical search was conducted in Google Scholar to identify any relevant tools or papers in the grey literature. Additional articles were identified by searching the reference lists of the articles found. No age criterion was applied.
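As an illustration only, the keyword strategy described above might be expressed as a single Boolean query. The sketch below assumes that terms within each group were joined with OR and that the two groups were joined with AND; the review does not state the exact operators or database syntax used, and the function name `build_query` is hypothetical.

```python
# Keyword groups taken from the search description above.
TOPIC_TERMS = ["calcium", "dairy", "milk", "diet", "nutrition", "food"]
METHOD_TERMS = ["tool", "questionnaire", "FFQ", "measurement",
                "assessment", "evaluation", "analysis"]


def build_query(topic_terms, method_terms):
    """Join each group with OR, then join the two groups with AND.

    The OR/AND structure is an assumption; the review only says the
    two keyword groups were 'combined'.
    """
    topic_clause = " OR ".join(topic_terms)
    method_clause = " OR ".join(method_terms)
    return f"({topic_clause}) AND ({method_clause})"


if __name__ == "__main__":
    print(build_query(TOPIC_TERMS, METHOD_TERMS))
```

Individual databases differ in field tags and truncation symbols, so a query of this form would need adapting before use in any one of them.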
Two authors (K.M., L.B.) sorted the articles for relevance and where disagreement arose a third author (M.M.) provided input. Where controversy remained, the relevance of the article was discussed and a final decision made regarding inclusion. Articles that referred to the development of a new tool to measure dairy and/or Ca intake in Western populations and did or did not test for validity or reliability were deemed relevant. Articles that (i) measured dairy and/or Ca intake in non-Western countries (due to differences in the major food sources of Ca), (ii) utilised existing tools but did not test these for validity or reliability in the study sample, (iii) were not in English, (iv) utilised 24 h recalls, food records or diet histories to measure dietary intake, (v) did not assess dairy or Ca intake or (vi) were published abstracts only were considered to be irrelevant for the purposes of the current review. From the initial 121 articles seventy-three were excluded, leaving forty-eight relevant articles for examination. Of these forty-eight, eighteen were categorised as relevant for children/adolescents.
Tools described in the articles were classified as (i) dairy assessment tools that assess the quantity or frequency of dairy food intake or (ii) Ca assessment tools that estimate absolute Ca intake or categorise intake into specific levels, e.g. ≥800 mg Ca/d or <800 mg Ca/d. Some tools assessed intake of dairy foods or dairy foods and other Ca-containing foods and quantified or classified Ca intake and quantified dairy food intake in terms of quantity or frequency. These tools were considered to be both Ca and dairy assessment tools.
When assessing the reliability and validity of tools, a sample size of at least 100 subjects was considered acceptable( Reference Willett 10 ). Tests of association (correlation coefficients) were considered weak statistical analysis, whereas tests that measured agreement (Bland–Altman or kappa coefficient, κ) and/or sensitivity and specificity were considered to provide strong systematic analysis( Reference Peat, Mellis and Williams 11 ). A mean difference of 100 mg Ca (representing about 10 % of recommended daily intake or one-third of a serving of dairy products) was considered clinically significant. Further, a value κ >0·5 was considered moderate agreement, κ >0·7 good agreement and κ >0·8 very good agreement( Reference Peat, Mellis and Williams 11 ).
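The interpretation criteria above can be sketched in a few lines of code. The example below is a minimal illustration, not the procedure used by any of the reviewed studies: it computes an unweighted Cohen's κ for two categorical administrations (some studies reported a weighted κ, which is not shown), maps κ to the agreement bands stated above, and applies the 100 mg criterion. The function names, the label for κ ≤0·5 and the inclusive treatment of exactly 100 mg are assumptions.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two paired lists of category labels.

    Undefined (division by zero) when chance agreement equals 1, i.e.
    when both lists contain a single identical category throughout.
    """
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement: product of marginal proportions, summed over categories.
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / n ** 2
    return (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Map kappa to the agreement bands used in this review."""
    if kappa > 0.8:
        return "very good"
    if kappa > 0.7:
        return "good"
    if kappa > 0.5:
        return "moderate"
    return "below moderate"  # assumed label; the review does not name this band


def clinically_significant(mean_difference_mg, threshold_mg=100.0):
    """True when the absolute mean difference reaches the 100 mg criterion.

    Treating exactly 100 mg as significant is an assumption.
    """
    return abs(mean_difference_mg) >= threshold_mg


if __name__ == "__main__":
    # Invented category labels from two administrations of a hypothetical tool.
    first = ["low", "low", "high", "high"]
    second = ["low", "low", "high", "low"]
    kappa = cohen_kappa(first, second)
    print(kappa, kappa_band(kappa), clinically_significant(24))
```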
Results
Eighteen articles reporting on seventeen tools that had been used in those aged less than 18 years were identified (Table 1). For two tools there were two articles reporting on each( Reference Magkos, Manios and Babaroutsi 12 – Reference Lanfer, Hebestreit and Ahrens 15 ), and another article reported on two other tools( Reference Marshall, Eichenberger Gilmore and Broffitt 16 ). Only two of the tools assessed dairy intake, ten assessed Ca intake and five assessed both. Details of each of the tools are provided in Table 1. The tools were used in a range of population groups of differing age, gender and race and with the exception of the study by Taylor et al. ( Reference Taylor, Lamparello and Kruczek 17 ), all subjects could be considered relatively healthy participants.
Table 1 abbreviations: D, dairy; Q-FFQ, quantitative FFQ; SQ-FFQ, semi-quantitative FFQ.
Table 1 footnote: *Purpose: number of items in the tool.
Tool characteristics
All tools used an FFQ, with varying response options covering a variable period. Ten were quantitative allowing an estimate of milligrams of Ca, six were semi-quantitative and one was qualitative. The quantitative tools allowed varying serving sizes, the semi-quantitative tools provided a standard serving size and the qualitative tool asked for frequency of consumption of each food only. In terms of food coverage, five tools included dairy products and other foods that make an important contribution to Ca intake. Two tools included Ca-containing foods plus those foods that may displace dairy or other high-Ca foods( Reference Jensen, Gustafson and Boushey 18 , Reference Wong, Boushey and Novotny 19 ) and another included ‘dummy’ foods to mask the intent of the tool( Reference Marcotte, Hennessy and Dwyer 20 ). Three tools were designed to assess several nutrients and the food coverage reflected this( Reference Marshall, Eichenberger Gilmore and Broffitt 16 , Reference Bellu, Riva and Ortisi 21 , Reference Vereecken and Maes 22 ). The remaining six tools were general FFQ that were developed and tested for their ability to assess dairy and/or Ca intake.
Visual aids to assist participants to identify and quantify foods were provided with five tools. Six tools took less than 15 min to complete, indicating a level of user-friendliness, and one tool took 30 min( Reference Bertoli, Petroni and Pagliato 23 ). Information regarding average completion times was not reported for ten tools; however, estimates from similar tools indicated that four of these could be completed in <15 min while the remaining six tools would likely require longer than this. Most tools required computer analysis or professional assistance to determine total daily Ca intake and/or adequacy. A few tools were able to provide an immediate indication of daily Ca intake, through a computer-based FFQ which automatically calculates intake for the participant( Reference Wong, Boushey and Novotny 19 ), or via manual calculation with or without reference to the recommended dietary intake( Reference Taylor, Lamparello and Kruczek 17 , Reference Marcotte, Hennessy and Dwyer 20 ), or via a simple scoring system to grade adequacy of intake( Reference Yang, Martin and Boushey 24 ).
Tool properties
All tools had one or more properties tested. Only ten papers reported test–retest reliability and these were for two dairy tools, one dairy/Ca tool and seven Ca tools (Table 2). The statistical analyses varied, with correlation (Pearson, Spearman or intra-class) being the most frequently used test and six studies reporting mean values for each administration, with only Lanfer et al.( Reference Lanfer, Hebestreit and Ahrens 15 ) not reporting standard deviations or conducting t-test analysis. Three studies reported κ values( Reference Lanfer, Hebestreit and Ahrens 15 , Reference Vereecken and Maes 22 , Reference Huybrechts, De Bacquer and Matthys 25 ) and two of these studies also reported classification agreement( Reference Vereecken and Maes 22 , Reference Huybrechts, De Bacquer and Matthys 25 ). The tools were mostly used in children aged 10 years and older, with eight of the studies having a sample size over 100 and the other two less than fifty. The period between the two administrations of the tool varied from a minimum of 1 h( Reference Marcotte, Hennessy and Dwyer 20 ) to 1 year( Reference Rockett, Wolf and Colditz 26 ), with most being 1 to 4 weeks. Rockett et al. reported reliability for only one dairy item, for which the correlation was modest (0·56)( Reference Rockett, Wolf and Colditz 26 ). The more comprehensive results of the study by Vereecken and Maes, assessing four dairy items in two age groups, demonstrated only moderate repeatability for most of the items, with the younger age group tending to perform better than the older group( Reference Vereecken and Maes 22 ). Similarly, the five dairy items in the Child Eating Habits Questionnaire assessed by Lanfer et al. also showed moderate repeatability( Reference Lanfer, Hebestreit and Ahrens 15 ).
Table 2 abbreviations: D, dairy; n, sample size; min, minimum; max, maximum; S, Spearman's correlation; P, Pearson's correlation; ICC, intra-class correlation; TP, transformed Pearson's correlation; CC, cross-classification; κ, kappa coefficient.
Table 2 footnote: *Purpose: number of items in the tool.
Huybrechts et al. reported a weighted κ value of 0·60, representing only ‘moderate’ agreement for their tool, but also a high correlation (0·80), a non-significant mean difference of only 24 mg, 57 % correctly classified and no gross misclassification( Reference Huybrechts, De Bacquer and Matthys 25 ). Only two other studies( Reference Jensen, Gustafson and Boushey 18 , Reference Wong, Boushey and Novotny 19 ) reported correlation values as high as Huybrechts et al.( Reference Huybrechts, De Bacquer and Matthys 25 ) and both studies explored test–retest reliability by ethnicity (Asian, Hispanic, non-Hispanic White) and by various age groups. While correlation was higher for 14/15- to 18-year-olds than for 10/11- to 13/14-year-olds, there were no clear ethnic group or gender differences. One of these two studies( Reference Wong, Boushey and Novotny 19 ) reported a statistically significant mean difference of approximately 200 mg Ca between repeat administrations, which is clinically unacceptable. Similarly, Harnack et al. demonstrated a clinically unacceptable and statistically significant difference in mean Ca, on repeat administration of their tool, of ∼100 mg despite an acceptable intra-class correlation of 0·66 to 0·79( Reference Harnack, Lytle and Story 27 ).
Two other studies( Reference Yang, Martin and Boushey 24 , Reference Zemel, Carey and Paulhamus 28 ) reported correlations (0·74, 0·76) near that of Huybrechts et al.( Reference Huybrechts, De Bacquer and Matthys 25 ) and Yang et al. ( Reference Yang, Martin and Boushey 24 ) also reported a non-significant mean difference on repeat administrations. Lower correlations were reported by Rockett et al. (0·58), who also reported a significant and unacceptable mean difference of 150 mg( Reference Rockett, Wolf and Colditz 26 ), and Marcotte et al. (0·49–0·67), who reapplied the tool in many cases only 1 h after the first administration, and hence the true correlation for a more appropriate repeat administration of at least 1 week remains unknown( Reference Marcotte, Hennessy and Dwyer 20 ).
Fifteen studies reported relative validity, three providing data on dairy food intake using four tools( Reference Marshall, Eichenberger Gilmore and Broffitt 16 , Reference Vereecken and Maes 22 , Reference Vereecken, Covents and Maes 29 ) and fourteen reporting Ca intake using fourteen tools (Table 3). A range of reference methods were used across the fifteen papers. Three studies reported using a 7 d weighed food record( Reference Bellu, Riva and Ortisi 21 , Reference Bertoli, Petroni and Pagliato 23 , Reference Zemel, Carey and Paulhamus 28 ) and five studies reported using an estimated food record of 3 to 7 d( Reference Marshall, Eichenberger Gilmore and Broffitt 16 , Reference Taylor, Lamparello and Kruczek 17 , Reference Huybrechts, De Bacquer and Matthys 25 , Reference Vereecken, Covents and Maes 29 ), one of which was web-based( Reference Vereecken, Covents and Maes 29 ). Three studies used a single 24 h recall( Reference Magkos, Manios and Babaroutsi 12 , Reference Magkos, Manios and Babaroutsi 13 , Reference Marcotte, Hennessy and Dwyer 20 ) and four( Reference Jensen, Gustafson and Boushey 18 , Reference Wong, Boushey and Novotny 19 , Reference Harnack, Lytle and Story 27 , Reference Rockett, Breitenbach and Frazier 30 ) used multiple 24 h recalls. Huybrechts et al. assessed validity v. urinary Ca excretion( Reference Huybrechts, Börnhorst and Pala 14 ).
Table 3 abbreviations: D, dairy; n, sample size; A, Asian; H, Hispanic; W, white; AN, anorexia nervosa; C, control; AA, African American; NAA, non-African American; G, girls; B, boys; CC, cross-classification; ICC, intra-class correlation; κ, kappa coefficient; BA, Bland–Altman; LOA, limits of agreement; SE, sensitivity; SP, specificity.
Table 3 footnote: *Purpose: number of items in the tool.
An array of statistical tests were performed and included correlation, comparison of mean values, Bland–Altman plots, agreement using κ, cross-classification, and assessment of sensitivity and specificity (Table 3).
Four tools, reported in three papers, estimated various dairy products with both mean values and correlations for the tool and reference method presented( Reference Marshall, Eichenberger Gilmore and Broffitt 16 , Reference Vereecken and Maes 22 , Reference Vereecken, Covents and Maes 29 ). With the exception of the seventy-five-item Block Kids’ Food Questionnaire reported by Marshall et al., all studies showed significant differences for milk intake between the test and reference tool. Similarly there were significant differences for most other dairy products. While correlations for milk questions were good to high, they were poor for other dairy foods. Values of κ and cross-classification results generally supported the poor performance of the tools.
All studies using tools to estimate Ca intake reported results of a correlation test between the estimate for the tool and the reference method. Few results were greater than 0·5, suggesting moderate association at best. Thus a discussion of the values reported for correlation coefficients (Pearson, Spearman, intra-class) will not be provided in the present review as the meaningfulness of these analyses is questionable. However, the values are reported in Table 3 to allow the reader to consider these alongside other findings.
All studies also reported mean Ca intake estimated by the tool and the reference method. Eight papers reporting on seven tools showed significant differences in the overall sample or in subgroups( Reference Magkos, Manios and Babaroutsi 12 , Reference Magkos, Manios and Babaroutsi 13 , Reference Jensen, Gustafson and Boushey 18 , Reference Bertoli, Petroni and Pagliato 23 , Reference Huybrechts, De Bacquer and Matthys 25 , Reference Harnack, Lytle and Story 27 – Reference Vereecken, Covents and Maes 29 ). In addition, Marshall et al. reported significant mean differences for the twenty-two-item targeted nutrient questionnaire but not for the seventy-five-item Block Kids’ Food Questionnaire( Reference Marshall, Eichenberger Gilmore and Broffitt 16 ). Four other authors found no statistically significant differences in comparison with the reference method( Reference Taylor, Lamparello and Kruczek 17 , Reference Marcotte, Hennessy and Dwyer 20 , Reference Bellu, Riva and Ortisi 21 , Reference Rockett, Breitenbach and Frazier 30 ). For these studies the mean differences ranged from values that would not be considered clinically meaningful, <10 mg( Reference Marshall, Eichenberger Gilmore and Broffitt 16 ) and <20 mg( Reference Bellu, Riva and Ortisi 21 ), to values that are approaching concern: ∼66 mg( Reference Rockett, Breitenbach and Frazier 30 ), ∼70 mg( Reference Taylor, Lamparello and Kruczek 17 ) and ∼120 mg( Reference Marcotte, Hennessy and Dwyer 20 ). While no P value was provided, the mean difference reported by Wong et al. was 127 mg( Reference Wong, Boushey and Novotny 19 ).
Only three studies reported sensitivity and specificity analysis( Reference Magkos, Manios and Babaroutsi 12 , Reference Magkos, Manios and Babaroutsi 13 , Reference Huybrechts, De Bacquer and Matthys 25 ) (Table 3). Sensitivity values ranged from 62 %( Reference Huybrechts, De Bacquer and Matthys 25 ) to 82·8 %( Reference Magkos, Manios and Babaroutsi 13 ) and specificity values ranged from 54·9 %( Reference Magkos, Manios and Babaroutsi 13 ) to 77 %( Reference Huybrechts, De Bacquer and Matthys 25 ). A similar concept to sensitivity and specificity is the calculation of cross-classification statistics, an approach applied by more studies (n 8)( Reference Magkos, Manios and Babaroutsi 12 , Reference Magkos, Manios and Babaroutsi 13 , Reference Marshall, Eichenberger Gilmore and Broffitt 16 – Reference Jensen, Gustafson and Boushey 18 , Reference Bertoli, Petroni and Pagliato 23 , Reference Huybrechts, De Bacquer and Matthys 25 , Reference Zemel, Carey and Paulhamus 28 ). While one study( Reference Bertoli, Petroni and Pagliato 23 ) did achieve levels of correct classification similar to those of the studies that reported sensitivity and specificity( Reference Magkos, Manios and Babaroutsi 12 , Reference Magkos, Manios and Babaroutsi 13 , Reference Huybrechts, De Bacquer and Matthys 25 ), none of these additional studies achieved a level of gross misclassification as low as those studies (5–26 % v. 1–2·6 %).
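To make these two kinds of statistic concrete, the following minimal sketch computes them from paired tool and reference estimates. It rests on several assumptions that are not taken from the reviewed studies: a ‘positive’ is defined as a reference intake below an 800 mg Ca/d cut-off, cross-classification uses tertiles assigned by rank, gross misclassification means placement in opposite extreme tertiles, and the intake values are invented for illustration.

```python
def sensitivity_specificity(tool_mg, reference_mg, cutoff_mg=800.0):
    """Sensitivity and specificity of the tool for detecting low Ca intake.

    'Low' means below cutoff_mg according to the reference method
    (an assumed definition of a positive case).
    """
    tp = fn = tn = fp = 0
    for tool, ref in zip(tool_mg, reference_mg):
        ref_low, tool_low = ref < cutoff_mg, tool < cutoff_mg
        if ref_low and tool_low:
            tp += 1
        elif ref_low:
            fn += 1
        elif tool_low:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one low and one adequate reference intake")
    return tp / (tp + fn), tn / (tn + fp)


def tertile_labels(values):
    """Assign each value to tertile 0, 1 or 2 by rank (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, index in enumerate(order):
        labels[index] = rank * 3 // len(values)
    return labels


def cross_classification(tool_mg, reference_mg):
    """Proportion in the same tertile and proportion grossly misclassified."""
    tool_t, ref_t = tertile_labels(tool_mg), tertile_labels(reference_mg)
    n = len(tool_t)
    same = sum(a == b for a, b in zip(tool_t, ref_t)) / n
    gross = sum(abs(a - b) == 2 for a, b in zip(tool_t, ref_t)) / n
    return same, gross


if __name__ == "__main__":
    # Invented Ca intakes (mg/d) for six hypothetical participants.
    tool = [500, 700, 900, 1100, 650, 1200]
    reference = [600, 750, 850, 1000, 900, 1150]
    print(sensitivity_specificity(tool, reference))
    print(cross_classification(tool, reference))
```

Both statistics depend on the chosen cut-off or number of categories, which is one reason values from different studies are not directly comparable.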
Six studies used Bland–Altman plots to illustrate the level of agreement( Reference Magkos, Manios and Babaroutsi 12 , Reference Magkos, Manios and Babaroutsi 13 , Reference Marcotte, Hennessy and Dwyer 20 , Reference Bertoli, Petroni and Pagliato 23 , Reference Huybrechts, De Bacquer and Matthys 25 , Reference Vereecken, Covents and Maes 29 ). The plots identify the mean bias and limits of agreement between methods and identify whether a tool is valid for assessment of Ca intake at the individual and population levels( Reference Bland and Altman 31 ). The mean bias between the tool and reference method ranged from ∼50 mg( Reference Vereecken, Covents and Maes 29 ) to 250 mg( Reference Marcotte, Hennessy and Dwyer 20 , Reference Bertoli, Petroni and Pagliato 23 ). The difference between the lower and upper limit of agreement varied widely between studies but all were greater than 900 mg, indicating that none of the tools evaluated would be suitable for use at an individual level.
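For readers unfamiliar with the method, the quantities read from a Bland–Altman plot can be sketched as follows. The example computes the mean bias and the conventional 95 % limits of agreement (bias ± 1·96 standard deviations of the paired differences); the function name and the intake values are hypothetical and are not drawn from any reviewed study.

```python
from statistics import mean, stdev


def bland_altman(tool_mg, reference_mg):
    """Return (mean bias, lower limit, upper limit) for paired estimates.

    Differences are tool minus reference; limits are bias +/- 1.96 SD
    of the differences (sample standard deviation).
    """
    differences = [t - r for t, r in zip(tool_mg, reference_mg)]
    bias = mean(differences)
    sd = stdev(differences)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Invented Ca intakes (mg/d) for four hypothetical participants.
    tool = [500, 700, 900, 1100]
    reference = [600, 650, 1000, 1050]
    bias, lower, upper = bland_altman(tool, reference)
    print(f"bias={bias:.1f} mg, limits of agreement {lower:.1f} to {upper:.1f} mg")
```

Because the limits span 3·92 standard deviations, a width greater than 900 mg corresponds to a standard deviation of the paired differences above roughly 230 mg.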
Discussion
The current review identified eighteen papers that reported on two tools that assessed dairy intake, ten that assessed Ca intake and five that assessed both dairy and Ca. Overall assessment of tool properties indicated a general lack of evidence to support use of any of the tools. The most common limitation of the testing of tool properties was the high reliance on correlation, which assesses association but not agreement. Ideally, validity is assessed using a measure of agreement such as Bland–Altman analysis( Reference Lang and Secic 8 ) and a measure of sensitivity and specificity (the ability of a test to correctly identify true positives and true negatives as in the case of identifying adequate intake).
In addition, when assessing validity it is important to determine a clinically meaningful level of significance as opposed to statistical significance. None of the papers defined a level of clinical significance at which the results were meaningful in terms of dietary adequacy. This failure to distinguish between statistically and clinically significant results limits conclusions relevant to clinical practice. We applied a 100 mg Ca cut-off when assessing studies.
Tests of reliability were reported for ten tools only and of these only four studies reported mean difference for the repeat administration (a fifth reported the P value only) and only two reported cross-classification. Based on these results and the overall lack of tests, only one tool assessing Ca( Reference Huybrechts, De Bacquer and Matthys 25 ) and one tool assessing dairy( Reference Vereecken and Maes 22 ) could be considered reliable.
Validation studies of the three tools assessing dairy intake suggest that these tools are not valid for estimating dairy intake. With respect to tools that assess Ca, validity results vary such that some should be considered with caution while others appear to have acceptable levels of agreement and/or sensitivity and specificity, indicating adequate validity. The tools that appear to have the best validity are those developed by Huybrechts et al.( Reference Huybrechts, De Bacquer and Matthys 25 ), Vereecken et al.( Reference Vereecken, Covents and Maes 29 ) and Magkos et al.( Reference Magkos, Manios and Babaroutsi 12 ). Each of these studies included a minimum of 216 children and reported less than 100 mg difference between the test tool and the reference method. It is important to note that the two Belgian tools were tested in children aged ∼2 to 7 years( Reference Huybrechts, De Bacquer and Matthys 25 , Reference Vereecken, Covents and Maes 29 ), while a more suitable tool for older children might be the tool reported by Magkos et al. as it has been tested in children aged 10 to 15 years( Reference Magkos, Manios and Babaroutsi 12 ). Importantly, only one of these tools was tested for reliability( Reference Huybrechts, De Bacquer and Matthys 25 ).
In addition to the results of testing of tool properties it is important to consider the quality of the study design used in providing these findings. Key study design criteria include level of evidence, potential sources of error and bias, and sample size. All studies provided level III-2 evidence, as defined by the National Health and Medical Research Council evidence hierarchy for diagnostic accuracy( 32 ), i.e. there was a comparison with a reference standard. They do not meet the criterion for a higher level of evidence, which requires an independent blinded comparison with a valid reference standard, thus introducing a risk of positive respondent bias.
Possible sources of error in the methodology are limitations associated with the reference methods used for comparison with the identified tools. As there is no biomarker for Ca status, only relative validity can be assessed, i.e. against another method, and thus there will be some inherent limitations. Sample size is important for reliability and validation studies with a sample size of 100 suggested as the minimum( Reference Willett 10 ). Most studies, and those potentially recommended, met this criterion.
Potential positive respondent bias was present across all of the studies as none were blinded; thus participants were aware of the study purpose and may have altered their responses with the aim of conforming to the recommendations regarding Ca or dairy intake, resulting in over-reporting of intake. Potential recall bias is always associated with an FFQ and the degree will depend on the cognitive ability of the subjects and the period to which the FFQ refers (e.g. the last week, the last year). With the exception of one study, samples appeared representative of the population indicating absence of recruitment bias.
The ability of a tool to detect change is an additional attribute and some argue this should be a third essential property of a tool( Reference Streiner and Norman 33 ). Sensitivity to change describes a tool's ability to measure any degree of change and responsiveness describes a tool's ability to measure clinically important change. These parameters can be measured in a number of ways. The most frequently used measure is Cohen's effect size: the ratio of the mean difference (of the two measures) to the standard deviation at baseline. Whether or not a statistically significant change is detected depends on the sample size. However, if the sample is large a small change will be statistically significant although such a change may not be clinically significant. Importantly, it is not appropriate to measure change within subjects if the reliability of the tool is less than 0·5( Reference Streiner and Norman 33 ). None of the tools described/recommended specifically reported the tool's ability to detect change.
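The effect size defined above can be written out as a short calculation. The sketch below is illustrative only: it takes the mean difference as follow-up mean minus baseline mean, uses the sample standard deviation at baseline as the denominator, and runs on invented intake values; the function name `effect_size` is hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline_mg, follow_up_mg):
    """Mean change divided by the sample SD of the baseline measurements."""
    change = mean(follow_up_mg) - mean(baseline_mg)
    return change / stdev(baseline_mg)


if __name__ == "__main__":
    # Invented Ca intakes (mg/d) for three hypothetical participants.
    baseline = [600, 800, 1000]
    follow_up = [700, 900, 1100]
    print(effect_size(baseline, follow_up))
```

Unlike a P value, this ratio does not grow with sample size, which is why it is useful for judging whether a detected change is large enough to matter.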
Based on the review of methods and results of the reliability and validity testing of identified tools, there is no recommended tool for assessing dairy intake in children but three for assessing Ca intake. Appropriate testing methods for validity and demonstrated adequate levels of sensitivity, specificity and/or agreement were reported for all these tools but only one was tested for reliability, which was shown to be moderate( Reference Huybrechts, De Bacquer and Matthys 25 ). Interestingly, there were few common elements between these tools with the number, type and scope of questions and the target group varying. The tool by Magkos et al. was the shortest, containing thirty questions, assessing semi-quantitatively the intake of principally Ca-containing foods only in the previous 12 months and self-completed by children aged 10–14 years( Reference Magkos, Manios and Babaroutsi 12 ). The tool by Huybrechts et al. was parent-completed for young children (2–6 years), contained forty-seven items most of which were high Ca-containing foods, assessed semi-quantitatively for the previous 12 months( Reference Huybrechts, De Bacquer and Matthys 25 ). The tool reported by Vereecken et al. was also parent-completed for very young children but had seventy-seven items assessing quantitatively total diet in the previous 3 months( Reference Vereecken, Covents and Maes 29 ). In conclusion, there is an obvious need for development and rigorous testing of tools to assess Ca and/or dairy intake in children and adolescents.
Acknowledgements
Sources of funding: This review was supported by Dairy Australia Inc. The funder had no role in the analysis or writing of this article. Conflicts of interest: There is no conflict of interest. Ethics: Ethical approval was not required. Authors’ contributions: A.M., A.Y. and M.M. designed the study, oversaw the implementation, checked data extraction, contributed to the writing of the manuscript and commented on all drafts. K.M. and L.B. undertook the literature search, extracted all of the data, contributed to writing of the manuscript and commented on all drafts.