Overweight and obesity among children and adolescents are a major threat to population health( Reference Lobstein, Baur and Uauy 1 ). Over the past decades unhealthy changes in lifestyles have accelerated due to economic development, industrialisation and globalisation. Dietary patterns are shifting, while energy expenditure is declining( 2 ). Unhealthy diets and lack of physical activity are among the leading causes of the major non-communicable diseases, including CVD, type 2 diabetes and certain types of cancer, and contribute substantially to the global burden of disease, disability and mortality( 3 ).
School-based interventions aimed at promoting healthy diets and physical activity may contribute to prevention of overweight in children and adolescents( Reference Brown and Summerbell 4 ). In 2002 the Dutch Obesity Intervention in Teenagers (DOiT) was developed with the aim of preventing excessive weight gain in Dutch adolescents aged 12–14 years. DOiT especially targets adolescents from lower socio-economic backgrounds. The goal of this comprehensive school-based prevention programme is to induce behavioural changes concerning energy intake and energy expenditure. DOiT focuses on a range of energy balance-related behaviours (EBRB), i.e. consumption of sugar-containing beverages, consumption of high-energy (high-caloric) snacks and sweets, levels of sedentary behaviour and levels of physical activity( Reference Singh, Chinapaw and Kremers 5 ). DOiT proved to be effective in inducing favourable changes in body composition as indicated by sums of skinfold thickness in girls, as well as consumption of sugar-containing beverages in both boys and girls as indicated by self-reports( Reference Singh, Chinapaw and Brug 6 ).
To measure changes in EBRB and assess the effectiveness of interventions, reliable and valid instruments are vital( Reference McDowell 7 , Reference Steiner and Norman 8 ). Questionnaires are widely used for the assessment of behavioural changes in population-based studies, since they are inexpensive and easy to administer( Reference Montoye, Kemper and Saris 9 ). Unfortunately, most questionnaires focus on one specific behaviour and have limited or unknown psychometric characteristics( Reference Chinapaw, Mokkink and van Poppel 10 – Reference Lubans, Hesketh and Cliff 12 ). Since there were no reliable and valid questionnaires addressing the whole range of EBRB targeted by the DOiT intervention, we developed the DOiT questionnaire.
Based on its favourable effects in the initial trial( Reference Singh, Chinapaw and Brug 6 ), we made the DOiT intervention available for all pre-vocational schools in the Netherlands, accompanied by research on implementation and process data( Reference Van Nassau, Singh and van Mechelen 13 ). The current study aimed to assess the test–retest reliability and construct validity of the DOiT questionnaire.
Methods
DOiT questionnaire
The self-administered DOiT questionnaire was developed to assess EBRB in adolescents (aged 12–14 years) of pre-vocational secondary schools. The questionnaire was divided into nine sections, i.e. (A) demographic characteristics, (B) consumption of sugar-containing beverages (soft drinks and fruit juices; diet sodas excluded), (C) consumption of high-energy snacks and sweets (high in sugar and fat), (D) breakfast behaviour and frequency of meals, (E) screen behaviour, (F) active transport, (G) physical activity during leisure time, (H) physical activity at school and (I) having a job.
At the beginning of each section, written information was provided on the items that followed, including examples (e.g. of soft drinks and fruit juices). This information was also provided orally during the completion of the questionnaire, according to a standardised protocol.
In the current study the test–retest reliability and construct validity of all sections related to diet and physical activity (B to I) were assessed. In total, ninety-five multiple-choice question items were assessed.
The current version of the DOiT questionnaire combines items from three sources, each adapted to the target behaviours and population: items from the questionnaire used in the first randomised controlled trial evaluating the effectiveness of the DOiT intervention( Reference Singh, Chinapaw and Brug 6 ), which was itself based on other questionnaires( Reference Booth, Okely and Chey 14 – Reference van Assema, Brug and Ronda 17 ); items derived from questionnaires of the ENERGY (EuropeaN Energy balance Research to prevent excessive weight Gain among Youth) project( Reference Brug, te Velde and Chinapaw 18 – Reference van Stralen, te Velde and Singh 20 ); and items advised by the Dutch National Institute for Public Health and the Environment( 21 ).
The DOiT questionnaire was pre-tested for comprehensibility and duration of completion among six adolescents from one school, not participating in the present study. After completion of the questionnaire by the adolescents, a structured focus group interview was conducted. Based on the results of this pre-test, no adjustments were required to the questionnaire. The instruction protocol of the questionnaire was adapted in order to clarify the additional information given by the research assistant during completion of the questionnaire.
Study population
For the current study we recruited adolescents aged 12–14 years, i.e. the target population of the DOiT intervention( Reference Singh, Chinapaw and Kremers 5 ). Pre-vocational secondary schools located in different areas of the Netherlands were recruited to form a representative sample with regard to degree of urbanisation and socio-economic status. Recruitment and data collection took place from February to May 2010. Schools were recruited by email, telephone and a teacher forum on a website. When a school was interested in participation, extra information was provided and dates for measurement were planned. Each participating school selected one class of twenty to twenty-five adolescents for participation. Only adolescents capable of completing a questionnaire in the Dutch language were included in the study. Data for the test–retest reliability study and the construct validity study were collected from different adolescents (attending the same school classes). An information letter was sent to the parents of the adolescents. This letter contained a passive informed consent form, meaning that the parents were offered the opportunity to decline participation without signing and returning a consent form. Adolescents who did not want to participate, or whose parents declined participation, were excluded from the study. The Medical Ethics Committee of the VU University Medical Center approved the study protocol.
Study design and data collection
Test–retest reliability
Reliability concerns consistency and reproducibility of measurements. This means that a questionnaire is reliable when measurements done under equal circumstances repeatedly give the same results( Reference Steiner and Norman 8 ). The reliability of the DOiT questionnaire was examined using a test–retest design. Adolescents were asked to complete the paper-and-pencil self-administered questionnaire in the classroom under the supervision of a trained research assistant. The research assistant guided the class through each section of the questionnaire using a structured protocol. All concepts were explained in writing in the questionnaire and verbally by the research assistant. In addition, the research assistant brought products to the classroom to serve as examples. The baseline measurements were carried out on all school days except Mondays, because some questions refer to ‘yesterday’ as a school day. Exactly one week later, on the same weekday and under comparable circumstances, the adolescents were asked to fill in the questionnaire for a second time. The research assistant provided information at the beginning of each section, guiding the adolescents through the questionnaire. Completion of the questionnaire took 45 min on average.
Construct validity
Construct validity is the extent to which a test measures the constructs that it intends to measure and is assessed by comparing the scores of the questionnaire with scores of established measures( Reference Kirshner and Guyatt 22 ). Because of the absence of a ‘gold standard’, the construct validity of the DOiT questionnaire was determined by the agreement between the answers of the self-administered questionnaire (first measurement) and a questionnaire completed by a research assistant based on information collected in a personal cognitive interview. This construct validation method has been used previously to validate questionnaires on children's EBRB( Reference Singh, Vik and Chinapaw 19 , Reference McMinn, van Sluijs and Harvey 23 ), and it also provides information on whether the adolescents interpreted the questions as intended. For reasons of feasibility, the sample size of the construct validity study was determined to be at least twenty adolescents. The research team asked the teacher of each class participating in the test–retest study to select three to four adolescents, representative of the class, for participation in the construct validity study. Data from these adolescents were excluded from the test–retest reliability study. The participating adolescents were asked to volunteer for a cognitive interview about the same topics as the questionnaire after filling in the first questionnaire. These adolescents filled in the questionnaire together with the other adolescents in the classroom (first measurement) and were subsequently interviewed by a research assistant. The interviews were performed using a standard question route (interview guide), considering the course of an adolescent's day. The interviews took 25 min on average and were audio-recorded and transcribed. Based on the transcribed interview, a second research assistant filled in a second identical questionnaire (second measurement).
Both the interviewer and the second research assistant were blinded to the answers of the first questionnaire.
Data management
All data were entered in the statistical software package SPSS version 18·0 according to a standardised protocol. For both the test–retest reliability and construct validity studies, a randomly selected 5 % of the questionnaires were re-entered in SPSS to check for typing errors and misinterpretation. In cases where there was a difference of more than 3 % between the entries, the questionnaires had to be re-entered in the original data set and the procedure was repeated. The rate of disagreement in both studies ranged from 0·0 to 2·0 %. Subsequently, data were cleaned by checking the original data for duplicate records, system-missing values, out-of-range values and logical inconsistencies.
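The double-entry check described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually run in SPSS: the function names are hypothetical, and counting disagreement per item value (rather than per questionnaire) is an assumption, since the text does not state the unit behind the 3 % threshold.

```python
import random


def draw_recheck_sample(questionnaire_ids, fraction=0.05, seed=0):
    """Randomly pick a fraction of the questionnaires for re-entry (at least one)."""
    ids = list(questionnaire_ids)
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))


def disagreement_rate(first_entry, second_entry):
    """Percentage of item values that differ between the two entries.

    Both arguments map a questionnaire id to its list of entered item values.
    Assumption: disagreement is counted per item value.
    """
    total = 0
    differing = 0
    for qid, re_entered in second_entry.items():
        original = first_entry[qid]
        if len(original) != len(re_entered):
            raise ValueError(f"questionnaire {qid}: item counts differ")
        total += len(original)
        differing += sum(a != b for a, b in zip(original, re_entered))
    return 100.0 * differing / total


def needs_full_reentry(rate_percent, threshold_percent=3.0):
    """True when the disagreement exceeds the 3 % threshold named in the text."""
    return rate_percent > threshold_percent


# Hypothetical data: two re-entered questionnaires of 50 items, one typing error.
first = {1: [0] * 50, 2: [1] * 50}
second = {1: [0] * 50, 2: [1] * 49 + [0]}
rate = disagreement_rate(first, second)
print(rate, needs_full_reentry(rate))
```

With 111 questionnaires, a 5 % draw under this rounding rule selects six of them for re-entry.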
Statistical analysis
Descriptive statistics
Means, standard deviations and percentages were calculated for the participants’ characteristics. We calculated medians, 25th and 75th percentile values for the EBRB of the first measurement.
Test–retest reliability and construct validity
To determine test–retest reliability and construct validity, the agreement between the two measurements was assessed at the individual item level. For all continuous items (n 57) the two-way, random-effects, single-measure intraclass correlation coefficient (ICC) was calculated. ICC were classified as ‘excellent’ (≥0·81), ‘good’ (0·61–0·80), ‘moderate’ (0·41–0·60) or ‘poor’ (≤0·40)( Reference Landis and Koch 24 – Reference Nunnally JC & Bernstein 26 ). For categorical items (n 5) or items with a dichotomous scale (n 33), Cohen's kappa coefficient (κ) was calculated. The classification of κ values was the same as for the classification of ICC. Because the calculation of ICC and κ values depends on the variability in answering categories, we also calculated the percentage agreement classified as ‘excellent’ (90–100 %), ‘good’ (75–89 %), ‘moderate’ (60–74 %) or ‘poor’ (<60 %). When an ICC or κ value was lower than or equal to 0·40/0·60/0·80, but the percentage agreement was equal to or higher than 60 %/75 %/90 % respectively, we determined the classification according to the percentage agreement( Reference Saelens, Frank and Auffrey 27 ). All statistical analyses were performed using SPSS version 18·0.
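The item-level analysis can be illustrated with a short Python sketch. It rests on two assumptions that the text does not confirm: that the two-way, random-effects, single-measure ICC is the absolute-agreement form (often written ICC(2,1)), and that the override rule amounts to taking the higher of the coefficient class and the percentage-agreement class. All function names are hypothetical, and the study itself used SPSS.

```python
def icc_2_1(scores):
    """Two-way random-effects, single-measure ICC (absolute agreement assumed).

    `scores` holds one row per adolescent and one column per measurement.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-measurement mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        return float("nan")                # zero variance: ICC is undefined
    return (msr - mse) / denom


def classify_coefficient(value):
    """Class for an ICC or kappa value; NaN falls through to 'poor'."""
    if value > 0.80:
        return "excellent"
    if value > 0.60:
        return "good"
    if value > 0.40:
        return "moderate"
    return "poor"


def classify_agreement(percent):
    """Class for a percentage agreement."""
    if percent >= 90:
        return "excellent"
    if percent >= 75:
        return "good"
    if percent >= 60:
        return "moderate"
    return "poor"


ORDER = ["poor", "moderate", "good", "excellent"]


def final_classification(coefficient, percent_agreement):
    """Assumed reading of the override rule: keep the higher of the two classes."""
    by_coefficient = classify_coefficient(coefficient)
    by_agreement = classify_agreement(percent_agreement)
    return max(by_coefficient, by_agreement, key=ORDER.index)


# Hypothetical item: every adolescent reports one unit more at retest.
shifted = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(icc_2_1(shifted), final_classification(0.35, 92))
```

Under the absolute-agreement assumption, a systematic shift between the two measurements lowers the ICC even when the ranking of adolescents is unchanged, which is why the shifted example scores below 1.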
Results
Participants’ characteristics
A total of 111 adolescents from six schools participated in the test–retest reliability study and twenty adolescents from the same six schools participated in the construct validity study. The characteristics of both study populations are presented in Table 1. For none of the characteristics was a significant difference between the study populations found. The dropout rate was 10 % (13/124) in the test–retest reliability study and 0 % in the construct validity study.
*This classification is based on that of CBS Statistics Netherlands. Someone with a Western background is someone originating from a country in Europe (excl. Turkey), North America, Oceania, Indonesia or Japan. Someone with a non-Western background is someone originating from a country in Africa, South America, Asia (excl. Indonesia and Japan) or Turkey.
Energy balance-related behaviours
Table 2 shows descriptive statistics of the EBRB of the test–retest reliability study and the construct validity study, based on calculations from the data of the first measurement.
*Including candies, cookies, chocolates and ice cream.
†Including organised (in a club or at school) and unorganised sport activities.
General findings
Results of the test–retest reliability and construct validity study for all questionnaire items of the DOiT questionnaire are presented in Table 3. For each item, the ICC or κ value and the percentage agreement are presented for both the test–retest reliability and construct validity. Table 4 shows a summary of the results per section of the questionnaire.
*Zero variance.
ICC, intraclass correlation coefficient; κ, Cohen's kappa coefficient; % Agree, percentage agreement.
Test–retest reliability
For the total population, thirty-two (34 %) items had excellent test–retest reliability, forty-two (44 %) items had good and twenty-one (22 %) items had moderate test–retest reliability. No item had poor test–retest reliability. Most items with moderate scores were on consumption of sugar-containing beverages (five items on soft drinks and six items on fruit juices) and high-energy snacks and sweets (five items on snacks and two items on sweets). Concerning sugar-containing beverages, the ICC/κ values of ten out of twelve items on cartons/small bottles and glasses indicated moderate test–retest reliability. One item on physical activity (‘hours of after-school-time physical activity at school’) showed moderate test–retest reliability. Fourteen items of the questionnaire showed low variability, resulting in ICC ≤ 0·60, but a high percentage agreement (≥90 %).
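Why low variability can produce a low coefficient alongside a high percentage agreement is easiest to see with Cohen's κ on a dichotomous item. The sketch below uses invented answers, not study data, and the helper names are hypothetical.

```python
def percent_agreement(first, second):
    """Percentage of adolescents giving the same answer at both measurements."""
    return 100.0 * sum(a == b for a, b in zip(first, second)) / len(first)


def cohen_kappa(first, second):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(first)
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    categories = set(first) | set(second)
    p_chance = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    if p_chance == 1:
        return float("nan")  # zero variance: kappa is undefined
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical rare behaviour: 94 of 100 adolescents answer "no" twice,
# 3 switch from "no" to "yes" and 3 switch from "yes" to "no".
rare_first = ["no"] * 94 + ["no"] * 3 + ["yes"] * 3
rare_second = ["no"] * 94 + ["yes"] * 3 + ["no"] * 3

# Hypothetical common behaviour with the same six switchers.
common_first = ["yes"] * 50 + ["no"] * 44 + ["no"] * 3 + ["yes"] * 3
common_second = ["yes"] * 50 + ["no"] * 44 + ["yes"] * 3 + ["no"] * 3

print(percent_agreement(rare_first, rare_second), cohen_kappa(rare_first, rare_second))
print(percent_agreement(common_first, common_second), cohen_kappa(common_first, common_second))
```

Both invented items have 94 % agreement, yet κ is close to zero for the rare behaviour because almost all of that agreement is expected by chance. This is the situation in which classifying by percentage agreement changes the result.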
To gain insight into possible differences between boys and girls, we also performed gender-specific analyses for the test–retest reliability. The analyses revealed some differences between boys and girls (presented in Supplementary Materials, Tables 1 and 2). In general, items showed values indicating lower test–retest reliability in girls. There were no items with poor test–retest reliability in the total population or boys only. In girls, 7 % (n 7) of the items scored low on test–retest reliability. Four out of these seven items concerned yesterday's behaviour.
Construct validity
For thirty-three (35 %) items construct validity was excellent, eighteen (19 %) items had good and twenty-three (24 %) items moderate validity. For twenty-one (22 %) items we found values indicating poor construct validity. Eight items on high-energy snacks/sweets and nine items on sugar-containing beverages showed poor validity values. Most of these items concerned the amount of drinks or portions of food the adolescent had on a weekday or weekend day. Sixteen items showed low variability, resulting in ICC ≤ 0·40, but a high percentage agreement (≥90 %).
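As a quick arithmetic check, the item counts reported for the two studies can be converted back into percentages of the ninety-five items. The counts below are taken from the text; the variable and function names are hypothetical.

```python
TOTAL_ITEMS = 95

# Item counts per class as reported in the Results.
reliability_counts = {"excellent": 32, "good": 42, "moderate": 21, "poor": 0}
validity_counts = {"excellent": 33, "good": 18, "moderate": 23, "poor": 21}


def as_percentages(counts):
    """Share of items per class, rounded to whole percentages."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


reliability_pct = as_percentages(reliability_counts)
validity_pct = as_percentages(validity_counts)

print(reliability_pct)
print(validity_pct)
```

Both sets of counts sum to ninety-five, and the rounded shares match the reported figures, including the combined good-to-excellent shares of 78 % for test–retest reliability and 54 % for construct validity.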
The study sample of the construct validity study was too small for gender-specific analyses.
Discussion
The current study examined the test–retest reliability and construct validity of the DOiT questionnaire among 12- to 14-year-old Dutch adolescents attending pre-vocational education. The DOiT questionnaire, measuring EBRB, showed good test–retest reliability and moderate to good construct validity.
More than three-quarters of all items (78 %) showed good to excellent test–retest reliability. The test–retest reliability appeared to be moderate for all other items. Most items with moderate scores were on consumption of soft drinks, fruit juices, sweets and snacks. Notable was the moderate test–retest reliability of the items concerning the consumption of sugar-containing beverages, especially the question on the amount of cartons/small bottles and glasses.
Fifty-four per cent of the items showed good to excellent construct validity, 24 % of the items showed moderate construct validity and 22 % of the items poor construct validity. Most items with poor validity concerned consumption of sugar-containing beverages (mainly the amount, i.e. cartons/small bottles and glasses) and high-energy snacks/sweets, which are the same questionnaire sections in which items often scored moderate on test–retest reliability.
There was a certain overlap between items scoring poor on construct validity and items scoring moderate on test–retest reliability. Recall of drinks and snacks, which are consumed throughout the day, seems to be especially difficult for adolescents. Behaviours such as screen behaviour in leisure time or consumption of breakfast may be more structured or stable, or attached to specific parts of the day, and therefore easier to recall. Adolescents might also have had difficulties differentiating between the different packaging sizes of drinks or between sweets and snacks, despite the examples given in the questionnaire and by the research assistant. Another explanation is that these concepts might not be defined explicitly enough in the questionnaire.
Therefore, we suggest that items regarding the consumption of sugar-containing beverages and high-energy snacks/sweets should be simplified and that, if they are used in future studies in their current form, the poor to moderate construct validity should be noted when interpreting research results based on these items. It should be considered that earlier research showed the effectiveness of the DOiT intervention in reducing the consumption of sugar-containing beverages( Reference Singh, Chinapaw and Brug 6 ).
Comparison with other studies
No comparable questionnaires and corresponding test–retest reliability and construct validity studies among adolescents aged 12–14 years were found for the broad range of EBRB assessed in the DOiT questionnaire. However, there are reliability and validity studies focusing on specific EBRB. It should be considered that validity is often assessed by different procedures than the method used in the current study. Because of the differences in methods and questionnaires, the results are difficult to compare.
Neuhouser et al.( Reference Neuhouser, Lilley and Lund 28 ) examined the test–retest reliability and validity of the Beverage and Snack Questionnaire (BSQ) among young American adolescents (mean age 12·7 years). The test–retest reliability ranged from r = 0·62 to r = 0·89. Validity coefficients comparing the BSQ with a 4 d food record ranged from r = 0·48 to r = 0·87. In the DOiT questionnaire the items on sugar-containing beverages and high-energy snacks/sweets had slightly lower reliability (ICC/κ = 0·28–0·75 and agreement = 31–94 %) and validity (ICC/κ = 0·00–0·60 and agreement = 10–100 %). Unlike the DOiT questionnaire, the BSQ distinguishes between beverages and snacks/sweets consumed at school and not at school.
In a study by Hardy et al.( Reference Hardy, Booth and Okely 29 ) the reliability of the Adolescent Sedentary Activity Questionnaire (ASAQ) was examined in 11- to 15-year-old Australian adolescents. ICC values indicated good to excellent reliability among all students of grade 8 (mean age 13·3 years) for screen behaviour (ICC = 0·78 for girls and ICC = 0·90 for boys), while our study showed moderate to good reliability values for screen behaviour (ICC = 0·43–0·66). A study by Chinapaw et al.( Reference Chinapaw, Slootmaker and Schuit 30 ) showed moderate reliability values on sedentary behaviour (ICC = 0·57) as well as on moderate- and vigorous-intensity physical activity (ICC = 0·50–0·59) assessed by the Activity Questionnaire for Adults and Adolescents (AQuAA) in Dutch adolescents aged 12–16 years. The construct validity of the AQuAA compared with assessment by accelerometry was poor (Spearman correlation coefficients = −0·21 to 0·23). Similarly, the WHO Health Behaviour in School-aged Children (HBSC) questionnaire and the International Physical Activity Questionnaire (IPAQ, short version) did not seem to be valid instruments compared with a 7 d activity monitoring instrument for measuring physical activity in Norwegian adolescents aged 13–18 years (Spearman correlation coefficients = 0·01–0·29). The reliability was better in the WHO HBSC questionnaire (ICC = 0·71–0·73) compared with the IPAQ (ICC = 0·10–0·62)( Reference Rangul, Holmen and Kurtze 31 ). Our study showed good to excellent test–retest reliability (ICC/κ = 0·00–0·98 and agreement = 37–99 %) and mixed values for the construct validity study on physical activity items (ICC/κ = 0·07–0·96 and agreement = 28–100 %).
Strengths and limitations
Assessing a broad variety of EBRB is a major strength of the DOiT questionnaire. In addition, data were collected and managed according to standardised protocols. Nevertheless, there are also several limitations in our study. The sample size in the construct validity study was relatively small (n 20). Another limitation was the lack of a ‘gold standard’ in the construct validity study. Although comparison of the questionnaire with information from a cognitive interview was the most feasible and informative option in the present study, both measurement tools are self-reports and cognitive interviewing might lead to bias. We tried to minimise social desirability bias by accentuating the importance of accurate and honest answers. We restricted possible bias due to interpretation of the interview responses by having the interview data processed by a person other than the interviewer, according to a strict data entry protocol. An advantage of using interviews was that they yielded additional information about how the adolescents interpreted the questions, based on which the DOiT questionnaire can be adapted for future use.
Conclusions
The results of our study demonstrated good test–retest reliability and moderate to good construct validity of a majority of items from the DOiT questionnaire assessing EBRB in Dutch adolescents. All items with poor construct validity should be revised and tested again to improve the DOiT questionnaire for future use. Differentiating between consumption of sugar-containing beverages and high-energy snacks and sweets at school and out of school is another point to consider in a later version of the DOiT questionnaire.
Acknowledgements
Sources of funding: This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. Conflicts of interest: All authors declare that they have no conflicts of interest. Authors’ contributions: A.S.S., M.J.M.C., F.v.N., J.B. and W.v.M. developed the measurement instrument. A.S.S. developed the study protocol. F.v.N. coordinated and supervised the data collection. F.v.N. and A.S.S. contributed to or supervised the data collection. E.H.C.J. conducted the data analyses under supervision of A.S.S. and M.J.M.C. E.H.C.J. and A.S.S. drafted the manuscript. All authors read and approved the final manuscript.
Supplementary Materials
For Supplementary Materials for this article, please visit http://dx.doi.org/10.1017/S1368980012005253