Critical nutrition literacy
Citizens encounter nutrition issues in their daily lives. Most such encounters involve social media and require citizens to address nutrition-related issues at the personal, national or even global level. The outcome of these encounters probably depends on citizens’ ‘nutrition literacy’. Nutrition literacy can be defined as ‘the capacity to obtain, process and understand nutrition information and the materials needed to make appropriate decisions regarding one's health’( Reference Silk, Sherry and Winn 1 ). This definition is clearly linked to the definition of health literacy proposed by Nutbeam( Reference Nutbeam 2 ). Furthermore, Pettersen( Reference Pettersen 3 ) has added a ‘critical dimension’ to the definition of nutrition literacy: ‘the ability to critically assess nutritional information and dietary advice’. Pettersen( Reference Pettersen 3 ) and Silk et al. ( Reference Silk, Sherry and Winn 1 ) have described three cumulative levels of nutrition literacy referred to as ‘functional’, ‘interactive’ and ‘critical’ nutrition literacy.
Functional nutrition literacy (FNL) refers to proficiency in applying basic literacy skills, such as reading and understanding food labelling and grasping the essence of nutrition information guidelines. Interactive nutrition literacy (INL) comprises more advanced literacy skills, such as the cognitive and interpersonal communication skills needed to interact appropriately with nutrition counsellors, as well as interest in seeking and applying adequate nutrition information for the purpose of improving one's nutritional status and behaviour. Critical nutrition literacy (CNL) refers to being proficient in critically analysing nutrition information and advice, as well as having the will to participate in actions to address nutritional barriers from personal, social and global perspectives.
CNL is part of scientific literacy( Reference Pettersen 4 ) – ‘the capacity to use scientific knowledge, to identify questions and to draw evidence-based conclusions’( 5 ), i.e. proficiency in describing, explaining and predicting scientific phenomena, and understanding the processes of scientific inquiry as well as the premises of scientific evidence and conclusions( 6 ).
The aim of the present study was to use Rasch modelling to examine the construct validity of a new instrument developed for measuring nursing students’ CNL. The emphasis is on interpreting statistical misfit in terms of substantive inconsistency with a view to improving the CNL instrument. To date, the CNL instrument has only been assessed using classical test theory( Reference Dalane 7 – Reference Kjøllesdal 9 ).
The unidimensional simple logistic Rasch model
The unidimensional simple logistic Rasch model (SLM), expressed as
$$P\{X_{ni} = 1\} = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)},$$
models the probability that a respondent will affirm a dichotomous item( Reference Andrich 10 ). The probability (P) is modelled as a function of the difference βn − δi between the two independent parameters ‘person location’ (βn ) and ‘item location’ (δi )( Reference Rasch 11 ). The graph of this probability against person location is referred to as the item characteristic curve (ICC).
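As a minimal illustration, the SLM can be evaluated directly. The Python sketch below assumes the standard logistic form P = exp(βn − δi)/(1 + exp(βn − δi)); the function name and the example locations are hypothetical and are not taken from RUMM2030. Evaluating the probability over a range of person locations traces the ICC of one item.

```python
import math

def slm_probability(beta: float, delta: float) -> float:
    """Probability of affirming a dichotomous item under the simple logistic Rasch model.

    beta: person location (logits); delta: item location (logits).
    """
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))

# Points on the item characteristic curve (ICC) of a hypothetical item located at delta = 0.5
icc_points = [(beta, slm_probability(beta, 0.5)) for beta in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0)]
for beta, p in icc_points:
    print(f"beta = {beta:+.1f}  P = {p:.3f}")
```

When βn = δi the probability is exactly 0·5, and it rises monotonically with βn, which corresponds to the monotonicity assumption of the Rasch model.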
The person parameter and the item parameter represent locations on the underlying construct, i.e. the latent variable that the instrument is intended to measure. Person location typically refers to proficiency – the ability a person possesses – and item location to difficulty – the amount of ability associated with endorsing a certain item. Because item locations are conventionally constrained to have a mean of zero, items located at zero, i.e. δi = 0, measure a moderate level of the latent variable relative to the other items.
The assumptions of unidimensional Rasch models are that: (i) the response probability depends on one dominant dimension (unidimensionality), not necessarily on a single factor only, so that minor dimensions may also be present( Reference Smith 12 , Reference Linacre 13 ); (ii) the responses to items are independent given person location (local independence); (iii) the raw scores contain all of the information on person location regardless of which items have been endorsed (sufficiency); and (iv) the response probability increases with higher values of person location (monotonicity). In Rasch analyses, raw scores are converted to a logit scale in the estimation process.
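The sufficiency assumption can be checked numerically in the simplest case. The sketch below (two dichotomous items with hypothetical locations, local independence assumed) computes the probability of the pattern ‘item 1 affirmed, item 2 not’ given a raw score of 1, for two very different person locations. Under the SLM this conditional probability does not depend on βn; it reduces to exp(−δ1)/(exp(−δ1) + exp(−δ2)), which is why the raw score carries all of the information about person location.

```python
import math

def slm_probability(beta, delta):
    """SLM probability of affirming an item at delta for a person at beta."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def conditional_pattern_probability(beta, delta1, delta2):
    """P(item 1 = 1, item 2 = 0 | raw score = 1) for a person at beta, assuming local independence."""
    p1, p2 = slm_probability(beta, delta1), slm_probability(beta, delta2)
    first_only = p1 * (1.0 - p2)
    second_only = (1.0 - p1) * p2
    return first_only / (first_only + second_only)

delta1, delta2 = -0.5, 0.7  # hypothetical item locations
low = conditional_pattern_probability(-2.0, delta1, delta2)
high = conditional_pattern_probability(3.0, delta1, delta2)
closed_form = math.exp(-delta1) / (math.exp(-delta1) + math.exp(-delta2))
print(round(low, 6), round(high, 6), round(closed_form, 6))
```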
The unidimensional polytomous Rasch model
The unidimensional polytomous Rasch model (PRM), expressed as
$$P\{X_{ni} = x\} = \frac{1}{\gamma_{ni}} \exp\left[\kappa_x + x(\beta_n - \delta_i)\right], \qquad x \in \{0, 1, \ldots, m\},$$
where $$\gamma_{ni} = \sum_{k=0}^{m} \exp\left[\kappa_k + k(\beta_n - \delta_i)\right]$$ is a normalization factor ensuring $$\sum_{x=0}^{m} P\{X_{ni} = x\} = 1,$$ models the probability for person n with location βn scoring x points, or ticking response category x, on a polytomous item i with location δi ( Reference Andrich, de Jong and Sheridan 14 ). The κx are category coefficients; in terms of the thresholds τk, κ0 = 0 and κx = −(τ1 + τ2 + … + τx).
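A minimal Python sketch of the PRM category probabilities follows. It assumes the usual link between category coefficients and centred thresholds, κ0 = 0 and κx = −(τ1 + … + τx); the function names and parameter values are illustrative only. With a single threshold at zero, the function reproduces the SLM probability.

```python
import math

def kappas_from_thresholds(taus):
    """Category coefficients kappa_0..kappa_m from centred thresholds tau_1..tau_m."""
    kappas = [0.0]
    for tau in taus:
        kappas.append(kappas[-1] - tau)  # kappa_x = -(tau_1 + ... + tau_x)
    return kappas

def prm_probabilities(beta, delta, taus):
    """Probabilities of scores 0..m for a person at beta on an item at delta."""
    kappas = kappas_from_thresholds(taus)
    numerators = [math.exp(kappa + x * (beta - delta)) for x, kappa in enumerate(kappas)]
    gamma = sum(numerators)  # normalization factor: makes the m + 1 probabilities sum to 1
    return [num / gamma for num in numerators]

# Hypothetical five-category Likert item (m = 4 thresholds) and a person at 0.9 logits
probs = prm_probabilities(beta=0.9, delta=0.0, taus=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])

# With one threshold at zero, the PRM reduces to the SLM
dichotomous = prm_probabilities(beta=1.0, delta=0.0, taus=[0.0])
print(round(dichotomous[1], 4), round(math.exp(1.0) / (1.0 + math.exp(1.0)), 4))
```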
Parameterizations of the polytomous Rasch model
If the distances between adjacent response categories (i.e. between adjacent thresholds) are the same across all items, e.g. the distance between ‘agree strongly’ and ‘agree partly’ on one Likert-scale item equals the distance between ‘agree strongly’ and ‘agree partly’ on another item, the rating scale parameterization( Reference Andrich 15 ) of the PRM fits the data best; all items then share one set of thresholds. If the distances differ across items, the partial credit parameterization( Reference Wright and Masters 16 ), in which each item has its own set of thresholds, is indicated.
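The difference between the two parameterizations can be made concrete by writing each threshold location as the item location plus a centred threshold, δi + τk (rating scale) or δi + τki (partial credit). In the sketch below all item locations and thresholds are hypothetical: under the rating scale parameterization every item inherits the same gaps between adjacent thresholds, whereas under the partial credit parameterization the gaps differ from item to item.

```python
def absolute_thresholds(delta, taus):
    """Threshold locations on the logit scale: item location plus centred thresholds."""
    return [delta + tau for tau in taus]

def threshold_gaps(thresholds):
    """Distances between adjacent thresholds (rounded to hide floating-point noise)."""
    return [round(b - a, 10) for a, b in zip(thresholds, thresholds[1:])]

item_locations = {"item A": -0.5, "item B": 0.8}  # hypothetical

# Rating scale: one set of centred thresholds shared by all items
shared_taus = [-1.2, -0.4, 0.3, 1.3]
rsm = {name: absolute_thresholds(d, shared_taus) for name, d in item_locations.items()}

# Partial credit: each item has its own centred thresholds
pcm_taus = {"item A": [-1.2, -0.4, 0.3, 1.3], "item B": [-2.0, 0.1, 0.4, 1.5]}
pcm = {name: absolute_thresholds(item_locations[name], t) for name, t in pcm_taus.items()}

for name in item_locations:
    print(name, "RSM gaps:", threshold_gaps(rsm[name]), "PCM gaps:", threshold_gaps(pcm[name]))
```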
Response categories and ordered thresholds of the polytomous Rasch model
A threshold is defined as the person location at which a response in either of two adjacent response categories is equally likely, i.e. at which the probability of responding in the higher of the two categories, given a response in one of them, reaches 0·50. A polytomous item with m + 1 response categories has m thresholds (τk ), where k∈{1,2,…,m}, and scores x∈{0,1,…,m}. The score x indicates how many of the m ordered thresholds a respondent has passed( Reference Andrich, de Jong and Sheridan 14 ).
Successive ordered thresholds reflect successively more of the latent ability or attitude. The ordering of thresholds is a property of the data, not of the Rasch model. Disordered thresholds are clear evidence of problems in the data( Reference Fisher 17 ), but statistical analysis cannot determine the cause of the disordering( Reference Andrich, de Jong and Sheridan 14 ).
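Both the threshold definition and the consequence of disordering can be checked with the PRM probabilities. The sketch below uses hypothetical threshold estimates and the convention κ0 = 0, κx = −(τ1 + … + τx). At βn = δi + τk the conditional probability of the higher of two adjacent categories is exactly 0·5; when τ2 < τ1, category 1 is never the most probable response at any person location, which is the pattern that reveals disordering in practice.

```python
import math

def prm_probabilities(beta, delta, taus):
    """PRM probabilities of scores 0..m, with kappa_0 = 0 and kappa_x = -(tau_1 + ... + tau_x)."""
    kappas = [0.0]
    for tau in taus:
        kappas.append(kappas[-1] - tau)
    numerators = [math.exp(k + x * (beta - delta)) for x, k in enumerate(kappas)]
    total = sum(numerators)
    return [n / total for n in numerators]

def conditional_at_threshold(delta, taus, k):
    """P(score k | score k - 1 or k) for a person located exactly at threshold k."""
    p = prm_probabilities(delta + taus[k - 1], delta, taus)
    return p[k] / (p[k - 1] + p[k])

def modal_categories(delta, taus, lo=-6.0, hi=6.0, steps=1201):
    """Scores that are the most probable response somewhere on a grid of person locations."""
    modal = set()
    for i in range(steps):
        beta = lo + (hi - lo) * i / (steps - 1)
        p = prm_probabilities(beta, delta, taus)
        modal.add(max(range(len(p)), key=p.__getitem__))
    return modal

def thresholds_ordered(taus):
    return all(a < b for a, b in zip(taus, taus[1:]))

ordered = [-1.5, -0.5, 0.5, 1.5]
disordered = [-0.5, -1.5, 0.5, 1.5]  # hypothetical estimates with tau_2 < tau_1

print(round(conditional_at_threshold(0.0, ordered, 2), 6))  # 0.5 by definition
print(thresholds_ordered(ordered), sorted(modal_categories(0.0, ordered)))
print(thresholds_ordered(disordered), sorted(modal_categories(0.0, disordered)))
```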
Constructing invariant measures – over- and under-discriminating items
When an item provides data that sufficiently fit a unidimensional Rasch model, the item provides an indication of relative ability or attitude along the latent variable. In Rasch analysis, this information is used to construct measures. If the observed responses to an item, plotted against person location, approach a step function, the item is said to over-discriminate. If they approach a constant function, the item is said to under-discriminate.
Strongly over-discriminating items tend to act like ‘switches’ which stratify the persons below and above certain ability estimates, but they are not measuring devices. Under-discriminating items tend to neither stratify nor measure.
Model fit
Fit residuals and item χ² values are used to test how well the data fit the model( Reference Smith and Plackner 18 ). Large negative item fit residuals indicate that an item over-discriminates, and large positive ones that it under-discriminates. A person fit residual indicates how well a person's response pattern fits the ‘Guttman structure’( Reference Andrich 19 , Reference Andrich 20 ).
A large χ² value indicates that persons with different locations do not ‘agree on’ item locations, thus compromising the required property of invariance. To adjust the χ² probabilities for the number of significance tests performed, the probabilities are Bonferroni adjusted( Reference Bland and Altman 21 ) using the software package RUMM2030( 22 ).
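The building block of these fit statistics is the standardized residual, (observed − expected)/√variance, for each person-item response. The sketch below computes it for dichotomous responses under the SLM and shows a Bonferroni-adjusted per-test level, assuming one χ² test per item. How RUMM2030 aggregates residuals into item and person fit residuals is not reproduced here, and the example locations are hypothetical.

```python
import math

def slm_probability(beta, delta):
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def standardized_residual(x, beta, delta):
    """(observed - expected) / sqrt(variance) for a dichotomous response x in {0, 1}."""
    p = slm_probability(beta, delta)  # expected value of x
    return (x - p) / math.sqrt(p * (1.0 - p))  # p(1 - p) is the binomial variance

def bonferroni_level(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests

# A low-location person affirming a hard item is surprising; the reverse is not
surprising = standardized_residual(1, beta=-1.0, delta=2.0)
unsurprising = standardized_residual(1, beta=2.0, delta=-1.0)
print(round(surprising, 2), round(unsurprising, 2))

# Assuming one chi-square test per item: 8 engagement items, 11 claims items
print(bonferroni_level(0.05, 8), round(bonferroni_level(0.05, 11), 5))
```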
Reliability
Cronbach's α is an index of internal consistency reliability( Reference Traub and Rowley 23 ). The person separation index (PSI) is its Rasch-based analogue: it is calculated from the person location estimates of a Rasch model and their standard errors rather than from raw scores.
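Both reliability indices can be sketched in a few lines. Cronbach's α below uses the standard formula k/(k − 1)·(1 − Σ item variances / total-score variance); the PSI is written in its usual Rasch form, the proportion of the variance of person location estimates that is not attributable to their error variance. This is a sketch under those assumptions, not the exact RUMM2030 or SPSS computation, and the example data are hypothetical.

```python
from statistics import mean, pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list of item scores per respondent (all of equal length)."""
    k = len(item_scores[0])
    item_variances = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def person_separation_index(locations, standard_errors):
    """Share of the observed variance of person estimates that is not error variance."""
    observed_variance = pvariance(locations)
    error_variance = mean([se ** 2 for se in standard_errors])
    return (observed_variance - error_variance) / observed_variance

# Hypothetical data: four respondents answering three five-point items
scores = [[2, 3, 2], [4, 4, 5], [3, 3, 3], [5, 4, 5]]
print(round(cronbach_alpha(scores), 3))

# Hypothetical person locations (logits) and their standard errors
print(round(person_separation_index([-1.0, 0.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5]), 3))
```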
Targeting
Comparing the mean location of persons with the mean location of items provides an indication of how well the items are targeted to the persons. When items are well targeted to the person locations, the measurement error is reduced.
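Why targeting matters can be shown with the test information function: for dichotomous items each item contributes P(1 − P) to the information at a person location, and the standard error of a person estimate is approximately 1/√(total information). The sketch below uses hypothetical item locations centred on zero and a hypothetical, poorly targeted group of persons; it compares the standard error at the centre of the items with that at the persons' mean location.

```python
import math
from statistics import mean

def slm_probability(beta, delta):
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def standard_error(beta, item_locations):
    """Approximate SE of a person estimate: 1 / sqrt(sum of item information P(1 - P))."""
    information = 0.0
    for delta in item_locations:
        p = slm_probability(beta, delta)
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)

items = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item locations, mean 0
persons = [2.5, 3.0, 3.5]  # hypothetical person locations, well above the items
offset = mean(persons) - mean(items)  # targeting: mean person minus mean item location
print(offset, round(standard_error(0.0, items), 2), round(standard_error(mean(persons), items), 2))
```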
Differential item functioning and invariance
Rasch models are the only item response theory (IRT) models that provide invariant measurements if the data fit the model. Criterion-related construct validity, sufficiency and reliability are also provided if the data fit a Rasch model. Invariant measurement is not guaranteed if the data fit a two-parameter IRT model because these models also model item discrimination in addition to person and item location.
Differential item functioning (DIF) between a person factor's categories, e.g. male and female for gender, is evident when, for a given estimate of the latent trait, the mean item scores of the people in the different categories are ‘significantly’ different from each other. This means that the item has different location estimates for males and females, i.e. the observed responses of males and females are described by two different ICCs.
If these ICCs do not intersect, the item discriminates equally strongly across the continuum for both groups and the DIF is uniform( Reference Andrich and Hagquist 24 ). Non-uniform DIF, where the ICCs intersect, is an important source of non-invariant measures. Uniform DIF might be resolved( Reference Brodersen, Meads and Kreiner 25 , Reference Looveer and Mulligan 26 ) by using the ‘person factor split’ procedure in RUMM2030( 22 ), whereas items with non-uniform DIF must be discarded.
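The distinction between uniform and non-uniform DIF can be illustrated by comparing the ICCs of two groups. In the sketch below, uniform DIF is represented by two Rasch ICCs with different locations, which never cross; non-uniform DIF is mimicked with a slope parameter that lies outside the Rasch model, so the curves intersect. All locations and slopes are hypothetical.

```python
import math

def icc(beta, delta, slope=1.0):
    """Logistic ICC. slope = 1 is the Rasch case; other slopes lie outside the Rasch model."""
    return 1.0 / (1.0 + math.exp(-slope * (beta - delta)))

def curves_cross(curve_a, curve_b, lo=-5.0, hi=5.0, steps=1001):
    """True if the difference between two ICCs changes sign on a grid of person locations."""
    signs = set()
    for i in range(steps):
        beta = lo + (hi - lo) * i / (steps - 1)
        diff = curve_a(beta) - curve_b(beta)
        if abs(diff) > 1e-12:
            signs.add(diff > 0)
    return len(signs) == 2

# Uniform DIF: equal slopes, different locations for two groups (hypothetical values)
uniform = curves_cross(lambda b: icc(b, -0.3), lambda b: icc(b, 0.4))
# Non-uniform DIF: different slopes (hypothetical), so the ICCs intersect
non_uniform = curves_cross(lambda b: icc(b, 0.0, slope=0.6), lambda b: icc(b, 0.2, slope=1.6))
print(uniform, non_uniform)
```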
Local trait dependence
Trait dependence violates unidimensionality and causes ‘dimension violations’ of local independence( Reference Andrich and Kreiner 27 – Reference Marais and Andrich 29 ). Trait dependence appears when person factors other than ability or attitude influence responses, e.g. ability to guess( Reference Ryan 30 ) or DIF related to gender and ethnicity.
The result is usually ‘less’ Guttman structure in the response patterns and under-discriminating items showing DIF, which lowers construct validity( Reference Brodersen, Meads and Kreiner 25 , Reference Looveer and Mulligan 26 ). Multidimensionality results in a decreased variance of person estimates and a decreased reliability coefficient( Reference Marais and Andrich 28 ).
Large variation in the percentage of variance explained by each principal component (PC) of the item residuals is one way of generating a hypothesis about multidimensionality in the data( Reference Smith 12 , Reference Linacre 13 ). The assumption of unidimensionality might be tested using the t-test procedures in RUMM2030( 22 ) and by estimating the latent correlation between possible sub-dimensions( 31 ).
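The t-test procedure compares, for each person, the location estimates obtained from two subsets of items. A minimal sketch follows, assuming both sets of estimates are already on a common logit scale and using the normal critical value 1·96; under unidimensionality roughly 5 % of persons would be expected to differ significantly by chance. The data are hypothetical, and RUMM2030's implementation details (e.g. how it equates the subsets) are not reproduced.

```python
import math

def proportion_significant(estimates_a, estimates_b, z_crit=1.96):
    """Share of persons whose locations on two item subsets differ significantly.

    estimates_a, estimates_b: (location, standard_error) pairs, one per person, in the same
    order, assumed to be on a common logit scale.
    """
    significant = 0
    for (loc_a, se_a), (loc_b, se_b) in zip(estimates_a, estimates_b):
        t = (loc_a - loc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
        if abs(t) > z_crit:
            significant += 1
    return significant / len(estimates_a)

# Hypothetical estimates for three persons on two possible subscales
subscale_1 = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.4)]
subscale_2 = [(0.1, 0.5), (-1.0, 0.5), (2.2, 0.4)]
share = proportion_significant(subscale_1, subscale_2)
print(f"{share:.0%} of persons differ significantly (about 5 % expected by chance)")
```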
Local response dependence
Response dependence violates statistical independence and causes ‘response violations’ of local independence( Reference Andrich and Kreiner 27 – Reference Marais and Andrich 29 ), meaning that the entire correlation between the items is not captured by the latent trait. This might take place when a previous item gives hints or clues that affect responses to a subsequent (dependent) item, causing deviations of the thresholds of the dependent item( Reference Andrich, Humphry and Marais 32 ).
The result is ‘more’ Guttman structure in the response patterns and consequently over-discriminating items, which in turn result in an increased variance of person estimates and an increased reliability coefficient( Reference Marais and Andrich 29 , Reference Smith 33 ).
A high correlation between a pair of item residuals is one way of generating a hypothesis about whether two items show response dependence( Reference Smith 12 , Reference Marais and Andrich 29 ). The magnitude of the response dependence might be estimated using the ‘item dependence split’ procedure in RUMM2030( 22 ). The estimate helps test the hypothesis( Reference Andrich and Kreiner 27 , Reference Andrich, Humphry and Marais 32 ).
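Screening for response dependence starts from the correlations between items' standardized residuals. The sketch below flags item pairs whose residual correlation exceeds a cutoff; the 0·3 cutoff is used here as a common rule of thumb rather than a fixed criterion, and the residuals are hypothetical values for five persons.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_dependent_pairs(residuals, cutoff=0.3):
    """residuals: item name -> standardized residuals (one per person, same person order).
    Returns (item, item, r) for pairs whose residual correlation exceeds the cutoff."""
    names = sorted(residuals)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(residuals[a], residuals[b])
            if r > cutoff:
                flagged.append((a, b, round(r, 2)))
    return flagged

# Hypothetical standardized residuals for five persons
residuals = {
    "item20": [1.0, -0.5, 0.8, -1.2, 0.3],
    "item21": [0.9, -0.4, 0.7, -1.0, 0.2],
    "item22": [0.2, 0.1, -0.3, 0.4, -0.5],
}
flagged = flag_dependent_pairs(residuals)
print(flagged)
```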
Method
Frame of reference
Using email advertisements, 473 people (response rate 52 %), of whom 8 % were males, were recruited from ten of the twenty-eight Norwegian university colleges offering nursing education, covering urban and rural areas. Almost all respondents (96 %) were third-year nursing students aged between 20 and 54 years with a mean age of 26·4 (sd 6·9) years. More than a quarter of those surveyed (28 %) lived with one or more children.
Data collection
The data collection took place during autumn and winter 2010 by means of a paper-and-pencil questionnaire handed out by school personnel. Participation was voluntary and the questionnaire was completed anonymously in the classrooms within 20 min.
The critical nutrition literacy instrument
The assessed CNL instrument consists of two scales measuring separate aspects of critical nutrition literacy (see Tables 1 and 2): (i) the ‘engagement in dietary habits’ scale (the ‘engagement’ scale) consisting of eight items and (ii) the ‘taking a critical stance towards nutrition claims and their sources’ scale consisting of eleven items (the ‘claims’ scale). A five-point Likert scale with all of the response categories anchored with a phrase was applied for all items: ‘disagree strongly’ (1), ‘disagree partly’ (2), ‘neither agree nor disagree’ (3), ‘agree partly’ (4) and ‘agree strongly’ (5).
Reverse-scored items (rev) are indicated by ‘x’.
Table 2 refers to scale (‘engagement in dietary habits’ (engagement) and ‘taking a critical stance towards nutrition claims and their sources’ (claims)), cluster (‘concern about dietary habits’ (concern), ‘willingness to engage in democratic processes to improve dietary habits’ (democracy), ‘justifying premises for and evaluating the sender of nutrition claims’ (evaluating) and ‘identifying scientific nutrition claims’ (identifying)), loc (item location), res (fit residual), df (degrees of freedom), chi-square (χ²), chi-square probability (P(χ²)) and ordering of thresholds.
*Item 25 under-discriminates.
†Item 26 overlaps with the identifying items.
Results
Comparing the parameterizations of the polytomous Rasch model using summary statistics
The first step in the analysis was to determine the appropriate model that fitted the data (see Table 3). The rating scale parameterization fitted the data from the engagement scale better when comparing overall χ², while the partial credit parameterization provided the best fit for the data from the claims scale.
Table 3 refers to total item chi-square (χ²), degrees of freedom (df), chi-square probability (P(χ²)), person separation index (PSI), mean fit residual (z) with its standard deviation, skewness (S), kurtosis (K) and mean person location (loc) with its standard deviation.
*Negative values.
†Analyses where item 25 was deleted.
‡Analyses where items were rescored.
The means and standard deviations of the item fit residuals for the two scales deviated from their expected values of 0 and 1: 0·17 (sd 1·92) for the engagement scale and 0·61 (sd 0·89) for the claims scale (see Table 3).
The functioning of response categories and ordering of thresholds
The pattern of responses for items 20, 21, 23 and 30 indicated that the response category ‘disagree partly’ (2) did not have the highest probability of being selected at any attitude level. The response patterns to items 24 and 26 indicated that the ‘neither agree nor disagree’ (3) response category was not the most likely response at any attitude level. These observations implied disordered thresholds in the data.
Item discrimination, item fit and person fit
According to the item fit residuals and χ² statistics, the under-discriminating item 25 did not fit the model. Individual person fit residuals showed that eight people had a z-fit residual outside the range z = ±2·5.
Reliability estimates
Cronbach's α was estimated using the SPSS statistical software package version 20. Cronbach's α for the engagement scale was 0·80. Cronbach's α for the claims scale increased from 0·69 to 0·70 when item 25 was deleted. The PSIs were 0·77 for the engagement scale and 0·71 for the claims scale (see Table 3).
Targeting – mean person attitude
The average person location value was 0·90 for the engagement scale and 0·30 for the claims scale (see Table 3). Figure 1 shows the distribution of person locations and item threshold locations for the engagement scale. Item locations are reported in Table 2.
Resolving the item with uniform differential item functioning
The responses to item 16 were influenced by whether or not the person lived with children. Item 16 showed uniform DIF, as it discriminated equally strongly for both categories. After item 16 was split, the locations of the two virtual items differed by more than 0·6 logits between the person factor categories ‘live with children’ (location −0·24, Fig. 2 left curve) and ‘do not live with children’ (location 0·38, Fig. 2 right curve).
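Conceptually, the person factor split replaces the DIF item with two virtual items, one per group, each treating the other group's responses as missing so that separate locations can be estimated. The sketch below shows only this data restructuring, with hypothetical responses and group labels; estimating the virtual items' locations is left to the Rasch software. The gap between the two reported locations, 0·38 − (−0·24) = 0·62 logits, is included as an arithmetic check.

```python
def split_item(responses, groups, group_a, group_b):
    """Split one item's responses into two virtual items by a person factor.

    Each virtual item keeps one group's responses and marks the other group's as missing (None).
    """
    item_a = [r if g == group_a else None for r, g in zip(responses, groups)]
    item_b = [r if g == group_b else None for r, g in zip(responses, groups)]
    return item_a, item_b

# Hypothetical responses to item 16 and the person factor 'lives with children'
item16 = [4, 5, 3, 2, 4]
lives_with_children = ["yes", "no", "no", "yes", "no"]
item16_children, item16_no_children = split_item(item16, lives_with_children, "yes", "no")
print(item16_children, item16_no_children)

# Gap between the two reported virtual-item locations (logits)
location_gap = 0.38 - (-0.24)
print(round(location_gap, 2))
```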
Possible dimension violations of local independence
The correlation coefficient between the residual of each item on the engagement scale and the first PC was positive for the ‘concern’ items and negative for the ‘democracy’ items. These sets of items are identical to the clusters initially formed based on a qualitative judgement of item content (see Table 2).
The correlation coefficient between the residual of each item on the claims scale and the first PC was positive for items 20, 21, 23 and 30 and negative for items 22 and 24–29. The PC summary in RUMM2030 indicated more variation in the percentage of variance explained by each component for the claims scale than for the engagement scale. These analyses indicated that items 20, 21, 23 and 30 might tap into a subscale of the claims scale while items 22 and 24–29 might form a second subscale of the claims scale.
A subtest analysis based on clusters of items with positive and negative correlation coefficients with the first PC was performed. The latent correlation between the two possible subscales of the engagement scale was r = 0·82 (the concern items and the democracy items), while the latent correlation between the two possible subscales of the claims scale was r = 0·31 (items 20, 21, 23, 30 and 22, 24–29).
Applying the equating tests procedure in RUMM2030, the percentage of persons whose scores on the two possible subscales differed ‘significantly’ at the 5 % level was 5 % for the engagement scale and 13 % for the claims scale. The t-test procedures in RUMM2030 indicated that the engagement scale had acceptable unidimensionality while the claims scale might have problematic dimensionality.
Possible response violations of local independence
The residual correlation between items 20 and 21 slightly exceeded 0·3. The magnitude of the response dependence was not estimated as the items had disordered thresholds indicating problematic data.
Discussion
Our qualitative categorization of the engagement scale items into sets finds support in the quantitative empirical data. Notably, the locations of the engagement scale items are clearly differentiated by stages: the ‘concern’ items require a lower level of ‘engagement in dietary habits’ to overcome than the ‘democracy’ items.
Items measuring ‘global perspectives’ should be developed to incorporate the global aspect into the engagement scale, and phrases making clearer reference to personal or social perspectives should be added to items 16 and 18 accordingly. Further, the references to children and adolescents in item 16 need to be revised to avoid DIF for the person factor ‘parenthood’.
Except for item 26, the locations of the claims scale items are also differentiated by stages. The ‘evaluating’ items require lower levels of ‘taking a critical stance towards nutrition claims and their sources’ to overcome than the ‘identifying’ items. A rather low latent correlation between subsets of the claims scale items indicates possible multidimensionality in the data.
The collapsing of the adjacent response categories of the claims scale items was based on the disordering of the threshold estimates in the Wright map. The patterns of response for the items with disordered thresholds on the claims scale suggest that these items might function like four-point response formats.
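Collapsing adjacent categories amounts to rescoring the responses. The sketch below builds a rescoring map that merges two adjacent categories of a five-point item into a four-category item scored 0 to 3; which neighbour a non-modal category is merged with is the analyst's choice, so the two merges shown are hypothetical examples rather than the rescoring used in this study.

```python
def collapse_mapping(n_categories, merge_pair, first_category=1):
    """Rescoring map that merges two adjacent categories and renumbers scores from 0."""
    low, high = merge_pair
    if high != low + 1:
        raise ValueError("only adjacent categories can be collapsed")
    mapping, score = {}, 0
    for category in range(first_category, first_category + n_categories):
        if category == high:
            mapping[category] = mapping[low]  # merged into the lower neighbour's score
            continue
        mapping[category] = score
        score += 1
    return mapping

# Hypothetical choices: merge 'disagree strongly' (1) with 'disagree partly' (2),
# or 'neither agree nor disagree' (3) with 'agree partly' (4)
merge_1_2 = collapse_mapping(5, (1, 2))
merge_3_4 = collapse_mapping(5, (3, 4))
responses = [1, 2, 3, 4, 5]
print([merge_1_2[r] for r in responses])
print([merge_3_4[r] for r in responses])
```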
Items 20 and 21 should be rephrased to avoid the items collecting redundant information. Items 22 and 27 should be rephrased to further distinguish them from the ‘evaluating’ items. Item 25 must be discarded as it is phrased like a true–false item and under-discriminates. Item 26 could be rephrased to alter its affective level as we seek to differentiate items by stages.
New and revised items should be field-trialled to ensure that the CNL instrument can measure students’ CNL invariantly across student years without disordering the stage-specific sets of items.
Cronbach's α and PSI are valid measures of reliability only when items are independent. Response violations (item 21) and dimension violations (items 16 and 25) of local independence cause non-invariant measures and affect the reported reliability estimates of the scales.
The engagement scale could have been better targeted with the mean item location at a higher affective level corresponding to persons’ mean attitude levels. For example, response category 1 ‘disagree strongly’ is out of range for most items.
Our finding that the rating scale parameterization fits the data of the engagement items better indicates that the distance between the response categories is the same across all items. This is not the case for the claims scale items.
Conclusions
Taken together, these results suggest that further psychometric analyses on similar and different samples should be carried out and complemented by qualitative focus group interviews in order to ensure that the sets of items make sense both conceptually and empirically against the Rasch model. Our method of examining construct validity using item response modelling has important implications for the future development and validation of quantitative public health research.
Acknowledgements
Sources of funding: This research received no specific grant, consulting honorarium, support for travel to meetings, fees for participation in review activities, payment for writing or reviewing the manuscript, or provision of writing assistance from any funding agency in the public, commercial or not-for-profit sectors. Ethical approval: Ethical approval was not required. Conflicts of interest: None declared. Authors’ contributions: J.Ø.D. and S.P. took part in the process of developing the questionnaire (together with Aarnes and Kjøllesdal – see references). J.Ø.D. did the most recent classical analysis of the instrument. Ø.G. did the Rasch analysis and wrote the main parts of the paper. S.P. co-wrote the abstract and the Critical nutrition literacy section of the paper. S.P. and Ø.G. discussed the paper, and S.P. and J.Ø.D. read through and commented on the paper. Each author has seen and approved the content of the submitted manuscript.