Clinical guidelines recommend that physicians base their management of obesity-related cardiovascular risk on abdominal as well as general obesity(1, 2). Abdominal obesity (as measured by waist circumference) has been reported to be a better independent predictor of obesity-related disorders than general obesity (as measured by BMI)(Reference Assmann, Cullen and Jossa3–Reference Yusuf, Hawken and Ounpuu5). The presence of abdominal obesity can thus indicate the need for interventions in subjects who would not otherwise be considered at risk on the basis of general obesity alone(1). Measuring abdominal obesity by waist circumference is considered feasible because it is easy to learn, takes no more time than measuring body height and weight, and involves minimal cost(Reference Wang6).
Waist circumference is increasingly used to monitor changes resulting from interventions, not only by trained researchers but also by clinicians in primary care settings(Reference Smith and Haslam7–Reference Giorda, Guida and Avogaro9). However, physicians report that they find it hard to measure(Reference Mason and Katzmarzyk10–Reference Verweij, Proper and Hulshof12). Moreover, studies showing good reliability of waist circumference measurements were mainly performed by health professionals trained in anthropometrics (1·3 to 6·5 cm)(Reference Moreno, Joyanes and Mesana13–Reference Ulijaszek and Kerr15), whereas studies in which measurements were conducted by physicians show larger variability (0·7 to 12 cm)(Reference Sebo, Beer-Borst and Haller16–Reference Sicotte, Ledoux and Zunzunegui18).
The consequences of this variability for clinical practice are not yet clear. They depend on whether the variability is so large that clinically relevant changes within subjects, or clinically relevant differences between subjects, cannot be measured reliably. Three problems can be identified here. First, only a few reliability studies are available. Second, many studies report reliability (e.g. the intra-class correlation coefficient) but not absolute measurement error (e.g. in centimetres), which is needed to interpret change scores in individual subjects in clinical practice(Reference de Vet, Terwee and Knol19). Third, it is not clear what a clinically relevant change in waist circumference is, because there is insufficient evidence on the dose–response relationship between reductions in waist circumference and obesity-related morbidity and mortality(Reference Lee11, Reference Huxley, Barzi and Lee20, Reference Tuan, Adair and Stevens21). It is therefore necessary to summarize what is known from state-of-the-art research and to identify gaps in knowledge. The aims of the present study were to: (i) explain the difference between reliability and measurement error and highlight why it is important to determine the measurement error of waist circumference; (ii) discuss what is known about the measurement error of waist circumference and which factors may cause it; (iii) discuss what is known about clinically relevant changes in waist circumference; (iv) discuss how knowledge about clinically relevant changes can help interpret the magnitude and importance of the measurement error of waist circumference; and finally (v) provide recommendations for future research and clinical practice on the measurement error of waist circumference.
The difference between reliability and measurement error
The terms ‘reliability’ and ‘measurement error’ both fall under the broader concept of ‘reproducibility’, as both address whether measurement results are reproducible in test–retest situations(Reference de Vet, Terwee and Knol19). Reliability refers to how well subjects in a population can be distinguished from each other despite measurement error. This information is required for instruments used for discriminative purposes (e.g. to characterize differences between subjects in order to establish their clinical status and therapeutic needs, such as discriminating between overweight and obese subjects). Measurement error describes how close the values of repeated measurements within the same subject are. This information is required for instruments used for evaluative purposes (e.g. to register change over time). The distinction matters for clinical practice: when studies report the reliability (e.g. the intra-class correlation coefficient) of waist circumference measurements, a clinician learns whether the instrument can discriminate between (e.g. overweight and obese) subjects in a sample, but not whether it is suitable for monitoring the waist circumference of individual subjects over time. For the latter purpose, the absolute measurement error around a single measurement or a single change score is what matters(Reference de Vet, Terwee and Knol19). This measurement error is expressed, for example, as the standard error of measurement or the limits of agreement(Reference de Vet, Terluin and Knol22). Moreover, measurement error has an important advantage over reliability for clinical interpretation: it is expressed on the actual scale of measurement (e.g. centimetres) rather than as a dimensionless value between 0 and 1.
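The distinction can be made concrete with a small numerical sketch. The Python function below is hypothetical: its name, the one-way random-effects model (which counts any systematic difference between the two occasions as error) and the data are our assumptions, not taken from the studies reviewed. It summarizes two repeated waist circumference measurements per subject as a dimensionless ICC, a standard error of measurement (SEM) in centimetres and Bland–Altman limits of agreement.

```python
import math
from statistics import mean, stdev


def reproducibility(first: list[float], second: list[float]) -> dict[str, float]:
    """Summarize a hypothetical test-retest sample of waist circumference (cm).

    One-way random-effects decomposition with k = 2 repeats per subject.
    The ICC is dimensionless; the SEM and limits of agreement are in cm.
    """
    n, k = len(first), 2
    subject_means = [(a + b) / 2 for a, b in zip(first, second)]
    grand_mean = mean(subject_means)
    # Between-subject mean square: how far apart the subjects are.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square: spread of repeats around each subject mean.
    # With two repeats, the squared deviations per subject sum to d**2 / 2.
    diffs = [b - a for a, b in zip(first, second)]
    ms_within = sum(d ** 2 / 2 for d in diffs) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    # Bland-Altman 95 % limits of agreement for the test-retest differences.
    half_width = 1.96 * stdev(diffs)
    return {
        "icc": icc,
        "sem_cm": math.sqrt(ms_within),
        "loa_low_cm": mean(diffs) - half_width,
        "loa_high_cm": mean(diffs) + half_width,
    }


# Identical +/-1 cm retest differences in a heterogeneous and a homogeneous sample.
wide = reproducibility([80, 90, 100, 110], [81, 89, 101, 109])
narrow = reproducibility([94, 95, 96, 97], [95, 94, 97, 96])
print(round(wide["sem_cm"], 2), round(wide["icc"], 3))      # 0.71 0.997
print(round(narrow["sem_cm"], 2), round(narrow["icc"], 3))  # 0.71 0.684
```

Both samples have the same SEM because the within-subject differences are identical, yet the ICC is much lower in the homogeneous sample, where subjects are harder to tell apart. A high ICC on its own therefore says little about the error around an individual subject's measurement in centimetres.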
Although information on both reliability and measurement error is necessary for clinical practice, the reliability of waist circumference is generally high(Reference Ulijaszek and Kerr15), whereas the magnitude of its measurement error is not clear. In the present paper we therefore focus on absolute measurement error, which affects measurements in individual persons(Reference de Vet, Terluin and Knol22).
Measurement error of waist circumference
To identify the magnitude of the measurement error of waist circumference measurements from the literature, a systematic search was conducted in PubMed covering 1975 to February 2011. Search terms for measurement error were selected from a search filter developed for finding studies on measurement properties(Reference Terwee, Jansma and Riphagen23) and combined with the text word ‘waist circumference’. Studies using self-reported measurements and studies among children or adolescents were excluded because these are associated with higher measurement error(Reference Ulijaszek and Kerr15). Data were extracted on the smallest detectable change (SDC) or smallest detectable difference (SDD), which reflect the smallest change or difference in waist circumference of an individual subject that can be detected beyond measurement error(Reference de Vet, Terwee and Knol19). The search yielded 559 studies, of which nine reported on the intra- or inter-observer measurement error of waist circumference (i.e. repeated measurements on the same subjects by one observer or by different observers, respectively; Table 1). The methodological quality of the studies was assessed by two authors (L.M.V. and C.B.T.) using the COSMIN checklist for grading studies on measurement properties (Box C)(Reference Terwee, Mokkink and Knol24). Each of the eleven items was rated excellent, good, fair or poor, and the overall methodological quality was taken as the lowest item rating (‘worst score counts’). For example, if one item of a study scored poor, the overall methodological quality of that study was rated poor.
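The text does not state how the SDC is computed. A commonly used definition in the measurement literature is SDC = 1.96 × √2 × SEM, where the √2 reflects that a change score is the difference of two measurements, each with its own error. The Python sketch below assumes that definition and also encodes the ‘worst score counts’ rule; the function names and the example SEM of 1 cm are hypothetical.

```python
import math

# COSMIN-style ordinal ratings, ordered from worst to best.
RATINGS = ("poor", "fair", "good", "excellent")


def smallest_detectable_change(sem_cm: float, z: float = 1.96) -> float:
    """Individual-level SDC (cm), assuming SDC = z * sqrt(2) * SEM.

    The sqrt(2) appears because a change score is the difference between
    two measurements, each of which carries its own measurement error.
    """
    return z * math.sqrt(2) * sem_cm


def overall_quality(item_ratings: list[str]) -> str:
    """'Worst score counts': the overall rating is the lowest item rating."""
    return min(item_ratings, key=RATINGS.index)


print(round(smallest_detectable_change(1.0), 2))       # 2.77
print(overall_quality(["excellent"] * 10 + ["poor"]))  # poor
```

Under this definition, even a modest SEM of 1 cm implies that an individual change must exceed roughly 2·8 cm before it can be distinguished from measurement error.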
The selected studies included between seven and 9279 participants, ranging from healthy adults to employees or patients. The outcome assessors were physicians in three studies and other health professionals in six studies. The assessors were trained in advance in seven studies and between repeated measurements in two studies. All studies followed a standard, although different, protocol. Participants were measured in a standing position, except in one study that measured them supine. Measurements were taken midway between the lower rib and the iliac crest in five studies, at the narrowest point between the rib cage and the iliac crest in one study, and at the uppermost limit of the ilium in another. One study examined the effect of measurement site (lower rib, iliac crest or midway) on the measurement error(Reference Bosy-Westphal, Booke and Blocker25). Finally, the overall methodological quality of all studies was fair or poor.
Overall, the intra-observer measurement error ranged from 0·7 to 9·2 cm and the inter-observer measurement error from 1·4 to 15 cm (Table 1). In most studies that assessed both, the intra-observer measurement error was the smaller of the two(Reference Ulijaszek and Kerr15). Moreover, smaller intra- and inter-observer measurement errors were found in larger studies. Measurement error did not differ notably by participant characteristics, outcome assessor, measurement protocol, training or methodological quality. However, greater measurement error was reported when measuring at the iliac crest or midway than at the lower rib, possibly because the lower rib is the most easily located landmark(Reference Bosy-Westphal, Booke and Blocker25).
Given the small number of studies and the many differences between them, it is difficult to draw firm conclusions on the magnitude of measurement error. Moreover, the variation in measurement error may be caused by a number of other factors not listed in Table 1, such as muscle mass, bone structure, lean tissue, looseness of the abdominal muscles, posture, phase of respiration and time since the last meal(Reference Misra, Wasir and Vikram26, Reference Agarwal, Misra and Aggarwal27). Additionally, measurement error may be larger among overweight and obese subjects than among normal-weight subjects because anatomical landmarks are harder to locate(Reference Nordhamn, Sodergren and Olsson14, Reference Ulijaszek and Kerr15, Reference Agarwal, Misra and Aggarwal27, Reference Wang, Liu and Chen28).
Clinically relevant change in waist circumference
Whether the measurement error is problematic in clinical practice can only be judged with a clear conception of the magnitude of change in, or difference between, waist circumferences that is considered important. In other words, we need to identify a minimal important change (MIC) within subjects or a minimal important difference (MID) between subjects in waist circumference(Reference Terwee, Roorda and Knol29). Although several studies suggest that a reduction in waist circumference may be associated with benefits across a wide range of health outcomes, there is limited evidence on what constitutes a minimal important change or difference in waist circumference(Reference Cardona-Morrell, Rychetnik and Morrell30–Reference Stevens, Obarzanek and Cook33). The National Institutes of Health stated in 1998 that a sustained reduction of 4 cm may be clinically relevant(34). More recently it has been suggested that, as for body weight and BMI, a reduction in waist circumference of >5 % may be considered clinically relevant for individual subjects in the short term, and a maintained reduction of >3 % from initial waist circumference may be considered clinically relevant for individual subjects in the long term(Reference Stevens, Truesdale and McClain35) (I Lemieux and R Ross, personal communications). No clear definitions of short-term change and long-term maintenance were provided(Reference Stevens, Truesdale and McClain35). Following that recommendation, for an overweight woman with a waist circumference of 80 cm this corresponds to a reduction of at least 4·0 cm and a maintained reduction of at least 2·4 cm. For an obese woman with a waist circumference of 110 cm, it corresponds to a reduction of at least 5·5 cm and a maintained reduction of at least 3·3 cm.
Thus, for subjects with a larger waist circumference, a larger reduction is needed for the change to be clinically relevant. Across a realistic range of measurable waist circumference, for example 60–135 cm, this implies that a short-term change of between 3·0 and 6·8 cm and a maintained change of between 1·8 and 4·1 cm may be clinically relevant.
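The conversion from proportional thresholds to centimetres is simple arithmetic, sketched below in Python. The constant and function names are hypothetical, and the 5 % and 3 % thresholds are the proposed (not established) values discussed above; the code computes the reductions for the two example women and for both ends of the 60–135 cm range.

```python
# Proposed (not established) proportional thresholds for individual subjects.
SHORT_TERM = 0.05  # >5 % reduction in the short term
LONG_TERM = 0.03   # >3 % maintained reduction in the long term


def relevant_reduction_cm(baseline_waist_cm: float, fraction: float) -> float:
    """Smallest reduction (cm) reaching a proportional threshold of baseline waist."""
    return fraction * baseline_waist_cm


# Ends of the realistic range (60 and 135 cm) and the two worked examples.
for waist in (60, 80, 110, 135):
    short = relevant_reduction_cm(waist, SHORT_TERM)
    long_ = relevant_reduction_cm(waist, LONG_TERM)
    print(waist, f"{short:.2f}", f"{long_:.2f}")
# 60 3.00 1.80
# 80 4.00 2.40
# 110 5.50 3.30
# 135 6.75 4.05
```

The upper bounds of 6·8 and 4·1 cm quoted for the realistic range are 6·75 and 4·05 cm rounded to one decimal.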
The relationship between measurement error and clinically relevant change
To distinguish clinically relevant change from measurement error, the measurement error (SDC) should be smaller than the clinically relevant change (MIC; see Fig. 1a). In that case, changes as large as the clinically relevant change will be statistically significant(Reference Terwee, Roorda and Knol29). Thus, the smaller the measurement error, the smaller the change that can be detected beyond it. If, however, the measurement error (SDC) is larger than the clinically relevant change (MIC), that change cannot be distinguished from measurement error (see Fig. 1b).
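The two situations in Fig. 1 reduce to two comparisons, sketched below in Python. The function names are hypothetical, and the SDC of 3·5 cm is an illustrative value chosen for the example, not a result from Table 1; the MIC values are the 3 % and 5 % thresholds for a waist circumference of 80 cm.

```python
def mic_is_detectable(mic_cm: float, sdc_cm: float) -> bool:
    """True when SDC < MIC, so a change as large as the MIC exceeds measurement error."""
    return sdc_cm < mic_cm


def change_is_detectable(observed_change_cm: float, sdc_cm: float) -> bool:
    """True when an observed individual change (either sign) exceeds the SDC."""
    return abs(observed_change_cm) > sdc_cm


# Hypothetical SDC of 3.5 cm (illustrative only, not a value from Table 1).
sdc = 3.5
for mic in (4.0, 2.4):
    print(mic, mic_is_detectable(mic, sdc))
# 4.0 True
# 2.4 False
print(change_is_detectable(-3.0, sdc))  # False: within measurement error
```

With this SDC, a 5 % short-term reduction (4·0 cm) lies beyond measurement error, whereas a 3 % maintained reduction (2·4 cm) does not.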
The range of measurement error presented in Table 1 (0·7–15 cm) indicates that we are probably able to detect a short-term clinically relevant change of 4·0 cm (5 % for a woman with a waist circumference of 80 cm) or 5·5 cm (5 % for a woman with a waist circumference of 110 cm), as the intra-observer measurement error is smaller than 4 cm in all but one study. However, the probability of detecting a long-term clinically relevant change of 2·4 cm (3 % of 80 cm) or 3·3 cm (3 % of 110 cm) is much lower, as the intra-observer measurement error is larger than 3 cm in more than half of the studies. Across the realistic range of waist circumference measurements (60–135 cm), many relevant short-term changes (of at least 3·0 to 6·8 cm) and maintained changes (of at least 1·8 to 4·1 cm) probably cannot be distinguished from measurement error. Interestingly, the measurement error of waist circumference seems equally problematic for normal-weight, overweight and obese subjects: although the measurement error is larger among overweight or obese subjects, a larger reduction in waist circumference is also needed for a change to be clinically relevant.
Recommendations for future research and clinical practice
To summarize, we have identified two important gaps in knowledge. First, the assessment of measurement error revealed a wide range (0·7–15 cm) of measurement errors, owing to the small number of studies, all of fair or poor quality, and the many differences between them. Second, no clear definition of clinically relevant change could be extracted from the literature. Taking a realistic range of measurable waist circumference (60–135 cm) into account, we argue that a proposed clinically relevant short-term change in waist circumference of 5 % (approximately 3·0–6·8 cm) may be detectable, but a proposed maintained reduction of 3 % (approximately 1·8–4·1 cm) may not be, because it cannot be distinguished from measurement error. Although the present paper does not give practising clinicians empirical insight into the application and interpretation of waist circumference measurements in the clinical setting, the results do highlight that more attention should be paid to reducing measurement error, so that clinicians and researchers measure real change in waist circumference rather than measurement error.
Three ways to potentially reduce measurement error in clinical practice are: (i) adopting a standard protocol; (ii) training; and (iii) repeating measurements(Reference Ulijaszek and Kerr15). Two papers have studied the influence of different measurement protocols on waist circumference measurements. The first found that the protocol used influenced the association of waist circumference with all-cause and CVD mortality, CVD and diabetes(Reference Ross, Berentzen and Bradshaw36); however, the protocols compared differed only in measurement site. The second compared eleven ways of measuring waist circumference (differing in anatomical site, posture, respiratory phase and time since the last meal) and found that the type of protocol significantly influenced the measurements(Reference Agarwal, Misra and Aggarwal27). However, as we have shown, other factors may also influence measurement error, and smaller measurement errors are required to detect (smaller) changes beyond measurement error. No standard protocol has yet been identified as best for clinicians. To close this gap in knowledge, we support the worldwide call for a uniform measurement protocol agreed upon by an expert team(Reference Mason and Katzmarzyk10, Reference Bosy-Westphal, Booke and Blocker25, Reference Park, Mitrou and Keogh37, Reference Dhaliwal and Welborn47).
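Because the protocol factors listed above (anatomical site, posture, respiratory phase and time since the last meal) can shift the measured value, one practical safeguard is to store the protocol with every measurement and compute a change only when the protocols match. The Python sketch below is hypothetical throughout: the class names, fields and example values are our assumptions, not a published protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Hypothetical record of the protocol factors named in the text."""
    site: str               # e.g. "lower rib", "iliac crest", "midway"
    posture: str            # e.g. "standing", "supine"
    respiratory_phase: str  # e.g. "end of normal expiration"
    fasting: bool           # measured after an overnight fast?


@dataclass(frozen=True)
class Measurement:
    waist_cm: float
    protocol: Protocol


def change_cm(before: Measurement, after: Measurement) -> float:
    """Change in waist circumference (cm); refused if the protocols differ."""
    if before.protocol != after.protocol:
        raise ValueError("protocols differ; change may reflect protocol, not the subject")
    return after.waist_cm - before.waist_cm


standard = Protocol("midway", "standing", "end of normal expiration", True)
print(change_cm(Measurement(98.0, standard), Measurement(94.5, standard)))  # -3.5
```

Frozen dataclasses compare field by field, so two measurements taken at different sites (or in different postures) are rejected rather than silently compared.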
A second way to reduce measurement error is training. Measurement error is likely to be larger when measurements are carried out by poorly (often recently) trained individuals(Reference Ulijaszek and Kerr15). Training, combined with quality control over time and minimizing the number of observers, may thus reduce measurement error(Reference Ulijaszek and Kerr15). Unfortunately, it is unclear how much training is needed to reduce measurement error, or whether the effect of training is sustained over time(Reference Sebo, Beer-Borst and Haller16, Reference Panoulas, Ahmad and Fazal17).
A third way to reduce measurement error is to repeat waist circumference measurements. If the same measurement is repeated, for example two or three times, and the average value is taken, the measurement error of this average is smaller by a factor of $\sqrt{k}$, where k is the number of repeated measurements(Reference Streiner and Norman48). For example, taking single-measurement errors equal to the realistic short-term (approximately 3·0–6·8 cm) and long-term (approximately 1·8–4·1 cm) clinically relevant changes, two repeated measurements would reduce the average measurement error to 2·1–4·8 cm and 1·3–2·9 cm, respectively, and three repeated measurements would reduce it to 1·7–3·9 cm and 1·0–2·4 cm, respectively. Thus, two measurements seem to be sufficient for detecting short-term changes, but three measurements seem to be necessary to distinguish long-term change from measurement error.
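The √k rule can be checked directly. The Python sketch below (the function and constant names are hypothetical) assumes that the errors of repeated measurements are independent and purely random; systematic error, such as consistently measuring at the wrong site, is not reduced by averaging. It reproduces the figures given above.

```python
import math


def error_of_mean(single_error_cm: float, k: int) -> float:
    """Measurement error of the mean of k repeats, assuming independent random errors."""
    return single_error_cm / math.sqrt(k)


SHORT_TERM = (3.0, 6.8)  # cm, realistic short-term clinically relevant change
LONG_TERM = (1.8, 4.1)   # cm, realistic maintained clinically relevant change

for k in (2, 3):
    for label, (low, high) in (("short", SHORT_TERM), ("long", LONG_TERM)):
        print(k, label, f"{error_of_mean(low, k):.1f}-{error_of_mean(high, k):.1f}")
# 2 short 2.1-4.8
# 2 long 1.3-2.9
# 3 short 1.7-3.9
# 3 long 1.0-2.4
```

The gain diminishes with each extra repeat: the second measurement reduces the error by about 29 %, whereas the third reduces it by only a further 13 % or so.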
Conclusions
Four gaps in knowledge have been identified. First, the magnitude of measurement error in waist circumference is unclear. Second, the definition of clinically relevant change in waist circumference is unclear. We therefore urge caution among clinicians and researchers when interpreting individual changes in waist circumference, as clinically relevant changes may not be distinguishable from measurement error. Third, consensus is needed on a uniform protocol for measuring waist circumference. Fourth, knowledge is lacking on the effects of training on the measurement error of waist circumference. Given these gaps, there is a clear need for more high-quality research and for action. Until then, we recommend consistently using one standard protocol, including quality control in training and minimizing the number of observers, outsourcing measurements to well-trained clinicians, and repeating measurements at least twice, but preferably three times. Ultimately, by reducing measurement error, clinicians may be able to detect smaller changes in waist circumference beyond measurement error. This is necessary for accurately monitoring changes in the waist circumference of individual subjects over time.
Acknowledgements
This study was funded by The Netherlands Organization for Health Research and Development (ZonMw, project 120510007). The authors declare that they have no conflict of interest. L.M.V. wrote the initial manuscript. C.B.T. provided intellectual input and wrote sections of the manuscript. K.I.P., C.T.J.H. and W.v.M. provided intellectual input and had a role in supervision. All authors have read and approved the final version of the manuscript.