Convergent validity of the International Physical Activity Questionnaire (IPAQ): meta-analysis

Youngdeok Kim; Ilhyeok Park; Minsoo Kang

doi:10.1017/S1368980012002996

Published online by Cambridge University Press: 02 July 2012

Youngdeok Kim*: Affiliation:
Department of Health and Human Performance, Middle Tennessee State University, 1500 Greenland Drive, PO Box 96, Murfreesboro, TN 37132, USA
Ilhyeok Park: Affiliation:
Department of Physical Education, Seoul National University, Seoul, South Korea
Minsoo Kang: Affiliation:
Department of Health and Human Performance, Middle Tennessee State University, 1500 Greenland Drive, PO Box 96, Murfreesboro, TN 37132, USA
*Corresponding author: Email [email protected]

Abstract

Objective

The purpose of the present study was to use a meta-analytic approach to examine the convergent validity of the International Physical Activity Questionnaire (IPAQ).

Design

Systematic review by meta-analysis.

Setting

The relevant studies were surveyed from five electronic databases. Primary outcomes of interest were the product-moment correlation coefficients between IPAQ and other instruments. Five separate meta-analyses were performed for each physical activity (PA) category of IPAQ: walking, moderate PA (MPA), total moderate PA (TMPA), vigorous PA (VPA) and total PA (TPA). The corrected mean effect size (ESρ) unaffected by statistical artefacts (i.e. sampling error and reliability) was calculated for each PA category. Selected moderator variables were length of IPAQ (i.e. short and long form), reference period (i.e. last 7 d and usual week), mode of administration (i.e. interviewer and self-reported), language (i.e. English and translated) and instruments (i.e. accelerometer, pedometer and subjective measure).

Subjects

A total of 152 ESρ across five PA categories were retrieved from twenty-one studies.

Results

The results showed small- to medium-sized ESρ (0·27–0·49). The highest value was observed in VPA while the lowest value was found in MPA. The ESρ were differentiated by some of the moderator variables across PA categories.

Conclusions

The study shows the overall convergent validity of IPAQ within each PA category. Some differences in degree of convergent validity across PA categories and moderator variables imply that different research conditions should be taken into account prior to deciding on use of the appropriate type of IPAQ.

Keywords

IPAQ Convergent validity Meta-analysis Physical activity

Type: Assessment and methodology
Information: Public Health Nutrition, Volume 16, Issue 3, March 2013, pp. 440–452

DOI: https://doi.org/10.1017/S1368980012002996
Copyright: Copyright © The Authors 2012

Physical activity (PA) has been regarded as one of the most important habitual behaviours, leading to a healthy life by preventing diseases and increasing health benefits⁽¹^–Reference Blair, Cheng and Holder⁵⁾. As the importance of PA has been emphasized, attempts have been made to develop appropriate measurement tools, both objective and subjective, to quantify the amount of PA in daily life. Of these, questionnaires remain the most widely used measurement tool in large-scale studies because of their efficiency in measuring PA levels in large populations⁽Reference Sallis and Saelens⁶⁾.

The International Physical Activity Questionnaire (IPAQ) is an instrument which was developed by the International Consensus Group in 1998–1999 to establish a standardized and culturally adaptable measurement tool across various populations in the world⁽Reference Craig, Marshall and Sjostrom⁷⁾. IPAQ is designed to assess the levels of habitual PA for individuals ranging from young to middle-aged adults (i.e. 15–69 years old). In addition, there are different forms of IPAQ depending on several variations which include length of questionnaire (i.e. short or long form), reference period (i.e. last 7 d or usual week) and mode of administration (i.e. self-report or interviewer-based).

Soon after IPAQ was developed it was translated into several different languages, and numerous studies have been conducted to examine the reliability and validity of these versions across countries. In these studies, one of the most commonly applied approaches to establishing validity evidence for IPAQ is convergent validity, which indicates the extent to which different measurement tools measure the same construct. However, the extent to which the estimates from IPAQ linearly relate to those from counterpart instruments has varied depending on the characteristics of the IPAQ examined (i.e. translation, length, reference period and mode of administration) and the instrument used for the comparison⁽Reference Hallal, Gomez and Parra⁸⁾, and the magnitude of this variation has not yet been quantified.

To the best of our knowledge, no studies to date have examined the sources and magnitudes of factors that may explain such discrepancies in convergent validity of IPAQ across studies. Given the widespread use of IPAQ to measure levels of PA at the population level and the limited information on the convergent validity of its various formats, synthesizing the empirical evidence on convergent validity of IPAQ would provide more comprehensive information. The purpose of the present study was therefore to apply a meta-analytic method to quantify the overall convergent validity of IPAQ across different studies and to investigate the sources and magnitudes of moderator factors that may affect the overall convergent validity of IPAQ.

Methods

Search strategy and selection criteria

The relevant studies for examining convergent validity of IPAQ were obtained from five electronic databases (i.e. SPORTDiscus, Medline, Google Scholar, PubMed and EBSCOhost). The main keywords used to identify the appropriate studies were ‘International Physical Activity Questionnaire’, ‘IPAQ’, ‘validity’, ‘convergent validity’, ‘comparison’ and ‘validation’. These keywords were entered in several combinations.

The primary outcome of interest was the correlation coefficient between IPAQ and another instrument. The following criteria were used to select potential studies for inclusion: (i) a study that used IPAQ as either a main instrument to be validated or an instrument to be compared with; (ii) a study in which the participants were not physically or emotionally challenged or disabled; (iii) a study in which the mean age of participants fell between 15 and 69 years old; (iv) in circumstances where IPAQ was translated into other languages, no changes in the structure occurred; (v) a study had a precise definition of PA intensity derived from the instrument; (vi) a study that reported statistical results in sufficient detail to estimate effect size (ESr); and (vii) a peer-reviewed article published in English. Using these criteria, potentially relevant studies were screened by two independent reviewers and full texts of all studies meeting the inclusion criteria were further assessed for methodological quality and for data extraction. Consensus was achieved through discussion when disagreements occurred between the two reviewers.

Methodological quality

Two reviewers independently assessed the methodological quality of studies using the modified version of the Downs and Black checklist⁽Reference Downs and Black⁹⁾, which has been used in recent systematic reviews⁽Reference Prince, Adamo and Hamel¹⁰^, Reference Warburton, Charlesworth and Ivey¹¹⁾. The modified checklist consisted of fifteen items within three domains (i.e. reporting, external validity and internal validity), and possible scores ranged between 0 and 15 (i.e. higher scores indicated better methodological quality). Any study that scored relatively low on methodological quality (i.e. Z-score <−1·96) was not considered for inclusion in the meta-analyses.
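The exclusion rule can be illustrated with a brief sketch. The quality scores below are hypothetical, and standardizing with the sample standard deviation is an assumption, since the text does not state how the Z-scores were computed.

```python
from statistics import mean, stdev

def low_quality_flags(scores, cutoff=-1.96):
    """Return True for each study whose quality Z-score falls below the cutoff."""
    m = mean(scores)
    s = stdev(scores)  # assumption: sample SD; the paper does not specify
    return [(x - m) / s < cutoff for x in scores]

# Hypothetical checklist scores (possible range 0-15), one per candidate study
scores = [13, 14, 12, 15, 13, 14, 13, 12, 14, 13,
          14, 13, 15, 12, 13, 14, 13, 14, 13, 4]
flags = low_quality_flags(scores)
excluded = [i for i, flagged in enumerate(flags) if flagged]
print(excluded)  # indices of studies that would be dropped
```

In this illustrative data set only the last study, with a score far below the others, falls under the −1·96 threshold.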

Data extraction and coding

The long form of IPAQ examines the habitual PA in daily life using twenty-seven items across four PA domains (i.e. leisure time, domestic and gardening, occupational and transport-related activities), while the short form of IPAQ consists of seven summarized items that measure the comprehensive level of PA regardless of the domains to be measured. In both forms, the participants are asked to report the durations and frequencies of three specific PA categories, i.e. walking, moderate PA (MPA) and vigorous PA (VPA). Total amount of time spent engaging in or energy expenditure for each PA category can be estimated as main outcomes using metabolic equivalent of task (MET) values of 3·3, 4 and 8 for walking, MPA and VPA, respectively. Because the MET value of walking is within a range for moderate-intensity PA (i.e. 3–6 MET)⁽Reference Ainsworth, Haskell and Whitt¹²⁾, it has also been recommended to combine the estimates of walking and MPA to obtain the total MPA (denoted as TMPA)⁽¹³⁾. Total PA (TPA) can be simply estimated by summation of all estimates from each category (i.e. walking + MPA + VPA). Therefore, there are a total of five PA categories that can be derived from IPAQ (i.e. walking, MPA, TMPA, VPA and TPA).
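As a minimal sketch of how the five PA categories follow from the three reported ones, the snippet below uses the MET values given above. Expressing the result in MET-min/week (MET × days per week × minutes per day) is an assumption about the unit, and the respondent data and function name are hypothetical.

```python
# MET values stated in the text for each reported PA category
MET = {"walking": 3.3, "mpa": 4.0, "vpa": 8.0}

def ipaq_categories(days, minutes):
    """Return MET-min/week for walking, MPA, VPA, TMPA and TPA.

    days, minutes: dicts keyed by 'walking', 'mpa', 'vpa' holding the
    reported days per week and minutes per day (hypothetical input format).
    """
    out = {k: MET[k] * days[k] * minutes[k] for k in MET}
    out["tmpa"] = out["walking"] + out["mpa"]             # total moderate PA
    out["tpa"] = out["walking"] + out["mpa"] + out["vpa"]  # total PA
    return out

# Hypothetical respondent
days = {"walking": 5, "mpa": 3, "vpa": 2}
minutes = {"walking": 30, "mpa": 20, "vpa": 25}
result = ipaq_categories(days, minutes)
print(result)
```

For this respondent, walking contributes 495, MPA 240 and VPA 400 MET-min/week, so TMPA is 735 and TPA is 1135.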

Throughout the systematic review of selected studies, ESr values were extracted separately for each of the five PA categories to avoid dependency issues in the meta-analysis. In addition, each ESr was extracted only if the compared PA categories from both IPAQ and the other instrument were consistent or reasonably consistent (see outcome domains in Table 1). For example, estimates in walking, MPA and TMPA from IPAQ should be compared with estimates for MPA obtained from the other instrument. Likewise, ESr values were extracted for VPA and TPA only if they were compared with the same PA categories from the other instrument. However, because a pedometer does not provide the information of step counts within specific PA categories, ESr that were estimated between total step counts of pedometers and each PA category of IPAQ were also extracted. If a single study reported more than one ESr within the same PA category, but from different subpopulations, we assumed each ESr from different subpopulations to be independent from each other and included them in a single meta-analysis⁽Reference Lipsey and Wilson¹⁴⁾. The units or scales of estimated value within each study were not considered because the primary outcome of interest in the present study was the correlation coefficient, which is a scale invariant coefficient in itself⁽Reference Filliben¹⁵⁾.

Table 1 Features of the studies included in the meta-analyses of convergent validity of IPAQ and outcome domains examined

IPAQ, International Physical Activity Questionnaire; MET, metabolic equivalent of task; PA, physical activity.

†Regions where the participants were recruited (sample size); ‘-’ indicates no moderator variables were extracted.

‡Types of instrument and cut-off standards compared with IPAQ: GPAQ, Global Physical Activity Questionnaire; MLTPAQ, Minnesota Leisure Time Physical Activity Questionnaire; Baecke-Q, Baecke questionnaire; OIMQ, Office In Motion Questionnaire.

§Outcome domains for meta-analyses (PA categories).

Moderator variables which may affect overall convergent validity of IPAQ were obtained from different characteristics of IPAQ used in each study: (i) length of IPAQ (i.e. short and long forms); (ii) reference period (i.e. last 7 d and usual week); (iii) mode of administration (i.e. interviewer and self-reported); and (iv) language (i.e. English and translated). In addition, the instruments which were used for comparison with IPAQ within each study were also extracted as a moderator variable: (v) instruments (i.e. accelerometer, pedometer and subjective measure).

Study characteristics

A total of sixty-seven potentially relevant studies were considered for further review. By systematic review based on inclusion criteria, a total of twenty-eight studies were excluded due to their inability to meet criteria and duplication. Full texts of the remaining thirty-nine studies were reviewed for a detailed assessment. Of these, twenty-one studies met all inclusion criteria and secured relatively higher methodological quality (mean 13·2; sd 1·3). A total of 152 ESr values across five PA categories in IPAQ were retrieved (i.e. seventeen ESr from ten studies for walking⁽Reference De Cocker, Cardon and De Bourdeaudhuij¹⁶^–Reference van der Ploeg, Tudor-Locke and Marshall²⁵⁾, seventeen ESr from twelve studies for MPA⁽Reference De Cocker, Cardon and De Bourdeaudhuij¹⁶^–Reference Mader, Martin and Schutz²⁴^, Reference Boon, Hamlin and Steel²⁶^, Reference Roman-Vinas, Serra-Majem and Hagstromer²⁷⁾, twenty-three ESr from ten studies for TMPA⁽Reference Hagstromer, Ainsworth and Oja²¹^, Reference Mader, Martin and Schutz²⁴^, Reference van der Ploeg, Tudor-Locke and Marshall²⁵^, Reference Roman-Vinas, Serra-Majem and Hagstromer²⁷^–Reference Vandelanotte, De Bourdeaudhuij and Sallis³³⁾, thirty-five ESr from seventeen studies for VPA⁽Reference De Cocker, Cardon and De Bourdeaudhuij¹⁶^–Reference Mader, Martin and Schutz²⁴^, Reference Boon, Hamlin and Steel²⁶^–Reference Vandelanotte, De Bourdeaudhuij and Sallis³³⁾ and sixty ESr from sixteen studies for TPA⁽Reference Craig, Marshall and Sjostrom⁷^, Reference De Cocker, De Bourdeaudhuij and Cardon¹⁷^–Reference Hagstromer, Ainsworth and Oja²¹^, Reference Mader, Martin and Schutz²⁴^, Reference van der Ploeg, Tudor-Locke and Marshall²⁵^, Reference Roman-Vinas, Serra-Majem and Hagstromer²⁷^–Reference Lachat, Verstraeten and Khanh le³⁰^, Reference Timperio, Salmon and Rosenberg³²^–Reference Thuy, Blizzard and Schmidt³⁵⁾). See Table 2 for stem–leaf plots of ESr extracted across PA categories. 
Total sample sizes for each PA category ranged from a low of 4453 in TMPA to a high of 8867 in TPA.
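The counts reported above can be cross-checked with simple arithmetic. The sketch below only restates figures from the text; the number of studies dropped at the full-text stage is derived here rather than reported.

```python
# Number of ESr values per PA category, as reported in the text
es_counts = {"walking": 17, "MPA": 17, "TMPA": 23, "VPA": 35, "TPA": 60}
total_es = sum(es_counts.values())

# Study selection flow
identified = 67
excluded_at_screening = 28
full_text_reviewed = identified - excluded_at_screening
included = 21
excluded_at_full_text = full_text_reviewed - included  # derived, not reported

print(total_es, full_text_reviewed, excluded_at_full_text)
```

The category counts sum to the 152 ESr values stated in the text, and the screening step leaves the thirty-nine full texts described above.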

Table 2 Stem-and-leaf plots of correlation coefficients (ESr) of IPAQ

IPAQ, International Physical Activity Questionnaire; n, number of ESr; MPA, moderate physical activity; TMPA, total moderate physical activity; VPA, vigorous physical activity; TPA, total physical activity.

Computation of effect sizes

The measure of ESr in the present study was the product-moment correlation coefficient (e.g. Pearson r and Spearman ρ), which represents the strength of association between the estimates of IPAQ and other counterpart instruments as an indication of convergent validity of IPAQ. The psychometric meta-analytic method proposed by Hunter and Schmidt⁽Reference Hunter, Schmidt and Jackson³⁶^, Reference Hunter and Schmidt³⁷⁾ was applied to obtain the population-level estimates unaffected by statistical artefacts, such as sampling error and measurement error. The ‘bare-bone’ mean ESr (i.e. $\mathrm{ES}\bar{r}$), corrected for sampling error only, was calculated first. In order to correct for the measurement errors of IPAQ in addition to sampling error, the reliability coefficients of IPAQ with respect to each PA category (e.g. intra-class correlation coefficients) were further extracted. There were eleven reliability coefficients available for walking (mean 0·74; sd 0·15), nine for MPA (mean 0·63; sd 0·22), eight for TMPA (mean 0·62; sd 0·21), twelve for VPA (mean 0·67; sd 0·23) and thirty-two for TPA (mean 0·77; sd 0·13). Because the reliability coefficients were not available for all of the included studies, the artefact distributions were calculated for each PA category to obtain the corrected mean ESr at the population level (i.e. ESρ) that was unaffected by sampling error and measurement error. 95 % confidence intervals (CI) were produced on the basis of the standard error of ESρ and 95 % credibility intervals (CV) were also yielded using the residual standard deviation of ESρ. According to Cohen's guidelines, ESρ was interpreted as small (<0·30), medium (0·31–0·49) and large (≥0·50)⁽Reference Cohen³⁸⁾.
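The two-step correction can be sketched as follows. This is a simplified illustration of a Hunter and Schmidt style analysis, not the authors' exact procedure: it corrects for unreliability in one measure only, using an artefact distribution of the square roots of the reliability coefficients, and it omits the confidence interval. All input values and function names are hypothetical.

```python
from math import sqrt

def bare_bones(rs, ns):
    """Sample-size-weighted mean r, observed variance and sampling-error variance."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    n_bar = total_n / len(ns)                      # average sample size
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)  # expected sampling-error variance
    return r_bar, var_obs, var_err

def correct_for_unreliability(r_bar, var_obs, var_err, reliabilities):
    """Artefact-distribution correction for measurement error in one instrument."""
    a = [sqrt(rxx) for rxx in reliabilities]       # attenuation factors
    a_bar = sum(a) / len(a)
    var_a = sum((x - a_bar) ** 2 for x in a) / len(a)
    rho = r_bar / a_bar                            # corrected mean correlation
    var_art = rho ** 2 * var_a                     # variance due to varying reliability
    var_rho = max(var_obs - var_err - var_art, 0.0) / a_bar ** 2
    sd_rho = sqrt(var_rho)                         # residual SD of rho
    pct_artefact = 100.0 * min((var_err + var_art) / var_obs, 1.0)
    credibility = (rho - 1.96 * sd_rho, rho + 1.96 * sd_rho)
    return rho, sd_rho, pct_artefact, credibility

# Hypothetical correlations, sample sizes and reliability coefficients
rs, ns = [0.10, 0.30, 0.50], [100, 200, 100]
r_bar, var_obs, var_err = bare_bones(rs, ns)
rho, sd_rho, pct_artefact, cred = correct_for_unreliability(
    r_bar, var_obs, var_err, reliabilities=[0.64, 0.81])
print(round(r_bar, 3), round(rho, 3), round(pct_artefact, 1), cred)
```

Because the attenuation factors are below 1, the corrected estimate is larger than the bare-bones mean; the percentage of variance attributed to artefacts is the quantity later compared against the 75 % rule.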

Moderator analysis

To determine the presence of moderator effects on ESρ, three different criteria (i.e. the percentage of variance components attributed to statistical artefacts, the Q homogeneity statistic and 95 % CV) were simultaneously examined as recommended by Hunter and Schmidt⁽Reference Hunter and Schmidt³⁷⁾. To be specific, we concluded that moderators exist if: (i) the percentage of variance accounted for by statistical artefacts is less than 75 % of the observed variance in ESr; (ii) the Q homogeneity statistic is significant; and (iii) the 95 % CV is either relatively large or includes zero. However, due to the imprecise meaning of ‘large’ CV, we focused mainly on the first two criteria to examine the moderator effects unless disagreement occurred.
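The first two criteria can be expressed as a small decision helper. The sketch assumes that Q is referred to a chi-square distribution with the stated degrees of freedom at α = 0·05; the function name is hypothetical, and the three example inputs are the values reported in the Results for the long-form walking, long-form MPA and self-reported MPA subgroups.

```python
# Chi-square critical values at alpha = 0.05 for df = 1..10
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}

def moderator_criteria(pct_artefact, q, df):
    """Return (rule_75, q_significant); True means the criterion suggests moderators."""
    rule_75 = pct_artefact < 75.0       # artefacts explain less than 75 % of variance
    q_significant = q > CHI2_CRIT_05[df]
    return rule_75, q_significant

walking_long = moderator_criteria(83.9, 8.34, 6)
mpa_long = moderator_criteria(56.6, 14.06, 7)
mpa_self = moderator_criteria(80.9, 13.60, 10)
print(walking_long, mpa_long, mpa_self)
```

For long-form walking and self-reported MPA neither criterion points to remaining moderators, whereas for long-form MPA the two criteria disagree, which is the situation in which the credibility interval would also be consulted.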

Results

Overall effect sizes

The ESρ corrected for artefacts of sampling error and measurement error across each PA category is presented in Table 3. There were positive relationships between IPAQ and other instruments across all PA categories (ESρ range = 0·27–0·49), and none of the 95 % CI included zero. According to Cohen's guideline, medium-sized ESρ were retrieved for walking (ESρ = 0·32), TMPA (ESρ = 0·45), VPA (ESρ = 0·49) and TPA (ESρ = 0·39), while MPA had a small-sized effect size with an ESρ of 0·27. The proportions of variance accounted for by artefacts among the total variance of observed ESr for each PA category were all less than 75 % and the Q homogeneity tests were statistically significant for all PA categories (all P < 0·05). Therefore, follow-up moderator analyses were conducted using predefined moderators as hypothesized in the present study.
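The size labels above can be reproduced with a short sketch. The bands stated in the Methods leave values between 0·30 and 0·31 unassigned, so treating everything from 0·30 up to (but not including) 0·50 as medium is an assumption made here for illustration.

```python
def cohen_label(es):
    """Label a corrected correlation using the bands described in the Methods."""
    if es < 0.30:
        return "small"
    if es >= 0.50:
        return "large"
    return "medium"  # assumption: 0.30 itself is treated as medium

# Overall ESρ values reported in the text
es_rho = {"walking": 0.32, "MPA": 0.27, "TMPA": 0.45, "VPA": 0.49, "TPA": 0.39}
labels = {category: cohen_label(value) for category, value in es_rho.items()}
print(labels)
```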

Table 3 Results of meta-analyses for overall weighted mean correlation coefficients (ESr) across PA categories of IPAQ

PA, physical activity, IPAQ, International Physical Activity Questionnaire; K, number of studies; n, number of ESr; N, total sample size; CV, credibility interval; MPA, moderate physical activity; TMPA, total moderate physical activity; VPA, vigorous physical activity; TPA, total physical activity.

*P < 0·05.

†Averaged ESr corrected for sampling error only.

‡Averaged ESr corrected for sampling error and measurement errors of IPAQ.

§Percentage of variance accounted for by statistical artefacts including sampling error and measurement error of IPAQ.

Moderator analyses

Moderator analyses were conducted to examine the effects of language (i.e. English and translated), length of IPAQ (i.e. short and long form), reference period (i.e. last 7 d and usual week), mode of administration (i.e. interviewer and self-reported) and instruments (i.e. accelerometer, pedometer and subjective measure) on overall ESρ for each PA category (see Table 4). Collectively, substantial differences in ESρ were detected by different levels of included moderators across all PA categories.

Table 4 Results of moderator analyses across all PA categories of IPAQ

*P < 0·05.

†Averaged ESr corrected for sampling error only.

‡Averaged ESr corrected for sampling error and measurement errors of IPAQ.

§Percentage of variance accounted for by statistical artefacts including sampling error and measurement error of IPAQ.

In terms of the language of IPAQ, there was a consistent trend in the rank of ESρ across all PA categories: studies which used translated versions had significantly greater ESρ than those in which the English version was applied. Using the 75 % rule and Q homogeneity statistic, the observed ESr values obtained from the English-version IPAQ studies for walking, MPA, TMPA and VPA were shown to be homogeneous, while a large amount of unexplained variance in ESρ remained, mostly in studies in which the translated versions were used.

The variations in ESρ for all PA categories were also not significantly explained by different length of IPAQ with the exception of walking and MPA. The percentage of variance accounted for by artefacts increased dramatically for the studies where the long form was used (83·9 % and 56·6 % for walking and MPA, respectively). Non-significant Q statistics were detected for the long forms of walking and MPA (Q(df = 6) = 8·34; P > 0·05 and Q(df = 7) = 14·06; P > 0·05, respectively). Although the length of IPAQ accounted for a relatively small percentage of variance in ESρ for most PA categories, the ESρ values by different length of IPAQ differed significantly in walking and TMPA, where the 95 % CI for ESρ did not overlap between the long and short form. Moreover, a systematic trend in the rank of ESρ was detected: the studies which used short-form versions of IPAQ had greater ESρ for all PA categories.

Moderator analyses by reference period did not substantially increase the percentage of variance accounted for by artefacts, nor did they yield non-significant Q homogeneity statistics, for any PA category. Moreover, there were no observable trends in the rank of ESρ values across PA categories.

With respect to the mode of administration, the results showed that interviewer-administered studies had greater ESρ values for all PA categories with the exception of MPA, in which 80·9 % of the variation in ESρ for self-reported studies was attributed to artefacts with a non-significant Q statistic (Q(df = 10) = 13·60; P > 0·05). The ESρ values differed significantly by different mode of administration in TMPA, VPA and TPA. Interviewer-administered studies had a greater ESρ than those which utilized the self-reported measure of IPAQ.

The type of instrument moderately increased the percentage of variance accounted for by artefacts in walking, VPA and TPA, in which non-significant Q homogeneity statistics for respective types of instruments were also detected. The studies which utilized subjective measures had greater ESρ values than studies utilizing objective measures in all PA categories with the exception of walking, in which opposite results were yielded.

Discussion

To our knowledge, the present study is the first comprehensive attempt to synthesize the scientific evidence on convergent validity of IPAQ using meta-analysis. The first purpose of the study was to examine the overall convergent validity of IPAQ. The results showed that the overall ESρ for each PA category were all positive, which supports the convergent validity evidence of IPAQ, but they varied from small-to-medium effect size according to Cohen's definitions⁽Reference Cohen³⁸⁾. Walking, TMPA, VPA and TPA of IPAQ secured medium-sized ESρ, while MPA had a small-sized ESρ. Such variations in ESρ by different categories of IPAQ may be due to the inherent property of IPAQ as a subjective measure. Measuring PA in IPAQ relies on the recall of diverse activities for a 7 d period, which requires participants to utilize their cognitive ability for the recall process. The greatest ESρ observed in VPA can be explained by the evidence which shows that vigorous-intensity PA tends to be more structured, which may positively affect participant recall. On the other hand, walking and moderate-intensity activity are not typically structured but rather accumulated gradually during daily life⁽Reference Hagstromer, Oja and Sjostrom²⁹⁾. This may result in participants not recalling the exact amount of walking and activities involved in MPA⁽Reference Vandelanotte, De Bourdeaudhuij and Sallis³³^, Reference Montoye, Kemper and Saris³⁹^, Reference Washburn, Heath and Jackson⁴⁰⁾. Another possible explanation for varying results in ESρ across PA categories is that variations in individual perceptions with respect to the intensity of each PA category may occur due to insufficient information for each specific category⁽Reference Shephard⁴¹⁾. For example, IPAQ defines VPA as an activity causing harder than usual breathing and MPA as an activity causing somewhat harder breathing⁽Reference Bauman, Ainsworth and Bull⁴²⁾. 
In order to clarify this gap between MPA and VPA, IPAQ offers some examples of activity according to MET values for each type of intensity; however, different perceived exertions may exist with respect to the specific examples given by IPAQ considering that IPAQ covers a broad range of ages from 15 to 69 years. Hallal et al.⁽Reference Hallal, Gomez and Parra⁸⁾ noted that specific examples linked to physiological signs or culturally adapted examples should be provided to aid participants in distinguishing MPA from VPA; we suggest that stratifying age-relevant examples would be beneficial to obtain more valid measures for MPA and VPA.

In IPAQ, participants are instructed to report time spent in MPA that lasted for at least 10 min except while walking, which is asked in separate questions. Walking and MPA, which are assigned MET values of 3·3 and 4 in IPAQ, fall within the same boundary of moderate-intensity PA (i.e. 3–6 MET)⁽Reference Ainsworth, Haskell and Whitt¹²⁾. Our finding suggests that TMPA, which is the sum of walking and MPA, has a greater ESρ than walking and MPA, indicating that TMPA has stronger convergent validity than the separate measures of walking and MPA. This may imply that IPAQ has achieved its initial intention of discriminating walking from MPA, in that summation of the estimates from walking and MPA would yield more valid estimates for TMPA. Some researchers argue that separation of walking and MPA in the same questionnaire may confuse participants about time spent in walking under MPA⁽Reference Dinger, Behrens and Han¹⁹⁾; however, the results of the present study indicated, collectively, that participants may well perceive time spent walking as separate from MPA.

The second purpose of the present study was to investigate the effects of moderator variables on overall validity of IPAQ across all PA categories. IPAQ was developed with the aim of international monitoring and national comparison⁽Reference Craig, Marshall and Sjostrom⁷⁾; however, variation incurred by language translation still remained questionable due to the different cultural atmospheres⁽Reference Bauman, Ainsworth and Bull⁴²⁾. In our study, we attempted to synthesize a total of 152 ESr from different cultures. There were 120 ESr retrieved from translated versions of IPAQ, which yielded greater ESρ values compared with English versions of IPAQ across all PA categories. These findings supported that IPAQ secured comparable convergent validity across different cultures without any structural changes in IPAQ. Although we agree that some examples or words should be adapted in accordance with the cultural atmosphere where IPAQ would be used, following well-established translation protocols suggested by the IPAQ consensus group would be promising for positive convergent validity of IPAQ in different cultures.

IPAQ has two different versions (i.e. long or short form). The long form measures the habitual PA in three intensity-specific categories across four domains, while the short form examines only generic PA within three intensity-specific categories without any separation of specific domains. The short form has been recommended for population-based study due to its feasibility and preferences over the long form⁽Reference Craig, Marshall and Sjostrom⁷⁾; however, the estimates from the short form tend to overestimate actual PA due to the lack of sufficient information for specific domains⁽Reference Hallal, Victora and Wells⁴³⁾. Bauman et al.⁽Reference Bauman, Ainsworth and Bull⁴²⁾ noted that the large variances in PA measures estimated from the short form could be caused by using the short form as a means of estimating continuous levels of PA, while the primary purpose of the short form is categorical reporting. In the current meta-analyses, continuous measures of PA obtained from the short form had ESρ comparable to or even larger than those of the long form. From this, we can conclude that using a short form to estimate the amount of PA as a continuous measure seems to be acceptable if the primary interest of the study is not domain-specific measures. However, 95 % CV for ESρ obtained from the studies where the short form was used were shown to be relatively large v. the estimates from the long form. One should bear in mind that PA estimates from the short form can vary dramatically with unexplained moderators or factors, while the long form may provide more stable measures.

Measuring generic PA using questionnaires relies heavily on recall processes that may require the appropriate retrieval cues for stimulating the search of the participant's memory⁽Reference Sallis and Saelens⁶^, Reference Bauman, Ainsworth and Bull⁴²⁾. There are two cues with respect to reference period (i.e. last 7 d or usual week) that one can utilize to aid the participant's recall process. In the original development study of IPAQ⁽Reference Craig, Marshall and Sjostrom⁷⁾, the International Consensus Group found the ‘last 7 d’ and ‘usual week’ reference periods to be comparable in terms of reliability and validity and suggested using the last 7 d reference period based on the preferences in participating countries of their study. In the current analyses, no particular patterns for the rank of ESρ by different reference periods were observed across all PA categories. Stronger convergent validity might have been expected with the last 7 d reference period, since most studies have implemented the IPAQ right after they finished collecting objective data for a 7 d period. The comparable results between the last 7 d and usual week may reflect the fact that people tend to interpret the ‘usual week’ reference period as the last 7 d and consequently respond in a similar way.

It has been widely recognized that interviewer administration would minimize the possible errors in implementing subjective measurement tools that are due to participant's misinterpretation and/or misunderstanding of the questions being asked⁽Reference Heesch, van Uffelen and Hill⁴⁴^, Reference Vuillemin, Oppert and Guillemin⁴⁵⁾. The findings of the current meta-analyses were mostly in agreement with previous understandings, in that greater ESρ values were found in the studies in which interviewer administration was applied across all PA categories with the exception of MPA. Interviewer administration may have several advantages in that it prevents respondents from skipping questions and also could provide more opportunities to obtain more detailed information on each question v. the self-administered questionnaire⁽Reference Craig, Marshall and Sjostrom⁷⁾. Moreover, it allows the researchers to obtain more reliable estimates of PA levels among less educated populations who cannot fully understand the context being asked⁽Reference Hallal, Gomez and Parra⁸⁾. Despite the benefits of interviewer administration, the self-reported approach may be preferred in a large epidemiological study due to time or budget limitations; however, more accurate measures of PA are likely to be obtained when an interviewer administers the IPAQ.

Objective measurement tools for quantifying levels of PA are widely recognized for providing more precise and accurate estimates of PA levels than subjective measurement tools⁽Reference Bassett⁴⁶⁾, and they are increasingly used as criterion measures for validating PA questionnaires. In the current meta-analyses, three types of instrument (i.e. accelerometer, pedometer and subjective measure) were used for comparison with IPAQ. The studies that used subjective measurement tools as the counterpart instrument to IPAQ yielded the greatest ESρ values for most of the PA categories. These findings are broadly in agreement with the notion that subjective measurement tools tend to share similar psychometric properties based on common subjective recall processes⁽Reference Prince, Adamo and Hamel¹⁰⁾. In other words, similar systematic errors such as cognitive biases or social desirability may affect subjective measurement tools, so that stronger linear relationships between the estimates from IPAQ and those from other subjective measurement tools would be observed. By contrast, systematic errors in the estimates from objective measurement tools are more likely to arise from different measurement conditions, such as seasons and months⁽Reference Kang, Bassett and Tudor-Locke⁴⁷⁾ or number of monitoring days⁽Reference Kang, Bassett and Tudor-Locke⁴⁸⁾, which may result in lower convergent validity of IPAQ when it is compared with objective measurement tools. In addition, such inconsistency between the estimates from IPAQ and objective measurement tools may also be attributed to the fact that IPAQ is intended to measure activities lasting at least 10 min, whereas the accelerometer and pedometer tend to measure every form of physical movement. The 10 min criterion in IPAQ may result in unreliably large variations within individual PA levels, which may weaken the linear relationship between the estimates from IPAQ and those from objective measures⁽Reference Hallal, Gomez and Parra⁸, Reference Ekelund, Sepp and Brage³⁴⁾.
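The effect of the 10 min criterion can be illustrated with a minimal sketch. The bout durations below are hypothetical, the function names are introduced only for this illustration, and the sketch assumes perfect recall and a monitor that registers every minute of movement; it is not a description of how any particular accelerometer or pedometer processes data.

```python
IPAQ_MIN_BOUT_MIN = 10  # IPAQ asks only about activities lasting at least 10 min


def ipaq_countable_minutes(bouts_min):
    """Sum the minutes spent in bouts that meet the 10 min criterion."""
    return sum(b for b in bouts_min if b >= IPAQ_MIN_BOUT_MIN)


def total_movement_minutes(bouts_min):
    """Sum all minutes of movement, regardless of bout length."""
    return sum(bouts_min)


# Hypothetical daily bout durations (min) for two people with equal total movement
person_a = [30, 15, 15]   # a few long bouts
person_b = [5] * 12       # many short bouts

for name, bouts in (("A", person_a), ("B", person_b)):
    print(name, total_movement_minutes(bouts), ipaq_countable_minutes(bouts))
```

Under these assumptions both people accumulate 60 min of movement, yet only person A has any activity that IPAQ is designed to capture. Differences of this kind between individuals would be expected to lower the correlation between IPAQ and monitor-based estimates.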

Several limitations should be considered when interpreting the results of the present study. First, variations in the cut-off standards used to determine PA categories from accelerometer data across studies were not considered, which may have contributed to variation in ESρ, especially for MPA and VPA, which are defined by those standards. However, considering that there is no single ‘gold standard’ criterion measure for PA comparison, we believe that the results of our study may be generalized as the overall convergent validity of IPAQ. Another area of concern is that the effect size aggregated in the current meta-analyses was the correlation coefficient, which cannot detect agreement between the estimates from IPAQ and those from other criterion measures. Correlation coefficients provide sufficient information on the convergent validity of IPAQ in the form of a linear relationship; however, examining agreement would give insight into the extent to which IPAQ over- or underestimates the actual level of PA. Thus, we suggest that future studies conduct a meta-analytic review of the agreement between IPAQ and other criterion instruments. In addition, the 95 % CV around ESρ values in the moderator analyses showed that a large amount of unexplained variance remained after controlling for artefacts and predefined moderators. Hierarchical moderator analyses may be a more appropriate approach to resolve this problem⁽Reference Hunter and Schmidt³⁷⁾; however, more effect sizes would be needed for each level of the moderators. Lastly, some of the moderator analyses were based on a small number of ESr, which may affect the generalizability of the current findings. A small meta-analysis (i.e. <200 ESr) may only be capable of summarizing the evidence or generating hypotheses for future research⁽Reference Flather, Farkouh and Pogue⁴⁹⁾. The process of confirming validity evidence for a measurement tool is regarded as a ‘never ending process’⁽Reference Shepard⁵⁰⁾; therefore, more evidence, not only for convergent validity but also for other aspects of the validity of IPAQ, should continue to be accumulated across different populations and measurement conditions.
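The distinction between correlation and agreement noted among the limitations can be made concrete with a small sketch. The values are hypothetical and constructed so that the questionnaire over-reports by a fixed linear rule; the function names and the use of mean difference as the agreement index are assumptions chosen for illustration, not the analysis performed in the present study.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_bias(reported, criterion):
    """Mean of (reported - criterion); positive values indicate over-reporting."""
    return mean(r - c for r, c in zip(reported, criterion))


# Hypothetical criterion values (min/d) and a questionnaire that over-reports
criterion_min = [20, 35, 50, 65, 80]
reported_min = [1.5 * m + 20 for m in criterion_min]

r = pearson_r(reported_min, criterion_min)
bias = mean_bias(reported_min, criterion_min)
print(f"Pearson r = {r:.3f}, mean bias = {bias:.1f} min/d")
```

In this constructed case the correlation is perfect even though the questionnaire over-reports by 45 min/d on average, which is why a correlation-based effect size alone cannot show how far IPAQ over- or underestimates PA.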

Conclusion

The present study attempted to synthesize all scientific evidence to examine the overall convergent validity of IPAQ. The findings indicated that IPAQ is a reasonably valid measurement tool for measuring habitual PA. However, the variations in convergent validity across PA categories and moderator variables imply that research conditions should be taken into account before deciding which type of IPAQ to use.

Acknowledgements

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. There are no conflicts of interest. Study concept and design: Y.K., I.P. and M.K. Acquisition of data: Y.K. and M.K. Statistical analysis and interpretation of data: Y.K., I.P. and M.K. Drafting of manuscript: Y.K. Critical revision of manuscript: Y.K., I.P. and M.K. Study supervision: M.K.

References

1. World Health Organization (2010) Global Recommendations on Physical Activity for Health. Geneva: WHO; available at http://whqlibdoc.who.int/publications/2010/9789241599979_eng.pdf

2. National Institutes of Health Consensus Development Panel on Physical Activity and Cardiovascular Health (1996) Physical activity and cardiovascular health. JAMA 276, 241–246.

3. Shiroma, EJ & Lee, I (2010) Physical activity and cardiovascular health: lessons learned from epidemiological studies across age, gender, and race/ethnicity. J Am Heart Assoc 122, 743–752.

4. Warburton, DER, Nicol, CW & Bredin, SSD (2006) Health benefits of physical activity: the evidence. CMAJ 174, 801–809.

5. Blair, SN, Cheng, Y & Holder, S (2001) Is physical activity or physical fitness more important in defining health benefits? Med Sci Sports Exerc 33, 6 Suppl., S379–S399.

6. Sallis, JF & Saelens, BE (2000) Assessment of physical activity by self-report: status, limitations, and future directions. Res Q Exerc Sport 71, 2 Suppl., S1–S14.

7. Craig, CL, Marshall, AL, Sjostrom, M et al. (2003) International Physical Activity Questionnaire: 12-country reliability and validity. Med Sci Sports Exerc 35, 1381–1395.

8. Hallal, PC, Gomez, LF, Parra, DC et al. (2010) Lessons learned after 10 years of IPAQ use in Brazil and Colombia. J Phys Act Health 7, Suppl. 2, S259–S264.

9. Downs, SH & Black, N (1998) The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health 52, 377–384.

10. Prince, SA, Adamo, KB, Hamel, ME et al. (2008) A comparison of direct versus self-report measures for assessing physical activity in adults: a systematic review. Int J Behav Nutr Phys Act 5, 56.

11. Warburton, DR, Charlesworth, S, Ivey, A et al. (2010) A systematic review of the evidence for Canada's Physical Activity Guidelines for Adults. Int J Behav Nutr Phys Act 7, 39.

12. Ainsworth, BE, Haskell, WL, Whitt, MC et al. (2000) Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc 32, Suppl. 9, S498–S516.

13. International Physical Activity Questionnaire (2005) Guidelines for data processing and analysis. http://www.ipaq.ki.se/scoring.pdf (accessed May 2010).

14. Lipsey, MW & Wilson, DB (2001) Practical Meta-analysis. Newbury Park, CA: Sage.

15. Filliben, JJ (1975) The probability plot correlation coefficient test for normality. Technometrics 17, 111–117.

16. De Cocker, KA, Cardon, G & De Bourdeaudhuij, IM (2007) Pedometer-determined physical activity and its comparison with the International Physical Activity Questionnaire in a sample of Belgian adults. Res Q Exerc Sport 78, 429–437.

17. De Cocker, KA, De Bourdeaudhuij, IM & Cardon, GM (2009) What do pedometer counts represent? A comparison between pedometer data and data from four different questionnaires. Public Health Nutr 12, 74–81.

18. Deng, HB, Macfarlane, DJ, Thomas, GN et al. (2008) Reliability and validity of the IPAQ-Chinese: the Guangzhou Biobank Cohort study. Med Sci Sports Exerc 40, 303–307.

19. Dinger, MK, Behrens, TK & Han, JL (2006) Validity and reliability of the International Physical Activity Questionnaire in college students. Am J Health Promot 37, 337–343.

20. Gauthier, AP, Lariviere, M & Young, N (2009) Psychometric properties of the IPAQ: a validation study in a sample of northern Franco-Ontarians. J Phys Act Health 6, Suppl. 1, S54–S60.

21. Hagstromer, M, Ainsworth, BE, Oja, P et al. (2010) Comparison of a subjective and an objective measure of physical activity in a population sample. J Phys Act Health 7, 541–550.

22. Kolbe-Alexander, TL, Lambert, EV, Harkins, JB et al. (2006) Comparison of two methods of measuring physical activity in South African older adults. J Aging Phys Act 14, 98–114.

23. Kurtze, N, Rangul, V & Hustvedt, BE (2008) Reliability and validity of the international physical activity questionnaire in the Nord-Trøndelag health study (HUNT) population of men. BMC Med Res Methodol 8, 63.

24. Mader, U, Martin, BW, Schutz, Y et al. (2006) Validity of four short physical activity questionnaires in middle-aged persons. Med Sci Sports Exerc 38, 1255–1266.

25. van der Ploeg, HP, Tudor-Locke, C, Marshall, AL et al. (2010) Reliability and validity of the international physical activity questionnaire for assessing walking. Res Q Exerc Sport 81, 97–101.

26. Boon, RM, Hamlin, MJ, Steel, GD et al. (2010) Validation of the New Zealand Physical Activity Questionnaire (NZPAQ-LF) and the International Physical Activity Questionnaire (IPAQ-LF) with accelerometry. Br J Sports Med 44, 741–746.

27. Roman-Vinas, B, Serra-Majem, L, Hagstromer, M et al. (2010) International Physical Activity Questionnaire: reliability and validity in a Spanish population. Eur J Sport Sci 10, 297–304.

28. Bull, FC, Maslin, T & Armstrong, T (2009) Global physical activity questionnaire (GPAQ): nine country reliability and validity. J Phys Act Health 6, 790–804.

29. Hagstromer, M, Oja, P & Sjostrom, M (2006) The International Physical Activity Questionnaire (IPAQ): a study of concurrent and construct validity. Public Health Nutr 9, 755–762.

30. Lachat, CK, Verstraeten, R, Khanh le, NB et al. (2008) Validity of two physical activity questionnaires (IPAQ and PAQA) for Vietnamese adolescents in rural and urban areas. Int J Behav Nutr Phys Act 5, 37.

31. Macfarlane, DJ, Lee, CC, Ho, EY et al. (2006) Convergent validity of six methods to assess physical activity in daily life. J Appl Physiol 101, 1328–1334.

32. Timperio, A, Salmon, J, Rosenberg, M et al. (2004) Do logbooks influence recall of physical activity in validation studies? Med Sci Sports Exerc 36, 1181–1186.

33. Vandelanotte, C, De Bourdeaudhuij, I, Sallis, JF et al. (2005) Reliability and validity of a computerized International Physical Activity Questionnaire (IPAQ). J Phys Act Health 2, 63–75.

34. Ekelund, U, Sepp, H, Brage, S et al. (2006) Criterion-related validity of the last 7-day, short form of the International Physical Activity Questionnaire in Swedish adults. Public Health Nutr 9, 258–265.

35. Thuy, AB, Blizzard, L, Schmidt, M et al. (2010) Reliability and validity of the global physical activity questionnaire in Vietnam. J Phys Act Health 7, 410–418.

36. Hunter, JE, Schmidt, FL & Jackson, GB (1982) Meta-analysis: Cumulating Research Findings Across Studies. Beverly Hills, CA: Sage.

37. Hunter, JE & Schmidt, FL (2004) Methods of Meta-analysis: Correcting Error and Bias in Research Findings, 2nd ed. Newbury Park, CA: Sage.

38. Cohen, JA (1992) Power primer. Psychol Bull 112, 155–159.

39. Montoye, HJ, Kemper, HCG, Saris, WHM et al. (1996) Measuring Physical Activity and Energy Expenditure. Champaign, IL: Human Kinetics.

40. Washburn, RA, Heath, GW & Jackson, AW (2000) Reliability and validity issues concerning large-scale surveillance of physical activity. Res Q Exerc Sport 71, 2 Suppl., S104–S113.

41. Shephard, JR (2003) Limits to the measurement of habitual physical activity by questionnaires. Br J Sport Med 37, 197–206.

42. Bauman, A, Ainsworth, BE, Bull, F et al. (2009) Progress and pitfalls in the use of the International Physical Activity Questionnaire (IPAQ) for adult physical activity surveillance. J Phys Act Health 6, Suppl. 1, S5–S8.

43. Hallal, CP, Victora, GC, Wells, CKJ et al. (2004) Comparison of short and full-length International Physical Activity Questionnaires. J Phys Act Health 1, 227–234.

44. Heesch, CK, van Uffelen, GZ, Hill, LR et al. (2010) What do IPAQ questions mean to older adults? Lessons from cognitive interviews. Int J Behav Nutr Phys Act 7, 35.

45. Vuillemin, A, Oppert, J, Guillemin, F et al. (2000) Self-administered questionnaire compared with interview to assess past-year physical activity. Med Sci Sports Exerc 32, 1119–1124.

46. Bassett, DR (2000) Validity and reliability issues in objective monitoring of physical activity. Res Q Exerc Sport 71, 2 Suppl., S30–S36.

47. Kang, M, Bassett, DR, Tudor-Locke, C et al. (2012) Measurement effects of seasonal and monthly variability on pedometer-determined data. J Phys Act Health 9, 336–343.

48. Kang, M, Bassett, DR, Tudor-Locke, C et al. (2009) How many days are enough? A study of 365 days of pedometer monitoring. Res Q Exerc Sport 80, 445–453.

49. Flather, MD, Farkouh, ME, Pogue, JM et al. (1997) Strengths and limitations of meta-analysis: larger studies may be more reliable. Control Clinical Trials 18, 568–579.

50. Shepard, LA (1993) Evaluating test validity. Rev Res Educ 19, 405–450.

Table 1 Features of the studies included in the meta-analyses of convergent validity of IPAQ and outcome domains examined

Table 2 Stem-and-leaf plots of correlation coefficients (ESr) of IPAQ

Table 3 Results of meta-analyses for overall weighted mean correlation coefficients (ESr) across PA categories of IPAQ

Table 4 Results of moderator analyses across all PA categories of IPAQ
