Physical activity (PA) has been regarded as one of the most important habitual behaviours leading to a healthy life by preventing diseases and increasing health benefits(1–Reference Blair, Cheng and Holder5). As the importance of PA has been emphasized, attempts have been made to develop appropriate measurement tools, both objective and subjective, to quantify the amount of PA in daily life. Of these, questionnaires remain the most widely used measurement tool in large-scale studies owing to their efficiency in measuring PA levels in large populations(Reference Sallis and Saelens6).
The International Physical Activity Questionnaire (IPAQ) is an instrument which was developed by the International Consensus Group in 1998–1999 to establish a standardized and culturally adaptable measurement tool across various populations in the world(Reference Craig, Marshall and Sjostrom7). IPAQ is designed to assess the levels of habitual PA for individuals ranging from young to middle-aged adults (i.e. 15–69 years old). In addition, there are different forms of IPAQ depending on several variations which include length of questionnaire (i.e. short or long form), reference period (i.e. last 7 d or usual week) and mode of administration (i.e. self-report or interviewer-based).
Soon after IPAQ was developed, it was translated into several languages, and numerous studies have examined the reliability and validity of these versions across countries. In these studies, one of the most commonly applied approaches to establishing validity evidence for IPAQ is convergent validity, which indicates the extent to which different measurement tools measure the same construct. However, the extent to which the estimates from IPAQ linearly relate to those from counterpart instruments has varied depending on the characteristics of the IPAQ examined (i.e. translation, length, reference period and mode of administration) and the instrument used for the comparison(Reference Hallal, Gomez and Parra8), and the size of this variation has not yet been quantified.
To the best of our knowledge, no studies to date have examined the sources and magnitudes of factors that may explain such discrepancies in the convergent validity of IPAQ across studies. Given the widespread use of IPAQ for measuring PA at the population level and the limited information on the convergent validity of its various formats, synthesizing all empirical evidence on the convergent validity of IPAQ would provide more comprehensive information. The purpose of the present study was therefore to apply a meta-analytic method to quantify the overall convergent validity of IPAQ across different studies and to investigate the sources and magnitudes of moderator factors that may affect it.
Methods
Search strategy and selection criteria
The relevant studies for examining convergent validity of IPAQ were obtained from five electronic databases (i.e. SPORTDiscus, Medline, Google Scholar, PubMed and EBSCOhost). The main keywords used to identify the appropriate studies were ‘International Physical Activity Questionnaire’, ‘IPAQ’, ‘validity’, ‘convergent validity’, ‘comparison’ and ‘validation’. These keywords were entered in several combinations.
The primary outcome of interest was the correlation coefficient between IPAQ and another instrument. The following criteria were used to select potential studies for inclusion: (i) a study that used IPAQ as either a main instrument to be validated or an instrument to be compared with; (ii) a study in which the participants were not physically or emotionally challenged or disabled; (iii) a study in which the mean age of participants fell between 15 and 69 years old; (iv) in circumstances where IPAQ was translated into other languages, no changes in the structure occurred; (v) a study had a precise definition of PA intensity derived from the instrument; (vi) a study that reported statistical results in sufficient detail to estimate effect size (ESr); and (vii) a peer-reviewed article published in English. Using these criteria, potentially relevant studies were screened by two independent reviewers and full texts of all studies meeting the inclusion criteria were further assessed for methodological quality and for data extraction. Consensus was achieved through discussion when disagreements occurred between the two reviewers.
Methodological quality
Two reviewers independently assessed the methodological quality of studies using the modified version of the Downs and Black checklist(Reference Downs and Black9), which was used in recent systematic reviews(Reference Prince, Adamo and Hamel10, Reference Warburton, Charlesworth and Ivey11). The modified checklist consisted of fifteen items within three domains (i.e. reporting, external validity and internal validity), and possible scores ranged between 0 and 15, with higher scores indicating better methodological quality. Any study that scored relatively low on methodological quality (i.e. Z-score <−1·96) was not considered for inclusion in the meta-analyses.
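The Z-score exclusion rule described above can be sketched as follows. This is a minimal illustration only: the function name, the use of the sample standard deviation and the example scores are assumptions, as the source does not report how the Z-scores were computed.

```python
from statistics import mean, stdev


def flag_low_quality(scores, z_cutoff=-1.96):
    """Return the indices of studies whose quality Z-score is below z_cutoff.

    scores: checklist totals (0-15), one per study.
    The sample SD is used here; this is an assumption, not the source's stated method.
    """
    m = mean(scores)
    s = stdev(scores)
    if s == 0:
        return []  # all studies scored the same, so none stands out
    return [i for i, x in enumerate(scores) if (x - m) / s < z_cutoff]


# Hypothetical checklist scores for eleven studies (not data from the review).
example_scores = [13, 14, 12, 15, 13, 14, 13, 12, 14, 13, 5]
print(flag_low_quality(example_scores))
```

Because the cut-off is relative to the pool of reviewed studies, the set of flagged studies depends on which studies are scored together.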
Data extraction and coding
The long form of IPAQ examines the habitual PA in daily life using twenty-seven items across four PA domains (i.e. leisure time, domestic and gardening, occupational and transport-related activities), while the short form of IPAQ consists of seven summarized items that measure the comprehensive level of PA regardless of the domains to be measured. In both forms, the participants are asked to report the durations and frequencies of three specific PA categories, i.e. walking, moderate PA (MPA) and vigorous PA (VPA). Total amount of time spent engaging in or energy expenditure for each PA category can be estimated as main outcomes using metabolic equivalent of task (MET) values of 3·3, 4 and 8 for walking, MPA and VPA, respectively. Because the MET value of walking is within a range for moderate-intensity PA (i.e. 3–6 MET)(Reference Ainsworth, Haskell and Whitt12), it has also been recommended to combine the estimates of walking and MPA to obtain the total MPA (denoted as TMPA)(13). Total PA (TPA) can be simply estimated by summation of all estimates from each category (i.e. walking + MPA + VPA). Therefore, there are a total of five PA categories that can be derived from IPAQ (i.e. walking, MPA, TMPA, VPA and TPA).
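As an illustration of how the five PA categories follow from the three reported ones, the sketch below computes weekly MET-minutes using the MET values given above. The function and variable names are hypothetical, and the formula (MET value × minutes per day × days per week) is the conventional scoring rule assumed here rather than one spelled out in this section.

```python
# MET values stated in the text for the three reported categories.
MET = {"walking": 3.3, "mpa": 4.0, "vpa": 8.0}


def ipaq_met_minutes(days, minutes_per_day):
    """Return weekly MET-minutes for walking, MPA, VPA, TMPA and TPA.

    days, minutes_per_day: dicts keyed by 'walking', 'mpa' and 'vpa'.
    """
    out = {k: MET[k] * days[k] * minutes_per_day[k] for k in MET}
    out["tmpa"] = out["walking"] + out["mpa"]  # total moderate PA
    out["tpa"] = out["tmpa"] + out["vpa"]      # total PA
    return out


# Hypothetical respondent: walks 30 min on 5 days, MPA 20 min on 3 days,
# VPA 30 min on 2 days.
result = ipaq_met_minutes(
    days={"walking": 5, "mpa": 3, "vpa": 2},
    minutes_per_day={"walking": 30, "mpa": 20, "vpa": 30},
)
print(result)
```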
Throughout the systematic review of selected studies, ESr values were extracted separately for each of the five PA categories to avoid dependency issues in the meta-analysis. In addition, each ESr was extracted only if the compared PA categories from both IPAQ and the other instrument were consistent or reasonably consistent (see outcome domains in Table 1). For example, estimates in walking, MPA and TMPA from IPAQ should be compared with estimates for MPA obtained from the other instrument. Likewise, ESr values were extracted for VPA and TPA only if they were compared with the same PA categories from the other instrument. However, because a pedometer does not provide the information of step counts within specific PA categories, ESr that were estimated between total step counts of pedometers and each PA category of IPAQ were also extracted. If a single study reported more than one ESr within the same PA category, but from different subpopulations, we assumed each ESr from different subpopulations to be independent from each other and included them in a single meta-analysis(Reference Lipsey and Wilson14). The units or scales of estimated value within each study were not considered because the primary outcome of interest in the present study was the correlation coefficient, which is a scale invariant coefficient in itself(Reference Filliben15).
IPAQ, International Physical Activity Questionnaire; MET, metabolic equivalent of task; PA, physical activity.
†Regions where the participants were recruited (sample size); ‘-’ indicates no moderator variables were extracted.
‡Types of instrument and cut-off standards compared with IPAQ: GPAQ, Global Physical Activity Questionnaire; MLTPAQ, Minnesota Leisure Time Physical Activity Questionnaire; Baecke-Q, Baecke questionnaire; OIMQ, Office In Motion Questionnaire.
§Outcome domains for meta-analyses (PA categories).
Moderator variables which may affect overall convergent validity of IPAQ were obtained from different characteristics of IPAQ used in each study: (i) length of IPAQ (i.e. short and long forms); (ii) reference period (i.e. last 7 d and usual week); (iii) mode of administration (i.e. interviewer and self-reported); and (iv) language (i.e. English and translated). In addition, the instruments which were used for comparison with IPAQ within each study were also extracted as a moderator variable: (v) instruments (i.e. accelerometer, pedometer and subjective measure).
Study characteristics
A total of sixty-seven potentially relevant studies were considered for further review. By systematic review based on inclusion criteria, a total of twenty-eight studies were excluded due to their inability to meet criteria and duplication. Full texts of the remaining thirty-nine studies were reviewed for a detailed assessment. Of these, twenty-one studies met all inclusion criteria and secured relatively higher methodological quality (mean 13·2; sd 1·3). A total of 152 ESr values across five PA categories in IPAQ were retrieved (i.e. seventeen ESr from ten studies for walking(Reference De Cocker, Cardon and De Bourdeaudhuij16–Reference van der Ploeg, Tudor-Locke and Marshall25), seventeen ESr from twelve studies for MPA(Reference De Cocker, Cardon and De Bourdeaudhuij16–Reference Mader, Martin and Schutz24, Reference Boon, Hamlin and Steel26, Reference Roman-Vinas, Serra-Majem and Hagstromer27), twenty-three ESr from ten studies for TMPA(Reference Hagstromer, Ainsworth and Oja21, Reference Mader, Martin and Schutz24, Reference van der Ploeg, Tudor-Locke and Marshall25, Reference Roman-Vinas, Serra-Majem and Hagstromer27–Reference Vandelanotte, De Bourdeaudhuij and Sallis33), thirty-five ESr from seventeen studies for VPA(Reference De Cocker, Cardon and De Bourdeaudhuij16–Reference Mader, Martin and Schutz24, Reference Boon, Hamlin and Steel26–Reference Vandelanotte, De Bourdeaudhuij and Sallis33) and sixty ESr from sixteen studies for TPA(Reference Craig, Marshall and Sjostrom7, Reference De Cocker, De Bourdeaudhuij and Cardon17–Reference Hagstromer, Ainsworth and Oja21, Reference Mader, Martin and Schutz24, Reference van der Ploeg, Tudor-Locke and Marshall25, Reference Roman-Vinas, Serra-Majem and Hagstromer27–Reference Lachat, Verstraeten and Khanh le30, Reference Timperio, Salmon and Rosenberg32–Reference Thuy, Blizzard and Schmidt35)). See Table 2 for stem–leaf plots of ESr extracted across PA categories. 
Total sample sizes for each PA category ranged from a low of 4453 in TMPA to a high of 8867 in TPA.
IPAQ, International Physical Activity Questionnaire; n, number of ESr; MPA, moderate physical activity; TMPA, total moderate physical activity; VPA, vigorous physical activity; TPA, total physical activity.
Computation of effect sizes
The measure of ESr in the present study was the product-moment correlation coefficients (e.g. Pearson r and Spearman ρ), which represent the strength of associations between the estimates of IPAQ and other counterpart instruments as an indication of convergent validity of IPAQ. The psychometric meta-analytic method proposed by Hunter and Schmidt(Reference Hunter, Schmidt and Jackson36, Reference Hunter and Schmidt37) was conducted to obtain the population-level estimates unaffected by statistical artefacts, such as sampling error and measurement error. The ‘bare-bone’ mean ESr (i.e. $\mathrm{ES}\bar{r}$), corrected for only sampling error, was calculated by weighting each ESr with the respective sample size when aggregating them into $\mathrm{ES}\bar{r}$. In order to correct for the measurement errors of IPAQ in addition to sampling error, the reliability coefficients of IPAQ with respect to each PA category (e.g. intra-class correlation coefficients) were further extracted. There were eleven reliability coefficients available for walking (mean 0·74; sd 0·15), nine for MPA (mean 0·63; sd 0·22), eight for TMPA (mean 0·62; sd 0·21), twelve for VPA (mean 0·67; sd 0·23) and thirty-two for TPA (mean 0·77; sd 0·13). Because the reliability coefficients were not available for all of the included studies, the artefact distributions were calculated for each PA category to obtain the corrected mean ESr at the population level (i.e. ESρ) that was unaffected by sampling error and measurement error. 95 % confidence intervals (CI) were produced on the basis of the standard error of ESρ and 95 % credibility intervals (CV) were also yielded using the residual standard deviation of ESρ. According to Cohen's guidelines, ESρ was interpreted as small (<0·30), medium (0·31–0·49) and large (≥0·50)(Reference Cohen38).
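The computation described above can be sketched in simplified form. The code below is not the authors' implementation: it assumes a single artefact (unreliability of IPAQ), uses the square roots of the available reliability coefficients as the artefact distribution, and all function and key names are hypothetical.

```python
from math import sqrt


def hunter_schmidt(rs, ns, reliabilities):
    """Simplified Hunter-Schmidt style meta-analysis of correlations.

    rs: observed correlations (ESr); ns: matching sample sizes;
    reliabilities: available reliability coefficients for IPAQ (artefact distribution).
    """
    total_n = sum(ns)
    k = len(rs)
    # Bare-bones mean: sample-size-weighted average of the observed correlations.
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample-size-weighted observed variance of the correlations.
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average sample size.
    var_err = (1 - r_bar ** 2) ** 2 / (total_n / k - 1)
    # Attenuation factors: square roots of the reliability coefficients.
    a = [sqrt(x) for x in reliabilities]
    a_bar = sum(a) / len(a)
    var_a = sum((x - a_bar) ** 2 for x in a) / len(a)
    # Mean correlation corrected for unreliability in IPAQ.
    rho = r_bar / a_bar
    # Variance in observed correlations due to variation in reliability.
    var_art = rho ** 2 * var_a
    # Residual variance of rho after removing both artefacts (floored at zero).
    var_rho = max(0.0, (var_obs - var_err - var_art) / a_bar ** 2)
    sd_rho = sqrt(var_rho)
    pct = 100.0 * (var_err + var_art) / var_obs if var_obs > 0 else 100.0
    return {
        "r_bar": r_bar,
        "rho": rho,
        "sd_rho": sd_rho,
        "pct_artifact": pct,
        "credibility_95": (rho - 1.96 * sd_rho, rho + 1.96 * sd_rho),
    }


# Hypothetical inputs, not values from the included studies.
print(hunter_schmidt([0.20, 0.40], [100, 100], [0.64, 0.64]))
```

The credibility interval is built from the residual standard deviation of the corrected correlations, so it describes the spread of population values rather than the precision of the mean.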
Moderator analysis
To determine the presence of moderator effects on ESρ, three different criteria (i.e. the percentage of variance components attributed to statistical artefacts, the Q homogeneity statistic and 95 % CV) were examined simultaneously, as recommended by Hunter and Schmidt(Reference Hunter and Schmidt37). Specifically, we concluded that moderators exist if: (i) the percentage of variance accounted for by statistical artefacts is less than 75 % of the observed variance in ESr; (ii) the Q homogeneity statistic is significant; and (iii) the 95 % CV is either relatively large or includes zero. However, because of the imprecise meaning of a ‘large’ CV, we focused mainly on the first two criteria to examine the moderator effects unless disagreement occurred.
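The first two criteria can be sketched as a simple decision rule. The form of the Q statistic used below (total sample size times the weighted observed variance, divided by (1 − r̄²)²) is one common version of the homogeneity test for correlations and is an assumption here, as are the function names; the critical value is supplied by the caller from a chi-square table with K − 1 degrees of freedom.

```python
def q_statistic(rs, ns):
    """Return (Q, df) for a set of correlations rs with sample sizes ns."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    weighted_ss = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns))
    q = weighted_ss / (1 - r_bar ** 2) ** 2
    return q, len(rs) - 1


def moderator_decision(pct_artifact, q, q_critical):
    """Combine the 75 % rule and the Q test into a single verdict."""
    below_75 = pct_artifact < 75.0  # artefacts explain too little variance
    q_significant = q > q_critical  # correlations are heterogeneous
    if below_75 and q_significant:
        return "moderators likely"
    if not below_75 and not q_significant:
        return "homogeneous"
    return "criteria disagree; inspect the credibility interval"


# Hypothetical case: 60 % of variance from artefacts, Q = 20.0 on 6 df,
# against the 0.05 chi-square critical value for 6 df (12.592).
print(moderator_decision(60.0, 20.0, 12.592))
```

When the two criteria disagree, the sketch defers to the third criterion rather than forcing a verdict, which mirrors the fallback described above.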
Results
Overall effect sizes
The ESρ values corrected for sampling error and measurement error for each PA category are presented in Table 3. There were positive relationships between IPAQ and other instruments across all PA categories (ESρ range = 0·27–0·49), and none of the 95 % CI included zero. According to Cohen's guidelines, medium-sized ESρ were found for walking (ESρ = 0·32), TMPA (ESρ = 0·45), VPA (ESρ = 0·49) and TPA (ESρ = 0·39), while MPA had a small ESρ of 0·27. The proportions of the total observed variance in ESr accounted for by artefacts were all less than 75 %, and the Q homogeneity tests were statistically significant for all PA categories (all P < 0·05). Therefore, follow-up moderator analyses were conducted using the predefined moderators.
PA, physical activity, IPAQ, International Physical Activity Questionnaire; K, number of studies; n, number of ESr; N, total sample size; CV, credibility interval; MPA, moderate physical activity; TMPA, total moderate physical activity; VPA, vigorous physical activity; TPA, total physical activity.
*P < 0·05.
†Averaged ESr corrected for sampling error only.
‡Averaged ESr corrected for sampling error and measurement errors of IPAQ.
§Percentage of variance accounted for by statistical artefacts including sampling error and measurement error of IPAQ.
Moderator analyses
Moderator analyses were conducted to examine the effects of language (i.e. English and translated), length of IPAQ (i.e. short and long form), reference period (i.e. last 7 d and usual week), mode of administration (i.e. interviewer and self-reported) and instruments (i.e. accelerometer, pedometer and subjective measure) on overall ESρ for each PA category (see Table 4). Collectively, substantial differences in ESρ were detected by different levels of included moderators across all PA categories.
In terms of language of IPAQ, there was a consistent trend in the rank of ESρ across all PA categories: studies which used translated versions had significantly greater ESρ than those in which the English version was applied. Using the 75 % rule and the Q homogeneity statistic, the observed ESr values obtained from the English-version IPAQ studies for walking, MPA, TMPA and VPA were shown to be homogeneous, whereas a large amount of unexplained variance in ESρ remained, mostly among studies in which translated versions were used.
The variations in ESρ were also not significantly explained by the length of IPAQ for any PA category, with the exception of walking and MPA. For these two categories, the percentage of variance accounted for by artefacts increased dramatically in the studies where the long form was used (83·9 % and 56·6 % for walking and MPA, respectively), and non-significant Q statistics were detected for the long forms (Q(df = 6) = 8·34; P > 0·05 and Q(df = 7) = 14·06; P > 0·05, respectively). Although the length of IPAQ accounted for a relatively small percentage of variance in ESρ for most PA categories, the ESρ values differed significantly by length in walking and TMPA, where the 95 % CI for ESρ did not overlap between the long and short forms. Moreover, a systematic trend in the rank of ESρ was detected: studies which used the short form of IPAQ had greater ESρ for all PA categories.
Moderator analyses by reference period neither appreciably increased the percentage of variance accounted for by artefacts nor yielded non-significant Q homogeneity statistics for any PA category. Moreover, there were no observable trends in the rank of ESρ values across PA categories.
With respect to the mode of administration, the results showed that interviewer-administered studies had greater ESρ values for all PA categories with the exception of MPA, in which 80·9 % of the variation in ESρ for self-reported studies was attributed to artefacts with a non-significant Q statistic (Q(df = 10) = 13·60; P > 0·05). The ESρ values differed significantly by different mode of administration in TMPA, VPA and TPA. Interviewer-administered studies had a greater ESρ than those which utilized the self-reported measure of IPAQ.
The type of instrument moderately increased the percentage of variance accounted for by artefacts in walking, VPA and TPA, for which non-significant Q homogeneity statistics for the respective types of instrument were also detected. Studies which utilized subjective measures had greater ESρ values than studies utilizing objective measures in all PA categories with the exception of walking, for which the opposite was found.
Discussion
To our knowledge, the present study is the first comprehensive attempt to synthesize the scientific evidence on convergent validity of IPAQ using meta-analysis. The first purpose of the study was to examine the overall convergent validity of IPAQ. The results showed that the overall ESρ for each PA category were all positive, which supports the convergent validity evidence of IPAQ, but they varied from small-to-medium effect size according to Cohen's definitions(Reference Cohen38). Walking, TMPA, VPA and TPA of IPAQ secured medium-sized ESρ, while MPA had a small-sized ESρ. Such variations in ESρ by different categories of IPAQ may be due to the inherent property of IPAQ as a subjective measure. Measuring PA in IPAQ relies on the recall of diverse activities for a 7 d period, which requires participants to utilize their cognitive ability for the recall process. The greatest ESρ observed in VPA can be explained by the evidence which shows that vigorous-intensity PA tends to be more structured, which may positively affect participant recall. On the other hand, walking and moderate-intensity activity are not typically structured but rather accumulated gradually during daily life(Reference Hagstromer, Oja and Sjostrom29). This may result in participants not recalling the exact amount of walking and activities involved in MPA(Reference Vandelanotte, De Bourdeaudhuij and Sallis33, Reference Montoye, Kemper and Saris39, Reference Washburn, Heath and Jackson40). Another possible explanation for varying results in ESρ across PA categories is that variations in individual perceptions with respect to the intensity of each PA category may occur due to insufficient information for each specific category(Reference Shephard41). For example, IPAQ defines VPA as an activity causing harder than usual breathing and MPA as an activity causing somewhat harder breathing(Reference Bauman, Ainsworth and Bull42). 
In order to clarify this gap between MPA and VPA, IPAQ offers some examples of activity according to MET values for each type of intensity; however, different perceived exertions may exist with respect to the specific examples given by IPAQ considering that IPAQ covers a broad range of ages from 15 to 69 years. Hallal et al.(Reference Hallal, Gomez and Parra8) noted that specific examples linked to physiological signs or culturally adapted examples should be provided to aid participants in distinguishing MPA from VPA; we suggest that stratifying age-relevant examples would be beneficial to obtain more valid measures for MPA and VPA.
In IPAQ, participants are instructed to report time spent in MPA that lasted for at least 10 min, excluding walking, which is asked about in separate questions. Walking and MPA, which are assigned MET values of 3·3 and 4 in IPAQ, fall within the same range of moderate-intensity PA (i.e. 3–6 MET)(Reference Ainsworth, Haskell and Whitt12). Our finding suggests that TMPA, which is the sum of walking and MPA, has a greater ESρ than either walking or MPA, indicating that TMPA has stronger convergent validity than walking or MPA measured alone. This may imply that IPAQ has achieved its initial intention of discriminating walking from MPA, in that summation of the estimates from walking and MPA yields more valid estimates for TMPA. Some researchers argue that separation of walking and MPA in the same questionnaire may confuse participants about whether time spent walking counts as MPA(Reference Dinger, Behrens and Han19); however, the results of the present study indicated, collectively, that participants may well perceive time spent walking as separate from MPA.
The second purpose of the present study was to investigate the effects of moderator variables on the overall validity of IPAQ across all PA categories. IPAQ was developed with the aim of international monitoring and national comparison(Reference Craig, Marshall and Sjostrom7); however, the variation introduced by language translation has remained in question because of differing cultural contexts(Reference Bauman, Ainsworth and Bull42). In our study, we synthesized a total of 152 ESr from different cultures. There were 120 ESr retrieved from translated versions of IPAQ, which yielded greater ESρ values compared with English versions of IPAQ across all PA categories. These findings suggest that IPAQ achieved comparable convergent validity across different cultures without any structural changes in IPAQ. Although we agree that some examples or words should be adapted to the cultural context in which IPAQ is used, following the well-established translation protocols suggested by the IPAQ consensus group appears likely to support the convergent validity of IPAQ in different cultures.
IPAQ has two different versions (i.e. long or short form). The long form measures habitual PA in three intensity-specific categories across four domains, while the short form examines only generic PA within three intensity-specific categories without any separation of specific domains. The short form has been recommended for population-based studies because of its feasibility and because it is preferred over the long form(Reference Craig, Marshall and Sjostrom7); however, the short form tends to overestimate actual PA due to the lack of sufficient information for specific domains(Reference Hallal, Victora and Wells43). Bauman et al.(Reference Bauman, Ainsworth and Bull42) noted that the large variances in PA measures estimated from the short form could be caused by using the short form to estimate continuous levels of PA, whereas its primary purpose is categorical reporting. In the current meta-analyses, continuous measures of PA obtained from the short form had ESρ comparable to or even larger than those of the long form. From this, we can conclude that using the short form to estimate the amount of PA as a continuous measure seems to be acceptable if the primary interest of the study is not domain-specific measures. However, the 95 % CV for ESρ obtained from the studies where the short form was used were relatively wide compared with those from the long form. One should bear in mind that PA estimates from the short form can vary dramatically owing to unexplained moderators or factors, while the long form may provide more stable measures.
Measuring generic PA using questionnaires relies heavily on recall processes that may require appropriate retrieval cues to stimulate the search of the participant's memory(Reference Sallis and Saelens6, Reference Bauman, Ainsworth and Bull42). There are two cues with respect to reference period (i.e. last 7 d or usual week) that one can utilize to aid the participant's recall process. In the original development study of IPAQ(Reference Craig, Marshall and Sjostrom7), the International Consensus Group found the ‘last 7 d’ and ‘usual week’ reference periods to be comparable in terms of reliability and validity and suggested using the last 7 d reference period based on the preferences in participating countries of their study. In the current analyses, no particular patterns for the rank of ESρ by different reference periods were observed across all PA categories. One might expect stronger convergent validity when using the last 7 d reference period, since most studies have administered the IPAQ right after they finished collecting objective data for a 7 d period. The comparable results between the last 7 d and usual week may reflect the fact that people tend to interpret the reference period of a usual week as the last 7 d and respond accordingly.
It has been widely recognized that interviewer administration would minimize the possible errors in implementing subjective measurement tools that are due to participants' misinterpretation and/or misunderstanding of the questions being asked(Reference Heesch, van Uffelen and Hill44, Reference Vuillemin, Oppert and Guillemin45). The findings of the current meta-analyses were mostly in agreement with this understanding, in that greater ESρ values were found in the studies in which interviewer administration was applied across all PA categories with the exception of MPA. Interviewer administration may have several advantages in that it prevents respondents from skipping questions and could also provide more opportunities to obtain detailed information on each question than the self-administered questionnaire(Reference Craig, Marshall and Sjostrom7). Moreover, it allows researchers to obtain more reliable estimates of PA levels among less educated populations who cannot fully understand the context being asked(Reference Hallal, Gomez and Parra8). Despite the benefits of interviewer administration, the self-reported approach may be preferred in a large epidemiological study due to time or budget limitations; however, more accurate measures of PA are likely to be obtained when an interviewer administers the IPAQ.
Objective measurement tools to quantify levels of PA have been highly recognized for their capability to provide more precise and accurate estimates of PA levels than subjective measurement tools(Reference Bassett46). There has been an increase in the use of objective measurement tools as a criterion for validating PA questionnaires. In the current meta-analyses, three types of instrument (i.e. accelerometer, pedometer and subjective measure) were used for comparison with IPAQ. The studies in which subjective measurement tools were used as the counterpart instrument to IPAQ yielded the greatest ESρ values for most of the PA categories. These findings are broadly in agreement with the notion that subjective measurement tools tend to share similar psychometric properties based on common subjective recall processes(Reference Prince, Adamo and Hamel10). In other words, similar systematic errors such as cognitive biases or social desirability might occur across subjective measurement tools, producing stronger linear relationships between the estimates from IPAQ and those from other subjective measurement tools. By contrast, systematic errors in the estimates from objective measurement tools are more likely to arise from different measurement conditions, such as seasons and months(Reference Kang, Bassett and Tudor-Locke47) or number of monitoring days(Reference Kang, Bassett and Tudor-Locke48), which may result in lower convergent validity of IPAQ when it is compared with objective measurement tools. In addition, such inconsistency between the estimates from IPAQ and objective measurement tools may also be attributed to the fact that IPAQ is intended to measure activities longer than 10 min in duration, whereas the accelerometer and pedometer tend to measure every form of physical movement. The 10 min threshold in IPAQ may result in unreliably large variations within individual PA levels, which may weaken the linear relationship between estimates from IPAQ and objective measures(Reference Hallal, Gomez and Parra8, Reference Ekelund, Sepp and Brage34).
There were several limitations that should be considered when examining the results of the present study. First, variations in the cut-off standards used to determine PA categories from accelerometer data across studies were not considered, which may have contributed to the varying results in ESρ, especially in MPA and VPA, which are based on those standards. However, considering that there is no single ‘gold standard’ measure as a criterion for PA comparison, we believe that the results from our study may be generalized as overall convergent validity of IPAQ. Another area of concern is that the measure of effect size aggregated for the current meta-analysis was the correlation coefficient, which is not capable of detecting agreement between the estimates from IPAQ and other criterion measures. Correlation coefficients provide sufficient information on the convergent validity of IPAQ in the form of a linear relationship; however, examining agreement would give an insight into the extent to which the IPAQ over- or underestimates the actual level of PA. Thus, we suggest that future studies conduct a meta-analytic review of the agreement between IPAQ and other criterion instruments. In addition, 95 % CV around ESρ values in moderator analyses showed that there was still a large amount of unexplained variance after controlling for artefacts and predefined moderators. Hierarchical moderator analyses may be a more appropriate approach to resolve this problem(Reference Hunter and Schmidt37); however, more effect sizes would be needed for each level of moderators. Lastly, some of the moderator analyses were conducted based on a small number of ESr, which may affect the generalizability of the current findings. Small-sized meta-analysis (i.e. <200 ESr) may only be capable of summarizing the evidence or generating hypotheses for future research(Reference Flather, Farkouh and Pogue49). 
The process of confirming validity evidence for a certain measurement tool is regarded as a ‘never ending process’(Reference Shepard50); therefore, more evidence not only for convergent validity but also diverse aspects of validity of IPAQ should be continuously accumulated across different populations or measurement conditions.
Conclusion
The present study attempted to synthesize all scientific evidence to examine the overall convergent validity of IPAQ. The findings indicated that IPAQ is a reasonably valid measurement tool for measuring habitual PA. However, the variations in convergent validity across different PA categories and moderator variables imply that different research conditions should be taken into account prior to deciding on use of the appropriate type of IPAQ.
Acknowledgements
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. There are no conflicts of interest. Study concept and design: Y.K., I.P. and M.K. Acquisition of data: Y.K. and M.K. Statistical analysis and interpretation of data: Y.K., I.P. and M.K. Drafting of manuscript: Y.K. Critical revision of manuscript: Y.K., I.P. and M.K. Study supervision: M.K.