Fe deficiency is one of the three major ‘hidden hungers’ in the world (Fe deficiency, iodine deficiency and vitamin A deficiency). According to a WHO report from 2001(1), over 2 billion individuals suffer from Fe-deficiency anaemia (IDA). Fe deficiency is more severe in developing countries. In 2002, the National Nutrition and Health Survey revealed that the average anaemia prevalence in China was 15·2 %; for children below the age of 2 years, individuals older than 60 years and women of child-bearing age, the corresponding prevalence was 24·2, 21·5 and 20·6 %, respectively(2). In 2002, the WHO ranked Fe deficiency as the seventh most important preventable risk factor for disease, disability and death(3).
Fe is an essential element for Hb synthesis in the human body. Fe deficiency can reduce Hb synthesis and impair health. The impact of IDA on health manifests in the following ways(Reference Preziosi, Prual, Galan, Daouda, Boureima and Hercberg4–Reference Haas and Brownlie9): IDA can lead to low birth weight, increased maternal and neonatal mortality, and increased infant mortality. In infancy, IDA delays physical and mental development and thus impairs work capacity in adulthood. In children, IDA increases the risk and prolongs the duration of upper respiratory tract infections. Because anaemia impairs O2 transport and lowers tolerance, the physical strength and work capacity of all individuals with IDA are reduced, which in turn lowers income at the individual, family and country level. The reduction in economic productivity caused by anaemia in China in 2001 was estimated at 326 billion Yuan, or 3·6 % of gross domestic product(Reference Ross, Chen, He, Fu, Wang, Fu and Chen10).
Besides the lack of dietary factors that promote Fe absorption (such as meat and vitamin C)(Reference Charlton and Bothwell11), one important reason why Fe deficiency is widespread in most developing countries is that a cereal-based diet is rich in phytic acid, which decreases the bioavailability of Fe(Reference Hallberg, Brune and Rossander12–Reference Hurrell, Lynch and Bothwell14). When most Fe salts are used to control Fe deficiency, the influence of factors that inhibit Fe bioavailability can hardly be avoided. As an Fe-fortification compound, NaFeEDTA has a high Fe bioavailability in the human body because it is protected against inhibition by phytic acid(15). Experiments have shown that the bioavailability of Fe in NaFeEDTA is two to three times higher than that of the traditional Fe preparation FeSO4 (ferrous sulfate), which is itself generally regarded as having a relatively high Fe bioavailability compared with other Fe preparations(15, Reference Huo, Piao and Yu16). In addition, NaFeEDTA can promote the absorption of non-haem Fe in the diet(Reference Davidsson, Walczyk, Zavaleta and Hurrell17). Consequently, it has the potential to be effective against Fe deficiency. To date, however, the effect and safety of NaFeEDTA for Fe deficiency have not been systematically evaluated. Our objective was to evaluate the effect and safety of NaFeEDTA on Hb and serum ferritin in Fe-deficient populations.
Methods
Inclusion and exclusion criteria
Types of studies
We included randomised and quasi-randomised controlled trials; we excluded controlled before-and-after studies, self-controlled before-and-after studies, interrupted time-series studies, cohort studies, case–control studies and cross-sectional studies.
Types of participants
Eligible participants were from any population in which Fe deficiency was prevalent. In our systematic review, we defined ‘Fe deficiency’ as a serum ferritin concentration of < 12 μg/l, according to the standard of the International Nutritional Anemia Consultative Group(18).
Types of intervention
We included studies comparing NaFeEDTA v. placebo; we excluded studies in which vitamin C or other anti-anaemic drugs were simultaneously administered, studies comparing Fe preparations other than NaFeEDTA v. placebo, studies comparing NaFeEDTA v. other Fe salts, or studies comparing an EDTA complex which does not contain Fe v. placebo.
Types of outcomes
We included studies that assessed the effect and safety of NaFeEDTA on Hb concentration and/or serum ferritin concentration. We also included any reported adverse effect outcomes.
Search strategy
We searched Medline (1950 to May 2007), the Cochrane Library (issue 2, 2007), Embase (1966 to May 2007), the WHO Library (WHOLIS) and the China National Knowledge Infrastructure (CNKI) (1980 to 2007). We also hand-searched conference proceedings and reference lists and contacted specialists in the field. We imposed no restrictions on country, race, language or publication year.
Selection of eligible studies
First, randomised or quasi-randomised controlled trials were identified through title or abstract (if necessary). Further, based on inclusion and exclusion criteria, eligible studies were included through abstract or full text (if necessary). This was performed by two reviewers (B. W. and Y. X.) independently. Discrepancies were resolved by discussion between the two reviewers and unresolved disagreement was referred to a third reviewer (S. Z.).
Quality assessment
The Cochrane Effective Practice and Organisation of Care Group (EPOC) review group has established quality-assessment criteria for randomised or quasi-randomised controlled trials(19). In our systematic review, we assessed the quality of included studies using the EPOC criteria. Two reviewers (B. W. and Y. X.) independently assessed the quality. Disagreements were resolved by discussion and by seeking the opinion of a third reviewer (S. Z.).
Data extraction
Data were extracted independently by two reviewers (B. W. and Y. X.). Any differences of opinion were resolved by discussion and, where necessary, by consensus with a third reviewer (S. Z.). We collected information about methodological characteristics (study design, blinding, follow-up, allocation concealment, protection against contamination, baseline comparability, levels of allocation and analysis) and study characteristics (intervention measures, control measures, location and setting, inclusion criteria, outcomes of interest, main results).
Analysis
We used RevMan software (version 4.2.8; Update Software Ltd, Oxford, Oxon, UK) to undertake heterogeneity tests and meta-analysis. As cluster randomised controlled trials were included, we used the generic inverse variance method and chose weighted mean difference as the effect measure. We decided whether to use the fixed effects model or the random effects model based on the result of the heterogeneity test. For serum ferritin outcome (the data were often log-normally distributed), we undertook meta-analysis on the logarithmic scale and report results on the arithmetic scale(Reference Higgins and Green20). For one study with more than one intervention group, we divided the control group evenly according to the number of intervention groups(21). We examined publication bias using the ‘metabias’ command in Stata 9.0 software (StataCorp LP, College Station, TX, USA).
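The generic inverse variance pooling described above can be sketched as follows. This is a minimal illustration, not the RevMan implementation: it assumes the DerSimonian-Laird estimator for the between-study variance in the random effects model, and the function name `pool_inverse_variance` and the example numbers are hypothetical.

```python
import math


def pool_inverse_variance(estimates, std_errors):
    """Pool study estimates by the generic inverse variance method.

    Returns fixed-effect and random-effects results. The random-effects
    weights use the DerSimonian-Laird tau^2 (an assumption of this sketch).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Fixed-effect weights are the inverse of each study's variance.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(w_re)
    random = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random,
        "random_se": math.sqrt(1.0 / sum_w_re),
        "q": q,
        "df": df,
        "tau2": tau2,
    }


# Hypothetical mean differences in Hb (g/l) with their standard errors.
result = pool_inverse_variance([1.0, 3.0], [1.0, 1.0])
print(result["fixed"], result["random"], result["tau2"])
```

When Q is no larger than its degrees of freedom, tau^2 is zero and the two models coincide; otherwise the random effects model widens the pooled standard error.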
For cluster randomised controlled trials with unit of analysis error, we computed effective sample size using the design effect, then we obtained approximately adjusted effect estimates and standard errors(Reference Higgins and Green20). Intracluster correlation coefficients needed to calculate design effects were provided by one similar study. Meanwhile, we undertook sensitivity analysis for this approximate adjustment. We used final values rather than change values to undertake meta-analysis. In quality assessment, if more than three items in one study were regarded as ‘not done’, then we defined this study as ‘unacceptable’ in methodological quality, and it was not included in the analysis. For Hb outcome, we undertook subgroup analysis according to baseline Hb concentration ( < 120 g/l or ≥ 120 g/l) and intervention dose ( < 10 mg Fe/d or ≥ 10 mg Fe/d) to explore the contribution of these two variables to heterogeneity in Hb outcome.
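The approximate adjustment for unit of analysis error can be sketched as below. The sketch assumes the usual design effect formula, 1 + (m − 1) × ICC, where m is the average cluster size; the function names and the example values are hypothetical and are not taken from the included studies.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Design effect for a cluster randomised trial: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


def adjusted_standard_error(se, mean_cluster_size, icc):
    """Inflate an individual-level standard error by sqrt(design effect)."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical trial: 400 participants, average cluster size 21, ICC 0.05.
print(design_effect(21, 0.05))
print(effective_sample_size(400, 21, 0.05))
print(adjusted_standard_error(1.5, 21, 0.05))
```

Because the design effect is at least 1 whenever the ICC is non-negative, the adjustment can only widen the CI and reduce the weight of a cluster trial in the pooled analysis.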
Results
Characteristics of included studies
Fig. 1 shows the selection of eligible studies. Comprehensive searching yielded 599 articles, of which 145 were identified as randomised or quasi-randomised controlled trials. Applying the inclusion and exclusion criteria, we excluded 120 of these 145 articles. We then identified and excluded eighteen duplicate articles, leaving seven studies for inclusion(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22–Reference Thuy, Berger, Nakanishi, Khan, Lynch and Dixon28).
Important excluded studies included: one study that compared the combination of NaFeEDTA and Chinese herb v. Chinese herb alone(Reference Liang, Wang and Pan29), two self-controlled before-and-after studies that assessed the effect of NaFeEDTA for Fe deficiency in infants and children respectively(Reference Kahn and Larsen30–Reference Lin, Ji, Liu, Long and Shen32), five controlled before-and-after studies that assessed the effect of NaFeEDTA for Fe deficiency(Reference Garby and Areekul33–Reference Sun, Huang, Li, Wang, Wang, Huo, Chen and Chen38), one study that assessed the effect of NaFeEDTA on the prevention of Fe deficiency in pregnant women(Reference Li and Wang39), one study that compared the effect of the combination of NaFeEDTA and vitamin C v. placebo for anaemia(Reference Yang, Chen and Tan40), and one study that compared the effect of NaFeEDTA v. other Fe preparations (FeSO4, elemental Fe) for Fe deficiency(Reference Sun, Huo, Yu, Miao, Chen, Zhang, Ma, Wang and Li41).
Table 1(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22–Reference Thuy, Berger, Nakanishi, Khan, Lynch and Dixon28) shows the characteristics of the seven included studies. All studies were conducted in developing countries: four in China, two in Vietnam and one in South Africa. The eligible studies comprised two individually randomised controlled trials and five cluster randomised controlled trials. The participants were all from Fe-deficient populations: two studies focused on the general population, three on children and two on women of child-bearing age. In terms of intervention form, five studies used NaFeEDTA-fortified condiments (soya sauce, fish sauce and curry powder), while the other two used tablets containing NaFeEDTA. The intervention dose of Fe from NaFeEDTA ranged from 4·9 to 20·0 mg/d: less than 10·00 mg/d in six intervention arms and 10·00 mg/d or more in two arms. Intervention duration ranged from 3 to 24 months. All studies reported Hb concentration and four studies reported serum ferritin concentration. Only one study reported serum Zn concentration as a possible adverse effect outcome.
RCT, randomised controlled trial; T, total sample size; Pl, placebo; SF, serum ferritin.
* Mean (sd) for Hb values; geometric mean (sd range) for SF values.
† This study had two intervention arms, a high-dose arm and a low-dose arm. The high-dose arm provided an Fe dose of 20 mg/d and the low-dose arm, 5 mg/d.
Methodological quality of included studies
According to the EPOC checklist(19), we assessed the quality of the included studies in six aspects: allocation concealment, follow-up, baseline measurement, blinded assessment of outcomes, reliable outcome measure and protection against contamination. All controlled trials had adequate follow-up, good comparability in baseline measurement between intervention and control groups, blinded assessment of outcome, reliable outcome measures and measures to protect against contamination. Allocation concealment was implemented in four studies(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Wang, Ping, Mao and Huang24, Reference Wang, Ping, Jin, Mao and Huang25, Reference Chen, Zhao and Zhang27), not clear in two studies(Reference Thuy, Berger, Davidsson, Khan, Lam, Cook, Hurrell and Khoi26, Reference Thuy, Berger, Nakanishi, Khan, Lynch and Dixon28) and ‘not done’ in one study(Reference Huo, Sun and Miao23). All the included studies were regarded as ‘acceptable’ in methodological quality and thus were included in the analysis.
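The acceptability rule applied to the six EPOC items can be written as a short check. This is an illustrative sketch only: the rating labels (‘done’, ‘not clear’, ‘not done’) follow the wording used in this review, while the function name `is_acceptable` and the dictionary layout are hypothetical.

```python
EPOC_ITEMS = (
    "allocation concealment",
    "follow-up",
    "baseline measurement",
    "blinded assessment of outcomes",
    "reliable outcome measure",
    "protection against contamination",
)


def is_acceptable(ratings, max_not_done=3):
    """Return True unless more than `max_not_done` items are rated 'not done'.

    `ratings` maps an EPOC item to 'done', 'not clear' or 'not done'.
    Items missing from `ratings` are not counted as 'not done'.
    """
    unknown = set(ratings) - set(EPOC_ITEMS)
    if unknown:
        raise ValueError(f"unknown quality items: {sorted(unknown)}")
    not_done = sum(1 for item in EPOC_ITEMS if ratings.get(item) == "not done")
    return not_done <= max_not_done


# A study with a single 'not done' item remains acceptable under this rule.
example = {item: "done" for item in EPOC_ITEMS}
example["allocation concealment"] = "not done"
print(is_acceptable(example))
```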
Summary of effects and safety
Haemoglobin concentration
Among the seven included studies that reported Hb concentration(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22–Reference Thuy, Berger, Nakanishi, Khan, Lynch and Dixon28), unit of analysis error existed in four cluster randomised controlled trials(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Wang, Ping, Mao and Huang24, Reference Wang, Ping, Jin, Mao and Huang25, Reference Chen, Zhao and Zhang27). We used intracluster correlation coefficients of Hb at family and postcode sector levels provided by the Health Survey for England 1994(Reference Colhoun and Prescott-Clarke42) to compute the design effect and obtained approximately adjusted estimates and standard errors (Table 2) in three studies(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Wang, Ping, Jin, Mao and Huang25, Reference Chen, Zhao and Zhang27). Approximate adjustment could not be undertaken for one study(Reference Wang, Ping, Mao and Huang24), because we could not find any intracluster correlation coefficient of Hb at the class level from external sources and the study did not report the number of clusters; this study was therefore excluded from the meta-analysis.
ICC, intracluster correlation coefficient; N/A, not available.
* Here we used ICC of Hb and serum ferritin at the family level provided by the Health Survey for England 1994(Reference Colhoun and Prescott-Clarke42).
† We did not find any ICC of Hb at the class level from external sources; meanwhile this study did not provide information on the number of clusters. Thus, approximate adjustment analysis could not be done.
‡ Here we used ICC of Hb and serum ferritin at the postcode sector level provided by the Health Survey for England 1994(Reference Colhoun and Prescott-Clarke42).
§ For serum ferritin, the estimates and se are shown on the logarithmic scale.
Finally, six studies(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Huo, Sun and Miao23, Reference Wang, Ping, Jin, Mao and Huang25–Reference Thuy, Berger, Nakanishi, Khan, Lynch and Dixon28), which contributed seven analytic components in total, were included in the meta-analysis. The heterogeneity test showed heterogeneity among studies (P < 0·001). Meta-analysis using the random effects model found that the pooled estimate (weighted mean difference) for Hb with NaFeEDTA was 8·56 (95 % CI 2·21, 14·90) g/l (P = 0·008; Fig. 2). Excluding cluster randomised trials with unit of analysis error in a sensitivity analysis did not materially change the result (weighted mean difference 12·46 (95 % CI 3·77, 21·16) g/l; P = 0·005). Statistical tests for publication bias, the Begg rank correlation method (P = 0·881) and the Egger weighted regression method (P = 0·568), gave no indication of publication bias.
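As a consistency check on the pooled Hb result, the standard error and two-sided P value can be recovered from the reported estimate and 95 % CI. The sketch below assumes a symmetric, normal-based CI; the helper names are hypothetical and this is not the procedure used by RevMan.

```python
import math


def se_from_ci(lower, upper, z_crit=1.959964):
    """Standard error implied by a symmetric normal-based 95 % CI."""
    return (upper - lower) / (2.0 * z_crit)


def two_sided_p(estimate, se):
    """Two-sided P value for a z statistic, using the normal distribution."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))


# Pooled Hb difference reported in the text: 8.56 (95 % CI 2.21, 14.90) g/l.
se = se_from_ci(2.21, 14.90)
p = two_sided_p(8.56, se)
print(round(se, 2), round(p, 3))
```

Under these assumptions the implied P value rounds to the reported 0·008, so the estimate, CI and P value are mutually consistent.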
Subgroup analysis (Table 3) found that the pooled differences with NaFeEDTA were 13·23 (95 % CI 6·50, 19·95) g/l (P < 0·001) in the subgroup with baseline Hb of < 120·00 g/l and 2·53 (95 % CI 1·01, 4·04) g/l (P = 0·001) in the subgroup with higher baseline Hb, indicating that a greater Hb increase was associated with a baseline Hb concentration of < 120·00 g/l (non-overlapping 95 % CI). The pooled differences with NaFeEDTA in the subgroup with an intervention dose of < 10·00 mg/d and the subgroup with the higher dose were 5·92 (95 % CI − 0·65, 12·48) g/l (P = 0·080) and 15·14 (95 % CI 2·60, 27·69) g/l (P = 0·020), respectively. Thus we found no relationship between Hb increase and intervention dose (overlapping 95 % CI).
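The overlap criterion used to compare subgroups can be expressed directly. This is a minimal sketch with a hypothetical helper name; it reproduces only the CI comparison described in the text and is not a formal test for subgroup differences.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# 95 % CI from the subgroup analysis, in g/l.
baseline_low = (6.50, 19.95)   # baseline Hb < 120 g/l
baseline_high = (1.01, 4.04)   # baseline Hb >= 120 g/l
dose_low = (-0.65, 12.48)      # dose < 10 mg Fe/d
dose_high = (2.60, 27.69)      # dose >= 10 mg Fe/d

print(intervals_overlap(baseline_low, baseline_high))
print(intervals_overlap(dose_low, dose_high))
```

Comparing CI in this way is conservative: non-overlap indicates a difference, but overlapping intervals do not by themselves rule one out.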
Serum ferritin concentration
Four included studies(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Thuy, Berger, Davidsson, Khan, Lam, Cook, Hurrell and Khoi26–Reference Thuy, Berger, Nakanishi, Khan, Lynch and Dixon28) reported serum ferritin concentration, and unit of analysis error existed in two cluster randomised controlled trials(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Chen, Zhao and Zhang27). We used intracluster correlation coefficients of serum ferritin at family and postcode sector levels provided by the Health Survey for England 1994(Reference Colhoun and Prescott-Clarke42) to compute the design effect and obtained approximately adjusted estimates and standard errors (Table 2) in both studies(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Chen, Zhao and Zhang27).
Finally, all four studies(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22, Reference Thuy, Berger, Davidsson, Khan, Lam, Cook, Hurrell and Khoi26–Reference Thuy, Berger, Nakanishi, Khan, Lynch and Dixon28) were included in the meta-analysis. The heterogeneity test showed heterogeneity among studies (P = 0·010). Meta-analysis using the random effects model found that the pooled difference for serum ferritin with NaFeEDTA was 1·58 (95 % CI 1·20, 2·09) μg/l (P < 0·001; Fig. 3). Excluding cluster randomised trials with unit of analysis error in a sensitivity analysis did not materially change the result (weighted mean difference 2·29 (95 % CI 1·62, 3·16) μg/l; P < 0·001).
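Because serum ferritin was pooled on the logarithmic scale, the pooled estimate and its CI have to be exponentiated before they are reported. The sketch below shows that back-transformation under the assumption of a normal-based CI on the log scale; the function name and the input values are hypothetical. Strictly, a back-transformed difference of log means is a ratio of geometric means between groups.

```python
import math


def back_transform(log_estimate, log_se, z_crit=1.959964):
    """Exponentiate a log-scale estimate and its normal-based 95 % CI.

    Returns (estimate, lower, upper) on the original scale.
    """
    lower = math.exp(log_estimate - z_crit * log_se)
    upper = math.exp(log_estimate + z_crit * log_se)
    return math.exp(log_estimate), lower, upper


# Hypothetical log-scale pooled estimate and standard error.
estimate, lower, upper = back_transform(0.40, 0.15)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

The back-transformed interval is asymmetric around the estimate, which is why the CI reported for serum ferritin is not centred on the point estimate.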
Possible adverse effects
One study(Reference Ballot, MacPhail, Bothwell, Gillooly and Mayet22) reported the effect of NaFeEDTA on serum Zn concentration; there was no difference (mean difference 0·1 (95 % CI − 1·6, 1·8) μmol/l; P = 0·910; power 90·0 %) in serum Zn concentration between the intervention group and control group. No other possible adverse effect was reported.
Discussion
Inclusion and exclusion criteria
Besides Hb, we included serum ferritin as an outcome in our systematic review. Serum ferritin has been found to be more sensitive than Hb for measuring change in Fe status(Reference Cook, Lipschitz, Miles and Finch43, Reference Hallberg44). Research has shown that 1 μg/l of serum ferritin represents 8·0–10·0 mg of body Fe stores(Reference Walters, Miller and Worwood45–Reference Jacob, Sanstead, Klevay and Johnson47). A recent systematic review also indicated the importance of this outcome(Reference Gera, Sachdev, Nestel and Sachdev48).
Methods of review
In terms of quality assessment, scales with multiple items and complex scoring systems were not supported by empirical evidence(Reference Jüni, Witschi, Bloch and Egger49). In our systematic review, we used quality-assessment criteria (including six items) established by the Cochrane EPOC review group based on threats to validity of studies(19). The criteria did not provide cut-points to define high-quality studies or low-quality studies. Considering that restriction to high-quality studies may exclude much information, while inclusion of low-quality studies may bias the summary effect estimate, we defined studies in which more than three items were regarded as ‘not done’ as ‘unacceptable’ in methodological quality and we did not include such studies in our analysis.
For continuous outcomes, analysis based on ‘change values’ is usually more efficient and powerful than comparison of final values, as it removes a component of between-individual variability from the analysis(Reference Higgins and Green20). In our systematic review, all included studies reported only ‘final values’, and we could not compute the sd of change values because se, t values or P values were not provided. However, because there was no substantial difference between groups in baseline measurements in any included study, the difference in mean final values would on average be the same as the difference in mean change values. Thus an analysis based on final values could be assumed to address the same underlying effects as a comparison of change values(Reference Higgins and Green20). We therefore used final values to undertake the meta-analysis and did not impute the standard deviation of change values using a correlation coefficient between the pre-test and post-test measurements.
Unit of analysis error, which arises when the cluster design effect is ignored and the analysis is undertaken at the individual level, exists in many cluster randomised controlled trials(Reference Simpson, Klar and Donner50–Reference Isaakidis and Ioannidis55). This mistake tends to produce false-positive findings, which Cornfield called a self-deceiving action(Reference Cornfield56). In a meta-analysis, cluster randomised controlled trials with unit of analysis error have spuriously narrow CI and are therefore mistakenly given too much weight. In our systematic review, we performed approximate adjustment for trials with this kind of error and undertook sensitivity analysis for this adjustment.
Investigation of sources of heterogeneity will increase both the scientific and the clinical relevance of the results of meta-analyses(Reference Thompson57). Subgroup analysis and meta-regression are usual methods to explore heterogeneity of effect. It is very unlikely that meta-regression will produce useful findings unless there are at least ten studies(Reference Higgins and Green20). In our systematic review, we only undertook subgroup analysis in Hb outcome, as the number of studies included was less than ten. It has been suggested that the number of investigated variables should be small enough and the scientific rationale for investigating each characteristic should be ensured(Reference Higgins and Green20). We selected baseline Hb and intervention dose as the investigated variables and excluded two other variables, duration of intervention and form of intervention. A previous review indicated that 2 or 3 months should be a threshold to detect an association between duration of intervention and Hb effect(Reference Gera, Sachdev, Nestel and Sachdev48), while the duration was at least 3 months in all the included studies of our review. For most Fe salts, the absorption from supplements (such as tablets) is significantly higher than from fortified food, as the absorption of Fe is considerably inhibited by food vehicles such as wheat, maize and rice(Reference Layrisse and Martinez-Torres58, Reference Layrisse, Martinez-Torres, Renzi, Velez and Gonzalez59). On the contrary, it has been demonstrated that NaFeEDTA exchanges completely with food Fe in the lumen of the gut but with the characteristic that the absorption is higher than expected from other Fe salts used as Fe fortification(Reference Layrisse, Martinez-Torres, Cook, Walker and Finch60). This means the absorption of NaFeEDTA in the fortified form will probably not be different from the supplementation form, and thus different forms of intervention will not contribute to heterogeneity. 
For the serum ferritin outcome, we did not perform subgroup analysis because only four studies were included.
Funnel plots are a common way to identify publication bias. Symmetry or asymmetry is generally judged by visual examination, but visual interpretation may vary between observers(Reference Villar, Piaggio, Carroli and Donner61). In our systematic review, we therefore used more formal statistical methods to examine publication bias in the Hb outcome(Reference Begg and Mazumdar62, Reference Egger, Smith, Schneider and Minder63). For the serum ferritin outcome, we did not undertake statistical testing because there is limited power to detect bias when the number of studies is small(Reference Higgins and Green20).
Results of analysis
The results of this systematic review showed that NaFeEDTA supplementation significantly increased both the Hb concentration and the serum ferritin concentration of Fe-deficient populations. The differences from the placebo group of 8·56 g/l in final Hb and 1·58 μg/l in final serum ferritin were both substantial and of public health significance. For both outcomes of interest, sensitivity analysis excluding cluster randomised controlled trials with unit of analysis error showed that the results were robust. In subgroup analysis, a notable finding was the substantially higher increase in Hb values among those with a baseline Hb of < 120·00 g/l, which is supported by evidence that lower Fe status enhances Fe absorption(Reference Hunt64, Reference Fairweather-Tait and Teucher65). Contrary to expectation, no significant association was found between the dose of intervention and Hb response. However, the data may have been inadequate to detect an association because of the small number of included studies.
As to safety of NaFeEDTA, neither effect on serum Zn nor other adverse effects were found in our systematic review. This was in accordance with safety assessment results (mainly based on animal and human experiments) from the Joint FAO/WHO Expert Committee on Food Additives and US Food and Drug Administration(66–68). The two institutions claimed that below the allowable dose, NaFeEDTA could be ‘generally recognised as safe’ or ‘safe’ when used for food fortification.
Limitations of analysis
Five limitations merit consideration. First, allocation concealment was not performed in one included study and was not clear in two included studies. Empirical evidence has shown that this is associated with bias(Reference Moher, Pham, Jones, Cook, Jadad, Moher, Tugwell and Klassen69). However, sensitivity analysis excluding these three studies suggested that this bias was unlikely to materially alter the main results of our analysis (data not shown). Second, the results of meta-analysis in this review came from largely heterogeneous data derived from randomised controlled trials. Differences in characteristics such as age group, baseline Hb level and intervention dose might have contributed to heterogeneity among included studies. However, we believe it was appropriate to combine data from heterogeneous studies in random effects meta-analyses because each study addressed the effect of NaFeEDTA on the outcomes of interest (Hb and/or serum ferritin) in Fe-deficient populations. We also undertook subgroup analyses to explore whether baseline Hb and intervention dose were significant predictors of heterogeneity in the Hb outcome. Third, we used intracluster correlation coefficients from an external source (Health Survey for England 1994(Reference Colhoun and Prescott-Clarke42)) to perform approximate adjustment for cluster randomised controlled trials with unit of analysis error. Although differences between the population in England and populations in developing countries may have affected the results of the adjustment, sensitivity analysis demonstrated that the results were robust. Fourth, because three included studies did not examine serum ferritin, we could only combine data from the four studies that reported this outcome to assess the effect of NaFeEDTA on serum ferritin. Finally, two studies used tablets containing NaFeEDTA and the remainder used NaFeEDTA-fortified soya sauce, fish sauce and curry powder. 
Since none of the studies included cereals (wheat, maize, etc) as the vehicle for fortification, the results of our systematic review cannot be extrapolated to the use of NaFeEDTA in cereal products.
Implication for future studies
Effectiveness of NaFeEDTA for Fe deficiency has been validated in our systematic review. Future systematic reviews should be carried out to compare the effect of NaFeEDTA v. other commonly used Fe preparations (such as FeSO4) for Fe deficiency.
Conclusion
In summary, our systematic review found that NaFeEDTA substantially increased Hb concentration and serum ferritin concentration in Fe-deficient populations. Lower baseline Hb concentration was associated with a greater Hb increase. No adverse effect was found. The application of NaFeEDTA will probably play an important role in controlling Fe deficiency.
Acknowledgements
The present study was performed at the Department of Epidemiology and Biostatistics, School of Public Health (Beijing, China).
We have not received any external support. We thank Dr Jianping Liu (Beijing University of Chinese Medicine, China) for assistance in analysis methods. B. W. applied the search strategy, performed the retrieval of articles, extracted the data from the included studies, performed the statistical analysis, and drafted the manuscript. L. L. developed the idea for the review, conceived the study design, finalised the protocol, and provided critical input for the writing of the manuscript. S. Z. provided critical input for the design of the present study, the statistical analysis, the interpretation of data, and the early versions of the manuscript. Y. X. helped with the search strategy, data extraction and statistical analysis. All of the authors took part in the discussion of the results and contributed to the drafting of the final version of the manuscript. None of the authors had any conflicts of interest.