
A critical assessment of matching-adjusted indirect comparisons in relation to target populations

Published online by Cambridge University Press:  21 March 2025

Ziren Jiang
Affiliation:
School of Public Health, Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN, USA
Jialing Liu
Affiliation:
School of Public Health, Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN, USA
Demissie Alemayehu
Affiliation:
Statistical Research and Data Science Center, Pfizer Inc., New York, NY, USA
Joseph C. Cappelleri
Affiliation:
Statistical Research and Data Science Center, Pfizer Inc., New York, NY, USA
Devin Abrahami
Affiliation:
Global Access and Value, Pfizer Inc., New York, NY, USA
Yong Chen
Affiliation:
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania, Philadelphia, PA, USA
Haitao Chu*
Affiliation:
School of Public Health, Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN, USA Statistical Research and Data Science Center, Pfizer Inc., New York, NY, USA
*
Corresponding author: Haitao Chu; Email: [email protected]

Abstract

Matching-adjusted indirect comparison (MAIC) has been increasingly applied in health technology assessments (HTA). By reweighting subjects from a trial with individual participant data (IPD) to match the summary statistics of covariates in another trial with aggregate data (AgD), MAIC enables a comparison of the interventions for the AgD trial population. However, when there are imbalances in effect modifiers with different magnitudes of modification across treatments, contradictory conclusions may arise if MAIC is performed with the IPD and AgD swapped between trials. This can lead to the “MAIC paradox,” where different entities reach opposing conclusions about which treatment is more effective, despite analyzing the same data. In this paper, we use synthetic data to illustrate this paradox and emphasize the importance of clearly defining the target population in HTA submissions. Additionally, we recommend making de-identified IPD available to HTA agencies, enabling further indirect comparisons that better reflect the overall population represented by both IPD and AgD trials, as well as other relevant target populations for policy decisions. This would help ensure more accurate and consistent assessments of comparative effectiveness.

Type
Research-in-Brief
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology

Highlights

What is already known:

  • Matching-adjusted indirect comparison (MAIC) methods are increasingly used in health technology assessment (HTA) submissions to adjust for population differences.

  • MAIC estimates the comparative effectiveness of interventions for the population represented by the trial with aggregate data (AgD).

What is new:

  • We present an illustration demonstrating an MAIC paradox in which the comparative effectiveness conclusions are reversed by switching the availability of IPD and AgD while adjusting the same set of effect modifiers.

  • Additionally, we examine how variations in the overlap of effect-modifier distributions between trials influence the estimated comparative effectiveness.

Potential impact for Research Synthesis Methods readers:

  • Through an illustrative example, we emphasize the vital importance of clearly defining the target population when applying MAIC in HTA submissions.

  • We recommend providing de-identified IPD to HTA agencies to enable a more accurate assessment of comparative effectiveness for the target population.

1 Background of MAIC

Effect modification occurs when the magnitude of the effect of a treatment on an outcome differs depending on the value of a third variable. For example, studies indicate that Black individuals may experience less favorable outcomes compared to non-Black individuals when treated with angiotensin-converting enzyme (ACE) inhibitor-based therapies.[1, 2] In health technology assessments (HTAs), pharmaceutical companies are required to benchmark their new drugs against the prevailing standard of care for reimbursement decisions by HTA agencies.[3] Nevertheless, the presence of effect modifiers poses a unique challenge in comparing treatments when head-to-head trials are lacking. The traditional Bucher method,[4] which compares the relative treatment effects of two interventions assessed in two randomized trials without covariate balancing, is limited to scenarios where all effect modifiers are balanced across trial populations.[5] When individual participant data (IPD) are available for one trial while only aggregate data (AgD) are available for the other, researchers introduced population-adjusted indirect comparison (PAIC) methods to obtain unbiased estimates of comparative effectiveness, particularly when imbalances in effect modifiers exist. PAIC methods include matching-adjusted indirect comparison (MAIC),[6] simulated treatment comparison (STC),[7] and multilevel network meta-regression (ML-NMR).[8]

Among these methods, MAIC has become increasingly popular and is widely used in health technology appraisals, such as those of the National Institute for Health and Care Excellence (NICE) in the United Kingdom.[9] In a recent methodological systematic review, 88.9% (144 out of 162) of PAIC studies used MAICs.[10] MAIC estimates a set of balancing weights for each subject in the IPD trial such that the weighted summary statistics (e.g., mean and standard deviation) of covariates in the IPD trial match the reported summaries of the same covariates in the AgD trial. MAIC then compares the marginal treatment effect estimated from the weighted IPD with the marginal treatment effect reported in the AgD trial.[5, 11] For a more detailed description of MAIC methods, readers can refer to the review paper by Jiang et al.[12]
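To make the weighting step concrete, the original moment-matching MAIC estimates weights of the form $w_i \propto \exp(\alpha^\top x_i)$, where $\alpha$ solves the condition that the weighted IPD covariate means equal the AgD means. The sketch below is our own illustration, not the authors' implementation; the function name and the synthetic one-covariate data (mirroring the binary race variable in the example of Section 2) are ours.

```python
import numpy as np
from scipy.optimize import minimize

def maic_weights(X_ipd, agd_means):
    """Estimate MAIC weights w_i = exp(alpha' x_i), normalized to sum to 1,
    so that the weighted IPD covariate means match the AgD means."""
    Xc = X_ipd - agd_means  # center IPD covariates at the AgD means
    # Convex objective whose gradient is the moment condition
    # sum_i exp(alpha' x_ci) x_ci = 0
    obj = lambda a: np.sum(np.exp(Xc @ a))
    grad = lambda a: Xc.T @ np.exp(Xc @ a)
    res = minimize(obj, np.zeros(Xc.shape[1]), jac=grad, method="BFGS")
    w = np.exp(Xc @ res.x)
    return w / w.sum()

# 1,200 subjects with one binary covariate: 800 non-Black (x = 1) and
# 400 Black (x = 0); reweight so the non-Black proportion becomes 1/3.
x = np.r_[np.ones(800), np.zeros(400)].reshape(-1, 1)
w = maic_weights(x, np.array([1 / 3]))
print(np.sum(w * x.ravel()))  # weighted non-Black proportion, approx. 1/3
```

With a single binary covariate, this numerical solution reproduces the closed-form weights used in Section 2: 1/2400 for each non-Black subject and 1/600 for each Black subject.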

Notably, the indirect comparison result is only valid with respect to the population represented by the AgD trial. However, this may not align with the population of interest for the company conducting the MAIC, which might prioritize the population represented by the IPD trial or another specific target population. This paper presents an illustrative example showing that MAICs can yield conflicting comparative effectiveness results when switching the availability of AgD and IPD between trials. This paradoxical phenomenon occurs due to differing magnitudes of effect modification for the two drugs by an effect modifier that is also imbalanced between the trial populations.

2 An illustrative example of the MAIC paradox

Consider an anchored indirect comparison between drug A (from Company A) and drug B (from Company B), each compared to a common placebo comparator C, as depicted in Figure 1. Each company has access only to IPD from its own trial and to AgD for the other company’s trial through published sources. For simplicity, we assume that race (Black versus non-Black) is the sole effect modifier, allowing both MAICs to include the “correct” effect-modifying variable. Additionally, we assume that drug A shows a stronger treatment effect among Black participants, while drug B is more effective among non-Black participants. If the AC trial includes a higher proportion of non-Black participants (among whom drug A is less effective than drug B), while the BC trial predominantly includes Black participants, we can observe a paradox: in separate MAICs, drug A outperforms drug B in the BC trial population, while drug B outperforms drug A in the AC trial population.

Figure 1 Indirect comparison of Drug A versus B in two trials. For the AC trial, we have the individual participant data (IPD). For the BC trial, we only have the aggregate level data (AgD).

Table 1 presents hypothetical data of the AC and BC trials. Let ${n}_{11}=800,{n}_{10}=400$ be the number of non-Black and Black patients in the AC trial, and ${n}_{21}=400,{n}_{20}=800$ denote the number of non-Black and Black patients in the BC trial, respectively. The MAIC performed by Company A calculates the weights ${w}_1$ for all non-Black patients and ${w}_0$ for all Black patients in the AC trial such that the weighted proportion of non-Black patients $\frac{n_{11}{w}_1}{n_{11}{w}_1+{n}_{10}{w}_0}$ matches the proportion of non-Black patients in the BC trial $\frac{n_{21}}{n_{21}+{n}_{20}}=\frac{1}{3}$ (see Table 1), subject to the constraint that weights sum to 1 (i.e., ${n}_{11}{w}_1+{n}_{10}{w}_0=1$ ). Solving the equation, we have ${w}_1=\frac{1}{3{n}_{11}}=\frac{1}{2400}$ and ${w}_0=\frac{2}{3{n}_{10}}=\frac{1}{600}$ . Based on the trial data, under the usual logit link, the estimated treatment effect (log of the odds ratio of the survival rate) for drug A versus drug C in the population of BC trial would be

$$\begin{align*}\log \left(\frac{\frac{80\times {w}_1+180\times {w}_0}{400\times {w}_1+200\times {w}_0}}{\left(1-\left(\frac{80\times {w}_1+180\times {w}_0}{400\times {w}_1+200\times {w}_0}\right)\right)}\right)-\log \left(\frac{\frac{40\times {w}_1+80\times {w}_0}{400\times {w}_1+200\times {w}_0}}{\left(1-\left(\frac{40\times {w}_1+80\times {w}_0}{400\times {w}_1+200\times {w}_0}\right)\right)}\right)=1.540.\end{align*}$$

Table 1 Results for the illustrative example. In this example, the risk difference in survival rate for Drug A versus Drug C in the AC trial is 10% for non-Black patients and 50% for Black patients; the risk difference in survival rate for Drug B versus Drug C in the BC trial is 40% for non-Black patients and 20% for Black patients

* logOR: log odds ratio.

** Y: Outcome variable with Y = 1 indicating death and Y = 0 indicating survival.

*** n: Sample size.

The indirect comparison of drug A versus B can then be obtained by subtracting the estimated marginal treatment effect for drug B in the BC trial from this value, which gives $1.540-1.115=0.425$ . The corresponding standard error can be calculated using the robust sandwich estimator, which accounts for the fact that the weights are estimated. The 95% confidence interval can then be constructed as $\left(0.048,0.802\right)$ , indicating that drug A is statistically significantly better than drug B in the population of the BC trial.
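The arithmetic above can be reproduced in a few lines. The cell counts are read off the worked equation (survivor counts per race-by-arm cell); the helper function name is ours.

```python
from math import log

# Closed-form weights for a single binary covariate (race): match the
# weighted non-Black proportion in the AC trial (800 of 1,200) to the
# BC trial proportion (1/3), with the weighted total equal to 1.
n11, n10 = 800, 400                      # non-Black / Black counts, AC trial
w1, w0 = 1 / (3 * n11), 2 / (3 * n10)    # = 1/2400 and 1/600

def wlogit(surv_nb, n_nb, surv_b, n_b):
    """Weighted log-odds of survival for one arm."""
    p = (surv_nb * w1 + surv_b * w0) / (n_nb * w1 + n_b * w0)
    return log(p / (1 - p))

# Survivors / arm sizes by race, taken from the worked equation
logor_AC = wlogit(80, 400, 180, 200) - wlogit(40, 400, 80, 200)
print(round(logor_AC, 3))       # 1.54
indirect_AB = logor_AC - 1.115  # subtract reported B-vs-C effect in BC
print(round(indirect_AB, 3))    # 0.425
```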

Similarly, for the MAIC performed by Company B with IPD and AgD switched, the weights can be determined by matching the weighted proportion of non-Black patients in the BC trial (IPD trial) to the proportion in the AC trial (AgD trial). This gives ${w}_1=\frac{2}{3{n}_{21}}=\frac{1}{600}$ and ${w}_0=\frac{1}{3{n}_{20}}=\frac{1}{2400}$ . Performing the same calculations, the estimated comparative effectiveness for drug A versus drug B in the population of the AC trial is $-0.402$ with a 95% confidence interval $\left(-0.790,-0.014\right)$ , which indicates that drug B is significantly better than drug A in the population of the AC trial.
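As a quick check (variable names are ours), Company B's weights satisfy the same two constraints: the weighted non-Black proportion in the BC trial equals the AC trial proportion of 2/3, and the weighted total equals one.

```python
# BC trial counts: 400 non-Black, 800 Black, reweighted toward the AC
# trial, where 2/3 of patients are non-Black.
n21, n20 = 400, 800
w1 = 2 / (3 * n21)   # 1/600, weight for each non-Black patient
w0 = 1 / (3 * n20)   # 1/2400, weight for each Black patient

total = n21 * w1 + n20 * w0
prop = n21 * w1 / total
print(total, prop)   # 1.0 and 2/3: both matching constraints hold
```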

Here, we emphasize that both conclusions—drug A being more effective than drug B or B being more effective than A—are potentially valid within the context of the specific populations considered in this example. However, without a clearly defined target population, the results of the indirect comparison lack meaningful applicability for guiding medical decisions.

In Supplementary Material Section S1, we show a step-by-step derivation of the weights. In Section S2, we further explore the impact of varying the proportions of non-Black participants in the IPD and AgD populations separately on the MAIC results (see Figures S1a and S1b). In addition, we present an example illustrating this paradox in unanchored MAIC in Section S3.

3 Discussion

In this manuscript, we highlight a potential paradox, referred to as the “MAIC paradox,” whereby both companies may claim superiority of their own drugs through MAIC, even when the same covariates, including all effect modifiers, are included in the analyses. This paradox arises when there are imbalances in effect modifiers with different magnitudes of modification across treatments. The key issue is the lack of careful consideration of the target population: if the target population is not clearly defined or appropriately selected for the context, results from MAIC may lead to misleading or contradictory conclusions. Health Technology Assessment (HTA) appraisals therefore need to explicitly define the target population that is most relevant for policy decision-making, both to ensure valid and consistent results and to avoid the misinterpretations that the MAIC paradox can produce. Additionally, population overlap can affect MAIC results. Indirect comparison results tend to be more consistent when there is a high degree of overlap between populations. Conversely, if neither the AC trial population nor the BC trial population is comparable to the target population, MAIC may not reliably estimate the most relevant treatment effect. A more detailed discussion is provided in Supplementary Material Section S2.

The dependence of MAIC results on the target population has been discussed elsewhere as well. Following a review of NICE appraisals, Phillippo et al.[9] observed that many appraisals overlooked the fact that comparative effectiveness was estimated over the population of the AgD trial, which might not represent the target population of interest. An example provided by the NICE DSU Technical Support Document 18[11] illustrates how contradictory conclusions may arise: when Novartis and AbbVie each used MAIC to compare their drugs secukinumab and adalimumab, Novartis claimed significant efficacy advantages for secukinumab, while AbbVie argued that adalimumab had comparable efficacy and was more cost-effective. This discrepancy arose in part because the two MAICs used different sets of covariates, which complicated the search for the cause of the inconsistency. In our manuscript, we show that such paradoxes can still occur even when the correct MAIC model is used with the same set of covariates.

This manuscript illustrates how the target population affects the results of MAIC. However, the results of other PAIC methods, such as STC, also depend on the target population. Additionally, extrapolating outside the IPD population increases the risk of bias for STC, a concern shared by any regression-based or other adjustment method. In the presence of effect modifiers, indirect comparison results are only valid when the target population is clearly defined.

Phillippo et al.[5] proposed an additional sufficient condition for validly extrapolating comparative effectiveness results to other populations, known as the “shared effect modifier assumption.” This assumption has two key components: (1) the treatment effect modifiers are the same for all treatments, and (2) the magnitude of each effect modifier (i.e., how much it influences the treatment effect) is the same for all included treatments. Under the classical two-trial scenario, this assumption is statistically untestable, as only one trial provides IPD, leaving the other trial’s effect modifiers uncertain. Therefore, its validity can only be justified from a clinical perspective, relying on existing knowledge about the disease and the treatments. This highlights the importance of clinical consistency and a sound understanding of the treatments involved when applying MAIC or other PAIC methods.

Finally, we advocate for a collaborative effort among all relevant stakeholders to make de-identified IPD from clinical trials available through a trusted authority. Access to IPD from both trials would allow for a more robust approach that balances the entire covariate distribution rather than just the moments (such as the means) of the covariates. This would help mitigate the risk of ecological fallacy when drawing inferences across the two populations. However, data sharing must be done with the informed consent of trial participants, ensuring that appropriate de-identification protocols are followed to minimize the risk of participant re-identification.[13] In addition to direct sharing of IPD, interim solutions could include federated learning algorithms[14–17] and secure data-sharing infrastructures,[18] which allow for the analysis of summary statistics without exposing sensitive individual data.

Author contributions

  • Ziren Jiang: Methodology; Software; Writing - review & editing; Writing - original draft; Visualization.

  • Jialing Liu: Writing - review & editing; Writing - original draft; Validation; Software.

  • Demissie Alemayehu: Writing - review & editing; Writing - original draft; Validation; Supervision.

  • Joseph C. Cappelleri: Writing - review & editing; Validation; Writing - original draft; Supervision.

  • Devin Abrahami: Writing - review & editing; Writing - original draft; Validation; Supervision.

  • Yong Chen: Supervision; Writing - review & editing; Writing - original draft; Validation.

  • Haitao Chu: Conceptualization; Writing - original draft; Writing - review & editing; Methodology; Investigation; Supervision.

Competing interest statement

Demissie Alemayehu, Joseph C. Cappelleri, Devin Abrahami, and Haitao Chu are employed by Pfizer and own stock in the company. However, the content of this manuscript is strictly educational, instructive, and methodological, and does not involve any real medicinal intervention.

Data availability statement

The simulated data used in the illustrative example are presented in Table 1.

Funding statement

This work was supported in part by National Institutes of Health grants U01TR003709, U24MH136069, 1R01AG077820, R01AG073435, R56AG074604, R01LM013519, R01DK128237, R21AI167418 and R21EY034179 (Yong Chen).

Supplementary material

To view supplementary material for this article, please visit http://doi.org/10.1017/rsm.2025.10.

References

1. Exner, DV. Lesser response to angiotensin-converting enzyme inhibitor therapy in black as compared with white patients with left ventricular dysfunction. N Engl J Med. Published online 2001.
2. Ogedegbe, G, Shah, NR, Phillips, C, et al. Comparative effectiveness of angiotensin-converting enzyme inhibitor-based treatment on cardiovascular outcomes in hypertensive blacks versus whites. J Am Coll Cardiol. 2015;66(11):1224–1233. doi:10.1016/j.jacc.2015.07.021.
3. Dias, S, Sutton, AJ, Ades, AE, Welton, NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Making. 2013;33(5):607–617.
4. Bucher, HC, Guyatt, GH, Griffith, LE, Walter, SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997;50(6):683–691.
5. Phillippo, DM, Ades, AE, Dias, S, Palmer, S, Abrams, KR, Welton, NJ. Methods for population-adjusted indirect comparisons in health technology appraisal. Med Decis Making. 2018;38(2):200–211. doi:10.1177/0272989X17725740.
6. Signorovitch, JE, Wu, EQ, Yu, AP, et al. Comparative effectiveness without head-to-head trials: a method for matching-adjusted indirect comparisons applied to psoriasis treatment with adalimumab or etanercept. PharmacoEconomics. 2010;28(10):935–945. doi:10.2165/11538370-000000000-00000.
7. Ishak, KJ, Proskorovsky, I, Benedict, A. Simulation and matching-based approaches for indirect comparison of treatments. PharmacoEconomics. 2015;33(6):537–549. doi:10.1007/s40273-015-0271-1.
8. Phillippo, DM, Dias, S, Ades, AE, et al. Multilevel network meta-regression for population-adjusted treatment comparisons. J R Stat Soc Ser A Stat Soc. 2020;183(3):1189–1210.
9. Phillippo, DM, Dias, S, Elsada, A, Ades, AE, Welton, NJ. Population adjustment methods for indirect comparisons: a review of National Institute for Health and Care Excellence technology appraisals. Int J Technol Assess Health Care. 2019;35(3):221–228. doi:10.1017/S0266462319000333.
10. Truong, B, Tran, LT, Le, TA, Pham, TT, Vo, T. Population adjusted-indirect comparisons in health technology assessment: a methodological systematic review. Res Synth Methods. 2023;14(5):660–670. doi:10.1002/jrsm.1653.
11. Phillippo, D, Ades, T, Dias, S, Palmer, S, Abrams, KR, Welton, N. NICE DSU Technical Support Document 18: Methods for Population-Adjusted Indirect Comparisons in Submissions to NICE. Published online 2016.
12. Jiang, Z, Cappelleri, JC, Gamalo, M, Chen, Y, Thomas, N, Chu, H. A comprehensive review and shiny application on the matching-adjusted indirect comparison. Res Synth Methods. Published online February 21, 2024. doi:10.1002/jrsm.1709.
13. Ohmann, C, Banzi, R, Canham, S, et al. Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open. 2017;7(12):e018647. doi:10.1136/bmjopen-2017-018647.
14. Jordan, MI, Lee, JD, Yang, Y. Communication-efficient distributed statistical inference. J Am Stat Assoc. 2019;114(526):668–681. doi:10.1080/01621459.2018.1429274.
15. Luo, C, Islam, MdN, Sheils, NE, et al. DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models. Nat Commun. 2022;13(1):1678. doi:10.1038/s41467-022-29160-4.
16. Duan, R, Ning, Y, Chen, Y. Heterogeneity-aware and communication-efficient distributed statistical inference. Biometrika. 2022;109(1):67–83. doi:10.1093/biomet/asab007.
17. Duan, R, Luo, C, Schuemie, MJ, et al. Learning from local to global: an efficient distributed algorithm for modeling time-to-event data. J Am Med Inform Assoc. 2020;27(7):1028–1036. doi:10.1093/jamia/ocaa044.
18. Raisaro, JL, Marino, F, Troncoso-Pastoriza, J, et al. SCOR: a secure international informatics infrastructure to investigate COVID-19. J Am Med Inform Assoc. 2020;27(11):1721–1726. doi:10.1093/jamia/ocaa172.