Medicines regulators have defined “orphan products” as those that treat patients with life-threatening or chronically debilitating “rare conditions” that have no satisfactory treatment.
The classification of a “rare condition” (Table 1) differs from country to country, but all classification systems cover a wide spectrum of diverse conditions, with 5,000–8,000 rare conditions in the European Union and United States.
The low prevalence of rare diseases presents the obvious challenge of paucity of patients to study, but other challenges also arise. Many rare conditions are genetic, metabolic diseases that are highly heterogeneous, so understanding natural history and demonstrating the value of treatment at a population level are challenging. Regulators have created a range of initiatives to support development of orphan products, including scientific advice to help manufacturers develop study plans, conditional authorization/licensing, and periods of market exclusivity, which have led to good collaboration among countries and stakeholders (Reference Aymé and Rodwell1).
Despite these initiatives, over the decade from 2000, only 108 European marketing authorization applications were made for orphan products. Of these, only sixty-three products were licensed for seventy-three indications in forty-six conditions (Reference Joppi, Bertele and Garattini2) and only thirty-eight (60 percent) provided multi-country randomized controlled trials (RCTs). One third of these products were tested in fewer than 100 patients.
As health technology assessment (HTA) is now often performed at the point of marketing authorization, the paucity of evidence available for orphan products at this time presents a challenge for traditional HTA processes. Indeed, a review of reimbursement decisions for twenty-five orphan products in Belgium showed that only thirteen (52 percent) included an RCT and that in many cases long-term effectiveness, safety, and optimal dose were unclear (Reference Dupont and van Wilder3). In some, hundreds of patients were evaluated in one or more uncontrolled trials, whereas better evidence would have been achieved from one well-designed RCT with good follow-up.
Traditionally, HTA organizations have paid less attention to products for rare diseases than regulators have, but this is changing as more health systems use HTA to assess or re-assess reimbursement of products for rare diseases. This is exemplified by the publication in 2013 of new guidance relating to assessment of rare diseases by the National Institute for Health and Care Excellence in England/Wales, the Ministry of Health and Long Term Care in Ontario, and the Ministry of Health in the Netherlands.
HTA organizations often apply different standards of assessment for diagnostics, devices and surgical interventions because the evidence base is limited, with few RCTs. However, all medicines seem to be judged by higher standards, whatever the feasibility of study. A systematic review found that of twenty-four HTA methods manuals, only five contained specific information about assessing rare diseases (4). These referred to general design and analytical issues associated with studying small populations, but were not specific to the particular issues of rare diseases.
Given the small, heterogeneous populations and high unmet need in the thousands of rare diseases for which there is no authorized product, there is an imperative for collaboration to advise on the design of medicines development programs to produce the evidence necessary to demonstrate clinical effectiveness in HTA. This requires an understanding of the developing work of regulatory agencies about clinical trials in small populations, a focus on outcomes relevant to HTA, and consideration of issues related to modeling and interpretation of value.
The objective of this study is to summarize research methods that have particular relevance to the small, heterogeneous populations that have rare diseases so that evidence can be generated to inform robust HTA decisions.
METHODS
The authors of this study participated in a panel discussion at the HTAi 2012 Annual Meeting, discussing best HTA practices to assess the clinical effectiveness of therapies for rare diseases. This included a presentation of the systematic review of HTA methods for rare diseases (4), which led us to consider which research methods might be most valuable to generate sufficient evidence for HTA from the small, heterogeneous populations that are found in many rare diseases.
The HTAi panel session presentations were used as a basis for this study and augmented by a review of grey literature to identify policies on developing clinical research for rare diseases, with a focus on North America and Europe. This was supplemented by specific methodological papers suitable for small populations and examples from patient experts.
RESULTS
This section presents research methods that may be appropriate for demonstrating the clinical effectiveness of a product for a rare disease, showing that even with small heterogeneous populations there are robust study designs and analyses that can generate evidence that is valuable for HTA.
Registries
Registries provide a framework for the systematic generation and collection of long-term data about a condition. For rare diseases, where there is a paucity of information, they can provide valuable evidence of natural history and longer term outcomes such as mortality in a real-world setting. Although the challenges of using uncontrolled data are well known, such long-term information can be an important input to the economic modeling in HTA. To make best use of the data, a standardized database infrastructure is needed across regions/countries to allow pooling of data, with recognition that minimum datasets should not burden physicians with unnecessary data collection requirements. For examples, see www.orpha.net.
In addition, patients and families often seek to input their own data to the records held on a clinic registry. This can give invaluable insights into the impact of the condition in the real world, thus adding an important extra dimension when assessing novel interventions for HTA.
RCTs
Double-blind RCTs are the gold standard for demonstration of efficacy in common diseases. However, for an extremely rare disease there may be insufficient patients across many countries to provide the power required to make an RCT worthwhile. For example, the Progeria Research Foundation website indicates that they identified just fifty-four children in thirty countries who have progeria syndrome.
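To illustrate why power becomes limiting, the following is a minimal sketch rather than a recommended calculation. It assumes a two-arm parallel trial with 1:1 allocation, a continuous outcome analyzed with a normal approximation, a two-sided alpha of 0.05, and 80 percent power; none of these assumptions comes from the text above, and the function names are hypothetical. It computes the smallest standardized effect that a trial enrolling all fifty-four identified children could reliably detect.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect(n_total, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable by a two-arm,
    1:1 trial of n_total patients (normal approximation; illustrative)."""
    n_per_arm = n_total / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return (z_alpha + z_beta) * sqrt(2 / n_per_arm)


def patients_per_arm(effect, alpha=0.05, power=0.80):
    """Patients per arm needed to detect a given standardized effect."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / effect) ** 2


if __name__ == "__main__":
    print(round(detectable_effect(54), 2))
    print(round(patients_per_arm(0.5)))
```

Under these assumptions, fifty-four patients could detect only a standardized difference of about 0.76, which is conventionally regarded as a large effect, whereas a moderate effect of 0.5 would need roughly sixty-three patients per arm.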
Traditional (fixed sample size) RCTs can be modified using sequential methods to perform interim analyses that will allow a trial to stop early in light of a large difference among treatments or futility (Reference Whitehead5). The savings in sample size gained compared with a fixed sample size trial depend on the sequential design chosen and the size of the actual treatment effect compared with the expected treatment effect (the larger the actual treatment effect, the earlier the trial will stop). Alternatively, more information can be obtained from a traditional RCT by applying a randomized withdrawal design (randomizing patients at the end of treatment to stay on treatment or to come off treatment) to determine maintenance of effect.
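The dependence of sample size savings on the actual treatment effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method of the cited reference: it assumes two equally spaced looks, a normally distributed outcome with known unit variance, stopping for efficacy only (no futility rule), and commonly tabulated O'Brien-Fleming-type critical values for an overall two-sided alpha of 0.05. All function names and parameter values are hypothetical.

```python
import random
from math import sqrt

# Commonly tabulated O'Brien-Fleming-type critical values for two equally
# spaced looks at an overall two-sided alpha of 0.05 (assumed here).
INTERIM_Z, FINAL_Z = 2.797, 1.977


def z_statistic(treated, control):
    """Z statistic for a difference in means with known unit variance."""
    n = len(treated)
    diff = sum(treated) / n - sum(control) / n
    return diff / sqrt(2 / n)


def run_trial(effect, n_per_arm, rng):
    """Simulate one two-look trial; return (patients used, efficacy shown)."""
    treated = [rng.gauss(effect, 1) for _ in range(n_per_arm)]
    control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    half = n_per_arm // 2
    # Interim look after half of the patients in each arm.
    if abs(z_statistic(treated[:half], control[:half])) >= INTERIM_Z:
        return 2 * half, True
    # Otherwise continue to the planned maximum sample size.
    return 2 * n_per_arm, abs(z_statistic(treated, control)) >= FINAL_Z


def expected_sample_size(effect, n_per_arm=40, n_sims=2000, seed=1):
    """Average number of patients used across simulated trials."""
    rng = random.Random(seed)
    sizes = [run_trial(effect, n_per_arm, rng)[0] for _ in range(n_sims)]
    return sum(sizes) / n_sims
```

Calling `expected_sample_size` with a large true effect should give an average well below the maximum of eighty patients, whereas with no true effect almost every simulated trial runs to the full sample size.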
An innovative three-stage RCT design has been developed specifically for treatments of rare diseases; it increases study power and ensures that everyone who needs it gets active treatment (Reference Honkanen, Siegel and Szalai6). In Stage 1, eligible subjects are randomized into a parallel-arm, placebo-controlled phase. In Stage 2, subjects who responded to study treatment in the first stage enter into a randomized withdrawal phase. In Stage 3, placebo-treated patients who did not respond in Stage 1 are placed on active treatment and responders are randomly assigned to treatment or placebo.
Table 2 outlines the concepts of two special forms of designs (adaptive and multiple N-of-1) that are gaining in popularity and may be relevant for use in rare diseases. They often require fewer patients than their fixed sample size equivalent and maintain important features such as randomization and blinding to reduce potential bias.
Adaptive Designs
Adaptive designs are an extension of sequential designs that permit a range of items to be altered between Stages 1 and 2 (Reference Dragalin7;8) and so are particularly valuable in rare diseases where the natural history is not well characterized (see Table 2). For example, in the case of a rare disease, a Stage 1 endpoint may be a marker such as response rate or time to disease progression. Based on the results of Stage 1, some of the inferior arms can be dropped. In addition, if some of the arms are promising, randomization can be altered to assign more patients to the superior arms (play-the-winner). This allows a combination of phase 2 and phase 3 studies in a much shorter timeframe than is the traditional case.
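The play-the-winner idea can be sketched with a simple urn model. This is one classic patient-by-patient version (a randomized play-the-winner urn), shown only as an illustration; it is not necessarily the stage-wise scheme of the cited references. It assumes two arms, a binary response that is observed before the next patient is randomized, and hypothetical response probabilities.

```python
import random


def play_the_winner(success_prob, n_patients, seed=0):
    """Randomized play-the-winner urn for two arms.

    success_prob maps arm name -> assumed true response probability.
    Returns (allocations per arm, final urn composition).
    """
    rng = random.Random(seed)
    arms = list(success_prob)
    assert len(arms) == 2
    urn = {arm: 1 for arm in arms}          # start with one ball per arm
    allocated = {arm: 0 for arm in arms}
    for _ in range(n_patients):
        # Draw an arm with probability proportional to its balls in the urn.
        arm = rng.choices(arms, weights=[urn[a] for a in arms])[0]
        allocated[arm] += 1
        other = arms[1] if arm == arms[0] else arms[0]
        responded = rng.random() < success_prob[arm]
        # A response rewards the arm used; a failure rewards the other arm.
        urn[arm if responded else other] += 1
    return allocated, urn


allocated, urn = play_the_winner({"A": 0.7, "B": 0.3}, n_patients=40)
print(allocated, urn)
```

Because every response on an arm, and every failure on the other arm, adds a ball for that arm, the urn drifts toward the arm with the higher response rate, so later patients are more likely to receive it.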
In advance of the conference on adaptive trial design that launched the FDA guidance (8), an FDA official stated the following advantages of adaptive designs: “they tell us more about safety and benefits of drugs, in potentially shorter time frames, exposing fewer people to experimental treatments and resulting in clinical trials that may not only be more efficient but are more attractive to patients and their physicians to enroll in.”
However, several key issues need to be considered during the study planning process for an adaptive trial. To maintain the integrity of the trial, all potential adaptive design decisions should be specified before starting the trial, and the total number of patients recruited will depend on the choices at Stage 1 and outcomes at Stage 2. Furthermore, it is important to maintain blinding. This could be challenging when the treatments to which patients may be allocated are altered between the stages, but it is feasible when undertaken by means of a well-organized independent unit or body, as is the case in a pharmaceutical setting. Analyses must then take account of the interim analysis at Stage 1. It is therefore important to assess how difficult the adaptive design will be to implement and whether its benefits justify the implementation costs. Also, for treatments where responses are delayed, an adaptive design may not be ideal because changes would be difficult to make without observing the responses.
N-of-1 Trials
An N-of-1 trial involves offering a patient multiple episodes of active or placebo treatment in a double-blind, randomized manner, while regularly measuring key endpoints (Reference Sackett, Haynes, Guyatt and Tugwell9). It can be used to establish in a rigorous way whether a specific patient sufficiently benefits from a particular treatment. When results of similar N-of-1 trials in several patients are analyzed, this may provide evidence of treatment effectiveness at a group level. This can be done using standard meta-analysis techniques, linear mixed models (repeated measures models), or Bayesian hierarchical models taking account of within patient, between patient and random variation (Reference Zucker, Ruthazer and Schmid10).
The repeated measures models provide improved within-patient precision compared with standard meta-analytic techniques, but their complex variance structures may require more patients or more periods of observation. Bayesian models use a different approach, which does not rely on the hypothesis testing/confidence intervals paradigm, but allows determination of the posterior probability of whether an effect is beneficial. This is intuitively appealing, but such analyses can be complex and are sensitive to prior assumptions. Further details of the specific types of model that were fitted to 46 N-of-1 trials are provided in Zucker et al. (Reference Zucker, Ruthazer and Schmid10). Another example shows that a Bayesian model in just six N-of-1 trials in children with idiopathic arthritis could be used to provide an estimate of population effect and it notes that such techniques may be particularly suited to rare diseases (Reference Huber, Tomlinson, Koren and Feldman11). A major limitation of N-of-1 trials is that they require treatments that act, and cease to act, quickly and for biologic formulations the potential for immunogenicity may have a lasting effect on later periods of treatment.
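The simplest of the pooling approaches described above, a standard meta-analysis, can be sketched as follows. This is an illustration only: it assumes a fixed-effect inverse-variance model in which each patient contributes the mean of their active-minus-placebo differences, and it ignores the between-patient variation that mixed or Bayesian hierarchical models would estimate. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def patient_effect(differences):
    """Mean active-minus-placebo difference for one patient and its
    squared standard error, from that patient's treatment-pair differences."""
    n = len(differences)
    return mean(differences), variance(differences) / n


def pooled_effect(trials):
    """Fixed-effect inverse-variance pooling across N-of-1 trials.

    Assumes every patient has at least two pairs and non-zero variance.
    """
    estimates = [patient_effect(d) for d in trials]
    weights = [1 / se2 for _, se2 in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, sqrt(1 / sum(weights))


# Hypothetical data: each inner list holds one patient's paired differences.
trials = [
    [0.8, 1.1, 0.5],
    [0.2, -0.1, 0.5],
    [1.0, 1.4, 0.9],
]
estimate, std_error = pooled_effect(trials)
print(f"pooled effect {estimate:.2f} (SE {std_error:.2f})")
```

Patients whose results are more consistent across treatment pairs receive more weight, which is why the pooled estimate is pulled toward the third hypothetical patient.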
Outcomes
A variety of outcomes may be measured in rare diseases including laboratory markers, symptom response, patient-reported outcomes and long-term clinical outcomes. The choice of outcome depends on the disease. The challenge is that in genetic, rare diseases, there is often heterogeneity in disease progression and in response to treatment, which cannot be linked to a specific cause. This leads to a lack of consensus about the most important outcomes to study and the size of benefit that is important. This is exemplified by a recent HTA of therapies for Gaucher's disease, which described a range of “potentially beneficial effects” and improvements across a wide range of outcomes including hematological markers and skeletal improvement but it was unclear how these effects translated into patient wellbeing (Reference Connock, Burls and Frew12).
To address this problem, patients and families provide unique knowledge about the disease (Reference Hansen and Lee13) and can help identify outcomes that are most important in terms of functioning and wellbeing. Therefore, they should be involved in the design of studies to ensure that outcomes are being studied that matter to them (patient relevant outcomes).
Patient-reported Outcomes
Structured instruments may be used to measure functioning and wellbeing by means of quality of life (QOL)/patient-reported outcome (PRO) measures. These instruments allow a patient to evaluate their health in terms of the impact a given health state has on the ability to function and enjoy life (Reference Devlin and Appleby14). PROs are particularly important in rare diseases, but the measures used need to be validated in the languages used by study participants, provided in forms suitable to the cognitive abilities of the patient, and take account of the cultural norms of the country. The PROQOLID database describes over 1,000 PRO and QOL instruments, with a specific section on congenital hereditary diseases that could be a useful source for rare diseases.
If a PRO measure is not available for a specific disease, one can be created and validated, even in the small populations that have a rare disease. An example is the development of the R-Pact instrument that measures limitations in activities and social participation of patients with Pompe's disease (Reference van der Beek, Hagemans, van der Ploeg, van Doorn and Merkies15). Just 186 patients were required to develop this scale, but they were studied for six years to ensure the responsiveness of the measure.
Trials may not be powered to detect effects on PROs/QOL (Reference Wyatt, Henley and Anderson16) if investigators use unresponsive instruments. If a substantial number of patients are lost to follow-up, risk of bias increases. It is therefore essential to use a responsive disease specific questionnaire with efforts made to make data collection as complete as possible. Then it is necessary to understand the implications of the effect on the PRO. This has particular importance in the case of rare diseases, where the choice of the appropriate outcomes for study may be in question and where there may be very limited clinical expertise.
One approach to enhancing interpretability of PROs/QOL measures is to evaluate the smallest difference that patients consider important for each domain of the instrument. This “minimally important difference” (MID) can be determined by relating changes in QOL to a global scale of change (lot worse, little worse, same, little better, lot better). For those who report positive or negative changes on the global scale, the difference between the baseline and follow-up QOL measure establishes the MID. Using this methodology for two questionnaires in respiratory diseases, 0.5 was consistently the MID on a 7-point scale (Reference Guyatt, Juniper, Walter, Griffith and Goldstein17). A moderate difference was identified as 1.0 and a change of greater than 1.5 was large.
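The anchor-based calculation can be sketched as below. This is a minimal illustration, not the procedure of the cited study: it assumes that the MID is taken as the mean absolute change in QOL score among patients whose global rating indicates a small change ("little worse" or "little better"). The data and names are hypothetical.

```python
from statistics import mean

# Global ratings treated as indicating a small but noticeable change
# (an assumption of this sketch).
SMALL_CHANGE = {"little worse", "little better"}


def minimally_important_difference(records):
    """Anchor-based MID: mean absolute change in QOL score among patients
    whose global rating indicates a small change.

    records: iterable of (baseline, follow_up, global_rating) tuples.
    """
    changes = [abs(follow_up - baseline)
               for baseline, follow_up, rating in records
               if rating in SMALL_CHANGE]
    if not changes:
        raise ValueError("no patients reported a small change")
    return mean(changes)


# Hypothetical scores on a 7-point scale.
records = [
    (3.0, 3.5, "little better"),
    (4.0, 3.4, "little worse"),
    (5.0, 5.4, "little better"),
    (2.0, 3.5, "lot better"),
    (4.5, 4.5, "same"),
]
mid = minimally_important_difference(records)
print(round(mid, 2))
```

Patients who rate themselves as "same" or "lot better" are excluded, so the estimate reflects only changes that patients themselves describe as small.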
QOL is generally analyzed by comparing the mean difference between baseline and follow-up. A complementary analysis is to compare the proportion of patients that achieve the MID (or who deteriorate more than the MID) between treatment groups. The difference in proportions that have had a minimally important improvement and the difference in proportions in those who had a minimally important deterioration can then be determined. Such an analysis was used in an RCT of seventy-eight patients with chronic respiratory disease who were assessed at baseline and 6 months (Reference Guyatt, Sackett and Taylor18). This demonstrates its potential for use in small populations such as rare diseases to complement the more powerful continuous data analysis that establishes whether there is a true treatment effect.
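The complementary responder analysis can be sketched as follows, assuming an MID of 0.5 on a 7-point scale and counting a change equal to the MID as meeting it in either direction. The data, group labels, and function names are hypothetical and are not taken from the cited trial.

```python
def responder_analysis(changes_by_group, mid):
    """Proportion improving or deteriorating by at least the MID per group."""
    summary = {}
    for group, changes in changes_by_group.items():
        n = len(changes)
        summary[group] = {
            "improved": sum(c >= mid for c in changes) / n,
            "deteriorated": sum(c <= -mid for c in changes) / n,
        }
    return summary


def difference_in_proportions(summary, outcome,
                              treatment="active", control="placebo"):
    """Treatment minus control proportion for the chosen outcome."""
    return summary[treatment][outcome] - summary[control][outcome]


# Hypothetical follow-up minus baseline changes on a 7-point scale.
changes = {
    "active": [1.0, 0.6, 0.5, 0.1, -0.2, 0.8, -0.6, 0.7],
    "placebo": [0.2, -0.5, 0.6, 0.0, -0.7, 0.1, -0.1, 0.5],
}
summary = responder_analysis(changes, mid=0.5)
print(difference_in_proportions(summary, "improved"))
print(difference_in_proportions(summary, "deteriorated"))
```

Reporting the two differences separately keeps improvement and deterioration distinct, which a single mean change would average away.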
Another example of the effective use of the MID in small numbers of patients comes from a series of 27 N-of-1 randomized trials in patients with respiratory disease. The MID was used both in the interpretation of the pooled data and, most importantly, in the definition of individual patient response in each of the 27 N-of-1 trials.
Qualitative Research Sub-studies
Patients, their families and carers/caregivers have unique knowledge about living with a condition that becomes more important when clinical expertise is limited, as is the case in rare diseases. They can explain how a rare disorder has altered function and outlook, the limitations that the disease and its treatment place on daily/family life, and the most difficult aspects of the disease. Patients can explain the benefits and unwanted effects of existing treatments (such as impact on daily living, mode of administration, challenging side effects, costs associated with illness, etc). They can also indicate unmet needs with current therapies and requirements for new treatments (Reference Facey19).
These views and experience can be elicited by means of qualitative research, which is the systematic collection, organization and interpretation of text from talk or observation using robust theories and approaches (Reference Malterud20). Such research seeks to identify common themes and different perspectives among a range of patients. This can help validate models and assumptions used in models submitted to HTA and provide information that can aid interpretation of benefit, risk, and issues with service delivery that contribute to the determination of the value of a technology.
Qualitative research can be undertaken with a relatively small number of patients and so is highly applicable to rare diseases. In particular, qualitative research may be performed in a sub-study within a clinical trial thus making best use of the limited patients available for study (Reference Hansen and Lee13).
DISCUSSION
Regulatory initiatives have stimulated the research, development, and authorization of orphan products, but equitable and timely access to treatments for rare diseases remains an issue (Reference Aymé and Rodwell1). The low prevalence and heterogeneity of rare diseases mean that there is a paucity of evidence available for HTA. However, there is little recognition by HTA organizations that treatments for rare disorders need different forms of assessment (Reference Kruer and Steiner21).
Study designs that allow early stopping if a treatment is highly effective or ineffective or adapt part way through to assign patients to a particular treatment arm or consider maintenance of effect are particularly valuable for heterogeneous, rare diseases to maximize the use of limited patients and combine phase 2 and phase 3 questions. However, to provide robust evidence for HTA they must be carefully conducted to ensure maintenance of blinding and appropriately analyzed to overcome biases due to multiple testing. Furthermore, statistical complexity needs to be balanced against the need for clinically relevant and interpretable results (22) that are understood by HTA organizations.
When considering the evidence base for HTA, clinical trials are just one element. Registries can provide valuable information to help characterize disease progression. A range of European Union initiatives (Reference Aymé and Rodwell1) are seeking to promote registries for rare diseases. Ideally these should have common structures to enable combination, not just across Europe, but internationally. Furthermore, they should be designed by all stakeholders—HTA and regulatory agencies, clinicians, patients, and industry.
There is potential for greater participation of patient organizations in rare disease clinical research (Reference Field and Boat23), with some patient organizations funding research into epidemiology, clinical endpoints, and ethical, legal, and social issues; working collaboratively to support registries, tissue banks, research centers, and clinical trials; and sharing scientific information among partners. We must ensure that such collaborations extend to an understanding of HTA requirements and full engagement in the HTA process.
New forms of research using social networking, blogs and discussion fora are also emerging. When there are few patients with a disease in a country, these can be valuable resources for patient organizations to connect with patients worldwide to help describe the disease and its impact. The International Pompe Association used social networking to generate a picture of their disease. They gave patients around the world the opportunity to take part in an evidence generation exercise, which both allowed families to endorse or disagree with the input of others with the condition, and to input their own information for others to add to. This led to the rapid development of a perception of the common elements of the condition and a view of the heterogeneity that was crucial when drug development got under way.
Disease-specific PROs are particularly valuable to understand the social value of products, but generic instruments should also be used to support comparisons with other health conditions.
Another form of evidence that could be key for HTA of rare diseases is qualitative research of patients’ and carers’ perspectives. However, few HTA organizations invest in obtaining this evidence or know how to appraise it alongside the more traditional evidence on clinical and cost effectiveness. Patients can also act as expert witnesses to explain the relevance of an effect. This is particularly crucial in the HTA of rare diseases, where there are few clinical experts and poor understanding of the disease and its impacts. A seemingly small benefit in a well-known endpoint for someone who is very ill may indicate an important improvement for the patient (e.g., 4 percent increase in Forced Expiratory Volume leading to major reduction in need for assisted ventilation).
An example of the importance of patients’ perspectives in decision making was shown in the Dutch reassessment of Fabrazyme and Myozyme in September 2012. An assessment of cost effectiveness led to a recommendation to discontinue reimbursement. However, in a public hearing, international patient experts explained the heterogeneity of response and highlighted that for some patients the products were highly effective, but that the patients in whom they would be effective could not be identified in advance. This led to a recommendation to use N-of-1 trials in patients and to reconsider the case for removing reimbursement.
The discontinuity between regulatory and HTA processes is recognized, but initiatives are under way in Europe to improve regulatory and HTA collaboration in relation to rare diseases. The Clinical Added Value of Orphan Medicinal Products-Information Flow (CAVOMP-IF) initiative aims to develop a common view of clinical trial design, improve evidence generation and exchange activities post authorization to meet both risk:benefit and HTA requirements, but the breadth of stakeholder involvement in this is questionable.
Meanwhile, in the United States work is under way to improve trial designs, analyses and evidence generation for rare diseases (Reference Field and Boat23), and the International Rare Diseases Research Consortium (which includes public and private research funders, scientists, patient organizations and regulatory agencies from around the world) is identifying priority research areas and seeking to address regulatory challenges for rare diseases. It is of concern that this international consortium makes no mention of HTA, and we recommend that those interested in HTA seek to engage in this work to ensure that evidence generated in future clinical trials in rare diseases is fit for HTA.
Policy Implications
To ensure consistent and transparent approaches to the HTA of rare diseases, there is a need to gain international agreement on the evidentiary requirements for clinical effectiveness assessments of rare diseases that is accepted by all stakeholders. This should include guidance on novel trial designs suitable for small populations, appropriate outcomes and analyses, and should link more closely with the work of regulators in terms of generation of pre-licensing and post-licensing data. Particular focus should be given to the need for targeted and robust patient evidence in terms of qualitative research and PROs. These efforts should include all stakeholders, particularly patients, and be standardized internationally to enable evidence generation for orphan products that suits not only regulatory, but also HTA needs.
Greater collaboration is also needed to agree how the value of products to treat complex, heterogeneous rare diseases will be assessed in HTA taking account of broader ethical and societal issues and the uncertainty that will always be inherent in these populations.
CONTACT INFORMATION
Karen Facey ([email protected]), Honorary Senior Research Fellow, Department of Health Economics and HTA, Institute of Health and Wellbeing, University of Glasgow, Woodlands Lodge, Buchanan Castle Estate, Drymen, G63 0X, UK
Alicia Granados, Head of Global HTA, R&D, Global Medical Affairs, EVD. Genzyme a Sanofi Company and Autonomous University of Barcelona, CBT. Denia, 32-34 Barcelona 08006, Catalonia, Spain
Gordon Guyatt, Distinguished Professor, McMaster University, Faculty of Health Sciences, Clinical Epidemiology & Biostatistics, 1280 Main Street West, Hamilton, Ontario L8S 4K1 Canada
Alastair Kent, Genetic Alliance UK, 4D Leroy House, 436 Essex Road, London N1 3QP, United Kingdom
Nilay Shah, Associate Professor of Health Services Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905. United States of America
Gert Jan van der Wilt, Professor of Health Technology Assessment, Department for Health Evidence (133), Radboud University Medical Centre, Geert Grooteplein-Zuid 10, 6525 GA Nijmegen, the Netherlands
Durhane Wong-Rieger, President & CEO, Canadian Organization for Rare Disorders 151 Bloor Street West, Suite 600 Toronto, Ontario M5S 1S4 Canada
CONFLICTS OF INTEREST
Dr. Facey reports grants from Sanofi-Genzyme, during the conduct of the study; personal fees from MerckSerono, personal fees from Bridgehead, personal fees from Novartis, personal fees from Pfizer, personal fees from Bayer, personal fees from Medicys, personal fees from Allergan, personal fees from Abbott, grants and personal fees from Lilly, personal fees from Lifecell, personal fees from Takeda, personal fees from Deerfield, personal fees from PRMA consulting, personal fees from GSK, personal fees from Otsuka, outside the submitted work; and is a member of the Scottish Government's Technical Advisory Group on Resource Allocation and the Scottish Health Technologies Group. Dr. Granados reports other from Genzyme, during the conduct of the study; other from Genzyme, outside the submitted work. Dr. Guyatt reports grants from Sanofi Genzyme Corporation, outside the submitted work. Dr. Kent reports grants from Genzyme, grants from GSK, grants from Pfizer, grants from Shire, grants from Aegon, grants from Actelion, grants from Alexion, grants from Biomarin, grants from MSD, grants from Raptor, grants from Sigma Tau, grants from Viropharma, personal fees from Genzyme, personal fees from GSK, outside the submitted work. Dr. Shah has nothing to disclose. Dr. van der Wilt reports personal fees from Genzyme, outside the submitted work. Dr. Wong-Rieger reports personal fees from Sanofi-Genzyme, during the conduct of the study; grants and personal fees from Novartis, grants and personal fees from Pfizer, grants from Janssen Inc., grants from Merck, grants from GSK, grants from Shire, grants from Alexion, grants from Actelion, grants from BioMarin, grants from Viropharma, outside the submitted work.