
1 - “Can It Work? Does It Work? Is It Worth It?”

from Part I - Fundamentals

Published online by Cambridge University Press:  13 March 2025

Karen B. Schmaling, Washington State University
Robert M. Kaplan, Stanford University

Summary

In the middle of the last century, Archie Cochrane, one of the founding fathers of evidence-based medicine, argued that understanding healthcare treatments required the consideration of three questions: “Can it work?”, “Does it work?” and “Is it worth it?” Each of these questions addresses a different aspect of the problem and requires different assumptions and different research methodologies. Understanding if a treatment can work establishes proof of principle derived from efficacy studies that control who takes the treatment, how it is administered, and how outcomes are measured. The question “Does it work?” is about effectiveness that is evaluated under conditions of the usual care. Randomized controlled trials, which form the core of efficacy research, are difficult to employ in the evaluation of effectiveness. Even if interventions are shown to be efficacious and effective, people need to decide if accepting the treatment is worth it. Healthcare can be expensive, inconvenient, painful, and sometimes of little value. This introductory chapter reviews the three questions and prepares the reader for the in-depth discussion of these issues in the following 16 chapters.

Type: Chapter
In: Rethinking Clinical Research: Methodology and Ethics, pp. 17–34
Publisher: Cambridge University Press
Print publication year: 2025

Sir Archie Cochrane, the evidence-based medicine guru we met in the Introduction, characterized the usefulness of health services according to effectiveness, efficiency, and the balance of costs and benefits.Reference Cochrane1 There is considerable wisdom in this formula, which recognizes that the value of a treatment lies not only in its potential to promote health but also in the realization of better health and patient satisfaction – and at reasonable cost, as measured in both adverse effects and financial demands.

In an influential 1999 article, the noted epidemiologist Brian Haynes refashioned Sir Archie’s formula in sharper terms.Reference Haynes2 What we wish to know about a treatment is threefold: “Can it work? Does it work? Is it worth it?” In Haynes’s telling, and as our colleague Franz Porzsolt has argued,Reference Porzsolt3 these questions are equally important, yet clinical researchers tend to focus overwhelmingly on the first. Seeking proof of principle under controlled conditions, scientists use randomized controlled trials (RCTs), also known as efficacy or explanatory studies, to show that treatments can work – that there is a plausible relation between potential treatments and desired outcomes. With that done, they move on to new experiments aiming to answer more “Can it work?” questions.

What remains is the no less crucial, yet often overlooked, matter of whether the now proven principle leads to any useful change. Whether a treatment works in practice cannot be determined by experiments, because life doesn’t happen under controlled conditions. The “Does it work?” question must be tackled through pragmatic effectiveness trials and observational studies conducted in everyday settings. And, because resources are finite and ethics enjoin health professionals to do good on behalf of everyone, we cannot always be satisfied with even a treatment that works. Researchers still have to ask whether useful treatments are worth pursuing, given their costs to individuals, health systems, and societies. To answer this question, we must apply economic methods, such as cost-effectiveness and cost-utility analyses. It’s not enough to know that a treatment can work; we also want to know that its recipients lead longer, better lives, and that those in need are able to access it at acceptable cost.

Let’s dig into the foundations of clinical research with Haynes’s three questions as a guide. A firm appreciation of these foundations is essential for all healthcare investigators. For, no matter our specializations, we all have the same overarching goal. We don’t do clinical research to prove a point. We do clinical research to improve everyone’s health and quality of life.

Can It Work?

There is much more to evidence-based healthcare than experiments, but, no doubt, experimental trials are critically important. It is through experimental work that foundational research – basic mechanistic studies, animal studies, case studies – is transformed into something of clinical relevance. So we begin with clinical trials, the means of answering the first of our three equally significant questions: Can it work?

The answer in most cases will be provided by an RCT, but this is not the researcher’s first step. That is the phase I trial, the purpose of which is to determine that a possible treatment – perhaps an agent that has shown promise in animal studies – is presumed safe for administration to humans. Because we are not concerned at this point with whether the treatment can work, phase I trial participants typically comprise healthy people. For example, a phase I trial of a new drug intended to treat cancer would study its safety in people who don’t have cancer.

Only after a treatment has proven safe in phase I trials can it be tested on humans via an RCT. The goal of an RCT is to assess whether treatments are both safe and effective for the people who stand to benefit from them. Given this, study participants are not healthy people, as in phase I trials, but people needing care.

RCTs: An Introduction to the “Gold Standard” in Clinical Research

The RCT is widely considered the gold standard for demonstrating a treatment’s efficacy, mainly because it is structured to isolate possible confounding variables and to prevent bias. We discuss strengths, weaknesses, and proper design and utilization of RCTs throughout this book, providing detailed consideration of biases that do in fact affect RCTs. For introductory purposes, we here discuss RCTs in broader strokes.

When evaluating if a treatment can work, study details are important. Many study-design decisions will affect RCT results. Most basically, researchers must decide who will participate in the trial, how to assign participants to treatment and control groups, what the treatment and control parameters will be (e.g., how much of a drug to administer for what period of time and whether to administer a placebo to the control group), and which outcome variables are of interest. All of this precedes statistical analysis of the data from the study, a process over which researchers also have a good deal of control, and one that can dramatically influence the findings, how the findings are interpreted, and therefore whether treatments have evidence of efficacy. The location where a study is conducted may also be of interest as we evaluate results. For example, studies conducted in the developing world and in Eastern Europe often report lower rates of death and other adverse events than do studies conducted in North America.Reference Mentz, Kaski and Dan4

The choice of outcome variable is crucial to the construction of RCTs: The study result is the difference between the treatment and control groups with respect to the outcome variable. Good practice in RCTs is to identify one primary outcome variable, which should be the most significant manifestation of the condition being studied. The outcome variable could have a binary character, such as mortality. Alternatively, outcome variables could be measured on a continuous scale. For example, an obesity study might focus on weight or body mass index (BMI) as the primary outcome variable.

The choice of outcome variable may depend on, among other things, whether the goal of the trial is to demonstrate the superiority or the noninferiority of an intervention. The superiority trial is perhaps the most familiar variety of RCT: Its purpose is to see if the treatment is better than nothing, as represented by, for instance, a placebo or delayed treatment. In such cases, the outcome variable of interest is likely to be symptom remission. Noninferiority trials, in contrast, compare a new treatment against the best-known treatment: Their purpose is to see if the new treatment is at least as efficacious as, or equivalent to, the established treatment.5 In such cases, it is certainly important to know whether the new treatment produces as much symptom remission as the existing treatment, but we may also be interested in whether the treatments vary in their side effects and costs, and whether patients are more satisfied with one or the other option. While noninferiority trials have their place, they can also provoke suspicion. Sometimes they are used for purely commercial purposes: to show that a “new” treatment – very similar to existing treatments – has efficacy, in order to make available an unneeded but marketable treatment.

A key to the validity of RCTs is that the treatment and control groups are equivalent before treatment begins. The individuals comprising the study sample are randomly assigned to each group, so that the groups do not differ on average. More precisely, any differences between groups are due to chance, and the probability that they differ is small and specifiable. If the level of statistical significance is set at a probability value of 5 percent (p = 0.05), the groups will differ significantly on a given baseline characteristic, by chance alone, in about 5 of every 100 studies. If random assignment fails and the groups do differ, these differences can interfere with results: It may be that other variables, rather than effects of the treatment, are responsible for the outcome. To test if the treatment and control groups are in fact equivalent, researchers sometimes compare the groups at baseline or prior to the administration of treatment. Reflecting their importance in study design, such comparisons are usually reported early in the results section of a paper describing an RCT: Only when researchers are convinced that the treatment and control groups are truly equivalent can the study results be attributed to the treatment.
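The idea that randomized groups differ by chance in roughly 5 of every 100 studies can be checked with a small simulation. The following is only a sketch under stated assumptions: the baseline variable (an age with mean 50 and standard deviation 10), the arm size of 100, and the large-sample z-test are all illustrative choices, not details taken from any trial discussed here.

```python
import math
import random


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def baseline_differs(rng, n_per_arm=100, z_crit=1.96):
    """Randomize one hypothetical trial and compare the arms at baseline.

    Returns True when the arms differ at roughly p < 0.05 (two-sided),
    using a large-sample z-test as an approximation.
    """
    # Hypothetical baseline characteristic (for example, age in years).
    ages = [rng.gauss(50, 10) for _ in range(2 * n_per_arm)]
    rng.shuffle(ages)  # random assignment to the two arms
    treatment, control = ages[:n_per_arm], ages[n_per_arm:]

    mean_t, var_t = mean_and_variance(treatment)
    mean_c, var_c = mean_and_variance(control)
    standard_error = math.sqrt(var_t / n_per_arm + var_c / n_per_arm)
    return abs(mean_t - mean_c) / standard_error > z_crit


def imbalance_rate(n_trials=2000, seed=0):
    """Share of simulated trials whose arms differ at baseline by chance."""
    rng = random.Random(seed)
    hits = sum(baseline_differs(rng) for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    print(f"Share of trials with a chance baseline difference: {imbalance_rate():.3f}")
```

Because no treatment is given in the simulation, every "significant" baseline difference it finds is a product of chance alone, and the share of such trials should land close to 0.05.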

Who exactly shows up in the treatment and control groups depends on who is admitted to the study generally. Participant selection is foundational to clinical trials of every variety – not just RCTs – and significantly influences results. For this reason, the specifics of inclusion and exclusion criteria are enormously important. Because these criteria shape the trial participants – determining whether that sample is actually representative of the “real-world” population needing treatment – their purpose should be clear,Reference Manabe, Haruma and Ito6 and the rationale underlying them should always be specified.

Inclusion and exclusion criteria can seriously bias studies, including RCTs, and variability in the criteria across trials can make studies difficult to compare. We discuss the issues of RCT sample representativeness and biased criteria in detail in Chapter 5. For now, let’s consider one example that should give a sense of just how much is at stake in participant inclusion and exclusion. We’ll use this same example throughout this chapter as we tour the clinical research process in light of our three questions.

Gastroesophageal reflux disease (GERD) is a prevalent ailment significantly affecting quality of life. Understandably, there is much interest in treating the condition, whereby stomach acid flows back into the esophagus, causing so-called heartburn (chest pain) and potentially increasing the risks of developing other serious conditions over the long term. We devote a whole case study to trials of GERD medications later in this book, but here is a preview, focused first on trial participant issues. Our case study unpacks a meta-analysis of RCTs studying the safety and efficacy of omeprazole, a drug administered for treatment of GERD. Although all studies included in the meta-analysis seek to answer the same question – is omeprazole a safe and effective GERD treatment? – the studies are not truly comparable, because they consider highly variable samples. One trial recruited as participants individuals who had experienced symptoms for at least a month and at least two of the previous seven days.Reference Bate, Green and Axon7 Another trial welcomed those with symptoms for at least a year and for four of the previous seven days.Reference Richter, Peura, Benjamin, Joelsson and Whipple8 Another trial required that participants experience symptoms for at least three years.Reference Hosseini, Salari, Akbari Rad, Salehi, Birjandi and Salari9 Still another specified no minimum.Reference Bate, Griffin and Keeling10 Exclusion criteria varied even more than inclusion criteria. This variability likely resulted in fundamentally different samples among the trials.

RCT participants are not assembled randomly. They are determined by inclusion and exclusion criteria. Such criteria may be chosen in order to bias outcomes – in particular, to maximize the likelihood of favorable outcomes in the treatment group. Even where there may be good and innocuous reasons to exclude people with certain characteristics, doing so could bias a study and invalidate its results.

Consider that, in some omeprazole RCTs, researchers have excluded potential participants who took aspirin and nonsteroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen. This exclusion arguably makes sense because these medications cause some patients gastric irritation, which could be misattributed to GERD.Reference Richter, Peura, Benjamin, Joelsson and Whipple8 Yet these medications are commonly used to remedy everyday aches and pains. Excluding aspirin and NSAID users from these RCTs made it more likely that the studies would show benefits of omeprazole treatment but also meant that results would be applicable only to the minority of people who don’t use aspirin and NSAIDs.

This example of how exclusion criteria can bias outcomes and limit the value of results leads us to consider the larger issue of the comparability of study participants to everyone with the relevant disorder. Study designers need to ask themselves: Are my trial participants typical of those with the disorder? Even if participant selection criteria are not designed, per se, to bias outcomes, there may be systematic differences between RCT participants and the general population experiencing the disorder. For example, we found significant differences when comparing omeprazole RCT participants to participants in population-based studies of omeprazole treatment – that is, studies of everyday users in routine clinical practice, who were not selected for conformance with inclusion and exclusion criteria. The RCT population was younger, less likely to be obese, and more likely to drink alcohol than people with GERD in the population-based studies. In other words, the patients treated in omeprazole RCT studies systematically differed from the broader population of GERD sufferers.

This is a concern for two reasons. First, the results of the RCTs may not generalize to everyone treated with omeprazole. Second, participant characteristics may have biased outcomes, producing more positive results than would be found if typical persons had been enrolled. For example, obesity is known to increase the risk of GERD and to exacerbate and perpetuate symptoms.Reference Boeckxstaens, El-Serag, Smout and Kahrilas11 Excluding obese participants from RCTs can produce a bias toward more favorable outcomes.

To be clear, the purpose of this example is not to suggest that any trial with inclusion and exclusion criteria is invalid. Such criteria may well be necessary in order to manage confounding variables. But the particular sample-shaping choices one makes also necessarily influence trial results. Unfortunately, there is no foolproof solution to this dilemma. Researchers can use statistical controls, or “covariates,” to estimate the contribution of confounding factors to outcomes, but these statistical adjustments, while often informative, are known to be imperfect.

Perhaps the right conclusion to draw is that research consumers need to be careful interpreters of study findings, attentive to the effects of inclusion and exclusion criteria. In addition, we must recognize that RCTs don’t answer every important question about treatment efficacy. They tell us how well a treatment works for the people in the study; they do not necessarily tell us how well a treatment works for the whole population that may use it. This disconnect is a core theme of the book and speaks to the need for multiple approaches to the study of treatment efficacy and effectiveness.

With the importance of study participants in mind, let’s turn briefly to the results that an RCT produces – an answer to the “Can it work?” question. We mentioned earlier one of the key goals of such studies: to examine changes in outcome variables in order to determine whether treatments are associated with symptom improvement or remission. In an ideal world, a treatment that works for one person would work for everyone. But in our nonideal world, the effects of treatments vary across users. This gives rise to an important metric of efficacy: the number needed to treat (NNT). The NNT is the number of people who would have to be treated so that we would expect one among them to experience symptom improvement or remission.

To calculate the NNT, we first estimate the absolute risk reduction (ARR) associated with a treatment. The ARR is the difference between an RCT’s placebo and treatment conditions in the proportion of patients who benefit or, equivalently, in the proportion who do not. The GERD meta-analysis shows that we can expect that 47 percent of patients receiving omeprazole will not achieve complete relief from GERD symptoms, as compared with 63 percent of those receiving a placebo. In this case the ARR is the proportion of placebo nonresponders (0.63) minus the proportion of omeprazole nonresponders (0.47), or 0.16. The NNT is defined as 1.0/ARR, which in this case is 6.25 (1.0/0.16). This means that for every 6.25 people who take omeprazole rather than a placebo, we expect one additional person to achieve complete symptom remission.
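The arithmetic can be written out as a short Python sketch. The function names are illustrative; the two nonresponse proportions (0.63 for placebo, 0.47 for omeprazole) are the ones quoted from the GERD meta-analysis.

```python
import math


def absolute_risk_reduction(control_nonresponse, treatment_nonresponse):
    """ARR: nonresponse proportion under control minus that under treatment."""
    return control_nonresponse - treatment_nonresponse


def number_needed_to_treat(arr):
    """NNT = 1 / ARR; only meaningful when the treatment arm does better."""
    if arr <= 0:
        raise ValueError("NNT is undefined when the ARR is zero or negative")
    return 1.0 / arr


arr = absolute_risk_reduction(0.63, 0.47)
nnt = number_needed_to_treat(arr)

print(f"ARR = {arr:.2f}")                     # ARR = 0.16
print(f"NNT = {nnt:.2f}")                     # NNT = 6.25
print(f"NNT rounded up = {math.ceil(nnt)}")   # NNT rounded up = 7
```

Since a fraction of a person cannot be treated, the NNT is conventionally rounded up to the next whole number when reported, which here gives 7.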

A lower NNT is, of course, more desirable. In comparison with most treatments, omeprazole has a good NNT, which weighs in favor of a finding of efficacy. (Later in this book we offer examples where the NNT is in the hundreds or even higher.) But those evaluating treatments must consider more than the NNT. Such information must be regarded in the context of a careful evaluation of study design, including possible biases; the balance of short- and longer-term benefits and harms; and possible alternative treatments. Consider that the meta-analysis of omeprazole for GERD showed that omeprazole had large effects on GERD symptoms in relation to comparator treatments, but only when older trials affiliated with drug manufacturers were counted. More recent trials not affiliated with drug manufacturers showed less benefit. This should give us pause. Could it be that manufacturers, keen to sell omeprazole, introduced into their studies – or the statistical analyses of the data – biases that flattered their commercial interests?

One way to get around potentially biased statistical analyses of treatment efficacy is to analyze the raw data across studies. We would consider this a best practice: obtain individual patient data from all trials and conduct a meta-analysis on these data. Although we are not aware of an individual-patient meta-analysis for omeprazole RCTs, it would certainly be possible to perform such a study using data from the RCTs submitted to the Food and Drug Administration, which requires that trialists supply their raw data whether or not the study in question is ever published in a peer-reviewed journal. (As we discuss later, the results from many studies are never published or publicly disclosed.) A meta-analysis based on the publicly available data should be less subject to the biases that can be present in published trials, which then carry over into meta-analyses of published trials.

As an index of the overwhelming emphasis placed on clinical experiments, it is worth noting that, while RCTs speak only to whether treatments can work under controlled conditions, RCT results are commonly the basis for practice guidelines applied in real-life clinical settings. The importance of such guidelines cannot be overstated. It should be obvious, then, that the quality of the studies that inform them is paramount. If industry conflicts of interest might be responsible for biases introduced into RCT design and analysis, similar conflicts can and do influence the development of practice guidelines. That development process should be free of such conflicts. This is a widely acknowledged best practice,12 but it is not always followed, as will become clear in our case studies of statin medications for the treatment of heart disease and of antidepressants.

Modifiable Determinants of Health: From Confounding Variables to Interventions

One of the major downsides of RCTs is that they tend to control for precisely what ought to be the independent variable. We have in mind the common social and behavioral determinants of health. Most conditions can be attributed to these factors, even as clinical trials treat them as nuisance variables. On the social determinants side, important factors include economic stability, healthcare access and quality, education access and quality, characteristics of the built environment, and community life.13 Health behaviors include physical activity, diet, substance use, healthcare seeking, and adherence to treatment. Only about 10 percent of health outcomes and prevention of premature deaths can be attributed to medical care; the rest is owed to social and behavioral matters.Reference Kaplan and Milstein14

Consider the example of obesity, a common condition. Obesity is usually operationalized as a BMI of 30 or higher, calculated as weight in kilograms divided by the square of height in meters (kg/m²). A person of average height (5 ft 9 in or 69 in or 1.75 m) would be considered obese at 203 lbs (92.1 kg) or more. (BMI is a surrogate marker for obesity because it does not directly assess body fat.) There is endless interest in weight-loss drugs, yet obesity is promoted by environmental factors and socioeconomic status.Reference Cockerham15, Reference Lakerveld and Mackenbach16 For instance, rural environments, where it is only possible to get around by driving, can promote lack of exercise and therefore weight gain. Then, too, weight loss can improve health status in general. For instance, one study published in 2015 found that, over a two-year period, weight loss was associated with reduced multimorbidity in 500 patients who were severely obese and had two or more chronic conditions.Reference Agborsangaya, Majumdar, Sharma, Gregg and Padwal17 Another study reported that a 3.5-point reduction in BMI was associated with a 40 percent reduction in GERD symptoms, including among women with normal range BMIs.Reference Jacobson, Somers, Fuchs, Kelly and Camargo18
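BMI is conventionally computed as weight in kilograms divided by the square of height in meters. A minimal Python sketch of the calculation follows, using the metric figures given above (1.75 m and 92.1 kg); the function names and the default threshold argument are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Apply the usual operational definition of obesity (BMI of 30 or higher)."""
    return bmi(weight_kg, height_m) >= threshold


example = bmi(92.1, 1.75)
print(f"BMI = {example:.1f}")    # BMI = 30.1
print(is_obese(92.1, 1.75))      # True
```

The sketch works in metric units only. Converting from pounds and inches introduces rounding, so a borderline case can fall just on either side of the threshold depending on how the conversion is done.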

In some trials, health behaviors related to weight and smoking, for example, are potential confounding variables that should be controlled. But such potentially modifiable variables could also be treatment targets. For example, in two studies of obese patients with GERD, weight-loss interventions were especially effective. Those treated with weight-loss interventions had decreased GERD symptoms compared with those not treated and were more likely able to stop taking proton pump inhibitors (PPIs) such as omeprazole. Those who received weight-loss interventions and stayed on PPIs were more able to decrease their doses.Reference de Bortoli, Guidi and Martinucci19, Reference Yadlapati, Pandolfino and Alexeeva20 Modifying health behaviors rather than treating them as confounding variables can have impressive outcomes.

Does It Work?

If RCTs show that a treatment can work, then researchers have established proof of principle. The next step is to see if the treatment does work. To address real-world treatment effectiveness, we need studies in the real world, not in the controlled conditions of the RCT. Several types of studies address real-world effectiveness. These include pragmatic trials, observational studies, and comparative effectiveness research (CER).

Pragmatic Studies

A study can be considered pragmatic if it examines treatment effectiveness under real-world conditions. For this reason, pragmatic studies should have few participant exclusion criteria. Real-world patients have comorbidities; pragmatic studies include such patients. In addition, the outcomes studied should be those that matter to patients. Patients care most about longevity and quality of life; they care less about laboratory values, such as whether a surrogate marker of a condition rises or falls. For example, patients with heart disease are interested in whether drugs can help them avoid debilitation and death from heart disease, not whether drugs reduce their blood pressure, irrespective of impact on longevity or quality of life. (We discuss surrogate markers extensively in Chapter 10.)

The Pragmatic Explanatory Continuum Indicator Summary (PRECIS-2) is a useful tool for assessing whether a study is in fact pragmatic.Reference Loudon, Treweek, Sullivan, Donnan, Thorpe and Zwarenstein21 PRECIS-2 grades studies along nine domains in order to discriminate pragmatic (or effectiveness) studies from efficacy/explanatory studies. These domains include study participant eligibility, primary outcome tested, participant-recruitment methodology, study setting, study organization, and issues concerning data used for primary analysis. PRECIS-2 also considers characteristics of the treatment condition, including flexibility in adherence, flexibility in delivery, and follow-up. These are very important in differentiating pragmatic from explanatory studies. One of the big differences between RCT and real-world conditions is that patients in RCTs are closely scrutinized to ensure that they are adhering to treatment regimens. They receive treatment on rigorous schedules and exactingly follow rules for dosing and behavioral interventions. Follow-up is similarly regimented. None of this is typical of ordinary patient experience.

Using PRECIS-2, studies are scored on each domain on a scale from 1 (very explanatory) to 5 (very pragmatic). A score of 3 is considered equally pragmatic and explanatory.Reference Janiaud, Dal-Ré and Ioannidis22 In other words, PRECIS-2 does not assume that clinical studies are either explanatory or pragmatic. A study can be mostly pragmatic, mostly explanatory, or somewhere in between.
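To make the scoring concrete, the sketch below records a 1-to-5 score for each of the nine domains and reports which domains lean explanatory or pragmatic. The trial scores are invented for illustration, and the mean score is our own convenience summary; we do not present it as part of the PRECIS-2 tool itself.

```python
# The nine PRECIS-2 domains, written as Python-friendly identifiers.
PRECIS2_DOMAINS = (
    "eligibility",
    "recruitment",
    "setting",
    "organization",
    "flexibility_delivery",
    "flexibility_adherence",
    "follow_up",
    "primary_outcome",
    "primary_analysis",
)


def validate_scores(scores):
    """Check that every domain has an integer score from 1 to 5."""
    missing = [d for d in PRECIS2_DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in PRECIS2_DOMAINS:
        value = scores[domain]
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{domain}: score must be an integer from 1 to 5")


def summarize(scores):
    """Summarize domain scores (1 = very explanatory, 5 = very pragmatic)."""
    validate_scores(scores)
    values = [scores[d] for d in PRECIS2_DOMAINS]
    return {
        # Illustrative summary only; not prescribed by PRECIS-2.
        "mean_score": sum(values) / len(values),
        "explanatory_leaning": [d for d in PRECIS2_DOMAINS if scores[d] < 3],
        "pragmatic_leaning": [d for d in PRECIS2_DOMAINS if scores[d] > 3],
    }


# Hypothetical scores for an imaginary trial.
hypothetical_trial = {
    "eligibility": 4,
    "recruitment": 5,
    "setting": 4,
    "organization": 3,
    "flexibility_delivery": 2,
    "flexibility_adherence": 3,
    "follow_up": 4,
    "primary_outcome": 5,
    "primary_analysis": 5,
}

summary = summarize(hypothetical_trial)
print(summary["explanatory_leaning"])   # ['flexibility_delivery']
```

Keeping the per-domain lists alongside any overall figure reflects the point that a study can be pragmatic in some respects and explanatory in others.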

Unfortunately, when investigators label their studies as pragmatic, they often fail to justify their designation (e.g., by applying PRECIS-2).Reference Janiaud, Dal-Ré and Ioannidis22 Studies also may not provide sufficient information to enable others to apply PRECIS-2 after the fact. Some studies characterized as pragmatic may involve the use of placebos; testing at only a single site; blinding, whereby researchers and/or patients do not know whether they are receiving the treatment or a placebo; and unapproved drugs. None of these practices occur in standard real-world care, so any one of them disqualifies a study from being truly pragmatic.Reference Dal-Ré, Janiaud and Ioannidis23 Research consumers need to be cautious about claims of pragmatism.

PRECIS-2 is not a perfect tool. For instance, randomization is not among its criteria, yet in real-world practice clinicians do not choose treatments randomly. We feel strongly that randomization of individual participants disqualifies a study from being pragmatic. However, in the real world, clinical sites do differ. Different sites may represent different practices and may be components of different care systems. One may be trying to improve the quality of its care, while others may not be. In other words, randomization at levels other than individual patients may reproduce real-world differences and so should not disqualify a study from being pragmatic.

Observational Studies

Observational studies examine the effectiveness of treatments as they occur in real-life practice. These are not experiments: There is no manipulation of the treatment, as is done in RCTs. Observational studies can be retrospective, using existing data, or involve new data collection. Unlike trials – both RCTs and pragmatic trials – that accrue samples of patients for the purpose of the trial, observational studies examine existing patient groups. For example, electronic medical records and other sources of existing data could be used to identify a cohort of patients with a certain diagnosis and then study the treatments received and how well they worked. Furthermore, depending on the data source and quality, it may be possible to identify patient characteristics that were associated with better or worse outcomes, such as age, the presence of other conditions or multimorbidity, health behaviors such as a sedentary lifestyle, or social determinants of health, such as living in poverty.13

Observational studies tell us a great deal that is difficult or impossible to learn from an RCT. Consider three examples of observational studies concerning GERD treatment. These studies complicate how we might think about the usefulness of PPIs such as omeprazole. In one study, among 987 patients with GERD who had been prescribed a PPI for four weeks, 71 percent responded (i.e., they experienced heartburn or regurgitation on only one day or less in the week before reporting symptoms).Reference Lu, Zhang and Wang24 Another study involved a cohort of 96 patients with acid peptic disease, which would include GERD, who had been prescribed omeprazole for four weeks or more.Reference Jain, Shamrao Kulkarni and Mahapatra25 The patients completed measures of symptom severity and treatment satisfaction – perceptions of medication effectiveness, convenience, and global satisfaction. Heartburn and other symptoms decreased significantly, but the average treatment satisfaction score was 37 out of a possible 100, indicating a high level of patient dissatisfaction, in spite of apparent symptom remission. Finally, the third observational study involved a cohort of patients at high risk of gastrointestinal complications because they took drugs that increased the risk of bleeding (aspirin or NSAIDs) or had bleeding peptic ulcers, or because of other factors. These patients took omeprazole for its putative gastroprotective effects. The study found that the cohort did not experience more gastrointestinal complications when compared with other studies, suggesting that RCTs may needlessly exclude participants with such comorbidities or who are taking aspirin or NSAIDs.Reference Lanas, Rodrigo and Márquez26

These cohort studies place omeprazole RCT findings in a different light. On the basis of such findings, PPIs have become enormously popular among a wide swath of the population suffering with GERD. But what RCTs do in fact provide is good evidence that omeprazole can work for some people with GERD. Observational studies, meanwhile, suggest that we don’t know very much about how well omeprazole works in real-world practice.

The difficulty with observational studies is that it is nearly impossible to be sure that the treatment causes changes in the outcome. Characteristics of users versus nonusers – those who elect to take a treatment may differ in systematic ways from those who decide not to – remain an alternative explanation for any observed differences in outcome. Even with statistical adjustments for these characteristics, we cannot be sure that differences between the groups are the result of the treatment.
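A toy calculation shows how systematic differences between users and nonusers can create an apparent treatment effect. All counts below are hypothetical and were constructed so that the treatment has no effect within either risk stratum; the stratum labels and function names are illustrative.

```python
# Hypothetical counts: (stratum, group) -> (patients improved, patients total).
# Lower-risk people are assumed to be more likely to choose the treatment.
COUNTS = {
    ("lower_risk", "users"): (640, 800),
    ("lower_risk", "nonusers"): (160, 200),
    ("higher_risk", "users"): (80, 200),
    ("higher_risk", "nonusers"): (320, 800),
}


def rate(improved, total):
    """Proportion of patients who improved."""
    return improved / total


def crude_difference(counts):
    """Users minus nonusers improvement rate, ignoring risk stratum."""
    totals = {"users": [0, 0], "nonusers": [0, 0]}
    for (_, group), (improved, total) in counts.items():
        totals[group][0] += improved
        totals[group][1] += total
    return rate(*totals["users"]) - rate(*totals["nonusers"])


def stratum_differences(counts):
    """Users minus nonusers improvement rate within each risk stratum."""
    strata = sorted({stratum for stratum, _ in counts})
    return {
        s: rate(*counts[(s, "users")]) - rate(*counts[(s, "nonusers")])
        for s in strata
    }


print(f"Crude difference: {crude_difference(COUNTS):.2f}")   # 0.24
print(stratum_differences(COUNTS))   # {'higher_risk': 0.0, 'lower_risk': 0.0}
```

In this constructed example, the 24-point crude advantage for users comes entirely from who chose the treatment. Stratifying removes it here only because the single confounder was measured; in real observational data, unmeasured differences can remain after adjustment.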

The need for caution surrounding the conclusions of observational studies is clear from the case of COVID-19 vaccinations. In the United States, vaccine manufacturers Pfizer and Moderna carried out surprisingly few studies that randomly assigned people to vaccine versus placebo; their original studies followed participants for a relatively short period of time; and they never clearly documented differences in hospitalizations and deaths. So most of the data concerning whether vaccination protected individuals from serious consequences of COVID-19 came from observational studies, which compared people who elected to receive the vaccine with those who refused it. However, vaccine refusers and acceptors differed on a wide range of variables, including political affiliation, education, income, obesity, diabetes, and other factors associated with COVID-19 complications. This means that the results of these observational studies were very likely influenced not only by effects of vaccination but also by systematic differences among the sorts of people who took the vaccine and those who rejected it.

Comparative Effectiveness Research

Which treatment works better in the real world? We might use CER to tackle this question. The Institute of Medicine (now renamed the National Academy of Medicine) defines CER as “The generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care.”Reference Sox and Greenfield27 The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve healthcare at both the individual and population levels.

CER uses diverse kinds of studies (including RCTs, pragmatic trials, observational studies, and systematic reviews) to synthesize and disseminate the evidence. These processes are centered on patient perspectives and values. As evidence accrues, it becomes possible to address which treatment works better for whom.

An example is a CER synthesis of treatments for dyspepsia, which includes but is not limited to GERD.Reference Elliott, Steel, Leech and Peng28 The authors identified 36 studies that used diverse treatments for dyspepsia. In contrast with traditional “narrative” literature reviews, for which authors may select studies that support their theories, CER requires authors to prospectively define the rules for which studies qualify to be included or excluded from the review. In the dyspepsia example, the authors used the PRECIS-2 tool to evaluate the characteristics of the included studies. Although most of the 36 studies were deemed pragmatic by PRECIS-2 classification, the authors identified several limitations in the studies’ relevance for real-world care. These included extensive inclusion and exclusion criteria, the frequent use of endoscopies as study entry requirements, and the lack of a commonly used patient-reported outcome measure.

Is It Worth It?

If a treatment can and does work, we must then ask whether it is worth implementing. “Worth” is an apt word, referring to a treatment’s value or cost. When we assess the balance of cost and value, it is important that we do so in a patient-centric manner. In other words, we should measure outcomes that are meaningful to patients. Healthcare professionals want patients to be satisfied: Examining outcomes relevant to patients has intuitive appeal as a way to be responsive to patient preferences. Doing so is also consistent with the ethical principle of respect for people (discussed in detail in Chapter 3), which urges, among other things, that care be consistent with patients’ values and priorities.

Disability

Often, what matters most to patients is the degree to which a condition disables them. Disability occurs when a physical or mental condition limits functioning, impairing a person’s ability to engage in activities that enable quality of life. For example, GERD can be disabling: Its direct symptoms are challenging on their own and can also be associated with emotional distress, fatigue, impaired sleep, altered eating and drinking behavior, and lower levels of social and physical activity. Collectively, these disabilities diminish both health and quality of life.

Researchers interested in disability often use questionnaires to assess functioning and quality of life. Generic quality-of-life questionnaires assess overall well-being. For instance, the Short Form-36 Health Survey Questionnaire (SF-36) is a commonly used measure of generic functional status.Reference Ware, Snow, Kosinski and Gandek29 The survey includes 36 items summarized in eight domains: bodily pain, vitality, physical functioning, physical role limitations, social functioning, emotional role limitations, mental health, and general perception of health. Patients’ answers to the questions can also be summarized in terms of physical and mental component scores. Scores range from 0 to 100, with 80 and above indicating normal-range functioning in the specified area.

There are also disease-specific quality-of-life questionnaires, which reflect the effects of a specific condition. For example, the Quality of Life in Reflux and Dyspepsia (QOLRAD) targets effects of GERD on quality of life.Reference Wiklund, Junghard and Grace30 The QOLRAD asks participants about their experience during the previous week. There are 25 items, each rated on a scale from one to seven. The questionnaire yields scores in five areas: emotional distress, sleep disturbance, food and drink problems, physical/social functioning, and vitality.

As readers will see in our GERD case study, GERD is associated with impaired general and GERD-specific quality of life. GERD treatment is usually associated with improved quality of life, but there is limited evidence that GERD treatment results in normal-range quality-of-life values. Improvement that results in normal-range functioning is referred to as clinically significant improvement; substantial improvement short of normal-range functioning is known as statistically significant improvement. Therefore, we could say that treatment of GERD with omeprazole often results in statistically but not clinically significant improvements in quality of life.

Cost-Utility Analysis and QALYs

One of the most useful tools for patient-centric value assessment is cost-utility analysis, and at the core of cost-utility analyses is a metric called the quality-adjusted life-year (QALY). QALYs quantify wellness on a continuum in which a value of 1 indicates perfect health and 0 indicates death. The value of one year lived in a state of perfect health is 1. Many people do not enjoy perfect health; the impact of the condition they experience is quantified in terms of a utility weight, or the value of one year lived with the condition. For example, a condition such as nonfatal heart failure might have a utility weight of 0.5. So, in comparison with perfect health, heart failure is associated with the loss of about one half of a QALY for each year a person suffers from it. That said, we often don’t learn much by comparing an individual’s quality of life against the standard of perfection. Instead, it is more useful to compare individuals with what is expected of their cohort. Since people in the age range where heart failure is common are rarely in perfect health, the age-matched comparison group for our heart failure sufferer might have an average utility weight of 0.75. For a person in the age range where heart failure is typical, each year with heart failure might be associated with loss of about one-quarter of a QALY (0.75 – 0.50 = 0.25).
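The arithmetic in this paragraph can be written out as a short sketch. It treats each weight as the value of one year lived in a given state (1 for perfect health, 0 for death) and uses the illustrative heart failure figures from the text. The function name `qaly_loss` and its parameters are assumptions introduced here, not part of any standard library.

```python
def qaly_loss(reference_weight, condition_weight, years=1.0):
    """QALYs lost over `years` when living with a condition.

    Both weights are on the 0 (death) to 1 (perfect health) scale and
    represent the value of one year lived in that state.
    """
    return (reference_weight - condition_weight) * years


# Illustrative weights taken from the text's heart failure example.
HEART_FAILURE = 0.50
PERFECT_HEALTH = 1.00
AGE_MATCHED_COHORT = 0.75

# Loss per year measured against perfect health: 1.00 - 0.50 = 0.50
loss_vs_perfect = qaly_loss(PERFECT_HEALTH, HEART_FAILURE)

# Loss per year measured against the age-matched cohort: 0.75 - 0.50 = 0.25
loss_vs_cohort = qaly_loss(AGE_MATCHED_COHORT, HEART_FAILURE)

print(loss_vs_perfect, loss_vs_cohort)
```

The choice of reference group halves the estimated loss in this example, which is why the comparison standard should be stated whenever QALY losses are reported.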

There are several commonly used utility measures based on QALYs. One found in a wide range of studies is the Health Utilities Index (HUI-3), which comprises 15 areas of functioning such as vision, pain, and mobility.Reference Furlong, Feeny, Torrance and Barr31 Each area is rated on a scale from 1 to 5 or 1 to 6, where 1 means “free of pain and discomfort” and 5 or 6 means “severe pain that prevents most activities.” These ratings are compiled into a total score ranging from 0 (dead) to 1 (in perfect health). The average for adults in the general population is 0.9.

To see how we can take advantage of the HUI-3 to determine cost-utility, let’s consider again the case of GERD. Among those with GERD, HUI-3 scores average about 0.8.Reference El-Dika, Guyatt and Armstrong32–Reference Schünemann, Armstrong and Degl’innocenti34 Treatment with omeprazole represents an incremental improvement of 0.1 in HUI-3 score – from 0.8 before treatment to 0.9 afterward. Sustained over one year, then, GERD treatment adds 0.1 QALYs (0.1 × 1 year). Incremental cost-utility over the course of a year, assuming chronic use of omeprazole costing US$11/month (incremental cost of US$132/year), versus no treatment would be US$1,320 per QALY ($132/0.1). Is this worth it? Opinions may vary, but as one benchmark, consider that the United Kingdom’s National Institute for Health and Care Excellence usually approves those treatments with cost per QALY of less than £30,000, meaning it considers such treatments worth implementing.35 Using this benchmark, omeprazole would be considered to have excellent cost-utility.
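A minimal sketch of this cost-utility calculation follows, using the figures quoted above (HUI-3 scores of 0.8 and 0.9, US$11 per month). QALYs are conventionally the utility gain multiplied by the time in years over which it is held, so a 0.1 gain sustained for twelve months is 0.1 × 1 year. The function names are hypothetical, and the sketch ignores harms, discounting, and other costs.

```python
def qalys_gained(utility_before, utility_after, years):
    """QALYs gained: the change in utility weight times duration in years."""
    return (utility_after - utility_before) * years


def cost_per_qaly(incremental_cost, incremental_qalys):
    """Incremental cost-utility ratio versus the comparator (here, no treatment)."""
    if incremental_qalys <= 0:
        raise ValueError("incremental QALYs must be positive to form a ratio")
    return incremental_cost / incremental_qalys


# Figures quoted in the text; treated here as illustrative inputs.
MONTHLY_COST_USD = 11.0
UTILITY_UNTREATED = 0.8
UTILITY_TREATED = 0.9

annual_cost = MONTHLY_COST_USD * 12  # US$132 per year
gain = qalys_gained(UTILITY_UNTREATED, UTILITY_TREATED, years=1.0)  # about 0.1 QALY
ratio = cost_per_qaly(annual_cost, gain)  # about US$1,320 per QALY

print(f"QALYs gained per year: {gain:.2f}")
print(f"Cost per QALY: US${ratio:,.0f}")
```

Under these assumptions the ratio is a small fraction of the £30,000-per-QALY benchmark, so the conclusion that omeprazole has favorable cost-utility (before harms are considered) does not depend on the exact figure.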

What about Harms?

Before we declare a treatment such as omeprazole an unqualified success in cost-utility terms, we should consider its adverse effects. Indeed, potential harms should be factored into cost-utility analysis for every treatment. These effects can appear after short- or long-term use and can involve physical or mental symptoms and conditions. Some harms are revealed only after the accumulation of information over time. It’s been over 30 years since omeprazole was approved for use in GERD in the United States, an advantage for examining long-term effects. Unfortunately, the findings are dispiriting: There is ample evidence that long-term use comes with significant downsides.Reference Mafi, May and Kahn36

Use of omeprazole and other drugs of its class – PPIs – can result in dependency. Symptom rebound owing to hypersecretion of gastric acid often occurs after stopping PPI use, leading patients to restart PPIs. This means that patients will continue to take drugs that have been associated with the increased risk (odds) of experiencing serious conditions including pneumonia,Reference Wilhelm, Rjater and Kale-Pradhan37 Clostridium difficile infections,Reference Wilhelm, Rjater and Kale-Pradhan37, Reference Bavishi and Dupont38 fractures,Reference Ngamruengphong, Leontiadis, Radhi, Dentino and Nugent39 certain cancers,Reference Huber, Nadella and Cao40, Reference Guo, Zhang and Zhang41 and inflammatory bowel disease.Reference Onwuzo, Boustany and Khaled Abou Zeid42 Because PPIs are available over the counter, people can make their own decisions regarding their use and may be unaware of these risks. And even patients who are aware of side-effect risks may believe that if they do not immediately experience these side effects they are immune from them. It is therefore important to promote information concerning long-term risks.Reference Wilhelm, Rjater and Kale-Pradhan37

Conclusions

This chapter examined the goals of clinical research, as they relate to various types of studies, with three overarching questions as a guide: Can it work? Does it work? Is it worth it? Although RCTs are widely viewed as the premier approach to clinical data collection, they are suited only to answering the first of these three equally important questions. Then too, structural biases narrow the scope of RCTs considerably, so that social and behavioral determinants of health – the most important risk factors for illness – are typically considered confounding variables rather than treatment targets.

To determine whether a treatment really does work, we need pragmatic trials and observational studies. And to determine whether a treatment is worth implementing, we need cost-utility studies, which examine a treatment’s value in terms of its impact on quality of life. Unfortunately, such studies are uncommon. There are, for instance, far more RCTs testing the use of PPIs for GERD treatment than there are cost-utility studies assessing how most efficiently to restore GERD sufferers to normal quality of life. Such studies focus on the patient experience of the condition and treatment, taking stock of harms to patients and counting more than symptom remission. Patients care about symptom remission, but this is just one element of quality of life, which also must be assessed in the long term – something RCTs are not good at. Boxes that summarize the key issues of what needs rethinking will be found at the end of each chapter: Box 1.1 lists the key issues for Chapter 1.

Box 1.1 What Needs Rethinking: Alternative Designs to Address Study Aims

Can it work?

  • RCTs that target health behavior change

Does it work?

  • More real-world studies facilitated by using:

Is it worth it?

  • More focus on cost-utility analyses

Chapter 2 picks up where this one leaves off, digging more deeply into the research designs used in clinical science. One of its key themes is the importance of a solid research plan, regardless of the design. This plan, known as a research protocol, helps the investigator stay organized throughout the lengthy process of clinical research. It also provides an accountability mechanism, helping to ensure that the execution of research – including the reporting of results – matches what was promised when the project was initially undertaken. This is essential, because execution can determine whether study results are actually useful in the real world.

References

Cochrane, AL. Effectiveness and Efficiency: Random Reflections on Health Services. The Rock Carling Fellowship, Nuffield Provincial Hospitals Trust; 1972:xi, p. 92.
Haynes, B. Can it work? Does it work? Is it worth it? The testing of healthcare interventions is evolving. BMJ. 1999; 319(7211):652–653. doi:10.1136/bmj.319.7211.652.
Porzsolt, F. Comparative effectiveness is the common denominator in health services research: Experimental effects are promising, real-world effects are compelling. J Complement Integr Med. 2023; 21(1):19–25. doi:10.1515/jcim-2023-0179.
Mentz, RJ, Kaski, JC, Dan, GA, et al. Implications of geographical variation on clinical outcomes of cardiovascular trials. Am Heart J. 2012; 164(3):303–312. doi:10.1016/j.ahj.2012.06.006.
World Medical Association. WMA Declaration of Helsinki – Ethical principles for medical research involving human subjects. www.wma.net/what-we-do/medical-ethics/declaration-of-helsinki/.
Manabe, N, Haruma, K, Ito, M, et al. Efficacy of adding sodium alginate to omeprazole in patients with nonerosive reflux disease: A randomized clinical trial. Dis Esophagus. 2012; 25(5):373–380. doi:10.1111/j.1442-2050.2011.01276.x.
Bate, CM, Green, JR, Axon, AT, et al. Omeprazole is more effective than cimetidine for the relief of all grades of gastro-oesophageal reflux disease-associated heartburn, irrespective of the presence or absence of endoscopic oesophagitis. Aliment Pharmacol Ther. 1997; 11(4):755–763. doi:10.1046/j.1365-2036.1997.00198.x.
Richter, JE, Peura, D, Benjamin, SB, Joelsson, B, Whipple, J. Efficacy of omeprazole for the treatment of symptomatic acid reflux disease without esophagitis. Arch Intern Med. 2000; 160(12):1810–1816. doi:10.1001/archinte.160.12.1810.
Hosseini, M, Salari, R, Akbari Rad, M, Salehi, M, Birjandi, B, Salari, M. Comparing the effect of psyllium seed on gastroesophageal reflux disease with oral omeprazole in patients with functional constipation. J Evid Based Integr Med. 2018; 23:2515690X18763294. doi:10.1177/2515690X18763294.
Bate, CM, Griffin, SM, Keeling, PW, et al. Reflux symptom relief with omeprazole in patients without unequivocal oesophagitis. Aliment Pharmacol Ther. 1996; 10(4):547–555. doi:10.1046/j.1365-2036.1996.44186000.x.
Boeckxstaens, G, El-Serag, HB, Smout, AJ, Kahrilas, PJ. Symptomatic reflux disease: The present, the past and the future. Gut. 2014; 63(7):1185–1193. doi:10.1136/gutjnl-2013-306393.
Institute of Medicine (U.S.). Committee on Standards for Developing Trustworthy Clinical Practice Guidelines., Graham R. Clinical Practice Guidelines We Can Trust. National Academies Press; 2011:xxxiv, p. 266.
U.S. Department of Health and Human Services Office of Disease Prevention and Health Promotion. Healthy People 2030. February 4, 2024, https://health.gov/healthypeople/.
Kaplan, RM, Milstein, A. Contributions of health care to longevity: A review of 4 estimation methods. Ann Fam Med. 2019; 17(3):267–272. doi:10.1370/afm.2362.
Cockerham, WC. Theoretical approaches to research on the social determinants of obesity. Am J Prev Med. 2022; 63(1 Suppl 1):S8–S17. doi:10.1016/j.amepre.2022.01.030.
Lakerveld, J, Mackenbach, J. The upstream determinants of adult obesity. Obes Facts. 2017; 10(3):216–222. doi:10.1159/000471489.
Agborsangaya, CB, Majumdar, SR, Sharma, AM, Gregg, EW, Padwal, RS. Multimorbidity in a prospective cohort: Prevalence and associations with weight loss and health status in severely obese patients. Obesity (Silver Spring). 2015; 23(3):707–712. doi:10.1002/oby.21008.
Jacobson, BC, Somers, SC, Fuchs, CS, Kelly, CP, Camargo, CA. Body-mass index and symptoms of gastroesophageal reflux in women. N Engl J Med. 2006; 354(22):2340–2348. doi:10.1056/NEJMoa054391.
de Bortoli, N, Guidi, G, Martinucci, I, et al. Voluntary and controlled weight loss can reduce symptoms and proton pump inhibitor use and dosage in patients with gastroesophageal reflux disease: A comparative study. Dis Esophagus. 2016; 29(2):197–204. doi:10.1111/dote.12319.
Yadlapati, R, Pandolfino, JE, Alexeeva, O, et al. The Reflux Improvement and Monitoring (TRIM) program is associated with symptom improvement and weight reduction for patients with obesity and gastroesophageal reflux disease. Am J Gastroenterol. 2018; 113(1):23–30. doi:10.1038/ajg.2017.262.
Loudon, K, Treweek, S, Sullivan, F, Donnan, P, Thorpe, KE, Zwarenstein, M. The PRECIS-2 tool: Designing trials that are fit for purpose. BMJ. 2015; 350:h2147. doi:10.1136/bmj.h2147.
Janiaud, P, Dal-Ré, R, Ioannidis, JPA. Assessment of pragmatism in recently published randomized clinical trials. JAMA Intern Med. 2018; 178(9):1278–1280. doi:10.1001/jamainternmed.2018.3321.
Dal-Ré, R, Janiaud, P, Ioannidis, JPA. Real-world evidence: How pragmatic are randomized controlled trials labeled as pragmatic? BMC Med. 2018; 16(1):49. doi:10.1186/s12916-018-1038-2.
Lu, B, Zhang, L, Wang, J, et al. Empirical treatment of outpatients with gastroesophageal reflux disease with proton pump inhibitors: A survey of Chinese patients (the ENLIGHT Study). J Gastroenterol Hepatol. 2018; 33(10):1722–1727. doi:10.1111/jgh.14143.
Jain, S, Shamrao Kulkarni, S, Mahapatra, JR, et al. Effectiveness of omeprazole in acid peptic disease: A real-world, patient-reported outcome measures study. Cureus. 2023; 15(7):e41994. doi:10.7759/cureus.41994.
Lanas, A, Rodrigo, L, Márquez, JL, et al. Low frequency of upper gastrointestinal complications in a cohort of high-risk patients taking low-dose aspirin or NSAIDS and omeprazole. Scand J Gastroenterol. 2003; 38(7):693–700. doi:10.1080/00365520310003967.
Sox, HC, Greenfield, S. Comparative effectiveness research: A report from the Institute of Medicine. Ann Intern Med. 2009; 151(3):203–205. doi:10.7326/0003-4819-151-3-200908040-00125.
Elliott, N, Steel, A, Leech, B, Peng, W. Design characteristics of comparative effectiveness trials for the relief of symptomatic dyspepsia: A systematic review. Integr Med Res. 2021; 10(2):100663. doi:10.1016/j.imr.2020.100663.
Ware, JE, Snow, KK, Kosinski, M, Gandek, B. SF-36 Health Survey Manual and Interpretation Guide. The Health Institute, New England Medical Center; 1993.
Wiklund, IK, Junghard, O, Grace, E, et al. Quality of Life in Reflux and Dyspepsia patients: Psychometric documentation of a new disease-specific questionnaire (QOLRAD). Eur J Surg Suppl. 1998; (583):41–49.
Furlong, WJ, Feeny, DH, Torrance, GW, Barr, RD. The Health Utilities Index (HUI) system for assessing health-related quality of life in clinical studies. Ann Med. 2001; 33(5):375–384. doi:10.3109/07853890109002092.
El-Dika, S, Guyatt, GH, Armstrong, D, et al. The impact of illness in patients with moderate to severe gastro-esophageal reflux disease. BMC Gastroenterol. 2005; 5:23. doi:10.1186/1471-230X-5-23.
Fallone, CA, Guyatt, GH, Armstrong, D, et al. Do physicians correctly assess patient symptom severity in gastro-oesophageal reflux disease? Aliment Pharmacol Ther. 2004; 20(10):1161–1169. doi:10.1111/j.1365-2036.2004.02257.x.
Schünemann, HJ, Armstrong, D, Degl’innocenti, A, et al. A randomized multicenter trial to evaluate simple utility elicitation techniques in patients with gastroesophageal reflux disease. Med Care. 2004; 42(11):1132–1142. doi:10.1097/00005650-200411000-00013.
Office for Health Improvement and Disparities, National Institute for Health and Care Excellence. Cost utility analysis: Health economic studies. Accessed June 3, 2023, www.gov.uk/guidance/cost-utility-analysis-health-economic-studies.
Mafi, JN, May, FP, Kahn, KL, et al. Low-value proton pump inhibitor prescriptions among older adults at a large academic health system. J Am Geriatr Soc. 2019; 67(12):2600–2604. doi:10.1111/jgs.16117.
Wilhelm, SM, Rjater, RG, Kale-Pradhan, PB. Perils and pitfalls of long-term effects of proton pump inhibitors. Expert Rev Clin Pharmacol. 2013; 6(4):443–451. doi:10.1586/17512433.2013.811206.
Bavishi, C, Dupont, HL. Systematic review: The use of proton pump inhibitors and increased susceptibility to enteric infection. Aliment Pharmacol Ther. 2011; 34(11–12):1269–1281. doi:10.1111/j.1365-2036.2011.04874.x.
Ngamruengphong, S, Leontiadis, GI, Radhi, S, Dentino, A, Nugent, K. Proton pump inhibitors and risk of fracture: A systematic review and meta-analysis of observational studies. Am J Gastroenterol. 2011; 106(7):1209–1218; quiz 1219. doi:10.1038/ajg.2011.113.
Huber, MA, Nadella, S, Cao, H, et al. Does chronic use of high dose proton pump inhibitors increase risk for pancreatic cancer? Pancreas. 2022; 51(9):1118–1127. doi:10.1097/MPA.0000000000002145.
Guo, H, Zhang, R, Zhang, P, et al. Association of proton pump inhibitors with gastric and colorectal cancer risk: A systematic review and meta-analysis. Front Pharmacol. 2023; 14:1129948. doi:10.3389/fphar.2023.1129948.
Onwuzo, S, Boustany, A, Khaled Abou Zeid, H, et al. Prevalence and risk factors associated with inflammatory bowel disease in patients using proton-pump inhibitors: A population-based study. Cureus. 2023; 15(1):e34088. doi:10.7759/cureus.34088.
Mc Cord, KA, Ewald, H, Agarwal, A, et al. Treatment effects in randomised trials using routinely collected data for outcome assessment versus traditional trials: Meta-research study. BMJ. 2021; 372:n450. doi:10.1136/bmj.n450.
