Impact statement
Individuals do not respond to health interventions in the same way. This creates a need to identify the factors (e.g., a behavior, a gene, a biomarker, or their combinations) that may indicate which interventions should be provided to different individuals. In fact, a great deal of modern biomedical science has focused on the identification of the mechanisms that contribute to disease, and relevant research has revealed that most disease processes are indeed multifactorial and can differ substantially between individuals. However, only now are studies being pursued in earnest that seek to identify links between measurable factors and likely response to health interventions. In this light, studies designed to identify unequivocal individual responders and non-responders to health interventions are needed. Current approaches, specifically those involving large cohort-based clinical trials with single endpoints and a focus on average effects of an intervention, are not necessarily designed for this. Rather, emerging N-of-1 trial designs, which focus on individual responses to an intervention by collecting enough data on a participant to statistically determine and quantify their responses, are better suited to the task. We provide the basic motivation and techniques used in N-of-1 studies, contrasting them with standard population-based clinical trials, and focus on directions in which the research community is going that could accelerate the use of strategies for providing health interventions to the individuals most likely to benefit from them. One key area where clinical studies of health interventions have fallen short is in limiting their focus to one health outcome or measure. It is underappreciated that what individuals put in their bodies may impact them in a wide variety of ways – both good and bad – and N-of-1 studies have the potential to help overcome this and thereby push the understanding of human biology in unprecedented ways.
Introduction
The rapid development of high-throughput, cost-efficient and data-intensive assays for use in molecular biology and the biomedical sciences (e.g., DNA sequencing, proteomics, metabolomics, etc.) is revolutionizing the manner in which studies seeking a deeper understanding of the pathological processes underlying diseases of all sorts are pursued. The application of such technologies to, for example, explorations of the differences between diseased and non-diseased human tissue specimens or genome-wide association studies (GWAS) interrogating DNA collected on tens of thousands of individuals with and without a particular condition, has led to many very useful insights into how to combat diseases (Karczewski and Snyder, 2018). However, such investigations have also exposed one very complicated set of issues: most pathological processes underlying diseases are heterogeneous and nuanced, to the point where mechanisms contributing to disease in one individual may be different from those in another individual. Given this, it has also been shown that available treatments or preventive interventions for different diseases tend not to work in everyone with the same general diagnosis. These two facts have led to concerted efforts to promote ‘precision’ or ‘personalized’ medicine and nutrition whereby health interventions are tailored to the unique genomic, physiologic, clinical, behavioral and exposure profiles of individuals who could benefit from them (Ginsburg and Willard, 2016; Karczewski and Snyder, 2018; Zeggini et al., 2019).
The two largest impediments to enabling and deploying precision medicine at scale are (1) simply not having a more complete understanding of human in vivo biology and (2) not having insight into whether the differences exhibited by individuals at the molecular level – that have largely been identified from in vitro or ex vivo studies of human tissues – are truly clinically meaningful. Comprehensive longitudinal evaluations of humans using state-of-the-field assays have been pursued, but they have focused on identifying patterns among individuals in their natural environments without any controlled perturbation or design guiding the relevant data collections (Chen et al., 2012; Li-Pook-Than and Snyder, 2013; Price et al., 2017; Earls et al., 2019; Schussler-Fiorenza Rose et al., 2019; Levy et al., 2020; Sailani et al., 2020; Zimmer et al., 2021; Metwally et al., 2022). Such studies are essential to explore human intra- and inter-individual variation but leave open the question of how different factors might contribute to different responses to health interventions (Atkinson and Batterham, 2015; Atkinson et al., 2019; McInnes et al., 2021). In the discussion section, we note examples of specific therapeutic modalities whose development is consistent with and motivated by a precision medicine orientation.
The purpose of this review is to provide an argument that clinical trials can be pursued that will allow researchers to probe human physiology in ethically-sound ways with unprecedented sophistication. Relevant trials should be rooted in N-of-1 and aggregated N-of-1 designs (Schork, 2015; Nikles et al., 2021) and focus on exploring multiple phenotypes simultaneously and identifying causal relationships between phenotypes by leveraging emerging, largely non-invasive, health monitoring devices and assays (Izmailova et al., 2018; Bentley et al., 2019; Tehrani et al., 2022). We do not provide an exhaustive review of N-of-1 trials, as there are many excellent resources and introductions to the basic motivation and methodologies (Lillie et al., 2011; Nikles et al., 2021; Davidson et al., 2022), including comprehensive reviews of the applications of N-of-1 trials (Gabler et al., 2011; Li et al., 2016; Mirza et al., 2017) as well as practical guides as to how to conduct N-of-1 trials (Guyatt et al., 1988; Kravitz et al., 2014; Nikles et al., 2021; Duan et al., 2022).
In fact, N-of-1 trials are now receiving attention as strategies for improving health care generally (Keller et al., 1988; Senn, 1998; Derby et al., 2021; McDonald and Nikles, 2021; Selker et al., 2022). Rather, we focus on N-of-1 trials that can address issues plaguing precision medicine and can provide a better understanding of human biology for at least four reasons: (1) They can provide unprecedented insights into human biology, including intra-individual causal claims about interventions and health measures. (2) They provide very comprehensive ways of vetting interventions to see if they work and for whom they work. (3) Their results provide insight into an individual’s health that may benefit them almost immediately, as opposed to much later after all relevant data have been collected and analyzed as part of a larger study. (4) Their results can be aggregated to explore patterns among individuals who exhibit robust responses to interventions. The organization of the review is as follows. We first provide greater insight into why legacy population-wide effect-focused randomized clinical trials (RCTs) are inadequate to address fundamental questions about human biology. We then consider different aspects of, and settings for, the proposed multivariate N-of-1 clinical trials, including the need for better markers of drug activity and availability. We end with a brief discussion of a few emerging therapeutic areas that could benefit from the proposed trials as well as suggestions for future research.
Human biology and legacy clinical trials
Strategies to understand how systems function as a whole, and which components may be dependent on other components, typically involve inducing perturbations to those systems and then determining how the systems respond (e.g., in cellular or mouse physiology studies). Studies seeking to perturb living humans systematically in this way are at worst unethical and at best logistically complicated. However, humans voluntarily subject themselves to perturbations of all sorts via pharmacologic interventions, dietary manipulations, environmental exposures, etc. In fact, clinical trials are routinely pursued to explore responses to such perturbations. Unfortunately, most clinical trials tend to focus on a singular indication (i.e., health or response measure) and the average response to the intervention in the population at large and therefore do not address broader questions about human physiology. We do not provide an in-depth review of clinical trials here (see, e.g., Friedman et al., 2015), but rather highlight a few of their key aspects so they can be contrasted with the N-of-1 studies. Typically, health interventions are evaluated in stages to ensure their safety and efficacy, from small (n = 5–20) phase I safety trials, to moderately sized (n = 25–200) phase II efficacy trials, to large (n = 250–10,000) phase III comparative and phase IV post-marketing surveillance studies. Some phase II and virtually all phase III and IV trials are pursued as RCTs where individuals are randomized to receive or not receive the intervention in question to avoid confounding. The health measures collected on these individuals are then compared to determine what effect the intervention may have on the typical person in the population at large.
There are at least six issues in the conduct of phase I–phase IV clinical trials (Deaton and Cartwright, 2018; Schork, 2018) that motivate complementary N-of-1 trials: (1) Most standard clinical trials have inclusion and exclusion criteria to make sure the trial is carried out in individuals likely to benefit, as well as to ensure safety and avoid confounding effects, which can complicate their generalizability. (2) Most, if not all, trials focus on the effect of an intervention on a single well-defined endpoint (e.g., blood pressure, pain, or rheumatoid arthritis symptoms). (3) Most failures of interventions in clinical trial testing occur in the phase II stage; that is, despite being shown to have potential in ‘pre-clinical’ cellular and non-human experiments and to be safe in phase I trials, many interventions are shown not to modulate or affect the phenotype they were designed to impact, calling into question the pre-clinical, basic-science driven evidence suggesting that they may have benefit in humans in vivo (of course there are other reasons why an intervention may fail in a phase II trial, for example, biased sampling, focusing on the wrong endpoint, measurement error, etc.). (4) Most late phase clinical trials, despite having inclusion and exclusion criteria, are expensive as they are conducted on very large numbers of people to ensure the trial results are generalizable and to overcome often hypothesized weak average effect sizes. (5) The results of clinical trials may identify interventions with the potential to benefit individuals, but unless it is known a priori how to identify individuals most likely to benefit from each intervention, it will be unclear how to optimally provide the interventions (see Figure 1). (6) Standard population-based RCTs can take a very long time to pursue and analyze, whereas more focused participant or patient-oriented alternative trial designs can be aggregated sequentially to enable population-based inferences (Schork, 2022).
Basic N-of-1 trial designs
Basic designs
As emphasized, the ultimate goal of N-of-1 trials is to determine, in an appropriately powered way, if an intervention is actually benefitting a target individual by leveraging data collections and analytical methods focused on that target individual’s response. An element common to all N-of-1 clinical trial designs is an intervention ‘crossover’ component in which measurements on a health-related phenotype (e.g., blood pressure, mood, weight, symptoms, etc.) are made while the target individual is receiving, and not receiving, an intervention. This contrast between measures while on and off the intervention can then be exploited to quantify and characterize the individual’s response to the intervention, but only if enough reliable measurements are made during each of the intervention periods and data analysis methods are used to control for confounding due to, for example, placebo or unmeasured covariate effects (Lillie et al., 2011; Kravitz et al., 2014; Wang and Schork, 2019; Kravitz and Duan, 2022). Note that many of the most widely used strategies for avoiding confounding in standard RCTs can be exploited in the design and execution of N-of-1 trials, such as randomizing the order in which the interventions are provided, blinding the participants and/or the researchers analyzing the data to the interventions received, using washout periods to avoid carryover effects, etc. (Lillie et al., 2011; Duan et al., 2013; Kravitz et al., 2014; Duan et al., 2022; Kravitz and Duan, 2022).
Figure 2 depicts some basic N-of-1 designs. We note that there is growing, but not complete, consensus on the definition of an N-of-1 clinical trial – which many believe requires a randomized order of interventions with, for example, blinding – as opposed to a simple ‘single case study’ which may not include randomization or blinding. We argue that both N-of-1 clinical trials and some single case studies are appropriate for advancing precision medicine (Davidson et al., 2022) and consider them both as N-of-1 clinical trials. Panel A depicts the simple and often used ‘interrupted time series single case design’ – or basic ‘AB’ design, where ‘A’ and ‘B’ correspond to interventions, one of which could be a placebo or simply no intervention (see, e.g., part V of the book by Huitema, 2011 for an excellent introduction). Panel B depicts the ‘reversal’ or ‘ABAB’ design in which the intervention periods in the interrupted time series design are repeated to ensure the initial set of observations do not reflect false positive or negative results. Panel C depicts the reversal design with washout periods (i.e., periods in which no intervention, including a placebo, is administered) between each administration of an intervention to avoid confounding carryover effects (an ‘AwBwAwB’ design). Note that the number of intervention administration periods and the order of the interventions can vary depending on the sophistication of the design (e.g., ‘ABwBA’ or ‘AwAwBwAwBwBwA’).
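The design notation above lends itself to a small illustration. The following is a minimal sketch, not taken from the cited works, that expands a design string such as ‘AwBwAwB’ into dated periods and builds a randomized-order design. The function names, the period lengths (7-day intervention periods, 3-day washouts) and the pairwise randomization scheme are all assumptions made for illustration only.

```python
import random


def expand_design(design, period_days=7, washout_days=3):
    """Expand a design string into (label, first_day, last_day) periods.

    'A' and 'B' are intervention periods and 'w' is a washout period.
    The period lengths are hypothetical defaults, not recommendations.
    """
    schedule = []
    day = 0
    for symbol in design:
        if symbol == "w":
            label, length = "washout", washout_days
        elif symbol in ("A", "B"):
            label, length = symbol, period_days
        else:
            raise ValueError(f"unknown design symbol: {symbol!r}")
        schedule.append((label, day, day + length - 1))
        day += length
    return schedule


def randomized_pairs(n_pairs, rng):
    """Build a washout-separated design in which each pair is 'AB' or 'BA' at random."""
    blocks = []
    for _ in range(n_pairs):
        pair = ["A", "B"]
        rng.shuffle(pair)  # randomize the order within this pair
        blocks.extend(pair)
    return "w".join(blocks)


schedule = expand_design("AwBwAwB")
design = randomized_pairs(2, random.Random(0))
```

Randomizing within pairs keeps the number of ‘A’ and ‘B’ periods balanced, which is one of several possible ways to randomize the order of interventions.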
The power of N-of-1 trials
N-of-1 trials derive their power to make inferences about the effect of an intervention on an individual from the number of measurements made on the participant while on and off an intervention (Huitema, 2011). However, serial correlations between the measurements can complicate the analysis if not appropriately accounted for, as can aforementioned covariate effects, carryover effects, missing data, non-uniform time points between measurement collections and placebo effects (Rochon, 1990; Huitema, 2011; Lillie et al., 2011; Wang and Schork, 2019; Somer et al., 2022). Many offshoots of N-of-1 trials exist to improve their efficiency and comprehensiveness; for example, sequential designs can be used to minimize the number of measurements made while preserving appropriate false positive and false negative rates (Schork and Goetz, 2017; Schork, 2022). In addition, there is no reason that N-of-1 trial methodology cannot be used in other settings, for example, assessing intervention effects in cell lines, tissue samples, mice, etc. In fact, such studies often make use of samples from a single individual or strain of mice and so, from a biological standpoint, they are, by their nature, assuming that insights from a single individual can shed light on very general biological questions.
There are many recent examples of N-of-1 studies, which we will not review exhaustively here (Gabler et al., 2011; Kronish et al., 2018; Nikles et al., 2022; Samuel et al., 2022), but rather simply emphasize that they are growing in number and sophistication (Kim et al., 2019; Lamb et al., 2022; Phyland et al., 2022).
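To make concrete why serial correlation matters for the power of an N-of-1 trial, the following minimal sketch estimates the lag-1 autocorrelation of a measurement series and converts a number of correlated measurements into an approximate ‘effective’ number of independent ones. It assumes a first-order autoregressive, AR(1), correlation structure and uses the standard large-sample approximation n_eff = n(1 − ρ)/(1 + ρ). This simplification is chosen here for illustration and is not a method prescribed by the cited works; the function names are hypothetical.

```python
from statistics import mean


def lag1_autocorrelation(values):
    """Estimate the lag-1 serial correlation of a measurement series."""
    center = mean(values)
    numerator = sum(
        (values[i] - center) * (values[i + 1] - center)
        for i in range(len(values) - 1)
    )
    denominator = sum((v - center) ** 2 for v in values)
    return numerator / denominator if denominator else 0.0


def effective_sample_size(n, rho):
    """Large-sample AR(1) approximation: n_eff = n * (1 - rho) / (1 + rho)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


# Hypothetical example: 60 daily readings with a serial correlation of 0.5
n_eff = effective_sample_size(60, 0.5)
```

Under these assumptions, 60 positively correlated daily readings carry roughly the information of 20 independent ones, which is why ignoring serial correlation can overstate the evidence for an intervention effect.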
Beyond the basics
Three important aspects of N-of-1 trials are receiving attention and motivating newer approaches. First, the data and results associated with individual N-of-1 trials can be aggregated and analyzed to explore trends among the participants and their responses (Zucker et al., 2010; Araujo et al., 2016; Punja et al., 2016; Schork and Goetz, 2017; Barbosa Mendes et al., 2022). Second, with sufficient data collected over time, one could characterize causal relationships among the intervention and other measures (Molenaar, 2019; Izem and McCarter, 2021; Yeboah et al., 2021); note that an entire recent issue of the journal ‘Evaluation and the Health Professions’ was devoted to causal analysis in N-of-1 trials (Miocevic et al., 2022). Such analyses could provide unprecedented insight into human physiology. Third, the execution of N-of-1 trials focusing on important physiologic endpoints can be greatly enhanced with emerging digital health-based monitoring devices (such as the Apple Watch and continuous glucose monitors), survey instruments made available through smartphone apps, and largely pain-free and convenient methods for obtaining blood, urine, stool and saliva samples (Enderle et al., 2016; Izmailova et al., 2018).
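As a minimal sketch of the first point, the aggregation of individual N-of-1 results, the following pools per-participant effect estimates using inverse-variance weights. This is a simple fixed-effect calculation chosen for illustration; it assumes a single common effect and therefore ignores exactly the between-person heterogeneity that N-of-1 trials aim to expose, so random-effects or hierarchical models would usually be preferable in practice. The function name and the numbers are hypothetical.

```python
import math


def pool_effects(effects, standard_errors):
    """Inverse-variance (fixed-effect) pooling of per-participant effect estimates.

    Returns the pooled estimate and its standard error.
    """
    if len(effects) != len(standard_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical on-minus-off differences (e.g., in mmHg) from four N-of-1 trials
effects = [-6.0, -2.0, 0.5, -4.0]
standard_errors = [1.5, 2.0, 1.0, 2.5]
pooled, pooled_se = pool_effects(effects, standard_errors)
```

Participants with more precise estimates receive more weight, and comparing each individual estimate with the pooled value is one simple way to flag candidate responders and non-responders for further study.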
Multivariate N-of-1 trials
N-of-1 clinical trials can be pursued to characterize the effect of an intervention on a specific phenotype (e.g., blood pressure) for a target individual and as such complement population-based RCTs, especially when it is unclear if an individual is likely to benefit from the intervention. However, many diseases are not associated with singular phenotypes and, in fact, most individuals who suffer from them do not only have one major symptom or problem (Ong et al., 2020). This is especially the case for older individuals with many comorbidities (Pearson-Stuttard et al., 2019; Onder et al., 2020; Skou et al., 2022). As a result, it makes sense to pursue appropriately powered N-of-1 trials that explore the impact of an intervention on more than one outcome (i.e., multivariate N-of-1 trials). Although multivariate trials have been proposed in the context of standard RCTs, there are few, if any, precedents in N-of-1 study contexts (Zhao et al., 2009). Few published precision medicine studies have measured more than one clinically relevant health measure despite the availability of newer health monitoring technologies (Viana et al., 2021). Although we will not go into the mathematical or statistical details here for how such trials can achieve sufficient power, it is arguable that if health is defined broadly (e.g., normal blood pressure, quality sleep, good blood biochemistry profile, etc.) then a good health intervention should at a minimum not negatively affect any of these measures and at best positively affect them all.
In this light, testing multiple measures for intervention effects simultaneously using an omnibus statistical test of the hypothesis that an intervention positively affects them all could lead to an increase in power (Huitema, 2011; Tabachnick and Fidell, 2012), but only if the number of measures is large (Leroy et al., 2022). Reaching appropriate numbers of observations could be achieved, for example, through the use of the aforementioned continuous wireless devices or microsampling techniques which involve collecting minute amounts of blood or urine for analyses to avoid a standard blood draw or logistically challenging biospecimen collections (Enderle et al., 2016; Bentley et al., 2019; Anderson et al., 2020).
There are many settings beyond multimorbidity issues that justify an evaluation of multiple health measures in N-of-1 clinical trials. For example, depression is known to impact virtually all aspects of a person’s health due to the various behaviors adopted by depressed individuals (Triolo et al., 2020; Aprahamian et al., 2022). Testing the effect of an antidepressant on mood and depressive symptoms in addition to, perhaps, weight, blood pressure, sleep quality, etc. makes sense. Another example involves geroprotectors, or interventions meant to slow the aging rate and thereby influence susceptibility to, or processes associated with, many different age-related diseases (Mahmoudi et al., 2019; Kritchevsky and Justice, 2020; Triolo et al., 2020; Aprahamian et al., 2022; Moskalev et al., 2022). Thus, by definition, a geroprotector should affect multiple systems and hence could be tested for this. In fact, if only one or some subset of health measures among many different measures is affected by a purported geroprotector, then the intervention is probably not a geroprotector (Schork et al., 2022).
In addition to testing for the effect of an intervention on multiple health measures, N-of-1 and aggregated N-of-1 studies can be pursued to exploit interventions as ways of perturbing or probing human physiology – the goal being to identify relationships among different health measures or processes. Thus, if enough measures are collected over the time an individual is both receiving and not receiving an intervention, then temporal relationships between the measures can reveal likely causal relationships among them based on, for example, time series analysis, Granger regression and other techniques (McCracken, 2016; Molenaar, 2019). Such analyses would again be significantly enhanced if the relevant health measures were collected continuously (Enderle et al., 2016; Bentley et al., 2019; Anderson et al., 2020). In addition, by assessing the effect of the intervention on health measures beyond a primary measure in relevant trials, potential intervention ‘repurposing’ opportunities could arise (Pushpakom et al., 2019; Krishnamurthy et al., 2022; Mucke, 2022). In this way, N-of-1 trials can be pursued as proof-of-concept studies for identifying multiple indications, or at least one on solid footing, for an intervention (Pushpakom et al., 2019; Mucke, 2022).
In addition, by collecting multiple health measures on an individual N-of-1 trial participant, possibly continuously and in real time, insights into that participant’s health and health trajectory can be obtained even if an intervention being tested is shown not to benefit the participant.
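As a minimal sketch of how temporal relationships between two health measures might be screened, the following computes the correlation between one series and a later-shifted copy of another. This lagged correlation is a deliberately simpler stand-in for the Granger-type regression mentioned above: it does not adjust for each series’ own history and cannot by itself establish causation. The function names and the example series are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lagged_correlation(x, y, lag):
    """Correlation between x at time t and y at time t + lag (lag >= 0)."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(x) - 2:
        raise ValueError("lag must leave at least two paired observations")
    return pearson(x[: len(x) - lag], y[lag:])


# Hypothetical daily series in which y simply follows x one day later
x = [1, 3, 2, 5, 4, 6, 5, 8]
y = [0] + x[:-1]
r_lag1 = lagged_correlation(x, y, 1)
```

A lagged correlation that is clearly stronger in one direction (x leading y) than in the other is only suggestive; a formal analysis would fit regression models that include each measure’s own past values, intervention status and relevant covariates.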
Whole body, biomarker validation and therapeutic drug monitoring studies
There are some very specific areas where multivariate N-of-1 trials can be pursued that will enhance the assessment of individual intervention response and enable deeper insight into human physiology, as emphasized throughout this review. We briefly describe four such areas below.
General assessment of inter-individual variation in intervention response
As noted, given that N-of-1 trials focus on individuals’ responses, they can be used to more precisely identify responders to particular interventions. In addition, if relevant studies collected sufficient data on more than one health measure then they can be used to identify potential side effects, alternative uses for the intervention and different mechanisms of action or physiological processes modulated by the intervention. In fact, it might make sense for all interventions to be evaluated for their whole-body effects in a small number of individuals as they are being developed. If done along the lines outlined in the review, such trials could shed enormous light on how substances put into the human body affect it systemically (see Figure 3).
Biomarker and surrogate endpoint validation
There is great interest in identifying better biomarkers of an intervention’s activity so that these biomarkers can be correlated with other health measures of interest (see, e.g., ‘Therapeutic Drug Monitoring Studies’ section below) (Hendrickson et al., 2020). In addition, there is also interest in identifying ‘surrogate endpoints’ for clinical trials that initially focus on expensive, lengthy and logistically challenging health outcome measures, and N-of-1 trials are excellent vehicles for validating biomarkers and surrogate endpoints (Burzykowski et al., 2005). As an example, consider the development and use of epigenetic clocks as surrogate endpoints in trials of geroprotectors (Schork et al., 2022). The belief is that if an intervention modulates or changes an epigenetic clock among participants in a trial in positive ways – thereby indicating that the intervention in question is slowing the aging rate of the individuals – then those individuals do not necessarily have to be tracked longitudinally until they develop (or do not develop) age-related diseases that the candidate geroprotector is hypothesized to prevent or treat (Mahmoudi et al., 2019; Kritchevsky and Justice, 2020; Schork et al., 2022). Thus, the epigenetic clocks would act as a surrogate endpoint for the processes that are associated with the disease endpoints of real interest, which are modulated by the intervention. Although epigenetic clocks have been shown to be correlated with disease endpoints, this has been shown via large epidemiological studies and not in focused clinical trials measuring appropriate health measures.
Therefore, it is arguable that by measuring epigenetic clocks along with health measures that underlie many common chronic age-related diseases and conditions, such as blood pressure, cholesterol level, sleep quality, etc. in appropriately powered N-of-1 trials, one might not only show that the geroprotector influences these health measures in positive ways, but also that an epigenetic clock is correlated with them as well. This would in effect validate surrogacy of the epigenetic clock at the ‘level of the individuals and the trial’ (Burzykowski et al., 2005; Buyse et al., 2022).
Therapeutic drug monitoring studies
Therapeutic drug monitoring (TDM) studies measure a drug’s concentration in an individual’s bloodstream in order to correlate the levels of the drug with the phenotype that the drug is hypothesized to modulate (Dasgupta, 2012; Clarke and Dasgupta, 2019). Most drugs do not undergo such evaluation and testing, which is unfortunate since such studies could in theory better characterize the drug’s mechanisms of action and its effects on different phenotypic endpoints. Of course, TDM studies are predicated on the assumption that there is a definable relationship between drug dose and plasma or blood drug concentration, and between concentrations and therapeutic effects. In addition, TDM studies require ways of measuring blood levels of a drug, which may not be trivial. However, by more precisely measuring drug bioavailability and activity in N-of-1 trials, especially in trials for which participants are monitored for multiple health measures, one could explore temporal relationships between drug bioavailability and activity and not just, for example, between pill count-based dosing and outcomes (Dasgupta, 2012; Clarke and Dasgupta, 2019; Irving and Gecse, 2022; Ordutowski et al., 2022).
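One simple way to explore such temporal relationships is to correlate a participant’s measured drug concentrations with a health measure recorded some number of time points later, and to compare the strength of the association across lags. The sketch below does this with synthetic data; the series, the function names and the choice of lagged Pearson correlation are assumptions made for illustration and are not a method prescribed by the TDM literature cited above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(concentration, response, max_lag):
    """Correlate concentration at time t with response at time t + lag."""
    if len(concentration) != len(response):
        raise ValueError("series must have the same length")
    if len(concentration) - max_lag < 3:
        raise ValueError("max_lag leaves too few paired points")
    out = {}
    for lag in range(max_lag + 1):
        x = concentration[: len(concentration) - lag]
        y = response[lag:]
        out[lag] = pearson(x, y)
    return out


# Synthetic series (illustrative only): the response tracks the drug
# concentration from two time points earlier, with a negative slope.
concentration = [0, 2, 8, 5, 3, 1, 0, 2, 8, 5, 3, 1]
response = [91, 97, 100, 94, 76, 85, 91, 97, 100, 94, 76, 85]

corrs = lagged_correlations(concentration, response, max_lag=4)
best_lag = max(corrs, key=lambda lag: abs(corrs[lag]))
print(f"strongest association at lag {best_lag}: r = {corrs[best_lag]:.2f}")
```

In real data, autocorrelation in both series and irregular sampling times would complicate this picture, so lagged correlations are best read as an exploratory summary.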
Matching based on data aggregation
As noted previously, if enough N-of-1 trials are pursued using the same interventions, and baseline health assessments with common measures have been collected on each participant, then the data and results can be aggregated and analyzed. The common baseline health examination profiles of the individuals could then be explored for patterns and correlations with intervention responses. This can enable matching a future target individual’s baseline health profile with the profiles of others who previously went through N-of-1 trials. If good matches (however defined) are found, then the interventions to which the matched individuals responded would be reasonable first-choice interventions for the target individual (Wicks et al., 2011; Schork and Goetz, 2017; Schork et al., 2020; Davidson et al., 2022). Different strategies for identifying the matches could be pursued based on, for example, propensity scores and related techniques (Guo and Fraser, 2014; Liu and Meng, 2016).
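The matching idea can be sketched with a very simple stand-in for the techniques cited above: nearest-neighbor matching on standardized baseline measures, followed by a vote over the interventions to which the closest prior participants responded. This is not a propensity score method, and the profiles, intervention labels and function names below are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def make_standardizer(profiles):
    """Build a z-scoring function from the columns of the prior profiles."""
    columns = list(zip(*profiles))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return lambda p: [(v - m) / s for v, m, s in zip(p, mus, sds)]


def suggest_intervention(target, past_trials, k=3):
    """Pick the most common responded-to intervention among the k closest matches.

    past_trials is a list of (baseline_profile, intervention) pairs, where the
    intervention is the one that participant responded to in an N-of-1 trial.
    """
    z = make_standardizer([profile for profile, _ in past_trials])
    zt = z(target)

    def distance(profile):
        return sqrt(sum((a - b) ** 2 for a, b in zip(z(profile), zt)))

    nearest = sorted(past_trials, key=lambda trial: distance(trial[0]))[:k]
    votes = Counter(intervention for _, intervention in nearest)
    return votes.most_common(1)[0][0], nearest


# Hypothetical baseline profiles: (age, systolic blood pressure, LDL cholesterol).
past_trials = [
    ((45, 120, 100), "A"),
    ((47, 122, 105), "A"),
    ((50, 125, 110), "A"),
    ((68, 150, 160), "B"),
    ((70, 155, 165), "B"),
    ((72, 158, 170), "B"),
]

choice, matches = suggest_intervention((48, 123, 104), past_trials, k=3)
print("suggested first-choice intervention:", choice)
```

What counts as a ‘good match’ is left open in the text; here it is simply Euclidean distance after standardization, and other definitions could give different suggestions.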
Conclusions and future directions
There are few health interventions whose effectiveness is universal. This can be attributed to the great genetic, physiologic, clinical, behavioral and exposure profile variation exhibited by individuals susceptible to or suffering from diseases (Schork, 2015). Identifying interventions that benefit individuals on the basis of their nuanced and possibly unique profiles is the goal of precision or personalized medicine. However, tailoring or matching interventions to individuals will require greater understanding of intra- and inter-individual variation and intervention response and, as argued throughout, can be enabled or enhanced through the use of whole-body N-of-1 clinical trials (Figures 1 and 3).
In this light, many emerging interventions, such as cytotoxic T-cell therapies (Kiyotani et al., 2021; Roesler and Anderson, 2022), brain anatomy-guided Transcranial Magnetic Stimulation (TMS) therapies (Siddiqi et al., 2021; Williams et al., 2021) and sequence-based antisense oligonucleotide therapies (Kim et al., 2019; Helm et al., 2022), are designed to work only on specific individuals, given that the targets they exploit and the constructs they use are based on the unique features underlying the pathologies of the individuals for whom they are designed. Testing the effectiveness of these interventions, given that no two individuals with the same condition will likely get exactly the same intervention, could make use of the proposed N-of-1 strategies. Of course, one could address very broad questions about the utility of such interventions using standard RCTs, such as whether individuals who receive the personalized interventions fare better than individuals who receive a more ‘one-size-fits-all’ intervention (Schork et al., 2020).
Ultimately, the current emphasis on precision medicine, the emergence of sophisticated health monitoring technologies, and the desire of individuals to optimize their health and not simply contribute to studies that may only benefit future generations demand better approaches to biomedical and translational science. We recognize that there might be impediments to the implementation of multivariate N-of-1 trials of the type described. For example, a greater patient burden for data collection, logistical complications in collecting different data types, and the costs of conducting and monitoring the individual participants may create barriers to the adoption and use of multivariate N-of-1 trials. However, efficient, cost-effective and participant-friendly N-of-1 clinical trials – to the degree that they can be pursued – are very likely to be an appropriate addition to biomedical and translational studies in the future, given that they have at least four clear advantages: (1) they can shed light on fundamental questions about human biology; (2) they can determine which interventions work and for whom; (3) they can benefit the participants directly and almost immediately by collecting vast amounts of health data on them, possibly continuously and with real-time interpretive ability; and (4) their results can be aggregated and analyzed to identify patterns that may inform the use and execution of such trials in the future.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/pcm.2022.15.
Acknowledgements
The authors would like to thank Drs. Mark Adler and Stephanie Venn Watson for commenting on earlier versions of this manuscript.
Author contributions
N.J.S. conceived of the orientation and format for the review, and N.J.S., B.B.-J., W.S.L., S.S. and L.H.G. pursued relevant literature reviews. N.J.S. wrote the initial draft, L.H.G. edited the initial draft and B.B.-J., W.S.L. and S.S. edited the subsequent drafts.
Financial support
N.J.S. is supported in part by the following grant support from the National Institutes of Health: 1 U19 AG056169-01A1; U2C CA252973; UH3 AG064706; UH2 AG06470602S1 and U19 AG023122.
Competing interest
N.J.S., S.S. and L.H.G. are founders of net.bio, a company focusing on pursuing novel clinical protocols to ensure that individuals benefit from health interventions of all sorts. B.B.-J. and W.S.L. are paid consultants for net.bio.
Comments
To whom it may concern,
Please find a manuscript entitled 'EXPLORING HUMAN BIOLOGY WITH N-OF-1 CLINICAL TRIALS' which we were invited to submit to Cambridge PRISMS by Laetitia Beck. The manuscript has not been submitted to another journal and reviews aspects of N-of-1 trials that make them appealing in an era of precision medicine.
Thanks,
Nicholas J. Schork