Introduction
Randomized clinical trials (RCTs) have long been the gold standard for generating high-quality medical evidence [Reference Gul and Ali1]. The success of RCTs depends on the timely accrual of a representative and qualified study sample, but this remains a challenge [Reference Gul and Ali1,Reference Penberthy, Dahman, Petkov and DeShazo2]. Fewer than 4% of adults in the United States (US) participate in clinical trials [Reference Penberthy, Dahman, Petkov and DeShazo2– 4], and this number has not improved since 1994, despite increasingly prolonged recruitment periods [Reference Moffat, Cannon and Shi5,Reference McDonald, Treweek and Shakur6]. Further, up to 85% of clinical trials fail to recruit or retain a sufficient sample size, leading to failures to meet accrual targets in four out of every five trials, even though nearly $1.9 billion is spent on recruitment annually [Reference Penberthy, Dahman, Petkov and DeShazo2]. Moreover, the lack of diversity and representativeness in study populations is another persistent problem. All of these cause study delays, increase costs, limit statistical power, and subsequently compromise clinical trial quality [Reference Campillo-Gimenez, Buscail and Zekri7]. It is imperative to develop methods to optimize the trial design for better feasibility, inclusiveness, and recruitment efficiency to improve the sustainability and impact of clinical trial research.
Several studies have assessed the impact of individual clinical trial characteristics on recruitment success [Reference Ross, Tu, Carini and Sim8–Reference Zwierzyna, Davies, Hingorani and Hunter10]. Factors contributing to successful recruitment include funding type (e.g., a federal agency, pharmaceutical company), trial phase (phase II having faster recruitment than phase I or phase III trials), and type of trial site (research facility or other) [Reference Tang, Sherman and Price11,Reference Avis, Smith, Link, Hortobagyi and Rivera12]. Other studies have focused on the role of the clinician or the patient in trial recruitment. Clinician efforts toward administrative preparation of the study site, increasing public awareness, and trial recommendations have enhanced enrollment, while the effectiveness of particular recruitment methods remains unclear [Reference Newington and Metcalfe13,Reference Buttgereit, Palmowski and Forsat14]. Patient factors, including insurance coverage (or lack of), perceived drawbacks of participating in research, time and travel constraints, and perception of therapeutic benefit, have been shown to directly impact the likelihood of patient enrollment [Reference Avis, Smith, Link, Hortobagyi and Rivera12]. A potential limitation in these studies is that many focused on a specific disease domain (e.g., oncology) or patient population (e.g., pediatrics), limiting the generalizability of the findings [Reference Ross, Tu, Carini and Sim8–Reference Zwierzyna, Davies, Hingorani and Hunter10].
This study extends prior work to systematically identify clinical trial features associated with recruitment success by employing large database analyses using linked clinical trial registries (one nationally managed and one at a single facility). In this study, we measured RCT recruitment success by accrual percentage [Reference Gul and Ali1,Reference Penberthy, Dahman, Petkov and DeShazo2]. Two regularized linear regression and six tree-based machine learning algorithms were compared, and the optimal algorithm (i.e., CatBoost [Reference Prokhorenkova, Gusev and Vorobev15] regressor) was applied to predict the accrual percentage of RCTs. While interpretability has been considered critical in the domain, existing works lack a comprehensive analysis of feature importance. In this work, we used Tree SHapley Additive exPlanations (SHAP) [Reference Lundberg, Erion and Chen16] for a detailed analysis of feature importance for participant recruitment to RCTs. We further conducted a subgroup analysis between RCTs with high and low accrual percentages indicated by three sets of thresholds, including ≥ 90% and ≤ 10%, ≥ 80% and ≤ 20%, as well as≥70% and ≤ 30%. Finally, recommendations for engaging stakeholders to improve recruitment are provided.
Materials and Methods
We conducted a multi-source retrospective data analysis using machine learning methods to investigate the impact of evidence-based and expert-identified features on the success of RCT recruitment. Fig. 1 depicts the overall methodology and databases used in this study.
Data Source and Trial Selection
We used two data sources: (1) Research Compliance and Administration System (RASCAL, https://rascal.columbia.edu/), a single-institution electronic clinical research registry; and (2) the Aggregate Analysis of ClinicalTrials.gov (AACT) database, a global clinical trials database by the US National Library of Medicine (https://clinicaltrials.gov/) [17]. We extracted clinical trials from RASCAL with protocol approval dates ranging from 06/04/2015 to 07/31/2019. We included randomized interventional treatment studies that were closed to further enrollment. Studies with multiple registered protocols in RASCAL or were terminated due to non-recruitment-related reasons such as loss of funding, study drug toxicity, or other administrative reasons were excluded. Additional recruitment details (i.e., number of study sites and target domain) were extracted from the AACT. Finally, studies without reported target accrual (n = 47) or with greater than 100% accrual percentage in RASCAL (n = 7) were excluded from the main analysis. Accruing more than the approved number of subjects is a violation per Columbia University Irving Medical Center’s Institutional Review Board (IRB). Though the IRB assessed studies with reported over-accrual, we cannot be certain if the information was mistakenly reported (i.e., typographical error) or if it was deemed a violation, hence the exclusion.
Data Processing and Feature Selection
We selected features based on a combination of evidence in the literature (e.g., recruitment methods [Reference George, Duran and Norris18], resources for research staff [Reference Terheyden, Behning and Lüning19,Reference Stein, Shaffer, Echo-Hawk, Smith, Stapleton and Melvin20] study design [Reference Treweek, Pitkethly and Cook21–Reference Quay, Frimer, Janssen and Lamers23], randomization [Reference Hoffman, Baker and Kunkel24,Reference Elliott, Husbands, Hamdy, Holmberg and Donovan25], and consent process [Reference Treweek, Pitkethly and Cook21]) and domain expertise (BI: 7 years as a research nurse and recruitment coordinator; JM: over 20 years as clinical research staff and eight years as multi-site project manager). A detailed list of the extracted and selected features with the selection rationale is included in Supplementary Table S1.
We distinguished the difference between enrollment and accrual based on the RASCAL definition. Individuals who agree to participate in a study, even if just for screening or assessment purposes, are considered to be enrolled in the study. On the other hand, individuals who are confirmed to be eligible for an interventional study with a screening procedure to determine eligibility that occurs after consent is obtained are regarded as accrual. The accrual-to-date number is a subset of the number of enrolled participants. Our outcome of interest, accrual percentage, was calculated by dividing the reported accrual to date by the target accrual.
For all binary variables, such as the recruitment methods class of features, we assumed that a missing value indicates the absence of a feature. One-hot encoding was applied to polytomous variables (categorical variables with more than two possible values), such as the study phase. The target clinical domain of an RCT was extracted from the relevant Medical Subject Headings (MeSH) terms displayed on ClinicalTrials.gov. MeSH are standardized keywords from a controlled and hierarchically organized vocabulary produced by the National Library of Medicine and are publicly available at http://www.ncbi.nlm.nih.gov/pubmed/advanced. For those without a relevant MeSH term, we manually mapped the conditions of an RCT to MeSH terms. Data processing was described in the Supplementary Material 1.
Finally, we utilized Pearson’s correlation coefficient to quantify the relationship between two continuous variables, the Phi correlation coefficient to evaluate correlations between two dichotomous variables, and the Point-biserial correlation coefficient for examining the association between a continuous variable and a dichotomous one.
Model Training and Evaluation
To identify factors associated with successful RCT recruitment, we first built a model to predict the accrual percentage with the selected features. We applied and compared two regularized linear regression models (i.e., Ridge regression with l2 regularization [Reference Hoerl and Kennard26] and Least Absolute Shrinkage and Selection Operator [Lasso] regression with l1 regularization [Reference Tibshirani27]) and six tree-based machine learning models, including the Decision Tree [Reference Breiman, Friedman, Olshen and Stone28], Random Forest [Reference Breiman29], AdaBoost (Adaptive Boosting) [Reference Freund and Schapire30], XGBoost (extreme gradient boosting algorithm) [Reference Chen and Guestrin31], LightGBM (Light Gradient Boosted Machine) [Reference Ke, Meng and Finley32], and CatBoost (Categorical Boosting) [Reference Prokhorenkova, Gusev and Vorobev15]. We used the Classification and Regression Trees for the Decision Tree regression, which predicts the target by learning decision rules from features [Reference Breiman, Friedman, Olshen and Stone28]. It iteratively splits data into two groups based on the feature that minimizes the cost metric until reaching the stopping criteria. It has a tree-like structure with interior nodes representing features and decision rules and leaf nodes containing a prediction score. Random forest regression combines multiple decision trees, each of which is trained on a bootstrap sample from the dataset and a random subset of features and averages the predictions to control overfitting to yield better performance [Reference Breiman29]. AdaBoost regression is a boosting ensemble model that sequentially fits a regressor on the whole dataset with adjusted weights determined by the errors in the current prediction [Reference Freund and Schapire30]. Decision Tree was selected as the regressor in this model in our study. XGBoost, a more robust gradient-boosted trees algorithm with a regularized objective function, iteratively adds decision trees built by learning the errors in prior trees [Reference Chen and Guestrin31]. LightGBM is also a gradient-boosting algorithm with Gradient-based One-Side Sampling and Exclusive Feature Bundling to achieve better efficiency and scalability [Reference Ke, Meng and Finley32]. CatBoost, another gradient boosting method, introduces ordered boosting and an algorithm for categorical features to solve the prediction shift issue [Reference Prokhorenkova, Gusev and Vorobev15].
We tuned each model’s parameters (Supplementary Table S2) by using 50-times repeated 10-fold cross-validation with grid search. In our effort to mitigate overfitting, we closely monitored the disparity between the mean Root Mean Square Error (RMSE) for the training and validation sets. The optimal parameter configuration was found based on the lowest mean validation RMSE. Subsequently, employing this optimally tuned parameter setting, we trained the model on the entirety of the dataset to inform the subsequent analysis. We also investigated how consistently the top three best-performing models identified the important features, thereby adding more confidence to the interpretation.
Moreover, for comprehensiveness of our analysis, we conducted a supplemental analysis that incorporated the seven studies excluded due to having an accrual percentage greater than 100% in RASCAL, despite the limitation that we cannot tell if these studies represented typographical errors or actual IRB violations, by using the best-performing model to explore the potential impact on our results.
Feature Importance Analysis
Tree SHAP was employed to interpret the prediction of accrual percentage and analyze the importance of individual features with respect to successful RCT recruitment. SHAP is a unified framework to interpret model predictions [Reference Lundberg and Lee33]. It calculates the contribution of each feature to the output, which is defined as the SHAP value equivalent to the Shapley value in game theory. The mean absolute SHAP value of each feature determines the order of importance. In addition to the measure of feature importance, it also identifies whether the impact of a feature on the output is positive or negative. TreeExplainer’s Tree SHAP algorithm was proposed later to estimate the SHAP values specifically for tree-based models [Reference Lundberg, Erion and Chen16].
We set specific paired thresholds to discern between the most and least successful recruitment subgroups within the RCTs. The categories were established such that the most successful recruitment group comprised those RCTs with an accrual rate of either ≥ 90%, ≥ 80%, or ≥ 70%, while the least successful recruitment groups were defined by RCTs exhibiting an accrual rate of ≤ 10%, ≤ 20%, or ≤ 30%, respectively, matching each higher threshold with its corresponding lower one. The identified important features were compared between these two subgroups, and their descriptive statistics were also calculated. Continuous variables were evaluated using Mann–Whitney U test (two-sided) with Bonferroni correction, and the binary variables were evaluated using Fisher’s Exact test (two-sided) with Bonferroni correction and a cut-off of corrected p-value < 0.05 to determine statistical significance.
Results
Descriptive Statistics
Among the 2,246 RCTs in the RASCAL dataset, 1,037 (46%) were closed for further enrollment (the terminated study was excluded). A total of 393 RCTs were included in the analysis (Fig. 2).
The average accrual percentage of the included RCTs is 46.7% (SD: 32.0%). Majority of the RCTs are Phase 3 (n = 206; 52.4%), multicenter (96.4%), and industry-funded (67.2%). Most RCTs involve non-English speaking participants (55%), drug or biologic agents (87.5%), collection of biologic specimens (91.9%), and imaging or radiation (61.3%). The most frequently reported recruitment methods are person-to-person (92.9%) and website advertisement (53.9%). Detailed descriptive statistics of the included RCT features are provided in Table 1. The correlations between all analyzed variables are outlined in the accompanying Supplementary Material 2. Key observations include a significant negative correlation of −0.542 between industry-funded studies and protocol duration, suggesting industry trials tend to be shorter. The use of website for recruitment demonstrated high positive correlations with cancer research (coefficient: 0.534) and studies with the target domain of neoplasms (coefficient: 0.493). Cancer research also showed positive associations with studies receiving internal funding (coefficient: 0.436), and the number of sites (coefficient: 0.423), but negatively correlated with studies involving participant compensation (coefficient: −0.723).
CCPH = columbia community partnership for health; CU = columbia university; NYPH = New York presbyterian hospital; Q1 = first quartile; Q3 = third quartile; SD = standard deviation.
* One RCT may have multiple answers.
# Studies may have multiple phases (e.g., Phase 1/2).
‡ Other unspecified vulnerable population other than Minors/Children, Pregnant Women, Lacking Capacity for Consent, CU/NYPH Employees, Economically Disadvantaged, Educationally Disadvantaged, and Non-English Speaking individuals.
† Studies where the expected enrollment does not specifically include, or aim to recruit from, any recognized vulnerable groups. It does not necessarily imply that these groups are excluded from participation by the eligibility criteria, but rather that they are not the targeted or anticipated demographic for recruitment.
¤ Distinctions between the different types of data collection methods used. Noninvasive measures include the gathering of physiological parameters without the use of invasive procedures, such as monitoring heart rate, measuring blood pressure, or checking temperature. Conversely, ’survey, interview, and questionnaires’ referred to tools utilized to acquire information regarding the participants’ feelings, thoughts, behaviors, or experiences through self-reporting methods. While both categories could be considered ‘noninvasive’ in the broad sense, these were separated due to the distinct types of data each method collects.
The included RCTs represented 43 clinical domains (Table 2), with pathological conditions signs and symptoms as the most commonly targeted domain (36.6%), followed by neoplasms (36.1%) and nervous system diseases (23.2%).
MeSH = medical subject headings.
A single RCT may have multiple target domains.
Model Performance
Supplementary Table S2 lists the optimal parameter setting for each model. The performances of these eight regression models for accrual percentage prediction under the optimal parameter setting are shown in Table 3. Among them, the CatBoost regressor achieved the smallest mean validation RMSE (20.31, SD: 2.53), and the difference between the mean train and validation RMSE is 5.75 (accrual percentage is within the [0,100]), signifying the model was not overfitted. Therefore, the CatBoost regressor was selected and trained on the whole dataset for feature importance analysis.
AdaBoost = adaptive boosting; CatBoost = categorical boosting; Lasso = least absolute shrinkage and selection operator; LightGBM = light gradient boosted machine; RMSE = root mean square error; SD = standard deviation; XGBoost = eXtreme gradient boosting.
Feature Importance Analysis
The top 30 most important features that are associated with RCT recruitment based on the CatBoost model are presented in Fig. 3. The top 48 most important features with mean absolute SHAP value > 0.01 are displayed in Supplementary Figure S1. We also provided the important features calculated based on the LightGBM and XGBoost models in Supplementary Figures S2 and S3. The horizontal position of a dot represents the SHAP value of a feature for an RCT. A larger positive (or negative) SHAP value indicates a higher positive (or negative) impact of the feature on the accrual percentage prediction. The color of a dot indicates the feature value. For continuous variables, the redder the dot is, the larger the value is; for binary variables, red indicates the presence of the feature in the RCT.
For the continuous feature “Protocol duration (years),” the higher the value of the feature, the larger the SHAP value (i.e., redder dot) in the positive direction, which indicates a larger positive impact on the accrual percentage. In other words, the RCTs with longer protocol duration in years are more likely to have a high accrual percentage. For the binary feature “Funding type: Federal/State/Local Government,” the SHAP values for RCTs that were funded by the government (red dots) are positive. This indicates that RCTs funded by the government are more likely to have a higher accrual percentage. In contrast, industry-funded RCTs tend to have a lower accrual percentage.
Further, findings show that RCTs with lower target accrual or lower target enrollment are associated with successful recruitment, which is understandable since it is easier to achieve a target with a smaller number of participants. We also found multicenter research tends to have a low accrual percentage. While RCTs involving medical devices were less likely to achieve recruitment success, participant compensation was positively associated with recruitment success. The longer the RCT is active (i.e., the number of protocol years), the more likely it is to accrue participants successfully. Additionally, RCTs not using websites for recruitment are more likely to have a higher accrual percentage. Also, the RCTs involving economically disadvantaged participants are more likely to have a higher accrual percentage. On the other hand, cancer research RCTs and RCTS with target domain C04 (Neoplasms) tend to have a low accrual percentage. RCTS targeting congenital, hereditary, and neonatal diseases and abnormalities (C16) and endocrine system diseases (C19) appears to have a higher percentage accrual.
When comparing the top three prediction models (i.e., CatBoost, LightGBM, and XGBoost; see Fig. 3, Supplementary Figures S2 and S3), all top 30 most important features identified by the CatBoost model are agreed by the XGBoost, and 24 of them are agreed by LightGBM. The features “Multicenter research,” “Population involve: Pregnant,” “Involves imaging or radiation,” “Target domain: C15 (Hemic and Lymphatic Diseases),” “Recruitment method not involved,” and “Written Documentation of consent waived” were deemed important in CatBoost, but not in LightGBM with a mean SHAP value < 0.01.
Finally, in the separate analysis incorporating the seven previously excluded studies, we observed some differences in feature importance (Supplementary Figure S4). Notably, the features “Involves Compensation,” “ Target domain: C10,” “No Resources Utilized,” “ Target domain: C15,” and “Written consent: non-English language not expected” were not identified as important.
Subgroup Analysis of Most and Least Successful Recruitment
Table 4 summarizes the significant differences (corrected p-value < .05) found in multiple features between the worst and best recruitment groups among different successful recruitment cutoffs. Supplementary Tables S3, S4, and S5 provide the details of the comparison.
CO4 = neoplasms; C16 = congenital, hereditary, and neonatal diseases and abnormalities.
Mann–Whitney U and Fisher’s Exact tests with Bonferroni correction were used for continuous and binary variables, respectively. The descriptive statistics of each feature for these two subgroups are also listed.
* Features are continuous variables where the median, the first quantile (Q1), and the third quantile (Q3) were calculated, whereas the others are binary variables where the count and the percentage were calculated.
At the cutoff of ≥ 90% (n = 64) vs ≤ 10% (n = 60), the “Enrolled to date” and “Target enrollment” were notably different with median values of 1.0 and 15.0 for the worst recruitment group and 11.0 and 15.0 for the best recruitment group, respectively. The feature “Cancer research” has a negative association with the accrual percentage (also see Fig. 3), as demonstrated by having more RCTs in the worst recruitment group compared to the best recruitment group (53% vs 9%). The use of a website as a recruitment method was more commonly used in the worst recruitment group. Multicenter research was also more prevalent in the worst recruitment group. Studies that did not utilize available resources were significantly more common in the best recruitment group. Further, the target accrual, protocol duration in years, number of sites, and studies with target domain neoplasms all differed between the two groups.
When the cutoff was adjusted to ≥ 80% (n = 87) vs ≤ 20% (n = 110), additional differences emerged in studies with target domain congenital, hereditary, and neonatal diseases and abnormalities. Studies involving compensation were more prevalent in the best recruitment group while studies involving imaging or radiation were more common in the worst recruitment groups. At the final cut-off of ≥ 70% (n = 108) vs ≤ 30% (n = 155), new differences were also observed in the prevalence of studies funded by government agencies and the number of modifications.
Discussion
In this retrospective data analysis using machine learning methods, we examined the factors associated with RCT recruitment success based on the accrual percentage. Overall, the accrual percentage of our sample RCTs confirms the high frequency of participant recruitment challenges [Reference Natale, Saglimbene and Ruospo34]. Consistent with the mixed evidence of how the funding type is associated with accrual [Reference Iruku, Goros and Gelfond35–Reference Fogel37], our results demonstrated that successful recruitment varies widely by funding type. In the feature importance analysis, we found that government-funded RCTs are more likely to be successful, while industry-funded studies are less likely to be successful. However, we found government funding to be significant for studies with accrual percentages of ≥ 70% and ≤ 30%. The notable negative correlation (coefficient: −0.542) observed between industry-funded studies and protocol duration corroborates the idea that industry-sponsored trials often adopt a faster pace, possibly due to higher resource availability or stricter time constraints [Reference Hay, Thomas, Craighead, Economides and Rosenthal36,Reference Fogel37].
Another key finding in this study is the negative association of the multicenter research feature with the accrual percentage. While this finding does not allow us to definitively gauge the overall success of multicenter RCTs beyond individual institutional accrual, it does imply individual sites recruit easier on single-site RCTs than for multi-site RCTs with the latter imposing more complexities and constraints, despite that multi-site RCTs may scale easily and recruit more participants quickly. Given their manageable sample size and relatively more flexible recruitment strategies that can be customized to the specific locale, single-site RCTs may exhibit higher likelihoods of success [Reference Bellomo, Warrillow and Reade38]. Notably, multicenter research did not demonstrate any substantial correlations with the other variables under investigation in this study.
In examining the target domain of the RCTs, our findings confirm that recruitment for oncology research presents more challenges than other fields, potentially due to high patient competition or stringent eligibility criteria, corroborating previous studies [Reference Campillo-Gimenez, Buscail and Zekri7,Reference Carlisle, Kimmelman, Ramsay and MacKinnon39]. However, the cancer research domain also displayed positive correlations with studies receiving internal funding (coefficient: 0.436) and those involving a larger number of sites (coefficient: 0.423), likely reflecting the high societal and clinical impact of these studies. Curiously, a negative correlation was found between cancer research and participant compensation (coefficient: −0.723), possibly suggesting that potential health benefits or access to novel therapies in cancer research can supersede financial incentives for participants. In a surprising turn, both feature importance and subgroup analyses demonstrated that RCTs targeting congenital, hereditary, and neonatal diseases and abnormalities tend to be successful. Despite inherent recruitment challenges for rare diseases [Reference Carlisle, Kimmelman, Ramsay and MacKinnon39,Reference Augustine, Adams and Mink40], factors such as targeted sample size, well-organized patient communities, specialized research institutions, and limited treatment availability – which in turn heightens the value of clinical trials for patients – may have contributed to their success [Reference Griggs, Batshaw and Dunkle41].
The procedure involvement required by the RCT can also influence recruitment success, as demonstrated by how the involvement of medical devices negatively influences accrual percentage. Perceived drawbacks of participating in research and perception of therapeutic benefit may have directly affected the likelihood of patient enrollment [Reference Avis, Smith, Link, Hortobagyi and Rivera12]. Further, we found in both analyses that proper compensation was associated with better recruitment. The observation that adequate participant compensation is associated with improved recruitment corroborates previous studies and underscores the salient role of compensation in motivating potential participants, especially among economically disadvantaged populations [Reference Treweek, Pitkethly and Cook21]. This could also contribute to why RCTs involving economically disadvantaged participants tend to be successful, as compensation can be an essential incentive for encouraging participation, particularly for individuals with financial constraints or other barriers to participation (e.g., commute to study site, missing work) [Reference Sanchez, Grzenda and Varias42]. However, this relationship necessitates ethical vigilance. A paramount concern is the possibility of undue inducement, where the attractiveness of the compensation might lead potential participants to disregard the potential risks associated with the trial or undermine the voluntariness of their participation [Reference Emanuel43]. Further, the distribution of compensation warrants scrutiny to guard against any unintentional exploitative practices or the inadvertent exclusion of certain demographic groups from trial participation [Reference Grady44]. Hence, while compensation can act as a potent recruitment tool, its deployment should be governed by a conscientious adherence to the principles of respect for persons, beneficence, and justice, as outlined in seminal ethical guidelines such as the Belmont Report [Reference Friesen, Kearns, Redman and Caplan45].
Lastly, we did not find any recruitment method that is positively associated with accrual, implying there is no “one-fitting-all” solution for recruitment so investigators should also analyze the recruitment situation case by case and seek appropriate methods. A flexible infrastructure for recruitment is needed. Though the person-to-person recruitment method is the most commonly used (93%), it did not demonstrate an association with the accrual percentage. However, previous evidence shows that person-to-person recommendations tend to be trusted more than other methods and can influence a potential participant’s decision to participate in an RCT [Reference Van Hoye, Weijters, Lievens and Stockman46]. In line with previous findings [Reference Galbreath, Smith, Wood, Forkner and Peters47], the use of websites, direct mail, and television advertisements for recruitment was negatively associated with accrual percentage. A possible explanation could be that the RCTs struggling with recruitment may exploit more recruitment strategies to expand their outreach. Furthermore, research teams may have other strategies (e.g., chart reviews [Reference Fogel37], clinician engagement [Reference Buttgereit, Palmowski and Forsat14]) that are outside the scope of our data.
Recommendations for Recruitment Improvement
A critical area for increasing recruitment success is focusing overall recruitment strategies based on the population of interest. Previous research efforts have highlighted how passive recruitment methods leveraging novel technologies (e.g., online advertisements, web-based screening tools) can drastically reduce the time and cost associated with clinical trial coordination; however, the effectiveness can depend on the potential participant’s time online and computer literacy levels [Reference Arab, Hahn, Henry, Chacko, Winter and Cambou48]. Although technology provides a wide array of novel recruitment methods, community engagement may be more beneficial depending on the population. For example, personal and community-focused strategies have been successful in racial and ethnic minority populations recruitment [Reference Haley, Southwick, Parikh, Rivera, Farrar-Edwards and Boden-Albala49]. Recruitment methods that demonstrated a negative association (i.e., website, radio, direct mail, and television; see Supplementary Figure S1) with recruitment success should be employed with the understanding that these methods alone may not be sufficient.
Furthermore, planning and implementing a flexible recruitment infrastructure and a comprehensive approach to recruitment is necessary for studies with challenges in accrual (e.g., oncology, medical device involvement, imaging, and radiation involvement). Hence, it is not just about casting the net wide; it’s about casting it smartly, which involves several key aspects. One, we need to ensure we have the appropriate funds allocated to our recruitment efforts. Two, we must invest in the proper training for our clinical research staff so they are equipped to handle nuanced recruitment strategies. And three, patient education is crucial. We need to make sure potential participants understand the trial, its benefits, and its risks.
Lastly, and quite importantly, our research underscores the significant benefit of fairly compensating participants. While our results indicate that patient compensation is associated with higher accrual, we cannot make a definitive recommendation for increasing patient compensation as a strategy to enhance recruitment. Rather, we suggest that trial designers consider our findings as one piece of a complex puzzle when planning their recruitment strategies. Participant compensation not only aids recruitment but also helps reduce the burden on those who participate in these trials, particularly for individuals who may have to travel long distances or miss work to participate in the RCT. Compensation can help ensure that our trials are as inclusive and equitable as possible, by enabling a more diverse range of participants.
By optimizing recruitment strategies, trials can be made more cost-effective, and most importantly, diverse. Therefore, greater emphasis on a thoughtful and successful implementation of these novel recruitment strategies could serve as an essential step for future improvement in recruitment practices.
Strengths, Limitations, and Next Steps
This study has several strengths and limitations. To our knowledge, this is the first study to use a data-driven method to systematically identify factors associated with recruitment successes across disease types, trial designs, recruitment methods, funding types, and patient population (e.g., non-English speaking, economically disadvantaged). However, due to the nature of the retrospective analysis, we were unable to establish causality between the collected features and successful patient recruitment. Further, though we used multiple data sources, the RCTs analyzed are from a single institution; hence, future studies are warranted to test the generalizability of the findings to other institutions. In addition, we were unable to include features that have previously shown substantial influence on clinical trial enrollment, such as the number of competing trials and eligibility criteria complexity due to the incompleteness of the information in our dataset [Reference Franks, Liu, Elkind, Reilly, Weng and Lee50]. Additionally, since studies may not report all recruitment methods and characteristics, our findings could be affected by potential underreporting or incomplete data; this should be considered when interpreting the results. Besides, we made an effort to tune the parameters of models to improve their performances, but there may be additional configurations that we did not explore that could lead to further improvement. Future work in this field should include more longitudinal data collection, improved automated natural language processing, and a greater expansion of trial information for modeling to address these stated limitations and further enhance our understanding of patient recruitment. Finally, assessing the impact of our suggested actions is crucial for validating their effectiveness in enhancing participant recruitment, allowing for a stronger appraisal of our recommendations’ potential benefits.
Conclusion
With continuing challenges in accruing sufficient participants for RCTs, it is imperative to investigate the factors influencing recruitment success to develop more effective solutions. This multi-source retrospective study demonstrated key features that are positively (e.g., government funding, compensation, and target domains on congenital, hereditary, and neonatal diseases and abnormalities) and negatively (e.g., cancer research, recruitment methods) associated with participant recruitment into RCTs. Further, multicenter RCTs tended to have poor accrual percentages in a single institution. Finally, actionable steps are provided to allow clinical researchers and research centers to improve participant recruitment in the future. Though further exploration of the causative relationships between the features and successful recruitment, the scope of this analysis is unprecedented and provides greater generalizability to its findings than previously reported. It also leverages machine learning approaches for assessing various RCT features, strengthening future research efforts in this space.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/cts.2023.623.
Acknowledgments
The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health.
Funding statement
This work was supported by the National Library of Medicine grants R01LM009886 (CW, YF), RO1LM012895 (CW, CL), and T15LM007079 (BI), the National Human Genome Research Institute Home grant U01HG008680 (CW, GH, WC, CL), and the National Center for Advancing Clinical and Translational Science grants OT2TR003434 (CW, CT), UL1TR001873 (CW, KM, WC) and U24TR001579 (CW).
Competing interests
The authors have no conflicts of interest to declare.