6 Exploring data scraping on ClinicalTrials.gov to identify key variables to include in an EHR-based recruitment tool

Published online by Cambridge University Press: 11 April 2025

Sydney Lash
Affiliation:
University of North Carolina at Chapel Hill
Emily Pfaff
Affiliation:
University of North Carolina at Chapel Hill

Abstract

Objectives/Goals: Failure to achieve recruitment goals results in termination of ~20% of clinical trials and delays >85% of trial timelines. We aim to develop an electronic health record (EHR)-based recruitment tool to ease identification of participants. We sought to determine whether criteria listed on clinicaltrials.gov could support selection of tool variables.
Methods/Study Population: To inform the variables to include in the EHR-based recruitment tool, we scraped data from clinicaltrials.gov to identify key inclusion and exclusion criteria common across a variety of diabetes clinical trials. We included actively recruiting or recently active phase 2 and 3 clinical trials of adults aged >18 years in the USA. We classified identified variables as clinically relevant or not and compared clinically relevant terms with inclusion and exclusion criteria (~20 variables) that were individually identified by three diabetes clinical trialists and two clinical research coordinators (CRCs).
Results/Anticipated Results: We reviewed 203 clinical trials listed on clinicaltrials.gov. We identified 115 terms, 91 of which were clinically relevant. All 3 clinical trialists, 1 of 2 CRCs, and all trials listed age as a key variable. Consistent with data scraping, all trialists and CRCs identified glucose-lowering medications and kidney function as important criteria. Gender, ethnicity, and race were less commonly noted on clinicaltrials.gov and were listed by 2 of 3 trialists and 1 CRC. Cardiovascular conditions (e.g., history of myocardial infarction), thyroid function tests, and contraceptive requirements were common criteria on clinicaltrials.gov, but only 1 trialist and 1 CRC identified these variables. Active infections (e.g., HIV) and c-peptide were not highlighted by trialists or CRCs but were common on clinicaltrials.gov.
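The abstract does not describe how terms were extracted from the scraped records. As a minimal sketch only, assuming the eligibility text of each trial has already been downloaded as a plain string, one way to count how many trials mention each candidate variable is shown below. The term dictionary, the regular expressions, and the function names are hypothetical and are not the authors' method.

```python
import re
from collections import Counter

# Hypothetical dictionary: candidate variable -> regex patterns that signal it.
# The patterns are illustrative assumptions, not a validated vocabulary.
TERM_PATTERNS = {
    "age": [r"\bage[ds]?\b", r"\byears? (?:old|of age)\b"],
    "kidney function": [r"\begfr\b", r"\bcreatinine\b", r"\brenal\b", r"\bkidney\b"],
    "glucose-lowering medications": [r"\bmetformin\b", r"\binsulin\b"],
    "c-peptide": [r"\bc-peptide\b"],
}


def terms_in_criteria(text):
    """Return the set of candidate variables mentioned in one trial's criteria."""
    lowered = text.lower()
    return {
        term
        for term, patterns in TERM_PATTERNS.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    }


def trial_counts(criteria_texts):
    """Count, for each variable, the number of trials that mention it at least once."""
    counts = Counter()
    for text in criteria_texts:
        counts.update(terms_in_criteria(text))
    return counts


if __name__ == "__main__":
    # Made-up eligibility snippets, for illustration only.
    sample = [
        "Inclusion: Age 18 years or older; eGFR >= 45. Exclusion: current insulin use.",
        "Inclusion: adults with type 2 diabetes on metformin; fasting C-peptide > 0.2 nmol/L.",
    ]
    for term, n in trial_counts(sample).most_common():
        print(f"{term}: {n} of {len(sample)} trials")
```

Counting each variable once per trial, rather than once per mention, keeps a single long criteria list from dominating the tally. Deciding whether a term is clinically relevant would still need manual review, as in the study.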
Discussion/Significance of Impact: An EHR-based recruitment tool may facilitate identification of trial participants, but identifying key variables to include is essential. We found that data scraping for variables on clinicaltrials.gov mostly aligned with expert opinion, suggesting that automating variable selection via extraction from clinicaltrials.gov may be acceptable.
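The comparison between scraped and expert-identified variables can be made explicit with simple set operations. The sketch below is an assumption about how such a comparison might be scripted; the abstract does not report an agreement statistic, and the Jaccard index and the function name `compare_sources` are introduced here purely for illustration. The example sets use a few variable names mentioned in the Results, not the study's full data.

```python
def compare_sources(scraped, expert):
    """Split variables into shared and source-specific groups and compute overlap."""
    scraped, expert = set(scraped), set(expert)
    union = scraped | expert
    return {
        "both": sorted(scraped & expert),
        "scraped_only": sorted(scraped - expert),
        "expert_only": sorted(expert - scraped),
        # Jaccard index: shared variables divided by all distinct variables.
        "jaccard": len(scraped & expert) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Illustrative subsets only, named after variables in the Results.
    scraped = ["age", "kidney function", "glucose-lowering medications",
               "active infections", "c-peptide"]
    expert = ["age", "kidney function", "glucose-lowering medications"]
    for key, value in compare_sources(scraped, expert).items():
        print(key, value)
```

Listing the source-specific variables separately shows where the two approaches diverge, which a single overlap score would hide.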

Type
Contemporary Research Challenges
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2025. The Association for Clinical and Translational Science