22 Advancing clinical trial reporting and AI integration: Optimizing protocol data extraction using LLMs and regulatory best practices

Published online by Cambridge University Press:  11 April 2025

Ramya Sri Baluguri
Affiliation:
University of California, Davis
Nicholas Anderson
Affiliation:
University of California, Davis

Abstract

Objectives/Goals: This study aimed to improve clinical trial data management by applying large language model (LLM) information retrieval and generation techniques within the clinical trial reporting workflow. We focused on improving reporting compliance, reducing manual labor, and promoting a standardized reporting structure and data quality oversight.

Methods/Study Population: We compared approved protocols from UC Davis IRB-approved, investigator-initiated studies with the records of the same studies reported to ClinicalTrials.gov. Our baseline data extraction system employs commercial LLMs and retrieval-augmented generation (RAG) to isolate data sources within the secure extraction environment. We stratified protocol documents into easy, complex, and random categories based on study focus, document complexity, the extent of amendments or modifications, and completion metrics from ClinicalTrials.gov. We developed a pilot web-based architecture to capture variations in categorization, labeling, and reporting style, and compared the generated extraction data. Evaluation was primarily qualitative, through review by expert staff.

Results/Anticipated Results: Our results revealed significant variations in reporting quality, with dependencies stemming from multiple authors and stages throughout the clinical trial protocol lifecycle. Based on these variations, we used prompt engineering to improve the compliance of the pilot application's output with the Protocol Registration and Results System (PRS) structured data format for various study types. We piloted the assisted workflow with prospective studies, partnering with study investigators and clinical trial office staff to assist in reviewing and creating clinical trial reports. Initial studies reported by our system were approved and released to the public by PRS staff. We are refining content generation and workflows for different components of studies and evaluating their use in quality and training areas.

Discussion/Significance of Impact: Our system fosters collaboration, efficient review, and compliance with clinical trial reporting standards. It supports the promise of AI-driven assistance in clinical trial management, design, and reporting. We focus on the multiple stakeholders, areas of expertise, and data flows involved in the organizational management of clinical and translational science.

Type
Informatics, AI and Data Science
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2025. The Association for Clinical and Translational Science