Large Language Model (LLM)-Powered Chatbots Fail to Generate Guideline-Consistent Content on Resuscitation and May Provide Potentially Harmful Advice

Alexei A. Birkun; Adhish Gautam

doi:10.1017/S1049023X23006568

Large Language Model (LLM)-Powered Chatbots Fail to Generate Guideline-Consistent Content on Resuscitation and May Provide Potentially Harmful Advice

Published online by Cambridge University Press: 06 November 2023

Alexei A. Birkun

and

Adhish Gautam

Show author details

Alexei A. Birkun*: Affiliation:
Department of General Surgery, Anaesthesiology, Resuscitation and Emergency Medicine, Medical Academy named after S.I. Georgievsky of V.I. Vernadsky Crimean Federal University, Simferopol, 295051, Russian Federation
Adhish Gautam: Affiliation:
Regional Government Hospital, Una (H.P.), 174303, India
*: Correspondence: Alexei A. Birkun, MD, DMedSc Medical Academy named after S.I. Georgievsky of V.I. Vernadsky Crimean Federal University Lenin Blvd, 5/7, Simferopol, 295051, Russian Federation E-mail: [email protected]

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

Introduction:

Innovative large language model (LLM)-powered chatbots, which are extremely popular nowadays, represent potential sources of information on resuscitation for the general public. For instance, the chatbot-generated advice could be used for purposes of community resuscitation education or for just-in-time informational support of untrained lay rescuers in a real-life emergency.

Study Objective:

This study focused on assessing performance of two prominent LLM-based chatbots, particularly in terms of quality of the chatbot-generated advice on how to give help to a non-breathing victim.

Methods:

In May 2023, the new Bing (Microsoft Corporation, USA) and Bard (Google LLC, USA) chatbots were inquired (n = 20 each): “What to do if someone is not breathing?” Content of the chatbots’ responses was evaluated for compliance with the 2021 Resuscitation Council United Kingdom guidelines using a pre-developed checklist.

Results:

Both chatbots provided context-dependent textual responses to the query. However, coverage of the guideline-consistent instructions on help to a non-breathing victim within the responses was poor: mean percentage of the responses completely satisfying the checklist criteria was 9.5% for Bing and 11.4% for Bard (P >.05). Essential elements of the bystander action, including early start and uninterrupted performance of chest compressions with adequate depth, rate, and chest recoil, as well as request for and use of an automated external defibrillator (AED), were missing as a rule. Moreover, 55.0% of Bard’s responses contained plausible sounding, but nonsensical guidance, called artificial hallucinations, that create risk for inadequate care and harm to a victim.

Conclusion:

The LLM-powered chatbots’ advice on help to a non-breathing victim omits essential details of resuscitation technique and occasionally contains deceptive, potentially harmful directives. Further research and regulatory measures are required to mitigate risks related to the chatbot-generated misinformation of public on resuscitation.

Keywords

artificial hallucination artificial intelligence cardiac arrest cardiopulmonary resuscitation chatbot large language model

Type: Original Research
Information: Prehospital and Disaster Medicine , Volume 38 , Issue 6 , December 2023 , pp. 757 - 763

DOI: https://doi.org/10.1017/S1049023X23006568 [Opens in a new window]
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of the World Association for Disaster and Emergency Medicine

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Haleem, A, Javaid, M, Singh, RP. An era of ChatGPT as a significant futuristic support tool: a study on features, abilities, and challenges. BenchCouncil Transactions on Benchmarks, Standards, and Evaluations. 2022;2(4):100089.CrossRef Google Scholar

De Angelis, L, Baglivo, F, Arzilli, G, et al. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Front Public Health. 2023;11:1166120.CrossRef Google Scholar PubMed

Hassani, H, Silva, ES. The role of ChatGPT in data science: how AI-assisted conversational interfaces are revolutionizing the field. Big Data Cogn Comput. 2023;7(2):62.CrossRef Google Scholar

Ahn, C. Exploring ChatGPT for information of cardiopulmonary resuscitation. Resuscitation. 2023;185:109729.CrossRef Google Scholar PubMed

Altamimi, I, Altamimi, A, Alhumimidi, AS, Altamimi, A, Temsah, MH. Snakebite advice and counseling from artificial intelligence: an acute venomous snakebite consultation with ChatGPT. Cureus. 2023;15(6):e40351.Google Scholar PubMed

Birkun, AA, Gautam, A. Instructional support on first aid in choking by an artificial intelligence-powered chatbot. Am J Emerg Med. 2023;70:200–202.CrossRef Google Scholar PubMed

Dahdah, JE, Kassab, J, Helou, MCE, Gaballa, A, Sayles, S 3rd, Phelan, MP. ChatGPT: a valuable tool for emergency medical assistance. Ann Emerg Med. 2023;82(3):411–413.CrossRef Google Scholar PubMed

Fijačko, N, Gosak, L, Štiglic, G, Picard, CT, John Douma, M. Can ChatGPT pass the life support exams without entering the American Heart Association course? Resuscitation. 2023;185:109732.CrossRef Google Scholar PubMed

Sarbay, İ, Berikol, GB, Özturan, İU. Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): a preliminary, scenario-based cross-sectional study. Turkish J Emerg Med. 2023;23(3):156.CrossRef Google Scholar PubMed

Berg, KM, Cheng, A, Panchal, AR, et al. Part 7: Systems of Care: 2020 American Heart Association Guidelines for Cardiopulmonary Resuscitation and Emergency Cardiovascular Care. Circulation. 2020;142(16_suppl_2):S580–S604.CrossRef Google Scholar PubMed

Semeraro, F, Greif, R, Böttiger, BW, et al. European Resuscitation Council Guidelines 2021: systems saving lives. Resuscitation. 2021;161:80–97.CrossRef Google Scholar PubMed

Liu, KY, Haukoos, JS, Sasson, C. Availability and quality of cardiopulmonary resuscitation information for Spanish-speaking population on the Internet. Resuscitation. 2014;85(1):131–137.CrossRef Google Scholar PubMed

Metelmann, B, Metelmann, C, Schuffert, L, Hahnenkamp, K, Brinkrolf, P. Medical correctness and user friendliness of available apps for cardiopulmonary resuscitation: systematic search combined with guideline adherence and usability evaluation. JMIR Mhealth Uhealth. 2018;6:e190.CrossRef Google Scholar PubMed

Birkun, A, Gautam, A, Trunkwala, F, Böttiger, BW. Open online courses on basic life support: availability and resuscitation guidelines compliance. Am J Emerg Med. 2022;62:102–107.CrossRef Google Scholar PubMed

Birkun, AA, Dr, Gautam A.. Google’s advice on first aid: evaluation of the search engine’s question-answering system responses to queries seeking help in health emergencies. Prehosp Disaster Med. 2023;38(3):345–351.CrossRef Google Scholar PubMed

Perkins, GD, Colquhoun, M, Deakin, CD, et al. Resuscitation Council UK. 2021 Resuscitation Guidelines. Adult Basic Life Support Guidelines, 2021. https://www.resus.org.uk/library/2021-resuscitation-guidelines/adult-basic-life-support-guidelines. Accessed August 15, 2023.Google Scholar

Birkun, A, Gautam, A. Dataset of analysis of the large language model-powered chatbots’ advice on help to a non-breathing victim. Mendeley Data. 2023;V1.CrossRef Google Scholar

Kincaid, JP, Fishburne, RP Jr, Rogers, RL, Chissom, BS. Derivation of New Readability Formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Millington, Tennessee USA: Naval Technical Training Command, Millington TN Research Branch; 1975.CrossRef Google Scholar

Datayze. Readability Analyzer. https://datayze.com/readability-analyzer. Accessed August 15, 2023.Google Scholar

Peters, J. The Bing AI bot has been secretly running GPT-4. The Verge. https://www.theverge.com/2023/3/14/23639928/microsoft-bing-chatbot-ai-gpt-4-llm. Accessed August 15, 2023.Google Scholar

Microsoft Bing. Bing Webmaster Guidelines. https://www.bing.com/webmasters/help/webmasters-guidelines-30fba23a. Accessed August 15, 2023.Google Scholar

Bard. Bard FAQ. https://bard.google.com/faq. Accessed August 15, 2023.Google Scholar

Search Engine Land. SEO. Breaking Bard: Google’s AI chatbot lacks sources, hallucinates, gives bad SEO advice. https://searchengineland.com/google-bard-first-looks-394583. Accessed August 15, 2023.Google Scholar

Alkaissi, H, McFarlane, SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15:e35179.Google Scholar PubMed

Bickmore, TW, Trinh, H, Olafsson, S, et al. Patient and consumer safety risks when using conversational assistants for medical information: an observational study of Siri, Alexa, and Google Assistant. J Med Internet Res. 2018;20(9):e11510.CrossRef Google Scholar PubMed

Picard, C, Smith, KE, Picard, K, Can Alexa, Douma MJ., Cortana, Google Assistant and Siri save your life? A mixed-methods analysis of virtual digital assistants and their responses to first aid and basic life support queries. BMJ Innovations. 2020;6.CrossRef Google Scholar

Bing. Introducing the new Bing. https://www.bing.com/new. Accessed August 15, 2023.Google Scholar

Birkun and Gautam supplementary material

File 12.9 KB

Article contents

Large Language Model (LLM)-Powered Chatbots Fail to Generate Guideline-Consistent Content on Resuscitation and May Provide Potentially Harmful Advice

Abstract

Keywords

Access options

References

Birkun and Gautam supplementary material

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests