OP86 Chatbot-Based Symptom-Checkers: A Systematic Review

Reinhard Jeindl; Gregor Goetz

doi:10.1017/S0266462322001325

Introduction

Symptom-checkers are digital health applications (DHA) with diagnostic algorithms that claim to improve the diagnostic process and patient guidance. After asking users to describe their symptoms through a chatbot interface, symptom-checkers offer a list of potential diagnoses and/or recommend appropriate action (self-care, a doctor’s visit, or emergency care). Because of the growing number and increasing use of these diagnostic DHA, there is a need to evaluate the evidence.

Methods

We updated a British evidence synthesis on symptom-checkers from the National Institute for Health Research (NIHR, 2019). For the systematic update search, we selected four databases. The following endpoints were selected: effectiveness, safety, diagnostic accuracy, triage accuracy, and organizational and patient-relevant endpoints. For accuracy studies included from the update search, we assessed the risk of bias (RoB) using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2).

Results

The NIHR report included 27 studies. We added 14 studies via the update search. One randomized controlled trial (RCT) reported a prolonged illness duration when using symptom-checkers (statistically non-significant). No harms from using symptom-checkers were identified (six observational studies). The diagnostic accuracy ranged from 14 to 84.3 percent (ten observational studies), and the triage accuracy ranged from 33 to 100 percent (eleven observational studies). For organizational endpoints, the results were inconsistent (one RCT, six observational studies). From the patient perspective, symptom-checkers were rated as highly usable, but the limited description of symptoms and the missing verbal interaction with health personnel were mentioned as hindering factors (nine survey studies). In the QUADAS-2 assessment, the RoB was low in one study and high in seven studies.

Conclusions

The studies were often conducted using fictitious case vignettes, limiting the validity of the evidence. Therefore, the results for diagnostic and triage accuracy are insufficient to demonstrate a benefit in real-world settings. Additionally, there is a concern about misdiagnosis and overdiagnosis. We recommend continuous monitoring of these diagnostic DHA using high-quality studies.
