Introduction
Eating disorders (EDs) are serious psychiatric illnesses, taking a life every 52 minutes (Deloitte Access Economics, 2020). Only ~50% of adults with EDs respond to existing evidence-based treatments (Bulik, Berkman, Brownley, Sedway, & Lohr, 2007; Keel & Mitchell, 1997; Steinhausen, 2009; Steinhausen & Weber, 2009; van den Berg et al., 2019). Relapse is common (Walsh, Xu, Wang, Attia, & Kaplan, 2021), which contributes to the ‘revolving door phenomenon’ in which individuals are cyclically admitted to and discharged from ED treatment. Given the high mortality rate associated with EDs (Arcelus, Mitchell, Wales, & Nielsen, 2011) and the financial burden of accessing treatment (Deloitte Access Economics, 2020), improved treatment that is effective for more individuals seeking recovery from EDs is critical. Three factors contribute to the suboptimal treatment response and high relapse rates: (1) heterogeneity among those with EDs, even within the same diagnostic group (Levinson et al., 2022; Thompson, Berg, & Shatford, 1987); (2) lack of day-to-day and moment-to-moment support in an individual's naturalistic environment; and (3) the lack of knowledge of how individual heterogeneity in cognitions, emotions, and behaviors impacts recovery. Thus, there is a need for passive treatment support methods specifically available for use in an individual's everyday life.
Understanding psychological phenomena in naturalistic settings represents one approach that could lead to improved treatments and reduced relapse rates. Technological advancements have facilitated new areas of clinical research that better bridge empirical research to ‘everyday life’, including the use of ambulatory assessment (e.g. ecological momentary assessment [EMA]; Stone and Shiffman, 1994) to inform psychological interventions (Wright & Zimmermann, 2019). One of the most common methods of ambulatory assessment used in the past decade is time-intensive repeated survey measurement delivered through mobile applications (e.g., Hasselhorn, Ottenstein, and Lischetzke, 2022). EMA designs have been increasingly implemented in investigating EDs, yet most EMA research has relied solely on self-report instruments (Presseller, Patarinski, Fan, Lampe, & Juarascio, 2022; Schaefer, Engel, & Wonderlich, 2020; Smith et al., 2019). Though EMA ostensibly captures dynamic symptom relations in everyday life, intensive self-report approaches have some limitations, such as participant burden from repeatedly completing surveys, subjectivity, social desirability, problems with self-reflection in EDs, and an incomplete understanding of how momentary changes in other relevant processes, such as neurocognition, location, and biology, may impact ED symptoms (Smith et al., 2019).
Collectively, EMA work suggests that ED behaviors and cognitions fluctuate as a function of time and person (Lavender et al., 2013; Levinson et al., 2022), highlighting the importance of repeated assessment and analysis at the individual level due to the high heterogeneity from person to person even within the same ED diagnostic category.
Along with these explicit momentary measures, there are more implicit approaches, such as research examining physiological correlates of EDs, which has largely been conducted in laboratory settings (Christian, Cash, Cohen, Trombley, & Levinson, 2023; Presseller et al., 2022). However, such studies related to EDs are sparse, and laboratory-based designs lack the ecological validity needed for support and interventions in individuals' everyday lives. Wearable sensors, such as the Empatica E4 wristband, have demonstrated good reliability and validity for passively collecting common physiological data such as blood-volume pulse (heart rate [HR]), electrodermal activity ([EDA]; skin resistance and conductance variation, controlled by the sympathetic nervous system in response to arousal), and peripheral skin temperature (PST) in everyday life in a variety of populations (e.g., McCarthy, Pradhan, Redpath, and Adler, 2016; Park, Jeong, Park, and Lee, 2019; Ragot, Martin, Em, Pallamin, and Diverrez, 2018; Ravindran et al., 2022; Schuurmans et al., 2020; van Lier et al., 2020). Preliminary findings using wearable sensor devices reveal physiological recordings that correspond with self-reported ED symptoms. For example, continuous glucose monitoring demonstrates that glucose levels correspond with self-reported bulimic symptoms of fasting, binge eating, and purging (Presseller, Parker, Lin, Weimer, & Juarascio, 2020).
As individuals with EDs may be particularly limited in their ability to accurately know and report their emotions, cognitions, and behaviors in the moment, evidence suggests that physiological recordings may be used to accurately provide real-time assessment of loss of control eating (Ranzenhofer et al., 2016) without the burden or limitations of intensive self-report. Additionally, a new review (Ralph-Nearman et al., 2023, under review) demonstrates that physiological patterns often differ between different types of ED diagnoses and behaviors (e.g. restriction v. purging/compensation). These results specifically suggest that EDs characterized by purging behaviors (e.g., BN) show changes in HR reactivity, EDA responses, and PST associated with ED-related stimuli, which may differ from other EDs characterized by restriction or binge eating (Krantz et al., 2020; Ortega-Roldan, Rodríguez-Ruiz, Perakakis, Fernandez-Santaella, & Vila, 2014; Papežová, Yamamotova, & Uher, 2005). Thus, physiological recordings, such as HR, EDA, and PST, obtained in individuals’ daily lives and detected with enough time before the onset of ED behaviors may preemptively indicate ED symptom engagement and, subsequently, precise time points for intervention before an individual is aware of these changes (Juarascio, Parker, Lagacey, & Godfrey, 2018; Levinson, Christian, Shankar-Ram, Brosof, & Williams, 2019; Smith & Juarascio, 2019).
More recent implicit approaches have also leveraged passive data collection (e.g. wearable sensors, smartphones) with machine learning (ML) to detect, predict, or intervene on clinical phenomena in other types of disorders outside EDs. For example, passive physiological recordings, such as those utilized in the present pilot study (HR, EDA, PST), have been shown to be able to detect or predict depressive symptoms (Zarate, Stavropoulos, Ball, de Sena Collier, & Jacobson, 2022), variability in anxiety and avoidance symptoms (Jacobson & Bhattacharya, 2022), changes in perceived safety and discomfort (Welch et al., 2022), and behavioral outbursts in real time in samples with major depressive disorder, anxiety disorders, and autism spectrum disorder (Alban et al., 2021; Goodwin, Mazefsky, Ioannidis, Erdogmus, & Siegel, 2019; Northrup et al., 2022; Welch et al., 2023). Such research has demonstrated robust detective and predictive performance, which has led to providing preventative support and real-time intervention for unhelpful behaviors across a range of mental and physical issues (e.g. Clifton, Clifton, Pimentel, Watkinson, and Tarassenko, 2014; Regalia, Onorati, Lai, Caborni, and Picard, 2019).
Importantly, knowledge on EDs predominantly rests upon scholarship that uses explanatory methods (e.g. statistical inference) (Wang, 2021). However, to advance the ED field from modeling existing data to forecasting unseen data (i.e. future behavior), ML approaches are well-suited for detecting the onset of ED behavior and, potentially, in turn, allowing for the delivery of evidence-based intervention before relapse may occur (Wang, 2021).
In this pilot, we aimed to use a minimal set of physiological features, none requiring extensive data cleaning, with machine learning so that the approach could be easily implemented in real-world contexts in the future. With these future aims in mind, we chose to start with three measures (HR, EDA, and PST) that are commonly recorded by wearables and have been shown to detect or predict depressive symptoms, changes in perceived safety and discomfort, and behavioral outbursts in real time.
Thus, the current pilot study uniquely examines the detective performance of ML models using physiological recordings collected via a wearable sensor device to classify ED behaviors among six individuals diagnosed with an ED who endorsed purging behaviors. As research on physiological recordings from wearable sensors (Levinson et al., 2019; Presseller et al., 2022) and ML is limited in the ED literature (Wang, 2021), the current study was an exploratory pilot. Our principal aim was to build idiographic (N = 1) detective models using passively collected physiological recordings commonly detected from wearables (HR, EDA, and PST, e.g. Christian et al., 2023; Welch et al., 2023) to evaluate ML model performance in detecting individual ED behaviors in their everyday life.
Method
Participants
A convenience sample of the first six participants with adequate data who were diagnosed with an ED and endorsed purging, ranging from 20 to 38 years old (Mage = 29.5; s.d. = 7.3), was selected from an ongoing study (N = 120) for the present pilot. Recruitment and data collection for the pilot sample were representative of those for participants in the ongoing study. The ongoing study recruited individuals with EDs residing in the United States to participate in a study that uses mobile and sensor technology to predict ED relapse and recovery. Participants were recruited via advertisements on social media, alumni lists from treatment centers, and lists of research participants who had consented to be contacted about research studies. Participants were compensated with an Amazon or Target gift card valued between $25 and $110 for their participation, based on the number of days the sensor band was worn and the number of EMA surveys completed.
Procedure
All study procedures were approved by the University of Louisville's Institutional Review Board (22.0503), and all participants provided verbal informed consent. Participants completed an initial phone screening to determine eligibility. To participate, individuals had to meet criteria for an active or partial-remission diagnosis of anorexia nervosa (AN), bulimia nervosa (BN), or other specified feeding or eating disorder, atypical AN (OSFED-AAN). Exclusion criteria were active suicidality, mania, psychosis, or medical instability (i.e., endorsing current chest pain, dizziness, shortness of breath, blurred vision, seeing dark spots within the past 24 hours, purging more than three times within the past 24 hours, or consuming fewer than 500 calories within the past 48 hours). Eligible participants completed an online baseline questionnaire in which they reported ED and comorbid psychiatric symptoms. The research team mailed participants an Empatica E4 wristband, which participants were instructed to wear for 30 days, to use to indicate ED behaviors, and to charge at night.
Measures
Demographics and diagnoses
Participants self-reported their gender, race, age, comorbidities, and socioeconomic status, and all participants endorsed using medication (7 psychiatric and 2 non-psychiatric medications; see Tables 1 and 2). The Structured Clinical Interview for DSM-5 (SCID-5; First, Williams, Karg, and Spitzer, 2015) ED modules and the ED Diagnostic Scale (EDDS; Stice, Telch, and Rizvi, 2000; Stice, Fisher, and Martinez, 2004) were used to determine eligibility criteria and current ED diagnosis.
Note. AN, anorexia nervosa; BN, bulimia nervosa; A, Atypical; BP, Binge-Purge; p, Purge; OSFED, other specified feeding or eating disorder; OCD, obsessive compulsive disorder; PTSD, post-traumatic stress disorder; GAD, generalized anxiety disorder. ABNP (OSFED) meets all the criteria for BN, with less frequency/duration (<1 × per week/<3 months).
Physiological recordings
Over 30 days, we passively and continuously collected physiological measurements during participants' waking hours using the Empatica E4 wristband. Of the indices the wristband records, the present pilot used those commonly used in the literature to detect the onset of disorders outside EDs (HR, EDA, and PST). Immediately after the screening for eligibility, participants were instructed by a researcher about how to use the wearable and were also given contact information in case they had questions or problems. At this time, participants were instructed to put the sensor band on in the morning when waking and to take it off when showering or swimming, and were shown how to endorse an ED behavior, how to endorse exercise and eating, and how to charge the E4 during sleeping hours. Empatica sensor technology is used in the health sciences to identify physiological patterns associated with illness behaviors (Bidwell, Khuwatsamrit, Askew, Ehrenberg, & Helmers, 2015).
ED behaviors
ED behaviors were assessed broadly and, as such, could reflect a variety of specific behaviors, including self-induced vomiting, laxative or diuretic use, binge eating, restricting food intake, and excessive exercise. During clinical screening, and after meeting eligibility, participants were given examples and definitions of ED behaviors and were asked if they had any questions (see online Supplementary Table S1). Participants endorsed engaging in an ED behavior by tapping twice on the Empatica E4 wristband button (88 episodes across participants across the 30 days). At the end of each day wearing the sensor band, participants were asked whether they had tagged all ED behaviors. On this evening log, across participants and across the 30 days, participants reported missing a collective total of only five ED behaviors.
Data analytic plan
Preprocessing
Raw physiological data were visually checked for stable readings and then archived in a PostgreSQL relational database through Python's sqlalchemy library (Bayer, 2012). Overall, participants wore the band on 14 to 30 full days (M = 21.83; s.d. = 6.24), for an average of 12 hours, 29 minutes, and 7 seconds per day, and participants endorsed 4 to 42 ED behaviors (M = 14.67; s.d. = 13.82) (see Table 2).
We explored whether we could detect the pre-onset of ED behaviors through HR, EDA, and PST physiological indices early enough to allow time for future intervention before onset. Therefore, we queried each participant's physiological data from the 20 min prior to each behavioral episode tag and extracted them into ‘windows’ (time periods of analysis), similar to approaches used to detect other types of behaviors (e.g. Welch et al., 2023). For each window, to line up the signals, which were sampled at different frequencies (EDA and PST at 4 hertz [Hz], and HR at 1 Hz), we resampled each physiological signal to 4 Hz and used Python to timestamp-match all the signals. As each raw signal is exported independently from the wristband, each discrete data point was assigned a Coordinated Universal Time (UTC) timestamp based on the initial UNIX timestamp designating the beginning of the recording. The UTC timestamps were then used to align the signals to each other during resampling. Next, we derived 10 features from each window (mean, median, minimum, maximum, standard deviation, root mean square [RMS], mean absolute deviation [MAD], mean absolute value [MAV], 25th percentile, and 75th percentile) for each signal for use in our ML algorithms. Based on the three raw sensor readings (HR, EDA, and PST) and the 10 derived features, we used a 30-dimensional feature vector to represent each window in ML classification analysis.
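The feature derivation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the function names (`upsample_repeat`, `window_features`, `feature_vector`) are hypothetical, the 1 Hz HR signal is brought to 4 Hz by simple sample repetition (the text does not say which resampling method was used), the standard deviation is the population form, and the three signals are assumed to be already aligned to the same window start.

```python
import math
import statistics
from typing import List, Sequence


def upsample_repeat(samples: Sequence[float], factor: int) -> List[float]:
    """Upsample by repeating each sample `factor` times (assumed method)."""
    return [x for x in samples for _ in range(factor)]


def window_features(x: Sequence[float]) -> List[float]:
    """Return the 10 summary features named in the text for one signal window."""
    n = len(x)
    mean = statistics.fmean(x)
    # Quartile cut points; the "inclusive" interpolation method is an assumption.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return [
        mean,                                   # mean
        statistics.median(x),                   # median
        min(x),                                 # minimum
        max(x),                                 # maximum
        statistics.pstdev(x),                   # standard deviation (population)
        math.sqrt(sum(v * v for v in x) / n),   # root mean square (RMS)
        sum(abs(v - mean) for v in x) / n,      # mean absolute deviation (MAD)
        sum(abs(v) for v in x) / n,             # mean absolute value (MAV)
        q1,                                     # 25th percentile
        q3,                                     # 75th percentile
    ]


def feature_vector(hr_1hz: Sequence[float],
                   eda_4hz: Sequence[float],
                   pst_4hz: Sequence[float]) -> List[float]:
    """Build the 30-dimensional vector (3 signals x 10 features) for one window."""
    hr_4hz = upsample_repeat(hr_1hz, 4)
    # Trim to a common length so all signals cover the same samples.
    n = min(len(hr_4hz), len(eda_4hz), len(pst_4hz))
    vec: List[float] = []
    for signal in (hr_4hz, eda_4hz, pst_4hz):
        vec.extend(window_features(signal[:n]))
    return vec


# A 20 min window holds 1200 HR samples at 1 Hz and 4800 EDA/PST samples at 4 Hz.
example = feature_vector([70.0] * 1200, [0.5] * 4800, [33.0] * 4800)
print(len(example))
```

Ordering the vector signal by signal keeps each group of 10 features contiguous, which makes it straightforward to later examine which signal contributes most to classification.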
Model construction
Following synthesis of the 30 features per window for each individual, the data were separated into two classes: baseline windows (class 0) and pre-onset ED behavior windows (class 1). Once the data were separated into ~100 baseline and ~100 behavior feature vectors through randomized subsampling, we applied a logistic regression classifier (LRC) to explore which models differentiated between vectors provided for the two classes. Models were encoded and tested using Python (Van Rossum & Drake, 2009). To ensure that sufficient training data were available for each model (~90 samples), as well as to minimize overfitting, 10-fold cross-validation was utilized. This approach ensures that each model used to classify a testing data sample does not use the same sample during training. Average performance measures are provided across individual cross-validation performances. We built idiographic (N = 1) LRC ML trained models to identify personalized onset of episodes (~600 windows) v. baseline (~571 windows) physiology (HR, EDA, and PST). Accuracy was defined as the proportion of correctly classified cases ((True Positives [TP] + True Negatives [TN])/Total), specificity as the proportion of correctly classified behavioral episodes (TN/(TN + False Positives [FP])), and sensitivity as the proportion of correctly classified baseline physiological windows (TP/(TP + False Negatives [FN])). Acceptable classification performance was defined as >70% for accuracy of classified cases, sensitivity to classify baseline physiology, and specificity to classify behavioral episodes (Swets & Picket, 1982).
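To make the evaluation procedure concrete, the following Python sketch runs 10-fold cross-validation of a logistic regression classifier and averages accuracy, sensitivity, and specificity across folds. It is an illustration only: the plain gradient-descent trainer, the function names, the learning rate, and the two-feature toy data are all assumptions introduced here, not the study's code or data. Which class counts as ‘positive’ is left as a parameter; the text pairs sensitivity with baseline windows, which would correspond to `positive=0`.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 300) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent (hypothetical trainer)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Predict class 1 when the linear score is non-negative."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> Dict[str, float]:
    """Accuracy = (TP + TN)/Total, sensitivity = TP/(TP + FN), specificity = TN/(TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


def cross_validate(X, y, k: int = 10, seed: int = 0, positive: int = 1) -> Dict[str, float]:
    """Average per-fold metrics; each sample is tested by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    per_fold = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        w, b = train_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(w, b, X[i]) for i in test_idx]
        per_fold.append(confusion_metrics([y[i] for i in test_idx], preds, positive))
    return {name: sum(m[name] for m in per_fold) / k for name in per_fold[0]}


# Toy data (not study data): 100 "baseline" and 100 "pre-onset" two-feature vectors.
rng = random.Random(1)
X = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(100)]
X += [[rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)] for _ in range(100)]
y = [0] * 100 + [1] * 100
scores = cross_validate(X, y)
print(scores)
```

With real 30-dimensional vectors, whose features sit on very different scales (for example HR in beats per minute versus EDA in microsiemens), the features would likely need standardizing before a gradient-based fit such as this one; an established library implementation would be the more practical choice.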
Results
Table 3 provides idiographic detective performance. Using physiological data, LRC classified an average of 91% of cases accurately (Range = 0.84–0.99). Specificity (i.e. classifying behavioral episodes) averaged 92% (Range = 0.82–1.00). Sensitivity (i.e. baseline physiology classification) averaged 90% (Range = 0.84–0.99).
Discussion
Lack of treatment response in EDs and support in everyday life contributes to EDs being among the deadliest mental health disorders, second to opioid deaths (Berends, Boonstra, & Van Elburg, 2018). This reality indicates the urgent need for new treatment support methods available for use in everyday life. Physiological recordings from wearables have offered clinical utility in forecasting other clinical phenomena, which has led to providing preventative support for problematic behaviors. For example, via wearable or mobile phone, individuals and caregivers are notified of the risk of behavioral episodes, such as the pending onset of epileptic seizures or outbursts by persons with autism prior to onset so that steps may be taken to intervene (e.g. Clifton et al., 2014; Regalia et al., 2019). The current study expands upon previous research examining physiological correlates of ED behaviors in laboratory settings (see Presseller et al., 2022 for review) by investigating these phenomena in naturalistic settings idiographically using ML.
Our pilot findings suggest that ML can detect the pre-onset of ED behaviors in a personalized manner, 20 minutes before onset, using passive wearable data collected in individuals' everyday lives, similar to detection of the pre-onset of other mental health problems, such as depressive symptoms, anxiety and avoidance symptoms, and behavioral outbursts (Alban et al., 2021; Goodwin et al., 2019; Jacobson & Bhattacharya, 2022; Welch et al., 2022; Welch et al., 2023; Zarate et al., 2022). All idiographic ML models performed well above the 70% acceptable performance threshold on accuracy, specificity, and sensitivity for each individual (Swets & Picket, 1982; ranging from 84–99% accuracy, 82–100% specificity, and 84–99% sensitivity), which points to the strength of this method despite the high heterogeneity of symptoms and behavior frequencies, even within the same ED diagnosis. These results demonstrate the ML models' ability to provide personalized ED behavior detection within individuals in everyday life despite the heterogeneity of EDs, a range of ED diagnostic types (i.e., BN, AN, AAN, ABN), variation in ED behavior frequency, and other individual differences. As such, wearable sensors may represent a method by which individuals can receive personalized support on their wearable or mobile phone within their environment, in the form of a warning of risk paired with a digital therapeutic intervention 20 min before the onset of ED behavioral episodes, giving enough time to prevent ED behaviors, such as purging, that lead to relapse.
Our sample size was small; however, because our goal was to develop idiographic detection, a large sample was not necessary. It has been posited that N = 5 or greater is sufficient for establishing direct replication in single-case designs (e.g. Barlow and Hersen, 1973; Hensen and Barlow, 1984; Kazdin, 2011). Although there was some diversity within our pilot regarding racial and ethnic background, gender and sexual identity, and various EDs, more diverse samples should be included in future studies for generalizability. Additionally, participants endorsing purging behaviors may have indicated other active (e.g., exercise) and passive (e.g., body checking) ED behaviors, and though the momentary endorsement of ED behaviors was cross-checked with end-of-day logs, there is no way to control for accuracy of annotations. Future research should assess the detective abilities of physiological recordings toward behaviors not explicitly examined in the present study and investigate whether some physiological indicators may better classify ED behaviors than others. With a larger sample, we may explore (1) population models and (2) how we may incorporate population-level data into informing idiographic patterns. Overall, results from the current research suggest that ML algorithms using physiological recordings can detect ED behaviors. Because these methods have the potential to identify instances of risk for the onset of maladaptive ED behaviors, physiological recordings from wearables may be integrated into timely digital interventions in the future, such as just-in-time adaptive interventions (JITAIs), to provide personalized interventions at the moments most needed (e.g., delivered via smartphones; Juarascio et al., 2018). For instance, detection of the onset of risk of an ED behavior may trigger a JITAI to disrupt the behavior and intervene in a timely manner.
Next steps may pinpoint the most essential indicators to best classify the pre-onset of ED behaviors, as well as other types of data that may complement and strengthen these efforts.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S003329172300288X.
Acknowledgements
Thank you to all our participants in the Predicting Recovery Study who made this research possible, and to our EAT Lab team and collaborators.
Funding statement
This work is funded by grant R15 MH121445 from the National Institute of Mental Health, National Institutes of Health (NIH). CRN is also funded by grant P20GM103436-20 (KY-INBRE) from the National Institute of General Medical Sciences. CEC is funded by the National Science Foundation (NSF) Graduate Research Fellowship Program under grant No. 2021320143. The contents of this work are solely the responsibility of the authors and do not necessarily represent the official views of the NIH or NSF.
Competing interest
None.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.