Introduction
Mood disorders, including major depressive disorder (MDD), bipolar I disorder (BDI), and bipolar II disorder (BDII), have lifelong clinical courses with high rates of recurrence, predisposing patients to persistent instability in mood, sleep, and vitality. Patients with mood disorders therefore experience a high disease burden and, often, functional deterioration (Judd et al., 2002). Proactive management of symptoms and prevention of recurrences are critical for successful treatment. To improve prognosis, novel approaches to evaluating, analyzing, and managing patients' daily condition are needed to complement conventional pharmacological and psychotherapeutic treatment.
Ever since Kraepelin (1906) mentioned the link between mental illnesses and sleep disturbance over a century ago, circadian rhythm (CR) disturbance has been widely acknowledged for its close association with mood disorders. Insomnia or hypersomnia is a key symptom of depressive episodes, while a decreased need for sleep is a distinguishing feature of manic episodes (Harvey, 2008). Sleep-wake problems are evident even during euthymic periods of mood disorders (Knowles et al., 1986). In clinical settings, aggravation of sleep-wake disturbances typically accompanies or precedes mood episode recurrences (Jaussent et al., 2011). Therefore, sleep should be carefully monitored in mood disorders, both during episode recurrences and during remissions.
CRs are endogenous rhythms generated by the circadian clock. They have a free-running period of about 24 h, which persists in constant conditions without any exogenous input, and they result from species' adaptation to the Earth's 24 h cycle (e.g. the light-dark cycle) (Dunlap, 1999). CRs, regulated by ‘circadian clock’ genes and molecular cascades comprising complex regulatory feedback loops, synchronize various physiological functions and behaviors (e.g. body temperature, hormonal secretion, metabolism, sleep-wake cycles, activity, and mood) of most living organisms (Silver & Kriegsfeld, 2014). Accumulating evidence suggests that alterations in CRs play a role in the pathophysiology of mood disorders (Lee, 2019; McClung, 2007; Moon et al., 2016). In addition, the therapeutic effects of lithium and antidepressants are considered to be mediated by circadian modulation (Etain, Meyrel, Hennion, Bellivier, & Scott, 2021).
Rapidly developing digital technologies and machine learning techniques have transformed medical fields, overcoming various limitations of conventional medicine (Darcy, Louie, & Roberts, 2016; Jain, Powers, Hawkins, & Brownstein, 2015). Until recently, psychiatric practice had to rely heavily on self-reports, which inevitably incorporate the subjective judgments and recall biases of patients and their close observers. With the advent of wearable devices and smartphones, however, patients' mental states can be inferred from their daily patterns. Automated digital indicators from these devices may help clinicians identify patients at risk of an impending recurrence of mood episodes. The psychiatric field may therefore benefit greatly from precise phenotyping and computational modeling of the temporal course of illness onsets and recurrences.
Aims of study
The primary aim of this study is to develop a mood episode prediction model using lifelog data from a nationwide, multicenter, prospective cohort. We hypothesize that CR disruption inferred from the lifelog data will contribute substantially to the accurate prediction of impending mood episodes in patients with mood disorders. Predicting mood episodes from data obtained passively in daily life can have important implications for the prevention and treatment of mood disorders.
Methods
Study population
The Mood Disorder Cohort Research Consortium (MDCRC) study is a multicenter, prospective, observational cohort study investigating the course and outcome of early-onset mood disorders in South Korea (ClinicalTrials.gov: NCT03088657). Participants aged under 35 years with a diagnosis of MDD, BDI, or BDII were included. Further information on the study design and protocol can be found elsewhere (Cho et al., 2017). Eight hospitals in five Korean cities composed the consortium, and the study participants formed a convenience series. In the original cohort, 495 patients diagnosed with mood disorders (MDD = 166, BDI = 149, BDII = 180) according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) were recruited from March 2015 to April 2019. For analysis, the follow-up clinical data collected until December 2019 from 270 patients who wore wearable activity trackers for at least 30 days were used. By December 2019, the 270 patients had been followed up for a mean [s.d.] of 279.7 [263.5] and a median of 505 days (range = 72–1515 days). Demographic and clinical characteristics of the sample are provided in online Supplementary Table S2. For some participants, the diagnosis changed over time because hypomanic or manic episodes occurred during the follow-up period [MDD to BDI (n = 3), MDD to BDII (n = 4), BDII to BDI (n = 5)]. In such cases, the latest diagnosis was used for analysis. Most patients were on medication, and medication regimens were not affected by this study. The study was approved by the Institutional Review Boards of all participating hospitals and conducted in accordance with the Declaration of Helsinki. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
All participants provided written informed consent before enrollment after receiving a full explanation of the study.
Measures
Participants were asked to complete ecological momentary assessments using a smartphone application called the ‘eMoodChart’, which our team developed for study purposes. The eMoodChart was a simple, intuitive self-assessment of daily mood state and energy level in seven categories [e.g. −3 (severely depressed), −2 (moderately depressed), −1 (mildly depressed), 0 (normal), 1 (mildly elevated), 2 (moderately elevated), 3 (severely elevated)]. Patients assessed and recorded their mood state and energy level once a day using the eMoodChart. Participants were also asked to wear a wearable activity tracker every day (Fitbit Charge HR, 2 or 3, Fitbit Inc). In face-to-face clinical assessments at 3-month intervals, clinicians determined the onset of recurrent episodes since the previous assessment by reviewing the eMoodChart and the symptoms experienced by participants (Cho et al., 2017). Clinicians were blinded to both the lifelog data collected from the wearable tracking devices and the results of the prediction algorithm.
For smartphones that use the Android operating system, a built-in sensor detected the amount of light exposure while participants were using their smartphones. The Fitbit wrist-worn trackers collected data on participants' steps, heart rate, and sleep, the last of which was inferred from the activity and heart rate data by a proprietary Fitbit algorithm. The minute-by-minute data from the Fitbit cloud server were processed and analyzed. CR parameters were generated by cosinor fitting of the heart rate data of two consecutive days, based on previous studies showing that daily fluctuations in heart rate reflect an individual's CRs. In a study by Jeong et al. (2020), there was a possible positive correlation between the acrophase of the cosine-fitted curve of salivary cortisol and that of heart rate (r = 0.55, p = 0.064). In addition, a study by Bowman et al. (2021) robustly demonstrated the CR of heart rate.
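The cosinor fitting mentioned above can be illustrated with a minimal sketch. It assumes a single-component cosine with a fixed 24 h period fitted by ordinary least squares to 48 h of minute-by-minute heart rate; the function names, the synthetic data, and the use of R² as the goodness-of-fit measure are illustrative assumptions, not the study's actual pipeline.

```python
import math

def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = mesor + amplitude * cos(w*t - phase) by least squares.

    The model is linearized as y = M + b*cos(w*t) + g*sin(w*t), so that
    amplitude = hypot(b, g) and phase = atan2(g, b).
    """
    w = 2 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, b, g = solve_linear(xtx, xty)
    fitted = [mesor + b * r[1] + g * r[2] for r in rows]
    mean_y = sum(values) / len(values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return {
        "mesor": mesor,
        "amplitude": math.hypot(b, g),
        "acrophase_h": (math.atan2(g, b) / w) % period_h,  # clock time of the peak
        "r2": 1.0 - ss_res / ss_tot,                        # goodness of fit
    }

# Synthetic example: two consecutive days, heart rate peaking at 15:00.
times = [i / 60 for i in range(2 * 24 * 60)]
hr = [70 + 10 * math.cos(2 * math.pi / 24 * (t - 15)) for t in times]
fit = cosinor_fit(times, hr)
```

In this sketch a later `acrophase_h` corresponds to a phase delay and a lower `r2` to a less robust rhythm, which is how the acrophase and goodness-of-fit features are interpreted in the Results.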
Datasets
Data were occasionally missing for various reasons (e.g. not wearing the device, failed light exposure measurement, and battery discharge). For data imputation, incomplete data records with missing fields were replaced with similar complete records, as detailed in the online Supplementary material. During the data collection period, we assembled 75 506 sample days [mean (s.d.) days per participant, 279.7 (263.5)] from 270 participants after data imputation. The dataset for prediction modeling has 140 features (variables) plus a class label of episode type. Of all cells in our tabular data (75 506 samples × 140 features), 38% were missing and were therefore filled in by the imputation procedure. To test our hypothesis, we developed four categories of features intended to capture CR disruption using the data collected from wearable devices and smartphones. The four categories were light exposure, steps, sleep, and CR (see online Supplementary Table S1 for details).
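The exact imputation procedure is given only in the online Supplementary material. As a rough illustration of the general idea of borrowing values from a similar complete record, the following sketch fills each missing field from the nearest complete record, measured by Euclidean distance over the fields that were observed. The function name, the toy records, and the distance measure are assumptions made for illustration.

```python
import math

def impute_from_nearest_complete(records):
    """Fill None fields in each record from the most similar complete record."""
    complete = [r for r in records if None not in r]
    if not complete:
        raise ValueError("need at least one complete record to act as donor")
    imputed = []
    for rec in records:
        if None not in rec:
            imputed.append(list(rec))
            continue
        observed = [i for i, v in enumerate(rec) if v is not None]
        # Distance uses only the fields observed in the incomplete record.
        # Real data would be standardized first so no feature dominates.
        donor = min(
            complete,
            key=lambda d: math.sqrt(sum((rec[i] - d[i]) ** 2 for i in observed)),
        )
        imputed.append([donor[i] if v is None else v for i, v in enumerate(rec)])
    return imputed

# Toy sample days: [sleep length (h), step count, sleep efficiency]
records = [
    [7.5, 8000, 0.92],
    [5.0, 3000, 0.70],
    [7.0, None, 0.90],   # step count missing
    [None, 3500, None],  # sleep fields missing
]
filled = impute_from_nearest_complete(records)
```

Here the third record borrows its step count from the first record, and the fourth borrows its sleep fields from the second.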
Prediction model construction
Random Forest, a supervised learning algorithm, was used to train the episode prediction model (Pavlova & Uher, 2020). Our episode prediction model addresses three mood episode types: major depressive episode (MDE), manic episode (ME), and hypomanic episode (HME). Of the 270 participants, 37 of 95 MDD participants experienced 50 MDE recurrences; 40 of 78 BDI participants, 52 MDEs, 28 MEs, and 12 HMEs; and 58 of 97 BDII participants, 94 MDEs and 34 HMEs. In total, the participants experienced 196 MDEs, 28 MEs, and 46 HMEs during the study. For each confirmed mood episode, the prediction algorithm retrospectively processed data obtained from wearable activity trackers and smartphones, starting from 18 days before the onset of the established episode. Through feature selection, a set of optimal parameters was determined and used to predict whether a mood episode would occur in the next 3 days, which were not used for algorithm training (see the online Supplementary material for details). Information used in clinical interviews, such as the eMoodChart data and self-reported symptoms, and the subsequent results were not used in developing the prediction algorithm. The model-predicted mood episodes were then compared with those determined by clinical interviews in order to examine the predictive performance of the model.
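The 18-day look-back and 3-day prediction horizon described above can be sketched as follows. The precise windowing used in the study is described in the online Supplementary material; this sketch simply assumes that a sample day is labeled positive when a clinician-confirmed onset falls within the following 3 days and that features for a day are drawn from the 18 days ending on that day. The names and dates are hypothetical.

```python
from datetime import date, timedelta

HORIZON_DAYS = 3    # prediction horizon stated in the text
LOOKBACK_DAYS = 18  # look-back window stated in the text

def label_days(sample_days, onset_days, horizon=HORIZON_DAYS):
    """Map each day to 1 if an episode onset falls within the next `horizon` days."""
    onsets = set(onset_days)
    return {
        d: int(any(d + timedelta(days=k) in onsets for k in range(1, horizon + 1)))
        for d in sample_days
    }

def lookback_window(day, lookback=LOOKBACK_DAYS):
    """Days (oldest first, ending on `day`) whose lifelog data feed the features."""
    return [day - timedelta(days=k) for k in range(lookback - 1, -1, -1)]

# Hypothetical participant: 30 sample days and one confirmed onset on 20 January.
days = [date(2019, 1, 1) + timedelta(days=i) for i in range(30)]
labels = label_days(days, [date(2019, 1, 20)])
window = lookback_window(date(2019, 1, 19))
```

With a single onset on 20 January, only 17, 18, and 19 January are labeled positive, which illustrates why positive days are much rarer than negative days in such a dataset.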
To avoid an imbalance between the sensitivity and specificity of the model, we used an under-sampling method, which is explained in the online Supplementary material. The performance of the trained prediction model was evaluated by assessing the model's accuracy, sensitivity, specificity, and area under the curve (AUC) (Ethem, 2014). The detailed process for evaluating model performance is also included in the online Supplementary material. The Shapley value (Shapley, 1953) was used to examine how much each feature contributed to the prediction of episodes, and the ShapRFECV algorithm was employed to identify the smallest set of most influential features that optimizes model performance. For data processing, data analysis, and machine learning model construction and evaluation, we used Python library tools, scikit-learn (Pedregosa et al., 2011; Seabold and Perktold, 2010). The detailed processes are described in the online Supplementary material (p 2).
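Because days preceding an episode are rare relative to all sample days, a classifier trained on the raw data would tend to favor specificity over sensitivity. The study's own under-sampling procedure is described in the online Supplementary material; the sketch below shows one common variant, random under-sampling of the majority class to a 1:1 ratio, purely as an assumed illustration.

```python
import random

def undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority sample and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]

# Toy data: 5 positive days among 100 sample days.
y = [1] * 5 + [0] * 95
x = list(range(100))
x_bal, y_bal = undersample(x, y)
```

The balanced set keeps all 5 positive days and 5 randomly chosen negative days; in practice the held-out evaluation data would be left at their natural class ratio.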
Our preliminary analysis on all episode types suggested that about 30 features were sufficient for optimizing the performance. Therefore, the top 30 out of 140 features were selected to predict mood episode recurrence, as presented in Fig. 1. The list of all 140 features is provided in the online Supplementary material.
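To make the Shapley-based ranking concrete, the following sketch computes exact Shapley values for a toy three-feature risk score by averaging each feature's marginal contribution over all feature orderings, with absent features set to a baseline value. The toy model, its coefficients, and the feature names are invented for illustration. Exhaustive enumeration is feasible only for a handful of features, so it is not how attributions would be obtained for 140 features.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]        # add feature i to the coalition
            new = predict(current)
            phi[i] += new - prev     # marginal contribution of feature i
            prev = new
    return [p / len(orders) for p in phi]

# Hypothetical features: [sleep-length s.d., acrophase delay, bedtime steps]
def toy_risk(f):
    return 0.5 * f[0] + 0.3 * f[1] + 0.2 * f[0] * f[2]

x = [2.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)

# Rank features by absolute contribution, largest first.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
```

The contributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split equally between the two features involved. Keeping the top-ranked features is the same idea as selecting the top 30 of 140.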
Results
Of 495 subjects, 270 who wore wearable activity trackers for at least 30 days were included in the analysis [54.5% of the original cohort; mean (s.d.) age at baseline, 23.3 (3.63) years; 123 (45.6%) male]. There were no significant differences in demographic variables between those included and not included, except for education and follow-up duration. Additional characteristics of the study population are provided in the online Supplementary Table S2.
Performance of mood episode prediction model
The average prediction accuracies for the onset of any episode type occurring during the next 3 days in all participants and in those with MDD, BDI, and BDII were 91.9, 93.8, 93.7, and 92.4%, respectively (Table 1). For all participants, the prediction accuracies for impending MDE, ME, and HME were 90.1, 92.6, and 93.0%, respectively. For MDD participants, the prediction accuracy was 93.8%; sensitivity, 91.5%; specificity, 94.3%; and the AUC value, 0.973 for MDE. For BDI participants, prediction accuracies were 92.3, 93.7, and 95.2%; sensitivity values, 86.3, 89.8, and 92.8%; specificity values, 93.1, 94.1, and 95.2%; and the AUC values, 0.950, 0.967, and 0.982 for MDE, ME, and HME, respectively. For BDII participants, prediction accuracies were 92.3 and 92.5%; sensitivity values, 86.6 and 92.4%; specificity values, 93.6 and 92.5%; and the AUC values, 0.955 and 0.972 for MDE and HME, respectively. The number of samples used for model construction is reported in Table 1.
No. of total samples, d: the valid sample days used in the evaluation after the process of excluding invalid ones and data imputation.
No. of positive samples, d: during the entire sampling period, the sample period (day) corresponding to each major mood episode.
Sensitivity: true-positive rate, refers to the ability of the prediction to correctly identify positive days with episodes.
Specificity: true-negative rate, refers to the ability of the prediction to correctly identify negative days without episodes.
Accuracy: true classification rate, refers to the ability of the prediction to correctly differentiate between positive and negative days.
AUC: area under the curve, refers to the entire two-dimensional area underneath the ROC curve; the receiver operating characteristic (ROC) curve is a probability curve that plots the true-positive rate against the false-positive rate at various thresholds. The higher the AUC, the better the model distinguishes between positive and negative days.
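The four measures defined above can be computed as in the following minimal sketch. It uses the rank interpretation of the AUC, namely the probability that a randomly chosen positive day receives a higher score than a randomly chosen negative day, with ties counted as half. The toy labels, scores, and the 0.5 decision threshold are illustrative assumptions.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),          # true-positive rate
        "specificity": tn / (tn + fp),          # true-negative rate
        "accuracy": (tp + tn) / len(y_true),    # true classification rate
    }

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 positive days and 5 negative days with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]
metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds.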
To examine whether model outcomes differed between sites, the model performance was also evaluated separately for each participating hospital. Overall, performance was high and similar across the eight sites (online Supplementary Fig. S1).
Interpretation of mood episode prediction model
Figure 1 shows the top 30 features with the highest SHAP values for each mood episode (MDE, ME, HME) prediction model. The key contributors to the MDE and HME prediction models were features related to sleep and CR. In contrast, features related to step count contributed the most to the ME prediction model.
In detail, features that contributed to an increased risk of impending MDE were as follows: lower mean sleep efficiency, larger deviation of observed sleep offset time from sunrise, higher standard deviation of sleep length, higher mean step count during bedtime, lower mean step count during afternoon and evening, higher mean CR acrophase (reflecting circadian phase delay), and lower mean CR goodness of fit (indicating less robustness of cosinor analysis).
In the ME prediction model, features that contributed to increased risk of impending ME were as follows: higher mean step count during bedtime and evening, shorter mean sleep length, higher standard deviation of sleep length, larger deviation of observed sleep offset time from sunrise, higher mean CR acrophase, and lower average CR goodness of fit.
Finally, features related to increased risk of impending HME were as follows: higher mean step count during bedtime and morning, lower mean morning light exposure, shorter mean sleep length, lower mean CR acrophase (reflecting circadian phase advance), and lower mean CR goodness of fit.
Discussion
The evaluation of symptoms and diagnosis in clinical psychiatry heavily relies on patients' self-reported symptoms. Patients may not only have incorrect or even missing memories due to the retrospective nature of recollection and memory distortion but also may not recognize their own mood episodes. Moreover, family members and close acquaintances may not notice mood episodes until symptoms become prominent.
Digital phenotyping is the real-time objective quantification of a person's state using data automatically obtained from digital devices, such as smartphones, wearable devices, and other Internet of Things devices (Jain et al., 2015). As digital phenotyping allows unobtrusive assessment of daily physiology and activity, there has been an upsurge of interest in monitoring or sensing mood states in psychiatric patients. By applying digital phenotypes in psychiatry, many limitations of conventional clinical evaluation and diagnosis may be overcome. Most importantly, as monitoring of symptoms becomes possible, mood episode recurrences may be predicted and prevented.
Recent studies have demonstrated the usefulness of digital phenotypes in the field of psychiatry. Saeb et al. (2015) reported that people with depressive symptoms could be identified with 87% accuracy based on phone sensor data. They detected the symptoms by tracking phone use time and daily geographical locations with GPS. The following were correlated with depression: more time spent using the phone, most of the time spent either at home or in fewer locations, and a less regular day-to-day schedule. In a follow-up study, the authors replicated the previous findings, demonstrating that GPS features may be important and reliable predictors of depressive symptom severity (Saeb, Lattie, Schueller, Kording, & Mohr, 2016). Garcia-Ceja et al. (2018) reported that daytime and nighttime movement data from wrist actigraphy can be used to classify diagnostic groups of BD, MDD, and control with an accuracy of 72.7%, using machine learning. When reanalyzed with other methods, the public data showed that diagnostic group status (i.e. mood disorder, control) could be classified with a high accuracy of 89%, using features extracted from actigraphy data alone. It was also suggested that actigraphy data could predict symptom change across about 2 weeks (Jacobson, Weingarden, & Wilhelm, 2019).
Previous studies have focused on identifying behavioral markers related to mood symptom severity and on classifying diagnostic groups using data obtained from digital devices. Expanding on them, our team was able to predict with high accuracy mood episodes that would recur within a few days. Consistent with the finding that a shift in CR phase may be a pathophysiological mechanism of mood disorders (Moon et al., 2016), our results showed that parameters related to circadian misalignment predicted the onset of mood episodes. Such detection of behaviors before they lead to clinically significant deterioration points to a possibility of improving prevention and intervention, especially for mood disorder patients, who must constantly monitor their symptoms to prevent episode recurrences. Early recognition of mood disruptions, together with consistent monitoring and management of one's daily life patterns, can lead to cost-effective, scalable, individualized, and just-in-time treatment.
In our preliminary observational study of 55 mood disorder patients followed up for 2 years, we suggested that mood states and episodes could be predicted using passive lifelog data obtained from wearable devices and smartphone applications. CR-related digital phenotypes were used to predict episode recurrences. The prediction accuracies for MDEs, MEs, and HMEs for all mood disorder patients were 87, 94, and 91.2%, with AUC values of 0.87, 0.958, and 0.912, respectively (Cho et al., 2019). In the current study, we expanded our preliminary study by (1) increasing the sample size, (2) extending the follow-up period, (3) collecting data from multiple sites across the nation, (4) developing independent prediction algorithms for MDD, BDI, and BDII, and (5) refining the CR parameters used for prediction. The current findings showed notably enhanced sensitivity, specificity, and accuracy. Detailed model interpretation by SHAP analysis revealed that irregular and inadequate sleep habits were highly associated with mood episode recurrence, regardless of episode type. Low CR goodness of fit, reflecting CR disturbance in general, was also associated with all types of mood recurrence. MDE and ME recurrences were associated with delayed CR acrophase, and HME recurrences were associated with advanced CR acrophase.
Strengths and limitations
The strengths of this study are the following. First, our method of predicting an impending mood episode was based on the circadian misalignment hypothesis, which has recently accumulated significant evidence. In our previous observational study, we identified that all mood episodes requiring hospitalization were associated with distinct characteristics of circadian phase shift, which were reflected in circadian gene expression and salivary cortisol. Such distinct shifts in the circadian phase were normalized as the episode reached remission with treatment. Second, a long-term nationwide prospective cohort integrated with consistent clinical follow-up with ecological momentary assessments allowed enhanced clinical reliability. Third, our model has a low risk for privacy invasion, because it does not use sensitive personal information such as voice, history of call, text messages, and location.
To the best of our knowledge, this is the first study that developed mood episode prediction models only using automatically collected digital phenotypes based on the circadian misalignment hypothesis of mood disorders. We found that monitoring of CR disruption in daily life can be useful in detecting the worsening of mood symptoms in mood disorder patients. Based on the results of this study, we suggest that digital therapeutics with the clinical application of mood prediction algorithms can improve prevention and treatment for mood disorder patients; digital therapeutics can assist temporally sensitive detection of worsening mood symptoms and thus guide just-in-time and individualized feedback on daily behaviors related to sleep, activity, light exposure, and more to prevent mood recurrences.
This study has some limitations. First, the determined onsets of episode recurrences may not be accurate because of the retrospective nature of clinical evaluations. However, we used a smartphone application that we developed, the eMoodChart, to assess daily mood and energy as an aid to the traditional clinical interviews, and this method raised the accuracy of mood episode evaluation compared with past studies. We were also able to detect hypomanic episodes, which are clinically difficult to identify, relatively well using the daily eMoodChart data. Since patients tend to regard pure hypomanic episodes as non-pathological, or even as their best condition, they tend not to report them. Our clinical interviews incorporating the ecological momentary assessment (EMA) method supported more accurate detection of hypomanic episodes and thus showed that specific digital phenotypes, such as advanced circadian phase, are related to hypomanic episodes.
Second, our prediction model was trained and tested within a set of the same participant group. That the model performance was calculated with the same data set used for training may limit the generalization of the results of our mood episode prediction model. Future studies that apply the model and evaluate its performance to new groups of mood disorder patients are needed. In other words, external validation is necessary to determine the reproducibility and generalizability of the prediction model.
Third, because of iOS restrictions, light data could not be collected from smartphones running iOS, so these missing data were supplemented by imputation. In addition, on Android, the illumination value was obtained only when participants operated their smartphones, so the recorded illumination value may not have accurately reflected true light exposure, especially at night. We also could not evaluate the light exposure from the smartphone LEDs themselves.
This study developed an algorithm that predicts an impending mood episode, utilizing machine learning by processing automatically-recorded activity data using the circadian misalignment hypothesis. The good predictivity of the algorithm indicates its potential usefulness for preventing and treating mood disorders.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722002847
Data
De-identified participant data that underlie the results reported in this article and study protocol can be shared with investigators for research purposes. To gain access, data requests can be made by contacting the corresponding author, HJL.
Acknowledgments
We would like to thank Professor Daniel F. Kripke from UCSD for his helpful comments and input. We would also like to thank Professor Kyooseob Ha from Seoul National University for his scientific intuition at the early conceptualization stage.
Author contribution
HJL and TL had full access to all data in the study and take responsibility for the integrity of the data and accuracy of the data analysis. HJL and CHC contributed equally to this work and share the first authorship. HJL, CHC, TL, JJ, EM, JHB, DYP, SJK, THH, BC, HJK, YMA, and JBL collected and analyzed data. HJL, TL, CHC, JJ, JWY, SK wrote the manuscript. HJL, TL, CHC, JJ, JWY, SJ, SK, and LK revised the manuscript. TL, JJ, JBL trained and tested the models. All authors were involved in data acquisition, the general design of the trial, interpretation of the data, and critical revision of the manuscript.
Financial support
This work was supported by the Ministry of Health & Welfare, the Republic of Korea – the National Research Foundation of Korea.
Conflict of interest
HJL is supported by grant number HM14C2606 from the Ministry of Health & Welfare and grant number 2017M3A9F1031220/2019R1A2C2084158 from the National Research Foundation of Korea. LK and HJL are co-founders and shareholders of Hucircadian. The other authors have no conflicts of interest to disclose.