
Evaluating the performance of machine learning models for automatic diagnosis of patients with schizophrenia based on a single site dataset of 440 participants

Published online by Cambridge University Press:  23 December 2021

Lung-Hao Lee
Affiliation:
Department of Electrical Engineering, National Central University, Taoyuan City, Taiwan Department of Medical Humanities and Education, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan Pervasive Artificial Intelligence Research (PAIR) Labs, Hsinchu, Taiwan
Chang-Hao Chen
Affiliation:
Department of Electrical Engineering, National Central University, Taoyuan City, Taiwan Pervasive Artificial Intelligence Research (PAIR) Labs, Hsinchu, Taiwan
Wan-Chen Chang
Affiliation:
Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan Department of Medical Research, Taipei Veterans General Hospital, Taipei, Taiwan Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan
Po-Lei Lee
Affiliation:
Department of Electrical Engineering, National Central University, Taoyuan City, Taiwan Pervasive Artificial Intelligence Research (PAIR) Labs, Hsinchu, Taiwan
Kuo-Kai Shyu
Affiliation:
Department of Electrical Engineering, National Central University, Taoyuan City, Taiwan Pervasive Artificial Intelligence Research (PAIR) Labs, Hsinchu, Taiwan
Mu-Hong Chen
Affiliation:
Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan Department of Psychiatry, Faculty of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
Ju-Wei Hsu
Affiliation:
Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan Department of Psychiatry, Faculty of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
Ya-Mei Bai
Affiliation:
Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan Department of Psychiatry, Faculty of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan Institute of Brain Science, National Yang-Ming Chiao Tung University, Taipei, Taiwan
Tung-Ping Su
Affiliation:
Department of Psychiatry, Faculty of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan Institute of Brain Science, National Yang-Ming Chiao Tung University, Taipei, Taiwan Department of Psychiatry, Cheng Hsin General Hospital, Taipei, Taiwan
Pei-Chi Tu*
Affiliation:
Department of Medical Research, Taipei Veterans General Hospital, Taipei, Taiwan Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan Department of Psychiatry, Faculty of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan Institute of Philosophy of Mind and Cognition, National Yang-Ming Chiao Tung University, Taipei, Taiwan
*Author for correspondence: Pei-Chi Tu, E-mail: [email protected]

Abstract

Background

Support vector machines (SVMs) based on brain-wise functional connectivity (FC) have been widely adopted for single-subject prediction of patients with schizophrenia, but most such studies had small sample sizes. This study aimed to evaluate the performance of SVMs based on a large single-site dataset and investigate the effects of demographic homogeneity and training sample size on classification accuracy.

Methods

The resting functional Magnetic Resonance Imaging (fMRI) dataset comprised 220 patients with schizophrenia and 220 healthy controls. Brain-wise FCs were calculated for each participant, and linear SVMs were developed for automatic classification of patients and controls. First, we evaluated the SVMs based on all participants and homogeneous subsamples of men, women, younger (18–30 years), and older (31–50 years) participants by 10-fold nested cross-validation. Then, we held out a fixed test set of 40 participants (20 patients and 20 controls) and evaluated the SVMs based on incremental training sample sizes (N = 40, 80, …, 400).

Results

We found that the SVMs based on all participants had accuracy of 85.05%. The SVMs based on male, female, young, and older participants yielded accuracy of 84.66, 81.56, 80.50, and 86.13%, respectively. Although the SVMs based on older subsamples had better performance than those based on all participants, they generalized poorly to younger participants (77.24%). For incremental training sizes, the classification accuracy increased stepwise from 72.6 to 83.3%, with >80% accuracy achieved with sample size >240.

Conclusions

The findings indicate that SVMs based on a large dataset yield high classification accuracy. We recommend establishing models using a large sample with heterogeneous properties for single-subject prediction of schizophrenia.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of the European Psychiatric Association

Introduction

Schizophrenia is a severe mental disorder accompanied by delusions, hallucinations, and cognitive impairment. It affects nearly 1% of the world’s population, and the biological underpinnings of schizophrenia have remained elusive despite decades of intensive research [Reference Dhindsa and Goldstein1]. One important theory about its etiology is the dysconnectivity hypothesis, which proposes that the aberration of neural circuits during neural development plays a crucial role in the disease process [Reference Andreasen, Paradiso and O’Leary2]. The development of functional connectivity (FC) analysis [Reference Biswal, Yetkin, Haughton and Hyde3,Reference Greicius, Krasnow, Reiss and Menon4] provides an optimal tool to test the hypothesis, and FC studies have consistently identified abnormalities in widespread cortical and subcortical structures, including the anterior cingulate cortex [Reference Tu, Buckner, Zollei, Dyckman, Goff and Manoach5], thalamus [Reference Tu, Lee, Chen, Hsu, Li and Su6,Reference Anticevic, Haut, Murray, Repovs, Yang and Diehl7], basal ganglia [Reference Tu, Hsieh, Li, Bai and Su8,Reference Karcher, Rogers and Woodward9], and cerebellum [Reference Chen, Tu, Lee, Chen, Li and Su10] in patients with schizophrenia. With advancements in machine learning in medical imaging, researchers further explored the use of brain-wide FCs based on a specific anatomical or functional parcellation as features for single subject prediction of patients with schizophrenia [Reference Du, Fu and Calhoun11]. Early studies based on small samples have reported classification performances of 93.2% (44 participants) [Reference Tang, Wang, Cao and Tan12] and 83% (56 participants) [Reference Arbabshirani, Kiehl, Pearlson and Calhoun13]. Several more recent studies have included larger samples. For example, Zhao et al. [Reference Zhao, Guo, Linli, Yang, Lin and Tsai14] included 283 participants (135 with schizophrenia and 148 healthy controls) and obtained an accuracy of 71% based on FC features, and Kalmady et al. [Reference Kalmady, Greiner, Agrawal, Shivakumar, Narayanaswamy and Brown15] included 174 participants (81 with drug-naïve schizophrenia and 93 healthy controls) and reported an accuracy of 87% with ensemble learning. Lei et al. [Reference Lei, Pinaya, van Amelsvoort, Marcelis, Donohoe and Mothersill16] evaluated five datasets of 112–192 participants and noted an average accuracy of 82.61% (77.1–87.3%). Together, these preliminary findings indicate that machine learning models are feasible for automated diagnosis of schizophrenia. However, the accuracy varied substantially across these studies, and the relatively small sample size in many of them limited the application of the models in real-world clinical settings.

Sample size plays a key role in machine learning. A large single-site sample, which automatically covers more variation in disease features, is suggested to be helpful in building more robust classification models for real-world application than are other sample types [Reference Krystal and State17–Reference Rashid and Calhoun19]. However, several reviews of previous machine learning studies have observed a negative correlation between sample size and accuracy [Reference Schnack and Kahn18,Reference Varoquaux20], with high accuracy predictions usually limited to studies with small samples [Reference Arbabshirani, Plis, Sui and Calhoun21]. One explanation is that sample size influences the trade-off between accuracy and generalizability [Reference Schnack and Kahn18]. Small, homogeneous samples are able to produce classification models with high accuracy, at the cost of low generalizability, whereas large, heterogeneous samples produce models with better generalizability at the cost of accuracy. However, recent simulation and empirical studies have highlighted the critical role of biased estimations in machine learning studies with small sample sizes. The high accuracy may have been obtained because of the inherently large variance of performance in studies with small samples, publication bias in reporting significant effects [Reference Varoquaux20], and biased validation processes with a limited sample size [Reference Vabalas, Gowen, Poliakoff and Casson22]. Notably, the popular K-fold cross-validation method produces strongly biased performance estimates with small samples because it does not ensure that the data used to validate the classifier are not part of the data used to train it [Reference Vabalas, Gowen, Poliakoff and Casson22]. Therefore, it remained unclear whether high accuracy can be achieved for the identification of patients with schizophrenia based on a large heterogeneous sample with the current approach.

In the present study, we used a large single-site resting fMRI dataset of 220 patients with schizophrenia and 220 healthy controls to develop machine learning models for automatic identification of patients with schizophrenia based on brain-wide FCs and tested the hypothesis that support vector machines (SVMs) based on larger, heterogeneous samples can also provide high classification accuracy. SVMs were adopted because they were the most commonly used models in recent machine learning studies of psychiatric patients [Reference Zhao, Guo, Linli, Yang, Lin and Tsai14, Reference Vabalas, Gowen, Poliakoff and Casson22–Reference Lei, Pinaya, Young, van Amelsvoort, Marcelis and Donohoe24] and showed performance superior to that of other traditional models [Reference Lei, Pinaya, van Amelsvoort, Marcelis, Donohoe and Mothersill16]. Also, the sample size is too small for application of deep learning algorithms [Reference Cearns, Hahn and Baune25]. To the best of our knowledge, this is the largest single-site machine learning study of patients with schizophrenia based on brain-wise FCs to date. The data were collected using the same MRI machine and acquisition parameters from 2010 to 2019, thereby minimizing the confounding effects of medical center, MRI machine, and acquisition parameters. Given previous concerns about the effects of participant homogeneity and training sample size on classification accuracy, we also investigated the effect of these two factors on model performance.

Materials and Methods

Participants

The resting fMRI dataset included 220 patients with schizophrenia and 220 sex- and age-matched healthy controls. Their demographic characteristics are presented in Table 1. All patients were recruited from outpatient and inpatient units of the Taipei Veterans General Hospital in Taiwan. Structured clinical interviews based on the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) [Reference Cearns, Hahn and Baune26] confirmed the diagnoses, and the clinical status of schizophrenic patients was characterized using the Positive and Negative Syndrome Scale (PANSS) [Reference First, Spitzer, Gibbon and Williams27]. We excluded participants with the following conditions: (a) substance abuse or dependency in the preceding 6 months; (b) a history of head injury that resulted in sustained loss of consciousness or cognitive sequelae; and (c) neurological illnesses or any other disorder that affects cerebral metabolism. Seven of the patients had comorbid psychiatric disorders (detailed in Supplementary Table S1). The patients with schizophrenia were under stable treatment with various antipsychotics, antidepressants, and mood stabilizers before participating in the study.

Table 1. Demographic and clinical features of the patients and controls in this study.

Abbreviations: F, female; HC, healthy control; M, male; PANSS, Positive and Negative Syndrome Scale for Schizophrenia; SZ, schizophrenia.

Healthy controls were recruited through advertisements; they were screened by an experienced psychiatrist with the Mini International Neuropsychiatric Inventory Plus, and candidates with a possible major psychiatric illness were excluded. In addition, candidates with a history of first-degree relatives with axis-I disorders, including schizophrenia, major depressive disorder, and bipolar disorder, were excluded.

MRI image acquisition

MRI images were acquired using a 3.0 Tesla GE Discovery 750 whole-body high-speed imaging device with an eight-channel high-resolution brain coil. Head stabilization was achieved through cushioning, and all participants wore earplugs (29 dB rating) to attenuate noise. Automated shimming procedures were performed, and scout images were obtained. Resting-state functional images were collected using a gradient echo T2* weighted sequence (repetition time [TR]/echo time [TE]/flip angle = 2,500 ms/30 ms/90°). Forty-seven contiguous horizontal slices parallel to the intercommissural plane (voxel size: 3.5 × 3.5 × 3.5 mm³) were acquired and interleaved. These slices covered the cerebellum of each participant. During functional scanning, the participants were instructed to remain awake with their eyes open (each scan lasted 8 min and 24 s across 200 time points). In addition, a high-resolution structural image was acquired in the sagittal plane using a high-resolution sequence (TR = 2,530 ms, echo spacing = 7.25 ms, echo time TE = 3 ms, flip angle = 7°) and an isotropic 1-mm voxel (field of view: 256 × 256).

Quality control

Regarding head motion during image acquisition, we used the method of scrubbing within regression (spike regression) suggested by Yan et al. [Reference Kay, Fiszbein and Opler28] to minimize the effect of head motion on FC measurement. This method identifies “bad” time points using a threshold of framewise displacement (FD) > 0.2 mm as well as one back and two forward neighbors [Reference Yan, Cheung, Kelly, Colcombe, Craddock and Di Martino29]; each “bad” time point was modelled as a separate regressor in the regression models [Reference Power, Barnes, Snyder, Schlaggar and Petersen30,Reference Lemieux, Salek-Haddadi, Lund, Laufs and Carmichael31]. The detailed parameters of motion correction are provided in Supplementary Table S2, and there was no significant difference between the two groups.
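As a minimal sketch of the scrubbing rule described above, the following Python snippet flags every time point whose FD exceeds 0.2 mm together with one preceding and two following neighbors, and then builds one indicator regressor per flagged time point. The function names and the toy FD values are illustrative assumptions and are not taken from the authors' pipeline.

```python
def flag_bad_timepoints(fd, threshold=0.2, n_back=1, n_forward=2):
    """Return sorted indices of 'bad' frames: FD > threshold plus neighbors."""
    n = len(fd)
    bad = set()
    for t, value in enumerate(fd):
        if value > threshold:
            # Flag the frame itself, n_back frames before, n_forward after.
            for u in range(t - n_back, t + n_forward + 1):
                if 0 <= u < n:
                    bad.add(u)
    return sorted(bad)


def spike_regressors(bad, n_timepoints):
    """One indicator column per bad frame (1 at that frame, 0 elsewhere)."""
    return [[1 if t == b else 0 for t in range(n_timepoints)] for b in bad]


# Toy framewise displacement series in mm (hypothetical values).
fd = [0.05, 0.10, 0.35, 0.08, 0.07, 0.06, 0.04, 0.25]
bad = flag_bad_timepoints(fd)
spikes = spike_regressors(bad, len(fd))
```

Modelling each flagged frame with its own regressor removes that frame's influence on the fitted signal without shortening the time series, which is why this variant is often preferred over deleting frames outright.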

FC preprocessing

All preprocessing was performed using the Data Processing Assistant for Resting-State fMRI (http://www.restfmri.net), which is based on Statistical Parametric Mapping (http://www.fil.ion.ucl.ac.uk/spm) and the Resting-State fMRI Data Analysis Toolkit (http://www.restfmri.net). The functional scans underwent slice-timing correction and motion correction and were normalized to a standard anatomical space (Montreal Neurological Institute). Additional preprocessing steps were used to prepare the data for FC analysis: (a) spatial smoothing using a Gaussian kernel (6-mm full width at half-maximum), (b) temporal filtering (0.009 Hz < f < 0.08 Hz), and (c) removal of spurious or nonspecific sources of variance through regression of the following variables: (i) six head motion parameters and autoregressive models of motion (Friston 24-parameter model), namely the six head motion parameters, the six head motion parameters one time point before, and the 12 corresponding squared items [Reference Satterthwaite, Elliott, Gerraty, Ruparel, Loughead and Calkins32]; (ii) the mean whole-brain signal; (iii) the mean signal within the lateral ventricles; and (iv) the mean signal within a white matter mask. The regressors used in the method of scrubbing within regression were also included to minimize the effect of head motion on the measurement of FC. The regression of each of these signals was computed simultaneously, and the residual time course was then retained for the correlation analysis.
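The 24 motion regressors can be illustrated with a short Python sketch that expands six realignment parameters per time point into the current values, the values one time point before, and the squares of those 12 values. The function name, the zero-filling of the first frame's lagged values, and the toy numbers are assumptions made for illustration; toolboxes may handle the first frame differently.

```python
def friston24(motion):
    """Expand T x 6 realignment parameters into T x 24 nuisance regressors.

    Columns: 6 current parameters, 6 parameters from the previous time
    point, and the squares of those 12 values.
    """
    expanded = []
    for t, current in enumerate(motion):
        # Assumption: the first frame has no predecessor, so its lagged
        # values are set to zero.
        previous = motion[t - 1] if t > 0 else [0.0] * len(current)
        twelve = list(current) + list(previous)
        expanded.append(twelve + [v * v for v in twelve])
    return expanded


# Toy input: 3 time points x 6 motion parameters (hypothetical values).
motion = [
    [0.1, 0.0, -0.2, 0.01, 0.00, 0.02],
    [0.2, 0.1, -0.1, 0.02, 0.01, 0.02],
    [0.1, 0.1, -0.3, 0.01, 0.02, 0.03],
]
nuisance = friston24(motion)
```

In a full pipeline these columns would be stacked with the global, ventricular, and white matter signals and the spike regressors, and all of them would be fitted in a single regression so that the residual time course reflects every correction at once.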

Calculation of brain-wise FCs

We chose three parcellations: the automated anatomical labeling atlas version 3 (AAL-3) [Reference Friston, Williams, Howard, Frackowiak and Turner33], AAL-2 [Reference Rolls, Huang, Lin, Feng and Joliot34], and Shen’s 268 parcellations [Reference Rolls, Joliot and Tzourio-Mazoyer35], comprising 166, 120, and 268 regions of interest (ROIs), respectively (Figure 1). The mean time series was derived for each ROI by averaging the time courses of all voxels within the ROI. Pearson’s correlation coefficients for each pair of ROIs were calculated and z-transformed, yielding three FC matrices (166 × 166, 120 × 120, and 268 × 268) for each participant. By evaluating the model performance based on the three parcellations, we aimed to choose the one yielding the best performance for later experiments. AAL-2 and AAL-3 were selected because the automated anatomical atlas [Reference Shen, Tokoglu, Papademetris and Constable36] is widely used in neuroimaging research. Compared with AAL-2, AAL-3 has a more detailed parcellation of the thalamus (15 parts), and we wanted to determine whether this improved model performance. Shen 268 was selected because it was defined according to neuroimaging-based parcellation algorithms based on FC data and was previously adopted in our machine learning study of patients with bipolar disorder [Reference Tzourio-Mazoyer, Landeau, Papathanassiou, Crivello, Etard and Delcroix37].
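The FC calculation above can be sketched in a few lines of standard-library Python. The sketch assumes that "z-transformed" refers to the Fisher r-to-z transform and that the diagonal (self-correlation) is not used; both are assumptions, as are the function names and the toy time series. It also counts the distinct ROI pairs implied by each parcellation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_matrix(roi_series):
    """Symmetric matrix of Fisher z-transformed correlations.

    The diagonal is left at 0 because atanh(1) is infinite.
    """
    k = len(roi_series)
    fc = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i):
            z = math.atanh(pearson(roi_series[i], roi_series[j]))
            fc[i][j] = fc[j][i] = z
    return fc


def n_unique_edges(n_rois):
    """Number of distinct ROI pairs (strictly lower triangle)."""
    return n_rois * (n_rois - 1) // 2


# Toy mean time series for 3 ROIs (hypothetical values).
roi_series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
fc = fc_matrix(roi_series)
edge_counts = {n: n_unique_edges(n) for n in (166, 120, 268)}
```

Under these assumptions the three parcellations give 13,695 (AAL-3), 7,140 (AAL-2), and 35,778 (Shen 268) distinct connections per participant, far more features than the 440 participants available.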

Figure 1. Automatic classifications of schizophrenic patients and healthy controls based on brain-wise functional connectivity. Brain-wise functional connectivity was calculated for each participant according to three different parcellations and linear support vector machines were developed and evaluated for performance. AAL-3 = the automated anatomical labeling atlas version 3; AAL-2 = the automated anatomical labeling atlas version 2.

Machine learning model creation, training, and performance evaluations

SVM is a supervised learning model with an associated learning algorithm that analyzes data for classification [Reference Chen, Tu, Huang, Bai, Su and Chen38]. The lower triangle elements of the FC matrix were concatenated into one vector per subject and used as discriminative features for training the SVM classifier. The hyperparameters C = (1, 10, 100, 1000) and tolerance = (0.001, 0.01, 0.1, 1) of the SVM were optimized using grid search with cross-validation within the training set. To classify an FC matrix in the test set, the output was considered positive for schizophrenia if the probability of class 1 (i.e., diagnosed as schizophrenia) exceeded a predefined threshold (i.e., 0.5).
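The feature construction and decision rule in this paragraph can be sketched as follows. The snippet flattens the strictly lower triangle of an FC matrix, enumerates the 16 hyperparameter combinations named in the text, and applies the 0.5 probability threshold. It does not train an SVM; the function names, the row-by-row ordering of the features, and the toy matrix are assumptions for illustration only.

```python
from itertools import product


def lower_triangle_vector(matrix):
    """Flatten the strictly lower triangle of a square matrix row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def hyperparameter_grid():
    """All (C, tolerance) pairs from the grid stated in the text."""
    c_values = (1, 10, 100, 1000)
    tolerances = (0.001, 0.01, 0.1, 1)
    return list(product(c_values, tolerances))


def predict_label(prob_schizophrenia, threshold=0.5):
    """Return 1 (schizophrenia) if the class-1 probability exceeds threshold."""
    return 1 if prob_schizophrenia > threshold else 0


# Toy 4 x 4 symmetric FC matrix (hypothetical values).
fc = [
    [0.0, 0.3, -0.1, 0.5],
    [0.3, 0.0, 0.2, 0.4],
    [-0.1, 0.2, 0.0, 0.6],
    [0.5, 0.4, 0.6, 0.0],
]
features = lower_triangle_vector(fc)
grid = hyperparameter_grid()
```

Because the FC matrix is symmetric, the lower triangle carries all of its information, so using it alone halves the feature count without loss.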

We used nested 10-fold cross-validation to evaluate SVMs, with inner cross-validation for hyperparameter determination and outer cross-validation for performance evaluation [Reference Chang and Lin39]. The entire dataset was divided into 10 folds that preserved the relative proportion of the 2 classes (i.e., schizophrenia and healthy controls) according to various experimental setups; nine folds were used as the training set, while the remaining fold was used as the test set. Each training set was used to perform inner cross-validation by dividing it into 10 folds again, in which 9 folds were used to train the model and the remaining 1 fold was used for performance validation. This process was repeated 10 times until each of the 10 folds had served as the validation set for hyperparameter determination. The outer cross-validation process was also repeated 10 times until each of the 10 folds had served as the test set. We repeated the experiment 100 times to avoid any bias introduced by random sampling in nested 10-fold cross-validation, and the mean ± standard deviation of the performance was reported.
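The splitting scheme of nested 10-fold cross-validation can be sketched with index bookkeeping alone. The Python snippet below builds stratified outer folds and, within each outer training set, stratified inner folds for hyperparameter selection. The function names, the round-robin stratification, and the seed are assumptions for illustration; no model is fitted here.

```python
import random


def stratified_folds(labels, indices, k, rng):
    """Split `indices` into k folds with near-equal class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels[i] for i in indices)):
        members = [i for i in indices if labels[i] == cls]
        rng.shuffle(members)
        # Deal class members round-robin so each fold gets a similar share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def nested_cv_splits(labels, k_outer=10, k_inner=10, seed=0):
    """Yield (train, test, inner_splits) for each outer fold.

    inner_splits holds (inner_train, validation) pairs drawn only from
    the outer training set, so test data never inform tuning.
    """
    rng = random.Random(seed)
    all_idx = list(range(len(labels)))
    for test in stratified_folds(labels, all_idx, k_outer, rng):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        inner_splits = []
        for val in stratified_folds(labels, train, k_inner, rng):
            val_set = set(val)
            inner_train = [i for i in train if i not in val_set]
            inner_splits.append((inner_train, val))
        yield train, test, inner_splits


# 220 patients (label 1) and 220 controls (label 0), as in the dataset.
labels = [1] * 220 + [0] * 220
splits = list(nested_cv_splits(labels))
```

With 440 balanced participants, each outer test fold holds 22 patients and 22 controls, and each outer training set of 396 is split again for tuning. Repeating the whole procedure with different seeds would correspond to the 100 repetitions described above.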

The performances were evaluated by the following metrics: (a) accuracy: the fraction of predictions the machine learning model got right; (b) sensitivity (true positive rate): the proportion of testing instances who received a positive result out of those participants who actually have schizophrenia; (c) specificity (true negative rate): the proportion of testing instances who received a negative result out of those participants who do not actually have schizophrenia; (d) F1-score: the harmonic mean of precision and recall (sensitivity), which takes both false positives and false negatives into account; and (e) area under the curve (AUC): an aggregate measure of performance across all possible classification thresholds. AUC ranges in value from 0 (a model whose predictions are all wrong) to 1 (a model whose predictions are all correct).
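The five metrics can be written out directly from their definitions. In the sketch below, schizophrenia is coded as 1 and healthy control as 0, and AUC is computed as the probability that a randomly chosen patient receives a higher score than a randomly chosen control, with ties counted as one half. The function names and toy scores are illustrative assumptions, and the sketch assumes each class and each predicted label occurs at least once.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) with schizophrenia coded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, scores):
    """Rank-based AUC: P(patient score > control score), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 4 patients and 4 controls with hypothetical scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

In this toy case one patient and one control are misclassified at the 0.5 threshold, so accuracy, sensitivity, specificity, and F1 all equal 0.75, while the AUC of 0.875 reflects the ranking across all thresholds.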

First, we evaluated the classification performance of SVMs based on brain-wide FC of three parcellations, and the one with the best performance was selected for later experiments.

Next, we investigated whether increasing the homogeneity of the demographic properties of sex and age improved SVM performance. We divided the whole sample by sex (235 men: 120 with schizophrenia and 115 healthy controls; 205 women: 100 with schizophrenia and 105 healthy controls) and age (212 younger adults: 18–30 years, 106 with schizophrenia and 106 healthy controls; 228 older adults: 31–50 years, 114 with schizophrenia and 114 healthy controls), and the SVMs based on the subsamples were evaluated using nested 10-fold cross-validation with 100 random samplings, and the mean ± standard deviation of the performance was reported. We also evaluated the generalizability of these SVMs to participants with different demographic characteristics. The male- or female-specific SVMs, which were trained on only male or only female participants, were used to classify the clinical status of participants of the other sex. Similarly, we applied the SVMs trained on only younger participants to predict the clinical status of older adults, and the SVMs trained on only older adults to predict the clinical status of younger adults.

Finally, we evaluated the effects of training sample size on SVM performance to understand how many participants are necessary for a robust machine learning model. We randomly selected a test set of 40 participants (20 with schizophrenia and 20 healthy controls) and used the same test set for every training sample size to allow performance comparisons. For the training sets, we started with N = 40 (20 with schizophrenia and 20 healthy controls) randomly drawn from the other 400 participants and incrementally added 20 patients with schizophrenia and 20 healthy controls until the maximum training set size of N = 400 was reached. At each step, a model was built from the training set and tested on the test set. We conducted 100 repetitions with different random samplings of participants, and the mean ± standard deviation of the performance was reported.
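One repetition of this sampling design can be sketched as follows: a balanced test set of 40 is fixed, and balanced training sets grow from 40 to 400 in steps of 40. The sketch assumes that each larger training set contains the previous one (a nested design), which the text implies but does not state outright; the function name and seed are also assumptions.

```python
import random


def learning_curve_design(labels, n_test_per_class=20, step_per_class=20,
                          seed=0):
    """Return a fixed balanced test set and nested, growing training sets."""
    rng = random.Random(seed)
    patients = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(patients)
    rng.shuffle(controls)

    test = patients[:n_test_per_class] + controls[:n_test_per_class]
    pool_p = patients[n_test_per_class:]
    pool_c = controls[n_test_per_class:]

    training_sets = []
    for n in range(step_per_class, len(pool_p) + 1, step_per_class):
        # Each larger set is the previous one plus 20 patients + 20 controls.
        training_sets.append(pool_p[:n] + pool_c[:n])
    return test, training_sets


# 220 patients (label 1) and 220 controls (label 0), as in the dataset.
labels = [1] * 220 + [0] * 220
test_set, training_sets = learning_curve_design(labels)
sizes = [len(s) for s in training_sets]
```

Calling the function with 100 different seeds would give the 100 random samplings described above, with a model trained and scored on the fixed test set at each of the 10 training sizes.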

Results

Demographic properties

The participants’ demographic data are presented in Table 1. We controlled the age and sex distribution of each group to ensure a balanced study design. Differences in demographic characteristics between the two groups were examined using the chi-square test for categorical variables and the t test for continuous variables. The mean ages of the patients with schizophrenia and healthy controls were 31.7 and 31.8 years, respectively. No significant differences were noted in age and sex distribution between the two groups. However, patients with schizophrenia had significantly lower education levels than healthy controls.

The performance of SVMs based on three different parcellations

The detailed results of SVM performance are presented in Table 2. The mean accuracies of the SVMs based on AAL-3, AAL-2, and Shen’s 268 parcellations were 85.05 ± 0.84%, 84.17 ± 0.88%, and 84.45 ± 0.89%, respectively. The SVMs based on AAL-3 had slightly but significantly higher accuracy than those based on AAL-2 (p < 0.01) or Shen’s 268 (p < 0.01). Therefore, brain-wide FC based on AAL-3 was adopted for later experiments.

Table 2. The performance of support vector machines based on different parcellations for automatic classifications of patients with schizophrenic disorder and healthy controls.

Abbreviations: AAL-2, the automated anatomical labeling atlas version 2; AAL-3, the automated anatomical labeling atlas version 3; AUC, area under curve; SVM, support vector machine.

The effects of demographic homogeneity and training sample size on classification accuracy

The detailed demographic and clinical characteristics of sub-samples according to sex or age range were provided in Supplementary Table S3. Sex-specific SVMs had accuracies of 84.66 ± 1.07% for men and 81.56 ± 1.27% for women (Table 3 and Figure 2a). We also evaluated the generalizability of the sex-specific SVMs to the participants of the other sex. The accuracy was 78.20% for predicting female participants by male-specific models and 81.33 ± 1.23% for predicting male participants by female-specific SVMs (Table 4). Thus, the female-specific SVMs, which yielded an accuracy of 81.56% for predicting female participants, generalized well to predict male participants with an accuracy of 81.33%, but not vice versa. Nevertheless, the sex-specific models had worse performances than the SVMs based on participants of both sexes.

Table 3. The performance of support vector machines based on different homogeneous subsamples for automatic classifications of patients with schizophrenic disorder and healthy controls.

Abbreviations: AUC, area under curve.

Figure 2. The effects of demographic homogeneity and training sample size on the performance of support vector machines (SVMs). (a) The classification accuracy of SVMs based on all participants and of those based on homogeneous subsamples of men, women, younger, and older participants. The SVMs based on homogeneous subsamples were also applied to participants with different demographic properties to assess their generalizability. (b) The classification accuracy of SVMs based on incremental training sample sizes improved consistently from 72.61 to 83.32%, and >81% accuracy was achieved with training sample sizes >240.

Table 4. The performance of generalization of support vector machines to participants with different demographic characteristics for automatic classifications of patients with schizophrenic disorder and healthy controls.

Abbreviations: AUC, area under curve; SVMs, support vector machines.

a The classification performance of predicting female participants by male-specific SVMs.

b The classification performance of predicting male participants by female-specific SVMs.

c The classification performance of predicting older-adult participants by younger-adult-specific SVMs.

d The classification performance of predicting younger-adult participants by older-adult-specific SVMs.

Age-specific SVMs yielded accuracies of 80.50 ± 1.38% and 86.13 ± 0.87% for younger and older adults, respectively (Table 3 and Figure 2a). We also evaluated the generalizability of the age-specific SVMs to participants of the other age range. The accuracies for predicting younger participants by using the older adult-specific SVMs and vice versa were 77.24 ± 1.07% and 82.93 ± 1.04%, respectively (Table 4). The younger adult-specific SVMs, which yielded an accuracy of 80.50% for predicting younger participants, generalized well to older participants with an accuracy of 82.93%. In contrast, the older adult-specific SVMs, which yielded an accuracy of 86.13% for predicting older participants, generalized poorly to younger participants with an accuracy of 77.24%.

The relationship between classification accuracy and training sample size is shown in Table 5 and Figure 2b. As training sample size increased from 40 to 400, the mean accuracy increased consistently from 72.61 to 83.32%, and an average accuracy >81% was achieved with N > 240. According to the standard deviations of classification accuracy across 100 random samplings, the SVMs based on larger training samples had lower variance in performance, suggesting higher stability.

Table 5. The performance of support vector machines based on different training sample size for automatic classifications of patients with schizophrenic disorder and healthy controls.

Abbreviations: AUC, area under curve.

The FCs with greatest contributions to single subject classification

The identification of FCs that contribute to accurately differentiating patients from control subjects provides a multivariate approach to identifying biomarkers, which could lead to clinically useful tools for establishing both diagnosis and prognosis [Reference Krstajic, Buturovic, Leahy and Thomas40]. Therefore, we further analyzed the FCs contributing to classification performance. In each trained SVM, the absolute values of the weights for each brain-wise FC were regarded as feature importance and averaged across all SVMs based on AAL-3 with the whole sample of N = 440. The top 20 FCs with the highest mean weights are listed in Table 6 and involve distributed cortical and subcortical structures (Figure 3). Among them, the thalamo-cerebellar FC had the highest mean weight and played the most important role in differentiating patients from controls.
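The ranking step described above can be sketched in plain Python: absolute weights are averaged across trained linear models, the features are sorted by that mean, and each feature index is mapped back to its ROI pair. The mapping assumes the features were taken from the strictly lower triangle row by row, which is an assumption about ordering; the function names and toy weights are likewise hypothetical.

```python
def mean_abs_weights(weight_vectors):
    """Average |w| per feature across equally long weight vectors."""
    n_models = len(weight_vectors)
    n_features = len(weight_vectors[0])
    return [
        sum(abs(w[f]) for w in weight_vectors) / n_models
        for f in range(n_features)
    ]


def edge_for_feature(index):
    """Map a lower-triangle feature index back to its (row, column) pair."""
    row = 1
    while index >= row:
        index -= row
        row += 1
    return row, index


def top_edges(weight_vectors, k):
    """Return the k ROI pairs with the highest mean absolute weight."""
    importance = mean_abs_weights(weight_vectors)
    ranked = sorted(
        range(len(importance)), key=lambda f: importance[f], reverse=True
    )
    return [(edge_for_feature(f), importance[f]) for f in ranked[:k]]


# Toy weights from 3 linear models over 6 features, i.e. 4 ROIs
# (hypothetical values).
weights = [
    [0.1, -0.4, 0.2, 0.0, 0.3, -0.9],
    [-0.2, 0.5, 0.1, 0.1, -0.2, 0.7],
    [0.0, -0.6, 0.3, 0.1, 0.1, -0.8],
]
top = top_edges(weights, k=2)
```

Taking absolute values before averaging means that a connection counts as important whether it pushes the decision toward the patient or the control class, so the ranking reflects the size of the contribution rather than its direction.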

Table 6. The functional connectivity features with greatest contributions to single subject classification of patients with schizophrenia.

Figure 3. The cortical and subcortical structures involved in the functional connectivities with greatest contributions to single subject classification of patients with schizophrenia.

Discussions

At present, psychiatric diagnoses are based largely on psychiatric interviews, and brain imaging does not play a vital role. However, the approach of combining imaging and machine learning is appealing and could be immensely useful if it is proven to be a robust means of establishing a psychiatric diagnosis. In this study, we used a large single-site dataset to build SVMs to classify patients with schizophrenia and healthy controls based on brain-wide FC, with an accuracy of 85%. In contrast to recent concerns about the biased estimations of classification performance in studies with small samples [Reference Flint, Cearns, Opel, Redlich, Mehler and Emden23], the present results may provide a robust estimation of SVMs for automatic diagnosis of patients with schizophrenia based on brain-wise FCs. On the basis of our data, we recommend AAL-3 for the calculation of brain-wide FC because it yielded higher classification accuracy than AAL-2 and Shen’s 268. Although the models using more homogenous subsamples of narrower age range (the older adult group) seemed to provide better classification accuracy than the overall model, they had poor generalization to other samples with different demographic properties. We also found that classification accuracy increased with incremental increases in training sample size from 40 to 400, with an accuracy of >81% achieved with N > 240. These findings suggest that establishing an SVM based on a large single-site dataset covering varied demographics and disease features may be optimal for the automatic diagnosis of schizophrenia.

Our model had a mean accuracy of 85%, which is slightly better than those reported in recent machine-learning studies based on brain-wide FC: 82.4% [Reference Kalmady, Greiner, Agrawal, Shivakumar, Narayanaswamy and Brown15], 81.74% [Reference Lei, Pinaya, van Amelsvoort, Marcelis, Donohoe and Mothersill16], and 82.61% [Reference Gutiérrez-Gómez, Vohryzek, Chiêm, Baumann, Conus and Cuenod41]. Notably, the performance of these SVMs was highly consistent, between 80 and 85%, suggesting that brain-wide FC is a reliable feature for automatic classification of patients with schizophrenic disorder. The high accuracy (>90%) reported in early studies with small samples may have been due to high variability and over-optimistic estimation of accuracy during cross-validation within a small sample. A recent study systematically investigated these issues using structural MRIs of 1,868 patients with major depressive disorder and healthy controls from the international Predictive Analytic Competition [Reference Flint, Cearns, Opel, Redlich, Mehler and Emden23]. The authors mimicked the process by which researchers would draw samples of various sizes (N = 4–150) and concluded that there is a strong risk of misestimation, with accuracies of up to 95% observed at sample sizes of 20, mainly due to accuracy overestimation during cross-validation. They recommended using sufficiently large test sets to offset the performance misestimation.
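
To give a rough sense of why small test sets allow such misestimation, the sketch below computes the binomial standard error of an accuracy estimate for several sample sizes. It is an illustration under simplifying assumptions and not an analysis from this study: it treats the test cases as independent and the classifier as fixed, which cross-validation does not strictly satisfy, so real variability is typically larger. The function names and the chosen sample sizes are ours.

```python
import math


def accuracy_standard_error(accuracy, n_test):
    """Binomial standard error of an accuracy estimated on n_test independent cases."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)


def approx_95_interval(accuracy, n_test):
    """Normal-approximation 95% interval, clipped to [0, 1]."""
    half_width = 1.96 * accuracy_standard_error(accuracy, n_test)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Compare a small sample, a mid-sized sample, and a sample of this study's size.
for n in (20, 150, 440):
    low, high = approx_95_interval(0.85, n)
    se = accuracy_standard_error(0.85, n)
    print(f"N = {n:3d}: SE = {se:.3f}, approx. 95% interval = [{low:.3f}, {high:.3f}]")
```

Under these assumptions, an observed accuracy of 85% has a standard error of roughly 8 percentage points at N = 20 but under 2 percentage points at N = 440, so a single small study can land far above the true accuracy by chance alone.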

Studies have rarely explored sex- and age-specific machine learning models. One diffusion spectrum imaging study that used a diagnostic index based on whole-brain patterns of altered white matter tract integrity built separate models by sex [Reference Chen, Liu, Hsu, Lo, Hwang and Hwu42]. The overall prediction accuracy was approximately 84% for men, 82% for women, and 76% for men and women together. The results implied that sex has a significant effect on structural connectivity patterns and that establishing different models for male and female participants may improve prediction performance. In our study, sex-specific SVMs performed worse than those based on both sexes. By contrast, the older adult-specific SVMs performed slightly better than the SVMs based on all ages, but generalized poorly to younger participants. Therefore, in the clinical setting, it may be practical to establish SVMs based on participants covering various demographic properties.

We noted that a larger sample size provided better performance and improved the reliability of the SVMs by decreasing performance variance. Our findings are consistent with a previous simulation study suggesting that a larger sample size may improve model stability [Reference Varoquaux20]. Several studies have also explored the relationship between training sample size and classification accuracy, but the results have not been fully consistent. One study trained SVMs on structural MRI features and demonstrated a consistent increase in classification accuracy to approximately 70% with increases in sample size (N = 10, 20, 30, …, 220), and the accuracy appeared not to have reached its maximum. Another resting fMRI study used intersubject correlation in the functional connectome to classify patients with schizophrenia and reported higher performance with larger training samples [Reference Cui, Liu, Wang, Wang, Guo and Xi43]. By contrast, one study evaluated SVMs based on structural MRI to classify patients with major depressive disorder with variable training set sizes (N = 5–150) and reported no performance improvement for N > 30. Thus, the relationship between classification performance and training sample size may depend on the features (structural or functional) and the complexity of the algorithms, and a larger training sample may not always lead to better performance. Our findings indicate that performance continued to improve at N = 400; we therefore suggest increasing the sample size of the dataset even further with the current models.
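
The kind of training-size analysis discussed above can be sketched as a learning curve: for each training size, a classifier is fitted on a random class-balanced subsample and scored on held-out cases, and the scores are averaged over repeats. The sketch below is illustrative only. It uses synthetic Gaussian data and a nearest-centroid classifier as a simple stand-in for the linear SVM, and every name and parameter (`make_synthetic`, the 0.5 mean shift, 10 features, 20 held-out cases per class, 10 repeats) is an assumption rather than a detail of this study.

```python
import random


def make_synthetic(n_per_class, n_features, shift, rng):
    """Two Gaussian classes whose means differ by `shift` on every feature."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(label * shift, 1.0) for _ in range(n_features)]
            data.append((x, label))
    return data


def fit_centroids(train):
    """Per-class feature means (nearest-centroid stand-in for the SVM)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def learning_curve(data, sizes, n_test_per_class, n_repeats, rng):
    """Mean held-out accuracy per training size, over random class-balanced splits."""
    by_class = {label: [row for row in data if row[1] == label] for label in (0, 1)}
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(n_repeats):
            train, test = [], []
            for label in (0, 1):
                rows = by_class[label][:]
                rng.shuffle(rows)
                test += rows[:n_test_per_class]
                # Training cases never overlap with the held-out cases.
                train += rows[n_test_per_class:n_test_per_class + size // 2]
            centroids = fit_centroids(train)
            correct = sum(predict(centroids, x) == y for x, y in test)
            scores.append(correct / len(test))
        curve[size] = sum(scores) / len(scores)
    return curve


rng = random.Random(0)
data = make_synthetic(n_per_class=220, n_features=10, shift=0.5, rng=rng)
curve = learning_curve(data, sizes=(40, 120, 240, 400),
                       n_test_per_class=20, n_repeats=10, rng=rng)
for size, acc in curve.items():
    print(f"training size {size}: mean accuracy {acc:.3f}")
```

Whether such a curve plateaus or keeps rising depends on the data and the model, which is consistent with the disagreement among the studies cited above; the synthetic example says nothing about where the plateau lies for real FC features.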

The choice of brain parcellations has been rather arbitrary in previous machine learning studies using the brain-wide FC as features. AAL-3 is a recently announced brain parcellation [Reference Friston, Williams, Howard, Frackowiak and Turner33]. Compared with AAL-2, AAL-3 has 26 new regions, a new subdivision of the thalamus into 15 parts, and subdivision of the anterior cingulate cortex into subgenual, pregenual, and supracallosal parts. Given the critical role of the thalamocortical FC in schizophrenic disorder [Reference Tu, Lee, Chen, Hsu, Li and Su6,Reference Anticevic, Haut, Murray, Repovs, Yang and Diehl7,Reference Chen, Liu, Hsu, Lo, Hwang and Hwu44], finer parcellations of the thalamus in AAL-3 may have contributed to its higher performance in our study. Nevertheless, the SVMs based on the three parcellations all had high accuracy, thus supporting the reliability of the models.

Our study had several limitations. First, all our patient groups received treatment with various antipsychotics, so the performance of our models on drug-naïve or first-presentation patients remains unclear. Because establishing a diagnosis is most critical in first-presentation patients, this factor may limit the clinical application of our models. Second, our machine learning models used only brain-wise FC features and were therefore limited to a single modality. Previous studies suggested that multimodal techniques may provide superior performance [Reference Lei, Pinaya, Young, van Amelsvoort, Marcelis and Donohoe24,Reference Ji, Chen, Bai, Wang, Wei and Gao45], and the performance of SVMs using multimodal features in this large single-site dataset remains to be explored. Finally, our dataset was limited to a single site, precluding the assessment of cross-site generalization of our models. Models based on single-site datasets have much lower performance in cross-site generalization [Reference Woodward, Karbasforoushan and Heckers46,Reference Lin, Li, Dong, Wang, Sun and Shi47], likely due to various confounding factors such as different MRI machines, acquisition parameters, and diagnostic processes. Future studies should explore the performance of SVMs using multimodal features in large single-site and multi-site datasets.

Conclusions

In this study, SVMs trained on brain-wide FC retrieved from a large single-site dataset of patients with schizophrenia and healthy controls provided a classification accuracy of 85.05%. The results support the diagnostic value of brain-wise FCs in patients with schizophrenia, using the largest single-site sample to date. The feature importance analysis found that the thalamo-cerebellar FC played the most important role in differentiating patients from controls and might serve as a potential neural biomarker for patients with schizophrenia. AAL-3 is recommended for constructing brain-wise FCs. The use of more homogeneous participants with the same sex or age range did not provide better performance, and establishing SVMs with a large sample with heterogeneous properties is recommended for their application in single-subject prediction of patients with schizophrenia.

Supplementary Materials

To view supplementary material for this article, please visit http://doi.org/10.1192/j.eurpsy.2021.2248.

Acknowledgments

We gratefully thank all the participants who took part in this research and all the research assistants and staff who facilitated their involvement.

Data Availability Statement

The data that support the findings of this study are available from the authors.

Restrictions apply in relation to potentially personally identifiable information.

Financial Support

The study was supported by grants from Taipei Veterans General Hospital (V99C1-040, V101C1-159, V104C-039, V105C-119, V106C-091, and V107C-100), Taiwan Ministry of Science and Technology (NSC 99-2628-B-010-021-MY2, MOST 105-2314-B-075-056-MY2, MOST 103-2314-B-075-065-MY2, and MOST 109-2314-B-075-062) and the Ministry of Science and Technology, Taiwan under the grant MOST 108-2218-E-008-017-MY3 and MOST 108-2634-F-008-003- through Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan.

Authorship contributions

Conceptualization: L.-H.L. and P.-C.T.; Formal analysis: L.-H.L., C.-H.C., W.-C.C., and P.-C.T.; Funding acquisition: L.-H.L., P.-L.L., K.-K.S., J.-W.H., Y.-M.B., and P.-C.T.; Investigation: L.-H.L., C.-H.C., W.-C.C., M.-H.C., J.-W.H., Y.-M.B., T.-P.S., and P.-C.T.; Methodology: L.-H.L., C.-H.C., W.-C.C., M.-H.C., and P.-C.T.; Supervision: L.-H.L., P.-L.L., K.-K.S., Y.-M.B., and T.-P.S.; Validation: L.-H.L., C.-H.C., W.-C.C., and P.-C.T.; Project administration: C.-H.C., W.-C.C., and P.-C.T.; Resources: C.-H.C., W.-C.C., P.-L.L., M.-H.C., Y.-M.B., and P.-C.T.; Software: C.-H.C. and W.-C.C.; Data curation: C.-H.C., K.-K.S., J.-W.H., Y.-M.B., and P.-C.T.; Visualization: W.-C.C. and T.-P.S.; Writing – original draft: L.-H.L. and P.-C.T.; Writing – review & editing: L.-H.L., W.-C.C., and P.-C.T.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

Dhindsa, RS, Goldstein, DB. From genetics to physiology at last. Nature. 2016;530(7589):162–3.
Andreasen, NC, Paradiso, S, O’Leary, DS. “Cognitive dysmetria” as an integrative theory of schizophrenia: a dysfunction in cortical-subcortical-cerebellar circuitry? Schizophr Bull. 1998;24(2):203–18.
Biswal, B, Yetkin, FZ, Haughton, VM, Hyde, JS. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn Reson Med. 1995;34(4):537–41.
Greicius, MD, Krasnow, B, Reiss, AL, Menon, V. Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. Proc Natl Acad Sci U S A. 2003;100(1):253–8.
Tu, P, Buckner, RL, Zollei, L, Dyckman, KA, Goff, DC, Manoach, DS. Reduced functional connectivity in a right-hemisphere network for volitional ocular motor control in schizophrenia. Brain. 2010;133(Pt 2):625–37.
Tu, PC, Lee, YC, Chen, YS, Hsu, JW, Li, CT, Su, TP. Network-specific cortico-thalamic dysconnection in schizophrenia revealed by intrinsic functional connectivity analyses. Schizophr Res. 2015;166(1–3):137–43.
Anticevic, A, Haut, K, Murray, JD, Repovs, G, Yang, GJ, Diehl, C, et al. Association of thalamic dysconnectivity and conversion to psychosis in youth and young adults at elevated clinical risk. JAMA Psychiatry. 2015;72(9):882–91.
Tu, PC, Hsieh, JC, Li, CT, Bai, YM, Su, TP. Cortico-striatal disconnection within the cingulo-opercular network in schizophrenia revealed by intrinsic functional connectivity analysis: a resting fMRI study. Neuroimage. 2012;59(1):238–47.
Karcher, NR, Rogers, BP, Woodward, ND. Functional connectivity of the striatum in schizophrenia and psychotic bipolar disorder. Biol Psychiatry Cogn Neurosci Neuroimaging. 2019;4(11):956–65.
Chen, YL, Tu, PC, Lee, YC, Chen, YS, Li, CT, Su, TP. Resting-state fMRI mapping of cerebellar functional dysconnections involving multiple large-scale networks in patients with schizophrenia. Schizophr Res. 2013;149(1–3):26–34.
Du, Y, Fu, Z, Calhoun, VD. Classification and prediction of brain disorders using functional connectivity: promising but challenging. Front Neurosci. 2018;12:525.
Tang, Y, Wang, L, Cao, F, Tan, L. Identify schizophrenia using resting-state functional connectivity: an exploratory research and analysis. Biomed Eng Online. 2012;11:50.
Arbabshirani, MR, Kiehl, KA, Pearlson, GD, Calhoun, VD. Classification of schizophrenia patients based on resting-state functional network connectivity. Front Neurosci. 2013;7:133.
Zhao, W, Guo, S, Linli, Z, Yang, AC, Lin, C-P, Tsai, S-J. Functional, anatomical, and morphological networks highlight the role of basal ganglia–thalamus–cortex circuits in schizophrenia. Schizophr Bull. 2020;46:422–31.
Kalmady, SV, Greiner, R, Agrawal, R, Shivakumar, V, Narayanaswamy, JC, Brown, MRG, et al. Towards artificial intelligence in mental health by improving schizophrenia prediction with multiple brain parcellation ensemble-learning. NPJ Schizophr. 2019;5(1):2.
Lei, D, Pinaya, WHL, van Amelsvoort, T, Marcelis, M, Donohoe, G, Mothersill, DO, et al. Detecting schizophrenia at the level of the individual: relative diagnostic value of whole-brain images, connectome-wide functional connectivity and graph-based metrics. Psychol Med. 2020;50(11):1852–61.
Krystal, JH, State, MW. Psychiatric disorders: diagnosis to therapy. Cell. 2014;157(1):201–14.
Schnack, HG, Kahn, RS. Detecting neuroimaging biomarkers for psychiatric disorders: sample size matters. Front Psychiatry. 2016;7:50.
Rashid, B, Calhoun, V. Towards a brain-based predictome of mental illness. Hum Brain Mapp. 2020;41(12):3468–535.
Varoquaux, G. Cross-validation failure: small sample sizes lead to large error bars. Neuroimage. 2018;180:68–77.
Arbabshirani, MR, Plis, S, Sui, J, Calhoun, VD. Single subject prediction of brain disorders in neuroimaging: promises and pitfalls. Neuroimage. 2017;145(Pt B):137–65.
Vabalas, A, Gowen, E, Poliakoff, E, Casson, AJ. Machine learning algorithm validation with a limited sample size. PLoS One. 2019;14(11):e0224365.
Flint, C, Cearns, M, Opel, N, Redlich, R, Mehler, DMA, Emden, D, et al. Systematic misestimation of machine learning performance in neuroimaging studies of depression. Neuropsychopharmacology. 2021;46(8):1510–7.
Lei, D, Pinaya, WHL, Young, J, van Amelsvoort, T, Marcelis, M, Donohoe, G, et al. Integrating machining learning and multimodal neuroimaging to detect schizophrenia at the level of the individual. Hum Brain Mapp. 2020;41(5):1119–35.
Cearns, M, Hahn, T, Baune, BT. Recommendations and future directions for supervised machine learning in psychiatry. Transl Psychiatry. 2019;9(1):271.
First, M, Spitzer, R, Gibbon, M, Williams, J. Structured clinical interview for DSM-IV Axis I disorders, research version, patient edition with psychotic screen (SCID-I/P W/PSY SCREEN). New York: Biometrics Research, New York State Psychiatric Institute; 1997.
Kay, SR, Fiszbein, A, Opler, LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr Bull. 1987;13(2):261–76.
Yan, C-G, Cheung, B, Kelly, C, Colcombe, S, Craddock, RC, Di Martino, A, et al. A comprehensive assessment of regional variation in the impact of head micromovements on functional connectomics. Neuroimage. 2013;76:183–201.
Power, JD, Barnes, KA, Snyder, AZ, Schlaggar, BL, Petersen, SE. Steps toward optimizing motion artifact removal in functional connectivity MRI; a reply to Carp. Neuroimage. 2013;76:439–41. doi:10.1016/j.neuroimage.2012.03.017.
Lemieux, L, Salek-Haddadi, A, Lund, TE, Laufs, H, Carmichael, D. Modelling large motion events in fMRI studies of patients with epilepsy. Magn Reson Imaging. 2007;25(6):894–901.
Satterthwaite, TD, Elliott, MA, Gerraty, RT, Ruparel, K, Loughead, J, Calkins, ME, et al. An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. Neuroimage. 2013;64:240–56. doi:10.1016/j.neuroimage.2012.08.052.
Friston, KJ, Williams, S, Howard, R, Frackowiak, RS, Turner, R. Movement-related effects in fMRI time-series. Magn Reson Med. 1996;35(3):346–55.
Rolls, ET, Huang, C-C, Lin, C-P, Feng, J, Joliot, M. Automated anatomical labelling atlas 3. Neuroimage. 2020;206:116189.
Rolls, ET, Joliot, M, Tzourio-Mazoyer, N. Implementation of a new parcellation of the orbitofrontal cortex in the automated anatomical labeling atlas. Neuroimage. 2015;122:1–5.
Shen, X, Tokoglu, F, Papademetris, X, Constable, RT. Groupwise whole-brain parcellation from resting-state fMRI data for network node identification. Neuroimage. 2013;82:403–15.
Tzourio-Mazoyer, N, Landeau, B, Papathanassiou, D, Crivello, F, Etard, O, Delcroix, N, et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 2002;15(1):273–89.
Chen, YL, Tu, PC, Huang, TH, Bai, YM, Su, TP, Chen, MH, et al. Using minimal-redundant and maximal-relevant whole-brain functional connectivity to classify bipolar disorder. Front Neurosci. 2020;14:563368.
Chang, C-C, Lin, C-J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011;2(3):Article no. 27.
Krstajic, D, Buturovic, LJ, Leahy, DE, Thomas, S. Cross-validation pitfalls when selecting and assessing regression and classification models. J Cheminform. 2014;6(1):10.
Gutiérrez-Gómez, L, Vohryzek, J, Chiêm, B, Baumann, PS, Conus, P, Cuenod, KD, et al. Stable biomarker identification for predicting schizophrenia in the human connectome. Neuroimage Clin. 2020;27:102316.
Cui, LB, Liu, L, Wang, HN, Wang, LX, Guo, F, Xi, YB, et al. Disease definition for schizophrenia by functional connectivity using radiomics strategy. Schizophr Bull. 2018;44(5):1053–9.
Chen, YJ, Liu, CM, Hsu, YC, Lo, YC, Hwang, TJ, Hwu, HG, et al. Individualized prediction of schizophrenia based on the whole-brain pattern of altered white matter tract integrity. Hum Brain Mapp. 2018;39(1):575–87.
Ji, GJ, Chen, X, Bai, T, Wang, L, Wei, Q, Gao, Y, et al. Classification of schizophrenia by intersubject correlation in functional connectome. Hum Brain Mapp. 2019;40(8):2347–57.
Woodward, ND, Karbasforoushan, H, Heckers, S. Thalamocortical dysconnectivity in schizophrenia. Am J Psychiatry. 2012;169(10):1092–9.
Lin, X, Li, W, Dong, G, Wang, Q, Sun, H, Shi, J, et al. Characteristics of multimodal brain connectomics in patients with schizophrenia and the unaffected first-degree relatives. Front Cell Dev Biol. 2021;9:631864.
Cai, X-L, Xie, D-J, Madsen, KH, Wang, Y-M, Bögemann, SA, Cheung, EFC, et al. Generalizability of machine learning for classification of schizophrenia based on resting-state functional MRI data. Hum Brain Mapp. 2020;41(1):172–84.
Orban, P, Dansereau, C, Desbois, L, Mongeau-Pérusse, V, Giguère, C, Nguyen, H, et al. Multisite generalizability of schizophrenia diagnosis classification based on functional brain connectivity. Schizophr Res. 2018;192:167–71.

Table 1. Demographic and clinical features of the patients and controls in this study.

Figure 1. Automatic classifications of schizophrenic patients and healthy controls based on brain-wise functional connectivity. Brain-wise functional connectivity was calculated for each participant according to three different parcellations and linear support vector machines were developed and evaluated for performance. AAL-3 = the automated anatomical labeling atlas version 3; AAL-2 = the automated anatomical labeling atlas version 2.

Table 2. The performance of support vector machines based on different parcellations for automatic classifications of patients with schizophrenic disorder and healthy controls.

Table 3. The performance of support vector machines based on different homogeneous subsamples for automatic classifications of patients with schizophrenic disorder and healthy controls.

Figure 2. The effects of demographic homogeneity and training sample size on support vector machine (SVM) performance. (a) The classification accuracy of SVMs based on all participants and of those based on homogeneous subsamples of men, women, younger, and older participants. The SVMs based on homogeneous subsamples were also applied to the other participants with different demographic properties to assess their generalizability. (b) The classification accuracy of SVMs based on incremental training sample sizes improved consistently from 72.61 to 83.32%, and >81% accuracy was achieved with training sample sizes >240.

Table 4. The performance of generalization of support vector machines to participants with different demographic characteristics for automatic classifications of patients with schizophrenic disorder and healthy controls.

Table 5. The performance of support vector machines based on different training sample size for automatic classifications of patients with schizophrenic disorder and healthy controls.

Supplementary material: Lee et al. supplementary material (Tables S1–S3)