Introduction
The relationship between autism and schizophrenia is long and complicated. At the beginning of the twentieth century, the concept of autism was introduced by Bleuler. Here, the concept designated detachment from reality coupled with a predominance of inner life, and it was considered a complex fundamental symptom of schizophrenia (Bleuler, 1950). On Bleuler's account, autism was not a well-demarcated symptom or sign but rather a generic term, expressing a specific intersubjective displacement, which could manifest in various domains such as behavior (e.g. negativism) or cognition (e.g. idiosyncratic logic or beliefs) (Parnas, Licht, & Bovet, 2005a, 9). In the 1920s, Minkowski reconceived autism as the very ‘generative disorder’ of schizophrenia, defining it as loss of vital contact with reality (Minkowski, 1926), expressing a characteristic disruption of the ordinary, unmediated attunement or resonance with others and of immersion in the shared world. Other substantial studies on schizophrenic autism can be found in the works of Binswanger (1957) and Blankenburg (1971) as well as in more recent schizophrenia research (Ballerini et al., 2015; Henriksen, Raballo, & Nordgaard, 2021; Parnas et al., 2005b).
Through the works of Kanner and Asperger in the 1940s, the concept of autism was extracted from the psychopathology of schizophrenia and used to designate a rare syndrome with abnormalities of social relationships, stereotyped behavior, and restricted interests detectable already in infancy (Asperger, 1944; Kanner, 1943). DSM-III (American Psychiatric Association, 1980) became a crucial publication for research in what today is considered autism spectrum disorder (ASD). Here, the syndrome initially reported by Kanner and Asperger became a formal diagnosis with the introduction of the category of infantile autism. Crucially, DSM-III defined infantile autism as a pervasive developmental disorder and not as a kind of psychosis (Rutter & Schopler, 1992, 469). Previously, children exhibiting signs of this syndrome as well as other severe mental conditions had often been diagnosed with childhood schizophrenia (Rutter, 1972), a diagnostic category that was omitted in DSM-III.
In DSM-IV from 1994 (American Psychiatric Association, 1994), Asperger's disorder was introduced. Asperger's disorder shared the basic characteristics of infantile autism (which was here renamed ‘autistic disorder’) but without delays in language and cognitive development and without loss of developmental skills (American Psychiatric Association, 1994, 75). Despite concerns about the diagnostic validity of Asperger's disorder (e.g. Ghaziuddin, Tsai, & Ghaziuddin, 1992; Rutter & Schopler, 1992; WHO: World Health Organization, 1992, 203), it quickly became a popular diagnosis. In DSM-5 from 2013 (American Psychiatric Association, 2013), the diagnostic categories of autistic disorder, Asperger's disorder, and pervasive developmental disorder were consolidated into ASD, representing a single continuum from mild to severe impairment in the domains of social interaction/communication and restrictive repetitive behaviors/interests (American Psychiatric Association, 2013, xliii). Here, the previous diagnostic onset criteria for infantile autism in DSM-III (<30 months of age) and autistic disorder in DSM-IV (<3 years of age) were diluted, requiring only that symptoms be present in the early development period, but stating that these symptoms may not be fully manifest until later in life (American Psychiatric Association, 2013, 50). Since ‘the early development period’ remains undefined and symptoms are allowed to be undetectable ‘until social demands exceed limited capacities’ (American Psychiatric Association, 2013, 50), the introduction of ASD further extended the diagnostic boundaries of autism. Correspondingly, there has been a dramatic increase in cases of autism over the last 4 decades, from 2–4 children per 10 000 in 1980 (American Psychiatric Association, 1980) to 1 in 44 children (Maenner et al., 2021).
The widening of the diagnostic boundaries of autism has enabled further overlaps with the symptomatology of other mental disorders. Today, the differential diagnosis between autism and schizophrenia, which scholars like Kanner (1943), Asperger (1944), and Rutter (1972) worked hard to establish, has again become unclear. Although ASD and schizophrenia spectrum disorders (SSD) are distinct syndromes with different clinical profiles, natural histories, and treatment options, research has emphasized points of convergence between the two syndromes, including shared genetic liability, neurobiology, psychopathology, and social cognitive impairments (Baribeau & Anagnostou, 2013; Jutla, Foss-Feig, & Veenstra-VanderWeele, 2022). In particular, overlaps in the domains of psychopathology and social cognitive impairments may have clinical implications for the differential diagnosis between ASD and SSD and for subsequent treatment decisions. In contrast to studies using crude psychopathological measures, recent phenomenologically informed, empirical studies have reported crucial psychopathological differences between ASD and SSD (Nilsson et al., 2020a, b).
In this study, we focus on the reported overlap of social cognitive impairments in ASD and SSD. Systematic reviews and meta-analyses have consistently found similar social cognitive impairments in the two syndromes (Chung, Barch, & Strube, 2014; Fernandes, Cajão, Lopes, Jerónimo, & Barahona-Corrêa, 2018; Oliver et al., 2021). Nonetheless, methodological heterogeneity related to sample characteristics and test measures has been emphasized as a major limitation (Chung et al., 2014; Crespi, 2020; Oliver et al., 2021; Veddum & Bliksted, 2022). This prompts the question as to whether the claim of similar social cognitive impairments in ASD and SSD is sufficiently corroborated. Could the overlap of social cognitive impairments reflect imprecision of applied test measures to detect differences (Fernandes et al., 2018) or could it be an artifact of methodological heterogeneity across studies? Clarifying these questions may aid differential diagnostic efforts. The purpose of our systematic review is therefore to assess not the results, but the methodology of studies comparing social cognition in ASD and SSD. Only by assessing the studies’ methodology can we properly assess their results and the validity of conclusions drawn across studies.
Methods
Following the PRISMA guidelines, we conducted a systematic review to identify studies comparing social cognition in patients with ASD and SSD. On January 20th, 2023, PubMed, PsycINFO, and Embase were searched using the following search string: schizophrenia AND autism AND ‘social cognition’. See Fig. 1 for a PRISMA flow diagram. We applied the following inclusion criteria:
1) Studies had to be original, peer-reviewed, empirical research (not including abstracts from scientific meetings and conference proceedings)
2) Studies had to be in English
3) Studies had to be conducted on human subjects
4) Studies had to include BOTH a schizophrenia spectrum group (including schizophrenia, schizoaffective disorder, schizophreniform disorder, schizotypal personality disorder, psychosis risk syndrome, or psychosis not otherwise specified) AND an autism spectrum group (including autism, Asperger's syndrome, or pervasive developmental disorders)
5) Studies had to utilize social cognitive measures to compare the patient groups
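The five inclusion criteria above can be read as a conjunction of checks applied to each retrieved record. The following is only an illustrative sketch of that logic, not the procedure the authors used: the `Record` class, its field names, and the helper functions are all hypothetical, and in practice each flag would be set by a human reviewer reading the abstract or full text.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One database hit. All field names are hypothetical placeholders."""
    title: str
    peer_reviewed_original: bool     # criterion 1: original, peer-reviewed, empirical
    language: str                    # criterion 2: must be English
    human_subjects: bool             # criterion 3
    has_ssd_group: bool              # criterion 4: schizophrenia spectrum group
    has_asd_group: bool              # criterion 4: autism spectrum group
    compares_social_cognition: bool  # criterion 5


def failed_criteria(record: Record) -> list[int]:
    """Return the numbers of the inclusion criteria that a record fails."""
    checks = {
        1: record.peer_reviewed_original,
        2: record.language.strip().lower() == "english",
        3: record.human_subjects,
        4: record.has_ssd_group and record.has_asd_group,  # BOTH groups required
        5: record.compares_social_cognition,
    }
    return [number for number, passed in checks.items() if not passed]


def is_eligible(record: Record) -> bool:
    """A record is eligible only if it fails none of the five criteria."""
    return not failed_criteria(record)
```

Returning the list of failed criteria, rather than a bare yes/no, makes it straightforward to tally exclusion reasons for a flow diagram.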
Data extraction
We extracted the following data from each of the eligible studies: title, authors, publication year, number of participants in each group, inclusion/exclusion criteria for each study, diagnostic assessment, age, gender, and other factors compared across groups, and methodology used to assess social cognition and neurocognition.
Results
21 studies met our criteria and were included in the systematic review (see Table 1 for study characteristics and quality assessment). Below, we present the results of the assessment of the studies’ methodology in the following order: social cognitive measures and sample characteristics.
a Quality Assessment made using a modified version of the Newcastle-Ottawa Scale. The maximum score for each study is 10 points. Low scores indicate greater risk of bias.
b Tasks are the same construction with different names.
Note: ACSo, Self-Assessment of Social Cognition Impairments; ACS-SP, Advanced Clinical Solutions for WAIS-IV and WMS-IV Social Perception Subset; ADI, Autism Diagnostic Interview; ADOS, Autism Diagnostic Observation Schedule; AIHQ, Ambiguous Intentions and Hostility Questionnaire; AQ, Autism-Spectrum Quotient; ASD, Autism Spectrum Disorder; ASP, Asperger's Disorder; ASQ, Autism Screening Questionnaire; BFRT, Benton Facial Recognition Test; Bio Motion, Basic Biological Motion Task; BLERT, Bell Lysaker Emotion Recognition Task; BP, Bipolar Disorder; CLVT, California Verbal Learning Test; CToM, Cartoon Theory of Mind Task; CSSCEI, Cognitive Styles and Social Cognition Eligibility Interview; DISCD, Diagnostic Interview for Social and Communication Disorders; DIGS, Diagnostic Interview for Genetic Studies; DISCO, Diagnostic Interview for Social and Communication Disorders; DSM, Diagnostic and Statistical Manual of Mental Disorders; ECT, Emotions in Context Task; EEPP, Empathy for Emotional Pain Paradigm; Ekman60, Facial Expressions of Emotion Stimuli and Tests; EmoBio, Emotional Biological Motion Task; EQ, The Empathy Quotient; ER-40, Penn Emotion Recognition Test; HC, Healthy Control; HFA, High Functioning Autism; ICD, International Statistical Classification of Diseases; IRI, Interpersonal Reactivity Index; JART-50, Japanese Adult Rating Scale-50; JL-AER, Juslin & Laukka Auditory Emotion Recognition Battery; K-SADS-PL, Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version; MASC, Movie for the Assessment of Social Cognition; MINI, The Mini International Neuropsychiatric Interview; MSCEIT, Mayer-Salovey-Caruso Emotional Intelligence Test; NEPSY-II, Developmental Neuropsychological Assessment; PARS, Pervasive Developmental Disorder Assessment Rating Scales; Psychosis NOS, Psychosis Not Otherwise Specified; QCAE, Questionnaire of Cognitive and Affective Empathy; RAD, Relationships Across Domains Test; RAP, Ross Attitudinal Prosody Battery; RMET, The Reading the Mind in the Eyes Test; SCID, Structured Clinical Interview for DSM; SCIP, Screen for Cognitive Impairment in Psychiatry; SCZ, Schizophrenia; SSPA, Social Skills Performance Assessment; STICSS, Subjective Scale to Investigate Cognition in Schizophrenia; SZA, Schizoaffective Disorder; TASIT, The Awareness of Social Inferences Test; ToM, Theory of Mind; TREF, The Facial Emotion Recognition Test; WAIS, Wechsler Adult Intelligence Scale; WASI, The Wechsler Abbreviated Scale of Intelligence; WISC, Wechsler Intelligence Scale for Children; WMS-Faces, Wechsler Memory Scale: Memory for Faces Subtest; WRAT, Wide Range Achievement Test.
Social cognitive measures
Across the 21 studies, 37 different measures of social cognition were used (see Fig. 2). 25 of those measures were used in only 1 study. Based on their methodology, the 37 social cognition measures can be sorted into 10 general categories: (1) self-reports or questionnaires, (2) tasks requiring participants to view still images of faces or eyes without any background or context, (3) tasks involving still images of people within a context, (4) tasks requiring participants to read written social scenarios and answer questions, (5) tasks involving watching videos of people interacting and conversing, (6) tasks including videos of people moving and emoting in silence, (7) tasks involving watching videos of objects, shapes, or dots moving, (8) tasks involving in-person role play with an experimenter, (9) tasks which had participants view a series of images with text, as in a storyboard, (10) tasks involving listening to audio or voice recordings (see Fig. 2).
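Counts such as "37 different measures" and "25 used in only 1 study" come from tallying, for each measure, the number of studies in which it appears. A minimal sketch of that tally follows; the `study_measures` mapping is invented toy data for illustration and does not reproduce the review's extraction table.

```python
from collections import Counter

# Hypothetical mapping of study -> measures used; NOT the review's actual data.
study_measures = {
    "study_A": ["RMET", "MSCEIT"],
    "study_B": ["RMET", "Hinting Task"],
    "study_C": ["ER-40"],
}


def measure_usage(mapping):
    """Count in how many studies each measure appears (at most once per study)."""
    counts = Counter()
    for measures in mapping.values():
        counts.update(set(measures))  # set() guards against duplicates within a study
    return counts


usage = measure_usage(study_measures)
n_distinct = len(usage)                                      # number of different measures
single_use = sorted(m for m, n in usage.items() if n == 1)   # measures used in one study only
```

With real data, `n_distinct` and `len(single_use)` would correspond to the two headline figures reported above.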
For example, the most frequently used measure was the Reading the Mind in the Eyes Test (RMET) (Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001), also referred to as ‘Eyes’ or ‘the Eyes Task’. This task was utilized in six studies and requires participants to recognize emotions and mental states in photographs of the eye region of different faces and choose the most accurate descriptor for the thought or feeling being portrayed. Unlike many social cognition tasks, this task does not provide any situational details or context for the emotional states. The Frith-Happé Animations, also referred to as ‘Triangles’ or the ‘Social Perception Task’ (Abell, Happé, & Frith, 2000), were used in four studies. Here, participants are asked to watch a series of short, animated clips of triangles with varying patterns of movement and then classify the movement in the clip as random, goal-directed, or implying a mental state attribution. The Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT), which is a subtest within the MATRICS Consensus Cognitive Battery (Green et al., 2004), also appeared in four studies. It consists primarily of written stories of emotional problems, and the participants are asked to answer questions about consequences of one's thoughts, feelings, and actions.
Tasks that were used in several studies were often described, from one study to the next, as testing different social cognitive constructs. For example, the RMET was said to assess emotion recognition, facial affect recognition, affective Theory of Mind (ToM), social perceptual ToM, social perception, or mental state attribution; the Frith-Happé Animations were said to assess ToM, implicit ToM, or mental state attribution; the MSCEIT was said to assess emotion processing, emotional intelligence, emotional perception, or understanding and modulation of emotions; and the Movie for the Assessment of Social Cognition (MASC) was said to assess mental states, over and under mentalizing, or ToM (see Table 2 for what the measures used in >2 studies were said to assess in the reviewed studies and by their developers).
1 Altschuler et al. (2021); 2 Booules-Katri et al. (2019); 3 Couture et al. (2010); 4 Kandalaft et al. (2012); 5 Lugnegård et al. (2013); 6 Pinkham et al. (2020); 7 Boada et al. (2020); 8 Veddum et al. (2019); 9 Eack et al. (2013); 10 Kuo et al. (2019); 11 Nakata et al. (2020); 12 Graux et al. (2019); 13 Martinez et al. (2017); 14 Dubreucq et al. (2022); 15 Tobe et al. (2016).
Sample characteristics
The 21 studies included a total of 1733 patients: 779 with ASD and 954 with SSD. Across studies, the weighted mean age was 25.2 for ASD and 30.5 for SSD.
Diagnostic makeup
In 14 studies, the ASD sample was defined precisely as ASD. In 6 studies, the ASD sample consisted only of patients with high-functioning autism (HFA) or Asperger's disorder, and in 1 study the sample consisted of patients with pervasive developmental disorder. In 10 studies, the SSD sample consisted only of patients with schizophrenia, and in 5 studies the SSD sample included patients with schizophrenia or schizoaffective disorder. In the remaining 6 studies, the SSD sample was slightly different (see Table 1).
To diagnose ASD, 16 studies used AAA, ADOS, ADI, or DISCO, 2 studies used ADOS for some but not all patients with ASD, and 3 studies did not specify the diagnostic method. To diagnose SSD, 14 studies used SCID/SCID-II, 2 studies used DIGS, and 5 studies did not specify the diagnostic method. Only 4 studies conducted a sufficient differential diagnostic assessment of both their SSD and their ASD group (Altschuler et al., 2021; Boada et al., 2020; Martinez et al., 2017; Martinez et al., 2019). The remaining 17 studies (81%) apparently relied solely on a specialized diagnostic method (AAA, ADOS, ADI, or DISCO) to diagnose ASD, which is insufficient on its own, meaning these 17 studies did not conduct a comprehensive differential diagnostic assessment of this group. If such comprehensive assessments were, in fact, conducted in these studies, this has not been transparently conveyed in the published articles.
IQ
18 of the 21 studies reported an estimated IQ, utilizing varying versions of the WASI, WAIS, WISC, WRAT-3, SCIP, JART-50, and Quick Test. Each of these studies reported a mean IQ for each diagnostic group, except for Graux et al. (2019), which only reported an average IQ for their ASD group. Dubreucq et al. (2022) reported WAIS-IV short-term memory and working memory subtest scores only. 16 of the 18 studies reporting IQ reported IQ averages above 100 for their ASD group. Across the studies, the weighted average IQ was 105.31 (s.d. = 7.75) for the ASD group and 100.22 (s.d. = 6.69) for the SSD group.
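A weighted average across studies, as used here for IQ and earlier for age, gives larger samples more influence than smaller ones. The text does not state the exact weighting scheme or how the accompanying s.d. was derived, so the sketch below rests on an assumption: each study's group mean is weighted by its group size, and the s.d. is the weighted spread of study means around that pooled mean. The function name and the example numbers are hypothetical.

```python
import math


def weighted_mean_sd(means, ns):
    """Sample-size-weighted mean of study means, and the weighted s.d.
    of those study means around it (an assumed scheme, see lead-in)."""
    if len(means) != len(ns) or not means:
        raise ValueError("means and ns must be non-empty and of equal length")
    total = sum(ns)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    mean = sum(m * n for m, n in zip(means, ns)) / total
    variance = sum(n * (m - mean) ** 2 for m, n in zip(means, ns)) / total
    return mean, math.sqrt(variance)


# Toy example: two studies with group means 100 and 110 and group sizes 10 and 30.
pooled_mean, pooled_sd = weighted_mean_sd([100.0, 110.0], [10, 30])
```

In the toy example the pooled mean is 107.5 rather than the unweighted 105, because the larger study pulls the estimate toward its own mean.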
Medications
Across the 21 included studies, 13 studies reported participants’ medication usage in some fashion, while 8 studies did not. Of the 13 studies that recorded medication, 4 reported that every participant in the SSD group received at least 1 antipsychotic medication, while the ASD group was not on any medication. In each of the remaining 9 studies, the SSD group was more frequently on antipsychotics, more frequently on combinations of multiple antipsychotics, and prescribed higher dosages than their ASD counterparts. In 8 studies, it was reported that some of the participants in the ASD group were prescribed antipsychotics. 7 studies had exclusion criteria related to medication usage, e.g. not allowing for changes in medications within a certain time period or for antipsychotic dosages above a certain chlorpromazine equivalent threshold. 2 studies (Eack et al., 2013; Kuo, Wojtalik, Mesholam-Gately, Keshavan, & Eack, 2019) required that the SSD group received antipsychotic medication.
Substance use
2 of the 21 included studies reported participants’ history of substance use disorder. In Kuo et al. (2019), substance use disorder was noted only for the SSD group, revealing that 44% of SSD participants had substance use disorder. In Eack et al. (2013), 60% of participants in the SSD group met criteria for substance use disorder. In both studies, it was unclear if these instances of substance use disorder were current or lifetime.
Discussion
In this review, we investigated the methodology of studies comparing social cognition in SSD and ASD. Upon reviewing the literature, serious methodological issues became evident, which collectively question the validity of the main result from recent systematic reviews and meta-analyses, namely that of similar social cognitive impairments in ASD and SSD. In sum, we found that the measures used to assess social cognition were remarkably heterogeneous, there was little consensus about what domains of social cognition the many measures actually assessed, and there were methodological issues pertaining to diagnostic assessment and sample characteristics. Below, we discuss each of these issues in turn.
We identified 37 different measures of social cognition used across the 21 reviewed studies, with 25 measures appearing in only a single study each. These tasks vary greatly in how they are constructed and administered, and they range from identifying elements of photographs to watching shapes move in a video to reading and responding to written social scenarios. This diversity testifies to a pervasive heterogeneity in the methodology for assessing social cognition. It also emphasizes that ecological validity remains a substantial issue for most of these measures (Beer & Ochsner, 2006; Revsbech et al., 2017). Put differently, reflecting upon and forming judgements about the movement of shapes or dots, emotions expressed in the eye region only, or what takes place in videos or written scenarios seems far removed from real-life, contextual social interactions. Real-world social interactions take place against a backdrop of a basic, immediate attunement between the interacting individuals. The social cognitive measures do not tap into this basic level of interpersonal attunement, which according to both founding and contemporary scholars in schizophrenia and autism research is where the root problems, different as they may be, lie in these disorders (Asperger, 1944; Blankenburg, 1971; Bleuler, 1950; Hobson, Chidambi, Lee, & Meyer, 2006; Kanner, 1943; Minkowski, 1926).
Another issue is the conceptual ambiguity surrounding the definitions of the domains of social cognition and how these domains were tested. The same measure – administered in the same way – was often used to test different domains of social cognition across the different studies. For example, the RMET was said to assess emotion recognition, affective ToM, social perceptual ToM, or mental state attribution depending on the study. Notably, this conceptual confusion is not really a matter of the authors of the reviewed studies mislabeling the targeted social cognitive domains of the measures they use. Rather, the confusion seems mainly to stem from ambiguous and imprecise definitions of what these measures test in the original studies that introduced them (see Table 2). To illustrate some of these basic problems, we here focus on the most used measure, the RMET.
In the study that introduced the RMET (Baron-Cohen et al., 2001), it is described as an ‘advanced theory of mind test’. Referencing Premack and Woodruff's (1978) classic study on ToM in chimpanzees, ToM is defined as ‘the ability to attribute mental states to oneself or another person’ (Baron-Cohen et al., 2001). The authors state that ToM ‘is the main way in which we make sense of or predict another person's behaviour’; that ToM is also referred to as ‘mentalizing’, ‘mind reading’, ‘social intelligence’; that ToM ‘overlaps’ with ‘empathy’ (cf. Premack & Woodruff, 1978, 518); and that RMET measures ‘social sensitivity or mind-reading’. Several basic problems can be pointed out: (1) The abundance of partially overlapping but clearly not identical concepts induces confusion about what the RMET examines from the very outset. This confusion could be resolved by specifying each concept's extension (i.e. the set of objects to which it applies) and intension (i.e. the properties connected to it), but no such attempt is made in the study. (2) The authors are apparently unsure about whether their measure tests ‘social sensitivity or mind-reading’ (our emphasis). (3) ToM is a broad construct, concerning our ability to ascribe mental states like intentions, beliefs, knowledge, and emotions to others, and the guiding assumptions are (i) that these ascriptions are based on inferences and (ii) that we make inferences because others’ mental states are not directly observable to us. Given that ToM is such a broad construct, it seems questionable, at least, that the RMET, which narrowly tests emotion recognition in still photos of the eyes, can be said to test ToM as such.
Put differently, does performance on the RMET enable us to draw conclusions about the person's capacities for ToM, social sensitivity, or social intelligence beyond the specific tasks of emotion recognition examined in RMET? Is it not imaginable that a person may perform poorly on the RMET and still be able to attribute mental states like intentions, beliefs, knowledge, or emotions to others?
We fully recognize that carving out and delimiting domains of social cognition for specific measures is not an easy task. Yet, the conceptual confusion surrounding the definition of the original measures helps explain the variety of labels of social cognitive domains these measures subsequently have been said to test. If we do not have a firm conceptual grasp of the constructs or phenomena we aim to study and assess, our empirical research is not likely to yield clear results (Marková & Berrios, 2016). When the delineation between domains of social cognition is so blurred and the same measure is said to be assessing different domains, it becomes difficult to draw any solid conclusion about the character of the social cognitive impairments being measured and about the shared vs. distinct nature of social cognitive impairments in ASD and SSD, though such distinctions could provide important targets for etiological research.
To advance research on social cognition, interdisciplinary collaboration is strongly needed, combining theoretical models of social cognition, which conceptually carve out its inner domains and their boundaries, with empirical studies testing the discriminative power of the different measures in accordance with these domains. While testing the psychometric properties of different social cognitive measures is crucial to this end (see below), it is of utmost importance to conceptually delineate the social cognitive domains these measures test: good psychometric properties cannot compensate for a lack of conceptual delineation of what the measure tests. Paraphrasing an insight by Kendler (1990), psychiatric research is confronted by both empirical and ‘nonempirical’ issues (e.g. the conceptual clarity of the constructs or phenomena we study empirically), and both need to be considered for psychiatric research to thrive and prosper.
The abundance of measures used testifies to the importance of research like The Social Cognition Psychometric Evaluation (SCOPE) study (Pinkham et al., 2014; Pinkham, Harvey, & Penn, 2018), which assesses the psychometric validity of social cognitive measures. One of the findings from SCOPE was that the RMET, the most frequently used social cognitive measure across the included studies in our review, did not show sufficient psychometric properties to be evaluated as ‘acceptable’. By contrast, the 3 measures that were evaluated as ‘acceptable’ in the SCOPE study and recommended for use in clinical trials were only used in 4 of the 21 included studies in our review: The Penn Emotion Recognition Test (ER-40) was used in 3 studies (Eack et al., 2013; Pinkham et al., 2020; Tobe et al., 2016), The Hinting Task in 2 studies (Boada et al., 2020; Pinkham et al., 2020), and The Bell Lysaker Emotion Recognition Task (BLERT) in 1 study (Pinkham et al., 2020). Prioritizing measures with the best psychometric properties will solve many problems related to test heterogeneity.
As noted briefly above, ecological validity is also an issue in many of the used measures and it deserves some unpacking in this context. The construct of ecological validity is usually divided into ‘veridicality’, referring to the degree to which a measure correlates with measures of real-life functioning, and ‘verisimilitude’, referring to the degree to which the cognitive demands of a measure resemble the cognitive demands at stake in real-life situations (Chaytor & Schmitter-Edgecombe, 2003; Franzen & Wilhelm, 1996). In the SCOPE study (Pinkham et al., 2018), ecological validity of the social cognitive measures was assessed to some extent in terms of ‘veridicality’, finding some correlations between these measures and functional outcome in schizophrenia.
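Because veridicality is defined as the degree to which a measure correlates with real-life functioning, it can be illustrated with a correlation coefficient between task scores and a functional outcome score. The sketch below uses Pearson's r purely as an illustration; it is an assumption that this is a suitable statistic, the text does not say which statistic SCOPE used, and the scores are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Invented scores for three participants: task performance vs. functional outcome.
task_scores = [1.0, 2.0, 3.0]
functioning = [1.0, 3.0, 2.0]
r = pearson_r(task_scores, functioning)
```

A high coefficient would speak only to veridicality; it says nothing about verisimilitude, i.e. whether the task's demands resemble those of real-life social situations.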
The other aspect of ecological validity, ‘verisimilitude’, is perhaps even more challenging. Admittedly, it may be very difficult to design a measure of social cognition that has perfect verisimilitude, because every test situation of social cognition might be a somewhat artificial setup compared to real-life social cognition. In principle, however, it is possible to differentiate between degrees of verisimilitude by providing arguments for which methodologies best approximate real-life social cognition. For example, should priority be given to measures that target humans (instead of moving shapes or dots), to measures that include situational or contextual information, and/or to measures that entail interactional elements? In the Results section ‘Social cognitive measures’, we sorted the 37 applied measures of social cognition into 10 different categories based on their methodology. This division may serve as a preliminary reference for reflecting upon and providing arguments for assessing the verisimilitude of these measures. While there is a need for future research to develop new social cognitive measures with a high degree of verisimilitude, the success of such new measures hinges on the described interdisciplinary work of conceptually carving out the inner domains of social cognition and delineating their boundaries.
Regarding sample characteristics, we found several critical issues. First, it is of major concern that 17 studies (81%) apparently relied solely on an insufficient, specialized diagnostic method to assess ASD. Without conducting a comprehensive differential diagnostic assessment, we cannot be sure that the patients with ASD in these studies are correctly diagnosed. Although they fulfill diagnostic criteria for ASD, they may also fulfill criteria for other mental disorders, including SSD. Although some studies state that they excluded patients with ASD with a comorbid diagnosis of SSD or a psychotic disorder, these disorders cannot be ruled out when the patients with ASD were not assessed for such disorders. Given overlaps between ASD and SSD (Jutla et al., Reference Jutla, Foss-Feig and Veenstra-VanderWeele2022), this is a crucial issue. For example, a recent nationwide cohort study of 11 170 adolescents and adults with ASD found a progression rate to schizophrenia of 10.26% (Hsu et al., Reference Hsu, Chu, Tsai, Hsu, Huang, Cheng and Chen2022; Lugo Marín et al., Reference Lugo Marín, Alviani Rodríguez-Franco, Mahtani Chugani, Magán Maganto, Díez Villoria and Canal Bedia2018). To tackle this issue, future studies must conduct comprehensive differential diagnostic assessment of their sample, including their ASD groups.
Another recurring issue was attempts to draw conclusions from samples that were not adequately matched – e.g. comparing HFA (which only represents a part of ASD) to chronic schizophrenia (which also only represents a part of SSD). This issue was also reflected in the IQ assessments. Of the 21 included studies, 16 reported IQ averages of >100 for their ASD sample. This indicates that not many patients at the more severe end of ASD were included in the sample. For example, a recent birth cohort study found that in the group with the most inclusive definition of ASD, 59.1% had an IQ score in the range of average or higher (average defined as 86 to 116), meaning an estimated 40.9% of participants should have an IQ score of 85 or below (Katusic, Myers, Weaver, & Voigt, Reference Katusic, Myers, Weaver and Voigt2021).
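The proportions cited in this part of the discussion can be checked with simple arithmetic. The following minimal sketch uses only figures stated in the text; the function names `share_pct` and `complement_pct` are hypothetical helpers introduced here for illustration and are not part of the review's methodology.

```python
def share_pct(count: int, total: int) -> float:
    """Return count/total as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def complement_pct(pct: float) -> float:
    """Return the remaining percentage (100 - pct), rounded to one decimal place."""
    return round(100.0 - pct, 1)


# 17 of the 21 included studies relied solely on a specialized diagnostic method.
specialized_only = share_pct(17, 21)

# 16 of the 21 included studies reported a mean IQ above 100 for the ASD sample.
mean_iq_above_100 = share_pct(16, 21)

# In the cited birth cohort, 59.1% had an IQ in the average range or higher,
# so the remainder is expected to have an IQ of 85 or below.
iq_85_or_below = complement_pct(59.1)

print(specialized_only, mean_iq_above_100, iq_85_or_below)
```

Under these figures, roughly three quarters of the included studies reported ASD samples with a mean IQ above 100, whereas the cited cohort suggests that about two fifths of an inclusively defined ASD population would score 85 or below.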
Another issue related to sample matching is medication usage, which was often not reported at all. In studies that did report it, the samples differed drastically in medication usage both within and across studies. In more than half of the studies, medication usage was noted in some fashion, but not always controlled for. In four studies, all patients in the SSD group were taking at least one antipsychotic, while patients in the ASD sample were taking none. Medication usage is an important issue to consider because psychotropic medication has been shown to affect general cognition as well as social cognition – e.g. a recent meta-analysis (Oliver et al., Reference Oliver, Moxon-Emre, Lai, Grennan, Voineskos and Ameis2021) found that as antipsychotic treatment increased, ToM performance decreased. We agree with the authors of this meta-analysis, who argue that future studies must assess how antipsychotic treatment affects social cognition across ASD and SSD.
A final issue about group matching concerns substance use. Most studies did not record substance use, and in the two studies that did, it was unclear whether patients had current and/or lifetime substance use disorders. In these studies, only patients with SSD had some sort of substance use disorder. Since current and historic substance use disorders may impact cognitive performance (Bora & Zorlu, Reference Bora and Zorlu2017; Potvin et al., Reference Potvin, Pelletier, Grot, Hébert, Barr and Lecomte2018), the issue of substance use must also be addressed in full detail in future studies.
In our view, the methodological issues discussed above collectively indicate a more global need for a renewed focus on methodological rigor in psychiatric research. Without a solid methodological basis, the validity, applicability, and clinical relevance of empirical results remain dubious. Perhaps with the intention of solving some of these issues, a general trend in contemporary psychiatric research, also found in our review, is to create ever new tests or scales and validate them against existing ones. In our view, such new tests or scales rarely contribute to advancing psychiatric knowledge; instead, they unintentionally end up further increasing methodological heterogeneity, as was the case in our study.
Conclusion
We found substantial and pervasive methodological heterogeneity across studies, which collectively questions the validity of the reported finding of similar social cognitive impairments in ASD and SSD. Concluding that these impairments are similar seems premature. By highlighting shortcomings in the contemporary literature, we have emphasized challenges and possible solutions for future research on social cognition in clinical populations. Specifically, we emphasize a need for (i) interdisciplinary efforts to improve delineation of social cognitive domains and identify suitable measures for each domain, (ii) increased homogeneity in measures used to assess social cognition, and (iii) improved differential diagnostic assessment and group matching.
Authors’ contributions
The authors jointly identified the study's objective, search string, and selection criteria for the systematic review. GEK searched the databases and all authors participated in the sorting of articles, data extraction, and quality assessment. All authors discussed the study's results and their interpretation. GEK wrote a first draft of the manuscript, which was substantially revised by MGH and JN. All authors approved the final version.
Financial support
This study was not supported by a research grant.
Conflict of interest
None were reported.