1. Introduction
Spontaneous co-speech gestures play a central role in language acquisition (Brentari & Goldin-Meadow, 2017), language processing (Alibali et al., 2000; McNeill, 1985, 1992), and learning (Cook et al., 2008; Goldin-Meadow et al., 2009; Goldin-Meadow & Wagner, 2005). However, the gestures produced by individuals with autism spectrum disorder (ASD) show striking developmental and maturational differences from the gestures produced by their neurotypical peers. Fluent autistic talkers display an asynchrony of speech and gesture coordination, with gestures both preceding and lagging the associated speech (de Marchena & Eigsti, 2010; Morett et al., 2016). Similar findings have been reported for gesture/gaze asynchrony (Bloch et al., 2022) and visual/motor asynchrony (Nebel et al., 2016). Speech-gesture synchrony impacts listener comprehension such that narratives characterized by more asynchronous gestures are rated by naïve listeners as more difficult to follow (de Marchena & Eigsti, 2010). That is, autistic narrators produced more iconic gestures that were poorly synchronized with the semantically relevant speech (e.g., a “throwing” gesture that is temporally displaced from the phrase “and he threw it”); furthermore, the degree of temporal asynchrony was strongly associated with naïve ratings of how easy a story was to understand. Motor coordination itself is often impaired in autism (indeed, some have suggested it be added as a diagnostic criterion of autism; Bhat, 2021).
As such, this raises an important question: is autistic speech-gesture asynchrony due to impairments in motor coordination, or is it a reflection of impairments in higher-order language processes such as discourse planning? The current study was designed to investigate the deliberate coordination of hand movements with speech in autistic individuals to help rule out the impact of motor processes in gesture asynchrony in autism.
The study focuses on beat-like rather than representational qualities of gesture as related to development and neurodiversity. In typical adults, pure beat gestures are structured, pulse-like movements (typically, a simple up-and-down “biphasic” motion) that couple with the prosody of speech, conveying pragmatic information with limited semantic or referential content (Pouw & Dixon, 2019; Wagner et al., 2014). However, some semantically laden iconic gestures may also have a beat-like speech-synchronized quality (Pouw & Dixon, 2019; Rohrer et al., 2023). Beat gestures are closely synchronized with acoustically prominent accented syllables (Krahmer & Swerts, 2007; Wang & Chu, 2013). Studies of the development of beat gestures suggest such gestures contribute to listener comprehension; for example, children who actively moved their hands while telling a story produced more prosodically fluent stories with stronger narrative structure compared to children who were not encouraged to gesture (Vilà-Giménez & Prieto, 2020). Most studies of gesture production in development and in autism have not specifically examined beat gestures (see below), but what we do know suggests some alternative developmental patterns in this domain. We review here the literature on gesture production in autism, speech-gesture synchrony and relevant studies of synchrony and related phenomena in autism, and studies of motor control and coordination in autism.
1.1. Gesture development in autism
Mounting evidence indicates that the development of gestures is altered in autism. Clinically, impairments in gesture are assessed on gold-standard ASD diagnostic measures and screeners such as the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2012), the Autism Diagnostic Interview (ADI; Lord et al., 1994), and the Modified Checklist for Autism in Toddlers (M-CHAT; Robins et al., 2001); on these measures, absent or infrequent gestures are considered symptomatic. Deictic gestures early in development in autism have been well-studied; results suggest that early declarative deictics (i.e., pointing to share attention) are reduced in frequency, whereas instrumental deictics (i.e., pointing to request) are less affected, which suggests an impairment in the communicative rather than the motoric domain (Bono et al., 2004; Loveland & Landry, 1986; Mundy et al., 1986, 1990). The expression and comprehension of deictics are reduced (Mundy et al., 1986) and delayed (Camaioni et al., 1997) in autism, and these delays are tied to delays in language acquisition (Loveland & Landry, 1986; Mundy et al., 1990). Some studies report reduced rates of gesture in children with ASD (Bartak et al., 1975, 1977; Bono et al., 2004), and most studies report delays in the onset of gesture production (Charman et al., 2003; Luyster et al., 2007).
Children with ASD produce a reduced variety of gestures (Colgan et al., 2006; Wetherby & Prutting, 1984), and their gestures are less likely to occur in combination with both vocalizations and eye contact (Murillo et al., 2021). Gesture production in autism is a strong predictor of later communication skills (Taverna et al., 2021), as reviewed by Ramos-Cabo et al. (2019). Speech and gesture development is longitudinally closely connected in autistic toddlers, as is true in typical development (Dimitrova & Özçalışkan, 2022; Ingersoll & Lalonde, 2010; Özçalışkan et al., 2017; Tager-Flusberg et al., 1990). Overall, the literature suggests significant delays in early gesture development in autism, particularly for more semantically complex and more social gestures, and suggests that these early delays correlate with later language attainment.
1.2. Gestures in verbally fluent autistic individuals
While most of the autism gesture literature has focused on the period of early language acquisition, a growing empirical literature describes gesture production impairments in verbally fluent children and adults. Many studies examine spontaneous gesture production during narrative tasks. Some studies report reduced gesture rates in autism (Silverman et al., 2017), while others find no group differences in gesture frequency after accounting for number of utterances (Attwood et al., 1988; de Marchena & Eigsti, 2010; Garcia-Perez et al., 2007; Tantam et al., 1993; Wong & So, 2018). A study of verbally fluent autistic and non-autistic school-age children used an experimental task eliciting gestures and found that the autism group was less likely to gesture to specific spatial locations to refer to non-present events or objects (So et al., 2015). A study of conversation in verbally fluent autistic adults and their neurotypical peers reported group differences in both semantic/pragmatic and motoric features of spontaneously produced co-speech gestures (de Marchena et al., 2019). The autistic adults were more likely to use gestures to facilitate turn-taking in conversation and produced more unimanual than bimanual gestures. De Marchena and Eigsti (2014) found that adolescents with ASD produced fewer gestures while telling a story to a listener, but produced more deictic gestures when completing an individual, non-communicative executive function task, compared to an age- and IQ-matched non-autistic group, suggesting that the gestures were used more to regulate one’s own processing rather than for communicating with a listener.
Relatedly, Morett et al. (2016) coded gestures produced in the presence or absence of a visible listener and found that, for the non-autistic comparison group, communicative quality and gesture frequency increased in the presence of a visible listener; there was no such increase in the autism group, suggesting that social impairments contribute to gesture production differences.
1.3. Speech-gesture synchrony
Gesture and speech have a tightly synchronized timecourse that connects to semantic and pragmatic features (Levelt et al., 1985; Nobe, 2010) and prosodic features (Wagner et al., 2014); they form an integrated system through which we produce and comprehend meaning (Kelly et al., 2010). In addition to coupling and coordination at a conceptual (e.g., semantic) level, synchrony (coupling) may also reflect purely motoric processes required to coordinate motor planning for speech with that involved in gesture production. For example, when speaking monosyllabic utterances while simultaneously tapping a finger, adult speakers tend to transfer an emphasis from gesture to speech, and from speech to gesture (Parrell et al., 2014). Specifically, during more emphatic tapping, mouth aperture increases, and when producing more emphatic speech, tapping motions are larger. Similarly, a recent study found that more beat-like (decelerative) gestures during counting-out rhymes were associated with more emphasized speech (Kadavá et al., 2023). Such gesture-speech coupling has been argued to be supported by biomechanical interactions of the upper limbs and the respiratory-vocal system (for a review, see Pouw & Fuchs, 2022). In this account, gestures play a direct physical role in the control of vocalization; that is, the biomechanics of motor control push vocalizations and gestures into synchronized activity.
Consistent with this hypothesis, results indicate that in the context of severe impairments in speech production (e.g., aphasia), aspects of gesture-speech synchrony such as coupling between gesture kinematics and vocal acoustics are maintained to some degree (Jenkins & Pouw, 2023; Pouw et al., 2022).
In autism, several studies of semantic speech-gesture synchrony and movement timing report clear group differences. An older study reported a significant reduction in the co-occurrence of gestures with vocalization in autism (Tantam et al., 1993); this finding has been conceptually replicated by more recent studies examining the timing between gesture strokes and their lexical affiliates (de Marchena & Eigsti, 2010; Morett et al., 2016). In the de Marchena and Eigsti (2010) study, the degree of asynchrony of speech and gestures predicted ratings of communicative quality and autism symptom severity, indicating that speech-gesture synchrony was correlated with comprehension in listeners who saw videos of story narrations and were asked to rate the clarity of the stories (see also Habets et al., 2010).
Autistic gestures appear atypical in dimensions beyond synchrony with speech, specifically in kinematics or motion velocity. These differences may be early-emerging; a retrospective review of home videos of infants later diagnosed with autism suggested timing differences in early bimanual, but not unimanual, repetitive movements (Purpura et al., 2017). Using a motion-tracking system, one study reported significant kinematic differences in the number of meaningful holds between movements in the autism group (Trujillo et al., 2021). Another motion-tracking study examined the production of nonverbal signals toward a target that was invisible to a communication partner and found greater delays (reduced temporal coherence) of nonverbal gestures in the adult autistic group (Bloch et al., 2022). A study of the kinematics of simple movements in autistic adults found group differences in jerk, acceleration, and velocity, suggesting atypical movement quality (Cook et al., 2013). Similarly, a study of 9–14-year-old autistic and non-autistic children revealed an atypical kinematic profile that was correlated with differences in the perception of biological motion and with measures of autism characteristics (Butera et al., 2017). A study of the dynamics of movement in autistic youth found that while grasping activity was typical during a non-social activity, the autism group displayed faster transitioning of grasping activity due to object size changes (reduced hysteresis) when a social component was added to the task as compared to a control group (Amaral et al., 2017).
Although a study of interpersonal coordination of whole-body movements during conversation found no group differences (Romero et al., 2018), most research to date examining subtle motor dynamics reports significant differences in autism.
1.4. Motor impairments in autism
As described, studies in autism have revealed generally atypical gesture development, atypical speech-gesture coordination, and atypical motor kinematics. Of course, these findings could all be due to difficulties with motor coordination. Numerous studies suggest that autism is characterized by important developmental differences in motor coordination (Macneil & Mostofsky, 2012; McAuliffe et al., 2017; Mostofsky & Ewen, 2011; Nebel et al., 2016). Motor impairments in movement preparation (Rinehart et al., 2001), movement coordination (McAuliffe et al., 2017; Vilensky et al., 1981), and differences in reaching and grasping movements (Glazebrook et al., 2009) all serve as important predictors of long-term outcomes in core ASD domains such as social and communication skills (Stevens et al., 2000). A study of 95 autistic toddlers followed longitudinally found that, somewhat surprisingly, early motor skills as assessed in infancy were a stronger predictor of later social and communicative outcomes than the severity of autism symptomatology (Sutera et al., 2007; see also Eigsti, 2024). This evidence supports the importance of motor skills in autism for later outcomes. While motor coordination is not currently part of the diagnostic criteria for ASD, it may be central to the phenotype.
Multiple studies report significant differences and deficits in autistic samples compared to non-autistic individuals (Cattaneo et al., 2007; Dziuk et al., 2007; Fabbri-Destro et al., 2009; Hughes, 1996; Jansiewicz et al., 2006; Macneil & Mostofsky, 2012; Scharoun & Bryden, 2016). There are inconsistencies; for example, some studies report age-appropriate motor adaptation to changes in the environment (Gidley Larson et al., 2008) and no impairment in simple (one-step) action imitation tasks (Hamilton et al., 2007; van Swieten et al., 2010). However, given the prevalence of motor delays and impairments in ASD (Bhat et al., 2011), and the relevance of such difficulties to diagnostic classification (Harrison et al., 2021), some have proposed that motor deficits be incorporated into diagnostic criteria (Bhat, 2021). As such, speech-gesture asynchrony in autism could be primarily due to impairments in motor skills rather than impairments in language production.
1.5. Rhythmic speech in autism
Speech-gesture coordination requires precise timing of speech production. The production of rhythmic speech in autism has not been well studied, but research to date suggests that phonological awareness (central to speech and reading) was correlated with musical beat perception in autistic children (Rimmer et al., 2023). A larger study of 78 autistic and 84 non-autistic adults found intact perception of rhythmic information, but a reduction in rhythmic entrainment in the autism group (Cannon et al., 2023). Studies of rhythmic production are limited, but suggest autistic impairments in rhythmic prosodic production and articulatory timing (Lau et al., 2023).
In summary, research to date suggests that gesture development and gesture production in autism are atypical. Autistic people have significant asynchrony of speech-gesture coordination, and this asynchrony has a significant negative impact on listener perception of meaning (de Marchena & Eigsti, 2010). While many have interpreted this as related to broader impairments in language and communication, the literature reviewed above also suggests autism-specific difficulties with subtle motor dynamics as well as broader deficits in motor coordination of both hand movements and the motor aspects of speech planning.
1.6. The current study
Motor coordination is critical for producing gestures that are well synchronized with speech. It is an open question whether speech-gesture asynchrony in autism is primarily a motor control issue or arises due to broader communication difficulties such as discourse planning (of course, these possibilities are not mutually exclusive and both may influence performance simultaneously). To address this question, an exploratory study probed for differences in the deliberate coordination of speech with beat-like motor movements, comparing autistic and non-autistic individuals matched on age, gender, and nonverbal cognitive abilities. The approach used sensorimotor synchronization methods (Repp, 2005), and results were analyzed using an approach developed in multimodal studies of gesture-speech synchrony (Pouw et al., 2020). The study focused on coordination in verbally fluent adolescents ages 12–17. This age range was chosen because the process of language acquisition is largely complete at this point. However, adolescents undergo rapid changes in body size and shape, including changes in limb length that are relevant to gesture production and continued improvements in motor coordination (Kemper et al., 2015), as well as extensive changes in neural structure and organization (Fuhrmann et al., 2015). The transition from adolescence into young adulthood may be marked by meaningful changes in speech-gesture coordination; thus, this study included a non-autistic adult sample to provide a mature stable baseline for comparison. This study utilized a deliberate, motoric task with reduced utterance planning demands, designed as an “analogue” to speech-gesture coordination during spontaneous speech.
Asynchrony in this task was expected to reflect motor impairments, whereas intact synchrony would be consistent with the hypothesis that speech-gesture coordination deficits in autism instead reflect impairments in discourse planning or other linguistic processes.
2. Methods
2.1. Participants
This study included participants from three groups; see Table 1. Adolescents with (n = 9) and without (n = 10) autism diagnoses provided the primary comparison of interest. Adolescents were recruited through clinical contacts, local schools, resource fairs, study flyers, and by word of mouth. Inclusion criteria for the autism group were a parent-reported diagnosis of ASD, which was validated by clinical evaluation in the study (see below). Exclusion criteria were major psychopathology that would preclude participation (schizophrenia or bipolar disorder), ascertained via parent report, and uncorrected hearing or vision impairments; in addition, non-autistic participants could not have a first-degree autistic relative or parent-reported history of significant developmental delay. In addition to the gesture task, participants completed a short battery of measures to capture cognitive and language functioning and executive processes, described below. All participants had nonverbal cognitive abilities in the average or high average range; see Table 1 for details. A convenience sample of non-autistic college students ages 18–22 years (n = 11) provided an adult performance baseline; these participants completed only the gesture-speech synchrony task. All participants spoke English as a primary or first language, though some participants were bilingual speakers; these data were not systematically recorded and thus were not considered in further analyses. Race and ethnicity data were not recorded and are thus not reported.
Notes: Data are presented as M(SD); range. Standard scores (SS) have M(SD) = 100(15). Full-scale IQ = Wechsler Abbreviated Scale of Intelligence (Wechsler, 2011) SS. CELF Core Lang. = Clinical Evaluation of Language Fundamentals, 4th Edition (CELF-4; Semel et al., 2003) SS. Working memory = Wechsler Intelligence Scale for Children, 4th Edition (Wechsler, 2003) Letter-Number Sequencing SS. Planning = D-KEFS (Delis et al., 2001) Tower of London subscale scores, M(SD) = 10(3). ADOS = Autism Diagnostic Observation Schedule (Module 3; Lord et al., 2012); higher scores indicate greater symptomatology, and scores >4 are in the autism spectrum range. SCQ = Social Communication Questionnaire (Rutter et al., 2003); higher scores indicate greater symptomatology, and scores >15 are in the autism spectrum range. Adult participants (n = 11) did not complete clinical assessments and are not included in this table.
3. Procedures
All study procedures were approved by the [redacted for review] Institutional Review Board. Participants were seen in a quiet room at their own home or in the lab, according to their preference. They recited six nursery rhymes in English while moving their writing hand “as if you are hammering a hammer”; see Figure 1 for a schematic. They completed six trials in total; the six rhyming verses, which had three to six words per line, were Jack Sprat Could Eat No Fat; Mary, Mary, Quite Contrary; Peter, Peter, Pumpkin Eater; Hickory, Dickory, Dock; Sing a Song of Sixpence; and There Was a Crooked Man. The nursery rhymes were presented on a computer screen; participants were invited to read and review each rhyme until they felt comfortable with it. To decrease working memory demands, the text remained onscreen for the entire trial; most participants referred to the screen during the trial. They received no feedback on their performance; indeed, none was needed, as all participants were able to comply with the task without training and without further questions about the procedure. No trials were repeated. Most of the rhymes were at least somewhat familiar to most (but not all) participants. Participants were asked to stand while reciting the rhyme to avoid interference from tabletops or chair arms. Performance was recorded on digital video (25 frames per second), and audio was recorded with an external Shure PG42USB Cardioid condenser microphone. The motion peaks in the gestures and the acoustic peaks in the speech signal were identified and probed for coordination, as detailed below.
In addition to the nursery rhyme task, the adolescents completed standardized assessments to evaluate the role of cognitive and diagnostic contributors to coordination. These assessments were the Wechsler Abbreviated Scale of Intelligence (Wechsler, 2011), which is normed for ages six to 90 years, as a measure of Full-scale IQ; the Clinical Evaluation of Language Fundamentals, 4th Edition (CELF-4; Semel et al., 2003) Core Language composite score, as a measure of structural language abilities; the Letter-Number Sequencing subtest of the Wechsler Intelligence Scale for Children, 4th Edition (Wechsler, 2003), as a measure of working memory; and the Tower of London subtest of the Delis-Kaplan Executive Function System (D-KEFS; Delis et al., 2001), as a measure of executive planning abilities. Autism diagnoses were confirmed in the autism group via administration of the Autism Diagnostic Observation Schedule, Module 3 (Lord et al., 2002) by trained graduate clinicians and the Social Communication Questionnaire (Rutter et al., 2003) parent report form, supplemented by clinical observation. Diagnostic evaluations were supervised by a licensed clinical psychologist (the first author) with extensive experience in autism evaluation. Non-autistic status was confirmed in the non-autistic comparison group using the Social Communication Questionnaire (Rutter et al., 2003). Adult participants (n = 11) completed a self-report questionnaire verifying an absence of developmental delays or concerns; they did not complete clinical assessments, which would have been age-inappropriate in most cases.
4. Speech-movement coordination
A total of 3486 data points (beat gestures or speech peaks) were recorded. To minimize the effects of outliers, we excluded data points that were 3 SD or more from the mean (9.3% of the data). In some samples (8.4% of the data), no speech peak exceeding the threshold of the peak-finding function (see below) was detected between two gesture peaks; these data were also excluded. The Amplitude Envelope was highly reliable, providing a viable method for tracking acoustic peaks in speech. The remaining 2980 data points were included in all analyses.
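The 3 SD exclusion rule can be illustrated with a short Python sketch. This is not the authors' code: the function name `exclude_outliers` is hypothetical, and applying the criterion to a single pooled list of values using the sample SD is an assumption, since the text does not state the variable or the grouping over which the mean and SD were computed.

```python
from statistics import mean, stdev


def exclude_outliers(values, n_sd=3.0):
    """Keep values that lie less than n_sd sample SDs from the mean.

    Values at or beyond n_sd SDs are dropped, mirroring the
    "equal to or greater than 3 SD" criterion described in the text.
    """
    m = mean(values)
    sd = stdev(values)
    if sd == 0:
        # All values identical: nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - m) / sd < n_sd]


# Hypothetical asynchrony values (ms): twenty on-beat events and one extreme value.
sample = [0.0] * 20 + [100.0]
kept = exclude_outliers(sample)
print(len(sample) - len(kept), "value(s) excluded")
```

Note that with very small samples no value can reach 3 sample SDs from the mean, so a rule of this kind only has an effect when it is applied to a reasonably large pool of data points.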
4.1. Estimating movement peaks
Estimation of movement timing was partially automated. We calculated the instantaneous pixel change in video frames using a Python script provided by Brookshire and colleagues (Brookshire et al., 2017); we smoothed these values with a first-order 10 Hz Butterworth filter and then computed the derivative of instantaneous pixel change (pixel change acceleration). Via visual inspection of the video data in ELAN (Lausberg & Sloetjes, 2009), we manually identified the time window spanning the middle of each movement’s extension phase to the middle of the subsequent flexion phase; thus, the time window included the point of maximum extension of the arm. We also checked for a peak in deceleration during that window, as registered by the pixel change acceleration time series viewable alongside the video data in ELAN (Crasborn et al., 2006). The exact timing of the peak in deceleration was extracted using a custom-written function in base R, which, for each time window, identified the moment of maximum pixel change deceleration.
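The smoothing, differentiation, and windowed peak-finding steps can be sketched as follows. This is an illustration in Python with NumPy and SciPy, not the authors' Python/R pipeline. The function names are hypothetical, and the choice of a low-pass, zero-phase (forward-backward) filter is an assumption: the text specifies only a first-order 10 Hz Butterworth filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FPS = 25.0  # video frame rate reported in the Procedures


def pixel_change_acceleration(pixel_change, fps=FPS, cutoff_hz=10.0):
    """Smooth the pixel-change series and return its time derivative."""
    # First-order Butterworth filter; low-pass and zero-phase are assumptions.
    b, a = butter(1, cutoff_hz, btype="low", fs=fps)
    smoothed = filtfilt(b, a, np.asarray(pixel_change, dtype=float))
    return np.gradient(smoothed, 1.0 / fps)


def peak_deceleration_time(acceleration, window_start, window_end, fps=FPS):
    """Time (s) of the most negative acceleration inside an annotated window."""
    times = np.arange(len(acceleration)) / fps
    in_window = np.flatnonzero((times >= window_start) & (times <= window_end))
    return times[in_window[np.argmin(acceleration[in_window])]]


# Synthetic pixel-change signal oscillating at 0.5 Hz; it falls fastest at t = 0.5 s.
t = np.arange(0, 4, 1 / FPS)
signal = 1 + np.cos(2 * np.pi * 0.5 * t)
accel = pixel_change_acceleration(signal)
print(peak_deceleration_time(accel, 0.2, 0.8))
```

In this sketch the window boundaries stand in for the hand-annotated ELAN windows; the temporal resolution of the resulting peak is limited by the 25 fps frame rate (40 ms per frame).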
4.2. Identification of acoustic peaks in speech: Amplitude envelopes
Pilot analyses used the Fundamental Frequency estimation tool in PRAAT; however, with occasionally unreliable recording quality (e.g., when participants moved from the microphone or there were external noises), voicing was observed, but F0 could not be tracked. Thus, peaks in the speech signal were identified via gross changes in the raw audio waveform (e.g., the Amplitude Envelope and its derivative, Amplitude Envelope Change) using a PRAAT script (He & Dellwo, Reference He and Dellwo2016, Reference He, Dellwo, Trouvain, Steiner and Möbius2017). We employed a Hilbert Transform to track gross amplitude changes in the speech signal, while ignoring finer structural changes; see Figure 2. The Amplitude Envelope is a scaled time series with values expressed in Hilbert units ranging from 0 to 1 (minimum to maximum observed amplitude). The Amplitude Envelope corresponds closely with the rhythmic structure of speech and with the oral kinematics of speech (Chandrasekaran et al., Reference Chandrasekaran, Trubanova, Stillittano, Caplier and Ghazanfar2009; Tilsen & Arvaniti, Reference Tilsen and Arvaniti2013). In the current dataset, peaks in the amplitude envelope corresponded closely with each syllable of the rhyme, as shown in Figure 2; while this correspondence was generally true, where data were checked by hand, the relationship is not necessarily one-to-one (some syllables may have been omitted, and some peaks may not map onto a unique syllable). The peaks in the acceleration time series reflect downward and upward movements, each of which produced an upward positive acceleration and a deceleration (max flexion), and a downward positive acceleration and a deceleration (the downbeat). The current analysis included only the deceleration peaks produced during downbeats (which were initially hand-annotated).
To compute a temporal marker comparable to the gestural peaks, we calculated Amplitude Envelope Change as the time-derivative of the amplitude envelope; this measure represents the rate of change in speech amplitude. Sudden positive changes in amplitude (SD > 33%), identified algorithmically as peaks above the 33% threshold in the acoustic Amplitude Envelope Change time series, served as markers of speech segment onsets (peaks); smaller signal fluctuations were excluded. This threshold, though arbitrary, appeared to correspond most closely to the syllable boundaries that it was intended to detect, based on visual inspection of several audio samples.
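One possible reading of this thresholding step can be sketched in Python. The sketch assumes that the 33% threshold is taken relative to the largest positive value in the Amplitude Envelope Change series; the reference level is not specified above, and the original analysis used a PRAAT script rather than this code. It also assumes that a 0 to 1 scaled envelope has already been computed. The function names, the 100 Hz envelope sampling rate, and the envelope values are hypothetical.

```python
def envelope_change(envelope, fs_hz):
    """Forward-difference time derivative of a 0-1 scaled amplitude envelope."""
    return [(envelope[i + 1] - envelope[i]) * fs_hz
            for i in range(len(envelope) - 1)]

def onset_peaks(change, fs_hz, rel_threshold=0.33):
    """Times (s) of local maxima in `change` above rel_threshold * max(change).

    ASSUMPTION: the threshold is relative to the largest positive change.
    """
    largest = max(change)
    if largest <= 0:
        return []  # no positive amplitude change anywhere in the series
    cutoff = rel_threshold * largest
    peaks = []
    for i in range(1, len(change) - 1):
        is_local_max = change[i - 1] < change[i] >= change[i + 1]
        if is_local_max and change[i] > cutoff:
            peaks.append(i / fs_hz)
    return peaks

fs_env = 100.0  # hypothetical envelope sampling rate (Hz)
envelope = [0.0, 0.1, 0.6, 0.9, 0.5, 0.2, 0.25, 0.3, 0.2, 0.1, 0.7, 1.0, 0.4]
change = envelope_change(envelope, fs_env)
speech_peaks = onset_peaks(change, fs_env)
print(speech_peaks)
```

In this made-up series, the two large rises in the envelope are retained as onset markers, while the small bump in the middle falls below the cutoff and is ignored.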
4.3. Computing gesture-speech coordination and rhythmicity
Calculations of the temporal synchrony of motion peaks in the gesture signal relative to acoustic peaks in the speech signal utilized well-studied measures from the sensorimotor synchronization literature (Repp, Reference Repp2005). Figure 2 provides a graphical overview of these measures, which evaluated asynchrony and speed and variation in rhythmicity.
4.4. Asynchrony
Two measures of asynchrony were computed. For each movement peak deceleration, we identified the closest Amplitude Envelope Change peak. The difference in ms between these peaks served as the estimate of temporal asynchrony of speech and movement. In addition, we assessed the stability (consistency) of speech-movement coordination by calculating the variability of each participant’s speech-movement peak asynchrony. Specifically, we calculated the standard deviation of peak asynchrony for each participant and each trial; smaller SD values indicate greater stability of speech-movement coordination.
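The two asynchrony measures can be illustrated with a short Python sketch. The sign convention (movement time minus speech time, so that negative values mean the movement peak came first) is an assumption chosen to match the negative mean asynchrony reported in the Results. The use of the sample standard deviation is also an assumption, and the function name and peak times are hypothetical.

```python
import statistics

def asynchronies_ms(movement_peaks_s, speech_peaks_s):
    """Signed lag (ms) from each movement peak to its nearest speech peak.

    Negative values mean the movement peak preceded the speech peak.
    """
    lags = []
    for m in movement_peaks_s:
        nearest = min(speech_peaks_s, key=lambda s: abs(s - m))
        lags.append((m - nearest) * 1000.0)
    return lags

# Made-up peak times (in seconds) for a single trial
movement = [0.95, 1.90, 3.05]
speech = [1.00, 2.00, 3.00, 4.00]

lags_ms = asynchronies_ms(movement, speech)
mean_async = statistics.mean(lags_ms)   # mean asynchrony
sd_async = statistics.stdev(lags_ms)    # stability: smaller SD = tighter coupling
print(round(mean_async, 1), round(sd_async, 1))
```

Because each movement peak is paired with its nearest speech peak, unmatched speech peaks (here, the one at 4.00 s) simply do not contribute to either measure.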
4.5. Inter-beat interval (IBI) and inter-speech interval (ISI)
In addition to the coordination of individual gesture-speech events, the sensorimotor synchronization literature highlights the importance of the rhythmic qualities of speech-movement coordination (Michon et al., Reference Michon, Zamorano-Abramson and Aboitiz2022; Repp, Reference Repp2005). We measured the temporal distance between successive beat peaks (inter-beat interval, IBI) and between successive speech peaks (inter-speech interval, ISI) as indices of tempo. The standard deviation of these measures provided an estimate of the rigidity or rhythmicity of these intervals. For example, a lower standard deviation indicates that events were produced at more regular intervals.
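As an illustration, the interval measures can be computed from a list of peak times as follows. This is a minimal Python sketch rather than the original analysis code; the function name and peak times are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def interval_stats_ms(peak_times_s):
    """Mean and SD (ms) of the intervals between successive peaks."""
    intervals = [(later - earlier) * 1000.0
                 for earlier, later in zip(peak_times_s, peak_times_s[1:])]
    return statistics.mean(intervals), statistics.stdev(intervals)

# Made-up peak times (in seconds): same overall tempo, different regularity
regular_peaks = [0.0, 0.5, 1.0, 1.5, 2.0]
irregular_peaks = [0.0, 0.4, 1.0, 1.4, 2.0]

regular_mean, regular_sd = interval_stats_ms(regular_peaks)
irregular_mean, irregular_sd = interval_stats_ms(irregular_peaks)
print(round(regular_mean), round(regular_sd, 1))
print(round(irregular_mean), round(irregular_sd, 1))
```

The same function applies to either signal: passing movement peaks yields IBI statistics, and passing speech peaks yields ISI statistics. The two made-up series share a mean interval, so only the SD distinguishes the more rhythmic series from the less rhythmic one.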
5. Results
5.1. Speech-movement asynchrony
Coordination was assessed using mixed regression models (random intercept for participants) fitted with the nlme package in R (Pinheiro et al., Reference Pinheiro, Bates, DebRoy, Sarkar, Heisterkamp, Van Willigen and Maintainer2017; R Core Team, 2012). Basic models predicting overall means were contrasted with models including group (autism vs. neurotypical vs. adult) as a predictor; see Table 2 for descriptive statistics, and Figure 3 for a graphical overview. Significant effects of group were further investigated with Bonferroni-corrected post hoc t-tests implemented with the lsmeans package in R (Lenth & Lenth, Reference Lenth and Lenth2018). Nursery rhyme was not included as a random factor due to data loss (see open science statement).
Note: Group results are presented as M (SD), 95% CI [lower, upper]. All data are presented in ms. Mixed regression model results represent a change in χ2 with group as a predictor. Post hoc comparisons are reported with Bonferroni corrections (significant at p < 0.025).
Participants across groups displayed negative mean asynchrony, shown in Figure 3; that is, the apices (peaks) in hand movements reliably preceded speech peaks (i.e., accented syllables). There were no reliable group differences in mean gesture-speech asynchrony; see Table 2 and Figure 3. Groups did differ in the variability of asynchrony (SD); speech and movements were less tightly coordinated in groups with higher variability (larger SD), shown as flatter distributions (more variability around the mean) in Figure 3. Post hoc comparisons indicated that only the difference between adults and non-autistic adolescents was significant; the adult group had more consistent speech-movement coordination (an estimated difference of 26 ms, SE = 8.76, p = 0.02). The autistic adolescent group’s asynchrony variability did not differ from that of the adults (an estimated difference of 17 ms, SE = 8.89, p = 0.18). Thus, gesture-speech coordination was more reliable for adults than for neurotypical adolescents. Given this developmental change between the adolescent and adult neurotypical participants, with the adults showing tighter synchrony, the method and analysis appear to have been sufficiently sensitive to detect individual and group differences in coordination, even in this small sample. Results indicate that the autistic group displayed speech-movement coordination equivalent to that of their age-matched non-autistic peers.
5.2. Results of ISI and IBI analyses
The analysis of inter-speech intervals (ISI) yielded a significant effect of group; see Table 2. This indicates that groups differed in the tempo of speech peak production. Post hoc comparisons indicated that the adults had a significantly shorter ISI than the non-autistic adolescents, indicating a faster tempo (estimated difference = 192 ms, SE = 67.67, p = 0.025). The autistic adolescents also displayed a slower tempo relative to adults, though the difference was not significant (estimated difference = 142 ms, SE = 68.01, p = 0.138). There were no reliable differences between autistic and non-autistic groups in ISI. Similarly, ISI variability (SD) showed a significant effect of group, indicating greater rhythmicity of speech peaks in adults relative to non-autistic adolescents (estimated difference = 26 ms, SE = 8.76, p = 0.017). Autistic adolescents had greater mean ISI SD compared to adults, but this difference was not significant (estimated difference = 18 ms, SE = 8.89, p = 0.178). There were no reliable differences between autistic and non-autistic groups on this measure. Thus, the adults maintained a much faster speech tempo, and a more rhythmic pace, as compared to non-autistic adolescents, and autistic adolescents’ performance fell between that of adults and non-autistic adolescents. There were no statistically significant effects of group in gesture intervals (IBI) or in IBI SD (rhythmicity). Exploratory correlational analyses examined the associations between working memory (digit span) and planning (Tower) abilities, on the one hand, and the synchrony and rhythmicity variables shown in Table 2, on the other. While results are necessarily unreliable given the sample size, they suggested no meaningful correlations (all ps > 0.10).
These results suggest differences in speech-movement coordination for neurotypical adults as compared to younger adolescents, indicating developmental effects. However, there were no differences in coordination between autistic and non-autistic peers.
6. Discussion
Gesture-speech synchrony is temporally precise, with a strong coupling of gesture and speech prosody, even in people with severe language production impairments (Jenkins & Pouw, Reference Jenkins and Pouw2023). A decrease in synchrony between spontaneous speech and iconic gestures has been documented in autism, with a negative impact on listener comprehension. That is, when naïve raters watch a video of an individual telling a story, their ratings of how clear and easy to follow the story was are strongly and significantly correlated with measures of speech-gesture synchrony (de Marchena & Eigsti, Reference de Marchena and Eigsti2010). In the context of significant impairments in motor abilities in autism (Bhat, Reference Bhat2021), it is reasonable to propose that gesture-speech coordination differences are a function of impaired motor control; after all, both speech and gesture are motor actions and require ensemble coordination (Kelso et al., Reference Kelso, Southard and Goodman1979). The current study was designed to test for difficulties in a task requiring the deliberate coordination of conscious, gesture-like hand movements and speech. An absence of group differences on such a task would suggest that speech-gesture asynchrony reflects language production challenges rather than motor difficulties (Kelly et al., Reference Kelly, Ozyurek and Maris2010). In addition, we know little about developmental changes in gesture-speech coordination over the course of adolescence; we included adults in order to establish the mature endpoint of this developmental process.
To address these issues, the current exploratory study investigated the deliberate synchronization of speech with beat-like movements in a small group of autistic and non-autistic adolescents and non-autistic adults. Participants were asked to move their dominant hand in a biphasic beat-like pattern, while reciting rhythmic utterances (nursery rhymes); the task was analogous to spontaneous co-speech gesture production, but with reduced utterance planning demands (because the utterances were read from a text prompt). Partially automated video and audio analysis tools were used to capture the primary peaks in both signals (that is, the endpoint of the beat-like hand movements, and the most prosodically prominent syllables in speech), and to measure the lag between these peaks. Analyses were thus designed to capture the coordination of two distinct motor acts: speech and gesture production.
Analyses of this conscious coordination task revealed that beat gestures were closely synchronized with speech across groups, and that hand movements reliably anticipated speech peaks. This negative mean asynchrony finding is consistent with results of other simple sensorimotor synchronization tasks such as finger tapping to a beep (Repp, Reference Repp2005). Another interpretation is that, while the body movement precedes the speech peak, characteristics of the speech peak correlate with earlier-occurring characteristics of the movement, suggesting that the two are tightly bound together as a single communicative act; this was the conclusion suggested by a study of pitch accents and brow raises during spontaneous speech (Gast, Reference Gast2023). This possibility calls for additional research.
Comparisons of autistic and non-autistic adolescents revealed no group differences in synchrony, tempo, or rhythmicity. This contrasts with results of prior studies reporting striking asynchrony of speech with spontaneously produced gestures, with the autistic group showing significantly greater asynchrony (de Marchena & Eigsti, Reference de Marchena and Eigsti2010; Morett et al., Reference Morett, O’Hearn, Luna and Ghuman2016). The difference between previous studies and the current one centers on language production; prior studies examined spontaneous narratives, whereas in the current study, participants read aloud a prepared text. Furthermore, the current study concerns kinematic-acoustic coupling rather than semantic synchrony, whereas the previous studies examined the synchrony between the lexical affiliate and the gesture stroke (de Marchena & Eigsti, Reference de Marchena and Eigsti2010; Morett et al., Reference Morett, O’Hearn, Luna and Ghuman2016). While preliminary, the current result is consistent with the hypothesis that linguistic factors may be more important contributors to asynchrony in autism than motor coordination. That is, when presented with highly rhythmic phrases to read aloud while performing a simple biphasic repeated hand motion, the autistic participants showed no difference in coordination.
Cognitive resource availability constrains language processing. For example, a study using a dual-task method with either low or high working memory demands reported differences in the perception of congruent or incongruent speech-gesture combinations (Momsen et al., Reference Momsen, Gordon, Wu and Coulson2020). The experimenter-defined (instructed) tasks in the current study presented minimal working memory and planning demands. If cognitive resource availability is the critical bottleneck, one would predict that asynchrony might vary as a function of morphosyntactic complexity and word frequency. One would also predict that individuals with reduced cognitive resources (e.g., working memory) would show greater asynchrony. Such an account could help explain the finding of speech-gesture asynchrony in autism, a condition generally characterized by more limited verbal and visuospatial working memory (Ellis Weismer et al., Reference Ellis Weismer, Kaushanskaya, Larson, Mathée and Bolt2018; Schuh & Eigsti, Reference Schuh and Eigsti2012; St John et al., Reference St John, Woods, Bode, Ritter and Estes2022).
The results here are consistent with findings of spared rhythmic processing by autistic children in identifying beeps that were well aligned with the metric or rhythmic structure of music (Dahary et al., Reference Dahary, Rimmer and Quintin2023) and broader findings of intact melodic and rhythmic perception in autistic children (Jamey et al., Reference Jamey, Foster, Sharda, Tuerk, Nadig and Hyde2019). Together, these results, along with the current study, suggest that the highly rhythmic structure of nursery rhymes may have given the autistic participants an advantage relative to spontaneous speech production. This suggestion awaits further study.
The current results indicate intact, or even developmentally advanced, synchrony for “prosodic,” non-representational, beat-like movements; in contrast, synchrony of representational (iconic) gestures seems to be disrupted in autism (de Marchena & Eigsti, Reference de Marchena and Eigsti2010). Relevant to our understanding of the interface between spoken prosody and gesture production, the current study suggests the hypothesis that processing more complex conceptual semantic content adds complexity to the coordination of gesture and speech.
A second finding in this study was the presence of differences between non-autistic adolescents and adults, indicating developmental increases in the precision of speech-movement coordination. Coordination in the autistic adolescents was intermediate between the other two groups, with no significant differences; the study is underpowered to detect subtle group differences. As one might expect, adults displayed more consistent timing between speech and movement (higher speech-movement coupling), a faster speech tempo, and greater rhythmicity in syllabic intervals compared to non-autistic adolescents. On discrete timing indices, the autistic adolescent and adult groups did not differ, suggesting intact or even developmentally advanced explicit bimodal timing of self-produced actions. We did not evaluate puberty stages in this study; it is possible that participants in the autism group were physically advanced relative to their age-matched peers, as has been reported for some neurodevelopmental conditions (Siddiqi et al., Reference Siddiqi, Van Dyke, Donohoue and McBrien1999). The persistence of developmental effects in coordination in adolescence is somewhat surprising, given the early emergence of coordination of rhythmic arm movements and infant vocalizations (e.g., Ejiri & Masataka, Reference Ejiri and Masataka2001). Indeed, rudimentary vocal-motor synchronization is established in infancy (Ejiri & Masataka, Reference Ejiri and Masataka2001; Pouw & Fuchs, Reference Pouw and Fuchs2022). The finding that coordination continues to develop and improve during adolescence may highlight the additional cognitive constraints involved in coordinating motor action and utterance production.
6.1. Limitations
These results are limited in several respects. First, the labor-intensive nature of the analyses limited inclusion to a small and relatively homogeneous sample of adolescent participants, spanning a relatively wide age range of 12–17 years; adult autistic participants were not included. Given the limited data at distinct ages, the results do not probe trajectories of development, and the results should be taken as exploratory; conclusive results await replication within a larger sample. Second, the autistic participants were not representative of the broader autism population, given their age-appropriate language and cognitive abilities; results might look very different with a more cognitively delayed sample. Third, the current sample was homogeneous with respect to race and ethnicity, and included more males than females. Finally, executive processes might be engaged in an unexpected way by the artificial task demands; a direct assessment of this possibility awaits further study.
The results of this preliminary study suggest that speech and movement coordination in a deliberate task is intact in autistic individuals, in contrast to findings of reduced coordination of spontaneous, self-generated representational gestures with speech during discourse. This finding is consistent with other reports that responses look more typical in autism when participants receive explicit guidance to perform a behavioral task (Eigsti, Reference Eigsti2013). Results also suggest the importance of rhythmic action in the context of speech production, and point to the long arc of development in this domain, with differences between the adult state apparent even in neurotypical adolescents ages 12–17 years.
Acknowledgements
The authors thank He and colleagues for sharing their Praat code.
Author contributions
IME conceived the study and led the writing of the introduction, method (design and experiment), and discussion of the manuscript, with critical revisions by WP. WP conceived and performed the analyses and led the writing of the results and method (analyses), with critical revisions by IME.
Data availability statement
The processed datasets (i.e., timing data) are available on Open Science Framework (link). The raw audio-visual data are stored by the first author, but contain sensitive information and cannot be openly shared. Because the project was stalled for several years, during which the second author changed institutions, the project files that contained the processing scripts, statistical analysis scripts, raw time series data, and a copy of the audiovisual recordings were deleted. However, the analyses can be reproduced based on descriptions in the paper, and the authors are happy to share relevant information upon inquiry.
Funding statement
This research was supported by NIMH-1R01MH112687-01A1 to Eigsti and Fein (Co-PIs) and by NIDCD T32DC017703 to Eigsti and Myers (Co-PIs).
Competing interest
The authors declare that they have no conflict of interest.