Neural correlates of lexical-tone and vowel-quality processing in 6- and 9-month-old German-learning infants and adults

Antonia Götz; Claudia Männel; Gudrun Schwarzer; Anna Krasotkina; Barbara Höhle

doi:10.1017/S030500092400014X

Published online by Cambridge University Press: 29 April 2024

Antonia Götz*: Affiliation:
The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, New South Wales, Australia; Linguistics Department, University of Potsdam, Potsdam, Germany
Claudia Männel: Affiliation:
Department of Audiology and Phoniatrics, Charité–Universitätsmedizin Berlin, Berlin, Germany; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Gudrun Schwarzer: Affiliation:
Department of Developmental Psychology, Justus Liebig University Giessen, Giessen, Germany
Anna Krasotkina: Affiliation:
McMaster University, Hamilton, Ontario, Canada
Barbara Höhle: Affiliation:
Linguistics Department, University of Potsdam, Potsdam, Germany
*: Corresponding author: Antonia Götz; Email: [email protected]

Abstract

We examined the neurophysiological underpinnings of lexical-tone and vowel-quality perception in learners of a non-tonal language. We tested 25 six-month-old and 25 nine-month-old German-learning infants, as well as 24 German adults, and expected developmental differences for the two linguistic properties, as they are both carried by vowels but have a different status in German. In adults, both lexical-tone and vowel-quality contrasts elicited mismatch negativities, with a stronger response to the vowel-quality contrast. Six-month-olds showed positive mismatch responses for lexical-tone and vowel-quality contrasts, with an emerging negative mismatch response for vowel-quality only. The negative mismatch responses became more pronounced for the vowel-quality contrast at 9 months, while the lexical-tone contrast elicited mainly positive mismatch responses. Our data reveal differential developmental changes in the processing of vowel properties that differ in their lexical relevance in the ambient language.

Keywords

perceptual reorganisation; non-native lexical tone perception; vowel perception; MMN; infants

Type: Article
Information: Journal of Child Language, Volume 52, Issue 3, May 2025, pp. 592-614

DOI: https://doi.org/10.1017/S030500092400014X
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Introduction

Infants’ first year of life is characterised by extensive developmental changes, one major change being that infants’ perceptual system attunes to the sound properties of their native language (for a review, see Werker & Gervain, 2013). Consequently, infants’ sensitivity in their discrimination of phonologically and lexically relevant sound contrasts increases, while their discrimination of sound contrasts not relevant for the native linguistic system often decreases (e.g., Kuhl et al., 2006). This change from universal to language-specific speech perception is referred to as perceptual reorganisation, which has been reported for consonants (e.g., Rivera-Gaxiola et al., 2005; Werker & Tees, 1984), vowels (e.g., Polka & Bohn, 2011; Tsuji & Cristia, 2014), word stress (e.g., Bijeljac-Babic et al., 2012; Höhle et al., 2009), and lexical tones (e.g., Götz et al., 2018; Liu & Kager, 2014; Mattock et al., 2008; Yeung et al., 2013). However, developmental changes in infants’ perception are related not only to the phonological system of infants’ native language but also to acoustic properties, such that discriminability can be maintained throughout development for perceptually salient sound contrasts (e.g., Chládková & Paillereau, 2020; Narayan, 2019, 2020). The interplay of lexically relevant and less relevant properties in speech-sound perception is, however, not yet fully understood.
Our study aims to provide insights into this interplay by investigating the neural underpinnings of developmental changes in the perception of vowel quality and lexical tone in German-learning 6- and 9-month-olds and comparing infant processing with adult native German speakers. In our study, vowel quality refers to changes in vowel height. It is a sound property that is lexically relevant in German, while lexical tone is not – yet the same speech segment carries the acoustic properties that determine vowel quality and lexical tone. Although vowel length also holds lexical relevance in German, it potentially interacts with lexical tone contrasts and was thus not chosen as a contrast of interest. For example, syllables with a higher fundamental frequency (f0) may be perceived as longer than those with a lower f0 (e.g., Yu, 2010). We chose the broader term “vowel quality” because we believe our results would be generalisable across various aspects of vowel quality, including, but not limited to, vowel height, tongue position (front, central, or back), or lip rounding. This provides an opportunity to investigate developmental changes in the neural responses to acoustic changes within the same speech segment that are either relevant or irrelevant in the linguistic system of a given language.

Behavioural studies on the perceptual reorganisation of vowel quality and lexical tones

Previous research has examined infants’ perception of vowels across various languages. A meta-analysis by Tsuji and Cristia (2014) on 22 studies showed that between 6 and 10 months of age, the effect sizes for native and non-native vowel discrimination begin to diverge, with increasing discrimination performance for native vowels. However, no decline in non-native vowel discrimination was found either in the meta-analysis or in two additional studies not covered in the analysis (de Klerk et al., 2019; Mazuka et al., 2014). Thus, perceptual attunement for vowels seems to be characterised by enhanced perceptual sensitivity to native vowel differences, with no clear evidence of a decline in non-native vowel discrimination.

Tonal languages, such as Mandarin, use pitch variations called lexical tones, mainly carried by the vocalic segments, to differentiate word meaning. Research on infants’ perception of lexical tones presents mixed findings. For instance, Mandarin-learning infants demonstrated improved discrimination of a native, acoustically salient tone contrast from 6 to 13 months, while no changes were observed in their discrimination abilities for less acoustically salient tone contrasts (e.g., Shi et al., 2017). Mixed findings have also been reported for studies with infants learning a non-tone language: some studies reported a decline in discrimination abilities at 9 months (Götz et al., 2018; Mattock et al., 2008; Yeung et al., 2013), while others found an increase in perceptual sensitivity between 4 and 12 months (Chen & Kager, 2016; Chen et al., 2017). Other studies found no evidence of a change in perceptual sensitivity for lexical tones (Ramachers et al., 2018; Shi et al., 2017). A few studies have tested infants beyond the first year of life and found a U-shaped development with a decline in tone discrimination between 6 and 9 months of age and a regain in discrimination during the second year of life (Götz et al., 2018; Liu & Kager, 2014). The discrepancies among these studies might stem from differences in the contrasts tested, infants’ native languages, and experimental methods (see Götz et al., 2018, for a discussion).

The current study has two aims. The first is to compare the neural underpinnings of the perception of speech contrasts that are either lexically irrelevant (lexical tones) or lexically relevant (vowel quality) in the infants’ native language. The second is to contribute to the understanding of the heterogeneous picture of tone perception in non-tonal language-learning infants. We examine whether the decrease in behavioural discrimination of a lexically irrelevant lexical-tone contrast in German-learning infants aged 6 to 9 months (Götz et al., 2018) would be reflected in developmental changes in the neurophysiological responses to identical sounds.

The auditory Mismatch Response

Neurophysiological measures offer the advantage of testing speech perception independently of potential restrictions from behavioural paradigms (e.g., infants’ attention) and may thus be more sensitive in capturing infants’ speech discrimination and developmental changes. In adults, neurophysiological measures, such as the mismatch negativity (MMN) of the auditory event-related potentials (ERPs), have been used to assess neural speech discrimination (e.g., Näätänen et al., 2007). The MMN reflects differences between ERP responses to rare deviant stimuli and frequent standard stimuli; it peaks at around 100-250 ms after acoustic divergence and is most prominent at frontocentral electrode positions. Following the MMN, a Late Discriminative Negativity (LDN) can occur at approximately 300-600 ms post-acoustic divergence. The occurrence of both components suggests a two-stage sequential process. In addition to auditory discriminability, the LDN is suggested to be more associated with complex auditory stimuli reflecting higher cognitive involvement (Čeponienė et al., 2004; Yu et al., 2018).

In contrast to adult listeners, infants show a Mismatch Response (MMR) with positive (pMMR) or negative (nMMR) polarities. This response is influenced by, for example, the infants’ age (e.g., Leppänen et al., 2004), sex (e.g., Mueller et al., 2012), familial risk for dyslexia (e.g., Thiede et al., 2019), type and acoustic distance of tested speech contrasts (Cheng et al., 2015; Morr et al., 2002), and data pre-processing approaches (e.g., high-pass filtering; Weber et al., 2004). Infant MMRs often have a later onset and longer duration compared to adult MMNs (e.g., Friederici et al., 2002; Garcia-Sierra et al., 2016; Marklund et al., 2019; Shafer et al., 2011). They can occur in early (i.e., 150 to 350 ms) and late time windows (i.e., 350 to 600 ms), with pMMRs and nMMRs in either window, influenced by factors such as the infants’ age (Yu et al., 2019), language experience (Garcia-Sierra et al., 2016; Marklund et al., 2019), and speech stimuli category (Cheng et al., 2015). The functional underpinnings of these temporal differences are still debated, with early effects possibly reflecting acoustic stimulus processing and later effects being related to native language experience (e.g., Garcia-Sierra et al., 2016).
These findings suggest that infants show neural markers of speech discrimination with polarity and latency differences compared to adults, which are related to stimulus characteristics and language experience. This makes the MMR an ideal measure to study developmental changes in the perception of lexical tone and vowel quality as lexically irrelevant and relevant features in German, respectively.

ERP Studies on the perceptual reorganisation of vowel quality and lexical tones

ERP studies with infants have reported both pMMRs and nMMRs to vowel changes. For example, Yu et al. (2019) found that MMR amplitude/polarity and response time-window to the native vowel contrast /ɛ/ versus /ɪ/ were associated with infant age. In the early window (160-360 ms), all age groups (3- to 47-month-olds) showed a pMMR, which became less positive with increasing age. In the late window (400-600 ms), infants up to 12 months displayed a pMMR, while older infants exhibited an nMMR. Similarly, Marklund et al. (2019) reported a pMMR for the native vowel contrast /e/ versus /i/ in 4- to 8-month-old Swedish-learning infants in the early time window (150-350 ms), with no MMR in the later one (350-550 ms). In a longitudinal study, Cheng et al. (2015) tested Mandarin-learning infants from birth to 6 months, finding that both acoustically similar (/da/ vs. /du/) and distinct (/da/ vs. /di/) vowel contrasts elicited pMMRs in newborns. At 6 months, infants showed a pMMR for the similar vowel contrast in the late time window (250-400 ms) and an nMMR for the distinct contrast in the early time window (150-250 ms). This suggests that the polarity and timing of infant MMRs to vowel contrasts are influenced by age, the acoustic distance between vowels, and their status in the native language.

Few studies have investigated the neural processing of lexical tones in infants. Cheng et al. (2013) tested Mandarin-learning infants longitudinally from birth to 6 months with native-tone contrasts with a large or a small acoustic distance. The large tone contrast elicited a pMMR both at birth and at 6 months in the late time window (300-400 ms), with an additional nMMR in an early time window (150-250 ms) at 6 months only. In contrast, the small tone contrast elicited no MMR at birth, but a pMMR at 6 months in the late time window. Again, as for the studies on vowel discrimination, the results on lexical-tone discrimination demonstrate the influence of age and acoustic stimulus properties on the polarity and time window of the MMR.

Even fewer studies have investigated how infants learning non-tonal languages process lexical tones on the neural level. Liu et al. (2019) investigated a Mandarin tone contrast in English-learning infants aged 5-6 months and 11-12 months. Their ERP results revealed pMMRs between 100 and 400 ms for 5- to 6-month-olds, but no MMR for the 11- to 12-month-old infants. The absence of an MMR in older infants may indicate an attenuated neural response to the non-native tone contrast, which resembles the behavioural findings (Liu & Kager, 2014). Alternatively, the absence of an MMR in the older age group may have resulted from individual differences in a potential shift from a pMMR to an adult-like MMN, resulting in overall null effects in the ERPs.

In adults, neurophysiological evidence shows that an MMN can, in principle, be evoked by non-native tone contrasts, but is influenced by several factors (Chen et al., 2018; Kaan et al., 2008; Politzer-Ahles et al., 2016). First, non-native speakers do not show an MMN for all contrasts that evoke an MMN in native speakers (Kaan et al., 2008), and if present, the response may differ in latency and amplitude from that of native speakers (Chen et al., 2018). Second, high acoustic variability in the stimuli and the duration of the interstimulus interval can influence the MMN response to non-native tone contrasts (Politzer-Ahles et al., 2016; Yu et al., 2017). Third, an MMN to non-native tones may emerge only over the course of the experiment, suggesting that non-native speakers may need additional exposure for successful discrimination (Liu et al., 2018).

The current study

So far, no study has simultaneously addressed the question of how infants learning a non-tonal language perceive lexical tones within the first year of life and how this compares to their perception of vowel quality. The present study used ERPs to investigate the neurophysiological underpinnings of developmental changes in lexical-tone and vowel-quality perception in 6- and 9-month-old infants learning German (a non-tonal language). Presenting both lexical-tone and vowel-quality changes within one paradigm allowed us to investigate acoustic properties that are lexically irrelevant (lexical tone) or relevant (vowel quality) in German. Moreover, by using the same stimuli as Götz et al. (2018), we aimed to examine whether neurophysiological measures would yield results similar to the behavioural measurements, which revealed a developmental decline in German-learning infants’ sensitivity to lexical tones.

We used a double-deviant oddball paradigm testing German adults’ and German-learning infants’ processing of the Cantonese mid-level (T33) versus high-rising (T25) tone and the vowel contrast /ɛ/ versus /i/. The stimuli were produced by a Cantonese speaker. The vowels differed in vowel height and corresponded to German vowel categories (see stimuli section). Deviant stimuli either differed from the standard by changing the vowel quality and keeping the tone constant (i.e., /sɛ/ as standard and /si/ as deviant, both with the mid-level tone) or by changing the tone and keeping the vowel quality constant (i.e., /sɛ/ with the mid-level tone as standard and /sɛ/ with the high-rising tone as deviant).

We hypothesise that as infants gain more exposure to the sound properties of their native language, there will be corresponding changes in their neural speech discrimination. We propose two transitions for the vowel contrast. First, we expect a transition from pMMRs to nMMRs as age increases and exposure to the native language expands. Second, we expect a transition from an early to a late MMR, as early effects may predominantly reflect acoustic stimulus processing, while later effects are proposed to be associated with native language experience (e.g., Garcia-Sierra et al., 2016). Based on these developmental patterns, we expected infants at 6 months to show comparable MMRs for both types of contrasts. By 9 months, however, we expected differences in their responses to the lexical-tone and vowel-quality contrasts. If extended native-language experience affects the polarity of the discrimination effect, an initial pMMR should shift to an nMMR with increasing age, with more pronounced changes for the vowel-quality than the lexical-tone contrasts. We additionally tested adults to compare the properties of the MMNs in response to the tested lexical-tone and vowel-quality changes in the mature adult brain of native speakers of German.

Experiment 1: Neural correlates of lexical-tone and vowel-quality processing in German-speaking Adults

Methods

Participants

Twenty-four native German-speaking adults (aged 18-31 years, 14 females) were included in the final sample of this study. They reported not having learned any tone or pitch-accent language. All participants were right-handed (Edinburgh handedness inventory, Oldfield, 1971), had no self-reported hearing deficits, had normal or corrected-to-normal vision, and reported no history of neurological or psychological disorders. All participants were recruited from the local student participant pool and received course credits as compensation for participating in the experiment. Five additional participants were tested but excluded from the final data analysis because they contributed fewer than 50 artifact-free trials per condition. Each participant provided written informed consent according to the Declaration of Helsinki.

Stimuli

Several exemplars of the syllables /sɛ/ and /si/ were recorded with either the high-rising (T25) or the mid-level (T33) tone by a female native speaker of Cantonese in a sound-attenuated booth. All recordings were digitised with a sampling rate of 44.1 kHz. We selected two different tokens for each syllable. The syllable duration was similar across the different tokens (578-586 ms) and the vowels started 97-101 ms after stimulus onset. The results of acoustic analyses are given in Table S1 and the pitch contours of the stimuli are displayed in Figure S1 in the supporting information. All stimuli were normalised in intensity to 60 dB SPL (using the scale intensity function in Praat; Boersma & Weenink, 2016). The syllable /sɛ33/ (i.e., /sɛ/ with a T33 tone contour) was always presented as standard, the syllable /sɛ25/ (i.e., /sɛ/ with a T25 tone contour) as tone deviant and /si33/ (i.e., /si/ with a T33 tone contour) as vowel deviant. To verify the assimilation to native vowel categories, we performed a perceptual test with 15 German-speaking adults who did not participate in the EEG study. The test consisted of a categorisation task and a perceptual goodness rating. In the categorisation task, two tokens of each syllable (/si33/, /si25/, /sɛ33/ and /sɛ25/) were presented three times (24 trials) and participants were asked to assign the vowels to one of the German categories: /iː/, /ɪ/, /ɛː/, /eː/. The vowel categories were displayed on the screen with the grapheme equivalents of <ie>, <i>, <äh>, and <eh>. Following the categorisation task, participants were asked to indicate the goodness of the category fit (1 = Poor and 7 = Perfect). Participants assigned the /ɛ/ vowel to the <äh> category (the German /ɛː/ vowel) in 100% of trials and /i/ to the <i> category (the German /iː/ vowel) in 100% of trials. The two vowels did not differ statistically in their category goodness fit (/ɛː/ mean = 6.00, SD = 0.79; /iː/ mean = 6.18, SD = 0.82).

Procedure

Participants were seated approximately 1.5 m from a computer screen and listened to the auditory stimuli via earphones (E-A-RTONE 3A Insert Earphones, Aearo Technologies Auditory Systems). During the stimulus presentation (Presentation Software, Version 18.0, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com), the participants watched a silent movie (Baby Einstein). Stimuli were presented with varying interstimulus intervals (ISI) ranging from 800 to 900 ms in steps of 50 ms to prevent participants from perceiving regular rhythmic patterns. Overall, the experiment contained 800 stimuli: 640 standards, 80 lexical-tone deviants and 80 vowel-quality deviants, which were distributed across four blocks of 200 stimuli each. Each block started with eight standards. Deviants were presented pseudo-randomly with 3 to 8 standards presented between two deviants. The first eight standards and standards directly following deviants were excluded from further analysis. Five participant datasets were removed from analysis because they contained fewer than 50 artifact-free deviants per condition, as predetermined. Adults had an average of 413 (SD = 42.3) artifact-free trials for standards, 71 (SD = 7.6) for vowel deviants and 71 (SD = 7.1) for tone deviants.
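The block structure described above can be illustrated with a short Python sketch. It is not the authors' Presentation script. It assumes an even split of 20 tone and 20 vowel deviants per block, that the first deviant directly follows the eight block-initial standards, and that any standards left over are placed at the end of the block. The names `make_block` and `usable_for_analysis` are hypothetical.

```python
import random

ISI_CHOICES_MS = (800, 850, 900)  # 800-900 ms in 50 ms steps, as described


def make_block(rng, n_stimuli=200, n_lead=8, n_tone=20, n_vowel=20,
               gap_min=3, gap_max=8):
    """Return one block as a list of 'standard' / 'tone' / 'vowel' labels."""
    n_deviants = n_tone + n_vowel
    n_standards = n_stimuli - n_deviants
    # Start every between-deviant gap at the minimum, then hand out the rest.
    gaps = [gap_min] * (n_deviants - 1)
    spare = n_standards - n_lead - sum(gaps)
    if spare < 0:
        raise ValueError("not enough standards for the requested gaps")
    trailing = 0  # standards after the last deviant (assumption)
    while spare > 0:
        slot = rng.randrange(len(gaps) + 1)
        if slot == len(gaps):
            trailing += 1
        elif gaps[slot] < gap_max:
            gaps[slot] += 1
        else:
            continue  # this gap is already at the maximum; draw again
        spare -= 1
    deviants = ["tone"] * n_tone + ["vowel"] * n_vowel
    rng.shuffle(deviants)
    block = ["standard"] * n_lead
    for i, deviant in enumerate(deviants):
        block.append(deviant)
        if i < len(gaps):
            block.extend(["standard"] * gaps[i])
    block.extend(["standard"] * trailing)
    return block


def usable_for_analysis(block, n_lead=8):
    """Drop block-initial standards and standards that directly follow a deviant."""
    keep = []
    for i, label in enumerate(block):
        excluded = label == "standard" and (i < n_lead or block[i - 1] != "standard")
        keep.append(not excluded)
    return keep


rng = random.Random(1)  # fixed seed only to make the sketch reproducible
blocks = [make_block(rng) for _ in range(4)]
isis_ms = [[rng.choice(ISI_CHOICES_MS) for _ in block] for block in blocks]
```

With these settings each block contains 160 standards and 40 deviants, which matches the reported totals of 640 standards and 160 deviants over four blocks.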

ERP Recording and Analysis

The EEG was continuously recorded from 30 cap-mounted active Ag/AgCl electrodes (Brain Products, Gilching, Germany) at a sampling rate of 1000 Hz. Electrodes (F3, F7, F9, F4, F8, F10, FC1, FC5, C3, FC2, FC6, C4, CP1, CP5, P3, P7, CP2, CP6, P4, P8, FCz, Fz, Cz, CPz, Pz, O1, O2) were positioned following the 10-20 system convention. The electrooculogram was recorded from electrodes placed below and above the right eye. Impedances were kept below 25 kΩ. The ground electrode was placed at the FP1 position. The EEG data were analysed using Brain Vision Analyzer (version 2.01; Brain Products, Gilching, Germany). The EEG recording was referenced online to the left mastoid and re-referenced offline to the linked mastoids. The signal was filtered with a 0.5-30 Hz bandpass filter (zero-phase IIR Butterworth filters of order 2, 12 dB/oct roll-off). Data were segmented into epochs of 1000 ms and baseline-corrected using the 100 ms before stimulus onset. Eye blinks and eye movements in the segments were corrected by an algorithm (Gratton et al., 1983). All other artifacts were detected automatically (exceeding ±100 μV) and excluded from further analysis.
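The baseline correction and the amplitude criterion can be sketched in a few lines of Python. This is only an illustration of the two operations on a single-channel epoch, not the Brain Vision Analyzer pipeline. It assumes the 1000 ms epoch includes the 100 ms baseline and that rejection is applied after baseline correction; neither is stated in the text. The function names are hypothetical.

```python
from statistics import mean

SAMPLING_RATE_HZ = 1000  # from the recording description
BASELINE_MS = 100        # pre-stimulus interval used for baseline correction
REJECT_UV = 100.0        # absolute amplitude criterion in microvolts


def baseline_correct(epoch_uv, baseline_ms=BASELINE_MS, rate_hz=SAMPLING_RATE_HZ):
    """Subtract the mean of the pre-stimulus samples from every sample."""
    n_baseline = baseline_ms * rate_hz // 1000
    offset = mean(epoch_uv[:n_baseline])
    return [value - offset for value in epoch_uv]


def is_artifact(epoch_uv, threshold_uv=REJECT_UV):
    """Flag an epoch if any sample exceeds the +/- threshold."""
    return any(abs(value) > threshold_uv for value in epoch_uv)


# Synthetic single-channel epochs (illustrative values, not real data).
clean = baseline_correct([5.0] * 100 + [7.0] * 900)
noisy = baseline_correct([5.0] * 100 + [7.0] * 899 + [155.0])
kept = [epoch for epoch in (clean, noisy) if not is_artifact(epoch)]
```

Here only the first synthetic epoch survives, because the second contains one sample beyond ±100 μV after baseline correction.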

The MMN is expected to occur in the time window of 100-250 ms after the point of acoustic divergence between stimuli (Näätänen et al., 2007). For the vowel-quality contrast, this time window was 200-350 ms after syllable onset, and for the lexical-tone contrast 400-550 ms after syllable onset (due to the earlier point of divergence for the vowel-quality compared to the tone contrast, see Table S1 and Figure S1 in the supporting information). The LDN was analysed in a later time window of 350-600 ms after the point of divergence, which was at 450-700 ms for the vowel-quality contrast and at 650-900 ms for the lexical-tone contrast. While the MMN is typically most prominent at frontocentral regions, our objective was to incorporate a broad range of data points to enhance our comprehension of the developmental processes related to lexical-tone and vowel-quality processing in infants. This approach also considers the possibility of observing an inverse negative response at posterior sites (see Peter et al., 2016). Hence, electrodes were clustered into two regions: frontocentral (F3, F4, F7, F8, FC5, FC6, Fz, C3, C4, Cz) and posterior (O1, O2, P3, P4, P7, P8, Pz, CP5, CP6, CPz).
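The relation between the post-divergence windows and the windows measured from syllable onset is a simple shift, sketched here in Python. The divergence points of about 100 ms (vowel) and 300 ms (tone) are inferred from the windows stated above rather than taken from Table S1, so they should be read as assumptions.

```python
# Approximate points of acoustic divergence after syllable onset (assumed, see lead-in).
DIVERGENCE_MS = {"vowel": 100, "tone": 300}

# Component windows relative to the point of divergence, as stated in the text.
WINDOWS_POST_DIVERGENCE_MS = {"MMN": (100, 250), "LDN": (350, 600)}


def window_from_onset(component, contrast):
    """Shift a post-divergence window so that it is relative to syllable onset."""
    start, end = WINDOWS_POST_DIVERGENCE_MS[component]
    shift = DIVERGENCE_MS[contrast]
    return (start + shift, end + shift)


analysis_windows = {
    (component, contrast): window_from_onset(component, contrast)
    for component in WINDOWS_POST_DIVERGENCE_MS
    for contrast in DIVERGENCE_MS
}
```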

Data analysis

We calculated the MMNs and LDNs for the lexical-tone and vowel-quality contrast by subtracting the ERP amplitude of the standard from the ERP amplitude of the tone or vowel deviant, respectively. For statistical analysis, we used the average amplitude of the difference wave (deviant minus standard) in the respective time window as the dependent variable. We conducted the analyses with the statistical software R, version 4.0.4 (R Core Team, 2021) and the lme4 package (Bates et al., 2015). Plots were created using ggplot2 (Wickham, 2016) and post-hoc tests were performed using emmeans (Lenth et al., 2018). P-values were calculated with lmerTest (Kuznetsova et al., 2017), which uses Satterthwaite approximations to degrees of freedom.
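The dependent variable can be illustrated with a minimal Python sketch that computes a difference wave and its mean amplitude in a time window. It assumes averaged ERPs sampled at 1000 Hz with the epoch starting 100 ms before syllable onset, and it treats the window end as exclusive; both are choices made for this illustration. The function names and the synthetic waveforms are hypothetical.

```python
from statistics import mean


def difference_wave(deviant_uv, standard_uv):
    """Deviant minus standard, sample by sample."""
    if len(deviant_uv) != len(standard_uv):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant_uv, standard_uv)]


def mean_amplitude(wave_uv, window_ms, epoch_start_ms=-100, rate_hz=1000):
    """Average amplitude within window_ms, given relative to syllable onset."""
    start_ms, end_ms = window_ms
    first = (start_ms - epoch_start_ms) * rate_hz // 1000
    last = (end_ms - epoch_start_ms) * rate_hz // 1000
    return mean(wave_uv[first:last])


# Synthetic averaged ERPs: the deviant is 2 microvolts more negative at 200-350 ms.
standard = [0.0] * 1000
deviant = [0.0] * 300 + [-2.0] * 150 + [0.0] * 550
diff = difference_wave(deviant, standard)
mmn_amplitude = mean_amplitude(diff, (200, 350))
ldn_amplitude = mean_amplitude(diff, (450, 700))
```

In the actual analysis, one such value per participant, deviant type, and region would enter the mixed model.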

We computed separate linear mixed effects models for the MMN and LDN. The models included the effects of Deviant type (lexical tone vs. vowel quality, coded as +0.5 and -0.5) and Region (frontocentral vs. posterior, coded as +0.5 and -0.5) and their interaction. The models included a random intercept by Subject and a random by-subject slope of Deviant type. Region was not included as a random slope, as this led to singular model fits, indicating overfitting. All contrast codings were derived using the generalised inverse (Schad et al., 2020). In the first step, the intercept was set to zero (lmer(amplitude ~ -1 + Deviant type * Region + (1 + Deviant type | Subject))) to compare the difference wave against zero and verify an MMN and an LDN. In the second step, the grand mean was used as the intercept (lmer(amplitude ~ Deviant type * Region + (1 + Deviant type | Subject))). For both MMN and LDN results, we first present the results of the difference wave comparison against zero, and then the model output, with the comparison of lexical-tone and vowel-quality processing.

Results

Figures 1 and 2 depict the grand-average ERPs for standards, lexical-tone contrasts, vowel-quality contrasts, and the corresponding ERP difference wave obtained from frontocentral and posterior electrode regions.

Figure 1. Grand-average ERPs of the lexical-tone contrast for German adults

Note. Grand-average ERPs for German-speaking adults for standards (black line), lexical-tone deviants (blue line), and the corresponding difference waves (deviant–standard, dashed line) at frontocentral and posterior electrode regions.

Figure 2. Grand-average ERPs of the vowel-quality contrast for German adults

Note. Grand-average ERPs for German-speaking adults for standards (black line), vowel-quality deviants (red line), and the corresponding difference waves (deviant–standard, dashed line) at frontocentral and posterior electrode regions.

Mismatch Negativity (MMN)

The results for the MMN time-window revealed that both lexical-tone and vowel-quality changes elicited significant MMNs at frontocentral and posterior regions, as the respective difference between standards and deviants differed significantly from zero (lexical tones: frontocentral β (SE) = -0.852 (0.281), t = -3.030, p = 0.011, posterior β (SE) = -0.648 (0.236), t = -2.748, p = 0.021; vowel quality: frontocentral β (SE) = -1.539 (0.281), t = -5.472, p < 0.001, posterior β (SE) = -1.065 (0.236), t = -4.515, p < 0.001; all p-values are Bonferroni-corrected for multiple comparisons).

The full output of the linear mixed model with the MMN difference-waves of both lexical-tone and vowel-quality contrasts is given in the supporting information S2. The results revealed an effect of Deviant type (β (SE) = 0.276 (0.057), t = 4.874, p < 0.001), which indicates that the MMN differs between lexical-tone and vowel-quality changes (independent of electrode region), with a more pronounced MMN amplitude for vowels than tones. In addition, the effect of Region shows that the MMN is generally larger at frontocentral than posterior regions (β (SE) = -0.169 (0.057), t = -2.993, p < 0.01), see Figure 3.
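As an arithmetic cross-check, the two main-effect coefficients can be related to the four MMN cell estimates reported above. The Python sketch below reads the +0.5 and -0.5 codes as weights applied to the marginal condition means. Whether this matches the authors' exact contrast matrices is an assumption, and the sketch does not refit the mixed model.

```python
# MMN cell estimates (difference wave vs. zero) as reported in the text.
CELL_ESTIMATES = {
    ("tone", "frontocentral"): -0.852,
    ("tone", "posterior"): -0.648,
    ("vowel", "frontocentral"): -1.539,
    ("vowel", "posterior"): -1.065,
}

# Coding from the Data analysis section, read here as weights on condition means.
DEVIANT_WEIGHTS = {"tone": 0.5, "vowel": -0.5}
REGION_WEIGHTS = {"frontocentral": 0.5, "posterior": -0.5}


def marginal_mean(factor_index, level):
    """Average the cell estimates over the other factor."""
    values = [v for key, v in CELL_ESTIMATES.items() if key[factor_index] == level]
    return sum(values) / len(values)


deviant_effect = sum(w * marginal_mean(0, lvl) for lvl, w in DEVIANT_WEIGHTS.items())
region_effect = sum(w * marginal_mean(1, lvl) for lvl, w in REGION_WEIGHTS.items())
```

Under this reading the weighted sums come to about 0.276 for Deviant type and about -0.170 for Region, which agrees with the reported coefficients up to rounding of the cell estimates.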

Figure 3. MMN amplitude in German-speaking adults

Note. MMN amplitude (difference wave) in German-speaking adults for lexical-tone and vowel-quality changes at frontocentral and posterior electrode regions. Each dot represents one participant. The horizontal lines within boxes show medians, boxes show the interquartile (IQ) range, whiskers the 1.5 * IQ range, dots are potential outliers.
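The boxplot conventions named in the note (median, interquartile range, 1.5 × IQR whiskers, outliers) can be made concrete with a short sketch. It assumes the common Tukey convention, in which whiskers end at the most extreme data points within 1.5 × IQR of the box, and an inclusive quartile method; the plotting software used for the figure may compute quartiles slightly differently. The amplitudes are hypothetical toy values, not participant data.

```python
import statistics
from typing import Dict, Sequence


def boxplot_summary(values: Sequence[float]) -> Dict[str, object]:
    """Median, quartiles, whisker ends, and outliers under the Tukey convention."""
    data = sorted(values)
    # Inclusive quartiles; other quartile definitions give slightly different values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # most extreme point within the lower fence
        "whisker_high": max(inside),  # most extreme point within the upper fence
        "outliers": outliers,
    }


# Hypothetical amplitudes in microvolts (NOT participant data from the study).
toy_amplitudes = [-3.0, -2.0, -1.5, -1.0, -0.5, 4.0]
summary = boxplot_summary(toy_amplitudes)
print(summary)
```

In this toy sample, the value 4.0 lies above the upper fence and would be drawn as a separate dot, while the whiskers end at the most extreme remaining values.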

Late Discriminative Negativity (LDN)

The results for the LDN time-window revealed that both lexical-tone and vowel-quality deviants elicited a significant LDN, as the difference between standards and deviants differed significantly from zero (lexical tones: frontocentral β (SE) = -1.522 (0.354), t = -4.301, p < 0.01, posterior β (SE) = -0.687 (0.244), t = -2.811, p = 0.018; vowel quality: frontocentral β (SE) = -1.833 (0.354), t = -5.208, p < 0.001, posterior β (SE) = -1.460 (0.244), t = -5.977, p < 0.001; all p-values are Bonferroni-corrected for multiple comparisons).

The full output of the linear mixed model with the LDN difference waves of both lexical-tone and vowel-quality contrasts is given in the supporting information S3. The effect of Region shows that the LDN is overall larger at frontocentral than at posterior regions (β (SE) = -0.169 (0.057), t = -2.993, p < 0.01; see Figure 4). The interaction of Deviant type and Region reveals that the difference between lexical tones and vowel quality is larger at the posterior than at the frontocentral region (β (SE) = -0.113 (0.056), t = -2.019, p = 0.044), as the LDN had a frontocentral focus in response to lexical-tone changes but was similarly pronounced across regions for vowel-quality changes.

Figure 4. LDN amplitude in German-speaking adults

Note. LDN amplitude (difference wave) in German-speaking adults for lexical-tone and vowel-quality changes at frontocentral and posterior electrode regions. Each dot represents one participant. The horizontal lines within boxes show medians, boxes show the interquartile (IQ) range, whiskers the 1.5 * IQ range, dots are potential outliers.

Discussion

Experiment 1 yielded two main findings for German adults. First, both lexical-tone and vowel-quality deviants elicited MMNs and LDNs at frontocentral and posterior regions. Second, the MMN was generally more pronounced for vowel quality than for lexical tone, and the LDN was also more pronounced for vowel quality than for lexical tone, but only at the posterior region and not at the frontocentral region. Thus, German-speaking adults showed neural responses indicating discrimination between the high-rising and mid-level lexical tones and the vowel-quality change from /ɛ/ to /i/. Our MMN findings on adults’ lexical-tone discrimination are in line with several behavioural and neurophysiological studies that found evidence for tone discrimination in adult speakers of non-tonal languages (Chen et al., 2018; Kaan et al., 2008; Politzer-Ahles et al., 2016). Regarding vowel discrimination, our MMN results confirm the hypothesis that the tested changes in vowel quality, being acoustically close to participants’ native language, should evoke stronger neural discrimination responses than the lexical-tone changes (see Yu et al., 2018). Adult participants additionally showed an LDN following the MMN for both lexical-tone and vowel-quality contrasts.
The functional significance of the LDN component remains debated (e.g., Čeponienė et al., 2004). It might reflect higher-order cognitive processing, such as integrating the speech stimuli into the native phonology (e.g., Barry et al., 2009), reorienting of attention (Mueller et al., 2008), or the involvement of long-term memory representations in deviance detection (Zachau et al., 2005). Moreover, the occurrence of both components (MMN and LDN) suggests a two-stage sequential process and a higher need for resources to encode the speech signal (Yu et al., 2018).

Our finding that vowel-quality changes elicited a stronger MMN and LDN than lexical-tone changes likely stems from the different relation of these sound classes to the participants’ native phonological system. Given that German adults had assigned both vowels to German vowel categories, with no difference in category-goodness fit, we suggest that listeners likely mapped the Cantonese vowels onto the German vowel system and that the stronger neural responses to the vowels (compared to the tones) reflect their phonological processing. This explanation is in line with other studies showing stronger ERP responses to native compared to non-native sounds (e.g., Rivera-Gaxiola et al., 2005; Yu et al., 2018). However, acoustic differences in the realisation of the lexical-tone and the vowel-quality contrasts could also have induced differences in the responses. Specifically, the acoustic difference between the vowels /ɛ/ and /i/ is characterised by a fast formant transition, while the differences in the pitch contour between the tones develop slowly over the length of the vowel. This faster acoustic transition from one vowel to the other might have led to greater acoustic salience of the contrast in comparison to the slower trajectory of pitch in lexical tones. Yet Yu et al. (2022) have demonstrated that native tone-language speakers exhibit similar neural responses to vowel-quality and lexical-tone contrasts. These results may speak against a purely acoustic explanation for the difference in the German speakers’ responses to the vowel-quality and lexical-tone contrasts, and instead support our assumption that these differences reflect the status of the contrast in the phonological system.
Importantly, our results confirm that German-speaking adults show a neural response to both types of contrasts; hence, we used the stimuli to investigate neural discrimination in German-learning infants.