Introduction
In everyday activities such as conversation, humans receive inputs from auditory (sound) and visual (lip movements) channels and integrate them into a coherent perceptual representation; this process is commonly referred to as audiovisual integration (Alsius et al., 2005; Koelewijn et al., 2010). Audiovisual temporal integration refers to the cognitive process by which individuals perceptually integrate asynchronous audiovisual stimuli into a coherent and meaningful unified event (Stevenson & Wallace, 2013; Stevenson et al., 2012). Temporal factors (such as synchrony, duration, speed, and rhythmic structure) play pivotal roles in effectively integrating visual and auditory signals (Fujisaki et al., 2004; van Wassenhove et al., 2007). However, although temporal proximity is deemed crucial in audiovisual integration (van der Burg et al., 2008), the synchronous presence of audiovisual stimuli is not necessary for this process (Lewkowicz, 1996). When a pair of audiovisual stimuli is presented within the temporal binding window (TBW), they undergo integration (Dixon & Spitz, 1980; Vroomen & Keetels, 2010). Humans are typically able to perceive audiovisual stimuli with a few hundred milliseconds of asynchrony as being presented synchronously (Beneventi et al., 2010; Birch & Belmont, 1964; van Wassenhove et al., 2007).
In contrast, audiovisual stimuli that are well separated in time and fall outside the TBW are often perceived as unrelated, independent information (Zhou et al., 2020).
The audiovisual temporal integration ability can be gauged by the magnitude of the TBW (e.g., Francisco et al., 2014; Hairston et al., 2005), or by indicators that reflect the magnitude of the TBW, such as the mean proportion of synchronous responses (e.g., Francisco, Groen, et al., 2017; Francisco, Jesse, et al., 2017) and the threshold of time perception (e.g., just noticeable differences, JND; Mossbridge et al., 2017). The TBW refers to a temporal interval during which two asynchronous stimuli will be integrated and perceived as one event (Dixon & Spitz, 1980; Vroomen & Keetels, 2010). An abnormally narrowed TBW impairs the integration of related stimuli; conversely, an abnormally widened TBW hinders individuals from distinguishing unrelated asynchronous stimuli, which are then mistakenly integrated into one event (Francisco et al., 2014; Francisco, Jesse, et al., 2017; Hairston et al., 2005; Virsu et al., 2003). However, focusing only on participants’ average-level audiovisual temporal integration processing overlooks its dynamic changes over time.
Navarra et al. (2005, 2007, 2009) have found that the audiovisual TBW is not fixed but undergoes dynamic changes throughout the integration process, including three stages: temporal window expansion, the point of subjective simultaneity (PSS) shift, and temporal window restoration, known as temporal recalibration. The PSS refers to the stimulus onset asynchrony (SOA) at which an individual is most likely to subjectively perceive the stimuli as synchronous (Navarra et al., 2005, 2007, 2009). The expansion and contraction of the temporal window depend on previous sensory experience (Powers et al., 2009). Specifically, audiovisual temporal processing is dynamically adjusted based on prior sensory experience; in an asynchronous audiovisual environment, the PSS will shift toward the asynchronous direction (Fujisaki et al., 2004). For example, after watching a movie in which the auditory and visual elements are not synchronous for a while, the SOA at which an individual is most likely to perceive the stimuli as synchronous (i.e., the PSS) will align with the SOA between visual and auditory stimuli in the movie, so that the viewer no longer feels uncomfortable. When temporal recalibration is completed in a short time (e.g., 1 trial), it is called rapid temporal recalibration (van der Burg et al., 2013). This short-term plasticity (the ability to rapidly adapt to changing sensory statistics of environments) plays an important role in audiovisual temporal integration (Noel et al., 2017).
De Niear et al. (2017) and van der Burg et al. (2015) have found that the TBW changes significantly (initially increasing and subsequently decreasing) as an experiment progresses (i.e., as the number of trials increases). The change of the TBW during audiovisual temporal integration has also been substantiated by numerous training studies (e.g., Powers et al., 2009, 2012; Theves et al., 2020), thereby reflecting its plasticity. A comprehensive assessment of how the TBW changes can therefore be more informative than simply aggregating all trials to obtain an overall measure of the TBW. Several empirical studies have indicated that, although certain individuals possess a typical multisensory integration ability at the average level, their integration process (e.g., rapid audiovisual temporal recalibration) may differ from that of typically developing individuals (Noel et al., 2017; Zaidel et al., 2015). Therefore, rapid audiovisual temporal recalibration may serve as a more sensitive indicator, given the additional process information it provides (e.g., Noel et al., 2017). Investigating rapid audiovisual temporal recalibration is thus important for further elucidating the underlying mechanism of audiovisual temporal integration.
Reading is a cognitive process of integrating visual and auditory input information (Pammer & Vidyasagar, 2005). The phonetic and graphemic information of written language is integrated in the form of grapheme-phoneme binding, this binding is automated through repetition (Chein & Schneider, 2005; Dijkstra et al., 1989), and fluent reading is finally acquired. Gori et al. (2020) emphasized the significance of an appropriate audiovisual TBW in facilitating the accurate binding of pronunciation to print when learning to read. Therefore, it can be inferred that audiovisual temporal integration is a crucial influencing factor in reading. This view has also been supported by many empirical studies. Francisco, Groen, et al. (2017) found that the proportion of participants’ synchronous responses to speech and non-speech audiovisual asynchronous stimuli was positively correlated with the frequency of reading errors, and could effectively explain the variance in reading errors. Later, both behavioral and EEG studies conducted by Mossbridge et al. (2017) showed that audiovisual temporal integration ability was positively correlated with reading comprehension performance and could explain 16% and 25% of reading comprehension variance, respectively. Mossbridge and colleagues suggested that audiovisual temporal integration processing might contribute to synchronizing phonological representations with visual orthographic processing in the process of orthographic-to-phonetic decoding (Mossbridge et al., 2017).
Liu et al. (2019) did similar research on the Chinese writing system and found that performance on the audiovisual cross-modal temporal order judgment task not only directly impacted reading ability, but also indirectly impacted it through rapid naming and orthographic processing. Therefore, previous results substantiated the correlation between audiovisual temporal integration and reading. The rapid audiovisual temporal recalibration capability represents a dynamic adjustment process in response to environmental changes and a measure of variability, which ultimately impacts stable temporal representations, such as the TBW (Noel et al., 2016). In addition, there exists a positive correlation between individuals’ temporal integration ability and rapid audiovisual temporal recalibration ability (Harvey et al., 2014; Noel et al., 2016; van der Burg et al., 2013). Specifically, rapid recalibration may serve as the underlying mechanism facilitating temporal integration by permitting swift adjustments of the TBW and PSS in response to successive environmental stimuli for successful integration (Noel et al., 2017). Consequently, there may be a significant correlation between rapid audiovisual temporal recalibration and reading. A strong rapid audiovisual temporal recalibration ability may enhance the process of reading acquisition by dynamically modulating temporal windows and the PSS. Specifically, during real-world reading acquisition, individuals inevitably encounter instances of asynchrony between sound and image (e.g., when learning via video on an iPad).
At such times, individuals with skillful temporal recalibration abilities can flexibly adjust their TBW and PSS in response to constant environmental changes, thereby acquiring accurate morphology-pronunciation correspondence rules and ultimately achieving optimal reading ability. However, there remains a lack of empirical evidence, especially in Chinese reading. This study sought to clarify this question.
Compared with children, adults show a diminished correlation between audiovisual temporal integration ability and reading (children: Wu, 2020; adults: Francisco, Groen, et al., 2017). Therefore, the relationship between rapid audiovisual temporal recalibration and reading may also be modulated by age group. From a developmental perspective, audiovisual temporal integration matures earlier for non-speech stimuli (e.g., flashes and beeps) than for speech stimuli (Noel et al., 2016). Since the temporal integration ability and rapid audiovisual temporal recalibration ability of speech and non-speech audiovisual stimuli are correlated (Harvey et al., 2014; Noel et al., 2016; van der Burg et al., 2013), rapid audiovisual temporal recalibration may follow a developmental trend similar to that of audiovisual temporal integration. Several studies have found that the temporal integration ability for speech and non-speech audiovisual stimuli in childhood is already comparable to that of adults (Chen et al., 2016; Hillock-Dunn et al., 2016; Lewkowicz & Flom, 2014). Additionally, the rapid audiovisual temporal recalibration ability matures earlier than the audiovisual temporal integration ability (Burr & Gori, 2012; Noel et al., 2016). Therefore, the rapid audiovisual temporal recalibration ability in childhood may be comparable to that of adults. So far, however, in the context of Chinese culture, only Zhou et al. (2020) have studied the effects of age and stimulus type on rapid audiovisual temporal recalibration.
They found that adults exhibited significant rapid audiovisual temporal recalibration for both speech and non-speech stimuli. However, adolescents exhibited significant temporal recalibration only for speech stimuli and not for non-speech stimuli. This suggests that, within the context of Chinese culture, individuals’ rapid audiovisual temporal recalibration for speech stimuli matures earlier than that for non-speech stimuli. However, Zhou et al. (2020) only examined the differences between adolescents and adults. To date, there has been a lack of exploration into the difference in rapid audiovisual temporal recalibration ability between children and adults, including how it is affected by stimulus type (speech and non-speech) and whether its relationship with reading is modulated by age group.
In summary, investigating the cross-age difference in rapid audiovisual temporal recalibration and its relationship with reading is crucial for elucidating the temporal course and mechanisms underlying audiovisual temporal integration, as well as for understanding how audiovisual temporal integration explains the variance in reading abilities. The current body of research is insufficient to establish whether the rapid audiovisual temporal recalibration observed in childhood is comparable to that observed in adults, and its relationship with reading remains unconfirmed. Therefore, the present study focused on rapid audiovisual temporal recalibration and explored three aspects: age, stimulus type, and the correlation with reading. In Experiment 1, both children and adults completed the audiovisual simultaneity judgment (SJ) task with speech stimuli; in Experiment 2, both children and adults completed the audiovisual SJ task with non-speech stimuli. Building on previous work, the present study hypothesized that (1) children’s rapid audiovisual temporal recalibration ability matures earlier for speech stimuli than for non-speech stimuli; and (2) rapid audiovisual temporal recalibration ability is related to reading ability.
Experiment 1
Materials and methods
Participants
After obtaining ethical approval and informed consent, we recruited 36 children and 36 adults from China. The children (18 females) were recruited from primary schools, with a mean age of 10.70 years (SD = 1.88). The adults (27 females) were recruited online from colleges, with a mean age of 24.52 years (SD = 1.82). All participants were Asian, native Chinese speakers with normal hearing and normal or corrected-to-normal vision. None of the participants had a history of neurological disease or psychiatric disorders.
Audiovisual SJ task with speech stimuli
Participants performed a standard SJ task with multisensory stimuli (judging the synchrony of audiovisual stimuli; e.g., Francisco et al., 2014; Francisco, Groen, et al., 2017; Francisco, Jesse, et al., 2017). Audiovisual clips featuring a female speaker uttering the single syllable “ba” were used (i.e., speech stimuli, see Figure 1). The visual stimuli (1200 × 680 pixels; 30 frames per second) had a duration ranging from 2090 to 2370 ms. Each video clip contained the entire process of articulation, including pre-articulatory gestures. The auditory stimuli had a sampling rate of 44.1 kHz and 16-bit depth, with a duration of approximately 225 ms. The SOAs between the auditory and visual stimuli were 0, ±40, ±80, ±120, ±160, ±200, ±240, and ±280 ms. Negative values indicate that the auditory stimulus preceded the visual stimulus, while positive values indicate that the visual stimulus preceded the auditory stimulus. The SJ task comprised a total of 150 trials, with 10 trials for each SOA condition. The inter-trial interval was set at 500 ms. We obtained written consent from the female speaker to utilize these videos in experiments and publications.

Figure 1. The Speech Stimulus in Experiment 1.
Note. Top to bottom represents the individual frames from dynamic visual stimuli and the auditory waveform for the stimulus utilized (i.e., SOA = 0 ms).
Reading measures
Participants’ reading fluency was assessed using the reading test used by Qian et al. (2015) and Meng et al. (2019, 2022), which has been widely used in studies of children and adults. The test demonstrated high levels of reliability and validity, with an internal consistency coefficient (Cronbach’s alpha) ranging from 0.93 to 0.97 (Meng et al., 2019, 2022). A printed list containing 160 high-frequency single characters (with a mean frequency of 233.71 times per million) was provided to participants, who were instructed to read aloud as many characters as possible within one minute. The total number of characters correctly named was recorded.
Design and general procedure
In the audiovisual SJ task with speech stimuli, a single-factor design was applied, utilizing group (child group and adult group) as the between-subjects factor.
The presentation of stimuli was controlled by the E-prime 3.0 software (E-Prime Psychology Software Tools Inc., Pittsburgh, USA), and the audiovisual clips were presented on a Lenovo Legion 7 15.6-inch liquid crystal display (LCD) screen with a resolution of 1920 × 1080 pixels, 144 Hz refresh rate, and black background. The screen was positioned at a distance of approximately 70 cm from the participants. The auditory stimuli were delivered via headphones. The intensity of the auditory stimuli was individually adjusted for each participant, corresponding to a comfortable hearing level of approximately 73 dB. As shown in Figure 2, each trial commenced with the presentation of a central fixation point, with duration randomized from 500 to 1000 ms. Ten repetitions per SOA condition were presented randomly for the audiovisual clips.

Figure 2. The Procedure of Audiovisual SJ Task in Experiment 1.
Participants were instructed to indicate whether the audiovisual stimuli were presented synchronously or not by pressing the corresponding response key on the keyboard (i.e., the “f” key for synchronous judgment and the “j” key for asynchronous judgment). Before the formal experiment, eight trials were administered for practice, consisting of four trials with synchronous audiovisual stimuli (SOA = 0 ms), two trials with the largest positive SOA (+280 ms), and two trials with the largest negative SOA (–280 ms). Only after participants reached an accuracy rate of ≥ 75% could they proceed to the formal experiment. It typically took 10–12 min to complete the audiovisual SJ task. Following the audiovisual SJ task, participants completed the reading fluency test to assess their reading ability.
Data analysis
Audiovisual TBW. The proportion of “simultaneity” responses for each SOA condition was used to calculate the magnitude of each participant’s audiovisual TBW in the audiovisual SJ task. The distributions of “synchrony” reports across SOAs were fitted with a Gaussian distribution.
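The fitted function, referred to in the text as function (1), is not reproduced here. A common parameterization for SJ data, assumed for illustration rather than taken from the authors, is a Gaussian in SOA with a free amplitude A, mean μ, and standard deviation σ:

```latex
% Assumed form of function (1); the amplitude A is a hypothetical free parameter.
P(\text{synchrony} \mid \mathrm{SOA}) = A \exp\!\left( -\frac{(\mathrm{SOA} - \mu)^{2}}{2\sigma^{2}} \right)
```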

In function (1), the mean of the best-fitting distribution was taken as an indicator of the PSS, while the standard deviation was taken as an indicator of the TBW (Harvey et al., 2014; Noel et al., 2016; Zhou et al., 2020). Data that failed to fit were to be excluded; the Gaussian distribution accurately described the reports of synchrony (mean adjusted R² > 0.8, see Table S1; see the data distribution pattern of a few representative participants in Figure S1), and no participants were excluded from the analysis due to a failed Gaussian fit. The PSS and TBW were then compared between children and adults using an independent-samples t-test. In cases where the assumption of the homogeneity of variances was violated, Welch corrections were applied to adjust the degrees of freedom.
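The fitting step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a Gaussian with a free amplitude, replaces nonlinear least squares with a plain grid search over the mean and standard deviation, and assumes three fitted parameters when computing adjusted R². The function names, grid ranges, and demo proportions are all hypothetical.

```python
import math

# The 15 SOAs (ms) used in the SJ task; negative = auditory-leading.
SOAS = [-280, -240, -200, -160, -120, -80, -40, 0,
        40, 80, 120, 160, 200, 240, 280]


def gaussian(soa, mu, sigma):
    """Unit-amplitude Gaussian evaluated at one SOA (ms)."""
    return math.exp(-((soa - mu) ** 2) / (2.0 * sigma ** 2))


def fit_gaussian(soas, props,
                 mu_grid=range(-100, 101, 2),
                 sigma_grid=range(40, 401, 2)):
    """Grid search over (mu, sigma). For each candidate pair the best
    amplitude has a closed-form least-squares solution."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            g = [gaussian(s, mu, sigma) for s in soas]
            amp = sum(p * gi for p, gi in zip(props, g)) / sum(gi * gi for gi in g)
            sse = sum((p - amp * gi) ** 2 for p, gi in zip(props, g))
            if best is None or sse < best[0]:
                best = (sse, float(mu), float(sigma), amp)
    sse, pss, tbw, amp = best
    mean_p = sum(props) / len(props)
    sst = sum((p - mean_p) ** 2 for p in props)
    n, k = len(props), 3  # k = number of fitted parameters (assumption)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    # Mean of the fit -> PSS; standard deviation of the fit -> TBW.
    return {"pss": pss, "tbw": tbw, "amplitude": amp, "adj_r2": adj_r2}


# Synthetic, noise-free proportions (not data from the study).
demo_props = [0.9 * gaussian(s, -20, 150) for s in SOAS]
demo_fit = fit_gaussian(SOAS, demo_props)
print(demo_fit)
```

A grid search is used only to keep the sketch dependency-free; in practice a nonlinear least-squares routine would normally be preferred.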
Rapid audiovisual temporal recalibration. To reveal the temporal course of audiovisual temporal integration, the present study first examined rapid audiovisual temporal recalibration. Separate inter-trial analyses were conducted for the audiovisual SJ task to explore whether the modality order (auditory-leading or visual-leading) in the prior trial (trial t–1) affected the distribution of “synchrony” responses (i.e., whether a rapid audiovisual temporal recalibration occurred) in the current trial (trial t), as reflected by ΔPSS and ΔTBW, which represent different manifestations of the same underlying mechanism (van der Burg et al., 2013). Gaussian distributions were fitted to both types of trial t–1 order [i.e., auditory-leading (trial t–1AV) or visual-leading (trial t–1VA)] data for each participant to estimate ΔPSS [function (2)] and ΔTBW [function (3)] (e.g., De Niear et al., 2017; Harvey et al., 2014; Noel et al., 2016; Zhou et al., 2020). If rapid audiovisual temporal recalibration occurred, the ΔPSS and/or ΔTBW were expected to be significantly larger than 0. One-sample t-tests were conducted to examine the presence of rapid audiovisual temporal recalibration in children and adults. Subsequently, independent-samples t-tests were conducted to compare the ΔPSS and ΔTBW between children and adults. In cases where the assumption of the homogeneity of variances was violated, Welch corrections were applied to adjust the degrees of freedom.
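Functions (2) and (3) are not reproduced in this text. Under the sign convention used here (positive SOA = visual-leading), a subtraction order that makes positive values indicate recalibration would be the following; this order is an assumption based on the usual inter-trial analysis, not a form confirmed by the source:

```latex
% Assumed forms of functions (2) and (3); subscripts give the modality order on trial t-1.
\begin{align}
\Delta\mathrm{PSS} &= \mathrm{PSS}_{t-1\,\mathrm{VA}} - \mathrm{PSS}_{t-1\,\mathrm{AV}} \\
\Delta\mathrm{TBW} &= \mathrm{TBW}_{t-1\,\mathrm{VA}} - \mathrm{TBW}_{t-1\,\mathrm{AV}}
\end{align}
```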


Second, the temporal course analysis was conducted on a total of 150 trials. A sliding-window approach was used to fit Gaussian distributions to successive 50-trial windows shifted one trial at a time (i.e., trial1 ∼ trial50, trial2 ∼ trial51, …, trial101 ∼ trial150) to obtain TBWs that changed across trials (e.g., De Niear et al., 2017; van der Burg et al., 2015), resulting in a total of 101 time points. After excluding data that failed to fit (less than 6% of the total for each condition in each group, as shown in Table S2), the temporal course of audiovisual temporal integration was analyzed via one-sample t-tests against the initial estimate of the given parameter (i.e., the TBW on trial1 ∼ trial50). Given the inherent multiple comparisons problem in a sliding-window approach, the present study controlled for false positives by considering an effect significant only when p < .01 held for at least 10 consecutive time points (e.g., De Niear et al., 2017). Following van der Burg et al. (2015), to fully compare the performance of children and adults, the present study subsequently divided the temporal course into several bins and conducted linear fitting, followed by a comparison (i.e., analysis of variance, ANOVA) of the slopes of TBWs in each bin between children and adults. The division of bins was exploratory and post hoc, based on the dynamic trend of the TBW (van der Burg et al., 2015). This approach aimed to investigate the differences in the temporal course of audiovisual temporal integration between children and adults. The Greenhouse-Geisser correction was reported whenever Mauchly’s test of sphericity was significant.
Furthermore, the linear trend at point method was used to substitute any missing values generated during the fitting process (Vercoulen et al., 1994).
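The window layout and the consecutive-significance criterion described above can be sketched as follows. The helper names and the demo p-values are hypothetical; only the window width (50), the trial count (150), the threshold (.01), and the run length (10) come from the text.

```python
def sliding_windows(n_trials=150, width=50):
    """Return 1-indexed, inclusive (first_trial, last_trial) pairs,
    shifting the window one trial at a time."""
    return [(start, start + width - 1)
            for start in range(1, n_trials - width + 2)]


def significant_runs(p_values, alpha=0.01, min_run=10):
    """Return (first, last) 1-indexed time points of every run of at
    least `min_run` consecutive p-values below `alpha`."""
    runs, start = [], None
    for i, p in enumerate(p_values, start=1):
        if p < alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the final time point.
    if start is not None and len(p_values) - start + 1 >= min_run:
        runs.append((start, len(p_values)))
    return runs


windows = sliding_windows()
print(len(windows), windows[0], windows[-1])

# Hypothetical p-values: a 12-point run qualifies, a 4-point run does not.
demo_p = [0.5] * 5 + [0.001] * 12 + [0.5] * 3 + [0.001] * 4
print(significant_runs(demo_p))
```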
The relationship between rapid audiovisual temporal recalibration and reading. The partial correlation analysis was conducted to reveal the correlation between audiovisual temporal integration (TBW), rapid audiovisual temporal recalibration (ΔPSS and ΔTBW), and reading ability while controlling for age and gender. Additionally, hierarchical regressions were conducted to examine the unique contributions of ΔPSS in explaining variance in reading ability after controlling for age, gender, and TBW.
Results
The average level of audiovisual temporal integration for speech stimuli in child and adult group
Table 1 and Figure 3 show the mean proportion of synchronous responses for both children and adults at each SOA for speech stimuli, suggesting that children tend to perceive audiovisual speech information as synchronous over a wider temporal window than adults. The independent-samples t-test revealed a significant effect of group on the width of the TBW. The child group (M = 191.41, SD = 41.89) showed a wider TBW than the adult group (M = 144.96, SD = 23.73), t(70) = 5.79, p < .001, Cohen’s d = 1.36. These results indicated that the average level of audiovisual temporal integration for speech stimuli in children was weaker than that of adults. Moreover, it is noteworthy that the PSS in the adult group exhibited a negative value, indicating a preference for auditory information when perceiving audiovisual asynchronous stimuli. One-sample t-tests (test value = 0) indicated that the PSS differed significantly from 0 in the adult group [M = –26.69, SD = 14.65, t(35) = –10.93, p < .001, Cohen’s d = 1.82] but not in the child group [M = 0.30, SD = 22.59, t(35) = 0.08, p = .937, Cohen’s d = 0.01], so that only adults demonstrated an auditory processing advantage.
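The reported effect sizes are consistent with the standard conversions from t to Cohen's d. The check below is a sketch under that assumption (pooled-variance d for two independent groups, and d = |t| / √n for a one-sample test); it is not necessarily how the authors computed d, and the function names are hypothetical.

```python
import math


def d_independent(t, n1, n2):
    """Cohen's d recovered from an independent-samples t statistic."""
    return abs(t) * math.sqrt(1.0 / n1 + 1.0 / n2)


def d_one_sample(t, n):
    """Cohen's d recovered from a one-sample t statistic."""
    return abs(t) / math.sqrt(n)


# TBW, children vs. adults: t(70) = 5.79 with 36 participants per group.
print(round(d_independent(5.79, 36, 36), 2))  # 1.36
# Adult PSS vs. 0: t(35) = -10.93 with 36 participants.
print(round(d_one_sample(-10.93, 36), 2))     # 1.82
```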

Figure 3. Synchrony Responses by SOAs for Audiovisual Speech Stimuli in Children and Adults.
Note. Experiment 1 included a sample of 36 children and 36 adults.
Table 1. Synchrony responses by SOAs and groups for audiovisual speech stimuli

Note: When the assumption of the homogeneity of variances was violated, Welch corrections were applied to adjust the degrees of freedom. To correct for multiple comparisons, this study conducted false discovery rate (FDR) correction on the resulting p values (same as below; e.g., Benjamini & Hochberg, 1995).
a Children: n = 36.
b Adults: n = 36.
c The bold ones are significant results.
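The FDR correction mentioned in the note to Table 1 is usually carried out with the Benjamini-Hochberg step-up procedure. A minimal sketch of the adjusted p-value computation follows; the function name and the example p-values are hypothetical and are not taken from Table 1.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four comparisons.
demo_adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(demo_adjusted)
```

A comparison is then declared significant when its adjusted p-value falls below the chosen FDR level.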
The rapid audiovisual temporal recalibration of audiovisual temporal integration for speech stimuli in child and adult group
For rapid audiovisual temporal recalibration, the independent-samples t-tests on ΔPSS and ΔTBW revealed no significant effect of group, indicating no discernible difference in rapid audiovisual temporal recalibration between children and adults [ΔPSS: t(55.95) = –0.32, p = .752, Cohen’s d = 0.07; ΔTBW: t(59.93) = 0.29, p = .774, Cohen’s d = 0.07]. Furthermore, one-sample t-tests (test value = 0) indicated a significant positive ΔPSS in both the child group [M = 11.32, SD = 21.51, t(35) = 3.16, p = .003, Cohen’s d = 0.53] and the adult group [M = 12.63, SD = 12.40, t(35) = 6.11, p < .001, Cohen’s d = 1.02]. The significant positive ΔPSS indicates that the PSS was adjusted in accordance with the previous trial, demonstrating the presence of rapid audiovisual temporal recalibration (e.g., van der Burg & Goodbourn, 2015; van der Burg et al., 2013, 2015).
As for the other side of rapid audiovisual temporal recalibration, that is, the changes of TBW at trial-by-trial level during the integration process, there was an observed visual trend that the TBW increased first and then remained stable over time in the child group (see Figure 4). The significant differences were detected in width between TBW1 (i.e., trial1 ∼ trial50, M = 167.07, SE = 6.95) and the interval including TBW25 (i.e., trial25 ∼ trial74) to TBW101 (i.e., trial101 ∼ trial150; one sample t-test, all ps < 0.01). On the contrary, as shown in Figure 4, there was an observed visual trend (rather than a statistical trend) that the TBW increased first and then decreased over time in the adult group. The significant differences were detected in width between TBW1 (M = 132.64, SD = 5.72) and the interval including TBW18 (i.e., trial18 ∼ trial67) to TBW36 (i.e., trial36 ∼ trial85; one sample t-test, all ps < 0.01).

Figure 4. The Temporal Course of Audiovisual Temporal Integration of the TBW for Speech Stimuli in Child and Adult Group.
Note. The shaded region illustrated the standard error of the mean (SEM) at each trial across the temporal course analysis. To correct for multiple comparisons, this study considered an effect significant at α < .01 for at least 10 consecutive trials. Solid bars shown above the temporal course were indicative of at least 10 consecutive trials at which the TBW significantly differed from TBW1 (α < .01 for all trials).
To illustrate more clearly how the change in the TBW differed between children and adults, the present study divided the temporal course into 3 bins (according to Figure 4), performed linear fitting, and subsequently compared the slopes of TBWs in each bin [bin1: TBW1 ∼ TBW20 (i.e., from trial1 ∼ trial50 to trial20 ∼ trial69); bin2: TBW21 ∼ TBW60 (i.e., from trial21 ∼ trial70 to trial60 ∼ trial109); bin3: TBW61 ∼ TBW101 (i.e., from trial61 ∼ trial110 to trial101 ∼ trial150)] between children and adults. The 2 (group: child group, adult group) × 3 (bin: bin1, bin2, bin3) ANOVA conducted on the slopes of TBWs revealed a significant main effect of bin [F(2, 140) = 10.83, p < .001, ηp² = 0.13]. Specifically, the slopes of TBWs for bin1 (M = 0.88, SE = 0.19) were significantly greater than those for bin2 (M = –0.07, SE = 0.11; mean difference = 0.95, p < .001) and bin3 (M = 0.11, SE = 0.13; mean difference = 0.77, p = .005), while there was no difference between the slopes of TBWs for bin2 and bin3 (mean difference = –0.18, p = .689). The main effect of group was not significant [F(1, 70) = 0.04, p = .835, ηp² < 0.01], and there was no significant interaction effect [F(2, 140) = 2.99, p = .054, ηp² = 0.04]. These results suggested that, similar to adults, children can rapidly adjust their response to the temporal relationship of the current audiovisual stimuli based on the temporal relationship of the audiovisual stimuli in the prior trial, with a dynamic adjustment of the TBW during audiovisual temporal integration for speech stimuli.
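The per-bin linear fitting can be sketched as an ordinary least-squares slope of TBW against window index within each bin. The bin boundaries below come from the text; the function names, the choice of window index as the predictor, and the synthetic series are assumptions made for illustration.

```python
# Bin boundaries as 1-indexed, inclusive window numbers (TBW1 ... TBW101).
BINS = {"bin1": (1, 20), "bin2": (21, 60), "bin3": (61, 101)}


def ols_slope(ys):
    """Least-squares slope of ys regressed on 1, 2, ..., len(ys)."""
    n = len(ys)
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bin_slopes(tbw_series):
    """Slope of the TBW time course (ms per window step) in each bin."""
    return {name: ols_slope(tbw_series[first - 1:last])
            for name, (first, last) in BINS.items()}


# Synthetic participant: TBW rises 1 ms per window for 20 windows, then stays flat.
demo_series = [150 + i for i in range(20)] + [169] * 81
demo_slopes = bin_slopes(demo_series)
print(demo_slopes)
```

Each participant contributes one slope per bin, and these slopes are the values that would enter the 2 × 3 ANOVA.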
In summary, children around 10 years old exhibit robust rapid audiovisual temporal recalibration for speech stimuli. Although no significant differences are observed between age groups in ΔTBW and ΔPSS, differences in the recalibration process at the trial-by-trial level do exist between the two groups (as shown in Figure 4). Therefore, children’s rapid audiovisual temporal recalibration has the potential for further improvement.
The relationship between the temporal course of audiovisual temporal integration for speech stimuli and reading
The relationships between TBW, ΔTBW, ΔPSS, and reading ability were analyzed using partial correlation analysis, controlling for age and gender. The correlation between TBW and reading fluency did not reach significance, r = –.19, p = .114. ΔPSS was significantly correlated with reading fluency, r = .51, p < .001 (see Figure 5). The correlation between ΔTBW and reading fluency was also not significant, r = .02, p = .859.
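For readers unfamiliar with partial correlation, the following sketch shows one standard way to compute it: the recursive formula that removes one control variable at a time. It is an illustration under stated assumptions, not the authors' procedure; the function names are hypothetical, and gender is assumed to be coded numerically (e.g., 0/1).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation between x and y with every variable in controls held constant.

    Uses the recursion
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied to the last control after partialling out the remaining ones.
    """
    if not controls:
        return pearson(x, y)
    rest, z = controls[:-1], controls[-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

A call such as `partial_corr(delta_pss, reading_fluency, [age, gender])` would correspond to the analysis in the text, where the four argument names stand for per-participant lists.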

Figure 5. Correlation of ΔPSS and reading fluency in Experiment 1.
Hierarchical regression analyses were conducted to investigate the impact of the average level of audiovisual temporal integration (with TBW as the indicator) and the temporal course of audiovisual temporal integration (with ΔPSS as the indicator) for speech stimuli on reading ability. Age and gender were entered into the regression models as control variables in Step 1, TBW was entered in Step 2, and ΔPSS was entered in Step 3. The results indicated that, after controlling for age and gender, ΔPSS explained 9.3% of the variance in reading fluency. When TBW was additionally controlled, ΔPSS remained a significant predictor of reading fluency, explaining 10.6% of the variance (see more details in Table 2).
Table 2. Hierarchical regression of TBW and ΔPSS on reading fluency in Experiment 1

Note: † p < .1, * p < .05, ** p < .01, *** p < .001.
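The logic of the hierarchical regression reported above (how much additional variance each step explains) can be illustrated with a small, dependency-free sketch. It is not the authors' code: ordinary least squares is solved here through the normal equations, the function names are hypothetical, and predictors are assumed to be numeric lists of equal length with no perfect collinearity.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors, y):
    """R squared of an OLS fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_delta_r2(steps, y):
    """steps: list of (label, [columns added at that step]).

    Returns a list of (label, cumulative R squared, increase over the previous step).
    """
    results, used, previous = [], [], 0.0
    for label, columns in steps:
        used = used + columns
        r2 = r_squared(used, y)
        results.append((label, r2, r2 - previous))
        previous = r2
    return results
```

With steps such as `[("Step 1", [age, gender]), ("Step 2", [tbw]), ("Step 3", [delta_pss])]`, the third element of the last tuple is the unique variance attributed to ΔPSS after the earlier predictors.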
Experiment 2
Materials and methods
Participants
Following ethical approval and informed consent, we recruited 36 children and 36 adults from China. The children (23 females) were recruited from primary schools and had a mean age of 10.19 years (SD = 1.94). The adults (24 females) were recruited online from colleges and had a mean age of 24.26 years (SD = 1.98). All participants were Asian native Chinese speakers with normal hearing and normal or corrected-to-normal vision. None of the participants had a history of neurological disease or psychiatric disorders. It should be acknowledged that 30 adults took part in both Experiment 1 and Experiment 2. To ensure that task quality was not affected by fatigue or carelessness, these adults were given adequate rest between the two experiments, and the order of the two experiments was counterbalanced.
Audiovisual SJ task with non-speech stimuli
Participants performed a standard SJ task with non-speech audiovisual stimuli (i.e., dynamic handheld tools; see Figure 6). The visual stimuli (1200 × 680 pixels; 30 frames per second) consisted of a complete cycle of motion of a hand using a hammer. The videos ranged in duration from 2090 to 2370 ms. The auditory stimulus was a congruent hammering noise sampled at 44.1 kHz with 16-bit depth and lasting approximately 206 ms. The set of SOAs was the same as in Experiment 1. The SJ task comprised a total of 150 trials, with 10 trials for each SOA condition. The inter-trial interval was 500 ms.

Figure 6. The non-speech stimulus in Experiment 2.
Note. From top to bottom: individual frames from the dynamic visual stimulus and the auditory waveform of the stimulus (shown for SOA = 0 ms).
Reading measures
The same reading measure was used in Experiment 2 as in Experiment 1.
Design and general procedure
In the audiovisual SJ task with non-speech stimuli, a single-factor design was applied, utilizing group (child group and adult group) as the between-subjects factor.
Stimulus presentation was controlled by E-Prime 3.0 (Psychology Software Tools Inc., Pittsburgh, USA), and the audiovisual clips were presented on a Lenovo Legion 7 15.6-inch LCD screen with a resolution of 1920 × 1080 pixels, a 144 Hz refresh rate, and a black background. The distance between the screen and the participants was approximately 70 cm. The auditory stimuli were delivered via headphones. The intensity of the auditory stimuli was adjusted individually for each participant to a comfortable hearing level of approximately 73 dB. As shown in Figure 7, each trial began with a central fixation point presented for a randomized duration of 500 to 1000 ms. Ten repetitions per SOA condition were presented in random order for the dynamic handheld tools.

Figure 7. The procedure in Experiment 2.
An audiovisual SJ task similar to that in Experiment 1 (i.e., judging whether the audiovisual stimuli were presented synchronously or not) was used in Experiment 2 for non-speech stimuli. The SOA conditions and the practice and formal sessions were set up as in Experiment 1. The audiovisual SJ task typically took 10–12 min to complete. Following the audiovisual SJ task, participants completed the reading fluency test.
Data analysis
The same data analyses were used in Experiment 2 as in Experiment 1. A Gaussian distribution accurately described the reports of synchrony (mean adjusted R² > 0.8, see Table S3; see the data distribution pattern of a few representative participants in Figure S2). No participants were excluded from the analysis for failing to fit a Gaussian distribution. The proportion of data excluded in each condition in each group was less than 6% of the total (see Table S4).
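To make the Gaussian-fitting step concrete, the sketch below fits a Gaussian to the proportion of "synchronous" reports per SOA and computes an adjusted R². Several things here are assumptions rather than details from the study: a coarse grid search stands in for whatever optimizer the authors used, the SOA values in `EXAMPLE_SOAS` are hypothetical (the text only implies 15 SOA conditions), the adjusted R² uses the usual correction for three fitted parameters, and reading the fitted mean as the PSS and the fitted SD as the TBW is one common convention that may differ from the authors' definition.

```python
import math

# Hypothetical SOA set in ms (15 conditions); not taken from the study.
EXAMPLE_SOAS = [-400, -300, -250, -200, -150, -100, -50, 0,
                50, 100, 150, 200, 250, 300, 400]


def gaussian(soa, amplitude, mean, sd):
    """Gaussian-shaped proportion of 'synchronous' reports at a given SOA (ms)."""
    return amplitude * math.exp(-((soa - mean) ** 2) / (2 * sd ** 2))


def fit_gaussian(soas, p_sync, means=range(-200, 201, 5), sds=range(20, 401, 5)):
    """Grid search over mean and SD; the best amplitude per pair has a closed form.

    Returns (sse, amplitude, mean, sd) for the candidate with the smallest
    sum of squared errors.
    """
    best = None
    for mean in means:
        for sd in sds:
            shape = [gaussian(s, 1.0, mean, sd) for s in soas]
            amplitude = (sum(p * g for p, g in zip(p_sync, shape))
                         / sum(g * g for g in shape))
            sse = sum((p - amplitude * g) ** 2 for p, g in zip(p_sync, shape))
            if best is None or sse < best[0]:
                best = (sse, amplitude, mean, sd)
    return best


def adjusted_r2(p_sync, sse, n_params=3):
    """Adjusted R squared for a fit with n_params free parameters."""
    n = len(p_sync)
    mean_p = sum(p_sync) / n
    sst = sum((p - mean_p) ** 2 for p in p_sync)
    r2 = 1.0 - sse / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
```

Under the convention assumed here, `mean` from `fit_gaussian` would be read as the PSS and `sd` as the TBW, and a participant whose adjusted R² is very low would be a candidate for exclusion.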
Results
The average level of audiovisual temporal integration for non-speech stimuli in child and adult groups
Table 3 and Figure 8 show the mean proportion of synchronous responses for children and adults at each SOA for non-speech stimuli, suggesting that children tend to perceive audiovisual non-speech information as synchronous over a wider temporal window than adults. The independent-samples t-test revealed a significant effect of group on the width of the TBW. The child group (M = 157.56, SD = 43.36) showed a wider TBW than the adult group (M = 130.46, SD = 22.10), t(52.03) = 3.34, p = .002, Cohen’s d = 0.79. These results indicated that the average level of audiovisual temporal integration for non-speech stimuli is weaker in children than in adults. Moreover, the PSS in both children and adults was positive, indicating a preference for visual information when perceiving asynchronous audiovisual stimuli. One-sample t-tests (test value = 0) indicated that both the adult group [M = 24.50, SD = 8.95, t(35) = 16.42, p < .001, Cohen’s d = 2.74] and the child group [M = 32.57, SD = 23.92, t(35) = 8.17, p < .001, Cohen’s d = 1.36] had visual processing advantages. It should be noted that the adult group was marginally more biased toward auditory processing than the child group, t(44.62) = 1.90, p = .065, Cohen’s d = 0.45.
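The fractional degrees of freedom above indicate Welch-corrected t-tests. As a worked check, the sketch below recomputes t, the Welch–Satterthwaite degrees of freedom, and Cohen's d from the reported means, SDs, and group sizes. The function name is hypothetical, and the d formula (mean difference divided by the square root of the average of the two variances) is an assumption that happens to match the reported values for equal group sizes.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t, Welch-Satterthwaite df, and Cohen's d from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2          # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    d = (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return t, df, d


# TBW, children vs. adults (values from the text)
tbw_t, tbw_df, tbw_d = welch_from_summary(157.56, 43.36, 36, 130.46, 22.10, 36)

# PSS, children vs. adults (values from the text)
pss_t, pss_df, pss_d = welch_from_summary(32.57, 23.92, 36, 24.50, 8.95, 36)
```

Both calls reproduce the reported statistics to within rounding of the published means and SDs, which supports reading the reported tests as Welch tests on 36 participants per group.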
Table 3. Synchrony responses by SOAs and groups for audiovisual non-speech stimuli

Note: The Welch corrections and multiple comparison corrections were consistent with those in Table 1.
a Children: n = 36.
b Adults: n = 36.
c The bold ones are significant results.

Figure 8. Synchrony responses by SOAs for audiovisual non-speech stimuli in children and adults. Note. Experiment 2 included a sample of 36 children and 36 adults.
The temporal course of audiovisual temporal integration for non-speech stimuli in child and adult groups
For rapid audiovisual temporal recalibration, the independent-samples t-test on ΔPSS revealed a significant effect of group. Adults (M = 10.83, SD = 12.61) showed stronger rapid audiovisual temporal recalibration than children (M = 4.26, SD = 11.88), t(70) = 2.28, p = .026, Cohen’s d = 0.54. However, there was no difference in ΔTBW between children (M = 0.11, SD = 32.79) and adults (M = 5.58, SD = 31.15), t(70) = –0.73, p = .470, Cohen’s d = 0.17. Furthermore, one-sample t-tests (test value = 0) indicated the presence of audiovisual temporal recalibration in both the child group [ΔPSS: t(35) = 2.15, p = .038, Cohen’s d = 0.36] and the adult group [ΔPSS: t(35) = 5.15, p < .001, Cohen’s d = 0.86].
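Because rapid recalibration is defined relative to the preceding trial, computing ΔPSS starts from splitting the trial sequence by what came before. The sketch below illustrates that split under explicit assumptions that are not taken from the study: negative SOAs are treated as audio-leading and positive SOAs as visual-leading, trials following SOA = 0 (and the first trial) are dropped, and ΔPSS is taken as the estimate after visual-leading trials minus the estimate after audio-leading trials. All names are hypothetical, and `estimator` stands for any function that maps a list of trials to a PSS (or TBW) estimate.

```python
def split_by_previous_trial(trials):
    """Split trials by the sign of the SOA on the immediately preceding trial.

    trials: list of (soa_ms, judged_synchronous) tuples in presentation order.
    Returns (after_audio_leading, after_visual_leading).
    """
    after_audio, after_visual = [], []
    for previous, current in zip(trials, trials[1:]):
        if previous[0] < 0:
            after_audio.append(current)
        elif previous[0] > 0:
            after_visual.append(current)
    return after_audio, after_visual


def delta_estimate(estimator, trials):
    """Difference between the estimate after visual-leading and after audio-leading trials."""
    after_audio, after_visual = split_by_previous_trial(trials)
    return estimator(after_visual) - estimator(after_audio)
```

Passing a PSS estimator yields a ΔPSS-style index and passing a TBW estimator yields a ΔTBW-style index, so the same split serves both measures.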
As for the other aspect of rapid audiovisual temporal recalibration, that is, the trial-by-trial changes in TBW during the integration process, the TBW in the child group showed a visual trend of first increasing, then remaining stable, and subsequently increasing again (see Figure 9). The width of TBW1 (M = 140.30, SE = 9.66) differed significantly from the widths in the intervals from TBW50 (i.e., trial50 ∼ trial99) to TBW60 (i.e., trial60 ∼ trial109), from TBW78 (i.e., trial78 ∼ trial127) to TBW87 (i.e., trial87 ∼ trial136), and from TBW89 (i.e., trial89 ∼ trial138) to TBW101 (i.e., trial101 ∼ trial150; one-sample t-tests, all ps < .01). By contrast, the TBW in the adult group showed a visual trend of initially increasing and subsequently decreasing over time (see more details in Figure 9). The width of TBW1 (M = 119.72, SD = 3.89) differed significantly from the widths in the interval from TBW25 (i.e., trial25 ∼ trial74) to TBW88 (i.e., trial88 ∼ trial137; one-sample t-tests, all ps < .01).
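The window labels used above follow a simple sliding scheme: TBWk is estimated from the 50 consecutive trials starting at trial k, so 150 trials yield 101 overlapping windows. A minimal sketch of that indexing follows; the function names are hypothetical, and `estimator` is a placeholder for whatever per-window TBW estimate is used.

```python
def sliding_windows(n_trials=150, width=50):
    """Return 1-based inclusive (first_trial, last_trial) pairs, one per window."""
    return [(start, start + width - 1) for start in range(1, n_trials - width + 2)]


def windowed_estimates(trials, estimator, width=50):
    """Apply estimator to each run of `width` consecutive trials."""
    return [
        estimator(trials[first - 1:last])
        for first, last in sliding_windows(len(trials), width)
    ]
```

For example, `sliding_windows()[49]` is `(50, 99)`, which corresponds to TBW50 in the text.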

Figure 9. The temporal course of audiovisual temporal integration of the TBW for non-speech stimuli in child and adult groups. Note. The shaded region illustrates the standard error of the mean (SEM) at each trial across the temporal course analysis. To correct for multiple comparisons, an effect was considered significant only if p < .01 held for at least 10 consecutive trials. Solid bars above the temporal course mark runs of at least 10 consecutive trials at which the TBW differed significantly from TBW1 (p < .01 for all trials).
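The consecutive-trial criterion in the figure note amounts to a run-length filter over the per-window p values. The sketch below shows one way to implement it, assuming a list with one p value per window (each from the comparison against TBW1); the function name and the 1-based index convention are illustrative choices, not taken from the study.

```python
def significant_runs(p_values, alpha=0.01, min_run=10):
    """Return 1-based inclusive (start, end) pairs for runs of p < alpha
    that last at least min_run consecutive positions."""
    runs, start = [], None
    for i, p in enumerate(p_values, start=1):
        if p < alpha:
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None           # the run (if any) is broken
    if start is not None and len(p_values) - start + 1 >= min_run:
        runs.append((start, len(p_values)))
    return runs
```

Runs shorter than `min_run` are discarded even if every p value in them is below `alpha`, which is what makes the criterion conservative against isolated false positives.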
To illustrate more clearly how the change in TBW differed between children and adults, the temporal course was divided into three bins (according to Figure 9), a linear fit was performed within each bin, and the slopes of the TBWs in each bin were compared between children and adults [bin1: TBW1 ∼ TBW30 (i.e., trial1 ∼ trial50 to trial30 ∼ trial79); bin2: TBW31 ∼ TBW70 (i.e., trial31 ∼ trial80 to trial70 ∼ trial119); bin3: TBW71 ∼ TBW101 (i.e., trial71 ∼ trial120 to trial101 ∼ trial150)]. The 2 (group: child group, adult group) × 3 (bin: bin1, bin2, bin3) ANOVA conducted on the slopes of TBWs revealed a significant main effect of bin [F(1.84, 128.48) = 4.22, p = .020, ηp² = 0.06] and a significant main effect of group [F(1, 70) = 18.80, p < .001, ηp² = 0.21]. Moreover, there was a significant group × bin interaction, F(1.84, 128.48) = 3.86, p = .027, ηp² = 0.05. Specifically, simple effect analysis with Sidak correction for multiple comparisons showed that the slopes of TBWs for adults (M = –0.63, SE = 0.19) were smaller than those for children (M = 0.64, SE = 0.19) in bin3 (mean difference = –1.27, p < .001), while there were no differences between children (bin1: M = 0.83, SE = 0.25; bin2: M = 0.18, SE = 0.19) and adults (bin1: M = 0.48, SE = 0.25; bin2: M = 0.15, SE = 0.19) in bin1 (mean difference = –0.35, p = .315) or bin2 (mean difference = –0.02, p = .928). In addition, there were no significant differences between bin1, bin2, and bin3 in the child group (all ps > .100). Within the adult group, however, the slopes of TBWs in bin1 were significantly greater than those in bin3 (mean difference = 1.10, p = .009), and the slopes of TBWs in bin2 were also significantly greater than those in bin3 (mean difference = 0.78, p = .018). No difference was found between the slopes of TBWs in bin1 and bin2 (mean difference = 0.32, p = .723).
Furthermore, one-sample t-tests (test value = 0) indicated no significant slopes of TBWs for bin2 [t(35) = 1.01, p = .319, Cohen’s d = 0.17] but significant positive slopes of TBWs for bin1 [t(35) = 3.44, p = .002, Cohen’s d = 0.58] and bin3 [t(35) = 3.50, p = .001, Cohen’s d = 0.59] in the child group. By contrast, the adult group exhibited marginally significant positive slopes of TBWs for bin1 [t(35) = 1.91, p = .065, Cohen’s d = 0.32], significant negative slopes of TBWs for bin3 [t(35) = –3.30, p = .002, Cohen’s d = 0.54], and no significant slopes of TBWs for bin2 [t(35) = 0.73, p = .469, Cohen’s d = 0.12]. These results suggested that, compared with children, adults can adjust the TBW more flexibly during audiovisual temporal integration for non-speech audiovisual stimuli and have a stronger ability to rapidly adjust their response to the temporal relationship of the current audiovisual stimuli based on the temporal relationship in the prior trial.
In summary, these findings suggested that children around the age of 10 already exhibit a strong rapid audiovisual temporal recalibration for non-speech stimuli; however, the significant disparity in this ability between children and adults persists.
The relationship between the temporal course of audiovisual temporal integration for non-speech stimuli and reading
Considering the significant differences in rapid audiovisual temporal recalibration ability between children and adults, the data of children and adults were analyzed separately, and partial correlations between TBW, ΔTBW, ΔPSS, and reading fluency were computed for each group. The partial correlation analysis controlling for age and gender showed that children’s TBW was significantly correlated with reading fluency, r = –.37, p = .034, while the correlation between adults’ TBW and reading fluency was marginally significant, r = –.33, p = .058. In addition, there was a significant positive correlation between children’s ΔPSS and reading fluency, r = .76, p < .001, and between adults’ ΔPSS and reading fluency, r = .57, p < .001 (see Figure 10). However, the correlation between ΔTBW and reading fluency was not significant in either children (r = .09, p = .610) or adults (r = –.19, p = .286).

Figure 10. Correlation of ΔPSS and reading fluency in Experiment 2.
To explore whether the temporal course of audiovisual temporal integration could explain variations in reading ability, separate hierarchical regression analyses were conducted for each group (the dependent variable was reading fluency, and the independent variables included TBW and ΔPSS). The order in which the independent variables entered the regression models was the same as in Experiment 1 (i.e., Step 1: age and gender; Step 2: TBW; Step 3: ΔPSS). The results indicated that, after controlling for age and gender, ΔPSS explained 58.0% and 29.3% of the variance in reading fluency among children and adults, respectively. When TBW was additionally controlled, ΔPSS remained a significant predictor of reading fluency, explaining 57.1% and 32.2% of the variance among children and adults, respectively (see more details in Table 4; see other regression results in Table S5).
Table 4. Hierarchical regression of TBW and ΔPSS on reading fluency in both children and adults in Experiment 2

Note: † p < .1, * p < .05, ** p < .01, *** p < .001.
Discussion
The present study compared rapid audiovisual temporal recalibration in children and adults using SJ tasks with speech and non-speech stimuli. The results showed no significant age-related differences in either ΔTBW or ΔPSS for speech stimuli. By contrast, significant differences were observed between children and adults in ΔPSS for non-speech stimuli. In the trial-by-trial integration process, children and adults differed for both speech and non-speech stimuli. In addition, rapid audiovisual temporal recalibration (with ΔPSS as the indicator) for both speech and non-speech stimuli was significantly correlated with reading fluency. Moreover, ΔPSS accounted for unique variance in reading fluency for both children and adults.
Cross-age changes in rapid audiovisual temporal recalibration abilities
Comparing the results of the two experiments, the present study found that rapid temporal recalibration matures earlier for speech stimuli than for non-speech stimuli. The results align with those of Zhou et al. (2020) in Chinese culture and further extend this line of work to developing children. Specifically, Zhou et al. (2020) found that both adolescents and adults exhibited rapid temporal recalibration to speech stimuli, while only adults showed rapid temporal recalibration to simple non-speech stimuli. However, the development of rapid temporal recalibration abilities in the present study contradicted the expectation that statistical learning of temporal regularities for simple stimuli should develop before that for more complex stimuli. One possible explanation is that the earlier maturation of speech-related temporal recalibration reflects the ethological significance and primacy of language and communicative signals over other simple stimuli (e.g., Murray et al., 2016; Zhou et al., 2020). Previous studies have found that the TBW for speech stimuli narrowed significantly during the preschool years (Lewkowicz & Flom, 2014) and reached maturity by age 7 (Hillock-Dunn et al., 2016), providing further evidence for the early development and swift maturation of temporal integration processing for speech stimuli.
Furthermore, the results of the present study are inconsistent with those of Noel et al. (2016) in Western cultures, who found that rapid audiovisual temporal recalibration does not reach maturity until adulthood; specifically, rapid temporal recalibration matured at age 18 for flash beeps and age 29 for syllables. The inconsistent results may stem from cultural differences, as Zhou et al. (2020) highlighted. The study of Noel et al. (2016) was conducted in Western cultures, while the present study and the study of Zhou et al. (2020) investigated age-related differences in rapid audiovisual temporal recalibration in the Chinese context. The earlier maturation observed in non-Western populations highlights the need to consider the potential effect of culture on this basic multisensory ability (e.g., Zhou et al., 2020). Interestingly, the present study also found that adults, as compared to children, exhibited a preference for referring to auditory information when perceiving both speech and non-speech asynchronous audiovisual stimuli. A previous cross-cultural study suggested that East Asians, compared with Western people, may exhibit a greater reliance on auditory cues during multisensory speech integration (Tanaka et al., 2010). Tanaka et al. (2010) proposed that the preference of East Asians for auditory information in speech stimuli may be partly attributed to their limited range of facial muscle movements, which restricts the availability of facial information. The different processing preferences for visual and auditory inputs in multisensory integration may underlie the differential processing of rapid audiovisual temporal recalibration across Eastern and Western cultures.
Besides, it is worth noting that the non-speech visual stimuli lack an offset cue (i.e., “hammer lift”), which may make it harder for individuals to determine the point in the video at which the hammer should make a sound, thereby interfering with their temporal integration and recalibration of the non-speech audiovisual stimulus. It cannot be ruled out that the later maturation for non-speech stimuli than for speech stimuli is attributable to this deficiency in the task materials; future research should validate this finding with refined tasks.
The relationship between rapid audiovisual temporal recalibration and reading
The present study found a significant correlation between rapid audiovisual temporal recalibration and reading fluency, and rapid audiovisual temporal recalibration explained unique variance in reading fluency. It is worth noting that irrespective of the type of stimuli (speech vs. non-speech) and the age group of participants (children vs. adults), rapid audiovisual temporal recalibration significantly predicted reading ability. One possible explanation for this effect is that rapid temporal recalibration generates prior knowledge by continuously receiving new temporal relationships (e.g., different SOAs between audiovisual stimuli), thereby affecting the perception of temporal sequence (perception of simultaneity, succession, and sequence of objective events; Sato & Aihara, 2011). Moreover, the increase in the TBW in the initial stage of temporal recalibration also reflects that the brain can mitigate discomfort caused by successive sensory stimuli by reducing the resolution of the perception of temporal sequence, thus facilitating the integration of perception and memory (Yuan & Huang, 2011). The temporal sequential processing of arranging the order of characters and their sounds is a crucial aspect of reading (Vidyasagar & Pammer, 2010), as it enables individuals to perceive character positions and overall glyphs, thereby promoting their acquisition of grapheme-phoneme correspondences (Hood & Conlon, 2004). Therefore, the stronger an individual’s ability for rapid temporal recalibration, the stronger their temporal sequence perception will be, thereby promoting the development of a series of reading-related skills such as grapheme-phoneme binding and ultimately enabling fluent reading.
Another possible explanation for this effect is that, within the Bayesian framework, the brain estimates the reliability of the information from each sensory channel by perceiving the interrelationship between information derived from different sensory channels. It then compares this prior knowledge with the stimuli arriving through the current sensory channels and adjusts the weighting of each channel’s information during multisensory integration to optimize decision-making (Ernst & Bülthoff, 2004). Temporal recalibration can be regarded as a dynamic process that adapts to varying temporal delays. Within a dynamic context, prior knowledge regarding the temporal relationship is re-established, subsequently affecting multisensory integration under different temporal relationships (Noel et al., 2016; Sato & Aihara, 2011). Specifically, the existing prior knowledge can produce top-down predictions (i.e., the internal sensory representation). The recalibration process aims to minimize the discrepancy between these top-down predictions and actual sensory inputs, thereby optimizing multisensory temporal integration (Noel et al., 2017). An appropriate audiovisual TBW is important for the proper binding of speech sounds to print when learning to read (Gori et al., 2020). Given the significant correlation between rapid audiovisual temporal recalibration and audiovisual temporal integration (Harvey et al., 2014; Noel et al., 2016; van der Burg et al., 2013), a strong ability for rapid audiovisual temporal recalibration may improve reading ability by flexibly modulating the temporal window.
Previous studies have found that individuals with dyslexia exhibited phonetic recalibration deficits at the verbal level (Keetels et al., 2018). Moreover, their rapid neural adaptation ability was comparatively weaker than that of typical readers (Perrachione et al., 2016), and they were unable to effectively integrate audiovisual temporal information according to the optimal Bayesian principle (Gori et al., 2020). Together, these findings provide evidence that rapid audiovisual temporal recalibration can affect reading ability.
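The reliability weighting invoked in the Bayesian account above is usually formalized as inverse-variance weighting: each channel's weight is proportional to the reciprocal of its variance, and the combined estimate is more precise than either channel alone. The sketch below illustrates that standard formula with made-up numbers; it is a generic illustration of optimal cue combination, not a model fitted in the present study, and the function name is hypothetical.

```python
import math


def combine_cues(estimates, sds):
    """Reliability-weighted combination of independent sensory estimates.

    estimates: per-channel estimates (e.g., perceived event time in ms).
    sds: per-channel standard deviations (lower SD means a more reliable channel).
    Returns (weights, combined_estimate, combined_sd).
    """
    reliabilities = [1.0 / sd ** 2 for sd in sds]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sd = math.sqrt(1.0 / total)
    return weights, combined, combined_sd
```

For instance, with estimates of 0 and 100 ms and SDs of 10 and 20 ms, the more reliable channel receives a weight of 0.8, so the combined estimate lies much closer to it than to the noisier channel.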
Furthermore, our results revealed that rapid audiovisual temporal recalibration (i.e., ΔPSS) explained the variance in reading ability better than the average indicator of audiovisual temporal integration (i.e., TBW). This indicates that rapid audiovisual temporal recalibration is a more effective indicator for investigating reading ability. Some previous studies have reported no significant differences in TBW or related indicators between dyslexic readers with impaired reading ability and typically developing readers (e.g., Francisco et al., 2014; Francisco, Groen, et al., 2017; Francisco, Jesse, et al., 2017; Laasonen et al., 2002). Similarly, Noel et al. (2017) found that individuals with autism spectrum disorders (ASD), who scored significantly lower on vocabulary tests than typically developing individuals, did not differ from typically developing individuals in TBW when integrating non-speech audiovisual stimuli. Nevertheless, their ΔPSS (representing the ability of rapid audiovisual temporal recalibration) was significantly smaller than that of typically developing individuals. Consequently, our results also provide a plausible explanation for why some previous studies on dyslexia failed to find a relationship between the average level of audiovisual temporal integration and reading ability (e.g., Francisco et al., 2014; Francisco, Groen, et al., 2017; Francisco, Jesse, et al., 2017; Laasonen et al., 2002).
Limitations and prospects
The present study has several limitations: (1) The present study focused exclusively on children approximately 10 years old. Future research should expand this investigation to the entire span of childhood, to delineate the developmental trajectory of rapid audiovisual temporal recalibration ability throughout childhood and establish comparisons with adults. (2) Only two types of stimuli were included in our experiments. It is worthwhile to investigate simpler non-speech stimuli (e.g., flash-beep stimuli) or more complex speech-related stimuli (e.g., words, sentences), which may exhibit distinct developmental trajectories. (3) Considering that the audiovisual SJ task requires monotonous and continuous button pressing for 10 ∼ 12 minutes, the present study treated stimulus type (speech vs. non-speech) as a between-subject factor to prevent children from becoming bored and responding carelessly. Future studies may benefit from including stimulus type as a within-subject factor to facilitate a more comprehensive comparison between speech and non-speech stimuli (e.g., the interaction of age and stimulus type). (4) Our main goal was to reveal the relationship between rapid audiovisual temporal recalibration and reading ability. Clarifying causality was beyond our scope and remains a critical question for future research. Investigating whether individuals with dyslexia exhibit deficits in rapid audiovisual temporal recalibration, and whether training this ability can improve the reading ability of dyslexic readers, will help address this question.
Conclusions
The present study found that children, approximately 10 years old, exhibit a strong rapid audiovisual temporal recalibration ability for speech and non-speech stimuli. In the case of speech stimuli, no significant differences were observed between children and adults regarding ΔTBW and ΔPSS. Nevertheless, the trial-by-trial recalibration process of rapid audiovisual temporal recalibration in children exhibits potential for further development. For non-speech stimuli, significant differences between children and adults were observed in both the ΔPSS indicator and the trial-by-trial recalibration process. Moreover, the rapid audiovisual temporal recalibration ability for both speech and non-speech stimuli explained reading fluency uniquely in both children and adults.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0142716425000098
Replication package
Study materials, data, and analysis codes are available at https://doi.org/10.6084/m9.figshare.25534639.v2.
Competing interests
The authors declare that no competing interests exist.
Ethics approval statement
This study was approved by the Ethics Committee of the Institute of Psychology, Chinese Academy of Sciences.