For children enrolled in a dual language immersion (DLI) program in the United States, instruction is delivered in English and another language, often referred to as the partner language. Research on DLI students has tended to focus on the development of their language(s) (i.e., English and/or the partner language) in different domains (e.g., listening or reading), without necessarily differentiating between different features of language within each domain (i.e., word, sentence, and discourse). In this study, we examine the progression of oral language skills in a picture description task in both languages for early-elementary students enrolled in a French-English DLI program, where daily instructional time is split equally between French and English.
Background
Language development in bilingual children
Language development can be understood as a complex adaptive system. This understanding follows a set of assumptions about language learning and use that considers the dynamic, adaptive, and nonlinear nature of how language is used and develops (Beckner et al., 2009). By taking this approach, language learning is seen within a framework that considers multiple features of language use as well as the social context of the learner (Bailey & Heritage, 2014). As young children develop language from initial recognition of sounds to oral production and beyond, a number of factors may come into play that influence, sometimes in dramatic ways, the course of language development. Listening and oral production are key in that they serve as precursors to later development of reading and writing (e.g., Dickinson et al., 2003; Speece et al., 1999). A number of pattern-identifying skills help young children acquire an awareness of sounds, their meanings, and rules governing their association. This, in turn, leads to children’s acquisition of vocabulary and the ability to form a variety of different sentence structures and increasingly complex and extended discourse such as conversation, narration, explanation, and argumentation (e.g., Bailey, Osipova, et al., 2015). These oracy skills are later applied to the decoding necessary for reading and writing, for which oral production is a necessary precursor (Dickinson et al., 2003).
For bilingual children, language development largely mirrors that of their monolingual peers, with some notable differences. Bilingual children acquire two (sometimes very different) systems, which interact, each supporting the development of the other (Footnote 1). For example, phonological systems acquired by bilinguals may transfer positively or negatively between languages, with the phonological development in one language serving in the reading and development in the other, including for languages that do not share the same orthography (Hambly et al., 2013). Bilingual children’s vocabulary develops as two systems; however, they may not necessarily acquire two words for one underlying concept, meaning that the size of their vocabulary may be smaller in one language but larger when considering both (Oller et al., 2007). Sentence production for bilinguals seems to occur at the same rate as monolinguals, though this depends on exposure to each language, and one language may serve as a transfer to the other, with a silent period of production in the other language influenced by several external and internal factors (Gathercole & Thomas, 2009). For bilingual children, the development of narration, conversation, and argumentation is predicated on their development of word, sentence, and discourse features, which may differ between languages and are culturally embedded (e.g., Fiestas & Peña, 2004). Finally, bilingual children demonstrate pragmatic abilities not seen in the monolingual child, including code switching, which is the highly contextually dependent and rule-governed switch from one language to the other to convey meaning (Martínez & Mejía, 2019).
Language progressions of DLI students
Most existing studies on the effects of bilingual education in the U.S. focus on emergent bilingual populations, whose language skills in English and in the home language benefit from enrollment in such programs (e.g., Rolstad et al., 2005). For DLI students specifically, the first few years are characterized by an increased proficiency in listening and speaking for the partner language (Fortune & Tedick, 2015). Program enrollment also has long-term benefits on English language development (e.g., Steele et al., 2017). DLI students display increased levels of proficiency in the partner language from elementary to secondary grades (Watzinger-Tharp et al., 2021), and for elementary-aged students, progress is especially apparent in listening and reading (Watzinger-Tharp et al., 2018). Heritage language status has also been found to influence partner language trajectories, especially in partner language instruction (e.g., Lindholm-Leary & Hernández, 2011; Xu et al., 2015). For instance, at the preschool level, Spanish-dominant children had higher scores than bilinguals who had relatively comparable proficiencies in their languages at the beginning of the school year, and both groups displayed higher rates of growth in Spanish when receiving Spanish-only instruction as opposed to bilingual or English-only instruction (Durán et al., 2022).
In a separate study with fourth-grade students in an English-Spanish DLI program, coming from a household where Spanish is spoken was associated with higher listening and writing performance, but similar reading and speaking performance compared to peers from non-Spanish-speaking households (Burkhauser et al., 2016). A recent study similarly found that home exposure to Gaelic in a Gaelic-English revitalization/immersion program was associated with higher vocabulary output and size, though this also depended on time of exposure to the partner language (Chondrogianni et al., 2022).
While the aforementioned studies focused on the progression of all language domains (i.e., speaking, writing, listening, and reading), this study is specifically concerned with speaking and the separate trajectories for the development of word, sentence, and discourse skills. Research shows that for children in immersion settings, their oral language development in the partner language is often characterized by “lower-than-expected production skills in terms of grammatical accuracy, lexical variety, and sociolinguistic appropriateness” (Lyster & Tedick, 2022, p. 329). Fortune and Tedick (2015), analyzing Spanish oral fluency, grammar, vocabulary, and listening comprehension in English-dominant students enrolled in two-way immersion programs, found significant differences across elementary grade levels but not between fifth and eighth graders. In another study, Fortune and Ju (2017) found no significant differences between second and fifth grade students on standardized assessments of Mandarin. However, differences emerged when conducting a follow-up linguistic complexity analysis of representative speech samples: progress was made in grammar but not lexicon. Such findings highlight the need for finer-grained measures of oral language development. Indeed, it remains to be seen whether different levels of oral language (i.e., word, sentence, and discourse) progress at different rates.
Prior use of language progressions in speaking and writing tasks has revealed various rates of development in discrete domains of language and on tasks of differing themes or contexts (e.g., Bailey, 2017; Bailey & Heritage, 2019). For multilingual students learning English in particular, language progressions can be influenced by factors such as mode (i.e., oral vs. written), type of task (i.e., academic vs. non-academic), and language feature (i.e., vocabulary, sentence structure and discourse coherence/cohesion). For instance, one study reported greater strengths in written explanations compared to oral explanations for topic vocabulary (Blackstock-Bernstein et al., 2022), while another study reported strengths in English oral vocabulary but slower development of discourse-level oral skills in elementary students’ explanations (Bailey, Blackstock-Bernstein, et al., 2015). However, to our knowledge, research on language progressions and the detailed advancements in students’ language learning has yet to be conducted across different languages in the context of DLI programming.
Oral language skills in picture description tasks
This study focuses on early oral language skills used in picture description tasks, which have been linked to literacy development (Dickinson & Snow, 1987; Griffin et al., 2004; Snow et al., 1995). Picture description tasks are especially helpful to track language development “because they elicit samples of language that share the same content (…), thus facilitating direct comparisons of the linguistic characteristics of the samples”, and because “the amount of textual input can be kept minimal and so learners need to rely on their own linguistic resources” (Boers, 2018, p. 375). Thus, having children all describe the same pictures allows us to control the content of their language while maximizing their opportunities to use their own language skills.
Picture description tasks have been used with various populations, including neurotypical adults (e.g., Boucher et al., 2019), individuals with language and communication disorders (e.g., Eisenberg & Guo, 2013; Vandenborre et al., 2018), adult L2 learners (e.g., Saito & Hanzawa, 2018), and bilingual preschoolers (e.g., Smith et al., 2021). For neurotypically developing children, De Temple and Beals (1991) identify the key characteristics of a successful oral decontextualized picture description task. These characteristics include quantity (i.e., amount of talk), specificity (i.e., “the degree of explicitness and correctness in the child’s description,” p. 475), density (i.e., “the linguistic complexity and elaboration” of the description, p. 475), main theme (i.e., whether the key elements of the picture appear in the description), and narrativity (i.e., the extent to which the child interpreted the task as a straight description or as a story-like narrative). In other words, a good picture description will be thorough, will contain sophisticated topic vocabulary, and will be organized in a coherent and cohesive way. Compared to other oral language tasks, Nurss and Hough (1985) note that “single pictures are very useful for producing labels and descriptive words, that is, for increasing vocabulary. They are not particularly useful, however, for helping children gain a concept of a story or more complex narrative structure” (p. 284). That is, as a task, single picture descriptions seem to highlight children’s lexical skills, but not necessarily their discourse-level skills.
In addition, children may adjust their language based on whether they are describing the picture in a contextualized or decontextualized condition (i.e., for a present or an absent audience). For example, in a recent study, Cho and Kim (2023) found that linguistic and discourse features varied based on context, as “second graders used more ENPs (elaborated noun phrases) and exhibited precise character introduction in the decontextualized setting, whereas higher lexical diversity and discourse beyond simple description occurred more often in the contextualized setting” (p. 10).
Within the context of bilingual programming, there is a dearth of research on the role played by home language status on oral language skills used in picture description tasks, with the exception of one study. The participants in Wu et al.’s (1994) study were fifty-two second- through fifth-grade children attending the United Nations International School, where the curriculum is in English, with French taught as a foreign language. Participants provided written and oral picture descriptions in French and English, both in contextualized and decontextualized conditions. Results suggested that household language background influenced the quality of the picture description in French, but not in English. Indeed, on the French task, children from monolingual English households performed less well in measures of quantity (counts of total words), specificity (explicitness and correctness), and main theme (key elements of the picture) than children from bilingual households where French was not spoken. However, there were no significant differences between children from monolingual French households and children from bilingual households where French was spoken. It remains to be seen whether such findings would hold at a younger age, and also in another bilingual programming context, such as DLI.
To summarize, while enrollment in a DLI program is associated with growth in speaking in both languages, there is limited research on the progression of more fine-grained domains of language at the word, sentence, and discourse levels and the developmental trajectories of the two languages. Indeed, as the literature shows, children use different oral language skills in picture description tasks, and for those enrolled in a bilingual program, their performance on such tasks may be influenced by their home language(s).
Theoretical framework
This research is guided by complex dynamic systems theory (CDST), “a post-structural metatheory with its own ontology and epistemology” (Larsen-Freeman & Todeva, 2021, p. 209). CDST highlights the complex synergy between experiences such as language exposure in the home, social interactions such as use of language in DLI classrooms, and cognitive processes at play in individual language systems (Beckner et al., 2009). Language development is viewed as a self-organizing process in which a complex multifaceted system is in constant interaction with the environment. CDST is “fundamentally concerned with describing and tracing emerging patterns in dynamic systems in order to explain change and growth” (Larsen-Freeman, 2020, p. 248), thus viewing language development as a dynamic process that evolves over time, with variation expected both within and across individuals (De Bot et al., 2007; Larsen-Freeman, 2011).
The original design of the language learning progressions used in the current study was made on this developmental premise (Bailey & Heritage, 2014). This perspective emphasizes the role played by initial conditions and posits that subsequent language development is expected to happen in a nonlinear fashion (De Bot et al., 2007), which influences our approach to analysis of the longitudinal data collected for this study. CDST is also particularly relevant for research on multilingual individuals (Larsen-Freeman & Todeva, 2021) given its emphasis on the role played by the environment on the longitudinal development of languages, as well as on the complexity and fluidity afforded to multilinguals in deploying their languages and literacy practices depending on the context and interlocutor.
This study examines the French and English oral language range of performances and trajectories over time of early-elementary school children enrolled in a French-English DLI program in a large city of the southwestern US. This study explores the following questions:
- RQ1a. What are the French growth patterns as measured by learning progressions for early-elementary DLI students across 11 months of programming?
- RQ1b. What are the English growth patterns as measured by learning progressions for early-elementary DLI students across 11 months of programming?
- RQ1c. What is the relationship between French and English growth patterns as measured by learning progressions for early-elementary DLI students across 11 months of programming?
- RQ2a. How are French progressions associated with grade level and household language background?
- RQ2b. How are English progressions associated with grade level and household language background?
Methods and data sources
Data for this study came from a larger project on the French–English language development of early-elementary students in a DLI program (see Ryan, 2021a, 2021b).
Participants
Participants included 42 children (21 male; 21 female) who attended a French–English two-way DLI program in a public school in the Southwestern United States, with time equally divided between French and English, every other day across all content areas. When the study began, the participants, whose ages ranged between 5 years; 1 month and 7 years; 6 months, were in transitional Kindergarten (TK; n = 4; Footnote 2), Kindergarten (K; n = 20), and first grade (n = 18). At the time of the study, the TK and Kindergarten students were placed in the same classroom and were thus merged together for analysis. According to a survey filled out by the parents, among the 24 TK/K participants, 10 (42%) came from multilingual households where French was spoken, seven (29%) came from multilingual households where French was not spoken, and seven (29%) came from monolingual English households. Among the 18 first-grade participants, six (33%) came from multilingual households where French was spoken, five (28%) came from multilingual households where French was not spoken, and seven (39%) came from monolingual English households.
Participants were assessed five times over an eleven-month period, including three times in the winter and spring, and two times in the fall, after the summer hiatus. Of the 42 participants, 39 took part in all five waves of data collection, one took part in three waves of data collection, and two took part in only one wave of data collection.
Procedures
Overview
Children were asked to provide picture descriptions to an imaginary friend who only spoke French (for the French session) or English (for the English session). Children were instructed to describe the picture in the target language (i.e., French or English) so that the imaginary friend could later draw it without looking at it. Participants were then presented with two different pictures: one depicting a school scene (the “school picture”), where a child is performing show-and-tell in a classroom, and the other depicting a domestic scene (the “home picture”), where a child is doing dishes in a kitchen. During the first wave of data collection, half of the students were presented with the school picture first, while the other half were presented with the home picture. The order of the pictures (i.e., school or home scene) was alternated at each of the five waves, as was the order of the target language (i.e., French or English). Students’ responses were later transcribed by a research assistant and then verified by a researcher.
Coding
After language samples were collected, researchers rated students’ descriptions using a modified version of the Linguistic Features Analysis Protocol (LFAP) of the Dynamic Language Learning Progressions (DLLP) approach (Bailey & Heritage, 2014; Bailey, 2017) for the progression of several language features including the sophistication of topic vocabulary (vocabulary choices keyed to adult responses to the picture tasks), the sophistication of sentence structure (increasingly complex and inclusion of more varied sentence structures), stamina (completeness of the picture descriptions), and coherence/cohesion of the description. With regard to the latter, as a form of discourse, descriptions may exceed the boundaries of a single sentence, requiring both coherence and cohesion among sentences and ideas. Together, coherence and cohesion assist the listener with understanding a student’s description. Coherence features included temporal discourse connectors (e.g., “first”, or “next” in English; “d’abord,” or “ensuite” in French) and location markers (e.g., “on”, or “next to” in English; “sur”, or “à côté de” in French) to logically organize information in descriptions. Cohesion features included referential ties across and within sentences, such as use of pronouns to refer to previously introduced nouns and ellipses to create a parsimonious description. The presence of both coherence and cohesion markers was needed for a description to be placed beyond the Early Emerging phase, as explained below. After reviewing the French samples and given the low French proficiency of most participants, the phases of the original DLLP were expanded to include additional phases at the early stages of the progression.
As can be seen in Appendix A (see OSF page), each linguistic feature except for coherence/cohesion could thus be placed at one of seven phases of development: Not yet evident, Pre-emergent, Early emerging, Emerging, Early developing, Developing, or Controlled. Coherence/cohesion could be placed at one of five phases: Not yet evident, Early emerging, Emerging, Developing, or Controlled.
Three raters proficient in both French and English compared the language features of each picture description transcript to the corresponding features on the language progression and selected the phase that best matched. Rater 1 served as the anchor rater and coded all the transcripts. Raters 2 and 3 each independently coded half of the transcripts. Consensus was reached whenever there was a disagreement between Rater 1 and the other rater. The number of agreements between the two ratings was divided by the total number of agreements and disagreements before consensus. The proportion of inter-rater agreements (IRA) between Raters 1 and 2 ranged from 80% to 90% in French and 73% to 95% in English. The proportion of IRA between Raters 1 and 3 ranged from 88% to 96% in French and 76% to 91% in English. The majority of IRA calculated comfortably surpass the threshold (75%) set as acceptable by Chaturvedi and Shweta (2015), particularly in the case of the large number of adjacent levels.
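The agreement calculation described above (agreements divided by agreements plus disagreements, before consensus) can be sketched as follows. This is a minimal illustration only: the function name `percent_agreement` and the phase codes are hypothetical and are not taken from the study's data or scripts.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share of transcripts on which two raters selected the same phase."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same transcripts")
    if not ratings_a:
        raise ValueError("need at least one rated transcript")
    # Every transcript is either an agreement or a disagreement, so the
    # denominator (agreements + disagreements) is the number of transcripts.
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return agreements / len(ratings_a)


# Hypothetical phase codes (0 = Not yet evident ... 6 = Controlled), not study data.
rater_1 = [2, 3, 3, 4, 1, 5, 2, 3, 4, 4]
rater_2 = [2, 3, 4, 4, 1, 5, 2, 2, 4, 4]

ira = percent_agreement(rater_1, rater_2)
print(f"IRA = {ira:.0%}")
```

Exact agreement is a strict criterion for a seven-phase scale, since a one-phase difference counts as a full disagreement.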
Analysis
Bayesian framework overview
Many perspectives exist that underlie the scientific rationale for using Bayesian methods (Levy & McNeish, 2021). For example, a model may be too complex for frequentist estimation methods, such as maximum likelihood. In such instances, researchers view prior specification as not something integral to their scientific question but rather just a necessary step to carry out Bayesian estimation. Other researchers instead see prior specifications as central to the scientific question at hand. The perspective we adopt here can be viewed as somewhere in the middle, as our incorporation of prior information is neither an afterthought nor is it central to answering our research questions. Instead, our use of prior information, which we discuss in Appendix B (see OSF page), is intended to (a) help our models converge and (b) protect our inferences from implausible findings. Priors used for this purpose are commonly referred to as weakly informative priors (e.g., Gelman et al., 2008). What constitutes a weakly informative prior varies based on the statistical model and the available substantive information. To illustrate, a critical aspect of our prior information incorporation involved applying half-Cauchy priors to random effect standard deviations. This prior choice stems from our understanding that negative standard deviations are impossible. The use of such priors holds particular significance in our current study due to the frequent occurrence of negative standard deviations when employing frequentist estimation methods with limited sample sizes.
Before answering research questions, model comparisons were conducted. For research question 1, model comparisons helped us choose between the random linear growth and random quadratic growth models to then (a) inspect the point estimates and 95% credible intervals (CIs) and (b) visualize the growth trajectories. For research question 2, model comparisons helped us choose between models 1 and 2 for the inspection of point estimates and 95% CIs corresponding to the effects of grade and home language. In this paper, models were compared using the expected log pointwise predictive density (ELPD; Vehtari et al., 2017), a measure of overall fit and out-of-sample predictive accuracy (Bendixen & Purzycki, 2022). Using the loo package in R, the ELPD and its standard error were calculated using approximate leave-one-out cross-validation (Vehtari et al., 2018). We do not focus on the ELPD itself but rather on the ELPD difference and its standard error in order to facilitate model comparisons (Footnote 3).
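To illustrate why a half-Cauchy prior rules out negative standard deviations, the sketch below draws from a half-Cauchy distribution by taking the absolute value of Cauchy draws. It is an illustration only: the scale value is hypothetical (the scales actually used are reported in Appendix B), and `half_cauchy_draw` is a helper introduced here, not part of the study's code.

```python
import math
import random


def half_cauchy_draw(scale, rng):
    """One draw from half-Cauchy(0, scale): the absolute value of a Cauchy draw."""
    u = rng.random()  # uniform on [0, 1)
    # Inverse-CDF draw from a Cauchy(0, scale), folded onto the non-negative axis.
    return abs(scale * math.tan(math.pi * (u - 0.5)))


rng = random.Random(1)
SCALE = 1.0  # hypothetical scale, chosen only for this illustration

draws = [half_cauchy_draw(SCALE, rng) for _ in range(10_000)]

# All mass sits at or above zero, and about half of it falls below the scale,
# while the heavy right tail still permits large standard deviations.
share_below_scale = sum(d <= SCALE for d in draws) / len(draws)
print(min(draws) >= 0.0, round(share_below_scale, 2))
```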
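The comparison of two models by ELPD difference can be sketched with a common paired-difference formula: the difference is the sum of pointwise differences, and its standard error is the square root of the number of observations times the standard deviation of those differences. The pointwise values below are hypothetical, and `elpd_difference` is a helper named here for illustration; in the study these quantities came from the loo package in R.

```python
import math
import statistics


def elpd_difference(pointwise_a, pointwise_b):
    """Return (ELPD_a - ELPD_b, standard error) from pointwise ELPD values."""
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    elpd_diff = sum(diffs)
    # Paired standard error: sqrt(n) times the sample SD of the differences.
    se_diff = math.sqrt(len(diffs)) * statistics.stdev(diffs)
    return elpd_diff, se_diff


# Hypothetical pointwise ELPD values for five observations under two models.
linear_model = [-1.2, -0.8, -1.5, -0.9, -1.1]
quadratic_model = [-1.0, -0.9, -1.3, -1.0, -1.0]

diff, se = elpd_difference(linear_model, quadratic_model)
print(f"ELPD difference = {diff:.2f}, SE = {se:.2f}")
```

A difference that is small relative to its standard error gives little reason to prefer the more complex model.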
Besides model comparisons, each research question was assessed with reference to summaries of the posterior (i.e., posterior means and 95% CIs). Like frequentist point estimates, the Bayesian posterior mean is a single-number estimate of the strength of association between two variables. For example, $\gamma_{01}$ is the effect of being in first grade as opposed to TK/K. Positive values of $\gamma_{01}$ denote that being in first grade is associated with an increased probability of high ratings for each language feature, whereas negative values of $\gamma_{01}$ denote that being in first grade is associated with a lower probability of high ratings on each language feature. Similarly, $\gamma_{02}$ is the effect of home language background. For the English progressions, this means the effect of being from an English-only household. However, for the French progressions, this means the effect of having any French spoken in the household. Positive values of $\gamma_{02}$ denote that being from an English-only household is associated with an increased probability of high ratings for each language feature for pictures described in English. Positive values of $\gamma_{02}$ denote that being from a French-speaking household is associated with an increased probability of high ratings for each language feature for pictures described in French. Like frequentist point estimates, Bayesian posterior means should be assessed jointly with measures of uncertainty (i.e., 95% CIs) to infer whether the association is “significant” or “meaningful.” Random effect standard deviations and correlations are considered meaningful when their 95% CI does not overlap with zero. Because all other parameters are reported on the odds scale, they are considered meaningful when their 95% CI does not overlap with one. Using CIs, we can state that there is a 95% probability that a parameter’s true value lies between the interval’s upper and lower bounds.
In addition to being useful for hypothesis testing, CIs help assess the variability of an effect. The wider the CI, the less certain we are about the magnitude of the effect.
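The decision rule described above for parameters on the odds scale can be sketched as follows. The posterior draws are simulated from a hypothetical normal distribution on the log-odds scale purely for illustration; they are not estimates from this study, and `summarize_odds_ratio` is a helper named here for the example.

```python
import math
import random
import statistics


def summarize_odds_ratio(draws):
    """Posterior mean and equal-tailed 95% credible interval from draws."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.fmean(draws), cuts[0], cuts[-1]


rng = random.Random(0)
# Hypothetical posterior draws of a log-odds coefficient, exponentiated so
# that each draw is an odds ratio.
odds_ratio_draws = [math.exp(rng.gauss(0.9, 0.3)) for _ in range(4000)]

mean, lower, upper = summarize_odds_ratio(odds_ratio_draws)

# On the odds scale, an interval that excludes 1 is treated as meaningful.
meaningful = lower > 1.0 or upper < 1.0
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}], meaningful = {meaningful}")
```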
Models
All research questions were assessed using a combination of both models presented below, each representing some form of a hierarchical ordinal logistic regression model (Hedeker & Gibbons, 1994). Our use of ordinal regression is justified because DLLP progressions may be thought of as coarse representations of continuous underlying variables (Bauer & Sterba, 2011). In addition, treating ordinal outcomes as continuous may be problematic, and as the descriptives in Figure 1 show, our outcomes are not normally distributed (Ali et al., 2016; Bauer & Sterba, 2011; Hung & Huang, 2011). Finally, treating ordinal outcomes as continuous in scenarios when data are nested may exacerbate issues related to inflated or spurious estimates of random slope and random quadratic variances (see Bauer & Cai, 2009; and Bauer & Sterba, 2011, for more information), both of which are relevant to RQ1 concerning DLLP growth patterns.
Instead of assuming the outcome is continuous as in traditional linear regression models, the outcome in ordinal regression models (Footnote 4) is treated as an ordered set of discrete responses (e.g., a Likert scale response). Changes in predictor variables result in shifts toward either end of the ordinal scale, which, in the context of our study, means that positive coefficients increase the probability of higher DLLP ratings on the learning progression (i.e., Early developing, Developing, and Controlled) as opposed to lower DLLP ratings (i.e., Not yet evident, Pre-emergent, Early emerging, and Emerging).
We use ordinal logistic regression (McCullagh, 1980), which is also referred to as the cumulative logit model or proportional odds model and is one of several models available for ordinal data. A benefit of ordinal logistic regression is that its coefficients may be transformed into cumulative odds ratios, which makes point estimates and CIs easier to interpret. An important assumption of this model is that each estimated coefficient is invariant across the cumulative splits of the ordinal outcome. Put differently, it is assumed that the effect of being in Grade 1 as opposed to TK/K is the same at every phase boundary of a language progression, and likewise that the effect of home language background is the same at every phase boundary.
Because our data are nested (i.e., assessment occasion nested within students), traditional ordinal regression would violate the independence assumption (e.g., Bauer & Sterba, 2011), hence our decision to use a hierarchical ordinal regression model, which can easily accommodate the complex growth models considered in our analyses.
Equations depicting the two models fit to each language feature can be seen in Table 1. Both models allow for student-level intercepts $\beta_{0j}$ to vary, which reflects our prior belief that baseline propensity to receive a high rating varies considerably across this sample of students (see Ryan, 2021a, 2021b). Further, as can be seen in each model’s level 2 equation, each student’s baseline propensity to receive a high rating is a function of Grade (0 = TK/K; 1 = Grade 1) and Home Language effect (French-speaking or English-only-speaking household), denoted as $\gamma_{01}$ and $\gamma_{02}$, respectively. Both $\gamma_{01}$ and $\gamma_{02}$ were used to answer RQ2 (Footnote 5). Additionally, the coefficients $\beta_{2j}$ (Model 1) and $\beta_{3j}$ (Model 2) in the level 1 equations depict the change in expected propensity to receive a high rating when assessed with the school versus the home picture, which we could control for when interpreting coefficients relevant to our research questions. Finally, both models’ level 2 equations for $\beta_{0j}$ specify that student-level intercepts are allowed to vary. The average student intercept $\gamma_{00}$ is a fixed effect, and random student deviations $u_{0j}$ are random effects. Instead of individual random effects, a variance component $\tau_{00}$ is estimated. High values of $\tau_{00}$ indicate that students differ greatly in baseline proficiency, whereas values of $\tau_{00}$ approaching zero indicate that student baseline proficiency is relatively homogeneous.
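The invariance assumption can be illustrated numerically. The sketch below assumes the cumulative logit parameterization in which the log-odds of a rating at or below a given phase equal a threshold minus the linear predictor. The threshold values and the coefficient are hypothetical, chosen only to show that the cumulative odds ratio for a predictor is identical at every phase boundary.

```python
import math


def cumulative_odds_above(threshold, eta):
    """Odds of a rating above the boundary when logit P(rating <= c) = v_c - eta."""
    p_at_or_below = 1 / (1 + math.exp(-(threshold - eta)))
    return (1 - p_at_or_below) / p_at_or_below


# Hypothetical values: six boundaries separate seven phases.
thresholds = [-2.0, -0.5, 0.5, 1.5, 2.5, 3.5]
grade_coef = 0.7  # hypothetical log-odds effect of Grade 1 versus TK/K

# Odds ratio (Grade 1 versus TK/K) of exceeding each phase boundary.
ratios = [
    cumulative_odds_above(v, grade_coef) / cumulative_odds_above(v, 0.0)
    for v in thresholds
]

# Under proportional odds, every ratio equals exp(grade_coef).
print([round(r, 4) for r in ratios], round(math.exp(grade_coef), 4))
```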
Note. ${\eta _{jt}}$ denotes the jth student’s linear predictor at time t, where ${j = 1 \ldots 42\ {\rm{and}}\ t = 1 \ldots 5}$ . With the help of ${C = {L_O} - 1}$ threshold parameters, v, or ${v^1}, \ldots {v^c}, \ldots {v^C}$ , where ${L_O}$ is the number of levels observed in our data set (see Appendix B), the linear predictor can be used to predict the cumulative probability that the jth student at time t receives a rating of c or lower $\pi _{jt}^c = {{\exp \left( {{v^c} - {\eta _{jt}}} \right)} \over {1 + \exp \left( {{v^c} - {\eta _{jt}}} \right)}}$ . Because threshold parameters $v$ and the average student intercept ${\gamma _{00}}$ are not jointly identified, we set $\gamma_{00}$ to zero in both models.
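As a rough illustration of the note above, the cumulative probabilities can be differenced to obtain the probability of each individual rating. The sketch below is not the code used for the analyses (the models were fit with brms and Stan); the function name `category_probabilities` and the threshold values are hypothetical and chosen only to show the mechanics.

```python
import math


def category_probabilities(thresholds, eta):
    """Return P(rating = c) for each of the C + 1 ordered categories.

    thresholds: increasing cut points v^1 < ... < v^C (hypothetical values)
    eta: linear predictor for one student at one assessment occasion
    """
    # Cumulative probability of a rating of c or lower: logistic(v^c - eta)
    cumulative = [1.0 / (1.0 + math.exp(-(v - eta))) for v in thresholds]
    cumulative.append(1.0)  # the highest category closes the distribution
    probs = []
    previous = 0.0
    for p in cumulative:
        probs.append(p - previous)
        previous = p
    return probs


# Hypothetical thresholds for a four-category rating scale
example_thresholds = [-1.0, 0.5, 2.0]
baseline = category_probabilities(example_thresholds, eta=0.0)
higher = category_probabilities(example_thresholds, eta=1.5)
```

Because the linear predictor is subtracted from each threshold, a larger value of the linear predictor shifts probability away from the lowest rating and toward the highest one, which is why positive coefficients are read as a greater propensity to receive a high rating.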
Differences between the two models can be seen in both the level-1 and level-2 equations in Table 1, as each model parameterizes growth differently. Model 1, the random linear growth model, contains three parameters that pertain to growth. The first is the average rate of linear growth ${\gamma _{10}}$ , which is the average, per-month, linear rate of growth expected when controlling for all other variables in the model, i.e., Grade, Home Language, and Picture Type. This parameter may be found in Table 1, and it is added to student-level differences in linear growth ${u_{1j}}$ to produce student-level growth rates ${\beta _{1j}}$ (i.e., ${\beta _{1j}}\; = \;{\gamma _{10}}\; + \;{u_{1j}}$ ). With ${\gamma _{10}}$ as the average, or expected, rate of linear growth, the student variation in linear growth is governed by the second growth parameter of the random linear growth model, $Var\left( {{u_{1j}}} \right) = {\tau _{11}}$ . In other words, ${\tau _{11}}$ is the linear growth variance component, and it represents the variance in the rate of linear growth among students. Large values of ${\tau _{11\;}}$ indicate considerable student-level differences in growth, and low values indicate that most students’ growth rates ${\beta _{1j}}$ are close to ${\gamma _{10}}$ . The third growth parameter of the random linear growth model, $Cov\left( {{u_{0j}},{u_{1j}}} \right) = {\tau _{01}} = {\tau _{10}}$ , is the covariance between student variation in baseline proficiency ${u_{0j}}$ and linear growth ${u_{1j}}$ . A positive value of ${\tau _{01}}$ indicates that students with a high initial proficiency ${\beta _{0j}}$ also have high rates of linear growth ${\beta _{1j}}$ , and a negative ${\tau _{01}}$ would indicate that students with a low ${\beta _{0j}}$ tend to have higher ${\beta _{1j}}$ .
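Because the notes to Tables 2 through 5 state that random effects are reported as standard deviations and correlations rather than as variance components, it may help to recall the standard relation between the two scales. Writing $\rho_{01}$ for the correlation between $u_{0j}$ and $u_{1j}$ (a symbol used here for convenience), the variance components defined above map onto the reported quantities as follows:

$$SD\left( u_{0j} \right) = \sqrt{\tau_{00}}, \qquad SD\left( u_{1j} \right) = \sqrt{\tau_{11}}, \qquad \rho_{01} = \frac{\tau_{01}}{\sqrt{\tau_{00}\,\tau_{11}}}$$

The sign of $\rho_{01}$ always matches the sign of $\tau_{01}$, so a negative correlation carries the same interpretation as a negative covariance: students with higher baseline proficiency tend to have lower linear growth rates.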
In addition to the three growth parameters ${\gamma _{10}},\;{\tau _{11}},\;{\text{and}}\;{\tau _{01}}$ found in model 1, model 2, the random quadratic growth model, contains four additional growth parameters, namely ${\gamma _{20}},\;{\tau _{22}},\;{\tau _{02}},\;{\tau _{12}}$ . The first, ${\gamma _{20}}$ , which is the population average quadratic effect, represents the degree to which growth over the 11 months of programming decelerates or accelerates. More specifically, positive values of ${\gamma _{20}}$ indicate that average student growth accelerates over time, and negative values of ${\gamma _{20}}$ indicate that student growth decelerates over time. The remaining three growth parameters can all be seen in the variance covariance matrix in Table 1. First, $Var\left( {{u_{2j}}} \right)\; = \;{\tau _{22}}$ represents the amount of between-student variation in levels of growth in terms of acceleration or deceleration. Finally, $Cov\left( {{u_{0j}},\;{u_{2j}}} \right)\; = \;{\tau _{02}}\; = \;{\tau _{20}}$ represents the amount of covariation between student-level differences in the intercept and the quadratic effect, while $Cov\left( {{u_{1j}},\;{u_{2j}}} \right)\; = \;{\tau _{12}}\; = \;{\tau _{21}}$ represents the amount of covariation between the student-level linear growth rate and the quadratic effect.
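One way to read the two average growth parameters together is to look at the change in the linear predictor from one month to the next. Table 1 is not reproduced here, so the following assumes that time enters the level 1 equation of Model 2 as $t$ and $t^2$, with the other predictors held constant:

$$\left[ \gamma_{10}(t+1) + \gamma_{20}(t+1)^2 \right] - \left[ \gamma_{10}t + \gamma_{20}t^2 \right] = \gamma_{10} + \gamma_{20}(2t+1)$$

On the odds scale, this corresponds to multiplying the odds of a higher rating by $\exp(\gamma_{10})\,\exp(\gamma_{20})^{2t+1}$ between month $t$ and month $t+1$. Under this assumed parameterization, a positive $\gamma_{20}$ makes the month-to-month multiplier grow with $t$ (acceleration), and a negative $\gamma_{20}$ makes it shrink (deceleration).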
Results
Each of the two models was fit to each language (i.e., French and English) by language feature combination (i.e., vocabulary sophistication, sentence sophistication, stamina, and coherence/cohesion), resulting in 16 total sets of results. All models were estimated in R (R Core Team, 2021) using the probabilistic programming language Stan (Carpenter et al., Reference Carpenter, Gelman, Hoffman, Lee, Goodrich, Betancourt, Brubaker, Guo, Li and Riddell2017; Stan Development Team, 2022), which implements the No-U-Turn Sampler (NUTS; Hoffman and Gelman, Reference Hoffman and Gelman2014; Betancourt, Reference Betancourt2017), an adaptive variant of Hamiltonian Monte Carlo (HMC) sampling. Additionally, the R package brms (Bürkner, Reference Bürkner2017) was used to generate Stan scripts. Each set of results was estimated using 6000 posterior samples, 2000 of which were warm-up draws. Further, to check the validity of the posterior means and 95% CIs, four chains were used, and results indicated convergence. To assess model-data fit, posterior predictive checks were performed and visually inspected. We found that all models fit the data adequately. Finally, to ensure our findings are robust to our choice of priors, we conducted a sensitivity analysis and found that the use of an alternative set of priors had no impact on our results. (For more information on convergence, posterior predictive checks, and our sensitivity analysis, see Appendices C, D, and E on the OSF page, which also contains the data and syntax used for our analyses.)
Bayesian analysis results
RQ 1a – French growth patterns
To assess growth patterns in French oral language across the duration of the study, we first inspected ELPD differences, which suggested that all four language features were best explained by a random linear growth model. Consequently, the random linear growth model was selected to answer all research questions pertaining to French oral language development.
Of the four language features evaluated in French, coherence/cohesion’s average linear growth rate was the smallest (see Table 2, Random Linear Growth Model: ${\widehat \gamma _{10}}$ = 1.135). Nevertheless, its 95% CI contains only values greater than one, implying that students on average progressed in French coherence/cohesion over the eleven months of programming, though it is unclear how meaningful such growth was (95% CI = [1.003, 1.283]).
Note. All parameter estimates are on the odds scale except for random effect standard deviations and random effect correlations. Posterior means corresponding to parameters on the odds scale are denoted with an asterisk when accompanied by a 95% CI that does not contain one. Posterior means corresponding to random effect standard deviations and random effect correlations are denoted with an asterisk when accompanied by a 95% CI that does not contain zero.
The average linear growth rates estimated for sentence structure (see Table 3, Random Linear Growth Model: ${\widehat \gamma _{10}}$ = 1.326, 95% CI = [1.166, 1.527]), stamina (see Table 4, Random Linear Growth Model: ${\widehat \gamma _{10}}$ = 1.417, 95% CI = [1.278, 1.592]), and vocabulary (see Table 5, Random Linear Growth Model: ${\widehat \gamma _{10}}$ = 1.365, 95% CI = [1.252, 1.493]) were all noticeably larger than for coherence/cohesion. Indeed, as Figure 2 shows, from the beginning to the end of the study duration, the percentage of students receiving a higher rating increased. For example, Figure 2 shows that at baseline, few (if any) students were expected to receive a rating of emerging, whereas the percentage of students receiving such a rating increased after 11 months. Similarly, at baseline, the majority of students were expected to receive a rating of not evident for all language features, unlike at the end of the study, except in the case of coherence/cohesion.
While meaningful positive growth was found in all four features in French, there were important differences in the relationship between individual differences in student baseline proficiency ${u_{0j}}$ and linear growth ${u_{1j}}$ . As can be seen in Tables 2 through 5, the correlations between ${u_{0j}}$ and ${u_{1j}}$ were negative for all four language features, though this negative relationship was only found to be meaningful for sophistication in sentence structure (see Table 3, Random Linear Growth Model: ${\widehat \rho _{01}}$ = −0.62, 95% CI = [−0.849, −0.269]) and for vocabulary (see Table 5, Random Linear Growth Model: ${\widehat \rho _{01}}$ = −0.72, 95% CI = [−0.928, −0.385]). In other words, while positive and meaningful growth was observed in French overall, growth in sentence structure and vocabulary differed in that students with higher baseline ratings in these two features tended to progress at slower rates than students with lower baseline ratings.
RQ 1b – English growth patterns
To assess growth patterns in English oral language across the duration of the study, we first inspected ELPD differences, which suggested that a random linear growth model best explained coherence/cohesion, stamina, and vocabulary sophistication. In contrast, sentence sophistication was best explained by a random quadratic growth model. Given this, the random linear growth model was selected to answer research questions relevant to all language features except sentence sophistication, where the random quadratic growth model was used.
Of the four language features assessed in English, vocabulary is the only one with a meaningful positive linear growth trajectory parameter, indicating that with each month of programming, the odds that students received a higher rating compared to all lower ratings combined increased by an average of 10.8% (see Table 5, Random Linear Growth Model: ${\widehat \gamma _{10}}$ = 1.108, 95% CI = [1.026, 1.202]). By comparison, the average linear growth rates for coherence/cohesion (see Table 2, Random Linear Growth Model: ${\widehat \gamma _{10}}$ = 1.016, 95% CI = [0.917, 1.129]) and stamina (see Table 4, Random Linear Growth Model: ${\widehat \gamma _{10}}$ = 1.064, 95% CI = [0.964, 1.175]) were both positive but not meaningful (i.e., their CIs contained 1). Finally, as mentioned earlier, sophistication of sentence structure was the only feature best explained by the quadratic growth model, with a negative average linear growth rate and a positive average quadratic growth rate (see Table 3, Quadratic Growth Model: ${\widehat \gamma _{10}}$ = 0.793, 95% CI = [0.527, 1.185]; ${\widehat \gamma _{20}}$ = 1.013, 95% CI = [0.974, 1.053]). In other words, the odds of receiving a low rating compared to all higher ratings combined increased slightly over the study duration, though the rate at which this occurred decelerated over time.
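The step from an odds-scale estimate to a percentage change in the odds is simple arithmetic, sketched below for the English vocabulary estimate of 1.108. The function names are hypothetical, and the 11-month figure rests on the assumption that the same per-month ratio applies in every month, as a linear growth model on the log-odds scale implies.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


def compounded_odds_ratio(odds_ratio_per_month, months):
    """Odds ratio implied after several months of constant linear growth
    on the log-odds scale (the per-month ratio multiplies each month)."""
    return odds_ratio_per_month ** months


vocab_or = 1.108  # English vocabulary point estimate reported in the text
monthly_change = percent_change_in_odds(vocab_or)

# Assumption: the same ratio applies in each of the 11 months of programming
eleven_month_or = compounded_odds_ratio(vocab_or, 11)
```

Under that assumption, the point estimate implies roughly a threefold increase in the odds of a higher rating over 11 months. This calculation uses the posterior mean only and ignores the uncertainty expressed by the 95% CI.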
As Figure 2 shows, the percentage of students receiving a higher rating from the beginning to the end of the study duration increased only for the vocabulary feature. At baseline, the majority of students were expected to receive a rating of Emerging or Early Developing, whereas by the end of the study, the expected percentage of students receiving a rating of Emerging decreased considerably, and the expected percentage of students receiving a rating of Early Developing or Controlled noticeably increased.
RQ 1c – Relationship between French and English growth patterns
We used Spearman rank correlations with the SpearmanCI package (de Carvalho, Reference de Carvalho2018) to investigate the relationship between French and English across all four language features. Table 6 shows the five sets of correlations between French and English ratings at each individual wave. Correlations are accompanied by 95% confidence intervals, and when intervals do not overlap with 0, correlations are denoted with an asterisk. As can be seen, both vocabulary and coherence/cohesion appear to have a weak to moderate association between languages at waves 2, 3, and 5. Stamina in French is correlated with stamina in English at waves 2–5. Sentence structure, on the other hand, only appears to be correlated between French and English at the last two waves. While these results do not point to definitive positive relationships across all language features, overall, these results point to a possible weak to moderate positive correlation between French and English, suggesting that growth in language features in one language is generally associated with growth in the other language.
Note. Correlations are denoted with an asterisk when accompanied with a 95% confidence interval that does not contain zero.
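For readers less familiar with rank correlations, the sketch below shows how a Spearman coefficient can be computed for ordinal ratings with ties: each set of ratings is converted to average ranks, and a Pearson correlation is then computed on those ranks. It is an illustration only. The ratings are hypothetical, the function names are not from any package, and the confidence intervals reported in Table 6 are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ordinal ratings (0 = lowest level) for six students at one wave
french = [0, 1, 1, 2, 3, 3]
english = [1, 1, 2, 2, 2, 3]
rho = spearman_rho(french, english)
```

Working on ranks rather than raw values is what makes the coefficient suitable for ordinal ratings, where the distance between adjacent levels is not assumed to be equal.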
RQ 2a – Associations between French progressions and grade level/household language background
After selecting a model based on the ELPD difference for each language feature assessed in French, we then inspected the posterior mean and 95% CIs of the level-2 fixed effects for Grade ${\widehat \gamma _{01}}$ and Home Language ${\widehat \gamma _{02}}$ . For all language features assessed in French, coming from a French-speaking household (i.e., Home Language = 1) was associated with greater odds of receiving a higher rating when controlling for other variables in the model (i.e., grade and linear growth). By far, the effect of having French spoken in the home was largest for sophistication of sentence structure (see Table 3, Random Linear Growth Model: ${\widehat \gamma _{02}}$ = 70.396, 95% CI = [11.032, 500.633]). However, CIs suggest the increase in odds due to having French spoken in the household could plausibly range from a value as low as 11.032 to one as large as 500.633. Similarly, wide 95% CIs were observed for coherence/cohesion (see Table 2, Random Linear Growth Model: ${\widehat \gamma _{02}}$ = 8.135, 95% CI = [1.175, 39.404]), stamina (see Table 4, Random Linear Growth Model: ${\widehat \gamma _{02}}$ = 23.88, 95% CI = [4.355, 137.47]), and sophistication of topic vocabulary (see Table 5, Random Linear Growth Model: ${\widehat \gamma _{02}}$ = 19.41, 95% CI = [5.515, 72.505]).
Similarly, being in first grade (i.e., Grade = 1) was meaningfully associated with greater odds of receiving a high rating in most features in French after controlling for home language and linear growth. This can be seen in the sophistication of sentence structure feature (see Table 3, Random Linear Growth Model: ${\widehat \gamma _{01}}$ = 13.548, 95% CI = [2.314, 91.64]), the stamina feature (see Table 4, Random Linear Growth Model: ${\widehat \gamma _{01}}$ = 10.936, 95% CI = [2.104, 61.48]), and the sophistication of topic vocabulary feature (see Table 5, Random Linear Growth Model: ${\widehat \gamma _{01}}$ = 4.927, 95% CI = [1.538, 15.94]). However, being in first grade was not meaningfully associated with greater odds of receiving a high rating in coherence/cohesion (see Table 2, Random Linear Growth Model: ${\widehat \gamma _{01}}$ = 2.728, 95% CI = [0.618, 13.214]).
To summarize, with the exception of coherence/cohesion, where only home language was meaningful, both grade and home language were meaningful predictors in all French language progressions. However, the patterns of association differed across the two predictors. Specifically, across all language progressions, the increased odds of receiving a higher rating due to being in first grade were noticeably smaller than the increased odds associated with having French spoken at home. Figures 2, 3 and 4 depict the model-predicted distributions conditional on values of time, grade, and home language. For example, with regard to sentence sophistication in French, first graders were roughly five times more likely to be placed at pre-emergent on the DLLP than TK/K students, and coming from a French-speaking household increased the chances of being placed at pre-emergent as opposed to not evident on the DLLP by 12%.
RQ 2b – Associations between English progressions and grade level/household language background
After selecting a model based on the ELPD difference for each language feature assessed in English, we inspected the posterior mean and 95% CIs of the level-2 fixed effects for Grade ${\widehat \gamma _{01}}$ and Home Language ${\widehat \gamma _{02}}$ . Figures 3 and 4 depict the model-predicted distributions conditional on the values of grade and home language. Coming from an English-only household (i.e., Home Language = 1) did not play a meaningful role in English progressions. In contrast, being in first grade (i.e., Grade = 1) was meaningfully associated with greater odds of receiving a high rating in stamina (see Table 4, Random Linear Growth Model: ${\widehat \gamma _{01}}$ = 7.298, 95% CI = [1.853, 29.483]).
Results summary
To answer RQ1, the Random Linear Growth model was selected for all language by DLLP feature combinations, except for English sentence structure, where a random quadratic growth model was selected. For all language features assessed in French, estimates of average linear growth trajectories ${\gamma _{10}}$ are all positive and have 95% CIs that do not overlap with one, which suggests that, on average, participants’ probability of receiving higher ratings increased over the duration of the study. In contrast, in English, meaningful growth was only found in vocabulary, which suggests that, on average, participants’ probability of receiving higher ratings over the duration of the study only increased for vocabulary.
To answer RQ2, a random linear growth model was selected for all language features assessed in French. Coefficients corresponding to the effect of home language had 95% CIs that excluded one on the odds scale for all four features, implying that coming from a French-speaking household was meaningfully associated with a greater probability of receiving high ratings on all language features assessed in French. Coefficients corresponding to the effect of grade had 95% CIs that excluded one for sentence structure, stamina, and vocabulary, but not for coherence/cohesion, implying that being in first grade was meaningfully associated with a greater probability of receiving high ratings on three of the four features. For English, a random linear growth model was selected for coherence/cohesion, sophistication of topic vocabulary, and stamina. A random quadratic growth model was selected for sophistication of sentence structure. Coming from an English monolingual household was not meaningfully associated with the probability of receiving higher ratings on any of the language features assessed in English. However, for the stamina feature, the coefficient corresponding to the effect of grade had a 95% CI that excluded one, implying that being in first grade was meaningfully associated with a greater probability of receiving high ratings on stamina in English.
Discussion
In this study, we examined French and English oral language trajectories for early-elementary DLI students across 11 months of programming based on a picture description task. For French oral language, we found evidence of meaningful positive linear growth for all features and picture scene combinations with the smallest growth for coherence/cohesion. In contrast, for English oral language, meaningful positive growth was only detected for sophistication of topic vocabulary. Overall, coming from a French-speaking household improved French oral language trajectories, but coming from an English-only household did not improve English oral language trajectories. In both languages, grade level influenced the trajectories of some – but not all – features.
It is worth noting that the nature of the task itself may explain the varying degrees of growth on different features. Indeed, compared to other language elicitation tasks, Nurss and Hough (Reference Nurss and Hough1985) found that picture descriptions favored lexical skills over discourse-level skills. Similarly, in our study, we note that for both languages, growth is detected on word-level features (i.e., sophistication of topic vocabulary), but growth is much smaller on discourse-level features (e.g., coherence/cohesion), which makes sense considering that describing a picture itself does not call for discourse complexity. The small growth for coherence/cohesion echoes findings from previous research with elementary school children who were either multilingual (predominantly Spanish-English) speakers or monolingual English speakers (Bailey, Reference Bailey2017). Given the young age of the participants, there may be a developmental explanation. Early elementary children may not yet have sufficient forms and structures to be formulating language at the level of more sophisticated multi-utterance organization that coherence among utterances and cohesive devices within them requires. Another potential explanation is that the progression for coherence/cohesion is not precise enough to detect growth at such early phases, which implies that there are precursor aspects of the feature that are not represented in the current progression.
Findings from our research suggest that French and English language growth within individual children is not uniform across features. The fact that grade level was meaningfully associated with greater progression for most features in French confirms that there is growth in the partner language between the first and second year of the DLI program, corroborating trends that were identified with DLI students in later grades (e.g., Watzinger-Tharp et al., Reference Watzinger-Tharp, Swenson and Mayne2018; Reference Watzinger-Tharp, Tharp and Rubio2021). However, unlike previous studies with DLI students, growth was not detected for most features in English, except for sophistication of topic vocabulary and stamina. One hypothesis is that the eleven-month time frame was simply too short to detect meaningful growth in sentence structure and coherence/cohesion in English, the students’ dominant language. Furthermore, at least for the vocabulary feature, the skewed distribution points to the possibility that students performed at ceiling levels. Other DLI studies have found comparatively slow growth in English for students who start at more advanced proficiency levels (e.g., Watzinger-Tharp et al., Reference Watzinger-Tharp, Tharp and Rubio2021).
This study also suggests great variability in baseline scores and growth rates, in line with CDST, which highlights the high level of variability that exists within and across individuals. For example, both in English and in French, individual students varied more in their baseline proficiency than in their growth rates. Additionally, there is more variability in baseline proficiency in French compared to English for vocabulary, sentence structure, and stamina. These findings suggest that there is a lot of individual variability in where students start off, especially for the nondominant language. Taken together with the findings that speaking French at home predicted improvements in French language progression, we hypothesize that the source of this variability stems from individual differences in French language experiences at home (Paradis, Reference Paradis2023). Furthermore, across both French and English, variability in growth rates is fairly similar, suggesting that the effect of the education program is the same across students. Finally, for French vocabulary and sentence structure and English stamina, participants with higher baseline scores tended to experience less growth over the study duration than peers with lower baseline scores (and vice versa). It is possible that children who were already scoring high at the beginning of the task had little room to improve given the nature of the task combined with their language abilities at this developmental time point.
Furthermore, coming from a French-speaking household was meaningfully associated with greater progression for most language features in French, whereas coming from a monolingual English household was not meaningfully associated with greater progression in English. Indeed, previous studies with the same participants also showed a significant advantage in French receptive and expressive vocabulary based on levels of French exposure in the household (input, output, or literacy), an advantage that was not shared in English for children from English-only households (Ryan, Reference Ryan2021a; Reference Ryan2021b). Other studies in DLI contexts also show greater performance in the partner language for heritage language students, at least for some skills (e.g., Burkhauser et al., Reference Burkhauser, Steele, Li, Slater, Bacon and Miller2016; Lindholm-Leary & Hernández, Reference Lindholm-Leary and Hernández2011; Xu et al., Reference Xu, Padilla and Silva2015). It is worth noting that English was the dominant language for most children in the sample, including those from French-speaking households (Ryan, Reference Ryan2023). Consequently, considering their high levels of English proficiency at baseline, students may have been demotivated in English, finding the task repetitive and excessively easy.
Limitations, future work, and conclusion
As explained earlier, given the exploratory nature of this study and our choice of analyses, we used weakly informative priors that give less weight to the data than traditional uninformative priors. Using a Bayesian framework allowed us to run more complex analyses despite the small sample size of the dataset, letting models converge and protecting our inferences from impossible findings. However, despite the benefits provided by a Bayesian approach, including the flexibility to examine the results more comprehensively, having a larger dataset would still be beneficial. Additionally, the analyses we conducted treated English and French language progressions separately. However, children are learning both languages at once. Thus, a promising direction for future research could be to examine how the knowledge of each language taken together in one model shapes children’s concurrent bilingual language development. For example, the use of a latent variable model such as factor analysis may help determine whether the aforementioned correlations between French and English growth indicate interdependence between the two linguistic systems. Such an investigation can be motivated and guided by existing theories of multilingual development that would suggest a common underlying proficiency (Cummins, Reference Cummins1980) or unitary linguistic system (e.g., Vogel & Garcia, Reference Vogel, Garcia, Noblit and Moll2017). Indeed, common underlying proficiency theory (Cummins, Reference Cummins1980) argues that knowledge (e.g., cognition, literacy skills) shared across two or more languages can be accessed and utilized for the acquisition of all of a speaker’s languages and transfers to the acquisition of additional languages, thus supporting a learner’s multilingual development overall. 
On the other hand, the notion of a unitary linguistic system central to translanguaging implies that language users do not possess separate linguistic systems for each of the languages they know; rather, they “select and deploy particular features from a unitary linguistic repertoire to make meaning and to negotiate particular communicative contexts” (Vogel & Garcia, Reference Vogel, Garcia, Noblit and Moll2017, p. 1).
Finally, we did not conduct formal tests to analyze disparities in growth patterns by different language features. Such comparisons can be conducted within the Bayesian framework, but we deliberately refrained from pursuing them in our current study. The rationale behind this decision is that, in order to detect such differences, we would likely require either a larger sample size, more informative prior information, or both.
In conclusion, this study helps fill a gap in research on DLI education by describing the progression of fine-grained features of language at the word, sentence, and discourse levels and the developmental trajectories of two languages, namely French and English, for early-elementary students. Theoretically, this research adds to the literature on CDST as it illustrates the dynamic and adaptive ways in which multiple features of two languages develop longitudinally in the context of bilingual schooling. In addition, this study offers practical implications for bilingual educators by providing them with a language progression approach that can help untangle deviance from delay in children’s bilingual language development. In order to obtain a more informed picture of DLI outcomes, we encourage future research to apply such a language progression approach to a broader range of contexts, by examining additional languages, domains (e.g., writing), grades, or content areas (e.g., mathematics).
Replication package
Data and code have been made publicly available on OSF at the following link: https://osf.io/4ugk3/.
Competing interests
The authors have no conflicts of interest to disclose.