Introduction
Speaking and writing have different characteristics that affect how second language (L2) learners engage with them. One important difference is that speaking requires more spontaneous production, while writing provides more opportunities to revise and edit, because it typically entails limited simultaneous interaction (Byrnes & Manchón, 2014; Gilabert, Manchón, & Vasylets, 2016; Williams, 2012). The distinct features of each modality often lead L2 researchers to consider them separately.
The acquisition of the two modalities, however, is not completely separate. Nevertheless, most L2 development studies analyze only oral production, and L2 development in writing is less well theorized, with most writing research instead focusing on topics such as learners’ changes in knowledge of genres, goal setting, and the writing process (Cumming, 2012; Sasaki, 2004; Tardy, 2012; also see Polio, 2017; Polio & Park, 2016). To begin to address this gap and given the fundamental connection between speaking and writing, this study explores whether processability theory (PT) (Pienemann, 1998, 2005), which offers a framework for understanding L2 development based on oral production, can be applied to L2 written production. PT posits a universal implicational order of L2 development. According to PT, as learners develop the L2, they acquire processing procedures at a higher level, which allows more diverse structures to emerge.
This study compares the emergence of PT’s target morphosyntactic structures in the oral and written production of learners at four proficiency levels. It further compares the learners’ accuracy in their oral and written production of the structures. Comparing the emergence and accuracy of particular morphosyntactic structures between oral and written production will lead to better understanding of the similarities and differences of language development in the two modalities.
Literature review
Modality
In terms of constraints on online processing, speaking and writing typically differ along three dimensions: (a) the presence or absence of an audience during production, (b) the stability of the language signal, and (c) the degree of control of the language user over the linguistic output (Ravid & Tolchinsky, 2002, p. 426). Speaking is usually, but not always, interactive and writing is usually, but not always, not interactive. In particular, speakers usually produce speech while interacting with listeners, whereas writers do not usually expect immediate interaction with readers while producing writing. The lack of immediate interaction also means that writing is usually completed at the writer’s own pace, which is much slower but more stable than the pace of spontaneous speech production (Gilabert et al., 2016). In addition, written production shows less variation in accuracy than oral production, because writers can usually take time to revise and edit their writing before the audience reads it (Granfeldt, 2008). Writers thus generally have more control over their attentional resources and more opportunities to pay attention to their production, compared to speakers (Byrnes & Manchón, 2014; Williams, 2012). Due to these distinct characteristics, writing, even under time constraints, allows access to both explicit (declarative) and implicit (procedural) knowledge, while speaking frequently demands the use of implicit (procedural) knowledge for fluent execution (Polio, 2012).
Speaking and writing have these different characteristics, regardless of the status of the language (e.g., native language [L1], L2) for the language user. In this regard, some differences between L2 speaking and L2 writing have been found in the field of L2 acquisition. For instance, similar to L1 speakers, L2 learners can attend to form more in writing than in speaking, which in turn may lead to specific structures and vocabulary appearing earlier in L2 learners’ writing than in their speaking (Polio, 2012). Despite their different characteristics, research has shown overlap in L2 oral and written production in terms of cognitive processes; complexity, accuracy, and fluency (CAF); and developmental sequence.
Cognitive processes in speaking and writing
Theoretical approaches to the cognitive processes involved in L2 speaking and writing either try to explain one or the other, such as Levelt’s (1989) speaking model (conceptualization, formulation, and articulation) and Kellogg’s (1996) writing model (formulation, execution, and monitoring), or suggest that speaking and writing use the same cognitive processes but draw on different subsets of these processes (Bourdin & Fayol, 1994, 2000; Brown, McDonald, Brown, & Carr, 1988). In particular, although both Levelt’s speaking model and Kellogg’s writing model include a formulation stage in which language users generate acceptable forms and produce a temporary linguistic output, Kellogg’s monitoring stage in the writing process allows language users to go through and edit their writing. Compared to the stages of the speaking process, the stages of the writing process are more interactive and recursive (Kellogg, 2001; McCutchen, 1996). The later stages of the writing process may be positively or negatively affected by the earlier stages, and writers can go back and forth among these stages until they complete their production. In addition, Brown et al. (1988) argued that language production includes formulation, execution, and monitoring stages, regardless of modality. They posited that formulation processes might be interchangeable across modalities but would be combined with different sets of execution processes and different monitoring processes to produce either oral or written language.
Regardless of which approach it takes, existing research often tacitly assumes that speaking and writing production share at least some of the cognitive processes. If this is the case, then some characteristics of speaking production may also be found in writing production, and vice versa.
CAF in oral and written production
Task-based language teaching studies that touch on the effects of task modality on performance have had mixed results (Gilabert et al., 2016; Granfeldt, 2008; Kormos & Trebits, 2009; Kuiken & Vedder, 2011; Son, 2022; Tavakoli, 2014; Vasylets, Gilabert, & Manchón, 2017; Zalbidea, 2017). For instance, Kuiken and Vedder (2011) and Zalbidea (2017) both compared an L2 oral task group to an L2 written task group. Kuiken and Vedder found greater syntactic complexity in the written task (i.e., writing a letter) than the oral task (i.e., leaving a voice message). In contrast, Zalbidea’s study found greater complexity in the oral task (i.e., leaving a voice message) than the written task (i.e., writing an email). Other studies have observed no significant difference in syntactic complexity between oral and written tasks (Granfeldt, 2008; Kormos, 2014; Kormos & Trebits, 2009; Vasylets et al., 2017). Likewise, some studies (Kormos, 2014; Kormos & Trebits, 2009; Tavakoli, 2014; Zalbidea, 2017) showed greater accuracy in a written task; they suggested that the additional opportunities to revise and monitor in a written task allow for greater accuracy. In contrast, other studies had different results (Granfeldt, 2008; Vasylets et al., 2017).
Changes in CAF over time have also been compared between oral and written modalities (Bulté & Housen, 2009; Serrano, Tragant, & Llanes, 2012; Weissberg, 2000), providing evidence of how L2 knowledge develops (e.g., Larsen-Freeman, 1978). For example, Serrano et al.’s (2012) longitudinal study implemented oral narrative tasks and descriptive essay tasks three times during a 1-year study abroad program. The study showed that although both oral and written production developed, the time at which each construct in CAF developed differed across the two modalities. For instance, fluency and lexical diversity in oral production developed earlier, while accuracy in both modalities developed later.
L2 grammar development
The development of learners’ grammatical knowledge seems comparable across speaking and writing to some extent (e.g., Boss, 2008; Byrnes & Sinicrope, 2009; Kyle, Crossley, & Verspoor, 2021; Park, 2017; Weissberg, 2006; Zalbidea, 2021). For instance, Zalbidea (2021) examined the development of beginner-level L2 Spanish learners’ knowledge of future and clitic forms with a pre/post/delayed-posttest experimental design. The study found that after the treatment, the speaking and writing groups had comparable scores on the tasks, with both showing significant gains, although the writing group performed better on some parts (i.e., in a production task on the posttest and on the clitic in an aural acceptability judgment task on the delayed posttest).
In addition, many studies posit an L2 acquisition order of specific morphosyntactic forms (e.g., Gass, 1980). Most of these investigations discuss only oral production, while only a few address written production (e.g., Boss, 2008; Byrnes & Sinicrope, 2009; Kyle et al., 2021; Park, 2017; Weissberg, 2006). However, the results of some recent L2 writing studies indicate the possibility of applying L2 speaking studies’ approach to writing.
Boss (2008) investigated the acquisition order of L2 German verb morphology, which Pienemann (1998) originally predicted based on oral production, in the written modality. The results indicated that as L2 learners’ knowledge of German developed, the number of present tense conjugations of sein and haben that emerged increased, aligning with Pienemann’s predictions, while the emergence of past participles of weak verbs did not appear to fit Pienemann’s predictions. Likewise, Byrnes and Sinicrope (2009) explored whether a universal progression for relative clauses suggested by the Noun Phrase Accessibility Hierarchy (Keenan & Comrie, 1977) applies to L2 written production. They found that most of the learners’ use of relative clauses followed the Noun Phrase Accessibility Hierarchy; however, 20–30% of the relative clauses in the data contradicted the predictions.
Both Boss (2008) and Byrnes and Sinicrope (2009) suggested that the learning process in the written modality contrasts with the learning process in the oral modality to some extent. However, caution is needed in comparing the results from these studies to the results of studies on learners’ development in speaking. As Norris and Ortega (2003) pointed out, researchers attempting to assess the emergence of complex phenomena must be careful not to base their interpretations “on a lack of evidence, as opposed to evidence for the lack of emergence…[I]t is likely that measurement data are more frequently underinterpreted when researchers do not adequately conceptualize the complexities of measurement behaviors that they intend to elicit” (pp. 733–734). That is, a lack of evidence for similarities across the modalities may be due not to different developmental paths in oral and written modalities but to research design.
For example, Boss (2008) chose specific topics for the writing tasks (e.g., a travel diary) to elicit particular structures; but the number of tasks (one or two) and the length of writing (around 100 words) for each task may not have been enough to elicit sufficient past participles of weak verbs in German (e.g., sagen → gesagt, [say → said]). In addition, the instructions for the writing tasks allowed the learners to avoid using certain structures. Likewise, Byrnes and Sinicrope (2009) analyzed tasks designed to evaluate learners’ overall achievement at the end of a course. Writing tasks designed specifically for the elicitation of target structures might better reveal a developmental progression for written production that could be validly compared to the developmental progression of oral production.
In sum, very few studies have investigated the L2 developmental sequence in the written modality, and those few have not been based on the theoretical frameworks widely used to explore L2 development in the spoken modality. Nevertheless, some similar patterns have been observed. But because external factors (e.g., task type, instructions) may have led to findings that show differences between development in oral production and development in written production, it is important to investigate whether particular developmental theories can be applied to both modalities with the use of similar tasks (Polio, 2012, 2017).
Although there is no consensus on the effects of modality, previous studies have found comparable effects across modalities at least to some extent. In addition, only a few studies have attempted to apply theoretical frameworks of L2 development to the written modality, and they have had some methodological limitations (e.g., Boss, 2008; Byrnes & Sinicrope, 2009). Given the connections between the oral and the written modalities and the possibility of the application of theoretical frameworks to both, this study casts light on whether L2 language development is consistent across the oral and the written modalities on the basis of PT, which allows for investigating theory-based L2 development in writing as well as comparability of L2 development between the two modalities (Polio, 2012, 2017; Polio & Park, 2016).
Processability Theory
Pienemann (1998) proposed PT to explain the developmental stages in language learners’ interlanguage. PT hypothesizes that, at each developmental stage, language learners acquire a processing procedure that manages information transfer, which is necessary for the production of morphosyntactic structures. The processing procedures are based on lexical functional grammar’s (Kaplan & Bresnan, 1982) feature unification process, which posits three processing procedures, in which language users (a) classify the grammatical information in a lexical item, (b) store the information temporarily, and (c) utilize the information at different points in the constituent structure. Pienemann (1998, 2005) argued that processing procedures are hierarchical and the levels of the hierarchy can be addressed through feature unification. In addition, Pienemann claimed that language learners go through these developmental stages in a universal implicational order. In other words, a learner cannot skip lower stages and follow a different developmental order. This view of development in processing procedures is based on Levelt’s (1989) speaking model, which suggests that language processing is incremental and linear: While surface forms are still being constructed, grammatical information is temporarily stored in grammatical memory storage, which ultimately allows automatic processing of the information. According to PT, as language learners reach higher developmental stages, the feature unification (or matching) described by lexical functional grammar takes place across greater structural distances within sentences.
Although PT research has primarily analyzed L2 learners’ oral production, the hierarchy of processing procedures is not limited to a specific modality or specific task type. Pienemann’s (1998) steadiness hypothesis posits that the developmental order does not change across different communicative tasks as long as the tasks are designed to draw on the same production skill (e.g., Ellis, 2008; Pienemann, 1998; Spinner, 2011). For instance, Pienemann (1998) found that six different communicative tasks reflected the developmental order of PT in learners’ production, supporting the steadiness hypothesis, although morphology production showed a few mismatches (i.e., in about 1% of the data).
Based on the processing procedures that become available to learners at each stage, PT proposes six developmental stages: Lemma access (Stage 1), Category procedure (Stage 2), Phrasal procedure (Stage 3), Verb phrase procedure (Stage 4), S-procedure (Stage 5), and Subordinate clause procedure (Stage 6). In Stage 1, learners are limited to producing single words or formulaic expressions. Lacking processing procedures, they are not ready to exchange information, so feature matching or unification does not occur. At this stage, nonlinguistic strategies, such as facial expressions, play a crucial role in keeping communication going. In Stage 2, when category procedures become available, learners are able to assign categorical information to lexical items. At this stage, operations occur only within a single constituent (e.g., dog[s]). When it comes to syntax, learners start to produce syntactic structures with canonical word order (e.g., Mary woke up at 7:00 am).
Next, in Stage 3, a single lexical item that involves categorical information can take the role of head of a phrasal category. Within a phrasal category, feature information can be exchanged across elements (e.g., these dogs). In terms of syntax, some fronting elements, such as adverbs and prepositional phrases, emerge (e.g., Yesterday, Mary woke up at 7:00 am). In addition, learners begin to produce questions starting with do-verbs (e.g., Did Mary wake up at 7:00 am?). According to the topic hypothesis (Pienemann, Di Biase, Kawaguchi, & Håkansson, 2005; Pienemann & Lenzing, 2020), at this stage learners also start to differentiate syntactic topics and grammatical functions from the subject.
In Stage 4, learners can share a feature across phrases (e.g., Mary woke up at 7:00 am and came to the office at 9:00 am). In terms of syntax, learners begin to produce yes-no questions by inverting the subject and auxiliary verb, and to produce a wh-question with copular verbs or other auxiliary verbs (e.g., Has Mary met you?; where is Mary?). In Stage 5, grammatical features are matched across phrases. Subject and verb agreement of number and person features occurs at this stage (e.g., Mary loves these dogs). Learners also start to produce wh-questions in which do or other auxiliary verbs are in second position (e.g., Where did you find this book?). Finally, in Stage 6, the processing procedures for subordinate clauses are developed, such as wh-complement clauses in English (e.g., Mary shows what she made yesterday). A grammatical wh-complement in English does not include auxiliary inversion, which is necessary to make a question. When learners reach this developmental stage, they can differentiate the syntactic orders of a wh-complement and a wh-question and produce the wh-complement with the correct grammatical order.
In short, as processing procedures develop, the morphosyntactic structures that a learner can produce become more varied. The processing procedures and morphosyntactic structures that become available to learners at each developmental stage are summarized in Table 1.
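The implicational logic of the stage hierarchy can be illustrated with a minimal Python sketch. The stage names follow the list given above; the function names, the boolean "emerged" profiles, and the two example learners are hypothetical and introduced only for illustration. The sketch does not implement any emergence criterion; it assumes that a decision about emergence has already been made for each stage on which data are available.

```python
# Stage names as listed in the text; everything else is illustrative.
PT_STAGES = {
    1: "Lemma access",
    2: "Category procedure",
    3: "Phrasal procedure",
    4: "Verb phrase procedure",
    5: "S-procedure",
    6: "Subordinate clause procedure",
}


def highest_stage(emerged):
    """Return the highest stage number marked as emerged, or 0 if none."""
    return max((stage for stage, ok in emerged.items() if ok), default=0)


def is_implicational(emerged):
    """Check that every observed stage below the highest emerged stage
    has also emerged. Stages absent from the profile (no data) are ignored."""
    top = highest_stage(emerged)
    return all(ok for stage, ok in emerged.items() if stage < top)


# Hypothetical profiles: stage number -> whether a target structure emerged.
learner_a = {2: True, 3: True, 5: True, 6: False}
learner_b = {2: True, 3: False, 5: True, 6: False}

for name, profile in [("A", learner_a), ("B", learner_b)]:
    top = highest_stage(profile)
    print(name, top, PT_STAGES.get(top, "none"), is_implicational(profile))
```

Under these assumptions, learner A's profile is consistent with an implicational order, whereas learner B's profile (a Stage 5 structure without the Stage 3 structure) would count as a mismatch.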
PT has been confirmed by many studies testing several languages (e.g., Bonilla, 2015; Jansen, 2008) in addition to English (e.g., Pienemann, 1998; Spinner, 2011). Although the studies had participants with diverse L1 backgrounds, different age ranges, and different contexts of language acquisition (i.e., foreign language vs. L2; classroom vs. naturalistic setting), their results lend support to the implicational stages that PT proposes. In other words, the stages appear to be applicable to typologically different languages.
Accuracy and PT
The criterion of development in PT is the emergence of structures, so accuracy is not a primary focus. According to Pienemann (1998, p. 132), different morphosyntactic structures follow distinct developmental paths to reach accuracy; in addition, the rate at which the production of various structures reaches specific levels of accuracy varies. Therefore, Pienemann argued that accuracy, unlike emergence, is not a valid measurement of language learners’ development.
However, the PT perspective on language learners’ variability in production and automatization provides an opportunity to measure their accuracy in the use of morphosyntactic structures together with emergence, for the following reasons. When language learners attempt to produce morphosyntactic structures at stages higher than their present stage of development, they display inaccuracy and create variations (e.g., *What Mary do? instead of What does Mary do?), because they have not yet developed the required processing procedure to produce the structures (Pienemann, 1998, 2005). In other words, based on a learner’s developmental stage, his/her interlanguage has structurally limited variations: Learners initially produce inaccurate morphosyntactic forms and only later, when the relevant processing procedures are developed, become able to produce accurate forms.
PT defines language acquisition as acquisition of procedural knowledge that learners use to process their L2. In addition, developing processing procedures leads to automatization (Pienemann, 1998, 2005; Pienemann & Håkansson, 1999). When the underlying processing becomes automatized, a specific structure can be produced with procedural knowledge rather than metalinguistic knowledge (Kawaguchi & Di Biase, 2012). Acquisition of a certain structure includes acquisition of the routine that makes the processing procedure of the structure available. Research has shown that such automatization of the underlying routine is closely related to increased accuracy in producing the structure as well as processing efficiency. For instance, Anderson (1992) demonstrated that L2 learners show faster production and fewer errors when they reach the procedural and automatic stages for a given structure. In addition, according to Hulstijn, Van Gelderen, and Schoonen (2009), coming across novel vocabulary leads L2 learners to establish new form-meaning connections within their mental lexicons, while coming across familiar words reinforces pre-existing form-meaning connections. Through these processes, the learners experience automatization and become able to perform more quickly and accurately.
Nevertheless, until a routine is automatized for particular learners, they may show variability in their production of the morphosyntactic form. Even at the same developmental stage, their accuracy with different structures may differ based on the extent to which the underlying processing is automatized. In short, while accuracy is not a measure that PT utilizes, there is leeway within the PT framework to consider accuracy together with emergence.
Lee and Spinner (2017) conducted elicited oral production interviews with L2 learners. They measured accuracy as well as the emergence of target structures at different stages. Accuracy was calculated by means of obligatory occasion analysis (e.g., Ellis & Barkhuizen, 2005), and accuracy of all the target structures that each learner produced was measured, regardless of whether the structure had emerged. The study showed that accuracy differed across participants and morphological structures. For instance, participants who had reached Stage 5 had different accuracy rates (from 30% to 91%). Participants at Stage 3 had substantially lower accuracy for Third person singular -s than those at Stage 5. In addition, within Stage 2, Possessive ’s had the highest accuracy (77%), whereas Past -ed had the lowest accuracy (51%) among the four structures analyzed. These results may be due to the different extents to which the underlying routine of the forms had become automatized for different learners. Although the study targeted only a few morphological forms, it cast light on the possibility of using the PT framework to investigate accuracy and emergence simultaneously to gain understanding of language learners’ development.
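As a rough illustration of how an accuracy rate of this kind can be computed, the following Python sketch applies the basic form of obligatory occasion analysis: the number of correct suppliances in obligatory contexts divided by the number of obligatory contexts, expressed as a percentage. The function name and the counts in the example are hypothetical, and the sketch does not reproduce the specific coding decisions of any study cited here.

```python
def obligatory_occasion_accuracy(correct, obligatory):
    """Percentage of obligatory contexts in which the form was correctly supplied.

    Returns None when there are no obligatory contexts, because an accuracy
    rate is undefined for a structure the learner never had occasion to use.
    """
    if obligatory == 0:
        return None
    if not 0 <= correct <= obligatory:
        raise ValueError("correct must be between 0 and obligatory")
    return 100.0 * correct / obligatory


# Hypothetical counts for one learner and one structure:
# 3 correct suppliances of Past -ed in 4 obligatory contexts.
print(obligatory_occasion_accuracy(3, 4))  # 75.0
```

Returning None rather than 0 for a structure with no obligatory contexts keeps "no opportunity to use the form" distinct from "always inaccurate", which matters when accuracy is reported alongside emergence.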
PT in the written modality
Pienemann’s (1998) steadiness hypothesis posits that L2 learners’ ability to process morphosyntactic structures is consistent across different oral tasks, if these tasks require the same knowledge and skills. A few studies have examined the topic in written production. For example, in Håkansson and Norrby’s (2007) study, 20 L1-English students of L2 Swedish completed free writing and English-to-Swedish translation tasks. Oral data were also collected through communicative tasks designed to elicit particular morphosyntactic structures.
The results indicated that participants’ development in written production was parallel to their development in oral production. It is important to note that there was no time constraint in the writing tasks. Thus, unlike spontaneous speech production, the writing tasks allowed the participants to take additional time to use their declarative knowledge and to plan and monitor their production in writing. Despite the difference, participants did not produce higher level morphosyntactic structures in writing. However, the participants’ oral and written production were different in terms of language complexity. For instance, although the participants met the criterion of emergence of subordinate clauses in both the speaking and the writing tasks, they tended to avoid subordinate clauses in the speaking tasks. Based on the results, Håkansson and Norrby (2007) concluded that L2 learners’ ability to process structures may be stable, as predicted by PT, but that the complexity they produce may differ across the modalities.
Håkansson and Norrby’s (2007) study was the first to attempt to investigate whether PT can be applied to L2 learners’ written production. However, some elements of the study suggest that the results should be interpreted cautiously and that further research is needed. First, because the participants all were at fairly high levels of development, it is difficult to conclude that processability is identical across the two modalities based on their production. In addition, one of the written tasks was a translation task. This task may not be as burdensome as other communicative tasks, as translating does not require learners to develop ideas and generate morphosyntactic structures based on their ideas. Rather, the learners needed only to translate “pre-made” English sentences into Swedish. Given differences in task difficulty, written production in a translation task may not reflect processability well. Employing similar tasks for speaking and writing would provide a better understanding of the application of PT to the written modality.
The possibility of applying PT to writing remains an open question due to the scarcity of research attempting to do so. In addition, some of the different results across the modalities may be due to methodological weaknesses, such as the use of noncomparable task types. To fill this gap, this study hypothesizes that the L2 developmental sequence is comparable across the oral and written modalities based on their several overlapping characteristics and assesses whether PT predicts the order of emergence for written production as well as oral production by employing comparable oral and written tasks (see also Polio, 2012, 2017). In addition, given PT’s perspective on learners’ variability and automatization in their production and inherent differences in the oral and written modalities (i.e., additional opportunity to revise and edit in writing), this study also explores the development of accuracy across the modalities.
Research Questions:
1. Does the emergence of morphosyntactic structures in L2 learners’ oral and written production follow the developmental order predicted by PT?
2. Does accuracy in L2 learners’ oral and written production develop in parallel with the development of emergence of morphosyntactic structures?
3. Does task modality affect the development of accuracy?
Method
Participants
The study’s participants were 87 L1-Korean EFL learners (47 female; age: M = 22.97, SD = 2.53), all undergraduate students who had learned English in instructional settings but rarely used the language elsewhere. Thirty had participated in various work/study-abroad programs in English-dominant countries (years: M = .31, SD = .76). The participants were categorized into four proficiency levels (A2–C1 in the Common European Framework of Reference for Languages [CEFR]) based on their standardized test scores (e.g., Test of English as a Foreign Language [TOEFL], International English Language Testing System [IELTS], Test of English for International Communication [TOEIC]; see Papageorgiou, Tannenbaum, Bridgeman, & Cho, 2015, for CEFR equivalencies). The four proficiency groups were high beginner (A2, N = 21), low intermediate (B1, N = 22), high intermediate (B2, N = 22), and advanced (C1, N = 22).
Materials
Speaking tasks
Each of the 11 speaking tasks was conducted one-on-one between the learner and me, and each was designed to elicit a particular morphosyntactic structure that characterizes a stage in PT. The target structures in the tasks were S neg V, Plural -s, Progressive -ing, Possessive ’s, and Past -ed in Stage 2; Plural NP in Stage 3; Third person singular -s in Stage 5; and Cancel INV in Stage 6. (Following previous studies, e.g., Di Biase, Kawaguchi, & Yamaguchi, 2015, multiple Stage 2 morphosyntactic structures were included to enable an investigation of a developmental sequence within a single stage; however, such intra-stage development is beyond the scope of the current study.) Given that an insufficient number of tasks may limit the number of emerged forms produced, leading to underestimation of development, two or three tasks were designed to elicit each structure to give sufficient opportunities to produce it. For each task, I provided an oral prompt and/or question with relevant visual materials, such as pictures and flyers shown on a computer monitor, along with the time limit for the response. The oral prompt and/or question was sometimes elaborated or repeated when a participant did not understand fully. The pictures were copyright-free images from Pixabay (https://pixabay.com), Openclipart (https://openclipart.org/), LibreShot (https://libreshot.com/), Flickr (https://www.flickr.com/), and Pexels (https://www.pexels.com/).
The specific tasks intended to produce the Stage 2 and Stage 3 structures were of several types. First, two tasks designed to elicit S neg V (e.g., Students should not make a loud voice in a dormitory) focused on making an announcement; and participants took the role of a movie theater manager in one task and a teacher in the other and were shown several pictures on which to base their production for each task. For the first task, they were asked to inform theater audiences of what they were or were not allowed to do during a movie, and, for the second, they were asked to explain a list of rules intended to make riding an elevator safe for students. In each task, participants were asked to produce one or two sentences relevant to each picture. Second, picture description tasks were designed to elicit Plural -s (e.g., cookies), Plural NP (e.g., five bicycles), and Progressive -ing (e.g., Two players are jumping). In these tasks, a series of pictures was given, accompanied by questions (What are they? What are they doing?). Each picture included different subjects and backgrounds. Third, tasks designed to elicit Possessive ’s (e.g., Jane’s shoes) required participants to describe objects in several pictures to answer various questions (Whose is it? What color is it?) and the directions to a particular building using a map that included several houses with owners’ names on them. Finally, two narrative tasks designed to elicit Past -ed (e.g., I enjoyed the time.) required participants to describe past experiences (a winter holiday, first semester at university).
For the Stage 5 structure, three types of tasks were used to encourage participants to produce Third person singular -s (e.g., Internet provides useful information.). In one task, participants were provided with flyers about a conference and asked to respond to a voice message based on the flyer. Two argumentative tasks required participants to present their opinions in response to particular questions (Do you think that it is a good idea to live with a roommate? What do you think is the most useful way to prepare for a trip?). For the final task, participants were asked to describe one of their family members.
Finally, two tasks were designed to elicit the Stage 6 structure, Cancel INV (e.g., I wondered whether I could park my car), both in the format of leaving a voice message, with the participant taking the role of a transfer student in one task, and as a potential guest of a hotel in the other. After reading through a script of a voice message from a university and a hotel flyer, respectively, the participants were required to leave a short voice message including at least three questions about their upcoming school life or their hotel reservation. These last two tasks did not intend to elicit particular types of questions; however, they were expected to give the participants opportunities to produce Cancel INV.
The time limits varied according to the required processing procedure and expected length of response and were based on time limits for similar tasks in standardized tests and a pilot study. The time limits were intended to elicit spontaneous responses by limiting the use of explicit knowledge and favoring automatic processing in both modalities, following the assumptions of PT (Kawaguchi & Di Biase, Reference Kawaguchi and Di Biase2012; Nicholas et al., Reference Nicholas, Lenzing, Ross, Lenzing, Nicholas and Ross2019; Spinner, Reference Spinner2011), and to make the experimental settings as comparable as possible between the modalities. In particular, unlike interactive tasks in previous PT studies (e.g., interactions between participant and researcher or between two participants; Pienemann, Reference Pienemann1998), where spontaneity is supported by the conversational format in which participants take turns, this study’s speaking tasks were completed individually. Likewise, untimed writing may not favor spontaneous responses, as it allows writers as much time as they like to monitor and revise their production as well as to use outside resources (see also Nicholas et al., Reference Nicholas, Lenzing, Ross, Lenzing, Nicholas and Ross2019). In this regard, the time limits contributed to the establishment of similar experimental conditions in both modalities that encouraged the learners to complete the tasks spontaneously in an unplanned manner.
More specifically, some tasks, such as those designed to elicit S neg V and Possessive ’s, required participants to respond spontaneously right after the oral prompt. In real-time conversation even in L1, a speaker typically takes a second (or two) to respond to an interlocutor’s question and then starts to speak fluently (see also Butterworth, Reference Butterworth1975; Lee, Reference Lee2018). In accordance with this observation, Levelt’s (Reference Levelt1989) speaking model demonstrates that the processing procedure for oral production is incremental, progressing through different stages, which may take at least a brief moment to articulate. In this regard, this study considered a response within five seconds after the oral prompt to be a spontaneous response. Because the tasks asked participants to produce one or two sentences only, the time spent producing was not controlled unless it was obviously excessive. For the argumentative tasks in this study, the time limit was decided based on an argumentative task in the TOEFL Speaking component, which allows 15 seconds for planning and 45 seconds for production (one minute in total). However, in this study’s pilot experiment using a one-minute time limit, students at the low intermediate level did not fully complete the argumentative task, and the responses overall lacked contexts in which the Third person singular -s should be used. Therefore, the time limit for the argumentative tasks was extended to one and a half minutes, considering the participants’ varied proficiency levels. The longer time allowed meant participants were required to manage the assigned time limit of each task by themselves to plan and produce their own response; however, I sometimes intervened by asking further related questions when participants spent too much time planning (especially in spontaneous tasks) or when participants’ responses were too short. Any production after the time limit was over was excluded from the dataset.
Writing tasks
The writing tasks were also conducted one-on-one. As with the speaking tasks, an oral prompt and/or question was given with relevant visual materials and the time limit, along with any additional guidance needed. The participant completed the tasks on a computer.Footnote 1 The oral prompt minimized the chances of the participants borrowing expressions from the prompts in their writing. The writing tasks were similar to the speaking tasks, with slight adjustments to more closely approximate real-life writing situations. Except for the modality, all aspects of the writing tasks were designed to be as comparable as possible to those of the speaking tasks. Because typing takes longer than speaking, the time limit for each writing task was twice that of the corresponding oral task.Footnote 2 Likewise, in spontaneous tasks, participants’ responses were considered spontaneous if they started to type their response within five seconds.
For the Stage 2 and 3 features, similar to the equivalent oral tasks, several task types were employed. The two writing tasks designed to elicit S neg V required participants, first, to create a flyer to inform students of what they could and could not (or should not) do while living in a dormitory, and second, to write guidelines for a box of contact lenses, both based on pictures. Next, the participants were shown a series of pictures accompanied by questions to elicit Plural -s, Plural NP, and Progressive -ing. For Possessive ’s, participants were asked to describe depicted objects, and to provide directions to someone’s desk using a map of an office. Two narrative tasks required participants to describe the most memorable event they had attended and their last summer vacation with the intention of eliciting Past -ed.
For Stage 5, three types of tasks were designed to elicit Third person singular -s. In one task, participants responded to an email that included questions about a children’s art exhibition based on information in a flyer. In two argumentative tasks, participants presented their opinions in response to particular questions (Do you think it is a good idea to use a laptop during class? What do you think is the best way to find a job?). In a description task, participants were prompted to describe one of their friends.
Finally, for the Stage 6 structure, there were two writing-an-email tasks in which the participants were supposed to act like new city residents and potential train passengers. After reading through an email from a residents’ committee and a summary of a ticket, the participants were asked to write short request emails including at least three questions. Similar to the equivalent oral tasks, these two tasks were not designed to elicit particular types of questions. Rather, they left open the possibility of participants using Cancel INV (e.g., I wondered whether I could book additional tickets for my children).
In the writing tasks, I sometimes intervened with additional explanation or encouragement to write more. As mentioned, the time limits for the speaking tasks were doubled for the corresponding writing tasks, an arbitrary increase intended to address the fact that typing requires more time than speaking (Table 2).
Procedure
The experiment was conducted in a quiet laboratory room in two sessions. I timed the participants’ responses with an automatic stopwatch, and the participants could monitor the time limits while completing the tasks. The order of the speaking session and the writing session was counterbalanced; thus, half of the individuals completed the speaking tasks on the first day and the other half completed the writing tasks on the first day. The order of the tasks was pseudorandomized to separate those designed to elicit the same target structure by at least three other tasks. On the first day, the participants completed a background questionnaire about their demographic information; either the speaking or the writing tasks; and two working memory capacity tests (i.e., reading span test, operation span test) in counterbalanced order, as interruption tasks to mitigate any effects of their memory of the task types on their production on the second day. On the second day, they completed the tasks in the other mode and an exit questionnaire asking for their reflections on the experiment. They had a short break between each task, and additional short breaks if requested. Participants’ answers in the speaking tasks were recorded with a Sony ICD-UX560F recorder, and Microsoft Word (with the spelling and grammar function blocked) was used for the writing tasks. The entire experiment took approximately two hours, and the participants received $30.
Measures
Emergence
According to PT, L2 development should be measured not by accuracy but by emergence, or “the point in time at which certain skills have, in principle, been attained or at which certain operations can, in principle, be carried out” (Pienemann, Reference Pienemann1998, p. 138). The ability to use a structure systematically and productively, which shows the mapping of form and function in learners’ processing (Pallotti, Reference Pallotti2007), is not confirmed by a single token but by multiple tokens and/or multiple contexts of use. Therefore, following previous studies (Bonilla, Reference Bonilla2015; Jansen, Reference Jansen2008; Pallotti, Reference Pallotti2007; Pienemann, Reference Pienemann1998; Spinner, Reference Spinner2011), the criterion of emergence for this study is arbitrarily set at four tokens and/or contexts of each morphosyntactic structure. Tokens occurring in formulaic language were excluded (Myles, Reference Myles2004). For example, for syntax, if a participant repeatedly used a hedging expression, such as I do not think that…, throughout the experiment, I do not think was not considered one of the contexts of S neg V; it was excluded as probable formulaic language. For morphology, a target item was considered to have emerged if it occurred with varied lexical words in at least four contexts and it was produced in morphological minimal pairs (e.g., dog vs. dogs), in creative constructions (i.e., overgeneralization, such as goed and putted instead of went and put), or with lexical variety (see Pallotti, Reference Pallotti2007). Similarly, a repeatedly used target item in a chunk (e.g., seems in it seems that…) was not counted as one of the tokens of the Third person singular -s.
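The emergence criterion described above can be illustrated with a minimal sketch. It is not the coding procedure used in the study: the `Token` class, its field names, and the `has_emerged` function are hypothetical, and the check is simplified to counting distinct lexical items among tokens that a coder has not flagged as formulaic. Evidence from minimal pairs or creative constructions would need additional hand coding.

```python
from dataclasses import dataclass

# Criterion stated in the text: four tokens and/or contexts per structure.
EMERGENCE_THRESHOLD = 4


@dataclass(frozen=True)
class Token:
    """One coded use of a target structure (hypothetical representation)."""
    lexical_item: str        # the word carrying the structure, e.g. "dog" for "dogs"
    formulaic: bool = False  # True if the coder judged the use to be a chunk


def has_emerged(tokens, threshold=EMERGENCE_THRESHOLD):
    """Return True if non-formulaic tokens cover at least `threshold` distinct lexical items.

    Simplifying assumption: lexical variety is the only evidence of productive use.
    """
    productive_items = {t.lexical_item.lower() for t in tokens if not t.formulaic}
    return len(productive_items) >= threshold


# Plural -s used with four different nouns: the criterion is met.
plural_tokens = [Token("dog"), Token("cookie"), Token("bicycle"), Token("player")]

# Third person singular -s found mostly in the chunk "it seems that...": criterion not met.
third_person_tokens = [
    Token("seem", formulaic=True),
    Token("seem", formulaic=True),
    Token("provide"),
    Token("like"),
]
```

Under these assumptions, `has_emerged(plural_tokens)` is true, whereas `has_emerged(third_person_tokens)` is false because the repeated chunk contributes nothing to the count.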
Accuracy
Following Lee and Spinner (Reference Lee and Spinner2017), this study assesses the accuracy of morphosyntactic structures by means of obligatory occasion analysis (Ellis & Barkhuizen, Reference Ellis and Barkhuizen2005). All obligatory occasions for each structure were identified, and each occasion was then examined to investigate whether the correct structure was supplied. Then, the percentage of the correct forms was calculated. However, more than one use of identical forms in any single context was counted as a single use (e.g., She like XX and sweet food. She like song… counted as a single failure to supply an obligatory Third person singular -s).
This study defines the “accurate” use of a particular structure in three ways. First, when a structure is related to a verb in a sentence (e.g., Progressive -ing, Past -ed, and Third person singular -s), accuracy is determined in regard to tense, aspect, and subject-verb agreement (Ellis & Yuan, Reference Ellis and Yuan2004). Second, the accuracy of other morphological structures (e.g., Plural NP) is coded as correct when the participants used the forms accurately and with correct agreement. Third, the accurate use of syntactic structures (e.g., S neg V) is determined by correct word order and inclusion of all required elements.
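As an illustration of the obligatory occasion analysis described above, the following sketch computes the proportion of correctly supplied forms while counting repeated identical forms within a single context only once. The tuple layout and the function name are assumptions made for this example, and "context" is operationalized here simply as an identifier for one response.

```python
def obligatory_occasion_accuracy(occasions):
    """Return the proportion of obligatory occasions supplied correctly, or None if there are none.

    `occasions` is an iterable of (context_id, form, is_correct) tuples.
    Identical forms within the same context are collapsed into one occasion.
    """
    seen = set()
    total = 0
    correct = 0
    for context_id, form, is_correct in occasions:
        key = (context_id, form.lower())
        if key in seen:
            continue  # repeated identical form in the same context: count once
        seen.add(key)
        total += 1
        if is_correct:
            correct += 1
    return correct / total if total else None


# "She like ... She like ..." in one response counts as a single failure.
third_person_s = [
    ("task_11", "like", False),
    ("task_11", "like", False),
    ("task_11", "plays", True),
    ("task_09", "provides", True),
]
score = obligatory_occasion_accuracy(third_person_s)
meets_criterion = score is not None and score > 0.75
```

Here three occasions remain after collapsing the repeated form, two of which are correct, so the learner in this invented example would fall below the .75 accuracy criterion used later in the analysis.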
Coding and data analysis
Participants’ oral responses were transcribed for the analysis. Grammatical and semantic errors were not corrected. In cases of self-correction of a target structure, only the last token was transcribed, and in cases of repetition, only one token was transcribed.
Two native speakers of English coded the participants’ responses in the transcripts of both modalities (intercoder reliability = .77).Footnote 3 They counted the morphosyntactic structures predicted by PT and judged the accuracy of each usage. For all target structures produced more than four times, I double-checked the transcripts to ensure none of the usages were formulaic and to confirm the emergence of the structure (Pienemann, Johnston, & Brindley, Reference Pienemann, Johnston and Brindley1988). While checking the coded data, I counted the number of cases of emergence in each participant’s response and checked the stage in which each case of emergence occurred.
Following previous PT studies, this study created implicational tables for the participants’ responses in each modality. Implicational scaling is a method of visually representing L2 learners’ developmental patterns in a table format. This arrangement of cross-sectional data allows for the examination of gradual progress over time, which possibly can be interpreted as representing L2 learning (Bonilla, Reference Bonilla2015; Jansen, Reference Jansen2008; Pienemann, Reference Pienemann1998). The first column of an implicational table shows all participants arranged in order from less development in the top rows to greater development in the bottom rows. The rest of the columns indicate each stage predicted by PT from earlier stages (i.e., to the left) to later stages (i.e., to the right). When the criteria for emergence were fulfilled in a learner’s production, it is marked e (emergence). When the criteria for emergence were not fulfilled in a learner’s production, it is marked N (none). In implicational scaling, a later stage structure that emerges earlier than an earlier stage structure is considered an error. Based on the number of errors, the predictability (coefficient of reproducibility; C of R) and reliability (coefficient of scalability; C of S) of the implicational scales were determined (Hatch & Lazaraton, Reference Hatch and Lazaraton1991). C of R indicates the extent to which an implicational scaling predicts each participant’s development based on his/her rank in the implicational scaling, and a C of R over 0.9 is considered to show acceptable predictability. The C of S is then calculated, with a C of S above 0.6 considered to confirm that an implicational scale is reliable. For accuracy, this study highlights emerged structures with accuracy over 0.75 (Lee & Spinner, Reference Lee and Spinner2017) in the implicational scales, which allows for investigating the extent to which the emerged structures are accurately used in learners’ responses.
Thus, the implicational tables for accuracy include target structures rather than the developmental stages predicted by PT in the columns. The order of the target structures in each stage in the columns was decided based on the number of cases of emergence across the participants’ responses. The emerged structures that met the criterion of accuracy (i.e., 0.75) are in bold and highlighted.
To maintain consistency, the number of cases of emergence and the number of emerged structures whose accuracy is over .75 were considered to decide the order of the participants in the implicational tables. As discussed, participants with lower emergence numbers appear earlier in the tables, and participants with the same numbers are ordered based on ID numbers (from earlier to later), number of produced structures meeting the criterion of accuracy, and proficiency levels (from A2 to C1).
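The implicational-scaling coefficients introduced above can be sketched as follows, using the standard Guttman formulas: C of R = 1 − errors / (participants × stages); the minimum marginal reproducibility (MMR) is the sum over columns of the larger of the emerged and non-emerged counts, divided by the number of cells; and C of S = (C of R − MMR) / (1 − MMR). The study computed its scales in Excel, so this code is only an illustration. The error-counting rule (each non-emerged cell to the left of an emerged cell in the same row) is an assumption based on the definition of an error given above, and the small table is invented.

```python
def count_errors(table):
    """Count non-emerged cells (0) that precede an emerged cell (1) in the same row.

    Columns are assumed to be ordered from the earliest to the latest stage.
    """
    errors = 0
    for row in table:
        last_emerged = max((i for i, cell in enumerate(row) if cell), default=-1)
        errors += sum(1 for cell in row[: last_emerged + 1] if not cell)
    return errors


def scaling_coefficients(table):
    """Return (C of R, MMR, C of S) for a participants-by-stages table of 0/1 values."""
    n_rows = len(table)
    n_cells = n_rows * len(table[0])
    c_of_r = 1 - count_errors(table) / n_cells
    # Maximum marginal per column: the more frequent of "emerged" vs. "not emerged".
    max_marginals = sum(max(sum(col), n_rows - sum(col)) for col in zip(*table))
    mmr = max_marginals / n_cells
    c_of_s = (c_of_r - mmr) / (1 - mmr) if mmr < 1 else float("nan")
    return c_of_r, mmr, c_of_s


# Invented example: five learners (rows) by four stages (columns); 1 = emerged.
example_table = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],  # later stage emerged without the earlier one: one error
    [1, 1, 1, 1],
]
c_of_r, mmr, c_of_s = scaling_coefficients(example_table)
```

For this invented table the function gives a C of R of 0.95 and a C of S of about 0.8. By the same formula, one error among 435 cells corresponds to a C of R of 1 − 1/435, which is about 0.998.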
Next, a generalized mixed model was performed to understand whether emergence patterns differ between the oral and the written modalities, and a linear mixed model was performed to explore whether accuracy patterns differ between the modalities. Because the accuracy of the emerged structures in higher stages (e.g., Third person singular -s in stage 5) was usually lower than the criterion, a statistical analysis would have a large number of missing values, and the results might not be generalizable. Therefore, the linear mixed model was performed only with the five most frequent structures. Maximum likelihood estimation was used for both models.
Results
The data for analysis consisted of 2,816 oral responses from 87 participants and 2,752 written responses from 86 participants (the written responses of one participant were removed due to errors in the data). The average number of words in each proficiency level in tasks designed to elicit a particular structure is shown in Table 3. Overall, the participants produced more words in the speaking tasks (β = –14.12, SE = 1.31, t = –10.74, p < .001); participants with higher proficiency also produced more (e.g., C1: β = 28.35, SE = 3.46, t = 8.21, p < .001) (Table 4).Footnote 4 Implicational scales were created using Microsoft Excel 2019, and other statistical analyses were computed using the correlation and lme4 packages in R (R Core Team, Reference Team2018).
Emergence
PT posits that production of a specific structure in a stage confirms L2 learners’ acquisition of the processing procedure in the stage. This study chose the most frequently emerged structure at Stage 2 (Plural -s) for the implicational scaling. Therefore, the target structures were a single word (Stage 1), Plural -s (Stage 2), Plural NP (Stage 3), Third person singular -s (Stage 5), and Cancel INV (Stage 6). Figure 1 shows the implicational scaling of oral and written responses. For the participants’ oral responses, there were 435 available instances of emergence in the implicational scaling, and 1 error (C of R: 1; C of S: 0.99). The results confirmed that the implicational scaling successfully predicted the participants’ L2 development in the oral modality with reliability. In particular, the structures in earlier stages emerged in most of the participants’ production, whereas the structures in later stages emerged only in some advanced learners’ production. For instance, emergence of structures in Stages 1 and 2, which require a lower level of processing procedures, occurred in all participants’ responses, but emergence of structures in Stage 6, which require a higher level of processing procedures, was only found in five participants’ responses.
For the participants’ written responses, there were 430 available instances of emergence and 1 error in the implicational scaling (C of R: 1; C of S: 1). Similar to the results in the oral modality, these results indicate that the order of the participants’ L2 development in their written production could be described accurately and reliably by PT. The emergence of the target structures in the participants’ written production was similar to that in their oral production. The structures emerged gradually and in the predicted order. However, in some cases, structures emerged later in the written modality. For instance, a higher percentage of participants failed to show acquisition of the processing procedures in Stage 3 in their written responses than in their oral responses (6.98% in the written modality vs. 1.14% in the oral modality). Likewise, a lower percentage of participants showed the emergence of higher stage structures in the written modality than in the oral modality; specifically, the Third person singular (Stage 5; 24.41% in the written modality vs. 39.08% in the oral modality) and Cancel INV (Stage 6; 2.32% in the written modality vs. 5.74% in the oral modality).
Overall, the emergence patterns in the two modalities were similar. A greater number of participants showed emergence in earlier stages, with decreasing numbers of participants showing emergence of structures at later stages. In addition, the results of a majority of participants were consistent between the oral and written modalities (N = 56); in other words, emergence occurred at the same stage across the modalities. Specifically, participants in the A2 group reached Stage 3 in both modalities, but some participants reached Stage 5 starting at the B1 level (B1: 11.76%; B2: 26.67%; C1: 81.81%) and some of the C1 group even reached Stage 6 (9.09%), again in both modalities. However, some individual participants showed the emergence of more structures in the oral modality than in the written modality (N = 25), and vice versa (N = 5), although the number of the participants who showed consistent results between the oral and written modalities was statistically greater than that of the participants who showed inconsistent results [χ²(1, N = 86) = 7.86, p = .005].
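The comparison of consistent and inconsistent participants can be checked with a short sketch. It assumes that the reported statistic is a one-sample chi-square goodness-of-fit test against equal expected frequencies; the text does not state this, so it is an inference. The function names are chosen for illustration, and the p value uses the closed form for one degree of freedom.

```python
import math


def chi_square_equal_expected(observed):
    """Chi-square goodness-of-fit statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


consistent = 56        # same stage reached in both modalities
inconsistent = 25 + 5  # more structures emerged in one modality than in the other
chi2 = chi_square_equal_expected([consistent, inconsistent])
p_value = p_value_one_df(chi2)
```

Under this assumption the counts yield χ² of approximately 7.86 and p of approximately .005, in line with the values reported above.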
To explore the similarity and difference between the implicational scales for the two modalities, Pearson correlations were conducted with the number of participants who had reached each stage (Figure 2). For instance, 81 (out of 86) participants showed the emergence of the Stage 3 structures in their written responses; thus, the number of participants counted as having reached Stage 3 was 81. The results of the correlation analysis indicated that the implicational scales for the two modalities were correlated for all participants (r = .99, p = .001).
The differences between the two modalities were not significant; however, at all proficiency levels there was slightly higher emergence in the oral modality than the written modality at Stages 3, 5, and 6, with the greatest cross-modality difference in the production of the Stage 5 structure.
A generalized mixed model was performed with emergence of a structure (emergence, no emergence) as a dependent variable, modality (oral, written) as a fixed effect, and participant as a random effect. Oral modality served as the reference level. Modality was not significant in this model (β = –.21, SE = .14, z = –1.49, p = .14), indicating that the pattern of the gradual emergence of the target structures was similar in the two modalities.
Furthermore, the ratio of emergence and a set of Cs of R were calculated for individual structures in the two implicational tables (Table 5). The ratio of emergence and the Cs of R for individual structures indicate the extent to which each element occurs similarly or differently between the modalities and whether predictability of a particular structure being produced differs depending on modality. In other words, these analyses had the potential to reveal subtle similarities and differences between the modalities. The ratio of emergence and the Cs of R seem comparable between the oral and written modalities. In particular, although the percentage of emergence was slightly higher in the oral modality in Stage 2, overall, the same order of emergence appeared in both modalities: Plural -s, Progressive -ing, S neg V, Possessive ’s, Past -ed. In addition, in both modalities, the Cs of R for all structures except for Past -ed were over .9. Past -ed showed a lower C of R than the criterion, suggesting that reaching a certain developmental stage does not necessarily mean that all possible structures in this stage emerge, which is consistent with previous PT studies (e.g., Di Biase et al., Reference Di Biase, Kawaguchi, Yamaguchi, Bettoni and Di Biase2015).
Accuracy
Following Lee and Spinner (Reference Lee and Spinner2017), accuracy below .75 is considered to indicate that a learner cannot consistently produce the structure in question accurately. Figure 3 illustrates the accuracy of the participants’ oral production together with emergence. The gray cells indicate emerged structures, with the number in bold for accuracy over .75. Empty cells show that a structure has not emerged (thus accuracy cannot be calculated). Zero indicates consistently inaccurate production. The analysis considered only the accuracy of emerged responses.
The oral responses showed emergence of the early-stage structures (e.g., Plural -s and Progressive -ing) for most participants, but accuracy varied. Overall, the participants who showed greater development produced more of the early-stage structures more accurately. The number of emerged structures that met the criterion of accuracy is limited in the top rows (e.g., N = 0 for participant 69) and increases steadily in the bottom rows (e.g., N = 5 for participant 25). In other words, participants’ responses showed a gradually increasing ability to use these structures accurately. In addition, compared to the structures in the later stages, the participants produced the structures in the earlier stages more accurately. For instance, both S neg V (Stage 2) and Plural NP (Stage 3) emerged in most of the participants’ responses (N = 85 and N = 86, respectively). However, 69 participants (78%) produced S neg V with accuracy over .75, whereas only 31 participants (34%) reached the criterion in the production of Plural NP.
In the written responses as well, accuracy in the production of a particular structure increased along with the participants’ L2 development. However, compared to their oral responses, in their written responses the percentage of the emerged responses that met the criterion of accuracy was relatively stable, regardless of the difficulty of the structures; in other words, the percentages were similar for structures in earlier and later stages (Table 6; Figure 3).
Note: Accurate forms are bold; inaccurate forms are underlined.
To explore differences in general trends of accuracy between the oral and written modalities, the next analysis considered only the accuracy of the five most frequently emerged structures. In a linear mixed model, accuracy was the dependent variable; modality (oral, written) and structure (S neg V, Plural -s, Progressive -ing, Possessive ’s, Plural NP) were fixed effects; participant was a random effect. The oral modality and Plural NP served as reference groups. Depending on type of structure, accuracy differed. The difference in accuracy of each structure seemed aligned with the order of the designated columns for the structures in the implicational table. In particular, Plural NP, which appears in the rightmost of the columns for the five structures in the implicational table, revealed lower accuracy than S neg V (β = .15, SE = .01, t = 10.35, p < .001), Plural -s (β = .09, SE = .01, t = 5.89, p < .001), and Possessive ’s (β = .10, SE = .02, t = 6.93, p < .001). However, there was no significant difference between the accuracy of Plural NP and that of Progressive -ing (β = –.01, SE = .01, t = –.83, p = .41). More importantly, the participants produced the target structures more accurately in the written mode than in the oral mode (β = .07, SE = .01, t = 7.08, p < .001) (Table 7).
Discussion
This study assesses the applicability of PT to L2 writing by comparing how well PT predicts development in oral and written modalities. In addition, the study compares the accuracy of the morphosyntactic structures that correspond to PT’s stages in learners’ oral and written production. A series of communicative tasks required participants to produce spontaneous responses in oral and written modalities. The results of this cross-sectional study imply that the structures in the earlier stages of PT are likely to be produced earlier than the structures in the later stages and that learners become more and more able to produce the structures in later stages as their L2 develops. With a few negligible exceptions, the emergence of structures showed developmental patterns to be comparable between the oral and written modalities. Furthermore, accuracy seemed to be aligned with emergence: Structures that emerged earlier were more likely to be produced with higher accuracy than structures that emerged later. In addition, the results suggest that particular structures are not necessarily produced with accuracy when they first emerge. The learners showed a gradual increase in accuracy as their L2 development progressed. This gradual development in accuracy was found in both oral and written modalities but accuracy was higher, and increased more rapidly, in the written modality than in the oral modality.
Emergence
Although each modality has unique characteristics (e.g., Gilabert et al., Reference Gilabert, Manchón and Vasylets2016; Williams, Reference Williams2012), this study demonstrates that a single framework can be used to assess L2 learners’ language development in their written production as well as their spoken production. While some differences between oral and written modalities were observed, the overlap between the two modalities led to comparable developmental patterns. Specifically, the participants revised and edited their responses more freely in the writing tasks than in the speaking tasks (Byrnes & Manchón, Reference Byrnes, Manchón, Byrnes and Manchón2014; Granfeldt, Reference Granfeldt, van Daele, Housen, Kuiken, Pierrard and Vedder2008). Although the participants also corrected some of their responses in the speaking tasks, this behavior was limited to when they noticed having made a grammatical error right after uttering it. In this case, the learners usually self-repaired, such as in the man ride riding is riding. It is often assumed that L2 learners more frequently avoid producing structures at higher levels when speaking than when writing (Kuiken & Vedder, Reference Kuiken, Vedder and Robinson2011; Son, Reference Son2022; Tavakoli, Reference Tavakoli, Byrnes and Manchón2014), but this study’s participants appeared neither to avoid producing higher level structures in the speaking tasks nor to attempt to produce more higher level structures (over the criterion of emergence) in the writing tasks. Thus, some differences between oral and written modalities did not affect the overall developmental patterns.
The study sought to address some limitations in the few theory-based previous studies of L2 language development in writing (Boss, Reference Boss2008; Byrnes & Sinicrope, Reference Byrnes, Sinicrope, Ortega and Byrnes2009; Weissberg, Reference Weissberg2006). Overall, the previous studies left open the possibility of extending one of the theoretical frameworks of L2 development based on oral production to written production, although they also reported some inconsistencies with the predictions of the frameworks, which may be due to these studies’ use of (a) insufficient numbers of tasks designed to elicit particular structures, (b) non-communicative tasks, and/or (c) inappropriately designed tasks for the purpose of eliciting particular structures. In addition, no previous study has sought to compare the application of a certain theoretical framework of L2 development in oral and written modalities.
Hence, to complement and extend the aforementioned research, this study designed a series of communicative tasks, each intended to elicit one of the morphosyntactic structures predicted by PT to align with the developmental stages. With some slight modifications, identical task types were used in the two modalities. With this methodological improvement, this study revealed similar developmental patterns between the oral modality and the written modality and provided evidence for L2 language development in the written modality. The study’s application of PT’s theoretical framework to both oral and written modalities is an important leap in the research on language development.
This study’s demonstration of the possibility of using PT to investigate L2 learners’ development in the written modality also supports Pienemann’s (Reference Pienemann1998) steadiness hypothesis, which posits that, in oral production, L2 learners are able to process morphosyntactic structures consistently regardless of type of communicative task, if the tasks are designed to use the same knowledge and skills. The current study’s use of comparable tasks suggests that such consistency in oral production might extend to written production. It is assumed in this study that the best way to observe parallels between oral and written production is to use tasks that encourage the use of similar types of knowledge and skills. However, in some cases L2 learners may be able to deploy explicit knowledge and metalinguistic L2 knowledge in writing but not in speaking, due to the possibility of revision in writing (e.g., Schoonen et al., Reference Schoonen, Snellings, Stevenson, Van Gelderen and Manchón2009; Williams, Reference Williams2012). The time constraints in this study were intended to reduce the possibility of the learners using explicit knowledge in their written production, encouraging spontaneous production and leading them to rely more on L2 implicit knowledge in both modalities.
The use of comparable tasks and time constraints encouraged L2 learners to employ the same knowledge in both modalities, in keeping with the steadiness hypothesis, which I argue can now be extended to certain writing tasks. Whether the developmental order of PT is found in other types of writing tasks, such as essays with multiple revisions, remains to be investigated. In particular, to make the experimental settings comparable, this study set the time limit for each writing task at double the time limit of the corresponding speaking task. However, if future research can determine longer time constraints that still maintain the spontaneity of written responses, the allowance of additional time to revise might show more emergence of structures at a higher level in written production.
Håkansson and Norrby (Reference Håkansson, Norrby and Mansouri2007) also argued that L2 learners followed the developmental order predicted by PT in their writing. However, the current study’s findings provide clearer evidence, for a few reasons. First, Håkansson and Norrby tested only advanced learners, while this study included a wide range of proficiency levels, enabling it to observe that learners followed the developmental order predicted by PT. This finding means that the applicability of PT’s prediction to L2 written production can be more widely generalized. In addition, Håkansson and Norrby used translation tasks to assess writing, while this study utilized similar communicative speaking and writing tasks, so that, in both modalities, the participants had to come up with their own ideas and produce morphosyntactic structures by themselves within the designated time.
Accuracy
The development of accuracy may be relevant to PT’s perspective on automatization (Pienemann, Reference Pienemann1998). L2 learners may need some time for underlying processing routines to become automatized; even after a particular morphosyntactic structure emerges in L2 learners’ production, its accuracy is not guaranteed. This study’s results showed that L2 learners were able to produce morphosyntactic structures more accurately along with their further L2 development, as shown in the emergence of structures, and that their accuracy with structures that appeared earlier was higher than their accuracy with structures that appeared later. These results were similar between the two modalities.
However, the L2 learners reached the criterion of accuracy, and showed greater stability of accuracy, much earlier in their written production than in their oral production. These results indicate that while the different characteristics of the oral and written modalities may not have changed the overall pattern of emergence (Research Question 1), they may lead to differences in accuracy (Research Question 2). Writing, even with time constraints, allows learners opportunities to plan, check, and edit grammatical errors in their production (Kellogg, Reference Kellogg, Levy and Ransdell1996, Reference Kellogg2001; McCutchen, Reference McCutchen1996; Ochs, Reference Ochs and Givón1979). In addition, the written modality both allows and encourages L2 learners to use metalinguistic knowledge to monitor their language and accuracy (Williams, Reference Williams2012), which may lead to greater accuracy compared to speaking, in which they rely on implicit knowledge (Polio, Reference Polio2012). Furthermore, as Schoonen, Snellings, Stevenson, & Van Gelderen (Reference Schoonen, Snellings, Stevenson, Van Gelderen and Manchón2009) argued, spoken language is more tolerant of errors than written language, which may lead L2 learners to monitor their language more thoroughly in written tasks than in oral tasks. Hence, the unique characteristics of the written modality may have led this study’s participants to pay more attention to the structures they produced and to correct more grammatical errors in the written tasks than in the oral tasks.
The differences in accuracy between oral and written production are also partially aligned with some previous studies that demonstrated the effects of task modality on L2 learners’ production (e.g., Kormos, Reference Kormos, Byrnes and Manchón2014; Kormos & Trebits, Reference Kormos and Trebits2009; Tavakoli, Reference Tavakoli, Byrnes and Manchón2014; Zalbidea, Reference Zalbidea2017). Although the current study employed a different measure of accuracy (i.e., obligatory occasion analysis rather than errors per T-unit; see also Polio & Shea, Reference Polio and Shea2014), its findings are consistent with those of the previous studies and provide more evidence that the characteristics of the written modality can lead to greater accuracy.
Conclusion
This study sheds light on the possibility of using the same theory to investigate L2 language development in the oral modality and the written modality. The study used similar tasks in oral and written modalities and compared the results to provide a better understanding of the applicability of a language development theory, PT, to writing. The results suggest that L2 linguistic knowledge may be developed in similar ways regardless of modality and demonstrate that theory-based investigations of language development are possible in the written modality as well as in the oral modality.
In addition, the study assessed whether accuracy is aligned with the emergence of morphosyntactic structures. The results showed gradual progress in accuracy, with patterns similar to those of emergence, although the written modality exhibited a more consistent progression, possibly due to the opportunities to revise and edit writing.
This study has some limitations, which point to directions for further research. First, the time constraints for both task types were intended to encourage the learners to respond spontaneously. However, previous PT studies have not provided a principled way of determining appropriate time constraints for written tasks. Hence, the time constraint for each written task was set, rather arbitrarily, at double the time for the task’s oral counterpart, considering the different speeds of typing and speaking. It is possible that the different time constraints between the modalities unintentionally led to different degrees of spontaneity, specifically, less spontaneity in the learners’ written production. Further research should investigate the optimal time constraints for eliciting spontaneous production in both modalities as well as whether PT can be extended to L2 writing without time constraints.
Next, this study did not analyze L2 learners’ online writing behaviors, but doing so could provide concrete evidence for the effects of the different characteristics of the two modalities on accuracy. The study assumed that the unique characteristics of writing affected accuracy. However, measuring online writing behaviors with the use of keystroke logging (e.g., Leijten & Van Waes, Reference Leijten and Van Waes2013), stimulated recall (e.g., De Silva & Graham, Reference De Silva and Graham2015), or both (e.g., Révész, Kourtali, & Mazgutova, Reference Révész, Kourtali and Mazgutova2017) would provide information on whether and how L2 learners actually do revise and edit their grammatical errors while writing, and whether such behaviors do lead to higher accuracy in their writing than in their speaking.
Finally, this study followed Lee and Spinner (Reference Lee and Spinner2017) in setting 75% as the criterion of accuracy. Because previous PT studies did not explore the accuracy and emergence of morphosyntactic structures together, there is little evidence for whether this criterion is valid. Thus, the validity of the criterion of accuracy is another issue for future research to address.
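To make the accuracy criterion concrete, the following minimal sketch computes accuracy as the proportion of obligatory occasions in which a structure is supplied correctly and compares it with the 75% criterion. The function names and the example counts are hypothetical; in particular, treating exactly 75% as meeting the criterion and treating a structure with no obligatory occasions as not meeting it are assumptions made for illustration, not details reported in this article.

```python
from typing import Optional

# The 75% criterion of accuracy mentioned in the text.
ACCURACY_CRITERION = 0.75


def obligatory_occasion_accuracy(correct: int, occasions: int) -> Optional[float]:
    """Proportion of obligatory occasions supplied correctly.

    Returns None when there are no obligatory occasions, because
    accuracy is undefined in that case.
    """
    if occasions == 0:
        return None
    return correct / occasions


def meets_accuracy_criterion(correct: int, occasions: int,
                             criterion: float = ACCURACY_CRITERION) -> bool:
    """True if accuracy is at or above the criterion.

    Assumption: "at or above" (>=) rather than strictly above, and
    False when accuracy is undefined.
    """
    accuracy = obligatory_occasion_accuracy(correct, occasions)
    return accuracy is not None and accuracy >= criterion


# Hypothetical counts for one structure in each modality.
print(meets_accuracy_criterion(correct=9, occasions=12))  # written task
print(meets_accuracy_criterion(correct=6, occasions=12))  # oral task
```

Because the threshold is a parameter, the same sketch could be rerun with other candidate criteria when examining how sensitive the results are to the choice of 75%.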
Acknowledgements
I would like to thank Patti Spinner and several anonymous SSLA reviewers for their constructive and insightful comments on various aspects of this article during the whole process. Any problems remain my own.