1. Introduction
Few would argue against the importance of reading for language learning, which is why reading is the cornerstone of language learning curricula around the world. Few would disagree with the fact that the more students read, the better they read, and that fostering a reading habit outside of school (or work) is essential for life-long learning. However, in spite of a growing body of research, extensive reading (ER) continues to be “an approach less taken” (Day & Bamford, 1998, p. 3) in both second and world language learning contexts for a variety of reasons, some of a practical nature, some of a personal nature, and some of a theoretical nature. Nonetheless, ER researchers and practitioners alike are still wondering, “Why isn't everyone doing it?” (Grabe, 2011; Macalister, 2010).
Ideally, ER is an activity that involves students self-selecting a large volume of reading texts that are well within their linguistic competence for purposes of information or pleasure. In so doing, the learners focus on language for meaning and not for structure. They implicitly build up knowledge of vocabulary, syntax, and text structure while increasing their fluency, which promotes comprehension. ER is quite different from the kinds of activities that students often have in classrooms, activities such as explicit vocabulary instruction, translation, answering discrete comprehension questions, and examining syntactic features of a relatively short text. These activities have often been described as intensive reading (IR). While the distinction between ER and IR is stark when presented this way, the reality of much classroom-based language learning is that without considerable teacher guidance and supportive transitional activities, students are not likely on their own to reach self-motivated independent ER either in or out of the classroom.
While classroom-based language learning has for generations adhered mostly to models of explicit instruction, today the preponderance of usage-based theories of language learning suggests that much more attention should be given to the role of implicit learning while engaging in meaningful communicative activity. (For a review of various usage-based models of language acquisition, see Ellis, 2019.) Frequency of experience is foundational for implicit learning since language learning requires massive exposure of “words, morphemes, lexico-grammatical-functional patterns and of the probabilistic relations between them and their functions, their speakers, their contexts, and their genres” (Ellis, 2019, p. 49). ER is not the only activity that contributes to this requirement for language frequency, but it is a highly effective and practical source of it when language learning occurs primarily in the second language (L2) classroom.
In spite of the theoretical support for ER, there continue to be perceived and real obstacles. Ewert (2020) enumerated some of these for the various stakeholders who might be involved. Students may resist doing large amounts of independent reading because it has never been attempted before. Finding just the right materials – interesting and at the right proficiency level – may also be a challenge for them. There are also those students who are not avid readers in their primary language and have never experienced reading as pleasurable. Additionally, their previous IR experiences may have led them to internalize a belief that they need to read difficult texts to learn vocabulary and grammar.
Instructors can also face serious difficulties even if they have become convinced of the efficacy of ER. Perhaps the most challenging obstacle is when they are expected to teach a particular curriculum in a lockstep fashion. Where can they fit in even a bit of training for later out-of-class reading? Materials are another difficulty, depending on where in the world the instructor is and which language is needed. Instructors also struggle with giving the students autonomy in their reading choices and trusting the students to honestly report what they have read. In many educational contexts, administrators, parents, teachers, and sometimes even students insist that ER be checked in some way to avoid cheating and to confirm some level of comprehension, and this can easily lead back to IR activity, thus limiting the amount of reading possible. Finally, instructors are sometimes worried that students, parents, and administrators may make the false assumption that the teacher is either lazy or incompetent if they find students and the teacher quietly reading in a classroom.
Another factor that constrains administrators and instructors from adopting ER as a viable component of a reading curriculum is the current emphasis on testing and assessment. The more easily measurable outcomes of explicit instruction can make an argument for implicit learning challenging. Sometimes questions are also raised about the quality of materials common to ER programs. While using unsimplified language is a good source of input for language learning, texts that were written by and for advanced primary language users are typically beyond the linguistic competence of language learners. Those who insist on only using such texts inevitably turn reading into a language lesson. On the other hand, investigations of materials written for language learners indicate that many of these texts are well-written, and if they tell “a good story” and are accessible linguistically, students will read them (Claridge, 2012).
Hopefully, recent scholarship will contribute to the evidence for the efficacy of ER for language learning thus mitigating some of these obstacles. This review focusses on “classroom-based ER in L2 contexts,” which differs from the application of ER for first language (L1) learners in a number of ways: (1) While the objectives of reading for L1 learners can be simply for pleasure, for information, for school, or for work-related reasons, the added motivation for ER with L2 learners is often seen as a means of learning the target language; (2) L2 learners are learning an additional language that is often not used in daily life in their living context. Thus, their only input is through their L2 classrooms; (3) Even L2 learners living in a region where the target language is the dominant means of communication may not use the target language except in the classroom since their families, friends, and sometimes their entire neighborhood communicate in their primary language; (4) While L2 learners in a target language setting may of their own volition decide that they wish to study the target language as an independent activity, most will only receive instruction in a school environment, often alongside L1 speaking children, with no allowance or consideration given to the fact that they may still be struggling with the target language.
Research on ER in recent years has addressed some of these concerns by investigating topics such as motivation, vocabulary learning, fluency, comprehension, materials, and implementation in classroom-based contexts. The vast majority of this research on ER continues to be undertaken in Asian contexts of English language learning.
Although publications on English ER in other parts of the world are increasing in number, many are non-systematic evaluations of implementation efforts. Research on ER in languages other than English is also emerging, but only those published in English or Japanese were accessible to the authors for this review.
Although there are some notable successes in ER research in the recent past, this review will also demonstrate some of the ongoing dilemmas of this area of research. The nature of instructional contexts precludes laboratory-style quantitative experiments valued by some as the standard for valid and reliable research vis-a-vis language acquisition. Small class sizes, intact classes, often with a mandatory textbook, and little time outside of class due to the pressure of high-stakes tests also conspire against the type of robust studies scholars prefer. There is also the ethical dilemma of withholding the practice of ER from a group of students for experimental purposes if one already finds prior research or experience a compelling reason to implement ER.
This review is organized around the major foci of ER research: motivation, vocabulary learning, reading rate, comprehension, and materials. Each section reviews multiple studies in recent years and concludes with lessons learned and possible future research directions.
2. ER and motivation
Virtually all teachers involved in literacy instruction view the need to imbue an intrinsic desire to read in their students as the ultimate goal of their instruction. This has always been difficult to accomplish in L1 settings, and even more difficult in L2 contexts. Nonetheless, research on motivation and ER has consistently indicated a strong positive correlation between the two. The contexts of research on motivation and ER continue to expand and some studies have developed more nuanced measures that illuminate the changing nature of motivation through a course of ER.
Using a dynamic systems approach, de Burgh-Hirabe and Feryok (2013) analyzed motivation in their qualitative study of nine adolescent students learning Japanese in New Zealand. While this approach (Larsen-Freeman & Cameron, 2008, cited in de Burgh-Hirabe & Feryok, 2013) has been applied broadly to L2 learning, this is a novel approach to investigating motivation for ER. Examining the students’ ER behaviors over the period of eight months, de Burgh-Hirabe and Feryok were able to observe the various changes in motivation that can occur between the “preactional phase,” the “actional phase,” and the “postactional phase” drawing from Dörnyei and Ottó (1998). Although all the students expressed positive motivation to engage in ER at the beginning of the project (participation in the project being voluntary) and again when they evaluated the project in the postactional phase, three distinct patterns were observed within the actional phase. Four students increased in their motivation, indicated by increased reading. Three diminished in their enthusiasm, and thus their reading activity, and two maintained the same low level of engagement throughout.
Of relevance for future investigations of motivation and ER is the way in which diverse internal and external factors combined to either enhance or diminish the students’ commitment to ER. These factors included students’ goals, perceived progress and feelings of success, beliefs about L2 learning, autonomy, external demands (such as from exams), distractions (such as sports and friends), and self-regulation. De Burgh-Hirabe and Feryok conclude that while this variability in motivation must be considered normal, a wide range of materials (topics, levels), self-selection of reading materials, and dedicated in-class time for reading would increase positive motivation and reduce distractions and other demands on time. While the specific factors influencing motivation noted here are likely to reflect the local context, the model of “complex and dynamic motivation” is likely to be useful in further studies of motivation and ER.
Takahashi (2018) investigated the impact of out-of-class ER activities of learners of Japanese at the University of Belgrade (Serbia) on learners’ understanding of ER and motivation. It was clear that the respondents gained an understanding of the purpose of ER – to improve their reading ability and vocabulary and to get used to reading books in Japanese. In addition, 95% of those who had experience in ER found it useful for their own learning of Japanese. When asked about their views regarding the ER rules of “reading from the easy one” and “reading another book when it gets stuck,” the respondents were generally positive. On the other hand, the rules of “reading without looking up a dictionary” and “skipping what you don't understand” were partially resisted by those with little ER experience.
In another investigation of ER effectiveness and motivation, Hardy (2016) found that a seven-week elective Spanish ER course for college-level Spanish language learners led to evidence of an increase of intrinsic motivation and statistically significant improvement on one proficiency measure. Hardy used a pre-post Spanish proficiency measure along with student questionnaires regarding attitudes towards reading in Spanish. While the students did self-select the elective course, and the number of participants in the study was small, the results were highly encouraging. One aspect of implementation was initially quite challenging, and that was developing an adequate library of graded readers at various levels of proficiency and for diverse interests.
Day and Bamford's (2002) ten “principles” of ER, although not based on empirical evidence, have been powerful guidelines for implementers of ER. Reading for pleasure, reading without “testing,” or not completing a text if uninteresting, for example, have militated against making ER compulsory in the graded curriculum. Van Amelsvoort (2017), for instance, mentions that 30% of his students did not read even one book during the entire semester. He attributed this non-compliance to not having made the ER activity compulsory. In successive years, he modified his policy with a clear improvement in the number of students and volume of material read. Van Amelsvoort argues that some form of “extrinsic motivation” needs to be applied in early stages with the hope that it will eventually give way to “intrinsic motivation” and that tracking student progress towards a final word count goal allows the students to set their own interim goals. White and Mulder (2016) confirm the necessity of external motivators for ER. Their students were given the opportunity to use Raz-Kids (https://www.raz-kids.com) for online reading practice (“mLearning” in their parlance), but they found that a gap grew between those students who were doing the assignments and those who were not. They assert that “it is not enough to be fun and engaging; without some way of enforcing compliance, busy students and parents are unlikely to take advantage of mLearning opportunities” (p. 21).
Although ER has been shown to influence some students to develop a love of reading without requirements (Cheetham et al., 2016; Demirci, 2019), and there is some evidence that intrinsic motivation to read can develop from initial required reading (e.g., Demirci, 2019; Mikami, 2020; Puripunyavanich, 2021), still, extrinsic motivators are often required. Students are inclined to prefer non-language learning activities in their discretionary time, yet most will do whatever reading is assigned and assessed. For this reason, much more attention recently has been given to investigating the benefits of extrinsic inducements to motivate ER. The following section will examine various forms of extrinsic motivators.
2.1 Extrinsic motivators and incentives
In this section, we discuss two related categories: “extrinsic motivators” that can encourage students to read more through specific class requirements, and “incentives” that are pedagogical strategies to increase students’ intrinsic motivation to read. The most common motivator is simply a percentage of the grade based on how much has been read in terms of number of books, pages, or words. See Meniado (2021, pp. 231–232) or Van Amelsvoort (2017) for further discussion of the importance of establishing a reading requirement. Also, see Taib et al. (2022) concerning the 20-year-old government-mandated Malaysian program that has had no impact, partially due to the fact that there was no grade attached to the reading. They found that “[students] feel lazy to write and keep records because they know it does not carry any marks and they are more focused on other academic activities” (p. 48).
2.1.1 Goal setting
Establishing an attainable goal is critical in making ER a required component of the curriculum. Mikami (2020) found that:
When students used goal setting effectively, they felt a sense of achievement, enhanced their intrinsic motivation and self-efficacy, and formed a virtuous cycle leading to new goals. On the other hand, when students were unable to use goal setting effectively, they repeatedly failed to achieve goals and seemed less motivated to read. (p. 28)
McLean and Poulshock (2018) experimented with three groups with different reading goals: (1) a weekly word target of 2,500 words, (2) 15 minutes of Sustained Silent Reading (SSR) in class, and (3) one book a week. They report that the word-target group internalized extrinsic motivation from the word-targets, and this led them to do more free reading and increase their reading self-efficacy more than the other groups.
Sakurai (personal communication, September 11, 2021) states that when setting a word-target goal, as long as the amount is within reason, the specific amount does not appear to matter. She reports that she did not actually change the goal, but rather than stating it as a minimum of 80,000 words, which was 60% of the target, she reworded the requirement, saying that students needed 133,333 words to get a mark of 100%. “Surprisingly, most 1st Years reached the [maximum] goal while [s]tudents in previous years, who were seeing the minimum amount as a goal, stopped reading as soon as they read that amount.”
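The arithmetic behind this rewording can be illustrated with a minimal sketch. It assumes, purely for illustration, that the mark scales linearly with the number of words read and is capped at 100%; the function names `full_mark_target` and `mark_for` are hypothetical and do not come from Sakurai's course.

```python
def full_mark_target(minimum_words: int, minimum_share: float) -> int:
    """Word count corresponding to a mark of 100%, given that
    `minimum_words` corresponds to `minimum_share` of the full mark."""
    return round(minimum_words / minimum_share)


def mark_for(words_read: int, full_target: int) -> float:
    """Percentage mark under the assumed linear scale, capped at 100."""
    return min(100.0, 100.0 * words_read / full_target)


# The same requirement stated two ways: 80,000 words as the 60% minimum,
# or 133,333 words as the 100% target.
target = full_mark_target(80_000, 0.60)
print(target)                              # word count for full marks
print(round(mark_for(80_000, target), 1))  # mark earned at the old "minimum"
```

Under these assumptions the two statements describe one and the same scale; only the figure presented to students as the goal differs.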
Yang et al. (2022) created a “learning dashboard” where students could record their outside reading. In a trial with two groups of high school students in Japan, the group that used the dashboard for self-directed reading read significantly more.
2.1.2 Written response
Writing full-blown summaries or book reports has a long history, but has not been generally favored in ER approaches since it creates a considerable burden in time for the student to write and the teacher to check. It is also conducive to cheating and reduces the amount of time available for reading, and thus demotivates students and discourages further reading. Nation and Waring (2020) mention “book reports” as one method to provide meaning-focused output with the caveat that “output should only play a minor role in extensive reading” (p. 14). White (2019), however, used various forms of brief written reports, and found them useful for monitoring reading and supporting discussion.
These short reports were also valued positively by students in a term-end questionnaire, with many students acknowledging “that they would not complete the reading assignment if such reports were not assigned” (p. 3).
2.1.3 Book sharing
Whole class or small group sharing of books read is a common follow-up activity since it is one means of assuring that the students have read their books, but in addition, if the discussion is carried out in English, it provides an opportunity for productive use of the language. Singkum and Chinwonno (2021) report that students enjoyed the opportunity to share their reading and to receive feedback from the teacher and peers. As a caveat, assuming that the students are reading one book or more per week, it might be physically impossible for all students to share all of the books that they have read.
2.1.4 Quizzes
There is now considerable evidence that quizzes are an effective means to motivate students to read. Stoeckel et al. (2012) laid to rest the objections of instructors who hold that Day and Bamford's (2002) principle that “Reading is its own reward” is essential for an effective ER program. Their analysis “suggests that extensive reading quizzes do not impact reading attitudes” (p. 193). Subsequent papers by others, such as Al Damen (2018) and Zhou and Day (2021), have reported that quizzes were an essential component of their ER program and an effective class management tool. While it is possible to create quizzes for ER materials, for those with internet access, web-based quiz platforms – such as MReader (https://MReader.org) and Xreading (https://xreading.com) – have become extremely useful. With either system, the quizzes are used to check basic comprehension and confirm that the students have done their claimed reading. The scores on the quizzes are normally not a component of the course assessment. McBride and Milliner (2016) evaluated the effect of MReader on their students’ reading. They observed that students felt a sense of accomplishment after passing a quiz, more than if they had simply written a book report. The authors also felt that it was pedagogically effective, allowing them to manage their ER implementation with a large number of students.
Numerous recent studies involving online systems, such as MReader or Xreading, have attested to the value of a gradual increase in intrinsic motivation over the course of the class term. Al Damen (2018) and Cheetham et al. (2022) are two representative studies.
An unanticipated result of using a web-based ER quiz application, however, is that the recorded word count can become a motivating force for many students, particularly those who enjoy competition. Demirci (2019), who teaches in a foundation program at a large university in the United Arab Emirates that used MReader with its students, found that a leaderboard, publicly posted weekly from the MReader results, was an essential component of their reading program. Teachers also found other innovative ways to incorporate ER into activities within the classroom (Demirci, 2019).
Quizzes have also been found to motivate more reading. Price (2020a) claims “the fact that most of the students continued reading the graded readers and increasing their word-counts on their MReader profiles, even after the assignment was finished, suggests that this approach stimulates an intrinsic motivation that brings the method closer to a more ideal concept of an ER project” (p. 48, emphasis ours).
2.1.5 Reading marathons
Endris (2018), in his study of ER with Ethiopian upper-level primary school learners, found that group discussions and “reading marathons” increased student motivation compared with those who read without “motivating activities” (p. 7). Puripunyavanich (2021) also reported that the students preferred reading marathons (competing for the highest word count) to writing reviews to submit to the instructor.
2.1.6 Gamification
In a pilot study, Philpott (2015) used two online resources, EnglishCentral (www.englishcentral.com) and MReader, with the students’ final grades dependent on their relative standing on leaderboards available on each platform. Werbach and Hunter (2012, cited in Philpott, 2015, p. 87) state that leaderboards can be motivating for those who have a chance at reaching the top, but disheartening to those who know that they will not have that chance. While this may be true of leaderboards that only rate the best performers, other forms, such as the most books read in a week or a separate leaderboard for different ranks of readers, might overcome this issue.
Alalwany (2019) used ReadTheory (https://readtheory.org), a free reading program that displays readings deemed to be at the student's current level, follows them with comprehension questions, and provides detailed feedback on the students’ answers. The system is gamified to the extent that it determines reading levels for the students that dynamically change depending on their success or failure with subsequent texts. The system also awards badges for good performance as well as “Knowledge Points,” the number of which continues to grow. Alalwany set a goal of 44 readings for full marks, spent 15 minutes per class meeting, but also encouraged the students to continue to read at home. She reports that although the students lacked the choice of what to read, it did allow students to “take more responsibility for their own learning and work independently” and that many students continued to use the site beyond what was required.
Jun (2018) compared a “standard” ER class with one with gamified elements. The students in the gamified class collected points based on the Lexile level of the book multiplied by the number of pages, and competed for the highest score in their assigned group. Students could “spend” some of their accumulated points for “mystery boxes” that contained stickers and candies. Jun found that gamification elements increased his students’ motivation and self-efficacy.
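The scoring rule described here (Lexile level multiplied by number of pages, summed per student and ranked within a group) can be sketched as follows. This is an illustrative sketch only, not Jun's actual implementation: the `BookRead` class, the `points` and `group_leaderboard` functions, the tie-breaking by name, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BookRead:
    lexile: int  # Lexile level of the book (sample values are made up)
    pages: int   # number of pages in the book


def points(book: BookRead) -> int:
    """Points for one book: Lexile level multiplied by page count."""
    return book.lexile * book.pages


def group_leaderboard(log: dict[str, list[BookRead]]) -> list[tuple[str, int]]:
    """Total points per student in one group, highest first.
    Ties are broken alphabetically by name (an assumption)."""
    totals = {name: sum(points(b) for b in books) for name, books in log.items()}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical reading log for one assigned group.
group = {
    "Student A": [BookRead(lexile=400, pages=20), BookRead(lexile=500, pages=30)],
    "Student B": [BookRead(lexile=600, pages=40)],
}
for name, total in group_leaderboard(group):
    print(name, total)
```

One consequence of a multiplicative rule of this kind is that a single longer, harder book can outscore several short, easy ones, which may or may not suit the aim of encouraging easy, high-volume reading.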
2.1.7 Biblio Battles
This is a “social game” that was created in Japan, for groups to “Get to know books through people” and “people through books.” The Japanese version is run by a group with its own webpage (Bibliobattle, n.d.). The original version was conceived for Japanese students to share their reading in Japanese, but has great potential as a motivating post-reading activity that doubles as a speaking activity since each participant needs to speak about their book for 5 minutes, with the listeners then voting on the most interesting presentation. MacLauchlan (2018) implemented ER Biblio Battles with university students and notes:
… it is an activity that can help to alter students’ perceptions about the act of reading itself. Through sharing valuable information amongst peers, creating a group atmosphere as students take on a dynamic challenge, and converting the concept of books into something deeper that can be communicated and discussed, Biblio Battles can influence an extensive reading classroom in unique and valuable ways. (p. 26)
See also Freiermuth and Ito (2021).
2.2 Demotivators and disincentives
While most discussion centers on means to motivate students to read, there are also mentions in the literature concerning factors that discourage students from doing so.
2.2.1 Students feel that ER is a waste of time
While students may enjoy ER, in cultures such as that of Taiwan, where success on a high-stakes examination is a priority, time spent reading can be seen as an inefficient means of improving their scores (Huang, 2015). Andreano and Wolfe (2019), in a similar vein, assert that the over-emphasis on high-stakes examinations inhibits the development of autonomous learning.
2.2.2 Teacher attitude
Manara (2019) points out the sore truth that in many contexts, where multiple teachers are teaching from the same text at the same pace, creativity is discouraged. Students viewed teachers as having very limited roles: textbook instructor and “answer key” checker. Manara realized that she needed to find stimulating readings for her students, involve them more in the discovery of meaning in the texts, and allow them to select their own material from the internet.
2.2.3 Forced use of unsuitable material
Bui and Macalister (2021) mention that a shortage of reading material that forces students to read texts that they do not enjoy or relate to is demotivating. Price (2020b) also mentions that a lack of variety in the available material can result in a reduced perception of “autonomy,” which in turn affects motivation.
2.2.4 Cheating
Tagane et al. (2018) found that student dishonesty in recording their ER activity was a challenge to implementing a successful ER program. Students became demotivated when they saw other students circumventing the system through various strategies, such as asking for the help of friends, finding information about the book online, or merely watching the corresponding movie rather than reading the book. They found that if it is too easy to “beat the system,” students will do so simply to save effort or time on their part. As a result, some students may resent reading while others are cheating, especially if appropriate measures to discourage or prevent it are not being taken by the teacher.
2.3 Lessons learned
Teachers have attempted to enhance students’ intrinsic motivation to read through book sharing, gamification, and other strategies, but for many students, whether they will read or not comes down to how the reading will affect the bottom line – their grade. While there is a consensus that for class-based ER, some form of reading requirement connected to the final course evaluation is needed, there is little guidance available on what a suitable requirement might be (see Table 1). There are multiple contextual variables that the instructor, or preferably the school curriculum, needs to consider, which is beyond the scope of this review. Suffice it to say that both the amount of required reading and the percentage of the grade will probably need to be determined by trial and error.
Note: Abbreviations used: Exp, Experimental group; Ctrl, Control group; IR, Intensive reading; ER, Extensive reading; Non-exp, Non-experimental.
3. ER and vocabulary
Recent research on vocabulary gains from ER confirms prior research of its efficacy in contributing to incidental vocabulary learning. For example, Jiménez's (2017) small-scale and short-term experiment comparing Spanish vocabulary learning with ER and wordlists demonstrated that in both the immediate and delayed (1 week) post-test, the ER group (n = 15) retained more vocabulary from reading a single text than the word list group (n = 15). Naturally, the gains were small even for the ER group. This slow and incremental growth of incidental vocabulary knowledge through ER in proportion to the frequency of the unknown vocabulary in the texts read and the amount of reading done continues to militate against large-scale experimental and longitudinal studies of the impact of ER on vocabulary learning.
3.1 Longitudinal studies of ER and vocabulary gains
One study stands out in recent years because of its ecological and empirical validity. Suk (2017) investigated vocabulary learning (along with reading rate and reading comprehension discussed later in this review) in a quasi-experimental study in which two groups of university students (N = 171) were compared. Teaching both groups, Suk was able to give all the students 70 minutes per week of the same intensive textbook-based reading activity and 30 minutes of differentiated activities. The control classes received more IR activity, such as vocabulary review, quizzes, and challenging sentence translation tasks, while the experimental classes got 30 minutes of ER activity, half of which was silent reading of self-selected graded readers. In addition, students in all the classes were required to engage in 2–3 hours of homework, which for the control classes meant more IR work with the textbook and vocabulary review and for the experimental classes meant ER, with a goal of reading at least 200,000 words in the semester.
To measure vocabulary learning, Suk (2017) developed a 120-item translation recall task based on the 155 graded readers available to the experimental group. A comparison of pre- and post-tests indicated that the vocabulary knowledge growth of the ER group was statistically significantly greater than that of the control group.
Although Suk's (Reference Suk2017) test was directly related to the texts students could choose to read, it may have missed some of their incidental vocabulary growth since the students did not read the same materials. To investigate this possible limitation of her general vocabulary test, Suk (Reference Suk2021) also developed individualized vocabulary tests based on the approximately six unique books that each student in the experimental group had read. She found that the results of both tests were very similar, suggesting that a general vocabulary test that reflects the corpus of texts students will self-select for ER is not only practical but also valid for measuring vocabulary growth. Webb and Chang (Reference Webb and Chang2015a) also investigated vocabulary learning longitudinally but in a secondary school setting, and with a different purpose. They were interested to see how prior word knowledge affected vocabulary learning with ER. Previous research was mixed on the impact initial proficiency would have on vocabulary gains (Horst et al., Reference Horst, Cobb and Meara1998 and Zahar et al., Reference Zahar, Cobb and Spada2001, cited in Webb & Chang, Reference Webb and Chang2015a). They divided the English language learners into three groups based on a pretest of the 100 target words that had been selected from the set of 20 graded readers the students would read over two terms. Ten readers at a lower level of difficulty were read in the first term, and the second ten at a slightly higher level were read in the second term. The students were tested with a synonym matching test, three times for each set of target words: before the term, just after the term ended, and three months after the term ended. The results indicated that all three groups made statistically significant gains in vocabulary knowledge between pre- and post-test and between pre- and delayed post-test in both terms. 
It was also evident that students with the highest proficiency gained relatively more than those with the lowest even though both groups read the same books. Another important study is Aka (Reference Aka2018), in which an experimental group (N = 405) significantly surpassed the performance of the control group despite the fact that the “traditional group” (N = 400) had spent 60 hours during the course of a school year studying “Grammar and Vocabulary” while the experimental group had spent an equal amount of time reading graded readers in class.
3.2 Benefits of ER with additional support
Taking the benefits of ER for vocabulary learning as a given, several recent studies have compared those benefits with the efficacy of other modes of input either separately or in combination with ER. The two Webb and Chang studies (Reference Webb and Chang2015b, Reference Webb and Chang2020) consider listening input as support for learning words and collocations, respectively. Boutorwick et al. (Reference Boutorwick, Macalister and Irina2019) investigated the benefits of discussion following ER for vocabulary learning.
Webb and Chang (Reference Webb and Chang2015b) compared the amount of vocabulary learned by a group of secondary students enrolled in a 4-hour-per-week ER program with students at different starting levels of vocabulary who read the same ten Oxford Bookworm readers in class while listening to the audio recording of the text. They found that those with a higher starting level retained more new vocabulary than those at lower levels, although both groups made progress. However, there was no comparison of learners who read without listening support.
Webb and Chang (Reference Webb and Chang2020) investigated incidental learning of collocations in three input modes: reading, reading while listening, and listening alone. While this provides comparative data on input mode for collocation acquisition, the study was not longitudinal and only involved a single text. Nonetheless, the results were encouraging in that there were statistically significant gains in collocation knowledge across the three modes and between the pre- and post-test, and pre- and delayed post-test with large effect sizes. Of note, however, is that the reading-only mode appeared less impactful than the reading-while-listening mode or the listening-only mode. Furthermore, analysis of the frequency and occurrence of the collocations and vocabulary learning suggests that reading while listening was the best condition.
Boutorwick et al. (Reference Boutorwick, Macalister and Irina2019) also investigated multimodal input for vocabulary learning in an ER program. In this case, they compared two groups of upper-level students in an English for Academic Purposes (EAP) program in New Zealand. Both groups read the same five books during the course of a term, but one group engaged in small group discussions about the books. This group scored significantly higher in a word recognition post-test. Presumably, more opportunities to use the targeted vocabulary led to greater gains. This approach to enhancing the input related to ER may be less effective if the students are less proficient and unable to “discuss” the texts. This could be investigated along with the impact of having, if applicable, an L1 discussion about the text.
3.3 Lessons learned
Since direct study of vocabulary removes it from the realm of “extensive reading,” what is left would appear to be the incidental learning of lexis and collocations. We have mentioned some recent studies that would attest to the efficiency of learning vocabulary in this manner. Various approaches that might augment retention would include listening while reading, lexical support via glossing, or availability of bilingual versions. For students using a touch screen, additional avenues might feed retention, such as the ability to touch a word to see its meaning or hear its pronunciation, or a function that collects looked-up words for future review. A review of extant studies of these means of additional support, however, is beyond the scope of the present review. Clearly, much more experimentation is required to discover the optimal combination of functions to augment retention, depending on the learning context. Table 2 summarizes the recent studies mentioned above.
Note: Abbreviations used: R, Reading; L, Listening; Exp, Experimental group; Ctrl, Control group; IR, Intensive reading; ER, Extensive reading; RC, Reading comprehension; RR, Reading rate; VA, Vocabulary acquisition; GR, Graded reader.
4. ER and reading rate and comprehension
The importance of reading rate for reading comprehension has been long established in L1 reading development by both theory (e.g., LaBerge & Samuels, Reference LaBerge and Samuels1974; Perfetti, Reference Perfetti1985) and research (e.g., Kim, Reference Kim2015; Slocum et al., Reference Slocum, Street and Gilberts1995). The importance of reading rate for comprehension is also well-theorized for L2 reading (e.g., DeKeyser, Reference DeKeyser2007; Grabe, Reference Grabe2009; Segalowitz, Reference Segalowitz2010), and there is a growing body of research to support this. Although methodological concerns have limited generalizability and strength of some claims, research on ER in an L2 confirms the common-sense expectation that more reading leads to faster reading (Al-Homoud & Schmitt, Reference Al-Homoud and Schmitt2009; Beglar et al., Reference Beglar, Hunt and Kite2012; Bell, Reference Bell2001; Iwahori, Reference Iwahori2008; Sheu, Reference Sheu2003; Shiki, Reference Shiki2011; Shiki & Hase, Reference Shiki and Hase2010).
The more important relationship, however, is whether reading rate improvements correlate with at least the maintenance, or perhaps the improvement, of comprehension. The latter would be anticipated particularly for those learners who have engaged in L2 reading mostly in ways that draw attention to vocabulary and grammar at the sentence level (i.e., translation) rather than ways that attempt to replicate the reading behavior of proficient readers in the L1. Here, previous research results are somewhat more equivocal since some studies showed a slight loss of comprehension (Shiki, Reference Shiki2011; Shiki & Hase, Reference Shiki and Hase2010). The lack of gains in comprehension with increases in reading rate may be a result of the relatively short duration of these studies, the limited amount of reading actually done (either by design or by student choice), reading materials used, and the variability in reading rate and comprehension measurements, not to mention the varied populations investigated. Nonetheless, most earlier studies have shown that reading rates can and do increase without a loss in comprehension (Al-Homoud & Schmitt, Reference Al-Homoud and Schmitt2009; McLean & Rouault, Reference McLean and Rouault2017).
4.1 Reading rate
In recent years, some of the methodological issues of earlier studies have been taken up in a relatively small number of studies, which has resulted in strong confirmation of the positive impact of ER on reading rate without losses in comprehension. Following up on the Beglar et al. (Reference Beglar, Hunt and Kite2012) study, which had demonstrated that reading a large amount of comprehensible simplified text led to substantial reading rate gains, Beglar and Hunt (Reference Beglar and Hunt2014) looked more closely at whether the amount of reading, the type of reading, and the level of reading were implicated in the increased reading rate gains of successful L2 learners in a Japanese university context. Examining the outcomes for five levels of learners (as determined statistically in relation to a vocabulary levels test), it was clear that Group 1, with the highest vocabulary test scores, read the most and gained the most in reading rate. However, this pattern did not correlate in the same way in each of the other groups since the differences in reading amounts were minimal but changes in reading rate were still significant between the groups.
The analysis of reading type indicated that those who read mostly or only simplified texts made the greatest gains in reading rate. Additionally, the students who gained the most in reading rate had read more books on average at the lowest two levels of the readers (300–800 and 1,000–1,600, based on publisher headword counts). In other words, the students who were most likely to know most if not all of the vocabulary in those two levels of texts read the most and gained the most in reading rate. Those who read less also read more texts with higher headword counts; thus, they likely came across more unknown words, read less, and saw much smaller reading rate gains. These results suggest the general ER “principle” of reading a lot within one's linguistic competence might need to be revised to reading a lot of simplified texts on the “low” boundary of one's linguistic competence.
4.2 Comparisons of ER with other English learning activities related to reading rates
Several studies investigated reading rate gains by comparing an ER treatment with another form of language learning. Huffman's (Reference Huffman2014) quasi-experimental investigation of ER and reading rate gains compared two groups of undergraduate Japanese nursing students (N = 66) in a semester-long investigation. The students were randomly assigned to either an ER course, taught by Huffman, or an IR course, taught by a Japanese instructor of English. To determine the amount of reading in the ER group, Huffman used the “standard word measure” (Carver, Reference Carver1976), which is based on an average number of characters in a word (set at 6). The ER students were given considerable freedom to select texts from an in-class library of over 200 graded readers with varied headword counts. The students' initial reading rate was measured by the average of three timed reading texts (Quinn & Nation, Reference Quinn and Nation1974, cited in Huffman, Reference Huffman2014), chosen carefully for similarity in readability. Since the treatment period was only one 15-week semester, Huffman used a different set of three texts for the post-test, but these were almost identical in readability to the pre-test set. The results indicated that the ER participants increased their reading rates significantly compared with the other students without a loss in comprehension.
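To make the “standard word” measure concrete, the following minimal Python sketch converts a text into standard words and a timed reading into standard words per minute. It is an illustration only, not the procedure used in any of the studies reviewed here: the function names, the sample passage, and the timing are hypothetical, and counting spaces and punctuation as characters is an assumption, since the description above specifies only a fixed length of six characters per word.

```python
def standard_word_count(text: str, chars_per_word: int = 6) -> float:
    """Length of a text in 'standard words': total characters / chars_per_word.

    Assumption: every character, including spaces and punctuation, is counted.
    """
    return len(text) / chars_per_word


def reading_rate_swpm(text: str, seconds: float, chars_per_word: int = 6) -> float:
    """Reading rate in standard words per minute for one timed reading."""
    return standard_word_count(text, chars_per_word) / (seconds / 60)


# Hypothetical passage: 24 characters repeated 50 times = 1,200 characters.
passage = "The cat sat on the mat. " * 50

print(len(passage.split()))             # 300 actual (short) words
print(standard_word_count(passage))     # 200.0 standard words
print(reading_rate_swpm(passage, 120))  # 100.0 standard words per minute
```

Because the words in the sample passage are short, the passage counts as 300 actual words but only 200 standard words, which shows how the two measures of reading amount can diverge for the same text.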
Suk (Reference Suk2017) also implemented a quasi-experimental comparison study of reading rate gains between students (N = 171) at a Korean university who received ER and others who received IR activities as a supplement to the regular curriculum of IR. (Details of this study are provided in the Vocabulary section of this paper.)
The students’ reading completion and comprehension of out-of-class ER were monitored through MReader online quizzes. Suk also took care to account for how much students in both groups actually read and whether or not they participated in additional English classes. The students' initial reading rate was measured by the average of three timed reading rate passages after instructions and a practice test. It so happened that the control group had higher reading rates in the pre-test than the experimental group, and although their post-test reading rates had increased, the ER group significantly outperformed the control group in the post-test. The ER group also had higher comprehension scores in the post-test even though they had started out at about the same level as the control group. Additionally, Suk found that although some students took additional classes on their own, this did not influence the outcomes of the study. In fact, those who did not take such classes outperformed those who did.
McLean and Rouault (Reference McLean and Rouault2017) also compared reading rate gains by comparing an ER treatment with grammar-translation. Although the study included fewer participants, it was implemented over two academic semesters, which allowed them to reuse the same pre- and post-test reading texts to measure reading rate changes. Students from five intact classes (N = 50) engaged in the same English language curriculum taught by the same instructor at a Japanese university were randomly assigned to one of two treatment groups: grammar-translation or ER. Each group was provided materials for approximately 60 minutes of weekly out-of-class activity. The students were closely monitored throughout the lengthy treatment period to make sure the time-on-task for each group was comparable although all the students were encouraged to do as much as they wished beyond that.
For the ER participants, standard word counts of reading were only included in the weekly reading target after they achieved at least 70% on the MReader graded reader quizzes. Finally, McLean and Rouault (Reference McLean and Rouault2017) applied a conservative approach to pre- and post-test measurements of reading speed. First, participants engaged in two 400-word timed reading exercises each week to reduce problems with unfamiliarity with reading-rate tasks. In addition, the average reading speed of the timed readings during the third, fourth, and fifth weeks of the first semester was used as the pre-test reading rate, thus eliminating typical extremes of early reading rate assessment. Here again, the ER students increased their reading rate significantly in comparison with the grammar-translation students with comprehension maintained above 70%. McLean and Rouault conclude that the effectiveness and efficiency of ER for developing reading rate should lead to more time allotment in the curriculum for ER.
The most recent study of ER and reading rate by Bui and Macalister (Reference Bui and Macalister2021), as in the previous four studies, takes place in another Asian educational context in which the curriculum is prescribed and classroom instruction is teacher directed. In spite of the growing evidence of the efficacy of ER in promoting many aspects of L2 acquisition as well as motivation, it remains very difficult to include ER in the standard curriculum. Bui and Macalister implemented a non-experimental study of an online ER program for first-year students at a university in Vietnam. The 17 volunteer participants were provided first with instruction in the methodology of ER and use of their own website with gathered free reading materials. The students were expected to read at least 1,000 words twice weekly for ten weeks using their own tablets or computers. The website also provided the students with tools for measuring the number of words read and time spent in reading each text, and it was these self-reported times that were used to calculate reading rate changes for each student. Weekly, the students submitted reports on what they had accomplished and something about the books read. In this way, their engagement in the program was monitored and encouraged. Despite the relatively small number of words read by the students (average 26,409), most of the participants increased their reading speed over the course of the ten weeks and reported greater motivation to read.
Milliner (Reference Milliner2021) examined whether a teaching intervention combining (a) ER, (b) timed reading practice, and (c) repeated oral reading during class time promoted reading fluency. While the students’ silent reading rates improved significantly, with an average improvement of approximately 50 words per minute, there is no analysis of the relative effect of each of the three interventions. The students who read at least 95,000 words of the assigned 100,000, however, did experience significant gains on the Test of English for International Communication (TOEIC) reading section. A correlation was also found between the students’ accumulated ER word counts and their gains on both the listening and reading sections of the TOEIC.
Huffman (Reference Huffman2021) studied two cohorts of students, with one cohort taking an “Intensive Reading” (IR) course in the first semester and an “Extensive Reading with Fluency Training” (ERFT) course in the second, while the other cohort took the two courses in the reverse order. The study demonstrated that ERFT yields greater fluency gains than IR, particularly among students with initially slower reading rates.
4.3 Reading comprehension
While reading rate and comprehension are addressed together in the previous studies, Robb and Kano's (Reference Robb and Kano2013) investigation focused entirely on comprehension benefits of ER as measured by a standardized measure of proficiency. They found that even when the over 2,000 participants at a Japanese university from a range of disciplines, courses, and levels did not meet the minimum requirement of reading at least five English graded readers outside class each semester for two semesters, they still performed statistically significantly better on the reading proficiency component of the test than a similar cohort in the previous school year across each level and in each discipline. Taylor (Reference Taylor2014) measured the effect size for Robb and Kano's large sample size and found a strong effect (0.99 mean for all groups) not only for reading comprehension of the cohort doing ER but also a small effect for their listening comprehension, which showed significant gains for student groups in most of the disciplines represented.
Mermelstein (Reference Mermelstein2014), too, only attended to the effect of an ER treatment on reading comprehension in a quasi-experimental research study, but in this case, the comprehension was measured in terms of reading level changes. Using a self-designed levels test based on the six levels of two series of graded readers available to his students, Mermelstein found that the treatment group's reading level changes were statistically significant compared to those who had not received any ER. As in the Robb and Kano (Reference Robb and Kano2013) study, the amount of reading was quite limited over the 12 weeks of the study. The ER students engaged in individual silent reading for approximately 15 minutes once a week with an expectation that they would read at least three pages of their graded readers outside of class. On average the students completed two graded readers over this time period.
Unless ER is implemented programmatically, as in the Robb and Kano (Reference Robb and Kano2013) study, it is not easy to investigate the use of ER for building reading comprehension over long periods of time, which is a predicted need for implicit learning. Typically, classroom-based ER and reading comprehension studies are implemented over the period of one semester (from 12–15 weeks). However, Sarıçoban and Zaloğlu (Reference Sariçoban and Zaloğlu2021) found that even a five-week period led to better reading comprehension. The 66 tertiary Turkish learners of English in this quasi-experimental study were allowed to read as much as they wished beyond a minimum of ten “articles” per week. Although it is not entirely clear what or how much students read in their free time or what the control group did in comparison, the ER treatment group demonstrated statistically significant improvement from pre-test to post-test on a locally-produced reading comprehension instrument.
In another recent study, Robb and Kamiya (Reference Robb and Kamiya2020), using software they developed, showed that students who read more, as measured by MReader, were able to better predict what word fit best in a “scrolling cloze” test where the proper choice for each blank had to be filled in before it disappeared from view. While their score on the test correlated significantly with the amount of reading done throughout the year, there were no significant correlations with their pre- or post-test scores on the Test of English as a Foreign Language (TOEFL), except for improvement in their TOEFL listening over the year.
Park and Lee (Reference Park and Lee2021), in an experiment with young Korean learners (N = 101), found that comprehension gains through ER were impacted by how the students accessed the texts. One group read physical books (N = 42) while another group read online books using tablets provided by the school (N = 32). Another class that had no ER (N = 23) was used as a control group. Although both experimental groups improved more than the control group, Park and Lee discovered that those who read online improved more in literal comprehension of their texts, and those who read physical books improved more in their inferential abilities. Park and Lee also examined “grammatical knowledge” and found that only the printed book group showed some improvement, with no significant change in the tablet or control groups. Their study tracked neither reading speed nor the number of words read, which leaves many questions yet to be answered.
4.4 Measuring reading rate and comprehension
The complex nature of reading does not lend itself to easy testing of comprehension or reading rate change, and ER research provides evidence for the wide array of methodologies that have been used. While the variety has perhaps facilitated classroom-based research, it has not made a statistical meta-analysis easy. Nonetheless, it is possible to claim that ER even in small amounts over rather short periods of time, in or out of the classroom, has a positive impact on reading comprehension and reading rate for learners at different levels of proficiency who read diverse materials for different purposes. However, it does not yet seem possible to claim what an optimal implementation of ER should be. More comparability needs to be built into the research design, and at least two recent studies provide evidence for standardizing certain measurement features in this area of ER research.
Bui and Macalister (Reference Bui and Macalister2021) make an important contribution to ER and reading rate research in their recent study despite the small number of participants, the non-experimental design, and the relatively small amount of ER done by the participants. The students read books online from a specially constructed website that offered a relatively small set of books to read. The students used a timer built into the online site to measure and record their reading time. The study compared the impact of four different scoring methods on reading rate test results: average scoring (first three and last three), highest minus lowest scoring, twentieth minus first (or last minus first) scoring (based on Chung and Nation, Reference Chung and Nation2006, cited in Bui & Macalister, Reference Bui and Macalister2021), and three extremes scoring (three highest and three lowest, based on Yen, Reference Yen2012). Although all the participants showed increases in reading speed, only three of the four measures provided evidence of a significant difference. The data indicate that the two scoring methods that used only two data points, highest minus lowest and last minus first, resulted in the highest and lowest percentage of change scores, respectively. The two measures using averages fell between the others, but the average of the three high and low extremes resulted in a much higher percentage of change score than the average of the first three and last three scoring method. Bui and Macalister claimed that the average scoring method (first three and last three) could be considered more conservative but reliable in comparison with the other methods since it uses more than two data points and does not focus on extremes.
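The four scoring methods can be compared on a single series of timed readings, as in the Python sketch below. This is an illustrative reconstruction from the descriptions above, not Bui and Macalister's actual analysis: the function names and the ten reading rates are hypothetical, and the sketch reports raw gains in words per minute rather than the percentage of change scores discussed in the study.

```python
from statistics import mean


def average_scoring(rates):
    """Mean of the last three readings minus mean of the first three."""
    return mean(rates[-3:]) - mean(rates[:3])


def highest_minus_lowest(rates):
    """Fastest reading minus slowest reading, regardless of order."""
    return max(rates) - min(rates)


def last_minus_first(rates):
    """Final reading minus initial reading."""
    return rates[-1] - rates[0]


def three_extremes(rates):
    """Mean of the three fastest readings minus mean of the three slowest."""
    ordered = sorted(rates)
    return mean(ordered[-3:]) - mean(ordered[:3])


# Hypothetical words-per-minute scores for one learner, in chronological order.
rates = [100, 96, 104, 110, 108, 115, 121, 118, 126, 119]

for method in (average_scoring, highest_minus_lowest,
               last_minus_first, three_extremes):
    print(f"{method.__name__}: {method(rates)} wpm")
# average_scoring: 21 wpm
# highest_minus_lowest: 30 wpm
# last_minus_first: 19 wpm
# three_extremes: 22 wpm
```

With these invented data, the two-point methods produce the largest and smallest gains and the two averaging methods fall in between, which mirrors the ordering reported above; a different series could, of course, order the methods differently.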
Another methodological issue of reading rate research that has been taken up in the recent past is that of how to reliably measure reading amounts. In the three studies just described, what counted as a measure of how much was read was not the same. Suk (Reference Suk2017) and Bui and Macalister (Reference Bui and Macalister2021) used the exact number of words, but McLean and Rouault (Reference McLean and Rouault2017) used Carver's (Reference Carver1976) construct of a “standard” word. Kramer and McLean (Reference Kramer and McLean2019) take up this topic by investigating the impact of using simple word counts compared with the number of characters within a text on the results of reading rate measures. In the first study, they compared the reading times of two texts, each of which had two versions, the original and one modestly modified to increase the number of characters with almost no change in the total number of words and no change in meaning, done mostly with synonyms and longer proper nouns. One text pair was from an ER graded reader and the second text pair was from a timed reading text, both selected to be within the proficiency range of the participants. The 160 Japanese university students included in the analysis had all achieved a minimum of 70% on the comprehension checks following the reading. The longer (higher average characters per word) version of each of the texts took significantly longer to read.
Kramer and McLean (Reference Kramer and McLean2019) then investigated whether reading times could be predicted by the length of a text in number of characters. In a timed reading course for Japanese university students, the reading rates of 27 students, all of whom had maintained at least a 70% comprehension rate, were included in the analysis. Holding constant the number of words, reading times were correlated with the number of characters in each passage.
Although the results were not statistically significant, the effect size was large, indicating that there was a strong relationship between reading time and number of characters in a text. As the authors point out, however, digital versions of the reading texts are required to count characters; thus, for practical use, texts with passages for timed reading will need to include information on the “standard word count” so that reading rates can be easily derived.
As long as ER reading rate studies have an adequate number of participants, such as in Suk (Reference Suk2017), pre- and post-test comparisons provide valuable results whether they use actual word or “standard” word counts. When the participant numbers are small, however, which is the case in many studies, a single measure, such as “standard” word, would support useful meta-analysis.
4.5 Lessons learned
These recent studies have further substantiated previous research on the efficacy of ER on reading rate and comprehension, provided models of experimental and quasi-experimental classroom-based designs, and suggested reliable standards for measuring the amount read and for comparing pre- and post-reading rates. With increasing attention to replicability and transparency in reporting, and by standardizing measures for assessment, it seems time to turn attention to reading rate and comprehension research in more varied contexts, levels of proficiency, purposes for reading, and types of reading. Table 3 summarizes the fluency studies mentioned above.
Note: Abbreviations used: Exp, Experimental group; Ctrl, Control group; IR, Intensive reading; ER, Extensive reading; ERFT, Extensive reading with fluency training; GR, Graded readers; GT, Grammar-Translation.
4.6 ER and materials
Materials for ER are often mentioned as an obstacle to implementing an ER program, and this is undoubtedly true for ER in languages other than English. Graded readers, or Language Learner Literature, are quite plentiful in English, however, and their number is growing in several other languages, such as Japanese (see Sakai, Reference Sakai2021, and Yoshimura & Domier, Reference Yoshimura and Domier2017) and Spanish (see Hardy, Reference Hardy2016). While graded readers have become the “gold standard” for ER material, they are not without their faults. A study by Holster et al. (Reference Holster, Lake and Pellowe2017) determined that headword levels are not an effective means of determining text difficulty, and the same can be said about publishers’ stated levels, which are based on their own proprietary lists of headwords. Thus, it is not as easy to gauge the difficulty of graded readers as one might be led to expect. Their analysis revealed that text length and the Japanese “Yomiyasusa” scale (Furukawa, Reference Furukawa2022), which is derived from crowd-sourced assessments of difficulty by Japanese teachers and fluent readers, were better predictors of the difficulty students experienced than other indices. Similarly, Rodrigo (Reference Rodrigo2016), in reviewing 203 Spanish-language graded readers by 12 publishers, found that the proficiency levels lacked a uniform set of criteria for establishing levels of reading difficulty. With the use of a readability test, however, Rodrigo was able to suggest alignment across the publishers so that instructors would be able to catalog their materials more easily, which in turn would make independent selection of texts and reading easier for the learners.
Although it is likely that print or digital graded readers will continue to be central to ER implementation, there are other reading resources that might be able to provide appropriate levels of lexis as well as the support required for fluent reading, which we will discuss here.
4.6.1 Use of readers published for L1 children
The Extensive Reading Foundation's “Graded Reader Master List” contains a list of readers commonly used for ER, only 52% of which can be classified as “graded readers.” As pointed out in Nation and Waring (Reference Nation and Waring2020, p. 13), readers meant for L1 children often contain low frequency vocabulary. This means that items introduced in one book will probably not occur in other books that the student reads. Gardner (Reference Gardner2004) and Webb and Macalister (Reference Webb and Macalister2013) both discuss the use of L1-oriented material, but neither analysis refers to “leveled readers,” which are created for L1 children who are learning to read. These are, however, still likely to also utilize low frequency items and idioms that L2 learners will not know – and may not be able to understand from the context. From the first author's experience with Japanese learners, however, this does not impede enjoyment.
Gardner (2008), using her corpus of authentic books for English-speaking children and categorizing them by (1) same or different author, (2) narrative or expository content, and (3) the degree of recycling of specialized vocabulary included, notes that in terms of word frequency, “themes work best for expository collections, and single authorship works best for narrative collections” (p. 109). While her finding might be extrapolated to also apply to graded readers, a replication of Gardner (2008) based on a GR corpus would be very welcome.
Series such as Oxford's Reading Tree (Hunt, 1986) are common components of many ER libraries around the world, according to MReader usage statistics. Readers for L1 children are leveled using such measures as Renaissance Learning's ATOS scale, the Common European Framework of Reference (CEFR) scale, the Lexile framework, or even the Flesch–Kincaid Grade Level system. Such systems are not based on, and do not take into account, the slowly expanding lexical knowledge of the L2 learner. There is no research, to our knowledge, on how appropriate these scales are for L2 learners. For example, the Lexile framework produced by Metametrics and the Flesch–Kincaid Grade Level measure difficulty by the physical textual traits of each passage, such as phrase and sentence length, without any reference to the frequency findings of various L2 learner corpora.
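To make concrete what a purely surface-based measure looks like, the following is a minimal sketch of the published Flesch–Kincaid Grade Level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The function names and the vowel-group syllable counter are illustrative assumptions rather than part of any official tool, and the syllable heuristic is deliberately crude.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups and discount a probable silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level from sentence length and syllable counts only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    common = "The cat sat on the mat."
    rare = "The auk sat on the bog."
    # Both sentences have the same length and syllable counts, so the
    # formula cannot distinguish them even though "auk" and "bog" are
    # far less frequent words than "cat" and "mat".
    print(flesch_kincaid_grade(common) == flesch_kincaid_grade(rare))
```

Because the score depends only on counts of words, sentences, and syllables, swapping a high frequency word for a rare one of the same length leaves it unchanged, which illustrates why such a measure may misjudge difficulty for L2 readers.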
Smith and Turner (2018) of Metametrics, the home of the Lexile system, evaluated the correlation between the Lexile ratings of five graded reader series and the CEFR levels that had been assigned by the various publishers using their own internal data. They suggest that:
… the range of text complexity within a given CEFR level is quite large and creates a less than ideal situation for matching individuals to appropriate texts. The text complexity of what one publisher labels as A1 (650L) is just as difficult as what another publisher labels as C1 (630L). (p. 5)
They blame this state of affairs on the CEFR level descriptions, which “are not quite rigorous enough” (p. 7). Although the Lexile algorithm has been modified “to better account for special text characteristics that are used to support developing readers – things like decodability of words, sentence patterns, high frequency words and repetition” (Metametrics, 2019), the developing readers in question are certainly L1 readers, and the measure is therefore likely not adequate for L2 developing readers.
Apart from the standard “reader,” be it a graded reader or one intended for an L1 reader, there are other approaches to reading a sufficient number of words with texts that offer a degree of support for the learner.
4.6.2 Dual language readers
Dual language readers are available in most countries around the world where graded readers are often either unavailable or unaffordable. Often, these are originals of standard classics, with the translation into the local language on the opposite page. Zhang and Webb (2019) report that Oxford University Press publishes dual language versions of some of their graded readers for the Chinese market. They used these for an experiment with four conditions: (a) English-only text, (b) English text with target words glossed, (c) English text followed by the Chinese text, and (d) Chinese text followed by the English text. They report that the vocabulary retention in their delayed post-test was higher for the groups that read the bilingual texts, as opposed to the English-only versions.
While this result is impressive, their experiment measured only word retention of new vocabulary. We know nothing about differences concerning the other benefits of ER, such as improved comprehension and fluency and lexical depth of knowledge. Dual language readers might have a negative effect on overall reading speed and text enjoyment while catering to the students’ and teachers’ traditional mindset with a focus on grammar and vocabulary to the detriment of a focus on overall meaning.
Apart from the OUP bilingual graded readers for the Chinese market, the only other source identified by the authors is English/Arabic bilingual readers offered by Grassroots Press. Most bilingual editions tend to be either for children or translations of the traditional classics. The bilingual offerings of Mantralingua (https://uk.mantralingua.com/) are a case in point. While they might work for the self-directed language learner, our focus in this review is classroom-based learning, for which these are most likely inappropriate. Also, prudent use of glossing either on paper or with online applications might render texts sufficiently accessible for ER in a supervised context. Recent studies such as Salimi and Mirian (2019), Teng (2019), and Varol and Erçetin (2019) are encouraging examples. Perhaps a much more efficient approach, however, would be the implementation of automatic glossing functions within browsers, much in the way that Kindle allows one to touch a word and have the definition appear in one's chosen language. For ER and vocabulary acquisition, however, a function to record look-ups along with tools to review them in various ways would constitute an excellent online ER tool.
4.6.3 Bilingual weaving
Another promising avenue involves software that mixes known target language content with more difficult words rendered in the student's own language. The LoomVue Chrome plug-in accomplishes the functions suggested in the previous paragraph, but with an additional twist, using AI to substitute L1 words from either an uploaded “teacher pack” or LoomVue's own set of substitute candidates. See https://youtu.be/yqqfEkl7oXE for a demonstration of the Phase 1 study that focuses on Hispanic L2 learners (LoomVue, 2021, June 3). The Phase 1 project was rated highly by subject area teachers who have no other means to provide individual assistance to their students. The project is now in Phase 2 thanks to a $900,000 grant from the U.S. Department of Education. The development includes an evaluation component by an independent agency, but further research will be needed to validate its use in languages other than Spanish.
4.6.4 Narrow reading
While there have been some small experiments with “narrow reading,” there seem to be no reports concerning entire classes that have conducted ER in this manner. Krashen (2004) argues that “narrow reading” allows the reader to encounter similar content, both lexical and grammatical, multiple times in comprehensible contexts.
Kang (2015) reports on an experiment with narrow reading, but the students read only one “main text” and then three 450-word texts concerning “Secondhand Smoke,” while the control group read three texts on varying topics before reading the same “main text.” As hypothesized, the experimental group had better retention both receptively and productively.
Abdollahi and Farvardin (2016) tested the acquisition of unknown vocabulary with two groups of high school students, one of which read three related expository texts, while the other read thematically unrelated ones. The experimental group significantly exceeded the control group on all measures.
Chang and Millett (2017) reported on an experiment where one group read three Sherlock Holmes stories from graded readers, while the other group read three different versions of the same story, The Railway Children. At the conclusion of the experiment, both groups read yet another text on each theme, equated for lexical difficulty. As might be expected, the students in each group read more quickly in the genre with which they were familiar. Those who read The Railway Children had significantly higher comprehension of the other genre than the Sherlock Holmes group. However, there might have been issues with the comprehension aspect of the post-test that accounted for the differences.
Chang and Renandya (2017), building on the study above, utilized the same two sets of readers, but added two more sets, a same-author set and a thematically random set, for a total of 12 titles. All students read all books throughout the term, but each of the four groups read them in a different order. The results revealed that there was more vocabulary retention from the same-author books followed by the random set, same genre, and finally same title.
From these studies, it appears that narrow reading may be more productive than random reading, although when one considers the interests of the students, this method would work better for outside reading where the students would have more freedom to choose books of interest to them. Renandya et al. (2018) list a number of series that could be used for narrow reading, but they all require a relatively high reading level in order to be read extensively, with the possible exception of the Goosebumps series, where a CEFR level of A2+ might suffice.
All of the extant discussion of narrow reading, however, has been based on students capable of reading material at rather high levels of difficulty, whereas most students who are potential targets for ER implementation are at the lower range of proficiency, the A1 or, at best, A2 CEFR levels. Graded reader series and leveled readers that are based around a small set of characters might be considered as “narrow.”
4.6.5 Reading with audio support
There are many reports of students reading while listening, even in classroom situations. One scholar, Anna Chang, has done considerable research on various aspects of vocabulary acquisition with classes that have listened in unison to books while they read. See Chang (2019) or Chang and Millett (2017), which was mentioned earlier under the discussion of “narrow reading.” Tusmagambet (2020), in an intervention with 9th grade Kazakhstani students, found that the experimental group, which listened to audiobooks while reading silently, outperformed the control group in reading speed while preserving substantial comprehension of the texts. The control group had read silently without audio input. He reports that the expressive reading of audiobooks aided their comprehension of the story, and the audio itself helped the students focus more than if they were merely reading silently.
There are a number of possible limitations on the use of audio-support while reading that require future research:
• The students will, unless they all have separate devices, be reading at the same pace, which will be inevitably too fast for some and too slow for others.
• The students will all be reading the same book, rather than one of their own choice (and at their own reading level), which goes against a basic principle of ER – although arguments have been made that teacher-selected reading is sometimes more appreciated by the students. See Ramonda (2020) for this point of view.
• It takes considerable class time, so while it might be useful, perhaps it should be an occasional activity rather than a regular component of the class.
If the students are using software for their listening, however, audio-supported reading can be performed outside of class time. With software such as Xreading, the time spent on listening, with or without the text being visible, is logged for the teacher.
4.7 Lessons learned
We have discussed various types of materials that can be used for ER, as well as different methods for using them in the classroom. The current focus, however, has been on book-length readers, while a considerable amount of shorter textual material is also available. For students to read shorter texts but in greater number in order to achieve the same word targets is possible but perhaps logistically difficult. However, this may be particularly relevant for regions where use of full-length books is impractical due to financial constraints, or the lack of a systematic means for their distribution and management. See Robb (2022) for a discussion of the relevant issues.
5. ER and its implementation
Over the years, there have been reports indicating the relatively few instances of ER implementation, although we have no specific statistics concerning the status of implementation on a country-by-country basis. Such statistics cannot be compiled since we only have information on specific studies, which normally concern a specific school setting.
Publishers are generally unwilling to release sales statistics on their graded readers, and it is impossible to know whether the readers are only used for ER or for IR as well. Furthermore, there is considerable confusion on the nature of ER itself. In a recent meta-analysis of ER in China, Wang and Kim (2021) found that many of the studies concerned intensive, teacher-centered instruction although nominally called “extensive reading” due to the length of the texts being read, which we would call “extended reading.” Indonesia currently appears to be a hotbed of ER implementation. A quick survey of the top 50 Google Scholar hits for work published since 2018, using “extensive reading” and “students” as keywords, revealed that 26 of these papers and dissertations concerned ER in Indonesia, with Japan following with only six. However, there is no consensus on how ER is being implemented.
There are many impediments to ER implementation even if one strongly believes in its value. Although specific contexts will have unique challenges to overcome, there are several types of obstacles that seem to be faced in many contexts. These range from more intractable factors such as a sociocultural bias against reading or a testing culture that values language knowledge more than language use to other, perhaps modifiable but still significant factors such as an inflexible curriculum, limited appropriate materials, and instructor preparation. Several recent studies have discussed barriers to implementation, such as Lee and Ro (2020), Meniado (2021), and Renandya et al. (2021). Robb (2022), comparing the disparate lists of problems reported in these studies, concludes that the implementation of ER in a regional educational system can only be accomplished through a “top-down” approach. Stakeholders need to be informed of the differing roles and strengths of ER and IR in order to design well-rounded curricula where ER has its own place, and then agree to allocate sufficient resources for materials and professional development.
Even without major barriers, there are issues such as those Mitchell (2018) mentions along with his efforts to overcome them: the effort required to set up ER; not knowing whether some of the students might already be doing ER in other classes; and old books in the library or a policy to keep only one copy of any particular title. In a Korean study, Lee and Ro (2020) report on how too strict adherence to the “freedom of choice principle” can have untoward effects (“I actually picked up a physics textbook!”), particularly when only a limited selection of books is available for in-class reading.
Considerably more work needs to be done concerning how or why ER is or is not implemented. Case studies of particular contexts, particularly with public school systems in a representative set of countries, would be a valuable step towards understanding these issues and discovering ways to overcome them.
Table 4 represents the authors’ best guesses on the evolving effect of “time on task” on major variables in ER implementation. The current prevalence of short-term implementations with a concomitant focus on lower-level graded readers implies that other approaches may be required in the curriculum to bring students to the independent reading level of unsimplified texts.
6. Conclusions
The focus of this research review has been on classroom-based ER. This research falls into two general categories: research on ER and language learning and research on ER practices and procedures in the classroom/curriculum. In the first category, the current research confirms that regularly reading accessible texts increases reading rate, comprehension, and vocabulary knowledge, although the amount of reading required for improvement in each of these components might well be different. While reading more leads to greater gains, the volume of reading may be rather modest, done outside of class, and minimally incentivized as long as it occurs often enough over time to provide the frequency necessary to develop automaticity.
Additionally, these studies provide refinements in research methodology. With models of ecologically valid longitudinal studies for reading rate (e.g., Milliner, 2021; Huffman, 2021), comprehension (e.g., McLean & Rouault, 2017), and vocabulary learning (e.g., Suk, 2017; Webb & Chang, 2015a), quality replications are possible. Better approaches to measurement of learning can also be applied to studies of different populations of language learners. Virtually all the studies here (and many previous studies) were conducted with university students, who typically had years of English language instruction albeit from a language-knowledge rather than language-use orientation. Together with other age groups and types of language learners, it will be important to further clarify how ER benefits learners at different levels of proficiency, with greater specificity in how proficiency is measured as pertains to the focus of the research.
Most of the research on ER and language learning has been done in Asia. This has made a lot of sense seeing as so many people are learning English, and English is not the dominant language in most Asian contexts. However, the assumption that Asian contexts create a natural control environment where English input can be calculated based on seat time must now be challenged. Learners at all ages and levels of proficiency have almost ubiquitous access to English input today. While researchers may try to control for this phenomenon, as Suk (2017) attempts to do, it seems greater attention needs to be paid to the out-of-class English input of study participants. It also seems this would be a good time to investigate the learning of other world languages in non-target language contexts.
A great deal more ER research is needed in languages other than English if for no other reason than to eliminate arguments that research based on a global language such as English cannot apply to commonly or less commonly used or taught world languages. Cross-linguistic evidence of efficacy in vocabulary acquisition, reading rate and comprehension gains, and growth in L2 reading motivation would also help to answer the naysayers who see ER as a tool for learners in non-target language contexts. For that purpose, there are now well-designed experimental and quasi-experimental studies on English ER that could be replicated with commonly taught world languages such as Spanish, French, Chinese, and Arabic. However, until there are adequate ER reading materials in these and other languages, large-scale studies are not likely to occur. Hence, we have a chicken–egg dilemma to contend with.
The bulk of the studies referenced in this review, however, are in relation to the second general category: research on ER practices and procedures in the classroom/curriculum. While the “gold-standard” of ER may still be the emergence of an eager L2 reader, the realities of mostly required language learning necessitate creativity and innovation to keep students reading, and also to increase the credibility of ER in the eyes of curriculum managers, ministries of education, and other stakeholders. For these reasons, the effect of follow-up activities, in particular, is an area of deep interest. The research presented here suggests there are many means of enhancing motivation or at least compliance.
However, there is as yet little evidence of how these different follow-up forms might affect language learning through ER. There may be an effect based on whether learners are writing a short summary, discussing the content with their peers, giving a short talk, or taking a quiz on the content, but we know of no studies that specifically investigate how the nature of the follow-up might affect the students’ overall improvement, or which specific ability areas are enhanced. Summary writing, for example, might lead to greater recall of the grammar and lexis as well as better comprehension of the content, but with a trade-off in less time spent actually reading. Students who must take a short quiz afterwards, even with easy questions on the overall flow of the story, might nevertheless pay more attention to details, and perhaps read more slowly, than students who are simply reading the story for pleasure.
A further aspect of implementation that these studies address is that of materials. Although most support the use of graded readers, this research suggests there might be ways of enhancing the input to increase learning while reading. This could be listening while reading, using bilingual texts, or applying digital tools that provide definitions for unknown words. As with follow-up activities, it remains to be seen whether or how the learning through ER differs among these enhancements. In the case of paper versus digital reading materials (Park & Lee, 2021), all the learners increased in their reading comprehension, but it was not of the same nature.
With all the variations in implementation, there is clearly a need for many comparative studies as suggested above. There is yet much to learn about the benefits and the role of ER in language learning, but the studies in this review confirm the following:
• replacing a portion of an IR curriculum with ER will increase language gains over time,
• self-selecting and reading linguistically accessible texts frequently over time will lead to increases in reading rate, reading comprehension, and durable vocabulary knowledge,
• providing learners with diverse texts and text modes increases engagement and learning, and
• necessitating some type of follow-up activity will increase learner accountability and learning.
The need for more research should not suggest that some “best” practice will emerge at some point. Rather, with more knowledge, practitioners, advocates, curriculum designers, and materials developers can make the right choices for ER implementation for their specific contexts.
Questions arising
1. Considering that ER is only implemented in many cases for one or two terms, what can be done to help students further improve their vocabulary knowledge?
2. While there are programs that carry on with ER for two years or more, there have been few longitudinal studies that focus on improvement on specific aspects of ER, such as increase in reading speed, increase in vocabulary breadth or depth, or its effect on improvement in listening or writing. How can we encourage more targeted longitudinal studies?
3. Since ER is often implemented in small class groups, how is it possible to make future research more robust?
4. What steps are needed to make ER easier to implement effectively?
5. Since graded readers are initially expensive and also require management and maintenance, would the use of online readers be a possible solution?
6. Many teachers are unwilling to do ER since they see it as an increase in the workload. Does this need to be the case?
7. Some research has been conducted on reading while listening but mostly with students all listening at the same time. How can we implement effective research where the students read and listen at their own pace?
8. A number of measures have been attempted to support ER, such as listening or glossing potentially difficult vocabulary. What is the impact of such supports, and can they be implemented effectively in most contexts?
9. Students often take multiple English classes with only one of them containing an ER component. How can we attribute progress to the ER in such a situation?
10. Since ER is implemented over one or more semesters, how do student attitudes and the volume of words read change over time, based on specific student types or other variables?
Competing interest
The authors declare that they have no competing interests with any aspect of the content of this article.
Thomas N. Robb, Ph.D., University of Hawaii, is Professor Emeritus, Kyoto Sangyo University, Japan. He is a long-time user of computer assisted language learning (CALL) and the internet, and has created a number of websites and applications for extensive reading, student projects, interactive learning, and professional exchange. He has held numerous leadership positions in International TESOL, JALT (Japan), PacCALL, and now is Chair of the Extensive Reading Foundation. He is also the Editor of TESL-EJ, the first online journal for ELT. He has received the “Milne Innovation Award” from the Extensive Reading Foundation and a “Lifetime Achievement Award” from the Computer Assisted Language Learning journal in 2021.
Doreen Ewert is Professor in the Department of Rhetoric & Language at the University of San Francisco, USA and Director of the Academic English for Multilingual Students Program. She is also a Professor in the MA TESOL Program at LCC International University in Klaipeda, Lithuania. Her areas of research and writing are pedagogically oriented and include Second Language or Foreign Language (SL/FL) writing development and assessment, curriculum implementation, content-based instruction, fluency-development, and extensive reading. She has served in leadership positions with International TESOL and CEA, and is also on the Executive Board of the Extensive Reading Foundation.