
Norbert Schmitt's essential bookshelf: Formulaic language

Published online by Cambridge University Press:  21 April 2022

Norbert Schmitt*
Affiliation:
University of Nottingham, Nottingham, UK

Extract

In this series, Language Teaching invites a well-established scholar to make a personal choice of 12 works that he or she regards as essential reading for those interested in an historical and contemporary overview of the key work in the area chosen for study.

Type
First Person Singular
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press

The essential bookshelf

  Norbert Schmitt is Emeritus Professor of Applied Linguistics at the University of Nottingham and is interested in all aspects of second language vocabulary. He has published numerous books on vocabulary and applied linguistics, and over 100 journal articles and book chapters. His h-index is over 60, with more than 30,000 citations.

In the same year that I began my M.Ed. in English Language Teaching (ELT) at Temple University, Japan (1990), Paul Nation published Teaching and learning vocabulary, which proved to be the catalyst for creating the modern movement for the principled instruction of vocabulary that continues to flourish to this day. But as important as this book was, it still treated vocabulary as consisting mainly of individual words. There had been a few voices in the wilderness pointing out that vocabulary was considerably more than this, and consisted of large numbers of lexical items composed of multiple words (e.g. Pawley & Syder, 1983). In 1992, a breakthrough book appeared highlighting these ‘multi-word items’: Nattinger and DeCarrico's Lexical phrases and language teaching. Since then, work on these multi-word items has proliferated, and today it is one of the most topical and most researched aspects of second language (L2) vocabulary.

In the development of research into multi-word items, I discern at least three ongoing and interrelated strands. The initial one involves describing the phenomenon: asserting the existence of these items, determining their extent in language and how they are used, understanding why they are so prevalent, and identifying and creating lists of the items. The second strand concerns the pedagogical issues of how to teach these items. This mainly involves studies that explore how learners learn and use these items, and the efficacy of various instruction and testing techniques. It also, in a crossover with the descriptive strand, involves lists of the most useful items to teach. The third strand explores the processing/acquisition of the items, often using psycholinguistic techniques to understand how the mind acquires, stores, and uses these items to best effect.

One of the insights from the descriptive strand is that multi-word items come in many different types, each with their own characteristics and behaviors, including, among others, idioms (over the moon), phrasal verbs (pick up), collocations (strong coffee), lexical bundles (and as a result of), and proverbs (Too many cooks spoil the soup). This led to a plethora of terms to describe the various items, with Wray (2002) counting over 50! In my book Researching vocabulary: A vocabulary research manual (Schmitt, 2010), I suggested the cover term formulaic language (FL) to describe the overall phenomenon of multiple word lexical items, and formulaic sequence (FS) to describe the individual items. These suggestions seem to be gaining some traction in the field, and I will use these conventions in this article.

Given that there must now be thousands of books, chapters, and articles on FL, it is inevitably difficult to present my bookshelf with only a dozen as essential reading. I have chosen those I believe give some background across the wide range of perspectives on FL that have flowered since 1990. Some are historical and were drivers of the early interest in FL but remain on my bookshelf. They are still relevant today because it is important to remind oneself how the field has developed and what the issues were/are that pushed this development. These publications were also personally important for me, because they were part of what spurred my interest in vocabulary in the first place.

Some of my other selections are the seminal articles/books/chapters that launched new perspectives. Still others are ones that I find are particularly good discussions of the various facets of FL (e.g. the pedagogical issues of teaching and testing FSs), the kind that I recommend as overview reading to my students. All of my selections are well-cited and very influential, at least for the length of time they have been out. Taken together, I think they give a good overview of our current understanding of the nature of FL, and how it is described, learned, and used.

Descriptive strand

There is a lot of formulaic language in typical language use

(1) Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford University Press.

People have always known that FL exists. After all, people know about idioms, and they are obviously not your typical single word items. Furthermore, Stubbs (2009) points out that, although either Harold Palmer in 1933 or J. R. Firth in 1957 is typically credited with first discussing collocations, the idea that words seem to occur in partnerships extends back to the mid-1700s. Nonetheless, despite this long history, FL was still seen as a relatively peripheral phenomenon, and not considered worthy of all that much attention, in terms of either research or pedagogy. Thus, the first step in putting FL on the map was simply demonstrating that FL was pervasive and useful enough to be a mainstream issue. Besides Pawley and Syder (1983 – see below), the most persuasive argument for this was probably Sinclair's book. It contained a range of Sinclair's ideas on corpus research and language usage, but crucially introduced the notion of two principles of language selection. The open-choice principle states that it is theoretically possible to combine words in a virtually unlimited range of ways. However, the idiom principle notes that in day-to-day usage, people prefer their language to be more conventional and predictable, and thus easier to comprehend and produce. This is largely done through FL, as it is a major way of ‘making meaning’ and has its own range of interesting characteristics.

Sinclair was one of the founders of corpus linguistics, and his genius was in using corpora to demonstrate how language was actually used, rather than how scholars presumed (or worse, prescribed) it was used. Using early corpora, he showed that typical language use was not just sprinkled with FL (as in the odd idiom), but rather was permeated with it. Later research has calculated that between one-third and one-half of discourse is made up of FL (e.g. Erman & Warren, 2000). Sinclair's book received widespread recognition, proving to be one of the seminal publications that engendered the field of FL research, and continues to be widely cited.

I was lucky enough to attend one of Sinclair's week-long seminars at his Italian Tuscan Word Centre in 1997. While there, he showed us the full extent of the idiom principle. The more we looked at concordance lines, the more lexical patterning we discovered. I came away convinced that language is essentially structured around the kind of lexical meaning-based clusters that Sinclair discusses, and that FL would need to be a major focus of my vocabulary investigations.

Why do people use so much formulaic language? Making language sound natural and making it easy to produce and comprehend

(2) Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–226). Longman.

While Sinclair's arguments were corpus-based, the son-mother team of Pawley and Syder approached FL from a processing perspective, that is, why is FL so prevalent? This was an important complement to the early corpus research, because it illustrated two reasons for the large amount of FL that corpus research was illuminating.

At the time, the field was still dominated by transformational grammar, and its emphasis on syntactic rules allowing an almost unlimited range of ways of saying things (open-choice principle). For example, a marriage proposal could be expressed as:

  (a) It is my desire to marry you.

  (b) I wish to be wedded to you.

  (c) Will you marry me?

Although (a) and (b) are grammatically correct, a suitor had better choose (c) if they wish to be successful! Pawley and Syder argue that there are often such conventionalized ways of saying things, that these ways are often formulaic in nature, and that through continual use, they become the default, and thus, ‘nativelike’.

Pawley and Syder also discuss why speakers can be so fluent by using FL. They suggest that FSs can be fluently produced because they are already memorized (i.e. as prefabricated phrases that are stored as single wholes) and are, as such, instantly available for use without the cognitive load of having to assemble them on-line as one speaks. In essence, Pawley and Syder suggest that the mind uses its vast long-term memory to store these prefabricated phrases in order to compensate for a limited working memory and a limited capacity to compose novel language on-line. Indeed, later research by Kuiper (2004) shows that speakers who operate under severe time constraints (play-by-play sports announcers, auctioneers) use a great deal of FL in their speech.

Before I read Pawley and Syder, I was already becoming aware of FL through the work of Sinclair and others. But Pawley and Syder's processing perspective particularly resonated with me because it explained how FL use made language use better, both in terms of naturalness and fluency. It also started me thinking about how we might go beyond corpora and test their (then undemonstrated) assertions. This led me to consider approaches using psycholinguistic measurement procedures (such as eye tracking), which came to fruition in the 2004 book Formulaic sequences (see below).

Why do people use so much formulaic language? Expressing meaning and getting things done

(3) Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford University Press.

While Sinclair pointed out the widespread use of FL in everyday communication, and Pawley and Syder the processing reasons behind this use, the field was still developing an argument persuasive enough to excite teachers and learners about FL. Around this time, I still remember watching Michael Lewis doing his charismatic best to popularize FL at conferences and seminars, and producing influential books that spoke directly to teachers (e.g. The lexical approach (Lewis, 1993)). But it remained to Nattinger and DeCarrico to publish the essential communication-based argument for why FL is a crucial component of language.

Language is used to express meaning, and to get things done in life, and this practical usage can be seen in terms of functions. Functional language use is about doing things (typically expressed in -ing forms): apologizing, requesting, giving condolences, and so forth. In the early 1990s, there was much discussion of functions, and functional specifications/syllabi were in vogue (e.g. The threshold level – Van Ek & Trim, 1991). Nattinger and DeCarrico's contribution was in explaining how these very frequent and useful language functions were typically realized by FL. The more frequent a recurrent function is (e.g. greeting), the more likely there will be one or more conventionalized FSs to represent it (e.g. How are you? [US]; You alright? [UK]). Because FL is so important for communication, it was made very clear that it deserves a place in language instruction.

I did the M.Ed.-ELT course at Temple University, Japan, from 1990 to 1992. It was very pedagogically oriented, and as I progressed, I gradually became particularly interested in vocabulary issues. As I read this book from my shelf, I remember receiving the ‘lights-on’ inspiration that if we want to teach our students to use their L2 effectively, we need to teach them the FSs that enact the functions they wish to carry out. From that time, FL has always been a key component when I think of L2 vocabulary instruction. Although the book is now three decades old, I still regard it as a very useful introduction to the world of FL, and why teachers and learners need to look beyond individual words if they want to use English to communicate well.

Pedagogic strand

Moving from description to use: How do learners actually use – and misuse – L2 collocations?

(4) Nesselhauf, N. (2005). Collocations in a learner corpus. John Benjamins.

Of all the categories of FL, it is probably fair to say that collocations have received the majority of attention from researchers, and most early studies were interested in identifying and describing collocations. What was lacking was a major focus on how learners actually used (or misused) collocations. I enjoyed watching a strong strand of research eventually coming out of the Centre for English Corpus Linguistics at Université Catholique de Louvain headed by Sylviane Granger. She and her many protégés have used learner corpora to describe the patterns of learner collocational usage (e.g. Granger, 1998), and it has been a pleasure to discuss these patterns with them over the years.

But the publication that probably had the greatest impact on me at the time was Nadja Nesselhauf's (2005) book that reported her Ph.D. research (with initial findings reported in Nesselhauf, 2003). She examined essays from the German subcorpus of the International Corpus of Learner English (ICLE) for verb-object-noun combinations. Her major finding was that even advanced German learners of English have considerable difficulties in producing appropriate combinations, with about one-quarter to one-third of those she extracted from the corpus being judged as at least partially erroneous. She also concluded that first language (L1) influence was a factor in a large number of these errors. Because of these errors, she argues that awareness-raising about collocations is not enough, and that some explicit teaching is necessary.

Although some elements of her methodology can be challenged (see Durrant, 2007), Nadja's book remains for me an important step in the early documentation of the difficulties that many learners face with FL. I certainly came away from it with an enhanced understanding of lexical acquisition. While it is true that in some advantageous contexts (e.g. northern European countries like Belgium and Sweden), there can be considerable incidental learning of vocabulary outside the classroom from extramural exposure (e.g. De Wilde & Eyckmans, 2017), it seems that in the majority of cases, vocabulary is not reliably just ‘picked up’ from communicative interaction, but rather, an intentional focus is often needed. Nesselhauf showed this is as true for FSs as it is for individual words. She also reinforced the notion that we are never far away from L1 influence in L2 vocabulary.

How should we teach formulaic language?

(5) Boers, F., & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic sequences in a second language. Annual Review of Applied Linguistics, 32, 83–110.

From the publications above and many others, it became clear that FL was very common in language usage, and that it carried out important communicative functions. This provided a strong rationale for why it should be taught. But how to teach it? There is a disparate literature on FL instruction, and it is difficult to point to one publication that stimulated the field (or myself) more than others. It is easier for me to point to scholars who focused on FL instruction over time and made ongoing contributions to the area. Michael Lewis was an important early popularizer of FL, and Phil Durrant, Batia Laufer, Elke Peters, and Stuart Webb have all made sustained contributions to the area. But I think the most noteworthy researcher has been Frank Boers, who with various colleagues (particularly June Eyckmans and Seth Lindstromberg) has been researching FL instruction for over 20 years. They have explored a range of techniques including awareness raising, a focus on sound repetition (alliteration, rhyme, assonance), use of metaphorical frameworks, and etymological analysis, and have basically found that anything that draws learners’ attention to FSs and encourages deeper engagement with those sequences will facilitate their acquisition.

I read all of the original papers as they came out, but with so many studies exploring so many different techniques, I find it is easy to get them all a bit muddled together in my mind. This is where a survey article can be useful, tying all the studies together into a coherent ‘big picture’. And so much the better if the original researcher is the one doing the synthesizing, as they know the studies better than anyone else (particularly the details that there is never space to adequately report!). For this reason, I have chosen the Boers and Lindstromberg article for the shelf. It is a comprehensive 28-page overview that ties together the research from 2004–2012 in a logical and accessible manner. It also points out avenues for continuing research, which are still being pursued today. Whenever I refer back to it, I always come away with a clearer view of the various possibilities for intentional instruction. It is also a useful overview to assign to students, although today I would supplement it with the newer Pellicer-Sánchez and Boers (2019), which also includes sections on incidental and semi-incidental learning.

Which formulaic sequences should we teach?

(6) Simpson-Vlach, R., & Ellis, N.C. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31(4), 487–512.

Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42(3), 375–396.

Concurrent with the research into FL teaching methods, there have been attempts at providing guidance into which FSs to teach. There are probably at least as many FSs as there are individual words, and so it is clearly impossible to address them all explicitly. So, which ones warrant spending precious teaching time on? There are dictionaries for idioms, collocations, and other categories of FL, but these are too large to be of much help in prioritizing the items that most merit instruction time. The solution is lists that narrow the most important items down to pedagogically viable numbers. There have been several useful lists for various categories of FL: e.g. spoken idioms (Liu, 2003), academic lexical bundles (Biber et al., 2004), phrasal expressions (Martinez & Schmitt, 2012), academic collocations (Ackermann & Chen, 2013), and phrasal verbs (Garnier & Schmitt, 2015). (Also see the pedagogical prioritization criteria based on frequency and transparency suggested by Martinez, 2013.)

However, the list to which I most often return is the one for academic formulas compiled by Nick Ellis, Rita Simpson-Vlach, and Carson Maynard. This is because I have come around to thinking that vocabulary lists need to be validated the same as vocabulary tests do, that is, that the lists need to be shown to work for particular purposes and for particular learners (Schmitt & Schmitt, 2020, Chapter 5). Ellis and colleagues used the criteria of frequency, Mutual Information (MI), and range to select the most useful academic formulas for learners. But they went further and checked whether an expert panel of English for Academic Purposes (EAP) teachers and language testers thought that the frequency-derived sequences were indeed formulaic and worth teaching. They then tested natives and nonnatives with several psycholinguistic measures (e.g. speed of reading and acceptance in a grammaticality judgment task) to ascertain their mastery of the formulas. This combination of corpus-based identification criteria with judgements of educational value and participant-based knowledge/processing criteria is a good early example of providing extended validation evidence for the pedagogical importance of the items on a proposed list. Note that owing to journal length constraints, the project was reported in two separate papers that should be read together: the list itself and the frequency/MI/range/educational value criteria (Simpson-Vlach & Ellis, 2010), and the psycholinguistic components (Ellis et al., 2008).
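For readers unfamiliar with the three corpus criteria named above, the following is a minimal sketch of how frequency, MI, and range might be computed for two-word sequences. It is not the procedure used by Ellis and colleagues, whose details are not given here. It assumes the common pointwise definition of MI (the base-2 log of observed over expected co-occurrence frequency), treats range simply as the number of texts containing the sequence, and uses a hypothetical function name (`score_bigrams`) and an invented three-text toy corpus.

```python
import math
from collections import Counter


def score_bigrams(documents):
    """Score every adjacent word pair by frequency, MI, and range.

    `documents` is a list of tokenized texts (lists of words).
    MI here is the common pointwise formulation, an assumption
    for illustration: log2(observed / expected), where
    expected = f(w1) * f(w2) / N and N is the corpus size in tokens.
    """
    unigram = Counter()    # how often each word occurs
    bigram = Counter()     # how often each adjacent pair occurs
    doc_range = Counter()  # in how many texts each pair occurs

    for doc in documents:
        unigram.update(doc)
        pairs = list(zip(doc, doc[1:]))  # pairs never cross text boundaries
        bigram.update(pairs)
        doc_range.update(set(pairs))     # count each text at most once

    n_tokens = sum(unigram.values())
    results = {}
    for (w1, w2), observed in bigram.items():
        expected = unigram[w1] * unigram[w2] / n_tokens
        results[(w1, w2)] = {
            "freq": observed,
            "mi": math.log2(observed / expected),
            "range": doc_range[(w1, w2)],
        }
    return results


# Invented toy corpus of three short "texts" (illustration only).
DOCS = [
    "as a result of the study".split(),
    "as a result the study failed".split(),
    "the result of a study".split(),
]

if __name__ == "__main__":
    scores = score_bigrams(DOCS)
    # List pairs from highest to lowest MI.
    for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]["mi"]):
        print(" ".join(pair), s["freq"], round(s["mi"], 2), s["range"])
```

On real data, MI is known to favor rare pairs, which is one reason list compilers tend to combine it with frequency and range thresholds rather than rely on it alone. Sequences longer than two words would need a generalized measure.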

How can we test knowledge of formulaic language?

(7) Gyllstad, H. (2020). Measuring knowledge of multiword items. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 387–405). Routledge.

If FL is worth teaching, then it stands to reason that it should be tested to ascertain the effectiveness of that teaching and whether learners know key FSs. However, the development of satisfactory tests of FL has proven to be difficult. There are various categories of FL, and these may need different testing (and teaching) approaches. As most research on FL has been on collocations, it is unsurprising that the exploration of FL assessment has focused on this category.

There have been numerous attempts to develop a test format for collocational knowledge, none of them fully successful. Probably the best-known is John Read's (1993) Word Associates Format (WAF) test. It has been used or adapted by a number of researchers (e.g. Qian, 2002; Schoonen & Verhallen, 2008), but despite this, the scoring has never been straightforward (Schmitt et al., 2011). The WAF test has gone through various iterations (e.g. Read, 1998), and I heard John speak about the test format at the Vocab@Tokyo conference in 2016. The fact that it was still in development 23 years after its inception illustrates the difficulty in developing item formats to measure FL. Other tests that have been advocated include the Discriminating Collocations (DISCO) test (Eyckmans, 2009) and the Constituent Matrix (CONTRIX) test (Revier, 2009), and while these are good efforts, in my judgement they are all limited in their ability to demonstrate learners’ collocational knowledge. Thus, I feel that the field still lacks a truly viable collocation test.

This makes it difficult to recommend a single definitive reading on the testing of FL. Without a ‘standard’ test to refer to, the reading could most usefully review the various test formats explored to date, critique their strengths and limitations, and ideally suggest how we can take the next step towards a truly useable test. I think the publication that best accomplishes this is Henrik Gyllstad's overview chapter. Henrik has been involved with FL (particularly collocation) testing for 20 years. He studied a range of collocation testing formats during his 2002–2007 Ph.D., in part supervised by Beatrice Warren of the oft-cited Erman and Warren (2000) paper on ‘prefabs’. I remember commenting as examiner at his Swedish viva that his thesis (Gyllstad, 2007) was a good start, but that the field still had a long way to go before a truly adequate collocation test could be developed. Henrik agreed with this, and has been engaged with collocation (and lexical) testing ever since. His overview chapter deserves its place on the shelf for its accessible introduction to the various test formats that have been tried over the years, and includes a thought-provoking segment about how psycholinguistic techniques may provide improved testing solutions in the future.

Processing/acquisition strand

How can a range of research and assessment approaches be used to answer interesting questions about formulaic language?

(8) Schmitt, N. (Ed.). (2004). Formulaic sequences: Acquisition, processing, and use. John Benjamins.

By the early 2000s, the corpus-driven descriptive strand was well established, and the pedagogic strand was gaining momentum. But the field was only just waking up to the possibilities of applying psycholinguistic techniques to the understanding of the acquisition and processing of FL. The time was ripe for an attempt to trial a number of different research approaches in researching FL, and happily in 2001, my colleague Zoltan Dörnyei and I received a grant to do just this. It was an exciting time at the University of Nottingham, as we recruited various colleagues to pursue research studies based on corpus, psycholinguistic, sociological, and pedagogical approaches. The findings of these various studies were published together in this edited volume.

I consider the work seminal in terms of showing the value of a number of new/emerging research paradigms to the field. Perhaps the most successful of these has been eye tracking. To the best of my knowledge, the Underwood et al. (2004) study was the first to use eye-tracking to measure the mental processing of FL, and this technique has now blossomed into almost a field of its own, with its own research handbooks and overviews (e.g. Conklin et al., 2018; Godfroid, 2020). Case studies showed the importance of societal engagement in learning FL, with sociocultural adaptation/acculturation proving to be a key factor in FL acquisition. In a pedagogic study, Jones and Haywood (2004) showed both the benefits and limitations of extended classroom instruction in FL.

Overall, the book deserves its place on my shelf as it has been instrumental in showing that a wide range of research methodologies can be used to answer the kind of FL questions we want to ask. I am very pleased that many of these methodologies have been taken up and advanced by subsequent FL researchers. Examples of this include the use of self-paced reading (e.g. Kim & Kim, 2012), eye-tracking (e.g. Kessler et al., 2021), and dictation (Nekrasova, 2009). There has also been a call for the pedagogic Jones and Haywood study to be replicated (Coxhead, 2018). Paul Nation and Averil Coxhead sum up the value of the book, generously commenting that it explored FL ‘in imaginative and innovative ways using methodologies such as eye-tracking that provided new insights into learning and into how to do applied linguistics research’ (2022, p. 169).

Measuring implicit, as well as explicit, knowledge of FL

(9) Sonbul, S., & Schmitt, N. (2013). Explicit and implicit lexical knowledge: Acquisition of collocations under different input conditions. Language Learning, 63(1), 121–159.

The most pleasurable aspect of my career has been the mentoring of extremely capable Ph.D. students. This has been a synergistic relationship, where I often learned as much or more from my apprentices than they did from me. This leads to the story behind this selection. From the very start of my career, I have been interested in vocabulary acquisition and the measurement of that acquisition, and in fact my own Ph.D. thesis included five studies that were eventually published that explored the acquisition and measurement of a range of word knowledge types (Schmitt, 1998a, 1998b, 1998c, 1999; Schmitt & Dunham, 1999). As this was in the mid-1990s, I used interviews and paper-and-pencil tests to measure vocabulary knowledge. As such, my measurements were almost exclusively of explicit, declarative knowledge.

But over the subsequent years, my apprentices’ research has led both me and the field into much richer and more nuanced measurements/descriptions of vocabulary knowledge/acquisition. One example of this is Beatriz González-Fernández's study into the multiple components of vocabulary knowledge and their interrelationships (González-Fernández & Schmitt, 2020). In some ways, it is a nice bookend to my career, as Beatriz was my last Ph.D. student, and her study is a quantum advancement beyond my Schmitt (1998c) Ph.D. study.

Another example of this more sophisticated methodology is the exciting research to measure and describe the acquisition of implicit vocabulary knowledge. For individual words, this kind of research is well illustrated by Ana Pellicer-Sánchez's (2016) eye-tracking study that documented the incidental acquisition of unknown words during reading. I still remember brainstorming about this with Ana in my office, but how she ran with the idea and developed it into a seminal study is amazing.

But the present article is focused on FL, which brings me back to my bookshelf selection. Unfortunately, there is less research into implicit knowledge of FL than there is into implicit knowledge of individual words. A notable exception is a study by another of my former Ph.D. students, Suhad Sonbul. She believed that more FL learning typically occurs than can be captured by paper-and-pencil tests, and so designed a study with both explicit (form recall and form recognition) and implicit (priming) measures. Her results showed that all three of her input conditions (enriched, enhanced, and decontextualized) led to long-term gains in explicit knowledge of collocations, but that none facilitated implicit learning, at least not to the extent that would show up with her priming paradigm.

Usefully, Suhad's multiple-treatment, multiple-measurement research design has been followed up by others, which has led to a better understanding of explicit vs. implicit learning. For example, in a conceptual replication, Toomer and Elgort (2019) found that, with more exposure sessions, some implicit learning did take place. Furthermore, Obermeier and Elgort (2021) found some implicit idiom learning from a flashcard treatment, but not from a contextualized treatment. Thus, studies like these show that various teaching techniques may be better at promoting certain kinds of lexical knowledge, but not necessarily others, and that the amount of exposure may differentially affect the kinds of knowledge gained. They also show the importance of using diversified measures in conjunction to gain a fuller and more accurate understanding of the effects of various teaching techniques.

Acknowledging that different categories of FL may be processed differently

(10) Carrol, G., & Conklin, K. (2020). Is all formulaic language created equal? Unpacking the processing advantage for different types of formulaic sequences. Language and Speech, 63(1), 95–122.

As part of my collection, I felt I had to include a selection that represented the vocabulary-based research that exists in psycholinguistically oriented journals. There is a considerable body of this research, but many vocabulary specialists and teachers seem unaware of it. Deciding which study to include was tricky, however. In the last five years, my attention has been mainly concentrated on developing the Knowledge-based vocabulary lists with colleagues (Schmitt et al., 2021, 2022). This means that I had rather lost track of the psycholinguistic strand and was unsure of what study to recommend.

Another of my former Ph.D. students, Laura Vilkaitė-Lozdienė, came to the rescue, and suggested that I should look close to home for an excellent example. Kathy Conklin is a groundbreaking scholar who has made a sustained study of FL processing, and because she is my colleague at the University of Nottingham, I have had the pleasure of seeing her work develop over the years. In this recent study, she and her former protégé Gareth Carrol explore what factors affect the processing of various categories of FL. Specifically, they look at which factors facilitate the reading of three different categories of FS (idioms, binomials, collocations) in sentence contexts. Unsurprisingly, higher frequency facilitated the reading of all three categories, but various other factors (familiarity, decomposability, predictability, semantic association, mutual information) affected the processing of the categories differentially.

The study is important because it intentionally explored the processing of various categories of FL, rather than assuming that all FL processing is the same. This does add complexity but is closer to the reality of vocabulary acquisition and use. We know that the learning burden of individual words depends on many factors, for example, length, word class, frequency, phonotactic regularity, imageability, and congruency with L1 norms (Laufer, 1997). Kathy and Gareth show that different categories of FL have different processing advantages depending on their characteristics, and I would bet that those characteristics also affect their learning burdens.

Although this is a very recent article, I hope that this more nuanced view of FL categories (they are not all learned/used/processed the same!) will prompt scholars to take account of the different characteristics and behaviors of the various categories when designing and interpreting their future research. (Also see work by Brent Wolter and colleagues on this, e.g. Wolter & Yamashita, 2015.)

Research overviews

Setting the state-of-the-art for the field of formulaic language

(11) Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press.

While it is always important to read the original research for any applied linguistics topic, some very good resources have appeared in the guise of monograph overviews, edited collections, and handbooks. The very best do much more than just summarize the original research; they also provide authoritative interpretations and perspectives on the state-of-the-art of the field, usually in much more detail than any single research article can offer. Such books should figure in one's reading. For formulaic language, Phraseology: Theory, analysis and applications (Cowie, 1998), Phraseology: An interdisciplinary perspective (Granger & Meunier, 2008), Researching collocations in another language (Barfield & Gyllstad, 2009), the Annual Review of Applied Linguistics: Volume 32: Topics in formulaic language (Polio, 2012), The Routledge handbook of vocabulary studies (Webb, 2020), and Vocabulary theory, patterning and teaching (Szudarski & Barclay, 2021) all provide many valuable insights into FL.

However, none have been as influential as Wray's (2002) masterpiece. When I was first invited to write this bookshelf article, I knew that it would have to be included, as it is probably no exaggeration to say that it is the single most important work in the FL canon. It is the second most cited of my selections with >4,000 citations (as of 5 October 2021), having been an essential reference for most of the current century. (John Sinclair's Corpus, concordance, collocation has >10,000 citations.)

When I went back to review Alison's book for this article, I was reminded of why it is still so essential. It is largely because of the breathtaking diversity of research she synthesizes, including that from a range of fields (e.g. L1 child acquisition, SLA, attrition and aphasia, song and memory). While John's book has more citations, Alison's book draws on a much wider range of research than just corpus data, and so offers a much fuller description of ways in which FL is learned and used. Her book is perhaps best known for her definition of FL (although her later 2008 book Formulaic language: Pushing the boundaries offers a more usable definition), and the iconic quote that emphasizes the importance and naturalness of FSs for L2 instruction and use:

‘The consequence [of concentrating on word-sized units in L2 learning] is a failure to value the one property of nativelike input which is most characteristic of the idiomaticity to which the learner ultimately aspires: words do not go together, having first been apart, but, rather, belong together, and do not necessarily need separating’ (Wray, 2002, p. 212, original emphases).

Overall, one simply cannot have a good understanding of FL unless the ideas in Alison's book are taken on board.

The most current overview

(12) Siyanova-Chanturia, A., & Pellicer-Sánchez, A. (Eds.). (2019). Understanding formulaic language: A second language acquisition perspective. Routledge.

Moving on 20 years from Wray's book, the interest in FL has not waned. The current best single comprehensive overview of the continuing research on FL is Anna Siyanova-Chanturia and Ana Pellicer-Sánchez's edited collection. As they are another two of my former Ph.D. students, it is extremely pleasing to see how their careers have soared, and the fact that they have been able to put together this excellent overview shows how far they have progressed. What I find particularly recommendable about this collection is that they approach FL from multiple perspectives, engaging with not only the descriptive perspective of FL, but also the pedagogical, processing, and social-cultural/pragmatic perspectives. This makes their book very well-rounded, touching on a wide range of FL viewpoints. For example, Lin's (2019) chapter reminds us that FL is just as important, if not more so, in speech than it is in the more easily researchable (and thus more commonly researched) written form. They were also able to solicit some of the most notable names in the FL/lexis arena: Frank Boers, Tom Cobb, Phil Durrant, Sylviane Granger, Henrik Gyllstad, Alison Wray, and myself, in addition to their own contributions and those of other specialists (e.g. Kathleen Bardovi-Harlig, Gareth Carrol, Kathy Conklin, and Stefanie Wulff). This collection provides a very good understanding of what we currently know about FL, and is a useful guide to where FL research might/should go in the future.

Final thoughts

We all live in our own point in time, but to be a good scholar, I feel we need to have a wider view. In order to know where the field is now, it is important to know from where it emerged and what the debates were that formed the current consensus. This means that the key historical publications I have on my shelf are still essential reading, even though some are now decades old.

Likewise, to truly understand a field, it is necessary to look at it from multiple perspectives, because language is far too complex to fit into any single black-and-white description. In my introductory chapter (with Marianne Celce-Murcia) to An introduction to applied linguistics (Schmitt & Rodgers, 2020), we relate the story from India about the five blind men of Hindustan who went out to learn about an elephant. They all felt different parts of the elephant's body and came to very different conclusions about what an elephant is like. The man who felt the trunk thought an elephant is like a snake, the one who felt a leg thought elephants are like a tree, the one who felt the ear thought elephants are like a fan, and so on. I realize much of my bookshelf highlights individual perspectives of FL (e.g. Nesselhauf→learner usage, Gyllstad→measurement, Carrol & Conklin→psycholinguistic research), but it is only by reading them together and integrating the various perspectives that one can truly begin to grasp the nature of the whole FL ‘elephant’. Overview books like Wray (2002) and Siyanova-Chanturia and Pellicer-Sánchez (2019) are particularly useful in this regard because they bring together multiple perspectives in a single volume.

Ultimately, it is only possible to understand a field by reading widely, but I hope that my suggestions are useful to start that process for the interested reader.

References

Ackermann, K., & Chen, Y.-H. (2013). Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12(4), 235–247. http://dx.doi.org/10.1016/j.jeap.2013.08.002.
Barfield, A., & Gyllstad, H. (2009). Researching collocations in another language. Palgrave Macmillan.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at …: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405. https://doi.org/10.1093/applin/25.3.371.
Boers, F., & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic sequences in a second language. Annual Review of Applied Linguistics, 32, 83–110. doi:10.1017/S0267190512000050.
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-tracking: A guide for applied linguistics research. Cambridge University Press.
Cowie, A. P. (Ed.) (1998). Phraseology: Theory, analysis and applications. Oxford University Press.
Coxhead, A. (2018). Replication research in pedagogical approaches to formulaic sequences: Jones & Haywood (2004) and Alali & Schmitt (2012). Language Teaching, 51(1), 113–123. doi:10.1017/S0261444815000221.
De Wilde, V., & Eyckmans, J. (2017). Game on! Young learners’ incidental language learning of English prior to instruction. Studies in Second Language Learning and Teaching, 7(4), 673–694. doi:10.14746/ssllt.2017.7.4.6.
Durrant, P. (2007). Review of Nadja Nesselhauf's Collocations in a learner corpus. Functions of Language, 14(2), 251–261. https://doi.org/10.1075/fol.14.2.09dur.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42(3), 375–396. https://doi.org/10.1002/j.1545-7249.2008.tb00137.x.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62. https://doi.org/10.1515/text.1.2000.20.1.29.
Eyckmans, J. (2009). Towards an assessment of learners’ receptive and productive syntagmatic knowledge. In Barfield, A., & Gyllstad, H. (Eds.), Researching collocations in another language (pp. 139–152). Palgrave Macmillan.
Firth, J. R. (1957). A synopsis of linguistic theory 1930–1955. Transactions of the Philological Society. Special Volume. Studies in Linguistic Analysis, 1–32.
Garnier, M., & Schmitt, N. (2015). The PHaVE list: A pedagogical list of phrasal verbs and their most frequent meaning senses. Language Teaching Research, 19(6), 645–666. doi:10.1177/1362168814559798.
Godfroid, A. (2020). Eye tracking in second language acquisition and bilingualism: A research synthesis and methodological guide. Routledge.
González-Fernández, B., & Schmitt, N. (2020). Word knowledge: Exploring the relationships and order of acquisition of vocabulary knowledge components. Applied Linguistics, 41(4), 481–505. https://doi.org/10.1093/applin/amy057.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and lexical phrases. In Cowie, A. P. (Ed.), Phraseology: Theory, analysis and applications (pp. 145–160). Oxford University Press.
Granger, S., & Meunier, F. (Eds.) (2008). Phraseology: An interdisciplinary perspective. John Benjamins.
Gyllstad, H. (2007). Testing English collocations: Developing receptive tests for use with advanced Swedish learners. (Unpublished Ph.D. thesis). Lund University.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences. In Schmitt, N. (Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 269–300). John Benjamins.
Kessler, R., Weber, A., & Friedrich, C. K. (2021). Activation of literal word meanings in idioms: Evidence from eye-tracking and ERP experiments. Language and Speech, 64(3), 594–624. doi:10.1177/0023830920943625.
Kim, S. H., & Kim, J. H. (2012). Frequency effects in L2 multiword unit processing: Evidence from self-paced reading. TESOL Quarterly, 46(4), 831–841. doi:10.1002/tesq.66.
Kuiper, K. (2004). Formulaic performance in conventionalised varieties of speech. In Schmitt, N. (Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 37–54). John Benjamins.
Laufer, B. (1997). What's in a word that makes it hard or easy: Some intralexical factors that affect the learning of words. In Schmitt, N., & McCarthy, M. (Eds.), Vocabulary: Description, acquisition, and pedagogy (pp. 140–155). Cambridge University Press.
Lewis, M. (1993). The lexical approach. Language Teaching Publications.
Lin, P. (2019). Formulaic language and speech prosody. In Siyanova-Chanturia, A., & Pellicer-Sánchez, A. (Eds.), Understanding formulaic language: A second language acquisition perspective (pp. 78–94). Routledge.
Liu, D. (2003). The most frequently used spoken American English idioms: A corpus analysis and its implications. TESOL Quarterly, 37(4), 671–700. https://doi.org/10.2307/3588217.
Martinez, R. (2013). A framework for the inclusion of multi-word expressions in ELT. ELT Journal, 67(2), 184–198. https://doi.org/10.1093/elt/ccs100.
Martinez, R., & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33(3), 299–320. https://doi.org/10.1093/applin/ams010.
Nation, I. S. P. (1990). Teaching and learning vocabulary. Newbury House.
Nation, P., & Coxhead, A. (2022). Vocabulary learning and teaching. In Szudarski, P., & Barclay, S. (Eds.), Vocabulary theory, patterning and teaching (pp. 169–175). Multilingual Matters.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford University Press.
Nekrasova, T. M. (2009). English L1 and L2 speakers’ knowledge of lexical bundles. Language Learning, 59(3), 647–686. https://doi.org/10.1111/j.1467-9922.2009.00520.x.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24(2), 223–242. https://doi.org/10.1093/applin/24.2.223.
Obermeier, A., & Elgort, I. (2021). Deliberate and contextual learning of L2 idioms: The effect of learning conditions on online processing. System, 97(102428), 1–11. https://doi.org/10.1016/j.system.2020.102428.
Palmer, H. E. (1933). Second interim report on English collocations. Kaitakusha.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Richards, J. C., & Schmidt, R. W. (Eds.), Language and communication (pp. 191–226). Longman.
Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading: An eye-tracking study. Studies in Second Language Acquisition, 38(1), 97–130. https://doi.org/10.1017/S0272263115000224.
Pellicer-Sánchez, A., & Boers, F. (2019). Pedagogical approaches to the teaching and learning of formulaic language. In Siyanova-Chanturia, A., & Pellicer-Sánchez, A. (Eds.), Understanding formulaic language: A second language acquisition perspective (pp. 153–173). Routledge.
Polio, C. (Ed.) (2012). Topics in formulaic language. Annual Review of Applied Linguistics, 32.
Qian, D. D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning, 52(3), 513–536. https://doi.org/10.1111/1467-9922.00193.
Read, J. (1993). The development of a new measure of L2 vocabulary knowledge. Language Testing, 10(3), 355–371. https://doi.org/10.1177/026553229301000308.
Read, J. (1998). Validating a test to measure depth of vocabulary knowledge. In Kunnan, A. (Ed.), Validation in language assessment (pp. 41–60). Lawrence Erlbaum.
Revier, R. L. (2009). Evaluating a new test of whole English collocations. In Barfield, A., & Gyllstad, H. (Eds.), Researching collocations in another language (pp. 125–138). Palgrave Macmillan.
Schmitt, N. (1998a). Quantifying word association responses: What is nativelike? System, 26(3), 389–401. https://doi.org/10.1016/S0346-251X(98)00019-0.
Schmitt, N. (1998b). Measuring collocational knowledge: Key issues and an experimental assessment procedure. ITL Review of Applied Linguistics, 119–120, 27–47. https://doi.org/10.1075/itl.119-120.03sch.
Schmitt, N. (1998c). Tracking the incremental acquisition of second language vocabulary: A longitudinal study. Language Learning, 48(2), 281–317. https://doi.org/10.1111/1467-9922.00042.
Schmitt, N. (1999). The relationship between TOEFL vocabulary items and meaning, association, collocation, and word class knowledge. Language Testing, 16(2), 189–216. https://doi.org/10.1177/026553229901600204.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. Palgrave Macmillan.
Schmitt, N., & Celce-Murcia, M. (2020). An overview of applied linguistics. In Schmitt, N., & Rodgers, M. P. H. (Eds.), An introduction to applied linguistics (3rd ed., pp. 1–15). Routledge.
Schmitt, N., & Dunham, B. (1999). Exploring native and nonnative intuitions of word frequency. Second Language Research, 15(2), 389–411. https://doi.org/10.1191/026765899669633186.
Schmitt, N., Dunn, K., O'Sullivan, B., Anthony, L., & Kremmel, B. (2021). Introducing knowledge-based vocabulary lists (KVL). TESOL Journal, 12(4), e622, 1–10. https://doi.org/10.1002/tesj.622.
Schmitt, N., Dunn, K., O'Sullivan, B., Anthony, L., & Kremmel, B. (2022). Knowledge-based vocabulary lists. Available on the British Council website: https://www.britishcouncil.org/exam/aptis/aptis-expertise/knowledge-based-vocabulary-lists-kvl
Schmitt, N., Ng, J. W. C., & Garras, J. (2011). The word associates format: Validation evidence. Language Testing, 28(1), 105–126. https://doi.org/10.1177/0265532210373605.
Schmitt, N., & Rodgers, M. P. H. (Eds.) (2020). An introduction to applied linguistics (3rd ed.). Routledge.
Schmitt, N., & Schmitt, D. (2020). Vocabulary in language teaching (2nd ed.). Cambridge University Press.
Schoonen, R., & Verhallen, M. (2008). The assessment of deep word knowledge in young first and second language learners. Language Testing, 25(2), 211–236. https://doi.org/10.1177/0265532207086782.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31(4), 487–512. https://doi.org/10.1093/applin/amp058.
Siyanova-Chanturia, A., & Pellicer-Sánchez, A. (Eds.) (2019). Understanding formulaic language: A second language acquisition perspective. Routledge.
Stubbs, M. (2009). Technology and phraseology: With notes on the history of corpus linguistics. In Römer, U., & Schulze, R. (Eds.), Exploring the lexis-grammar interface (pp. 15–31). John Benjamins.
Szudarski, P., & Barclay, S. (Eds.) (2021). Vocabulary theory, patterning and teaching. Multilingual Matters.
Toomer, M., & Elgort, I. (2019). The development of implicit and explicit knowledge of collocations: A conceptual replication and extension of Sonbul and Schmitt (2013). Language Learning, 69(2), 405–439. https://doi.org/10.1111/lang.12335.
Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it: An eye-movement study into the processing of formulaic sequences. In Schmitt, N. (Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 153–172). John Benjamins.
Van Ek, J. A., & Trim, J. L. M. (1991). Threshold 1990. Council of Europe.
Webb, S. (2020). The Routledge handbook of vocabulary studies. Routledge.
Wolter, B., & Yamashita, J. (2015). Processing collocations in a second language: A case of first language activation? Applied Psycholinguistics, 36(5), 1193–1221. https://doi.org/10.1017/S0142716414000113.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford University Press.