1. Introduction
The double modal is a rare grammatical construction in the English language. As its name suggests, it denotes the use of two modals in a single tensed clause, such as I might could pick you up at the station on Sunday instead of I might be able to pick you up at the station on Sunday. In principle, double modals are disfavored by the ‘rules’ of Standard English Grammar (Huddleston and Pullum 2002: 107), which constrain clauses to contain a maximum of one tensed verb. Given that modals in English only feature tensed forms, their combination in a single clause is prohibited. By and large, double modals have thus long been considered unacceptable and avoided across varieties of English (Morin 2023). Indeed, many native speakers would consider a double modal ‘ungrammatical’ and would seek to replace it with alternative constructions such as a modal and an infinitive, as above, or a modal and an adverb (such as maybe I could instead of I might could) (Brown 1991; Morin 2021a).
This has not prevented double modals from being notable and interesting grammatical constructions for linguists and specialists of English grammar, because they have been recurrently attested in a small, distinct group of varieties and dialects of English around the world. Specifically, these constructions have often been cited as typical features of varieties of the Upper and Lower South of the United States (e.g. Fennell and Butters 1996; Bernstein 2003), Northern England and Scotland (Brown 1991; Smith et al. 2019; Beal 2004), and a small number of English-based creoles, such as Gullah and Jamaican Creole (Kortmann, Lunkenheimer, and Ehret 2020). Studying double modals in these varieties has always been difficult because of the rarity of these constructions, as well as the limitations of available empirical methods to collect data on them, especially corpus-based and fieldwork-based approaches (Morin, Desagulier, and Grieve 2020; Morin 2021b). Survey-based research and recent corpus studies have found that the most common double modal types have epistemic meanings, although dynamic and deontic meanings are also possible (Morin and Grieve 2024; Morin, Desagulier, and Grieve 2024). Corpus findings have also been interpreted as concomitant with a hypothetical ‘epistemic expansion’ in the recent history of English (Coats 2024). Nevertheless, much remains unclear about the meanings encoded by particular combinatorial types, especially rare combinations.
For the past half-century, double modals have been described as regionally restricted sets of infrequent forms: the most typical combinations considered in the literature involve may/might and will/would as a first modal and can/could as a second modal (Montgomery and Nagle 1993). Furthermore, because of the geographical circumscription of these constructions, experts have often assumed that they are historically related. More precisely, on this view Scots was the original source of double modals, which were then imported into American and Caribbean varieties (Zullo, Pfenninger, and Schreier 2021).
However, a very recent wave of linguistic research on these constructions has found that this general picture is not the end of the story, and that we still have much to learn about double modals in the English language. Specifically, five new empirical studies have shown that double modals are more widespread in terms of modal combinations and regional distributions than has been documented in the past. These studies take advantage of a new and powerful range of methods in linguistic research known as corpus-based ‘computational sociolinguistics’ (Grieve et al. 2023). In particular, they have relied on the investigation of newly created large corpora of geolocated social media data in order to observe double modals at scale for the first time in a number of varieties of English.
For example, Coats (2023, 2024) explored double modal use in two very large corpora of geolocated automatic speech recognition (ASR) transcripts from YouTube in North America, Britain and Ireland. The studies find that double modals are not only observed in the traditional dialect regions attested in the past, but can be found almost everywhere in these two broad areas, in a large number of low-frequency types with no clear constraints on their form. Similarly, Morin and Grieve (2024) and Morin et al. (2024) used two multi-billion-word corpora of geolocated American and British Twitter data to analyze double modals in these two varieties. Their results broadly align with those of Coats, as the authors find both regional clusters and an unexpected general, diverse distribution of low-frequency modal combinations across the two countries. Most recently, Morin and Coats (2023) have even found a general distribution of a rich and complex inventory of double modals in two varieties that had never been considered in the past: Australian and New Zealand English, using another corpus of geolocated YouTube data.
Collectively, these recent studies raise important new research questions and implications for our knowledge of double modals within the modal domain of English worldwide. Firstly, by contrast with previous assumptions in linguistic research, double modals appear to be far more widespread in English than has previously been appreciated. So far, we find that they are used without categorical constraints, albeit very rarely, in four of the most prominent inner-circle varieties of English: American, British, Australian, and New Zealand. There remain many other inner- and outer-circle varieties of English to investigate using computational sociolinguistic methods, in order to see whether these results will replicate in a more comprehensive sample. In addition, the unexpected finding of a seemingly general, low-frequency productivity of the feature suggests an alternative to the traditional Scots-Irish import theory of the historical origin and subsequent spread of double modals (Fennell and Butters 1996).
According to some theoretical accounts, syntactic features, including non-standard features such as double modals, have a ‘wider areal reach’ and are ‘less restricted to very confined areas or individual dialects’ (Kortmann 2010: 842). This conception moves away from the idea of static, location/variety-based inventories of features and towards cognitive frameworks in which most syntactic structures are possible for most speakers (cf. Adger and Trousdale 2007), and supports recent theories which emphasize the importance of speaker creativity and the potentially overlooked role of ‘mistakes’ in language diffusion and evolution (De Smet 2020). Thus, even rare syntactic constructions, including those which contravene prescriptive norms, are possible for most speakers, but speakers have different sensitivities to the acceptability of non-standard constructions such as double modals. From this perspective, it is possible that the acceptability of double modals is an inheritance of Scots-Irish, but that the ability to spontaneously produce the feature is more widespread. In line with this account is the recent analysis of Morin and Grieve (2024), who argue that some double modal types in American English may have emerged as innovations in African-American varieties of English in the Deep South of the United States. Likewise, Morin and Coats (2023) do not find conclusive evidence for the potential Scots-Irish origins of double modals in Australia and New Zealand. Taken together with the lack of metalinguistic awareness of the feature in these varieties, these factors suggest that double modals may be a rare but widespread syntactic construction for speakers of English worldwide.
In this article, we report the results of a new study of double modals on Australian and New Zealand Twitter. Our main aim is to explore i) whether results from a YouTube corpus of transcribed speech replicate in a new corpus compiled from a different medium, written social media texts, and ii) whether these results point towards register variation in double modal use online in these regions. Comparative discussions of this type have already been undertaken for American and British English, confirming the underestimated productivity of the double modal system in these varieties, but also revealing subtle variation in the double modal types used across these two platforms (Coats 2023, 2024; Morin and Grieve 2024; Morin et al. 2024). In addition, our aim is to illustrate the computational sociolinguistic approach to double modals in English worldwide for a broad audience, encouraging future research in this vein for studying rare and previously elusive morphosyntactic constructions in varieties and dialects of English.
2. Data and methods
We implemented a location filtering procedure for the identification of potential Australian or New Zealand Twitter/𝕏 accounts. Starting with a global seed corpus of 653,457,659 tweets with ‘place’ metadata, collected from November 2016 to June 2017 from the Twitter Streaming API using Tweepy (Roesslein 2015), we identified 184,451 unique accounts which had authored a tweet with a ‘place’ field from Australia or New Zealand. Of these accounts, 109,882 were still in existence in April 2023; using Twitter/𝕏's API, all available tweets were downloaded from these accounts in April and May 2023, shortly before the free version of the API was closed in June 2023. Naturally, not every Twitter/𝕏 user who publishes a tweet with Australian/New Zealand ‘place’ metadata is a resident of those countries – they may be short-term visitors or simply commenting on news or other content from or related to Australia or New Zealand. We included users in our corpus if more than half of the ‘place’ tweets they had posted were from Australia or New Zealand; we assigned them to the latitude-longitude location represented by the centroid of the most common ‘place’ in their tweets. In total, the procedure resulted in a dataset of 80,157,335 tweets and 1,017,218,326 word tokens. [Footnote 1]
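The inclusion rule and location assignment described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the function name `assign_location`, the tuple layout, and the use of ISO country codes are hypothetical, each ‘place’ is assumed to arrive with a precomputed centroid, and the most common ‘place’ is assumed to be chosen among the Australian and New Zealand places only.

```python
from collections import Counter

ANZ = {"AU", "NZ"}  # assumption: 'place' metadata exposes ISO country codes


def assign_location(place_tweets):
    """Return (place_name, centroid) for a user, or None if the user is excluded.

    place_tweets: one (country_code, place_name, (lat, lon)) tuple per tweet
    that carries 'place' metadata; (lat, lon) is the precomputed place centroid.
    """
    anz = [t for t in place_tweets if t[0] in ANZ]
    # Inclusion rule from the text: more than half of 'place' tweets from AU/NZ.
    if not place_tweets or 2 * len(anz) <= len(place_tweets):
        return None
    # Assumption: the "most common place" is chosen among AU/NZ places only.
    counts = Counter((name, centroid) for _, name, centroid in anz)
    (name, centroid), _ = counts.most_common(1)[0]
    return name, centroid
```

A user with two tweets from Sydney, one from Auckland and one from London would be kept and assigned to the Sydney centroid, whereas a user with one Sydney tweet and one London tweet would be dropped, since exactly half is not more than half.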
This method of identification of Australia and New Zealand-based Twitter/𝕏 users entails several assumptions (see the discussion in Section 3, below), but manual inspection of a random selection of messages confirmed that for the most part, the procedure retrieved content from users who are active in Australia and New Zealand. In order to validate the method, we compared different types of location information for individual tweets, using a subset of the corpus: 5.5 million tweets that contained not only a ‘place’ metadata field, but also exact GPS coordinates. [Footnote 2] For these tweets, the median distance between the inferred location (the centroid of the account's most common ‘place’ entry) and the exact GPS location was found to be 25.5 km. We interpret this as evidence that for the most part, the accounts sampled in the dataset under consideration are associated with the inferred locations.
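A validation of this kind could be computed along the following lines. The text does not say which distance measure was used, so the haversine great-circle formula here is an assumption, and the names `haversine_km` and `median_offset_km` are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_offset_km(pairs):
    """pairs: iterable of ((lat, lon) inferred, (lat, lon) GPS), one per tweet."""
    return median(haversine_km(*inferred, *gps) for inferred, gps in pairs)
```

Each pair holds the inferred account location and the exact GPS position of one tweet; the median is less sensitive than the mean to the occasional tweet posted while travelling far from the account's usual location.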
Regular expressions were used to identify all sequences of two modals or semi-modals in the tweet text entity, starting from the modal or semi-modal forms may, might, can, could, shall, should, will, would, must, ought to, oughta, used to, and the abbreviated form ’ll. Manual filtering removed hits with repetitions (e.g. might might), cases of clause overlap (e.g. . . . using it as much as you can would help . . .), and cases of non-modals with the same word form (e.g. a will should reflect your current domestic and financial situations #estateplanning). Instances which were not filtered according to these criteria and which were coherent in terms of discourse were deemed authentic double modals. Additional tweet content such as images or videos was not considered. [Footnote 3]
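The candidate search can be illustrated with a short sketch. The exact expressions used in the study are not given, so the pattern below is an assumption built from the listed forms; the names `PATTERN` and `find_candidates` are hypothetical, and for simplicity the contracted form is allowed only in first position.

```python
import re

# Full modal and semi-modal forms listed in the text.
MODAL = r"(?:may|might|can|could|shall|should|will|would|must|ought\s+to|oughta|used\s+to)"
# First slot: a full form, or the contracted 'll attached to a preceding word.
FIRST = rf"(?:\b{MODAL}\b|(?<=\w)['’]ll\b)"
PATTERN = re.compile(rf"(?P<first>{FIRST})\s+(?P<second>\b{MODAL}\b)", re.IGNORECASE)


def _normalise(form):
    """Lowercase, collapse internal whitespace, and unify apostrophes."""
    return " ".join(form.lower().replace("’", "'").split())


def find_candidates(text):
    """Return (first, second) pairs for adjacent modal sequences in text."""
    hits = []
    for m in PATTERN.finditer(text):
        first, second = _normalise(m.group("first")), _normalise(m.group("second"))
        if first == second:  # drop pure repetitions such as "might might"
            continue
        hits.append((first, second))
    return hits
```

A pattern like this only produces candidates. A string such as "a will should reflect your current domestic and financial situations" still yields the pair (will, should), which is why the manual filtering for clause overlap and homographic non-modals remains necessary.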
3. Results and discussion
In total, 314 of the 1,026 sequences of two modals were annotated as authentic double modals; these comprise 51 different combinatorial types. Examples (1) through (5) show authentic usages (usernames and URLs have been anonymized).
(1) @user What did you end up going with for your camera??? Given I'm a photographer I may can help if needed
(2) I really dislike how my skin has broken out again. I'm hoping it will can clear with this new skincare product I'm currently trying out .
(3) A Sweet Tooth I will shall remain.. Also a photo comparison between 2 mid rangers, the Oppo R15 Pro (2018) and th . . . URL
(4) I know a few engineers would might sign up for that. URL
(5) Had he mentioned that as an option we might could have saved the day. He was in a hurry to get to the next customer. @user
In our data, double modals mostly occur in informal tweets such as personal status updates, responses to other users, or comments on other tweets, images, or links. We find that double modals are mainly restricted to non-institutional accounts and informal registers, contexts similar to those found for double modals in British and American Twitter (Morin et al. 2024; Morin and Grieve 2024). Figure 1 shows the frequencies for each type in the data.
Notable is the preponderance of types with first tier modals that denote future temporality or dynamic ability in terms of volition or prediction (will, would) among the most frequent types: in total, will or 'll occupies the first modal tier in 151 of the 314 attested double modals, can in 37 instances, and would in 35. The most frequent modals in the second tier mostly denote possibility or ability: can (53), could (43), might (43), and would (34). The most frequent double modal overall in this data, will can / 'll can, has traditionally been considered to be the most widely used double modal in Scotland (Miller and Brown 1982; Brown 1991), a finding replicated by recent studies of naturalistic speech and Twitter/𝕏 data from Scotland (Coats 2023; Morin et al. 2024). Although it is theoretically possible that the authors of double modals in our data are transplanted Scots, none of the authors of authentic double modals used the term ‘Scotland’ in their user profiles. Given that double modals with will or would as the first element are also the most frequent types in naturalistic speech from Australia and New Zealand (Morin and Coats 2023), we tentatively propose that double modals with the schematic form dynamic modal (will, would) + epistemic modal (can, could, might) are the default forms for Australia and New Zealand, a pattern that corresponds to British usage and contrasts with double modals in the US, where the first tier epistemic might is more common. [Footnote 4] Given the low frequency of the feature overall, however, additional data would be necessary to corroborate this interpretation.
Figure 2 depicts the geographical distribution of the 314 verified double modals in the dataset. As can be seen, the feature is reasonably well distributed across Australia, occurring with higher absolute frequencies in the larger state capital regions of Sydney, Melbourne, Brisbane, Adelaide, and Perth. In New Zealand, double modals occur mostly in Auckland, Wellington and other North Island locations; one double modal is attested in the South Island in Christchurch. Overall, the feature does not appear to show a distinctive geographical pattern, a finding in line with those of a recent corpus study based on geolocated YouTube data from Australia and New Zealand (Morin and Coats 2023) and corresponding to the general research consensus that Australian and New Zealand varieties of English exhibit relatively little regional variation at the level of syntax (e.g. Hundt, Hay, and Gordon 2004; Murray and Manns 2020). The overall low frequency of occurrence of the feature in this data, both in Australia and New Zealand, however, is insufficient to conclusively demonstrate regional variation or a lack thereof.
4. Conclusion
Non-standard grammatical and syntactic features of English have attracted considerable research interest in recent years, and large corpora of naturalistic data, prepared from online sources such as video subtitles or social media messages, have opened up new perspectives for the analysis of rare syntactic phenomena. Double modals, a feature long thought to occur almost exclusively in English varieties of the Northern UK and the Southern US, have recently been analyzed in a new wave of research employing computational methods and large corpora of online content. The feature has recently been attested in English in broader geographic contexts and in a larger number of combinatorial types than had previously been documented on the basis of surveys and elicited data. Building on the results of Morin and Coats (2023) for streamed video content, we find double modals to be a rare but consistently used feature of social media writing on Twitter in English from Australia and New Zealand, a fact which supports our interpretation of the feature as not restricted to particular geographically defined varieties, but possible for most speakers of English, regardless of location. Our Twitter/𝕏 data suggest that the feature occurs mostly in informal online writing in Australia and New Zealand, with a type inventory dominated by forms in which the first modal is will or would, largely corresponding to the British pattern, rather than the American. No clear geographical patterning of the feature within Australia or New Zealand is evident in our dataset.
An important pathway for future work on double modals in Australia and New Zealand will be to examine larger data sets, such as that collected by Bruns et al. (2017); this may allow an analysis of potential differences between Australian and New Zealand usages, as well as consideration of social and demographic parameters which may correlate with double modals, topics which our data is not sufficiently large to explore. In addition, larger datasets may permit an interpretation of the modal meanings encoded by particular combinatorial types. For American and British English varieties, survey and elicitation data on double modal usage predated corpus analyses, but for Australia and New Zealand, no surveys have been conducted which investigate the grammatical acceptability of the feature or the possible modal meanings of particular combinatorial types. Collecting such data would provide further insight into the use of the feature in these varieties. [Footnote 5]
From the perspective of the typology of English varieties, future work should also examine similar corpora of naturalistic speech and social media texts from ‘outer circle’ varieties of English, for example in India, Southeast Asia, Africa, and the Caribbean. If double modals can be attested in these varieties, the comparison of inventories with those of the UK, North America, and Australia and New Zealand may shed light on the historical provenance of the feature and its spread throughout English-speaking communities, as well as provide a basis for theoretical accounts of its syntax, for example in the context of Construction Grammar (Morin et al. 2020; Morin 2023).
Much remains to be explored regarding double modals, in Australia, New Zealand, and elsewhere. We expect that in the coming years, ongoing advances in the preparation and processing of naturalistic online data will continue to provide a rich basis for the analysis of this feature, and, in a broader sense, of rare features of English syntax in general.
STEVEN COATS is a lecturer at the University of Oulu, Finland, with interests in corpus linguistics and the language of computer-mediated communication. He has created the Corpus of North American Spoken English (CoNASE), the Corpus of British Isles Spoken English (CoBISE), the Corpus of Australian and New Zealand Spoken English (CoANZSE), and the Corpus of German Speech (CoGS), as well as CoANZSE Audio (https://coanzse.org), a searchable online version of CoANZSE, which contains audio and alignments, in addition to speech transcripts.
CAMERON MORIN is a temporary lecturer at the Ecole Normale Supérieure de Lyon, specialising in usage-based cognitive linguistics, sociolinguistics, dialectology, and language variation and change. His research includes notable publications on double modals in various English dialects, based on large corpora of geolocated social media data. His work also focuses on issues of social meaning in the theory of Construction Grammar. He recently completed his PhD in linguistics at the Université Paris-Cité in 2023.