
Tracing change through different text types and genres: successes and challenges

Published online by Cambridge University Press:  26 March 2025

Wendy Ayres-Bennett*
Affiliation:
Murray Edwards College, University of Cambridge; Institute of Languages, Cultures and Societies

Abstract

In this article I begin by presenting some examples from Early-Modern French of how, using a large multi-genre database (Frantext), it is possible to track the spread of change and how it embeds across different genres (the genres in Frantext ranging from correspondence and first-person travel narratives to essays and poetry). These case studies allow us to explore a number of questions, including which changes seem to diffuse “from above” and which rather come “from below”; and which textual sources best reflect “authentic” usage or indeed are closest to reflecting spoken usage. They also raise questions as to whether it is possible to create a continuum of genres with those that are more “progressive” or are early adopters of change at one end, and those which are more resistant to change at the other. I then discuss some of the challenges surrounding this work, including the theorization (or lack of theorization) of text types and genres in various corpora and the privileging of other factors when they are elaborated. Consequently, there is currently a lack of comparability between different corpora in the way they categorize and calibrate different text types and genres.

Résumé

Dans cet article, je commence par présenter quelques exemples tirés du français de la première modernité, qui montrent comment, à l’aide d’une grande base de données multi-genres (Frantext), il est possible de tracer la propagation du changement et la façon dont il s’intègre à travers différents genres (les genres allant de la correspondance et des récits de voyage à la première personne aux essais et à la poésie). Ces études de cas nous permettent d’explorer plusieurs questions, notamment quels changements semblent se diffuser “d’en haut” et lesquels viennent plutôt “d’en bas” ; et quelles sources textuelles reflètent le mieux l’usage “authentique” ou se rapprochent le plus de l’usage parlé. Elles soulèvent également la question de savoir s’il est possible de créer un continuum de genres avec ceux qui sont plus “progressistes” à une extrémité et ceux qui sont plus résistants au changement à l’autre. J’aborde ensuite certains défis associés à ce travail, notamment la théorisation (ou l’absence de théorisation) des types de texte et des genres dans divers corpus et la priorité accordée à d’autres facteurs lors de leur élaboration. Par conséquent, il existe actuellement un manque de comparabilité entre les différents corpus dans la façon dont ils catégorisent et calibrent les différents types de texte et genres.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. INTRODUCTION

This article presents some of the successes and challenges to date associated with using a large multi-genre database to track the spread of linguistic change and how it embeds across different text types and genres. Having presented the methodology and the databases used (section 2), I will briefly summarize five case studies from the history of French from 1500–1800 (section 3). For this period, for Metropolitan French the principal multi-genre database available is Frantext,Footnote 1 which includes genres ranging from correspondence and first-person travel narratives or ego documents to poetry and tragedy in verse. We will see that these case studies allow us to explore a number of important questions about the nature of language change, as well as where it is best reflected in textual sources of different kinds.

In the second part of my article, I will address some of the outstanding challenges surrounding this work, including the theorization (or absence of theorization) of text types and genres in different corpora and the lack of comparability between their categorizations (section 4).Footnote 2 I will start by addressing some of the issues surrounding Frantext (section 5), before broadening out my analysis to consider the treatment of genres in corpora of different kinds (section 6) and reviewing how a recent study using a single genre corpus deals with clitic climbing, one of my earlier case studies (section 7). I will refer briefly to ongoing attempts to harmonize the terminology used to refer to different genres (section 8). My article will conclude with a discussion of the apparent mismatch between theoretical work on genres and text types (section 9) and of outstanding questions and potential directions for future research (section 10).

A number of theoretical questions underpin this work on genres and text types:

  1. How does linguistic change diffuse through different genres or how do different genres reflect change as it becomes established? Are there genres which are “progressive” (“early adopters” of change) and others that are “late adopters” (parallel to the idea of “leaders” and “laggards” to describe how different people participate in ongoing change (Nevalainen, Raumolin-Brunberg and Mannila, 2011))?Footnote 3

     I am interested, then, in why, and especially how, some changes succeed in becoming part of an emerging norm (or, conversely, fall from favour), that is, how linguistic changes spread and become embedded.

  2. Can tracing change as it is reflected in different genres help us to differentiate “changes from above” and “changes from below”?

     What is the relative importance of “changes from above” and “changes from below” and, indeed, is it easy to maintain this distinction when we analyse concrete examples? In the Labovian sense (1994: 78), changes from above are “introduced by the dominant social class, often with full public awareness”, whereas changes from below “appear first in the vernacular, and represent the operation of internal, linguistic factors”. More recently, in the “language history from below” approach adopted by many sociohistorical linguists, consideration of “changes from below” entails a focus on oral registers and the language use of “members of the lower ranks of society in language history” (Elspaß, 2020: 93; cf. Elspaß, 2005; Elspaß et al., 2007). If the majority of changes arise in speech, then we might expect changes to be reflected first in genres considered closest to speech (e.g. correspondence), and last in more literary and “artificial” genres such as poetry. Through analysing multi-genre corpora can we distinguish changes which come from literary and more learned genres from those which come from the colloquial (and possibly more oral) end of the continuum (cf. Koch and Oesterreicher, 2011, discussed in section 9 below)?

  3. Can this work help us to determine which written sources best represent “authentic” usage (in the sense employed by historical sociolinguists, see Elspaß, 2020: 100) or indeed are closest to reflecting spoken usage?

     Work in historical sociolinguistics has frequently privileged ego documents, informal letters and writings of the semi-literate. To what extent are changes first reflected in memoirs, letters, travel narratives and other ego documents? Or in direct speech in theatre and other genres? Or in what Marchello-Nizia (2012, 2014) terms “represented orality” (oral représenté Footnote 4), that is, in reported speech as compared with the narrative portions of text? Some of the case studies I will present belie simple hypotheses about this question.

  4. Is there a continuum of genres into which we can place different text types?

     There has been a flourishing of new French corpora, both covering long historical periods and those which are more, or sometimes indeed very, specialized (e.g. SERMO, a corpus of Protestant sermons, 1550–1750).Footnote 5 In particular there has been an increased interest in genres/text types which are non-literary, including legal texts. This opens up questions as to where these fit in terms of getting close to the vernacular (in the Labovian sense). Clearly we wouldn’t use legal texts for a sociolinguistic study of the 21st-century vernacular and, going back to the earliest written texts, scholars have often considered the Sequence of Saint Eulalia, written as part of the Catholic mass, a better reflection of the 9th-century vernacular than the Strasbourg Oaths, from a lego-political context (e.g. Price, 1990). So, where do these fit into our reliance on different text types? Or what about sermons? Can we posit a continuum into which different text types can be placed?

  5. How do we theorize and categorize text types and genres, especially in large corpora? Can we aim for comparability across corpora?

     I will return to the question of how text types are coded in corpora in the second half of this article. Suffice it to say for now that Frantext, the corpus used for most of the case studies presented here, is not without problems. This is a multi-genre corpus of texts, many of which are literary. It does not therefore contain the kind of informal personal letters written ideally by those with low literacy skills, which have been described by Nevalainen and Raumolin-Brunberg (2012: 32) as the “next best thing” to authentic spoken language, and which have transformed the work, for instance, on the history of Dutch (Rutten and van der Wal, 2014).Footnote 6

  6. How can we study change over long periods of time (diachronie longue, cf. Sorba et al., 2024), if genres and their definition are changing?

     Related to the issues of definition and categorization is the question as to how we analyse change as it spreads through genres over long periods, given that definitions of genres change over time. For instance, the French term roman (‘novel’) is applied both to a certain type of Old French text and to a 19th-century novel by Balzac, which are extremely different text types.

2. METHODOLOGY

The focus of the case studies is particularly (morpho)syntax. For Case study 1, data from a previous study were analysed, whilst Case study 4 triangulated the results from analysing the same feature in different, more specialized corpora. The methodology employed for the other three case studies (2, 3, 5) was essentially the same, using the database Frantext. It should be made clear from the outset that we need to be careful in interpreting the results, since Frantext is not a perfectly stratified corpus, nor, as I have explained, does it contain, for instance, letters by those with low levels of literacy. Nevertheless, it does offer a substantial body of written evidence; for the 17th century alone we have over 21.7 million words of French usage to analyse.

For each occurrence of a particular form or construction, the genre of the textual source was noted, using the genre categorization already included in Frantext. However, not all texts were assigned a classification; for example, for the 17th-century texts, 91/636 (c. 14%) had no classification, so these were classified using the same categories. In order to make the presentation and visualization of the results clearer, some of the related text-types were conflated, e.g. essays/treatises [essays] or novels, novellas, tales (contes) [novels], resulting in seven main categories.
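The conflation step described above can be sketched in a few lines of Python. This is a minimal illustration only: the label spellings, the `MERGED` table and the helper names are assumptions made for the example, not the Frantext scheme, and only the essay and novel groupings are taken from the text. The toy labels are invented.

```python
from collections import Counter

# Hypothetical mapping from fine-grained labels to merged categories.
# Only the essays and novels groupings come from the text; the label
# spellings themselves are assumptions for illustration.
MERGED = {
    "essai": "essays",
    "traité": "essays",
    "roman": "novels",
    "nouvelle": "novels",
    "conte": "novels",
}


def merge_genre(label):
    """Return the merged category for a label.

    Labels with no merge rule are returned unchanged (lower-cased);
    None marks a text that has no genre label at all.
    """
    if label is None:
        return None
    key = label.strip().lower()
    return MERGED.get(key, key)


def unclassified_share(labels):
    """Proportion of texts that carry no genre label."""
    labels = list(labels)
    missing = sum(1 for label in labels if label is None)
    return missing / len(labels)


# Invented toy labels, for illustration only.
toy = ["roman", "Conte", "traité", None, "poésie"]
merged_counts = Counter(merge_genre(label) for label in toy)
```

Keeping unlabelled texts as `None` rather than dropping them makes the size of the gap visible before any manual classification is done, which is the situation described for the 91 of 636 17th-century texts.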

These seven genre categories were placed on a hypothetical continuum to represent the prediction that if changes occur “from below”, then they would be attested first in more personal ego documents (where the writer or “I” is present as the writing/describing subject) such as correspondence, memoirs and travelogues, and more slowly in more formal and more literary genres such as essays/treatises and poetry (Figure 1). The purpose of the case studies was, then, in part to test the validity of these predictions which arise from work in historical sociolinguistics. It should, however, be noted that, as I discuss in section 9, genres are complex and multidimensional and placing them on a continuum necessarily entails privileging certain characteristics.Footnote 7

Figure 1. Hypothetical continuum of genres.

For each of the studies, both a quantitative and a qualitative analysis were conducted. First, the broad tendencies and patterns were identified using a quantitative analysis, and a qualitative analysis was then conducted to identify, among other things, if the usage of a particular author might be idiosyncratic and therefore skewing the results (as indeed was the case); or to see if a particular syntactic context or lexical choice was significant, for instance.
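As a rough sketch of how the two steps fit together, the Python below computes the percentage use of a variant per genre and then flags any author who supplies more than half of a genre's tokens, as a prompt for closer qualitative reading. The token format, the function names, the 50% threshold and the toy data are all assumptions made for this illustration; they are not taken from the studies themselves.

```python
from collections import Counter, defaultdict


def variant_rate_by_genre(tokens, variant):
    """Percentage of tokens in each genre that show the given variant.

    `tokens` is an iterable of (genre, author, form) tuples.
    """
    totals, hits = Counter(), Counter()
    for genre, _author, form in tokens:
        totals[genre] += 1
        if form == variant:
            hits[genre] += 1
    return {genre: 100 * hits[genre] / totals[genre] for genre in totals}


def dominant_authors(tokens, threshold=0.5):
    """List (genre, author) pairs where one author supplies more than
    `threshold` of the genre's tokens and may therefore skew its rate."""
    per_genre = defaultdict(Counter)
    for genre, author, _form in tokens:
        per_genre[genre][author] += 1
    flagged = []
    for genre, authors in per_genre.items():
        total = sum(authors.values())
        for author, count in authors.items():
            if count / total > threshold:
                flagged.append((genre, author))
    return flagged


# Invented toy tokens, for illustration only.
toy_tokens = [
    ("memoirs", "author_A", "singular"),
    ("memoirs", "author_A", "singular"),
    ("memoirs", "author_A", "singular"),
    ("memoirs", "author_B", "plural"),
    ("essays", "author_C", "plural"),
    ("essays", "author_D", "singular"),
]
rates = variant_rate_by_genre(toy_tokens, "plural")
flags = dominant_authors(toy_tokens)
```

A flag of this kind does not decide anything by itself; it only identifies the genre and author whose examples need to be read in context.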

3. CASE STUDIES

The five case studies are presented briefly in this article. For full details, the reader is advised to look at the articles cited in each case.

3.1 Case study 1: Clitic climbing (Ayres-Bennett, 2004: 208–219)

Whilst the first case study is now rather dated, I wanted to include it because it treats a topic which has been addressed recurrently by linguists, including in a very recent article (Olivier, Sevdali and Folli, 2023) to which I will return in section 7. Clitic climbing (1a) was usual in Old French and indeed remained the dominant construction in the 16th century,Footnote 8 but it declined rapidly during the 17th century, giving us the modern construction (1b):

In an early landmark study, Galet (1971) tracked the progress of this change in a corpus of texts of different genres in both prose and verse. Galet concluded that by the end of the 17th century, clitic climbing was becoming archaic. Research based on a corpus of translations (Ayres-Bennett, 2004: 215–218) confirmed that the turning point for the new construction becoming the dominant one was the 1660s–1670s. It is important to note that a number of linguistic features influence the choice of construction. For instance, the tendency to no longer pronounce the final -r of infinitives ending in -ir or -er may well have been one of the factors triggering the change.Footnote 10 There are also syntactic factors which favour the modern construction, such as where there are two infinitives coordinated or when the finite verb is in a compound tense; additionally, the avoidance of potential ambiguity with certain verbs (e.g. il me faut céder) may have been a factor promoting change.

When we examine Galet’s data for the 17th century (Figure 2),Footnote 11 we find that some results conform to our prediction concerning the placement of genres on our hypothetical continuum (see Figure 5 below). Verse, essays and novels are all relatively conservative, as predicted. However, theatre is, overall, more progressive than sermons, while correspondence is more conservative than theatre and sermons. This latter finding is particularly striking given that the majority of Madame de Sévigné’s letters, providing the data here, were addressed to her daughter (although we must note their published nature). Moreover, when we break down theatre according to the different playwrights, we find that it is the theatre of Pierre Corneille, all of whose tragedies are in verse, that is more progressive than Molière’s, despite the fact that Molière’s prose comedies have often been cited as a good source of “more spoken” usage (Lodge, 1991).

Figure 2. Trendlines for percentage use of clitic climbing by decade and genre.Footnote 9

3.2 Case study 2: Agreement with la plupart (‘most, the majority’) (Tristram and Ayres-Bennett, 2012)

My second case study considers the spread of plural agreement with la plupart (‘most, the majority’): this is syntactically singular but designates the majority of an entity comprised of a number of individual units. In modern French, agreement is made in the plural, influenced by the plural complement (2):

For this change, the key period is 1500–1699, since by the end of the 17th century plural agreement is categorical (unlike with other collectives) (Figure 3).

Figure 3. Percentage usage of singular and plural agreement with la plupart without a singular postmodifying NP (1500–1699)Footnote 12

When the results are analysed by genre (Table 1, Figure 4), once again we have a mixture of results which fit with our hypothetical continuum and those which do not (it is difficult to conclude anything about sermons since we only have data for periods where there is a high level of plural agreement across the board). Correspondence uses plural agreement for every period for which we have data,Footnote 13 making it the earliest genre to show categorical use of the new construction, whilst poetry is relatively slow to show the change. Other results do not conform. Most noticeably, theatre is very conservative, retaining the singular later than the other genres, indeed at a lower level than even poetry. Conversely, novels consistently have plural usage at over 90%, making it one of the most progressive genres. Memoirs have an anomalous figure for authors born 1500–1549; this is explained by the usage of one author (Fonteneau), but its path still closely follows that for essays. Here, when we disaggregate the different types of theatre, we find that in the earlier periods comedy/tragedy and verse/prose seem to behave the same, but in the last period for which data are available, the singular tokens occur only in tragedy in verse.

Figure 4. Percentage of plural usage by author’s date of birth and by genre

Table 1. Percentage of plural usage by author’s date of birth and by genre

Comparing the results across the first two case studies, it becomes clear that not only does the positioning of the genres on a continuum according to how progressive they are differ for the two case studies, but it also differs from our original hypothetical continuum (Figure 5). It is particularly striking how theatre appears to lead the way in showing the new construction in Case study 1, whereas it is the most conservative genre in Case study 2.

Figure 5. Comparison of the genres in terms of their ‘progressivity’

3.3 Case study 3: Loss of par après and en après as alternatives for après (‘after’) (Ayres-Bennett, 2018)

This case study looks at the patterning across genres of the disappearance of par après and en après from usage as part of the general tendency in the 17th century for the reduction of variants associated with the emergence of Classical French, as evidenced in the volumes of observations written by the remarqueurs (Ayres-Bennett and Seijido, 2011). Usage of en après peaked in the 14th century, and was in quite advanced decline in the 17th century. Par après, which is still used in certain Francophone regions, notably Belgium, is little attested before the 16th century, becomes fashionable in the 17th century, and then declines rapidly in the 18th century (Figure 6).

Figure 6. Usage of en après and par après in Frantext by 50-year periods

Since there is little change in usage after the end of the 18th century, the analysis of the results by genre focused on the period 1500–1799. This analysis shows that the results cluster in three main genres. For par après, these are essays, novels and memoirs (Figure 7). For en après, novels again feature, as do essays, pamphlets (also classified by Frantext at a higher level as a type of essay) and reports (Figure 8).Footnote 14

Figure 7. Usage of par après by genre

Figure 8. Usage of en après by genre

In this case, it seems difficult to think of the change as being one “from above” or “from below”, since we have a mixture of more formal genres (essays) with those often considered more informal because of their personal nature (memoirs, journals, travelogues, etc.). Rather, it seems that these expressions survive longest in those genres where the sequence of events or the sequence of arguments is vital, notably in essays, but also to a lesser extent in novels, memoirs, journals, travelogues, etc. This is perhaps where the notion of “discourse tradition” is helpful (Koch, 1997; Winter-Froemel and Octavio de Toledo y Huerta, 2023). According to Winter-Froemel (Winter-Froemel and Octavio de Toledo y Huerta, 2023: 31), discourse traditions “provide the language users with conventions concerning the design of concrete discourses in both discourse production and discourse reception, with linguistic conventions possibly interacting with extra-linguistic conventions and all levels of linguistic description being included”. In the case of en après and par après, the frequency and entrenchment of the expressions in the relevant discourse traditions probably also play an important role.

3.4 Case study 4: Recategorization of dedans (‘in’)/dessous (‘under’)/dessus (‘on’)/dehors (‘outside’) (Amatuzzi et al., 2020)

Up to the 16th century dedans, dessous, dessus and dehors could be used as either adverbs or prepositions; in the latter function they doubled up with the simple forms dans (rare before the 17th century), sous, sur and hors. From then on we witness the progressive distinction between the adverbial and prepositional functions, with dedans, dessous, dessus and dehors specializing to the adverbial usage (except when used in compound structures such as au dessous, en dehors, etc.). In the 17th century, then, dans rapidly replaces dedans as a preposition, for instance.

In Amatuzzi et al. (2020), an attempt was made to widen the types of texts being examined. Prose models (different translations of Quintus Curtius Rufus’s Life of Alexander) were compared with usage in corpora comprising Protestant sermons, correspondence, diplomatic correspondence and a Danish princess’s memoir. Contrary to our expectations – that the non-literary texts would show this change earlier – evidence for the chronology of the change did not seem to differ significantly between the different text types.

3.5 Case study 5: Loss of the coordinating conjunction ains (‘but’) (Ayres-Bennett, forthcoming)

The conjunction ains, with the sense of ‘but’ after a negative clause (cf. German sondern) was in complementary distribution with mais until Pre-Classical French when they became rivals, with ains falling into rapid decline over the course of the 17th century (Marchello-Nizia et al., 2020: 1648) (Figure 9). Of the 1,517 examples of the form in Frantext in the 17th century, 1,503 (99.1%) occur in the first half of the century, and there are only 20 attestations (1.3%) in the 1640s, suggesting that by 1640 the form had become scarce. The distribution of usage by genres therefore focused on the period 1600–1649 (Figure 10).
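The proportions cited for ains can be checked with a short calculation. The sketch below uses only the three counts given in the text; the helper `share` and the variable names are introduced here for illustration.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


total_17th_century = 1517  # attestations of ains, 1600-1699 (from the text)
first_half = 1503          # attestations in 1600-1649 (from the text)
in_1640s = 20              # attestations in the 1640s (from the text)

first_half_share = share(first_half, total_17th_century)
share_1640s = share(in_1640s, total_17th_century)

# Attestations left for the whole of 1650-1699.
second_half = total_17th_century - first_half
```

On these counts only 14 attestations remain for the second half of the century, which is consistent with restricting the genre analysis to 1600–1649.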

Figure 9. Usage of ains (1600–1699)

Figure 10. Distribution of ains by genre (1600–1649)

In this case, one genre – essays – dominates overall, with c. 70% of all the examples. However, the second most common genre is correspondence (14%), at the other end of our hypothetical continuum, followed by poetry (6.5%). In the 1610s, essays and correspondence remain the most common genres, whilst in the 1630s it is correspondence that has the most occurrences, followed by poetry. In short, it is difficult to see clear patterns of the loss of the conjunction being associated with either the more formal or the more personal genres.

4. SOME PRELIMINARY CONCLUSIONS

From these five case studies we can conclude that using a large multi-genre database to track linguistic changes through genres leads to some very interesting results. Despite the problems associated with Frantext, it does provide 21.7 million words of usage for the 17th century alone and for the period under consideration (1500–1800), for Metropolitan French, it has been, and remains, the major textual resource, and consequently has been exploited in numerous studies by (socio)historical linguists of French. The results, however, are not always what we might predict. Indeed, each of the case studies has demonstrated different patterning of the occurrences according to which genres appear to be more “progressive” and which seem more “conservative”.

Alongside the successes, then, there are also challenges. It proved difficult to see clear patterns of how change spreads through different genres across the five case studies or to argue that one text type consistently provides a more “authentic” reflection of vernacular usage. This leads us to further conclude that it is important to avoid simple assumptions about which genres are closest to speech and therefore, perhaps, more progressive (e.g. we did not find evidence that ego documents were consistently the most progressive).

It was also difficult to use the results to distinguish changes “from above” and changes “from below”. Like Nevalainen (2006: 566), I therefore conclude that it is not always possible to determine the directionality of a linguistic change in progress by a simple register – or in my case genre – analysis. Other factors may intervene, for instance, the communicative features of a text may override genre characteristics, as we saw in Case study 3 (par après, en après).

Perhaps more importantly, the reliability of the results relies, to a great extent, on the quality of the input on genres/text types in different databases. While this point may seem obvious, it is rarely, if ever, acknowledged in studies of individual changes.

In the rest of this article, I therefore want to focus on a number of further questions:

  1. How reliable is the categorization of genre/text type in Frantext?

  2. How does it compare with how these categories are treated in other corpora? To what extent are genre categories readable across different corpora?

     For this analysis, I will limit my discussion to corpora focusing on the French of France and set aside resources for other Francophone varieties, notably those for French in Canada, created by France Martineau and her team.

  3. How broad/fine-grained should the categories be?

     In my analysis, I merged some categories, e.g. essai and traité or journal and mémoire, so that the analysis of large quantities of data is manageable, but this carries the danger of creating heterogeneous categories. For instance, in an excellent paper by Piccione and Rainsford (2024) on the history and typology of manner verbs in French, the authors consider the spread of change through different genres. For this, they created a very large category of récit (narrative) subdivided into récit vers and récit prose (the latter being set as reference level for comparison) which includes chroniques, essais, mémoires, fabliaux, romans, science-fiction, etc. – the differences between these genres are therefore lost in their analysis. While such a broad-brush approach may suit certain studies, it highlights how we have to be careful in interpreting conclusions about genre usage and consider the categories employed.

  4. Where do single genre corpora fit into the broader picture?

     Can we triangulate the results from different corpora or are they in some way incompatible?

  5. To what extent are the categorizations used in databases underpinned by theoretical work on genre and text type?

5. FRANTEXT AND GENRE/TEXT TYPE

It should be made clear that Frantext was devised initially not for historical (socio)linguists, but as the basis for getting lexical data for the large online dictionary, the Trésor de la langue française informatisé.Footnote 15 A blog on the Frantext website,Footnote 16 gives an insight into some of the issues surrounding the categorization of genres in this database:

There is no harmonized nomenclature of literary genres, since the texts were labelled for genre by many different people over forty years. In addition, to compensate for the shortcomings of the old computer system, we had to use labels that were sometimes ill-suited, because they were the only ones available (“mémoires”, for example, to cover what was also autobiography). Later, we added more suitable ones. This categorization more or less distinguishes fiction (novel), theatre and poetry. By contrast, everything relating to the nomenclature of personal writings is much less precise, and we have often kept two or even three labels (for example, mémoires, autobiographie, écrits personnels). […] It is therefore not a reliable tool for categorizing texts by genre.

In a personal communication from the current Frantext team, it was noted that the text type labels were added relatively freely by the person reading or keying in the text.

A constellation of issues therefore surrounds the use of Frantext for tracing change through genres. Different labels were used over the course of its history of more than forty years (although they were tidied up to some extent in 2022), and different degrees of specificity of the labels were employed, largely depending on the individual inputting the text (for instance, some people coded for detective novels, whereas others coded them simply as novels; this means that one cannot reliably identify all the detective novels in the database) and the period of coding (in 2010 they moved to larger generic families of texts). For modern texts the coders often relied simply on the blurb on the back cover or first page. We have already noted the fact that a significant minority of texts do not have genre labels (958/5,658 = 16.93%) and the difficulty of applying the same labels over different centuries as genre conventions change. For the 17th-century texts, 30 different variants of genre labels were used (with overlaps and different degrees of specificity, plus the category “non renseigné”). In short, it was clear that, at least at the outset, the question of genre labels was – perhaps understandably – not the highest priority for the ATILF team.

6. GENRE AND TEXT TYPE IN OTHER CORPORA

Having identified these issues surrounding the classification of genre and text type in Frantext, I decided to examine the extent to which other French databases commonly used by historical linguists share some of the same problems. These can be divided into three broad types: (i) those with a long diachronic coverage (6.1); (ii) those relating to a particular period, notably the medieval period for French (6.2); (iii) single genre corpora (6.3).

6.1 Databases with a long diachronic coverage

In the first category I analysed PRESTO, Democrat and the corpus used as the basis for the (monumental) Grande grammaire historique du français, published in 2020 (Marchello-Nizia et al., 2020). PRESTO is a coded and lemmatized corpus representing all periods of the history of French, as well as different genres and text types.Footnote 17 As regards genres, its creators identified three “champs génériques” (in the sense of Malrieu and Rastier, 2011):Footnote 18 narrative genres (romans, nouvelles, contes…); poetry; and theatre. A fourth category, traités (treatises, essays), taken from Frantext, was added. The creators of PRESTO, however, express some scepticism about this category, since it is “trans-discours”: one can have treatises relating to history, philosophy, religion, etc., and the category therefore cuts across other genre classifications. The genre categories are principally those employed by others, notably Frantext and, to a lesser extent, the Bibliothèques Virtuelles Humanistes (see below) and ARTFL. The authors concede that there is still work to be done concerning the definition of genre subcategories, so that their categories remain broad. As with many corpora, the principal factor determining the choice of texts was date rather than genre or text type.

Democrat is a smaller corpus comprising 58 texts ranging from the Chanson de Roland (11th century) to 21st-century Wiki articles (Landragin, 2021).Footnote 19 Here two levels are employed: the “text type” (narrative/non-narrative), a label attributed to the whole text rather than to specific parts, and genre textuel, of which there are 24 categories of varying degrees of specificity. The problem of the changing definition of categories is evident here: for instance, the label roman is attributed to Eneas (12th century), Pauline (Alexandre Dumas, 19th century), and Douce lumière (Marguerite Audoux, 20th century). In a personal communication the authors noted that genre was not at the heart of their preoccupations and that the annotation was conducted in an “empirical” fashion.

The corpus created for the Grande grammaire historique du français (GGHF, Marchello-Nizia et al., 2020; see also Prévost, 2020) relies heavily on the Base de Français Médiéval (BFM) and Epistemon (see below). There are three pertinent categories here: form (poetry/prose/mixed); domain (comprising an initial five categories: literary, didactico-scientific, religious, historical and legal, with two others, epistolary and argumentative, added from Middle French onwards); and an open list of genres. Domain was privileged over genre. Indeed, in a personal communication it was acknowledged that, in choosing the texts, the date, the domain, the form and the dialect were privileged, so that once all these criteria had been satisfied, the genre of the texts selected was more or less determined. Thirty different genre labels are used in total,Footnote 20 of which seven are used for the 17th century; this compares with thirty for the 17th century alone in Frantext, although some of these are subcategories of bigger categories. In short, it is clear that there are different levels of granularity across corpora, and indeed sometimes within a corpus, and there are inevitably very different categories employed according to period.Footnote 21

6.2 Corpora relating to a specific period

The authors of the medieval corpora, particularly the Base de Français Médiéval (BFM, covering Old and Middle French; see Guillot, Heiden and Lavrentiev, 2017)Footnote 22 and the Corpus représentatif des premiers textes français (CoRPTeF: 9th–12th centuries),Footnote 23 led the way in thinking about genre, and their analysis is more sophisticated than what we find in Frantext. As we have seen for the GGHF, they employ form, domain and genre. CoRPTeF includes a sixth domain, acte de la pratique, which, they acknowledge, potentially overlaps with juridique (it is not retained in the GGHF), whilst BFM has a seventh one, politique. There are 44 genres listed in the BFM (e.g. bestiaire, biographie, cérémonial, charte, chronique, commentaire, comput, coutumier, débat, dialogue, divers, dramatique…), but the list is potentially open; it is noted that genre is difficult to define precisely and that its attribution was the product of consensus. Guillot and Lavrentiev (2009: 9–10) observe that genre categories can be based on one of three possible viewpoints: modern criteria for the grouping of texts; relevant categories for the time when the text was produced; or categories informed by the philological tradition, i.e. medieval genres as we understand them from today’s viewpoint, knowing how they have evolved. They chose the third option. As Jenelle Thomas concludes (2024: 354): “this underscores the cultural and temporal specificity of genre categories and the way viewpoints might shift over time”. Genre is in some sense subordinate to domain, but the same genre can appear in more than one domain (e.g. dramatique could be religieux or littéraire). For CoRPTeF, standard reference works were used for the assignment of genres: they were nevertheless “intuitive” categories, something which is acknowledged as problematic, but necessary.

There is less discussion of how to treat text type and genre in the other medieval corpora. The Nouveau Corpus d’Amsterdam (c. 1150–1350) has approximately 36 categories (which are subcategorized for prose/verse),Footnote 24 whilst the Corpus de la littérature médiévale: des origines au XVe siècle Footnote 25 has four main categories – narrative, poetry, theatre and other – which are then subdivided. It is striking that the textbase associated with the Anglo-Norman Dictionary (AND), which has eight categories for genre, likewise includes a miscellaneous category, a sort of heterogeneous “catch-all” category used for texts which are difficult to place in the major genres.Footnote 26

The major database for the Renaissance, the Bibliothèques Virtuelles Humanistes, which includes three corpora, Epistemon, Rabelais, and Montaigne, offers one of the few treatments of genre in a database where the theoretical foundation is explicitly mentioned. Its principal author, Marie-Luce Demonet, speaks of an “ontology” inspired by the categorization of Käte Hamburger, combining semantic, formal and discursive criteria, as well as what the authors themselves said. There are 33 genre categories, some of which are those used for medieval texts, whilst others are innovations. The database is currently being revised to take account of some of the harmonization work on genre being conducted (see section 8).

6.3 Single genre corpora

In recent years, a whole range of single-genre corpora has emerged, covering domains such as prose chivalry novels from the 13th to the 17th century (PhraséoRoChe; Denoyelle et al., 2024), World War I soldiers’ correspondence (semi-literate writers, Corpus 14),Footnote 27 or Norman customary law texts from the 13th to the 19th century (Condé).Footnote 28

Scholars vary in the extent to which they try to situate the results of analysing a single-genre database within the broader picture of the history of French. It is striking that at times results are presented without comparing them to other text types or genres, making it difficult to interpret their significance. Such contextualization can be achieved, to some extent at least, through the triangulation of different sources, but the comparison has to be like with like, and the value of a particular corpus must not be overstated when either the data or the methodologies, and notably the treatment of genre, differ across the corpora being compared.

7. CLITIC CLIMBING REVISITED

A number of these smaller corpora are being built by individuals or teams to address specific linguistic questions. For instance, Olivier, Sevdali and Folli (2023), examining the history of clitic climbing in French, compiled a corpus of 19 mostly prose legal texts (1150–1856), principally from Normandy, the exception in terms of text type being the Roman de Brut. The choice of corpus was inspired by an article by Balon and Larrivée (2016) which had shown, using a corpus of legal texts, that there was no evidence of pro-drop in legal texts after the 13th century, whereas previous studies, based mostly on literary texts, had situated the change much later, in the 15th or 16th centuries.Footnote 29 Following Balon and Larrivée, Olivier, Sevdali and Folli (2023) chose legal texts on the grounds that “a change that takes place in the language is likely to be reflected earlier on” in non-literary texts. Yet the chronology produced by their findings, when we compare it with other studies, does not seem to support this hypothesis. For instance, Galet’s (1971) data suggest that clitic climbing was very much in a minority in the 18th century: in the 31 texts making up Marivaux’s theatre (1720–1754), the rate of the modern construction without clitic climbing ranges from 86% to 100%, with an average of 91.5%. Similarly, in the three prose contes by Voltaire analysed (1759–1768), the modern construction occurs in 88%–89% of cases, with an average of 88.3%. Olivier, Sevdali and Folli (2023) analyse just two texts for the century: Merville (1731), with a rate of 56.9% for the modern construction without clitic climbing, and Pesnelle (1771), with a rate of just 33.8%. Overall, whilst clitic climbing occurs in only around one in ten instances in the literary texts analysed by Galet, it is still in the majority (54.7%) in the legal texts. We might note here how one text, such as Pesnelle, can again potentially skew the data.
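The arithmetic behind this comparison can be made explicit. The Python sketch below uses only the percentages quoted above. Treating the legal-text figure as an unweighted mean of the two texts is an assumption, made because it yields 54.65%, which is consistent with the reported 54.7%; the variable and function names are illustrative.

```python
# Rates of the modern construction (no clitic climbing), in %, as quoted above.
legal_modern = {"Merville (1731)": 56.9, "Pesnelle (1771)": 33.8}
literary_modern_avg = {"Marivaux, theatre": 91.5, "Voltaire, contes": 88.3}


def climbing_rate(modern_rate):
    """Clitic climbing is the complement of the modern construction."""
    return 100.0 - modern_rate


def mean(values):
    values = list(values)
    return sum(values) / len(values)


legal_climbing = {t: climbing_rate(r) for t, r in legal_modern.items()}
literary_climbing = {t: climbing_rate(r) for t, r in literary_modern_avg.items()}

# Assumption: each legal text counts equally (unweighted mean over two texts).
legal_mean = mean(legal_climbing.values())

# Sensitivity check: the same mean with Pesnelle left out.
without_pesnelle = mean(
    rate for text, rate in legal_climbing.items() if "Pesnelle" not in text
)

print(f"legal texts, mean clitic climbing: {legal_mean:.2f}%")
print(f"legal texts without Pesnelle:      {without_pesnelle:.2f}%")
for text, rate in literary_climbing.items():
    print(f"{text}: {rate:.1f}%")
```

On these figures, the majority status of clitic climbing in the legal texts rests on Pesnelle: Merville taken alone falls below 50%, which illustrates how much weight a single text carries when only two are available for a century.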

In short, there is huge value in these new corpora of different text types, but we have to be careful not to overgeneralize the conclusions drawn from them as to which text types are more progressive, and we need continually to test hypotheses about the value of different text types across a range of linguistic features of various types.

8. PRELIMINARY SUMMARY

If we go back to the first four of the questions I posed, we can see that questions of genre have, understandably, not always been of the highest priority for those creating corpora. Whilst much attention has been paid to tokenization (Lavrentiev, Guillot and Heiden, 2021), morphosyntactic coding and lemmatization (cf. Goux, 2024), there has been much less emphasis on sophisticated analysis of genre, with the result that the categorizations both in Frantext and in many of the other French historical corpora have to be treated with caution. The variation in the categorization used across corpora makes comparison difficult, although thankfully there is work underway to try to remedy this to some extent. There is also considerable variation on the question of calibration and, as yet, it is not straightforward to insert single-genre corpora into some sort of continuum, although triangulation of different outcomes, provided the data are comparable, is a promising way forward.

On the harmonization of labels, within the CORLI project there is a new scheme to provide an Open French Corpus bringing together all the freely available corpora. The group ‘Typologie textuelle’, which was part of the consortium CAHIER, produced a thesaurus of terms in 2022;Footnote 30 this started with the terms and tree structure used in the BFM and Epistemon, but it has been considerably augmented since, giving a maximal list of the terms one might use (Galleron et al., 2021; Galleron et al., 2022). The application of these labels to different corpora is as yet at an exploratory stage: for instance, Marie-Luce Demonet is working on applying them in the corpora of the Bibliothèques Virtuelles Humanistes, and they will be used in the next iteration of the BFM (personal communications). This work also raises the question of the translation of these terms into other languages, notably English (since there are different national traditions, see section 9).
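One way to picture what applying a shared list of terms to an existing corpus involves is a lookup from corpus-specific labels to harmonized terms. The Python sketch below is purely illustrative: the label strings are taken from examples mentioned earlier in this article, but the mapping, the target terms and the function name are hypothetical and do not reproduce the CAHIER thesaurus.

```python
# Hypothetical mapping from corpus-specific labels to a shared, broader term.
# None of these target terms is taken from the actual thesaurus.
HARMONIZED = {
    "mémoires": "écrits personnels",
    "autobiographie": "écrits personnels",
    "écrits personnels": "écrits personnels",
    "roman policier": "roman",
    "roman": "roman",
}


def harmonize(labels):
    """Return (shared terms, unmapped labels) for the labels attached to one text."""
    mapped, unmapped = set(), set()
    for label in labels:
        key = label.strip().lower()
        if key in HARMONIZED:
            mapped.add(HARMONIZED[key])
        elif key:  # ignore empty strings
            unmapped.add(key)
    return mapped, unmapped


# A text carrying three overlapping labels collapses to one shared term.
print(harmonize(["mémoires", "autobiographie", "écrits personnels"]))
# A label absent from the mapping is reported rather than silently dropped.
print(harmonize(["roman policier", "non renseigné"]))
```

Mapping upwards to a broader term makes corpora easier to compare but discards granularity (a detective novel becomes simply a novel), so any such scheme has to decide which level of specificity it preserves.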

9. THEORETICAL WORK ON GENRES AND TEXT TYPES AND CORPORA

In relation to my fifth question, could the genre categorization in corpora be more informed by the theoretical work on genres and text types, for which different national models and preferences come into play? Whilst not wishing to minimize the importance of many other scholars (e.g. Görlach, 2004), it is perhaps fair to say that Biber’s (Biber, 1989, 1995; Biber and Conrad, 2019) multidimensional model has dominated in the Anglophone world. In this model, there are six “dimensions”, each a continuum running between two poles (e.g. “involved vs informational production”, “narrative vs non-narrative discourse”, with different types of discourse positioned at different points of the continua) (Carruthers, 2024: 358). These are complemented by situational characteristics including information relating to participants, the setting, communicative purposes, and topic. Different linguistic features (e.g. tenses, pronouns, etc.) can then be associated with different points on the various “dimension” continua and with different “situational characteristics” (Carruthers, 2024: 358). The use of continua, rather than, say, clear binary oppositions, makes the model potentially more difficult to apply in corpus annotation.

The same issue arises with the work of Adam (e.g. 1999, 2005, 2011), which is favoured in the Francophone world. Adam uses the notion of prototypes, that is, groups of attributes of differing importance, rather than classes, so that texts, or more usually parts of texts, can be adjudged to belong “more or less” to a certain prototype. Adam identifies five different types of sequence, or series of clauses, to characterize different units of text: these sequences can be narrative, descriptive, argumentative, explanatory or dialogal, each having different linguistic and textual properties. As Carruthers (2018: 340) observes, Adam also stresses the dynamic and fluid nature of genres, which is reflected in his articulation of linguistic regularities as faisceaux de régularité rather than fixed sets of features, and of the genres created as closer to airs de famille than to straightforwardly prototypical textual forms.

Another popular model in Romance linguistics is that of Koch and Oesterreicher (2001, 2011), which has been influential for instance in the construction of two corpora of contemporary French, Rhapsodie Footnote 31 and Multicultural Paris French (Cheshire and Gardner-Chloros, 2018). This model comprises three interrelated notions: réalisation médiale (a binary divide between the phonic code and the graphic code), conception (with informal spontaneous conversation at one end of conception orale and a legal text at the other end of conception écrite, see Figure 11), and comportement communicatif, involving a set of parameters which are predominantly in the form of a continuum, e.g. private communication – public communication, close interlocutor – unknown interlocutor, spontaneous communication – prepared communication. This allows Koch and Oesterreicher to make a distinction between communicative distance and immediacy.Footnote 32

Figure 11. Immédiat communicatif/distance communicative and code phonique/code graphique (Koch and Oesterreicher, 2001: 586)

Key: (secteur) A: immédiat phonique (conversation spontanée, etc.); B: distance phonique; C: immédiat graphique; D: distance graphique ; (a) conversation spontanée entre amis, (b) coup de téléphone, (c) lettre personnelle entre amis, (d) entretien professionnel, (e) interview de presse, (f) sermon, (g) conférence scientifique, (h) article de fond, (i) texte de loi

Given our previous discussion of the value of legal texts as potentially better sources for the historical sociolinguist, it is interesting to note that Koch and Oesterreicher place these right at the far end of communicative distance (Figure 11). Here the question of granularity is perhaps again pertinent, since the category of legal texts embraces a range of subtypes: scholars of Old French, for instance, have often distinguished texts relating to legal practice (e.g. charters, account books) from those recording laws, the former being considered less “normative” and more locally marked than literary texts.Footnote 33 Comparison of Figures 1 and 11 also shows up other differences, notably the position of sermons, which are closer to the pole of communicative distance in Figure 11. This once again demonstrates variability depending on the model or theoretical framework favoured.

10. CONCLUDING THOUGHTS, QUESTIONS AND POSSIBLE FUTURE DIRECTIONS

The case studies I have presented demonstrate, I believe, the interest and value in tracing the diffusion of change through genres. It helps us to see how changes become embedded in an emerging norm, to identify how constructions and lexis are lost, and to consider the relative role of changes from above and changes from below. The value of conducting detailed case studies also lies in seeing the extent to which predicted pathways of diffusion and change are, or – as we have seen here – are not, borne out.Footnote 34

The major challenge to the quality of this research is the quality of the input or evidence being used. The classification of text types and genres has, for understandable reasons, come relatively low down the priorities of those building corpora, compared with, for instance, part of speech or syntactic tagging. The encoding of text types and genres is, moreover, inconsistent, with different degrees of granularity being used, making it challenging to compare studies based on different corpora. Faute de mieux, the majority of studies, whether treating long diachrony or the post-Medieval period, use Frantext, despite its limitations.

A number of important questions remain:

  1. Given the changing nature of genres over time, how do we study diachronie longue? On the one hand, what Walkden (2019: 10) terms “uniformitarianism as null hypothesis” should lead us to analyse genres in the past in the same nuanced way as we do contemporary varieties. On the other hand, we cannot assume that genres in the past had the same values or characteristics as they do today, since genre conventions also vary over time. A possible, potentially fruitful, way forward might be to use clustering algorithms to look at the proximity between texts in corpora and the extent to which this coincides with existing classifications in terms of genres and text types.

  2. How best can we move towards greater harmonization of the treatment of genres and text types in corpora? Currently we tend to have individual models/studies which are, consequently, often difficult to compare, so the new initiatives towards agreeing on genre categorizations are timely. However, the proposed thesaurus, welcome as it is, seems to be in the form of a “shopping list” from which those building corpora can select, rather than a model as such.

  3. Can we use existing research on genre and text type to inform and improve our analysis of genres in corpora and studies? If so, how do we deal with continua? Most diachronic studies (including those presented here) rely on broad-brush categories rather than the more complex multidimensional models promoted by Biber and other theoreticians,Footnote 35 which favour continua rather than discrete categories. And should we be thinking about the genre of texts as a whole or sequences or parts of texts as in Adam’s model?

  4. To what extent is it possible to establish a broad continuum of genres? And if so, where do the new genres being explored (sermons, legal documents, newspapers, etc.) fit? If scholars wish to compare the relative merits of different types of texts for French diachronic studies (something which is a central preoccupation for sociohistorical linguists), then there needs to be, wherever possible, triangulation of findings and attention paid to the extent to which studies use comparable methodologies and resources. It is, moreover, important to remember that, since genres and text types appear to be inherently complex and multidimensional, any continuum posited will foreground only one dimension.
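The clustering idea raised under the first question can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions, not a proposal for a specific method: the six texts, their genre labels and their two feature values (imagined here as per-text rates of two linguistic variables) are all invented, the clustering is a simple average-linkage agglomerative procedure written with the standard library only, and "purity" is just one possible way of measuring how far clusters coincide with existing labels.

```python
from itertools import combinations
from math import sqrt

# Invented data: name -> (existing genre label, [rate of feature 1, rate of feature 2]).
TEXTS = {
    "letter_A": ("correspondance", [0.10, 0.80]),
    "letter_B": ("correspondance", [0.12, 0.78]),
    "play_A": ("théâtre", [0.40, 0.50]),
    "play_B": ("théâtre", [0.42, 0.55]),
    "treatise_A": ("traité", [0.85, 0.20]),
    "treatise_B": ("traité", [0.80, 0.25]),
}


def distance(a, b):
    """Euclidean distance between the feature vectors of two texts."""
    u, v = TEXTS[a][1], TEXTS[b][1]
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def linkage(c1, c2):
    """Average pairwise distance between two clusters (average linkage)."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def cluster(names, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[name] for name in names]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] += clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


def purity(clusters):
    """Share of texts that carry the majority genre label of their cluster."""
    hits = 0
    for members in clusters:
        labels = [TEXTS[name][0] for name in members]
        hits += max(labels.count(label) for label in set(labels))
    return hits / sum(len(members) for members in clusters)


clusters = cluster(sorted(TEXTS), k=3)
print(clusters)
print(f"purity against existing labels: {purity(clusters):.2f}")
```

With real corpus data the interesting cases would be those where purity is low, that is, where texts sharing a genre label fall into different clusters or texts with different labels cluster together; the choice of features, distance measure and number of clusters would each need justification.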

Analysis of contemporary data has shown that the multidimensional nature of register variation means that genres may be hybrid and that different linguistic features can combine in different proportions in different genres.Footnote 36 Given there is no reason to expect that the past is any different in this respect, this may help explain some of the variation in the results of the case studies I have presented. There is still much research to be done before we have the answers to all these questions.

Financial support

There is no funding to be acknowledged.

Competing interests

There are no competing interests.

Footnotes

1 www.Frantext.fr. All websites in this article were consulted on 24 July 2024.

2 There is considerable variation in the way terminology, including text type and genre, is used, as I discuss in the second part of my article. The case studies focus particularly on genres (in the sense of theatre, poetry, sermons), but more broadly it is important also to take account of features such as descriptive/narrative/expositive/argumentative which describe different text types (cf. Koch, 1997: 53).

3 There are a number of antecedents to this, dating back to Rogers (1962) who proposed five categories in his diffusion of innovations theory – innovators, early adopters, early majority, late majority and laggards.

4 See also Mazziotta and Glikman (2019) on the interplay between represented orality, text type and syntactic factors, in particular the difference between usage in main and subordinate clauses.

6 Unfortunately, for Early Modern French these remain relatively scarce, although the transcription of letters from the Prize Papers as part of the MACINTOSH project (https://www.prizepapers.de/) is beginning to offer some new material for the 17th and particularly the 18th century (Bergeron-Maguire, Dourdy and Thiriet, 2024).

7 For instance, if we used Koch and Oesterreicher’s parameters and proposed a continuum running from communicative immediacy to communicative distance (see section 9), the ordering of the genres might be somewhat different.

8 According to Galet (1971), Rabelais (1532) in his prose used it in 88% of possible occurrences, Montaigne (1572–1587) used it in his prose in 82% of possible occurrences, whilst Du Bellay (1558) in his poetry used it in 100% of possible occurrences.

9 Galet’s figures for Madame de Sévigné’s letters cover the period 1645–1661; for the purposes of this analysis, this figure has been attributed to the 1650s.

10 This meant that constructions such as il se croit acquitter/il se croit acquitté and je les sais bien punir/je les sais bien punis were at the time pronounced identically. While final [r] had ceased to be pronounced by the end of the 14th century, so that infinitives in -er, -ir, and -oir ended in a vowel (Rickard, 1989: 63), it nevertheless remained in liaison contexts and before a pause, and was later restored in -ir and -oir infinitives (Marchello-Nizia et al., 2020: 474). Loss of final [r] before a pause was only completed in the 17th century, as the competing comments by the grammarians Maupas (1607) and Chiflet (1659) attest (Pope, 1952: 222).

11 In total Galet analysed 105,510 lines of verse and 67,655 lines of prose (for details of how this was calculated, see Galet, 1971: 66), which constitutes a significant amount of data, even if theatre predominates in the corpus analysed.

12 The full data are available in Tristram and Ayres-Bennett, 2012: 378. In total 1,309 examples were identified, of which 121 had singular agreement and 1,188 plural agreement.

13 The Base de Français Médiéval (BFM) provided just four tokens of agreement with la plupart (all plural, from the Cent nouvelles nouvelles) and the corpus Modéliser le Changement: les Voies du Français (MCVF) only 19 examples for the 1400s (of which 17 plural) and 24 for the 1500s (of which 20 plural), making it difficult to conduct an analysis of variation and change pre-1500 (Tristram and Ayres-Bennett, 2012: 377).

14 Somewhat different categories were used for this study; it should be noted that reports are extremely rare and do not feature at all in the 17th-century corpus.

17 http://presto.ens-lyon.fr/?page_id=584. The researchers’ principal research questions concerned the history of prepositions.

18 Malrieu and Rastier (2011) set out four categories of textual genres that form a hierarchy: at the top, the discours (e.g. legal, literary, essayist, scientific); below that, the champs génériques (e.g. theatre, poetry, narrative genres); below them, the genres (e.g. comedy, detective novel, memoirs, travelogues); and finally the sub-genres (e.g. the epistolary novel). For them, the level genre is the fundamental one for categorizing texts. See also Rastier (2011).

20 Again, the category of traité was used following Frantext in spite of the same reservations about it expressed above.

21 For instance for the GGHF, for the 12th century the categories used are (core corpus first, supplementary corpus in square brackets; the number in parenthesis indicates where there is more than one example for that genre): épique (2), hagiographie, lapidaire, roman (3); [charte, chroniques (2), comput, dramatique, hagiographie, histoire, lyrique, miracles, psautier, roman]; and for the 19th century: correspondance, dramatique, lyrique, mémoires (2), presse, récit de voyage, traité (2); [chanson, correspondance, dramatique (2), mémoires, procès, roman (2), traité].

25 https://classiques-garnier.com/corpus-de-la-litterature-medievale.html. This is a corpus available on subscription, published by Classiques Garnier numérique (2001).

29 See also Goux and Larrivée (2020); Larrivée and Goux (2024).

30 This thesaurus is currently maintained by the consortium Huma-Num ARIANE in collaboration with the consortium CORLI 2 https://opentheso.huma-num.fr/opentheso/index.xhtml.

32 As mentioned above, Koch’s work on discourse traditions (Koch, 1997) is also important, although as yet discourse traditions “have not been established as a key concept in French historical linguistics” (Winter-Froemel and Octavio de Toledo y Huerta, 2023: 435).

33 I am grateful to one of the anonymous readers for pointing this out to me. They also note, however, that charters and account books equally tend to be heavily influenced by Latin models and very formulaic, thereby offering contradictory evidence.

34 Recent research on discourse traditions has demonstrated that it is not just conditions of immediacy that favour innovations; conditions of distance are also relevant for language change, albeit in different ways (Winter-Froemel and Octavio de Toledo y Huerta, 2023: 163).

35 Görlach (2004: 106) proposes a complex componential analysis of text types, but notes that “as realizations of text types, individual texts conform with the emic type to a greater or lesser degree, according to the writer’s awareness of the conventions and his linguistic/stylistic competence”.

36 This suggests that more studies using multivariate analysis are required to consider the interaction between different factors and the relative importance of genre compared with, say, the individual author or, indeed, linguistic features.

References

Adam, J.-M. (1999). Linguistique textuelle. Des genres de discours aux textes. Paris: Nathan.Google Scholar
Adam, J.-M. (2005). La Linguistique textuelle. Introduction à l’analyse textuelle du discours. Paris: Colin.Google Scholar
Adam, J.-M. (2011). Genres de récits. Narrativité et généricité des textes. Louvain-la-Neuve: Academia.Google Scholar
Amatuzzi, A., Ayres-Bennett, W., Gerstenberg, A., Skupien Dekens, C. and Schøsler, L. (2020). Changement linguistique et périodisation du français (pré)classique: deux études de cas à partir des corpus du RCFC. Journal of French Language Studies, 30: 301326. https://doi.org/10.1017/S0959269520000058 Google Scholar
Ayres-Bennett, W. (2004). Sociolinguistic Variation in Seventeenth-Century France: Methodology and Case Studies. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
Ayres-Bennett, W. (2018). Historical sociolinguistics and tracking language change: Sources, text types and genres. In Ayres-Bennett, W. and Carruthers, J. (eds), Manual of Romance Sociolinguistics. Berlin: Walter de Gruyter, pp. 253279.Google Scholar
Ayres-Bennett, W. (forthcoming). Les Nouvelles Remarques de M. de Vaugelas sur la langue françoise (édit. L.-A. Alemand, 1690) comme source précieuse du français préclassique. In C. Skupien Dekens, S. Ortner and M. Schmerbec (eds), Le Français préclassique. Spécificités et tendances. Paris: Classiques Garnier.Google Scholar
Ayres-Bennett, W. and Seijido, M. (2011). Remarques et observations sur la langue française. Histoire et évolution d’un genre. Paris: Classiques Garnier.Google Scholar
Balon, L. and Larrivée, P. (2016). L’Ancien Français n’est déjà plus une langue à sujet nul – nouveau témoignage des textes légaux. Journal of French Language Studies, 26(2): 221237. https://doi.org/10.1017/S0959269514000222 Google Scholar
Bergeron-Maguire, M., Dourdy, L.-M. and Thiriet, J. (2024). Ordinary letters, extraordinary findings: 17th- and 18th-century French and the MACINTOSH project (Missing hAlf the picture, ClassIcal NoT sO clasSical FrencH). Lingvisticae Investigationes, 47(1): 121142.Google Scholar
Biber, D. (1989). Variation Across Speech and Writing. Cambridge: Cambridge University Press.Google Scholar
Biber, D. (1995). Dimensions of Register Variation: A Cross-Linguistic Comparison. Cambridge: Cambridge University Press.Google Scholar
Biber, D. and Conrad, S. (2019). Register, Genre and Style, 2nd edn. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
Carruthers, J. (2018). Oral genres: Concepts and complexities. In Ayres-Bennett, W. and Carruthers, J. (eds), Manual of Romance Sociolinguistics. Berlin: Walter de Gruyter, pp. 335361.Google Scholar
Carruthers, J. (2024). Spoken French. In Ayres-Bennett, W. and McLaughlin, M. (eds), The Oxford Handbook of the French Language. Oxford: Oxford University Press, pp. 356376.Google Scholar
Cheshire, J. and Gardner-Chloros, P. (2018). Multicultural youth vernaculars in Paris and urban France = Journal of French Language Studies, 28(2).
Denoyelle, C., Kraif, O., Mounier, P., Renwick, A., Sorba, J. and Souvay, G. (2024). Le Corpus PhraséoRoChe. Les Défis de l’établissement des textes et de l’hétérogénéité des états de la langue. Corpus, 25. https://doi.org/10.4000/corpus.8501
Elspaß, S. (2005). Sprachgeschichte von unten. Untersuchungen zum geschriebenen Alltagsdeutsch im 19. Jahrhundert. Tübingen: Niemeyer.
Elspaß, S. (2020). Language standardization in a view ‘from below’. In Ayres-Bennett, W. and Bellamy, J. (eds), The Cambridge Handbook of Language Standardization. Cambridge: Cambridge University Press, pp. 93–114.
Elspaß, S., Langer, N., Scharloth, J. and Vandenbussche, W. (eds) (2007). Germanic language histories “from below” (1700–2000). Berlin/Boston: De Gruyter.
Galet, Y. (1971). L’Évolution de l’ordre des mots dans la phrase française de 1600 à 1700. Paris: Presses universitaires de France.
Galleron, I., Idmhand, F., Lavrentiev, A., Demonet, M.-L. and Réach-Ngô, A. (2021). Décrire les textes dans le cadre d’une édition numérique. Le Thésaurus “Typologie textuelle” du consortium CAHIER. https://shs.hal.science/halshs-03402679
Galleron, I., Idmhand, F., Lavrentiev, A., Demonet, M.-L. and Réach-Ngô, A. (2022). Décrire un corpus d’auteurs. In I. Galleron and F. Idmhand (eds), Dix ans de corpus d’auteurs. Paris: Éditions des archives contemporaines, pp. 31–46. https://doi.org/10.17184/eac.5806
Görlach, M. (2004). Text Types and the History of English. Berlin/New York: Mouton de Gruyter.
Goux, M. (2024). De très grands corpus pour l’étude diachronique du français: annotations, informations métalinguistiques et paratextes. Humanités numériques, 9. https://doi.org/10.4000/11wmv
Goux, M. and Larrivée, P. (2020). Expression et position du sujet en ancien français: le rôle de la personne pronominale. SHS Web of Conferences, 78 = 7e Congrès mondial de linguistique française. https://doi.org/10.1051/shsconf/20207803002
Guillot, C. and Lavrentiev, A. (2009). Manuel de description des textes pour la Base de français médiéval. Version 2.3. Lyon: ENS de Lyon. http://ccfm.ens-lyon.fr/IMG/pdf/Manuel_Descripteurs_BFM.pdf
Guillot, C., Heiden, S. and Lavrentiev, A. (2017). Base de français médiéval. Une base de référence de sources médiévales ouverte et libre au service de la communauté scientifique. Diachroniques, 7: 168–184.
Koch, P. (1997). Diskurstraditionen: zu ihrem sprachtheoretischen Status und ihrer Dynamik. In B. Frank, T. Haye and D. Tophinke (eds), Gattungen mittelalterlicher Schriftlichkeit. Tübingen: Gunter Narr, pp. 43–73.
Koch, P. and Oesterreicher, W. (2001). Gesprochene Sprache und geschriebene Sprache/Langage parlé et langage écrit. In G. Holtus, M. Metzeltin and C. Schmitt (eds), Lexikon der Romanistischen Linguistik, vol. I,2: Methodologie (Sprache in der Gesellschaft/Sprache und Klassifikation/Datensammlung und -verarbeitung). Tübingen: Niemeyer, pp. 584–627.
Koch, P. and Oesterreicher, W. (2011). Gesprochene Sprache in der Romania: Französisch, Italienisch, Spanisch, 2nd edn. Berlin/New York: Walter de Gruyter.
Labov, W. (1994). Principles of Linguistic Change. Volume 1: Internal Factors. Oxford/Cambridge, MA: Blackwell.
Landragin, F. (2021). Le corpus Democrat et son exploitation. Présentation. Langages, 224(4): 124. https://doi.org//10.3917/lang.224.0011
Larrivée, P. and Goux, M. (2024). The evolution of bare nouns in the history of French: The view from calibrated corpora. Journal of French Language Studies, 34(3): 323–350. https://doi.org/10.1017/S0959269524000061
Lavrentiev, A., Guillot, C. and Heiden, S. (2021). Enjeux philologiques, linguistiques et informatiques de la philologie numérique: l’exemple de la segmentation des mots. Diachroniques, 8: 77–102.
Lodge, R. A. (1991). Molière’s peasants and the norms of spoken French. Neuphilologische Mitteilungen, 42: 485–499.
Malrieu, D. and Rastier, F. (2011). Genres et variations morphosyntaxiques. Traitement automatique des langues, 42(2): 548–577.
Marchello-Nizia, C. (2012). L’Oral représenté. Un accès construit à une face cachée des langues “mortes”. In Guillot, C., Combettes, B., Lavrentiev, A., Oppermann-Marsaux, E. and Prévost, S. (eds), Le Changement en français. Études de linguistique diachronique. Bern: Peter Lang, pp. 247–264.
Marchello-Nizia, C. (2014). L’Importance spécifique de l’“oral représenté” pour la linguistique diachronique. In Ayres-Bennett, W. and Rainsford, T. (eds), L’Histoire du français. État des lieux et perspectives. Paris: Classiques Garnier, pp. 161–174.
Marchello-Nizia, C., Combettes, B., Prévost, S. and Scheer, T. (eds) (2020). Grande grammaire historique du français. Berlin: De Gruyter.
Mazziotta, N. and Glikman, J. (2019). Oral représenté et narration en ancien français. Spécificités syntaxiques dans trois textes de genres distincts. Linx, 78. https://doi.org/10.4000/linx.3151
Nevalainen, T. (2006). Historical sociolinguistics and language change. In van Kemenade, A. and Los, B. (eds), The Handbook of the History of English. Malden, MA: Blackwell, pp. 558–588.
Nevalainen, T. and Raumolin-Brunberg, H. (2012). Historical sociolinguistics: Origins, motivations, and paradigms. In Hernández-Campoy, J. M. and Conde-Silvestre, J. C. (eds), The Handbook of Historical Sociolinguistics. Chichester: Wiley-Blackwell, pp. 22–40.
Nevalainen, T., Raumolin-Brunberg, H. and Mannila, H. (2011). The diffusion of language change in real time: Progressive and conservative individuals and the time depth of change. Language Variation and Change, 23(1): 1–43.
Olivier, M., Sevdali, C. and Folli, R. (2023). Clitic climbing and restructuring in the history of French. Glossa: A Journal of General Linguistics, 8(1). https://doi.org/10.16995/glossa.10135
Piccione, M. and Rainsford, T. (2024). New insights into the typology of motion in the history of French: Evidence from the manner verb lexicon. In SHS Web of Conferences, 191 = 9e Congrès mondial de linguistique française. https://doi.org/10.1051/shsconf/202419103005
Pope, M. K. (1952). From Latin to Modern French with Especial Consideration of Anglo-Norman, 2nd edn. Manchester: Manchester University Press.
Prévost, S. (2020). Une grammaire fondée sur un corpus numérique. In Marchello-Nizia, C., Combettes, B., Prévost, S. and Scheer, T. (eds), Grande grammaire historique du français. Berlin: De Gruyter, pp. 37–53.
Price, G. (1990). La Cantilène de sainte Eulalie et le problème du vers 15. In M.-P. Dion (ed.), “La Cantilène de sainte Eulalie”. Actes du colloque de Valenciennes, 21 mars 1989. Lille: ACCES; Valenciennes: Bibliothèque municipale, pp. 81–87.
Rastier, F. (2011). La Mesure et le Grain. Sémantique de corpus. Paris: H. Champion.
Rickard, P. (1989). A History of the French Language, 2nd edn. London: Unwin Hyman.
Rogers, E. M. (1962). Diffusion of Innovations. New York: Free Press of Glencoe.
Rutten, G. and van der Wal, M. J. (2014). Letters as Loot: A Sociolinguistic Approach to Seventeenth- and Eighteenth-Century Dutch. Amsterdam/Philadelphia: John Benjamins.
Sorba, J., Kraif, O., Renwick, A. and Denoyelle, C. (2024). La Constitution de corpus en diachronie longue. Corpus, 25. https://doi.org/10.4000/corpus.8274
Thomas, J. (2024). Register, genre, text type. In Ayres-Bennett, W. and McLaughlin, M. (eds), The Oxford Handbook of the French Language. Oxford: Oxford University Press, pp. 335–355.
Tristram, A. and Ayres-Bennett, W. (2012). From negation to agreement: Revisiting the problem of sources for socio-historical linguistics. Neuphilologische Mitteilungen, 113(3): 365–393.
Walkden, G. (2019). The many faces of uniformitarianism in linguistics. Glossa: A Journal of General Linguistics, 4(1): 52. https://doi.org/10.5334/gjgl.888
Winter-Froemel, E. and Octavio de Toledo y Huerta, Á. (eds) (2023). Manual of Discourse Traditions in Romance. Berlin/Boston: De Gruyter.
Figure 1. Hypothetical continuum of genres.

Figure 2. Trendlines for percentage use of clitic climbing by decade and genre.⁹

Figure 3. Percentage usage of singular and plural agreement with la plupart without a singular postmodifying NP (1500–1699)¹²

Figure 4. Percentage of plural usage by author’s date of birth and by genre

Table 1. Percentage of plural usage by author’s date of birth and by genre

Figure 5. Comparison of the genres in terms of their ‘progressivity’

Figure 6. Usage of en après and par après in Frantext by 50-year periods

Figure 7. Usage of par après by genre

Figure 8. Usage of en après by genre

Figure 9. Usage of ains (1600–1699)

Figure 10. Distribution of ains by genre (1600–1649)

Figure 11. Immédiat communicatif/distance communicative and code phonique/code graphique (Koch and Oesterreicher, 2001: 586). Key: (secteur) A: immédiat phonique (conversation spontanée, etc.); B: distance phonique; C: immédiat graphique; D: distance graphique; (a) conversation spontanée entre amis, (b) coup de téléphone, (c) lettre personnelle entre amis, (d) entretien professionnel, (e) interview de presse, (f) sermon, (g) conférence scientifique, (h) article de fond, (i) texte de loi