1 Introduction
Understanding and generating novel sentences is often considered the hallmark of human language. It is striking how, given a finite inventory of means, speakers can always process and produce new expressions through which they can describe a new trend, compose a poem, express emotions, or formulate scientific theories. In that sense, language is a remarkable example of human creativity. Theories of language refer to this phenomenon as the productivity (sometimes called creativity) of language, and they usually explain it by the fact that language processing is driven by a mechanism that composes the meanings of words into larger semantic units to create novel combinations.
Nevertheless, the influence of context on the interpretation or generation of a sentence is equally characteristic of language understanding. People have a great deal of previous linguistic and extralinguistic experience they use to build rich, elaborated representations of texts and conversations. In each utterance or discourse, each word acts as a cue to “activate” and retrieve related background knowledge, creating anticipations (or expectations) about sentence completion. In a general sense, this trait could be designated as the context-sensitivity of language: Sentence comprehension results from how preexisting linguistic knowledge and contextual constraints combine in the interpretation of a particular linguistic form.
The following examples illustrate these observations:
a. The child spilled the milk all over the floor.
b. The child spilled the rice all over the floor.
Both sentences are syntactically well-formed and generate meaningful semantic representations; that is, they are both semantically plausible. The first utterance includes nonnovel combinations of words: Milk is something we have often experienced being spilled, and, specifically, spill the milk is a chunk of words recurrent in texts. Usually, this kind of sentence is easy to process, for several reasons (the expression is stored in long-term memory, the words match semantic expectations, etc.). Conversely, the situation reflected in the second sentence is unexpected. A comprehender may never have directly experienced that scene or heard this sequence of words (which is quite rare). Therefore, interpreting this unusual utterance should rely on a specific process. While the classic mechanism proposed in the literature relies on a building-block strategy, where the final meaning is somehow built from single utterance components, an alternative hypothesis is to characterize this process as a generalization based on similar previous experiences, driven by other inference mechanisms (a co-activated network of representations, analogical inferences, and so forth). Despite the vast amount of research on sentence processing offered by the linguistic and psycholinguistic literature, it is still debatable how language users deal with the challenge of interpreting sentences presented in real time, incrementally, word by word. Accordingly, efforts are focused on formalizing a linguistic theory that can provide an adequate description and computational model of language processing.
This Element (1) proposes a review of how compositionality has been defined in general and how it has been concretely transposed into different formalisms, (2) summarizes several experimental works that demolish (or largely downsize) the role of compositionality in language processing, (3) introduces analogy as a mechanism that could be used to build meaning, and (4) defines the role of compositionality in a usage-based constructionist perspective. In all cases, the theoretical perspectives, the experimental data supporting these observations, and the potential applications to a computational model of language processing are presented.
From a broad perspective, the Element provides an overview of mechanisms proposed to explain language comprehension and their potential integration into a comprehensive theory of language processing. Before delving into specific details, the next section will introduce the two concurrent general mechanisms that linguistic theories have posited as the driving force of comprehension.
1.1 The Dual-Route Access to Meaning
The traditional view of language understanding relies on compositionality: The meaning of a sentence (or a discourse) is a function of the meaning of its constituents and the way they are syntactically combined (Partee, Reference Partee, Gleitman and Liberman1995). More precisely, the associated theoretical strategy sustained in the generative tradition (Chomsky, Reference Chomsky1965; Pinker, Reference Pinker1999) is that syntax provides the primary mode of meaning combination in language: Syntactic rules do no more than determine which symbol sequences work as units for syntactic purposes, while meaning derives from the lexical conceptual structure. In other words, reading a sentence consists of linearly accessing the meaning of the words stored in the lexicon and then integrating them within the abstract hierarchical structure. This stance encourages a bottom-up, or building block, model of meaning where the interpretation mechanism is incremental: The meanings of the words are composed into syntactic units and aggregated until reaching a complete interpretation.
Nevertheless, there is extensive experimental evidence from psycholinguistic and neurolinguistic research against traditional compositionality (Baggio, Reference Baggio2021; Baggio, Van Lambalgen, & Hagoort, Reference Baggio, Van Lambalgen, Hagoort, Hinzen, Werning and Machery2012; Ferreira, Bailey, & Ferraro, Reference Ferreira, Bailey and Ferraro2002; Ferreira & Patson, Reference Ferreira and Patson2007; Mollica et al., Reference Mollica, Siegelman and Diachek2020, among others). These findings can be synthesized into two fundamental observations (Baggio, Reference Baggio2018, p. 19):
A comprehender generates interpretations based on the semantic relations between words, not necessarily encoded or reflected by the grammar, and
A comprehender tends to generate semantic representations that could bypass or collide with syntactic analyses, resulting in superficial and even inaccurate interpretations.
These observations argue for a more top-down model, in which our semantic expectations drive comprehension: The linguistic elements function as cues to activate our extensive linguistic knowledge stored in long-term memory. This knowledge encompasses various facets, such as the frequency of use of certain expressions, common associations, and event schemas. Consequently, linguistic knowledge transcends mere lexical components and becomes the “cognitive organization of one’s experience with language” (Bybee, Reference Bybee2006, p. 771). Moreover, experimental studies have revealed a consistent pattern: When the comprehender accesses information congruent with their preexisting or pre-activated knowledge, facilitation effects in processing occur (faster reading times, a reduced N400, etc.). Linguistic theories rooted in these assumptions endorse a noncompositional mechanism of meaning interpretation. They advocate a model of language where linguistic knowledge comes from direct linguistic experience and sentence processing is constraint-based, probabilistic, and reliant on expectations.
Although the ongoing debate has opened the possibility that these two strategies (compositional and noncompositional access to meaning) are not mutually incompatible, few linguistic theories have thus far bothered to elucidate how the two mechanisms could be integrated within a unified framework of sentence processing. Notably, theories within the fields of psycholinguistics, theoretical linguistics, and computational linguistics have offered different perspectives on uncovering the cognitive systems behind language, describing their characteristics, and modeling them. On one end, experimental results from different research areas have shed light on the mechanisms that could underlie human language comprehension. However, our knowledge about the language system remains scattered: Psycholinguistic studies usually focus on language processing subtasks (e.g., lexical access) or modules (e.g., morphology, syntax) without being aggregated into a unified framework. Conversely, linguistic theories provide a rigorous formalism for language description, but they still have difficulty integrating the variability of language productions observed in behavioral experiments. To this day, it remains challenging to find in the existing literature a comprehensive model that unifies the different observations on language understanding into a unique architecture (Blache, Reference Blache, Sharp, Sèdes and Lubaszewski2017). However, there is a family of linguistic theories whose fundamental assumption could make it possible to integrate findings from multiple research fields.
1.2 A Constructionist View of Language
The behavioral evidence surrounding noncompositionality points to a model of linguistic representation that is in line with the assumptions of the usage-based models of language (Bybee, Reference Bybee2010; Croft, Reference Croft1991, Reference Croft2001; Langacker, Reference Langacker1987; Tomasello, Reference Tomasello2009) and the Construction Grammar (CxG) paradigm (Hilpert, Reference Hilpert2019; Hoffmann, Reference Hoffmann2022b; Hoffmann & Trousdale, Reference Hoffmann and Trousdale2013; Ungerer & Hartmann, Reference Ungerer and Hartmann2023). CxG refers to a family of models based on the assumption that grammar is more than simply a formal system consisting of stable but arbitrary rules for defining well-formed sequences. Besides their specificities, all constructionist theories agree on a fundamental claim: Grammar consists of meaningful and symbolic form–meaning mappings, called constructions (Goldberg, Reference Goldberg1995, Reference Goldberg2006, Reference Goldberg2019). The definition and operationalization of “construction” are still under debate, and each formalism proposes a slightly different criterion (see Ungerer & Hartmann, Reference Ungerer and Hartmann2023 for a recent discussion on the definition of constructions). In the more general sense, constructions are processing units or chunks, from morphemes or words, to partially and fully lexicalized expressions, to schematic and productive patterns of language – such as Passive or Ditransitive constructions (Goldberg, Reference Goldberg2003), up to even genres and text types (Hoffman & Bergs, Reference Hoffman and Bergs2018). For instance, the comparative correlative construction (Hoffmann, Brunner, & Horsch, Reference Hoffmann, Brunner and Horsch2020; also “Covariational Conditional”; cf. Culicover & Jackendoff, Reference Culicover and Jackendoff1999) “The Xer, the Yer,” such as
(2) The more you know, the less you understand.
has specific syntactic and semantic properties. First, both clauses are introduced by an element that “resembles” the English definite article the but, unlike the article, is followed by a comparative phrase. Semantically, English speakers identify the cause–effect relationship between the two clauses, which is not marked at any syntactic level. Thus, neither the syntax nor the meaning of the construction is entirely predictable by any abstract rule.
At the same time, a syntactic pattern like the Double Object construction has a meaning independent of the words that compose the construction. People reading a sentence like
(3) She mooped him something.
interpret the made-up word moop as “to give” because the abstract pattern V Obj Obj itself communicates the concept of transfer between two persons (Goldberg, Reference Goldberg2019, p. 29).
Constructions form a structured inventory of a speaker’s knowledge of the conventions of their language, called the constructicon (Diessel, Reference Diessel2023): Each construction constitutes a node in the taxonomic network of constructions, and taxonomic relations allow us to distinguish different types of grammatical knowledge. However, there is no complete agreement about how such taxonomies emerge. Formal models such as Sign-Based CxG (Sag, Reference Sag, Boas and Sag2012) assume that only “idiosyncratic morphological, syntactic, lexical, semantic, pragmatic or discourse-functional properties must be represented as an independent node in the constructional network in order to capture a speaker’s knowledge of their language” (Croft & Cruse, Reference Croft and Cruse2004, p. 263). Conversely, usage-based approaches – that is, Radical CxG (Croft, Reference Croft2001) or Cognitive CxG (Goldberg, Reference Goldberg2003) – advocate that constructions can be any linguistic pattern used enough to be memorized (or entrenched; cf. Blumenthal-Dramé, Reference Blumenthal-Dramé2012) in long-term memory (Goldberg, Reference Goldberg2006). Specifically, the assumption is that linguistic units that are more frequently encountered become more accessible and are preferred. According to this thesis, the most entrenched linguistic units tend to shape the language system in terms of patterns of use, at the expense of less frequent and thus less well-entrenched words or phrases. This account opens up a more redundant view of the lexicon: Although we do not technically need to memorize the word form cats because, in principle, it can be formed with a productive rule cat+s, it could be memorized in the lexicon because we have encountered it thousands of times in everyday language (Hilpert, Reference Hilpert2021, p. 21). The same observation can be made for phrases. For instance, read a book is a semantically transparent chunk; that is, we can identify the meanings of its parts and how they were combined to generate the final interpretation. Even though we could build the meaning of the expression “on the fly” using a compositional, incremental mechanism, this sequence has been heard and used so many times that it is stored as a whole; thus, interpretation becomes the act of retrieving the stored meaning. Using Goldberg’s words, “memory is cheap. There is a good deal of evidence that we retain an enormous amount of information about the language(s) we witness” (Goldberg, Reference Goldberg2019, p. 54).
However, that does not mean that people retain frequently observed word combinations as atomic units (as this would quickly result in a combinatorial explosion); rather, memory traces have an internal structure. Thus, representations of related memories overlap neurally, mitigating the concern about a combinatorial explosion (Goldberg, Reference Goldberg2019). Moreover, the cognitive capacity for pattern detection and schematization (Bybee, Reference Bybee2010) allows the storage of more abstract constructions derived from specific instances. Going back to the previous example, speakers redundantly store frequent plural forms in addition to a general plural construction, or an entire expression together with the abstract transitive pattern.
Placing constructions as the fundamental unit of language has the consequence of blurring the distinction between words of the lexicon and the rules of grammar. Contrary to generative theories, CxG argues that the architecture of the grammar is not layered in distinct modules, but different properties (morphological, prosodic, syntactic, semantic) together constitute the form that allows the construction to be identified, and when it is recognized, it is possible to access the associated meaning directly. This holistic view emphasizes the importance of surface structure, that is, the concrete utterances that a hearer is exposed to, as opposed to mainstream generative grammar, which primarily focuses on hidden syntactic processes not directly observable in the final output (Goldberg, Reference Goldberg, Hoffmann and Trousdale2013). As a joint representation of syntax and semantics, constructions provide a powerful mechanism for investigating many different linguistic phenomena (Diessel, Reference Diessel2019).
Despite the vast possibilities this framework offers for linguistic description and language modeling, some issues are still to be addressed. For instance, while we agree with the assumption that the lexicon is a repository of constructions, it is unclear which factors drive the memorization of specific chunks. The question about which constructions are stored in long-term memory and which aspects can be constructed online in working memory is yet to be fully answered and has consequences on the mechanisms governing sentence processing. In that regard, one more issue has to be figured out: What is the most appropriate and acceptable representation of constructional meaning? Toward a complete model of language comprehension, a further challenge is to give a semantic representation that could be coherent with the usage-based perspective and could account for the evidence that lexical knowledge is quite detailed, often idiosyncratic and verb specific, and often accessible at the earliest possible stage in sentence processing. Finally, while the majority of approaches have focused on grammatical description, only a few efforts have been carried out to operationalize CxG as a computational model, with the exception of Fluid CxG (Steels, Reference Steels and Steels2011) and Embodied CxG (Bergen & Chang, Reference Bergen, Chang, Hoffmann and Trousdale2013).
To conclude, the core of this Element is investigating the relationship between constructions and compositionality, as the title of this work suggests. Indeed, constructionist approaches do not refute compositionality per se, but reformulate this paradigm in terms of the combination of constructions (“weak compositionality”; cf. Michel, Reference Michel2023, p. 566):
By recognizing the existence of contentful constructions we can save the compositionality in a weakened form: the meaning of an expression is the result of integrating the meaning of the lexical items into the meanings of constructions.
Consequently, the CxG framework turns out to be the best way to unify the different compositional and noncompositional mechanisms observed in language due to its key assumptions. Nevertheless, there is no consensus regarding the manner in which constructions interact with each other to license a specific utterance (Boas, Reference Boas, John and Xu2021, p. 64). Overall, this Element is focused on reviewing the different insights about language processing, which aspects are already depicted in CxG, and which ones still need to be addressed by future research.
Adapted from Goldberg (Reference Goldberg, Hoffmann and Trousdale2013, p. 15–16)
1. Language as language use. Our linguistic knowledge comes from linguistic experience: our lexicon and grammar are shaped by repeated exposure to specific utterances.
2. Constructions are the fundamental units of language. Constructions are conventionalized associations of a form and a function, which apply not only to words but also to syntactic structures, thus guaranteeing a certain uniformity of representation of linguistic facts.
3. The importance of the surface structure. Meaning is directly associated with surface structure, without derivations or transformations.
4. The construct-i-con. Grammar is a network of constructions, hierarchically organized through inheritance relations.
5. There is a continuum from what is stored to what is processed. There is no dichotomy between interpreting stored linguistic units and assembling expressions “on the fly”; there is just a continuum from stored items, through highly predictable sequences, to completely compositional ones.
6. Meaning emerges through context. The meaning of a construction is inherently rooted in its contexts of use.
1.3 Compositionality, Productivity, Creativity
As introduced in the first paragraph of this Element, the surprising fact about language is that people can constantly generate (and understand) never-ever-produced utterances. Chomsky defined this property as the “creative aspect” of language: “[A]n essential property of language is that it provides the means for expressing indefinitely many thoughts and for reacting appropriately in an indefinite range of new situations” (Chomsky, Reference Chomsky1965, p. 6). More recently, Adger affirmed:
The fact that sentences hardly reoccur shows us that we use our language in an incredibly rich, flexible, and creative way. Virtually every sentence we utter is novel. New to ourselves, and quite often new to humanity. We come up with phrases and sentences as we need to, and we make them express what we need to express. We do this with incredible ease. We don’t think about it, we just do it. We create language throughout our lives, and respond creatively to the language of others.
Chomsky and the generative tradition thus seem to suggest that linguistic creativity is “combinatorial” and “productive” (Bergs, Reference Bergs2018, p. 278): It involves creating something entirely new using existing rules in almost infinite ways. The success of the principle of compositionality thus relies on its ability to explain the most attractive property of language: creativity. However, before introducing compositionality in the next section, it is essential to step back and understand how linguistic creativity is defined (especially in the realm of CxG). Let us start with some creative expressions:
a. She smiled him in the door (Goldberg, Reference Goldberg2019, p. 61).
b. The mother of all battles (Hartmann & Ungerer, Reference Hartmann and Ungerer2023, p. 5).
c. Messi is the Mozart of football (Hoffmann, Reference Hoffmann2019, p. 5).
d. Weapons of mass distraction (Giora et al., Reference Giora, Fein and Kronrod2004).
These expressions are likely to be unfamiliar to most readers and, therefore, by the previous definition, can be considered creative utterances. However, according to Sampson (Reference Sampson and Hinton2016), there are two distinct types of creativity: F-creativity (fixed creativity), which produces examples drawn from a predetermined and established inventory, and E-creativity (extending/enlarging creativity), which goes beyond the system rules. According to this dichotomy, many linguistic phenomena traditionally assumed to be “creative” are, in fact, examples of F-creativity, as new sentences are the result of grammatical rules (Hoffmann, Reference Hoffmann2018).
The term productivity is used in linguistics to refer to the “original use of established possibilities of the language” (Leech, Reference Leech2014, p. 24). For instance, syntactic productivity concerns “the range of lexical items that may fill the slots of constructions” (Perek, Reference Perek2016, p. 66). In accordance with CxG’s assumptions, one uses and extends preexisting constructions to generate novel utterances. This is exemplified in (4a), where the use of a typically intransitive verb (e.g., smiled) as transitive in a caused-motion construction forces a creative new meaning, such as “She caused him to move in the door by smiling.” Mismatches between the typical environments in which a verb is used and its occurrence in a new and creative way are widely discussed as valency coercion (Goldberg, Reference Goldberg1995). Several studies in CxG have investigated this constructional productivity, and in particular Goldberg (Reference Goldberg2019) offers an extensive review focusing on explaining “the partial productivity of grammatical constructions.”
Even though many new expressions arise from productivity (or F-creativity), the question of “how do speakers use their grammar to create E-creative utterances” remains a topic of debate (Hoffmann, Reference Hoffmann2022a, p. 280). According to Bergs (Reference Bergs2018), one source of E-creativity lies in the “intentional manipulation of linguistic structure” (p. 281), usually exemplified by linguistic extravagance, that is, talking in such a way that you are noticed (Haspelmath, Reference Haspelmath1999; Ungerer & Hartmann, Reference Ungerer and Hartmann2020). The use of formulaic patterns drawn from a fixed template, as in (4b) (namely, snowclones; cf. Hartmann & Ungerer, Reference Hartmann and Ungerer2023), can be considered creative. They represent an interesting case because, even if they transmit a hyperbolic meaning fulfilling a specific pragmatic function, they still derive from a partially fixed construction. As such, these expressions illustrate the complex interplay between creativity and productivity (Ungerer & Hartmann, Reference Ungerer and Hartmann2023).
Other examples of properly creative constructs are metaphorical expressions, like the one in (4c), which are governed by the general cognitive process of Conceptual Blending (Fauconnier & Turner, Reference Fauconnier and Turner2002; Turner, Reference Turner2018). This mental operation constructs a partial match between two input mental spaces (FOOTBALL and CLASSICAL MUSIC, in this example) and selectively projects from those inputs into a novel “blended” mental space, resulting in a new meaning (Messi is a genius on the football pitch, just as Mozart was a musical genius; cf. Hoffman, Reference Hoffmann2019). However, even apparently rule-breaking phenomena like the production of a novel metaphor rely on established patterns (i.e., the entrenched construction X-is-the-Y-of-Z; cf. Fauconnier & Turner, Reference Fauconnier and Turner2002) and on established mechanisms. As Bergs and Kompa (Reference Bergs and Kompa2020, p. 14) observe: “Still, even the most creative metaphor has to use established means (analogy) and comply with most of the rules governing language use and linguistic interaction. Thus, metaphors are actually also examples of F-creativity in the widest sense; they do not expand the rules of language as such.”
Other research domains propose alternative models for linguistic creativity. One such theory is the Optimal Innovation hypothesis, which posits that the aesthetics of creative productions are best explained by variations of familiar material (Giora et al., Reference Giora, Fein and Kronrod2004). According to this theory, specific minimal modifications of familiar expressions can be more pleasurable than entirely novel creations. For instance, the neologism in (4d) is optimally innovative because it induces a novel response while enabling the retrieval of a salient stimulus, the familiar expression “weapons of mass destruction.” The question is: Are utterances of this type examples of F-creativity (as they relate to the familiar and use a specific mechanism) or of pure E-creativity?
Despite the considerable research on linguistic creativity, the examples above reveal that a consensus on what constitutes a creative expression has yet to be reached. As Maybin (Reference Maybin and Jones2015, p. 34) stated: “While everyday language creativity is now an established area of ongoing linguistic research, there is a continuing lack of clear agreement about the precise definition and scope of creativity itself.” Generally, the complex relation between productivity and creativity is far from being settled: Given that language is a complex system, it is challenging to identify expressions entirely unconstrained by any rules (“All use of natural human language ultimately is F-creative”; cf. Bergs & Kompa, Reference Bergs and Kompa2020, p. 18). In a broader sense, creativity can be viewed as a gradient phenomenon ranging from systematic productivity to extravagant stimuli that generalize from existing schemata. This Element focuses more on the F-creativity aspect of language, a “constrained” form of creativity (Goldberg, Reference Goldberg2019), examining how the generation and comprehension of new expressions relate to the familiar and which mechanisms, apart from compositionality, we can exploit to generate novel (but not necessarily creative) utterances.
1.4 Roadmap
What is compositionality’s role in today’s models of language and sentence processing? How do we process both familiar and novel expressions? How can observations from experimental data be transposed into a formal theory of language representation and processing? This Element connects various linguistic theories and behavioral observations about processes governing semantic interpretation, arguing that CxG provides a more suitable linguistic formalism to explain language comprehension.
This Element is organized as follows. First, Section 2 introduces one of the two protagonists of the title: compositionality. Specifically, it discusses the notion of Fregean compositionality, traditionally believed to be the sole explanation for our ability to understand and create new sentences, and illustrates how this principle has been used to describe the mechanism of meaning composition in traditional formalisms, constructionist approaches, and distributional models of meaning. Complementarily, Section 3 examines studies in psycholinguistics and neurolinguistics that challenge this traditional view. The behavioral outcomes suggest a model of linguistic representation consistent with usage-based constructionist approaches, blurring the distinction between stored and nonstored sequences and between productive and nonproductive patterns. Furthermore, Section 4 introduces the main claim of this Element: Systematic processes of language productivity are mainly explainable by analogical inferences rather than sequential compositional operations. Novel expressions are produced and understood “on the fly” by analogy with familiar ones. The section delves into the characteristics of analogical reasoning and explores the nature of linguistic analogy to support the proposal that analogical processing forms the basis of the human capability to generate new utterances. Finally, Section 5 reconsiders the role of compositionality as a property of natural language and as the sole mechanism of sentence comprehension, suggesting that compositionality is only one of the possible explanations for the human ability to comprehend and produce an endless number of novel utterances.
In the end, readers will realize the complexity of rethinking a linguistic theory that formalizes the coexistence of different mechanisms to interpret any expression, from the most common to the never-encountered-before ones.
2 The Problem of Compositionality as a Processing Principle
One common remark about human thought and language is their outstanding expressive power to assemble meaningful parts into endlessly novel configurations. As observed in everyday language, we have a potentially open-ended capacity to produce and understand novel meaningful sentences we have never heard before. For instance, let us consider the following sentence.
(5) Purple cats are fluffy.
Any English speaker could understand this sentence, even if it sounds odd and was plausibly never encountered before: This is because comprehenders know the meanings of purple, cats, and fluffy and how to construct the meaning of a novel sentence from the meanings of its parts. By combining morphemes into words, words into phrases, and phrases into sentences, natural language is exceptionally productive and expressive.
If it is possible to easily comprehend the meaning of a new sentence, there must be a systematic procedure for determining that meaning. Crucially, a fundamental question that any theory of language should address is: How do people glean meaning from language? (Goldberg, Reference Goldberg and Riemer2015). In other words, it should propose a hypothesis about the mechanism that enables the construction of meaning from smaller units of meaning. Classical theories posit the existence of a compositionality principle, which stipulates that the meaning of a complex expression is a function of the meanings of its parts. The following pages summarize what the Principle of Compositionality is, the primary arguments supporting it, and its formalization within linguistic theory.
2.1 The Principle of Compositionality: Definitions and Main Assumptions
The traditional presumption in philosophy and linguistics is that language and thought are compositional (Martin & Baggio, Reference Martin and Baggio2020): The meaning of a complex expression is entirely determined by its structure and the meanings of its constituents – once we specify what the parts mean and how they are put together, there is no more leeway regarding the meaning of the whole. This view is referred to as the Principle of Compositionality, also called Frege’s Principle by the name of Gottlob Frege, credited with having first formulated this notion – although there are problems with this attribution (Pelletier, Reference Pelletier1994, p. 24). The principle of compositionality was first introduced as a constraint on the relation between the syntax and the semantics of languages, and it was later postulated as an adequacy condition for other representational systems such as structures of mental concepts (Hinzen, Werning, & Machery, Reference Hinzen, Werning, Machery, Hinzen, Werning and Machery2012a).
Broadly, the principle of compositionality is typically defined as follows:
The meaning of a whole is a function of the meanings of the parts and of the way they are syntactically combined.
The meaning of a complex expression is determined by the meanings of its constituents and by its structure.
These definitions are just two of the most cited, although several variants have been formulated (Hinzen, Werning, & Machery, Reference Hinzen, Werning and Machery2012b). The common aspect of both versions is that the notions of content and structure are admittedly vague: The nature of the principle can be made precise only with an explicit theory of meaning and syntax, together with a full specification of what is required by the relation “is a function of” (Pelletier, Reference Pelletier and Aronoff2016). In this sense, compositionality is highly theory-dependent (Partee, Reference Partee2004, p. 154).
Linguistic theories adopting the compositional hypothesis diverge on several points depending on the theoretical assumptions regarding the representation of words’ meanings (i.e., the building blocks of the sentence), the syntactic rules governing sentence structure, and the processes implicated in meaning construction. Among others, they diverge on whether syntactic analysis precedes semantic analysis and supplies its output to it, or whether syntactic and semantic analyses are combined (as exemplified in the Head-driven Phrase Structure Grammar, or HPSG; Sag & Pollard, Reference Sag and Pollard1994), with syntactic rules integrating the semantic information obtained from the elements they combine. Additionally, these theories differ in the form of the semantic output: a logical formula in first-order logic, a lambda expression, or a typed-feature representation, inter alia.
Although several variants of the principle have been formulated, the core idea is that the principle of compositionality advocates for a bottom-up, or building block model of meaning: The meaning of the whole expression is incrementally built from the meanings of its constituent parts (Goldberg, Reference Goldberg and Riemer2015). The principle also entails a modular vision of the interpretation process; that is, there is a clear separation between syntax and semantics. This so-called division of labor works in most formal approaches in the following way: Lexical semantics represents the meanings of words, and syntax governs the combination of words into larger units of meaning and ascribes the relationships between words within these larger units.
Furthermore, the principle presupposes the concepts of localism and incrementality. In the first place, localism pertains to whether compositional operations are local or global in nature. As outlined by Pagin and Westerståhl (Reference Pagin and Westerståhl2010a), the meaning of a complex term can be derived from (1) the meaning of its immediate “children” within the syntactic structure (considered in a tree-like fashion), regardless of the process by which their meaning was built up (strong compositionality), or (2) from its total (global) structure and the meanings of its constituent atomic parts (weak compositionality). In the latter interpretation, complex terms may exhibit different meanings depending on the larger expression of which they are part. Hupkes et al. (Reference Hupkes, Dankers, Mul and Bruni2020) exemplified the problem in arithmetic terms: The outcome of 14 - (2 + 3) does not change when the subsequence (2 + 3) is replaced by 5, a sequence with the same (local) meaning but a different structure (strong version). However, the strong hypothesis is controversial in natural language, especially in the case of disambiguation of a phrase or word in context. Conversely, incrementality assumes that the interpretation process rigidly follows the same order in which the constituents are combined to form complex expressions, step by step, in a deterministic fashion. At each step, the interpretative operation builds the semantic value of the current node and makes it available for further steps. Importantly, once a semantic value has been ascribed to a particular utterance, it cannot be changed. These two properties introduce a perspective on meaning composition that presents certain challenges. First, localism entails that the assignment of a semantic value to a node must not rely on external factors beyond the current segment of the sentence under analysis. Second, incrementality asserts that once a semantic representation has been assigned, the meaning remains unchanged, regardless of any subsequent constituents, whether they be phrases, sentences, or discourse (Gayral, Kayser, & Lévy, Reference Gayral, Kayser, Lévy, Machery, Werning and Schurz2005). Nonetheless, as will be argued in the subsequent section, the context of use assumes a pivotal role in human interpretation, challenging a strong and incremental view of compositionality.
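Returning to the arithmetic illustration, the strong (local) reading can be made concrete with a small sketch (in Python, purely for illustration; the encoding of expressions as nested tuples is an assumption of this sketch): the value of a node is computed only from the values of its immediate children, so substituting a subexpression with anything of equal local value cannot change the value of the whole.

```python
# A minimal sketch of strong (local) compositionality, using the arithmetic
# example from Hupkes et al. (2020) discussed above. Expressions are trees:
# either a number (atomic part) or a tuple (operator, left, right).

def evaluate(expr):
    """Compute a node's value using only the values of its immediate children."""
    if isinstance(expr, (int, float)):      # atomic constituent
        return expr
    op, left, right = expr                  # complex constituent
    l, r = evaluate(left), evaluate(right)  # local meanings of the children
    return l + r if op == "+" else l - r

# 14 - (2 + 3): the subexpression (2 + 3) has the local meaning 5.
original    = ("-", 14, ("+", 2, 3))
substituted = ("-", 14, 5)                  # same local meaning, different structure

assert evaluate(original) == evaluate(substituted) == 9
```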
The principle of compositionality has been compelling for many reasons, even if it has been widely criticized (in fact, approximately 318 arguments against it can be found in the literature, cf. Pelletier, Reference Pelletier1994). Without delving into specific details, the following section sketches the traditional arguments favoring compositionality together with their main criticisms (cf. Pagin & Westerståhl, Reference Pagin and Westerståhl2010b for a comprehensive review of arguments both in favor of and in opposition to compositionality).
2.1.1 The Arguments of Compositionality
The standard arguments in favor of the principle originate from supposed “facts” about language and are used as justifications for the necessity of compositionality (Baggio, Reference Baggio2021, p. 4). In the following boxtext we report the main arguments, as summarized by Goldberg (Reference Goldberg and Riemer2015).
Derived from Goldberg (Reference Goldberg and Riemer2015) and influenced by Dowty (Reference Dowty, Barker and Jacobson2007, p. 3–4).
a. Speakers produce and listeners parse sentences that they have never spoken nor heard before.
b. Speakers and listeners generally agree upon the meanings of sentences.
c. Since there exists an infinite number of sentences, they cannot all be memorized.
d. There must be some procedure for determining meaning.
e. Sentences are generated by some grammar of the language.
f. The procedure for interpreting sentences must be determined, in some way or the other, by the syntactic structures generated by the grammar together with the words.
The first and most familiar argument in favor of compositionality is that it can explain our ability to produce and understand sentences we have never heard before (productivity, cf. points a., c., and d. in the box). The argument goes as follows. Since speakers are able to understand a sentence S they have never encountered, it must be that they know something on the basis of which they can figure out, without any additional information, what S means. What can this knowledge be? The only thing it could plausibly be is knowledge of the syntactic structure of S and of the individual meanings of the simple constituents of S. However, this argument has been criticized on the grounds of several considerations. Szabó (Reference Szabó, Hinzen, Werning and Machery2012) questioned the argument of productivity, observing that it assumes “that we already understand expressions we have never heard before. What is the evidence for this? The fact that when we hear them we understand them shows nothing more than the information necessary to determine what they mean is available to us immediately after they have been uttered.” To reformulate: What is the evidence for the claim that we already understand certain expressions we have never heard before? Is it true that we always rely on syntax in interpreting novel expressions?
A related argument in favor of compositionality (points e. and f. in the box) is the concept of systematicity, a term introduced by Fodor and Pylyshyn to denote that “[t]he ability to produce/understand some sentences is intrinsically connected to the ability to produce/understand certain others” (Fodor & Pylyshyn, Reference Fodor and Pylyshyn1988, p. 37). In its simplest manifestation, if speakers comprehend a sentence of the form aRb, such as John loves Mary, they are expected to similarly comprehend the corresponding sentence bRa (e.g., Mary loves John). Nonetheless, this intuitive property becomes relatively weak once we start considering more complex cases, for not every word substitution within an expression remains meaningful. For example, given the phrases within an hour and without a watch, it is challenging to derive meaningful interpretations for within a watch and without an hour (Baggio, Van Lambalgen, & Hagoort, Reference Baggio, Van Lambalgen, Hagoort, Hinzen, Werning and Machery2012, p. 657). Moreover, the mere comprehension of red car and tall building does not necessarily imply the comprehension of red building and tall car (Szabó, Reference Szabó, Hinzen, Werning and Machery2012).
In this sense, the argument of systematicity delves into the very nature of natural language: Are sentences resulting from grammatical recombination inherently meaningful or not? It is debatable to what extent this really holds, and sentences like Chomsky’s Colorless green ideas sleep furiously (Chomsky, Reference Chomsky1957) have been used to argue that not all grammatical sentences are meaningful. Nevertheless, even if we were to assume that all grammatical sentences are meaningful, this alone does not establish the necessity of compositionality or any form of systematic semantics for its explanation (Pagin & Westerståhl, Reference Pagin and Westerståhl2010b, p. 5).
Finally, while systematicity can be empirically observed to a certain extent, productivity remains a more contentious issue. It is, indeed, impossible to conclusively demonstrate the existence of an infinite number of complex expressions in natural languages (Pullum & Scholz, Reference Pullum and Scholz2010). Even if human memory were theoretically capable of generating infinitely long sentences, the finite lifespan of individuals would preclude such a possibility. Consequently, the argument about the productivity of language is generally regarded as more contentious than that of systematicity (Hupkes et al., Reference Hupkes, Dankers, Mul and Bruni2020), although it is the most intuitive one.
Another point, referred to as the methodological argument (Baggio, Van Lambalgen, & Hagoort, Reference Baggio, Van Lambalgen, Hagoort, Hinzen, Werning and Machery2012), posits that compositionality serves as a necessary constraint in semantic analysis. The principle of compositionality provides an operationalized way to compute the meaning of complex linguistic expressions. It represents indeed one of the most straightforward explanations: Starting from the meanings of its atomic constituents and following its syntactic structure, the interpretation of a complex expression unfolds progressively, step by step, from the atomic components to the most elaborate ones. Despite its widespread appeal, this argument, too, falls short of validating compositionality. The ability of compositional semantic theories to account for certain phenomena does not inherently imply that these theories are effective because they are compositional; in other words, it does not prove that compositionality is a property of natural language.
These concerns, though merely a fraction of the broader issues at hand, have initiated an extensive investigation aimed at establishing the limits and refining the concept of compositionality through empirical data and cognitive insights. The subsequent section delves into the formalization of compositionality in both traditional linguistic frameworks and more recent computational methodologies.
2.2 Modeling Compositionality
2.2.1 Compositionality in Formal Semantics
The principle of compositionality stands as a foundational claim in Formal Semantics, a well-established approach in linguistic theory (Groenendijk & Stokhof, Reference Groenendijk, Stokhof, Carlson and Pelletier2005; Partee, Reference Partee, Aloni and Dekker2016; Partee, ter Meulen, & Wall, Reference Partee, ter Meulen and Wall1990). Formal Semantics encompasses a range of semantic theories, all employing standard methodologies grounded in symbolic logic, mathematics, and mathematical logic to rigorously formulate well-defined theories concerning the semantics of natural languages (King, Reference King and Smith2006).
The philosopher and logician Richard Montague (e.g., Reference Montague1970b; Reference Montague, Hintikka, Moravcsik and Suppes1973) was one of the first to argue that the relation between syntax and semantics in natural language could be regarded as not essentially different from the relation between syntax and semantics in a formal language, such as the language of first-order logic. He articulated this idea in the following words:
There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians; indeed, I consider it possible to comprehend the syntax and semantics of both kinds of languages within a single natural and mathematically precise theory. On this point I differ from a number of philosophers, but agree, I believe, with Chomsky and his associates.
Accordingly, natural language could be “translated” into the metalanguage of logic, as, for instance, the language of predicate calculus. Within the Montague Grammar tradition, the principle of compositionality assumes a pivotal role in articulating the relation of semantics to syntax. It states that the semantic interpretation for a language is defined as some homomorphism (a structure-preserving mapping) from syntax to semantics (a gentle introduction to this concept is provided by Janssen and Partee Reference Janssen, Partee, van Benthem and ter Meulen1997, p. 448–450). In other words, the syntactic operations that combine syntactic expressions must match the meaning operations, forming complex meanings from simpler ones. Goldberg (Reference Goldberg1995, p. 13) illustrates this claim as follows:
$f(\alpha +_{\mathrm{syn}} \beta) = f(\alpha) +_{\mathrm{sem}} f(\beta)$ (1)

where $f$ is a function from syntax to semantics, $+_{\mathrm{syn}}$ is a rule of syntactic composition, and $+_{\mathrm{sem}}$ is a rule of semantic composition. The formula formally conveys that the interpretation of the entire expression results from combining the meanings of the immediate constituents (and only those meanings) via a semantic operation that aligns directly with the corresponding syntactic operation. It is worth noticing that having a compositional interpretation structured in this manner represents a straightforward way of ensuring that each of the infinitely many potential syntactic structures within a language will receive a clearly defined interpretation (Groenendijk & Stokhof, Reference Groenendijk, Stokhof, Carlson and Pelletier2005).
The perspective and the technical apparatus Montague offered have significantly impacted the study of natural language semantics, paving the way for a wide range of Formal Semantic approaches, from model-theoretic to proof-theoretic semantics. Besides their specificities, any compositional formal semantic framework provides
1. a knowledge or semantic representation of linguistic expressions in a logic,
2. some mechanisms for combining them in the form of formal rules.
Classically, semantic information is depicted in terms of feature-value structures, and logic is used both as a description language and as a calculus for constructing the meaning. In these approaches, the meaning is assembled starting from atomic objects (typically the meanings of words) and incrementally combined into larger structures. This mechanism constitutes the basis of compositionality. It is noticeable that this formalization requires an explicit representation of both types of information: the information associated with the constituents (typically lexical semantics) and the meaning composition mechanisms. The standard approach is to use first-order logic and model meaning composition as function application (Montague, Reference Montague and Visentini1970a; Partee, ter Meulen, & Wall, Reference Partee, ter Meulen and Wall1990). The semantic representation of the phrase Alex smiled can be expressed as in Equation 2:

$\textit{Alex}' = a$
$\textit{smiled}' = \lambda x.\,\mathit{smiled}(x)$
$\textit{smiled}'(\textit{Alex}') = \lambda x.\,\mathit{smiled}(x)\,(a) = \mathit{smiled}(a)$ (2)

In this representation, the proper noun “Alex” is denoted by a constant (a), while the predicate “smiled” is expressed as a lambda term, signifying a function that can be applied to arguments of the appropriate type. The application of the “smiled” function to the argument “Alex” (as illustrated in the third row) results in the final expression, where the bound variable (x), found within the lambda term, is replaced with the argument expression a (the example is adapted from Martin & Baggio, Reference Martin and Baggio2020).
In lambda calculus, the process of function application allows arguments and predicates to be identified and gathered into formulae, thanks to a mapping function from syntax to semantics. The integration of quantifiers and modalities completes the logical model, employing specific mechanisms based on more intricate calculi. This mechanism, which remains relatively consistent across various theories, relies on two foundational premises: first, that meaning can be dissected into fundamental, atomic components; and second, that a linear and incremental mechanism exists for assembling these components into abstract structures.
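As a rough illustration (a toy sketch, not a faithful Montagovian fragment; the tree encoding and string-based meaning representation are assumptions made for readability), the function-application step can be mimicked with ordinary functions: the meaning of smiled is a one-place function, the meaning of Alex is an individual constant, and the syntactic rule that combines a subject with its predicate corresponds to applying the latter's meaning to the former's, mirroring the syntax-to-semantics homomorphism sketched above.

```python
# Toy sketch of compositional interpretation by function application.
# Word meanings are either constants (strings) or one-place functions;
# the single composition rule applies a predicate to its argument.

lexicon = {
    "Alex":   "a",                           # individual constant
    "smiled": lambda x: f"smiled({x})",      # stands in for λx.smiled(x)
}

def interpret(tree):
    """Map a binary syntax tree onto its meaning, mirroring its structure."""
    if isinstance(tree, str):                # lexical item: look up its meaning
        return lexicon[tree]
    subject, predicate = (interpret(t) for t in tree)
    return predicate(subject)                # function application (beta-reduction)

print(interpret(("Alex", "smiled")))         # -> smiled(a)
```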
This approach constitutes the basis of numerous semantic formal frameworks, particularly those focused on the interface between syntax, semantics, and discourse. Noteworthy examples include Discourse Representation Theory (Kamp & Reyle, Reference Kamp and Reyle1993) and Combinatory Categorial Grammar (Steedman, Reference Steedman2001). From a computational perspective as well, the Montagovian perspective of compositionality has long been a cornerstone in natural language understanding approaches. Extensive work has been carried out in this direction within the logic programming paradigm (Colmerauer, Reference Colmerauer, Clark and Tärnlund1982; Shieber & Pereira Reference Shieber and Pereira1987), and more recently, within the theoretical framework of Categorial Grammars (Bos et al., Reference Bos, Clark, Steedman, Curran and Hockenmaier2004; Moot, Reference Moot2012). Additionally, a more recent framework exploring semantic representation with minimal structures has been proposed in the Head-Driven Phrase Structure Grammar paradigm, known as Minimal Recursion Semantics (Copestake et al., Reference Copestake, Flickinger, Pollard and Sag2005). This framework introduces key notions, such as under-specification and a generalization of the interface between semantics and other domains.
However, the traditional formal analysis of meaning composition is acknowledged to exhibit certain limitations in terms of its power and expressive capacity. On the representation side, it is unclear how lambda terms precisely capture the intricate nuances of the meanings of constituent expressions. Formal approaches struggle to encapsulate content words in all their richness – and, by extension, the array of inferences drawn from lexical information (Boleda & Herbelot, Reference Boleda and Herbelot2016). For instance, while man and dude would have the same ontological representation (they both refer to male humans), they are not equivalent, as they have different connotations (Boleda & Herbelot, Reference Boleda and Herbelot2016). Different formalisms have been explored for modeling lexical meaning, employing richer data structures than lambda terms, with significant implications for theories concerning the process of semantic composition. On the composition side, a critical question emerges: whether meaning arises solely from the process of function application or from the interpretation of formulas within predicate logic. This dilemma becomes particularly pronounced when composition and interpretation do not mirror each other, giving rise to scenarios where a strong version of compositionality falls short of delivering comprehensive explanations (Martin & Baggio, Reference Martin and Baggio2020). Furthermore, it is essential to acknowledge that Formal Semantics does not encode the full spectrum of human linguistic experiences. Notably, analogical reasoning, a fundamental facet of human predictive cognition, lies beyond the scope of Formal Semantics (Boleda & Herbelot, Reference Boleda and Herbelot2016). Additionally, this formalism encounters difficulties in adequately describing numerous linguistic phenomena, including but not limited to co-compositionality (Pustejovsky, Reference Pustejovsky, Hinzen, Machery and Werning2012) and coercion.
2.2.2 Compositionality in Generative Linguistics
The principle of compositionality has been a central assumption even within mainstream generative grammar. Jackendoff (Reference Jackendoff1997, p. 48) asserted that these theories had been founded under a standard (and usually unspoken) hypothesis, which he designated as “syntactically transparent semantic composition” or Simple Composition. This concept corresponds to what Culicover and Jackendoff (Reference Culicover and Jackendoff2006) name Fregean compositionality and is grounded on the following assumptions:
1. All elements of content in the meaning of a sentence are found in the lexical conceptual structures (LCSs) of the lexical items composing the sentence.
2. The way the LCSs are combined is a function only of the way the lexical items are combined in syntactic structure (including argument structure). In particular,
the internal structure of individual LCSs plays no role in determining how the LCSs are combined;
pragmatics plays no role in determining how LCSs are combined.
Given this definition, Simple Composition is governed entirely by syntactic structure, and lexical items are predominantly treated as semantically undecomposable entities, excluding any interaction between their internal structure and phrasal composition. Nevertheless, predicates of various categories are sometimes understood as implicitly having, on some level, more arguments than appear on the surface; that is, there are cases in which certain aspects of a sentence’s meaning do not seem to be represented either in its word components or in its syntactic structure.
Consider, for instance, the following sentence:
(3) The journalist began the article after his coffee break.
While it may not explicitly mention what the journalist began to do, it is unlikely that English speakers would find this sentence difficult to understand, and most would readily interpret it as “The journalist began to write the article after his coffee break.” This example is a classic case of logical metonymy (Pustejovsky, Reference Pustejovsky1995). Specifically, this phenomenon arises from a type clash between an event-selecting metonymic verb (e.g., begin) and an entity-denoting direct object (e.g., article), triggering the retrieval of a covert event (e.g., the act of writing). This phenomenon poses a challenge for traditional theories of compositionality (Asher, Reference Asher2015) since it shows that interpretation cannot always be determined solely by syntactic structure. This raises the question of how the covert event is accessed and which cognitive processes are involved in its retrieval.
Another case in which simple compositionality fails is the beneficiary dative construction. In a double object construction such as
(4) Bill baked Andy a cake,
the indirect object Andy is understood as coming into possession of the direct object (a cake). However, the “possession” component of meaning does not reside in the meaning of any of the lexical words, but in the construction itself. Fregean compositionality would instead require an explicit (but hidden) representation of possession in the syntactic structure.
The same observation can be formulated for the sound–motion construction, a linguistic phenomenon that links auditory or sound-related elements with motion or movement-related concepts in language and thought (Goldberg & Jackendoff, Reference Goldberg and Jackendoff2004; Levin & Hovav, Reference Levin and Hovav1995). This construction is often used to express or convey the idea that a particular sound or noise is associated with a particular type of motion, movement, or action. Here are some example sentences that illustrate this concept:
a. The water gurgled down the stream.
b. The door creaked open.
The sentences’ meanings can be paraphrased as “The water flowed down the stream, producing a gurgling sound” and “The door opened, producing a creaking sound.” However, both “gurgle” and “creak” are verbs describing the emission of a sound, not verbs expressing motion. Consequently, within the sentences, no word conveys the intended sense of motion (Culicover & Jackendoff, Reference Culicover and Jackendoff2006). The Simple Composition paradigm cannot deal with these expressions because a hidden verb, such as “go,” would have to be posited in the syntax to fully capture the intended meaning.
The analysis of such cases raises both descriptive and theoretical problems that bear on the compositionality thesis, and it has led to the formulation of a new hypothesis about meaning composition, namely Enriched Composition (Jackendoff, Reference Jackendoff1997, p. 49):
1. The conceptual structure of a sentence may contain, in addition to the conceptual content of its LCSs, other material that is not expressed lexically, but that must be present in conceptual structure either (i) in order to achieve well-formedness in the composition of the LCSs into conceptual structure (coercion, to use Pustejovsky’s term) or (ii) in order to satisfy the pragmatics of the discourse or extralinguistic context.
2. The way the LCSs are combined into conceptual structure is determined in part by the syntactic arrangement of the lexical items and in part by the internal structure of the LCSs themselves (Pustejovsky’s cocomposition).
This reformulation of compositionality (Jackendoff, Reference Jackendoff1997; Pustejovsky, Reference Pustejovsky1995) introduces complex lexical entries: Entities are associated with a complex structure (e.g., Pustejovsky’s qualia structure) in the mental lexicon. From this perspective, the linguistic phenomena above can be reanalyzed. For logical metonymy, for example, the covert event “must be present in [the] conceptual structure” (Jackendoff, Reference Jackendoff1997, p. 49). The introduction of this novel formalism carries profound implications: When lexical meanings are rich, internally structured data objects, the process of meaning composition becomes intricate and potentially defies straightforward description through function application. This realization gives rise to fundamental questions regarding the characteristics and extent of composition, including whether it involves a simple or a complex function, single or multiple operations, and whether it functions independently of or relies entirely on syntax (Martin & Baggio, Reference Martin and Baggio2020).
2.2.3 Compositionality in Constructionist Approaches
It is frequently contended that constructional approaches either lack compositionality or explicitly deny semantic composition (Kay & Michaelis, Reference Kay and Michaelis2012). However, compositional operations and a construction-based formalism for syntax are not inherently contradictory, and some constructionist approaches have been developed to formally integrate them into a single representation. Sign-Based Construction Grammar (SBCG) (Michaelis, Reference Michaelis, Hoffmann and Trousdale2013; Sag, Reference Sag, Boas and Sag2012; Sag, Boas, & Kay, Reference Sag, Boas, Kay, Boas and Sag2012) is the approach that has focused most extensively on formally explaining syntactic and semantic composition through construction representation (Michaelis, Reference Michaelis, Heine and Narrog2015). Sign-Based Construction Grammar proposes a highly structured and taxonomically organized lexicon based on two fundamental units, namely signs and constructions. Signs are feature structures that specify both syntactic and semantic properties and are formally represented as attribute-value matrices (AVMs; cf. Figure 1). In this representation, each expression of a language, whether a word, a lexeme, or even a phrase, is regarded as a sign (Michaelis, Reference Michaelis, Heine and Narrog2015). Constructions, in turn, are described as the means to derive more complex sign descriptions from simpler ones. Specifically, they are type constraints that specify (i) the properties that define a class of constructs (i.e., feature structures equivalent to local trees with signs at the nodes) and (ii) the way to construct a mother sign from one or more daughter signs (Michaelis, Reference Michaelis, Hoffmann and Trousdale2013). Constructions are descriptions either of classes of constructs (combinatoric constructions) or of lexemes (lexical class constructions). An example of the subject–predicate construction is provided in Figure 2.
Sign-Based Construction Grammar offers a robust formalism for construction-based syntax by relying on the mechanism of unification. This operation involves matching and merging the corresponding features of each linguistic structure (sign), ensuring that the resulting representation captures the combined form and meaning of the components. Constraints ensure that the unified feature structure is well-formed and conforms to the grammatical and semantic requirements of the language; they may include syntactic rules, semantic roles, selectional restrictions, and other linguistic principles (Sag, Reference Sag, Boas and Sag2012). For example, the feature structure of the subject–predicate construct is unified with the signs of the verb and subject lexemes to create a representation of the entire fragment. Such a combination is possible because there is no conflicting attribute-value information between the structures involved (i.e., the AVMs are “unifiable”; cf. Chaves, Reference Chaves, Kertész, Moravcsik and Rákosi2019).
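To make the unification operation more concrete, the following is a minimal sketch of recursive unification over attribute-value matrices encoded as nested Python dictionaries. The feature names (CAT, AGR, NUM, PERS) and the toy signs are illustrative assumptions, not SBCG’s actual feature geometry.

```python
def unify(avm1, avm2):
    """Recursively unify two attribute-value matrices encoded as nested dicts.

    Returns the merged AVM, or None when the two structures carry conflicting
    atomic values for the same attribute (i.e., they are not unifiable).
    """
    result = dict(avm1)
    for attr, value in avm2.items():
        if attr not in result:
            result[attr] = value                      # new information is simply added
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            merged = unify(result[attr], value)       # recurse into embedded AVMs
            if merged is None:
                return None
            result[attr] = merged
        elif result[attr] != value:
            return None                               # clash: unification fails
    return result

# Hypothetical signs; attribute names are invented for illustration.
subject = {"CAT": "NP", "AGR": {"NUM": "sg", "PERS": "3"}}
verb    = {"CAT": "V",  "SUBJ": {"CAT": "NP", "AGR": {"NUM": "sg"}}}

print(unify(verb["SUBJ"], subject))   # {'CAT': 'NP', 'AGR': {'NUM': 'sg', 'PERS': '3'}}
print(unify({"AGR": {"NUM": "sg"}}, {"AGR": {"NUM": "pl"}}))   # None: conflicting values
```

Full unification-based formalisms add typed feature structures, reentrancies, and constraint inheritance on top of this matching-and-merging core.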
Therefore, while traditional syntactic approaches affirm that the interpretation of an expression is licensed by (i) a rule of syntactic composition and (ii) a rule of semantic composition (Equation (1)), SBCG proposes a unique linguistic object (a construction) that serves a similar function (Michaelis, Reference Michaelis, Sinha and Wen, in press). Overall, SBCG offers a formalism for construction-based syntax that is declarative and constraint-based (Michaelis, Reference Michaelis, Hoffmann and Trousdale2013).
This approach has the advantage of incorporating the CxG organization into a formalized framework. Specifically, it provides a means to explicitly specify the relationships between the various components of a construction. However, SBCG has been criticized for a tendency to prioritize formal-syntactic aspects over semantic considerations. For instance, Sag (Reference Sag2010) proposes a comprehensive construction for filler-gap phenomena that completely lacks semantic content (a defective construction). This contrasts with the widely shared constructionist view (Goldberg, Reference Goldberg2006; Hilpert, Reference Hilpert2019) according to which every construction, even the most abstract one, is associated with a specific meaning (Ungerer & Hartmann, Reference Ungerer and Hartmann2023).
Moreover, SBCG is a process-neutral approach that makes no predictions about the actual online parsing or production of constructions (Hoffmann, Reference Hoffmann and Dancygier2017). By contrast, two constructionist frameworks based on the unification of AVMs have been developed for the computational implementation of sentence processing, namely Fluid Construction Grammar (FCG; Steels, Reference Steels, Hoffmann and Trousdale2013, Reference Steels2017) and Embodied Construction Grammar (ECG; Bergen & Chang, Reference Bergen and Chang2005, Reference Bergen, Chang, Hoffmann and Trousdale2013). Despite their different formalisms, both are specifically designed for computational implementation (Hoffmann & Trousdale, Reference Hoffmann and Trousdale2013; Ungerer & Hartmann, Reference Ungerer and Hartmann2023). In particular, FCG utilizes truth-conditional first-order predicate calculus, whereas ECG relies on mental simulation models and embodied schemas. Additionally, while the FCG formalism accepts defective constructions, ECG constructions are always form–meaning pairings, though ECG does not deny the existence of purely formal or purely semantic schemas (Hoffmann, Reference Hoffmann and Dancygier2017).
2.3 Distributional Approaches to Compositionality
Formal approaches assume that the meaning of a word (a constant that replaces a symbol in logical formulas) is one and only one, determined a priori; that is to say, “a lexical item must make approximately the same semantic contribution to each expression in which it occurs” (Fodor & Pylyshyn, Reference Fodor and Pylyshyn1988, p. 42). However, this assumption contrasts with the evidence that lexical meanings are context-sensitive: They “adapt” to fit a specific context and communicative situation, and, more generally, their use in context defines their semantic representation. In other words, the distribution of words constitutes one of the essential sources of information for accessing their meaning.
In this respect, Distributional Semantics (Boleda, Reference Boleda2020; Lenci, Reference Lenci2018; Lenci & Sahlgren, Reference Lenci and Sahlgren2023) has provided a solid alternative framework for representing word meaning over the past decades. Taking a radically different stance, Distributional Semantics aims to represent word meaning through the contexts in which a word occurs, building on the so-called Distributional Hypothesis of lexical meaning (Firth, Reference Firth and Firth1957; Harris, Reference Harris1954; Sahlgren, Reference Sahlgren2008). Concretely, a Distributional Semantic Model represents the lexicon in terms of a vector space, where a lexical target is described as a numeric vector (also known as an embedding) built by identifying its syntactic and lexical contexts in a corpus (Lenci, Reference Lenci2018). This computational implementation makes it possible to quantify the similarity between words using algebraic formulas while allowing room for semantic changes (Perek, Reference Perek2016) and meaning shifts (Busso, Pannitto, & Lenci, Reference Busso, Pannitto and Lenci2018). Compared to formal representations, this approach has several advantages: (1) It provides a continuous representation that can easily tackle language’s gradience and fuzziness; (2) it does not assume a priori semantic primitives, that is, it is not stipulative; and (3) its representations are explainable in terms of how we can cognitively build them (the approach is plausible with respect to learnability; cf. Miller & Charles, Reference Miller and Charles1991).
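As a concrete illustration of the pipeline just described, the sketch below builds a toy count-based model from a handful of invented sentences and measures word similarity with the cosine of the resulting vectors; real DSMs are estimated from corpora of millions of tokens and typically apply association weighting and dimensionality reduction.

```python
import numpy as np
from itertools import combinations

# A toy corpus; the sentences are invented for illustration.
corpus = [
    "the dog chases the cat",
    "the dog chases the ball",
    "the cat chases the mouse",
    "the door creaked open",
]

# Symmetric co-occurrence counts within each sentence (a crude context window).
vocab = sorted({w for sentence in corpus for w in sentence.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for sentence in corpus:
    for w1, w2 in combinations(sentence.split(), 2):
        counts[index[w1], index[w2]] += 1
        counts[index[w2], index[w1]] += 1

def cosine(u, v):
    """Cosine similarity between two distributional vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Words occurring in similar contexts end up with more similar vectors.
print(cosine(counts[index["cat"]], counts[index["dog"]]))    # high: shared contexts
print(cosine(counts[index["cat"]], counts[index["door"]]))   # lower: few shared contexts
```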
Initial distributional approaches have been designed to represent word meaning by assigning each word to a single vector, produced as an abstraction over all its contexts of use. The logical next step involved understanding how to combine these representations to obtain vectors for phrases, sentences, and even larger pieces of text. Research in the last decade led, first and foremost, to methods integrating DSMs with formal symbolic theories of language: Semantic composition depends on an algebraic function that combines words, which are no longer described as symbolic representations but as distributional ones.
The approaches taking this stance fall under the name of Compositional Distributional Semantics Models (CDSMs) (Baroni, Bernardi, & Zamparelli, Reference Baroni, Bernardi and Zamparelli2014; J. Mitchell et al., Reference Mitchell, Lapata, Demberg and Keller2010) and aim to explicitly apply the principle of compositionality to compute distributional vectors for phrases. Compositional Distributional Semantics Models produce representations of phrases by composing the distributional vectors of the words these phrases comprise. As in classic Distributional Semantics for words, these models generate similar vectors for semantically similar sentences, regardless of length or structure. For example, require attention and need treatment should have similar distributional signatures, and both should be dissimilar to, for instance, attend a conference.
Various strategies to compose word embeddings have been suggested (cf. Table 1). In the most influential papers on the topic, J. Mitchell and Lapata (Reference Mitchell and Lapata2008, Reference Mitchell and Lapata2010) introduced several arithmetic operations for vector composition, operationalized as additive and multiplicative functions. For instance, the expression fluffy cat can be represented as

$\textit{fluffy cat} = \textit{fluffy} + \textit{cat}$,

in which the meaning of the phrase is a new embedding derived from the addition of the word vectors. Compositional Distributional Semantics Models are usually evaluated on a phrase similarity task: For pairs of phrases, the similarity between their respective combined vectors is computed, and these scores are compared with similarity ratings elicited from English speakers.
Model | Function
---|---
Weighted additive | $p = \alpha u + \beta v$
Multiplicative | $p = u \odot v$
Full additive | $p = Xu + Yv$
Lexical Function | $p = Av$
Fullex | $p = f(W[A_v u;\, A_u v])$

Note: $\odot$ stands for pointwise multiplication, $\alpha$ and $\beta$ are scalar parameters, matrices $X$ and $Y$ represent syntactic relation slots (e.g., Adjective and Noun), and matrix $A$ represents a functional word (e.g., the adjective in an adjective–noun construction). In the Fullex model, each word is associated with both a vector ($u$, $v$) and a matrix ($A_u$, $A_v$), combined through a weight matrix $W$ and a nonlinear function $f$.
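The following sketch illustrates, with invented three-dimensional vectors, how the weighted additive and multiplicative functions of Table 1 produce phrase vectors and how a phrase similarity score is then obtained; the numbers are purely illustrative.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented three-dimensional embeddings; real vectors have hundreds of dimensions.
emb = {
    "require":    np.array([0.9, 0.1, 0.3]),
    "attention":  np.array([0.8, 0.2, 0.4]),
    "need":       np.array([0.9, 0.2, 0.2]),
    "treatment":  np.array([0.7, 0.3, 0.5]),
    "attend":     np.array([0.1, 0.9, 0.2]),
    "conference": np.array([0.2, 0.8, 0.1]),
}

def weighted_additive(u, v, alpha=0.5, beta=0.5):
    """Weighted additive model: p = alpha*u + beta*v."""
    return alpha * u + beta * v

def multiplicative(u, v):
    """Multiplicative model: p = u * v (pointwise product)."""
    return u * v

p1 = weighted_additive(emb["require"], emb["attention"])
p2 = weighted_additive(emb["need"], emb["treatment"])
p3 = weighted_additive(emb["attend"], emb["conference"])

# A phrase similarity evaluation compares such scores with human ratings.
print(cosine(p1, p2))   # near-synonymous phrases: high similarity
print(cosine(p1, p3))   # unrelated phrase: much lower similarity
print(multiplicative(emb["require"], emb["attention"]))   # the same pair, composed multiplicatively
```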
More complex models characterize composition by representing lexemes and phrases with matrices and tensors rather than with vectors alone (Socher et al., Reference Socher, Perelygin and Wu2013; Zanzotto et al., Reference Zanzotto, Korkontzelos, Fallucchi and Manandhar2010). For instance, the Lexical Function model denotes predicates (specifically verbs and adjectives) as functions mapping one noun meaning to another, in line with the Montagovian view. Concretely, predicates are matrices, their nominal arguments are represented as vectors, and their multiplication results in the phrase vector (Baroni & Zamparelli, Reference Baroni and Zamparelli2010). The Lexical Function model was one of the first attempts to represent formal semantic operations with DSMs and turned out to be very influential in the research area. However, the main limitation of this approach is the difficulty in scaling up to multi-argument sentences: Estimating the matrices and tensors for complex functional types such as transitive verbs is demanding and may encounter data-sparseness problems. Paperno, Pham, and Baroni (Reference Paperno, Pham and Baroni2014) proposed a practical approximation of the Lexical Function model to address these limits, but it is hardly competitive with the much simpler additive models (Rimell et al., Reference Rimell, Maillard, Polajnar and Clark2016).
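Here is a minimal sketch of the Lexical Function idea, with synthetic data standing in for corpus-derived vectors: the adjective is a matrix estimated by least-squares regression from noun vectors to observed adjective–noun phrase vectors, in the spirit of Baroni and Zamparelli’s procedure. The dimensionalities and the noise level are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for corpus-derived vectors: 50 nouns in a 10-dimensional
# space and the adjective-noun phrase vectors they would be observed with.
nouns   = rng.normal(size=(50, 10))
A_true  = rng.normal(size=(10, 10))                    # the adjective's "true" behaviour
phrases = nouns @ A_true.T + 0.01 * rng.normal(size=(50, 10))

# Regression step: estimate the adjective matrix from <noun, phrase> vector pairs.
X, *_ = np.linalg.lstsq(nouns, phrases, rcond=None)
A_hat = X.T

# Lexical Function composition for an unseen noun: p = A v.
new_noun   = rng.normal(size=10)
phrase_vec = A_hat @ new_noun
print(phrase_vec.shape)                                # (10,)
print(float(np.abs(A_hat - A_true).max()))             # small estimation error
```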
Some authors have also exploited neural networks to learn composition functions explicitly. An example is the recursive neural network (RNN) of Socher, Manning, and Ng (Reference Socher, Manning and Ng2010), in which representations for larger chunks are computed recursively, following a predefined syntactic parse tree of the sentence. Specifically, the neural network induces a score for each pair of neighboring words, which measures how likely these two words are to be combined into a phrase, and, simultaneously, it collapses the two words into an n-dimensional representation of the resulting phrase. This new phrase embedding replaces the words in the sequence and possibly becomes a child of another phrase spanning more words. This bottom-up process continues until the whole input sentence is mapped to the embedding space. An alternative approach, proposed by Socher et al. (Reference Socher, Huval, Manning and Ng2012), is the matrix-vector RNN, which consists of representing each word by a vector plus a matrix encoding its interaction with its syntactic sisters. Compositional representations for phrases and sentences are learned by an RNN in a supervised setting.
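The sketch below shows the core of recursive composition along a predefined binary parse tree: each merge maps two daughter vectors onto a parent vector of the same dimensionality through a shared weight matrix. It is a toy illustration of the mechanism, with untrained weights and invented embeddings, not a reimplementation of Socher and colleagues’ trained models.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))          # shared (untrained) composition weights
emb = {w: rng.normal(size=DIM) for w in ["the", "dog", "chases", "cat"]}

def compose(tree):
    """Recursively compose a binary parse tree into a single vector.

    A leaf is a word string; an internal node is a pair (left, right).
    Each merge applies p = tanh(W [u; v]), mapping two daughter vectors
    onto a parent vector of the same dimensionality.
    """
    if isinstance(tree, str):
        return emb[tree]
    left, right = tree
    u, v = compose(left), compose(right)
    return np.tanh(W @ np.concatenate([u, v]))

# (the dog) (chases (the cat)): a predefined syntactic parse drives composition bottom-up.
sentence = (("the", "dog"), ("chases", ("the", "cat")))
print(compose(sentence).shape)   # (8,)
```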
Finally, an alternative approach to compositional DSMs assumes that the representation of a sentence is not a vector but rather a logical form containing distributional vectors of the content words (Asher et al., Reference Asher, Van de Cruys, Bride and Abrusán2016; Beltagy et al., Reference Beltagy, Roller, Cheng, Erk and Mooney2016; Coecke, Sadrzadeh, & Clark, Reference Coecke, Sadrzadeh and Clark2010; Garrette, Erk, & Mooney, Reference Garrette, Erk, Mooney, Bunt, Bos and Pulman2014).
Among all the compositional functions presented here, vector addition still shows remarkable performance on various tasks, such as phrase similarity or paraphrase detection (Asher et al., Reference Asher, Van de Cruys, Bride and Abrusán2016; Rimell et al., Reference Rimell, Maillard, Polajnar and Clark2016), outperforming more complex methods, such as the Lexical Function model. However, vector addition is theoretically and cognitively unsatisfactory: The meaning of a complex expression is not simply the sum of the meanings of its parts but also depends on its syntactic structure. Since addition cannot discriminate between the different syntactic realizations of semantic roles, sentences like
a. The dog chases the cat.
b. The cat chases the dog.
are modeled in the same way. Moreover, while vectors are suitable to capture the semantic relatedness among lexemes, this representation might not be adequate for more complex linguistic expressions because of the limited and fixed amount of information that can be encoded (Erk & Padó, Reference Erk and Padó2008).
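A two-line numerical check of this point, with made-up two-dimensional vectors: since addition is commutative, the two sentences receive exactly the same composed representation.

```python
import numpy as np

dog, chases, cat = (np.array(v) for v in ([1.0, 0.2], [0.1, 0.9], [0.8, 0.3]))

s1 = dog + chases + cat     # "The dog chases the cat."
s2 = cat + chases + dog     # "The cat chases the dog."
print(np.allclose(s1, s2))  # True: addition is commutative, so role assignment is lost
```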
In summary, how distributional representations can be projected from the lexical level to the sentence or discourse level poses an ongoing challenge. Currently, compositionality is still considered the real bottleneck for Distributional Semantics (Lenci, Reference Lenci2018). It is worth highlighting that all the studies discussed earlier, in one way or another, adhere to the conventional principle of Fregean compositionality: The representation of a complex unit is derived from the representation of its immediate constituents.
A final note concerns the latest generation of language models (foundation models, cf. Bommasani et al., Reference Bommasani, Hudson and Adeli2021), built on deep artificial neural networks and trained on massive amounts of text using a word-in-context prediction task. Numerous empirical studies have explored the compositional capabilities of neural models using various approaches (Gulordava et al., Reference Gulordava, Bojanowski, Grave, Linzen, Baroni, Walker, Ji and Stent2018; Lake & Baroni, Reference Lake, Baroni, Dy and Krause2018; Linzen, Dupoux, & Goldberg, Reference Linzen, Dupoux and Goldberg2016; Loula, Baroni, & Lake, Reference Loula, Baroni and Lake2018, among others). However, there is still an incomplete understanding of the strategies learned by these networks and their capacity to generalize. Hupkes et al. (Reference Hupkes, Dankers, Mul and Bruni2020) have identified five aspects of compositionality from the theoretical literature (cf. Table 2) and translated them into five grounded tests for these models. This evaluation framework underscores the necessity for a more comprehensive and valid set of evaluation criteria and improved analytical tools for assessing the compositional abilities of neural networks.
Property | Test
---|---
Systematicity | Whether models systematically recombine known parts and rules
Productivity | Whether models can extend their predictions beyond the lengths seen in the training data
Substitutivity | Whether models’ predictions are robust to synonym substitutions
Localism | Whether models’ composition operations are local or global
Over-generalization | Whether models favor rules or exceptions during training
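By way of illustration only, the snippet below sketches what a crude substitutivity check could look like: a model’s outputs are compared before and after synonym replacement. The keyword-based “model” and the synonym list are invented; Hupkes et al.’s actual tests are defined over controlled artificial languages and trained networks.

```python
def substitution_consistency(model, sentences, synonyms):
    """Fraction of sentences whose model output is unchanged when a word is
    replaced by an (assumed) synonym: a rough substitutivity check."""
    consistent = 0
    for sentence in sentences:
        swapped = " ".join(synonyms.get(w, w) for w in sentence.split())
        consistent += model(sentence) == model(swapped)
    return consistent / len(sentences)

# A stand-in "model": keyword-based sentiment, purely for illustration.
def toy_model(sentence):
    positive = {"great", "fantastic", "good"}
    return any(w in positive for w in sentence.lower().split())

sentences = ["The movie was great", "The plot was dull"]
synonyms = {"great": "fantastic", "dull": "boring"}
print(substitution_consistency(toy_model, sentences, synonyms))   # 1.0
```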
2.4 Summary
This section has introduced the concept of compositionality from two distinct perspectives. First, compositionality was outlined as a processing principle, and the fundamental assumptions along with the supporting and opposing arguments were presented. Second, the application of compositionality as a representational component of linguistic theory was discussed, examining its various formalizations in Formal Semantics, generative linguistics, CxG, and computer science, along with their central assumptions and primary limitations.
Overall, the accounts supporting compositionality, primarily the generative approaches, propose a view of language that can be outlined in two primary components: (i) words, which are stored in a lexicon, and (ii) rules, which govern how words can be combined into meaningful, coherent sentences (the grammar component of language). The rules of grammar mostly obey the principle of compositionality: There is an inventory of rules that dictates the construction of syntactic representations, which are subsequently mapped to principles responsible for the composition of word meanings into more complex expressions. On the contrary, constructionist theories assume that there are no boundaries between lexicon and syntax, that is, between what is regular and what is irregular, or what is productive and what is unproductive. Constructions, as the basic units of language, are productive linguistic constructs that account for syntactic processes. In other words, CxG “handles ‘normal syntax’ in a way that necessitates a shift of perspective away from the common view of words, word classes, and phrase structure rules” (Hilpert, Reference Hilpert2019, p. 70). While formal CxGs (e.g., SBCG, FCG, ECG) employ rigorous unification-based formalism to elucidate the emergence of well-formed structures from feature matching among their constituent parts, these approaches formalize the aggregation of constructions but do not encode other mechanisms that occur concomitantly in sentence processing.
The following section will provide behavioral evidence demonstrating that several factors affect comprehension and argue against a “strong” version of compositionality centered solely on the syntax–semantics homomorphism.
3 Accessing Meaning Noncompositionally: Insights from Experimental Data
This section examines (some) limitations of the Principle of Compositionality in natural language, aiming to reconcile the various bodies of literature on sentence processing. The following pages are structured in two blocks.
Section 3.1 concerns what is actually included in the lexicon, a crucial aspect of any linguistic theory. Defining which components of a sentence are stored in long-term memory and which are constructed online in working memory is crucial to understanding and delineating the mechanisms underlying language comprehension (Jackendoff, Reference Jackendoff2002, p. 152). To this end, Section 3.1 discusses the psycholinguistic nature of multiunit sequences, encompassing both literal and figurative meanings. Idiomatic expressions have historically served as fundamental pillars within CxG approaches, which consider idioms not as mere appendages to linguistic grammar but as integral entities that can be productive, highly structured, and deserving of grammatical inquiry (Fillmore, Kay, & O’Connor, Reference Fillmore, Kay and O’Connor1988). Concurrently, there is a growing acceptance that multiword expressions are stored within the lexicon, particularly within usage-based and constructionist frameworks (Abbot-Smith & Tomasello, Reference Abbot-Smith and Tomasello2006; Bybee, Reference Bybee2006; Goldberg, Reference Goldberg2006). The following pages mainly summarize the experimental data on the processing of these expressions, aiming to provide (i) cognitive validation of the CxG assumptions and (ii) an emphasis on how this definition of the lexicon is incongruent with the traditional compositional approaches illustrated in Section 2.
Conversely, Section 3.2 delineates fundamental studies supporting the idea that comprehension processes are often shallow, underspecified, and driven by comprehenders’ expectations. The hypothesis that the mental lexicon includes not only atomic representations but also an interconnected network of knowledge entails that there are multiple ways to determine meaning: People come to the task of interpretation with a vast amount of shared world knowledge and contextual information. Consequently, semantic composition is constantly enriched (Jackendoff, Reference Jackendoff1997) with background knowledge and contextual constraints. Rather than undertaking a comprehensive, bottom-up analysis, interpretation may occur in a “good-enough” manner, primarily relying on expectations and potentially resulting in shallow interpretation or misinterpretation. Integrating constructional representations with principles related to this “good-enough” processing is garnering increasing attention, albeit in a limited number of works (Blache, Reference Blache2024).
Although the behavioral evidence presented in this section aligns with the majority of constructionist perspectives, the question remains open as to how CxG could integrate shallow processing, prediction, and background knowledge into its formalism to become a comprehensive model of language processing.
3.1 Online Processing of Multiunit Sequences
3.1.1 Idioms
Idiomatic expressions are almost universally considered a challenge for compositionality (Maienborn, von Heusinger, & Portner, Reference Maienborn, von Heusinger and Portner2011, p. 118). Indeed, an idiom is traditionally defined as a phrase whose meaning cannot be deduced from its individual components (cf. Pinker, Reference Pinker1999). Expressions like it’s raining cats and dogs, kicked the bucket, or go bananas cannot be understood by simply combining the meanings of their constituent words; instead, their meaning must be specifically learned. For instance, there is nothing in the individual words or in the syntactic combination of “they,” “chewed,” “the,” and “fat” that could suggest that the sentence they chewed the fat means that a group of people chatted (and not that the subjects actually chewed some fat). In other words, idioms do not abide by the principle of compositionality: The meanings of the parts and the rules of composition do not suffice to explain the meaning of the whole, as specific knowledge is needed.
The processing of idioms has been extensively debated in the psycholinguistic domain, and various theories have been proposed regarding whether people process idioms compositionally or holistically, that is, whether the meaning of an idiomatic phrase is stored separately from the meanings of its individual parts and how the idiomatic meaning is assembled. According to M. Libben and Titone (Reference Libben and Titone2008), there are roughly three types of models of idiom processing. The “noncompositional” models contend that the whole idiomatic meaning is stored as a distinct entry in the mental lexicon and is retrieved directly, like a morphologically complex word, through a process autonomous from the computation of the literal meaning. Supporters of this approach, known as the Lexical Representation hypothesis, point to empirical evidence of a processing advantage for idioms used figuratively in both comprehension and production (Gibbs, Reference Gibbs1980; Swinney & Cutler, Reference Swinney and Cutler1979).
On the contrary, “compositional” approaches claim that the meaning of an idiomatic phrase is not stored as a separate semantic unit in the mental lexicon but is assembled “on the fly” from the meanings of its individual parts. Therefore, analyzing each idiom’s components is necessary to comprehend the idiom’s figurative interpretation. For instance, Gibbs, Nayak, & Cutting (Reference Gibbs, Nayak and Cutting1989) posit that idiomatic expressions are represented and processed in a different way depending on whether they are decomposable, that is, the meanings of idiom components are related to the overall figurative interpretation (e.g., pop the question), or not (e.g., kick the bucket; cf. the Idiom Decomposition hypothesis). According to this approach, semantically decomposable expressions can be analyzed compositionally: Each component is recovered from the mental lexicon and merged with the other components based on their syntactic relationships. Conversely, the meaning of nondecomposable idioms is directly retrieved from the lexicon. Moreover, Gibbs et al. observed that sentences incorporating decomposable idioms are read more rapidly than those containing semantically nondecomposable ones. Following the assumption that decomposable idioms are processed more akin to literal language, this finding implies that an initial attempt is made to analyze idioms compositionally, as indicated by the shorter reading times for decomposable idioms. However, the results of Tabossi, Fanari, & Wolf (Reference Tabossi, Fanari and Wolf2009) disagree with the abovementioned hypothesis. In a semantic judgment task, participants were as fast at judging nondecomposable idioms as decomposable idioms and clichés, showing an advantage over matched controls. This study suggests that the relationship between an idiom’s constituents and its overall figurative meaning does not impact its processing.
A third class comprises the so-called hybrid models, which incorporate features of both noncompositional and compositional approaches. The central claim of these models is that idiomatic expressions are processed simultaneously as semantically arbitrary word sequences and as compositional phrases. The Configuration Hypothesis of Cacciari and Tabossi (Reference Cacciari and Tabossi1988) is one of the most influential hybrid models. The core idea is that idiomatic phrases are processed literally, word by word, until the comprehender recognizes that the phrase being processed is an idiom, that is, until the idiom key (or idiom recognition point) is reached. After this identification point, the figurative meaning is retrieved: The idiom is processed according to its figurative meaning, while compositional processing ends. In other words, once individuals have enough information to realize that the unfolding sentence contains an idiom or an idiom fragment, such as Tom advised them not to put all their eggs..., they can retrieve the string from semantic memory and compare the expected constituent (in one basket) with the actual idiom string. Hence, the point at which the string is recognized as a known idiom determines how early the idiomatic meaning is activated. How many of the words composing the idiom string are processed literally beforehand depends on several factors. Overall, this model strongly emphasizes predictability as the key to accessing the idiomatic configuration, independently of other variables, such as familiarity.
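As a purely schematic operationalization of the idiom key, the sketch below scans an unfolding word string against a tiny inventory of stored idioms and reports the earliest point at which only one stored idiom remains compatible with the input; the idiom list, the glosses, and the two-word evidence threshold are invented for illustration and do not reflect the experimental operationalization used in the studies discussed here.

```python
IDIOMS = {
    ("kick", "the", "bucket"): "to die",
    ("spill", "the", "beans"): "to reveal a secret",
    ("put", "all", "your", "eggs", "in", "one", "basket"): "to risk everything on one option",
}

def recognition_point(words, min_evidence=2):
    """Return (position, figurative meaning) at the earliest point where the
    unfolding string is compatible with exactly one stored idiom, or None."""
    for i in range(1, len(words) + 1):
        prefix = tuple(words[:i])
        matches = [idiom for idiom in IDIOMS if idiom[:i] == prefix]
        if len(matches) == 1 and i >= min_evidence:
            return i, IDIOMS[matches[0]]
    return None

print(recognition_point("put all your eggs".split()))
# (2, 'to risk everything on one option'): after "put all", only one stored idiom
# remains compatible, so the figurative meaning can be retrieved before the string ends.
```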
Following the Configuration Hypothesis, neurolinguistic studies have investigated how the processing advantage of idioms over nonidiomatic expressions is related to a more direct access to their holistic structure once a string is recognized as an idiom. Event-related potential (ERP) studies have revealed that idiom processing correlates with faster reading times and larger electrophysiological responses (Molinaro & Carreiras, Reference Molinaro and Carreiras2010). In a well-known study, Vespignani et al. (Reference Vespignani, Canal, Molinaro, Fonda and Cacciari2010) observed that the brain’s electrical response to the correct idiom constituent differs when recorded before and after the idiom recognition point (RP). For instance, in the fragment
(4) Marco piangeva sul latte versato (Mark cried over the milk spilt),
the idiomatic completion elicited a smaller N400 before recognition compared to other conditions (e.g., “Marco piangeva sul letto disfatto,” Mark cried over the bed unmade). This observation supports the hypothesis that when we recognize a string of words as an idiom before the idiom ends, we develop expectations concerning the incoming idiomatic constituents. However, the electrophysiological response showed a P300 effect after the recognition point when the idiom was left intact, indexing the process by which the idiom meaning retrieved from long-term semantic memory is integrated into the sentence representation to form a semantically coherent structure. This effect reflects a qualitative shift in readers’ expectations regarding upcoming words once the idiom has been recognized, indicating the activation of a template that matches the upcoming information (P300) and pointing to easier semantic integration (N400).
At the same time, Rommers et al. (Reference Rommers, Meyer, Praamstra and Huettig2013) have shown that, during idiomatic interpretation, the literal meanings of words are actively suppressed and replaced by global access at the idiom level. This effect has been shown at the cortical level using EEG: When a semantic violation is introduced within an idiom, no processing difference emerges between sentences with and without the violation, unlike what happens with literal sentences. These findings support the hypothesis that semantic unification mechanisms (i.e., integrating a word’s meaning into a structure) are less engaged in idiom comprehension. In other words, the brain’s semantic expectancy and literal word-meaning integration operations are “switched off” when the context renders them unnecessary.
While the Configuration Hypothesis proposes a sequential model of idiom processing, the Hybrid Model (Titone & Connine, Reference Titone and Connine1999; M. Libben & Titone, Reference Libben and Titone2008) posits that idiom comprehension involves, in parallel, (i) direct retrieval of the figurative meaning and (ii) compositional analysis based on the literal meanings of the idiom’s constituents and on syntax. The activation and use of literal or figurative meanings during comprehension is a function of the degree to which idioms are conventional or compositional: The more familiar a speaker is with an idiom, the more directly its figurative meaning can be activated and retrieved. These effects were correlated with faster and slower processing of decomposable and nondecomposable idioms, respectively. Recent studies have shown that speakers engage in a more compositional processing strategy when idioms are less frequent or familiar, for example, because they appear in a noncanonical modified form or are being processed in a second language (Senaldi & Titone, Reference Senaldi and Titone2022; Senaldi et al., Reference Senaldi, Wei, Gullifer and Titone2022). Research on idioms continues to investigate the various factors that can contribute to idiom processing. Among others, Cacciari, Corradini, & Ferlazzo (Reference Cacciari, Corradini and Ferlazzo2018) investigated to what extent individual differences in cognitive and personality variables are associated with spoken idiom comprehension in context.
To summarize, psycholinguistic work related to the Hybrid Model implies that, while direct retrieval of an idiomatic form is the preferential processing route, compositional (or combinatorial) parsing is present and can play a role in processing idiomatic expressions (e.g., as observed in bilingual adults; cf. Senaldi et al., Reference Senaldi, Wei, Gullifer and Titone2022). Notwithstanding the extensive behavioral experiments, questions about how idioms are processed, and specifically about how direct access to lexical expressions and compositional parsing interact, remain open. Indeed, the complex cognitive architecture behind the comprehension of idioms has yet to be spelled out in full detail.
3.1.2 Multiword Expressions
A formulaic expression can be broadly defined as “a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use” (Wray, Reference Wray2002, p. 9). Formulaic language comprises many expressions commonly used in everyday language and familiar by definition. These expressions constitute a considerable portion of the language use of L1 speakers and are one reason for fluency in production: They support the task of linguistic interaction by limiting the choices about what phrases to use when expressing particular meanings, what words to use in them, and in what order to use them (Kallens & Christiansen, Reference Contreras Kallens and Christiansen2022). In detail, formulaic language includes literal compositional expressions such as lexical bundles (in the meantime), verb–particle phrases (catch up), and irreversible binomials (bride and groom), as well as nonliteral or figurative expressions whose meaning cannot be deduced from the meanings of their components, such as the idioms already introduced.
Formulaic expressions differ from each other along several dimensions (Carrol & Conklin, Reference Carrol and Conklin2020; Kallens & Christiansen, Reference Contreras Kallens and Christiansen2022; Siyanova-Chanturia & Sidtis, Reference Siyanova-Chanturia, Sidtis, Siyanova-Chanturia and Pellicer-Sánchez2018; Titone et al., Reference Titone, Columbus, Whitford, Mercier, Libben, Heredia and Cieślicka2015). In general, some expressions are more “frozen” than others (fixedness/conventionalization), and they can allow for internal variation through open “slots” (schematicity). Another continuum concerns compositionality, that is, how well an expression can be decomposed into atomic parts of meaning. Apart from idioms, these expressions vary largely in their internal degree of compositionality: for instance, given the structurally similar collocations carpet sweeping and vacuum cleaning, the interpretation of what is being cleaned and who is cleaning differs (in the first case, it is the carpet that is cleaned by a brush, while in the second case something is cleaned by a vacuum; cf. Kallens & Christiansen, Reference Contreras Kallens and Christiansen2022). However, much research into these multiunit conventionalized sequences has centered on how L1 speakers deal with figurative versus literal language or frequent versus novel linguistic information, usually reporting a processing advantage of recurrent sequences compared to novel control phrases (Siyanova-Chanturia & Sidtis, Reference Siyanova-Chanturia, Sidtis, Siyanova-Chanturia and Pellicer-Sánchez2018).
In production, Bannard and Matthews (Reference Bannard and Matthews2008) observed that children repeat frequent sequences (a drink of milk) more accurately and faster than lower-frequency controls (a drink of tea), even when the substring frequencies are the same. Additional studies on adult comprehension replicated these results. Among others, Arnon and Snider (Reference Arnon and Snider2010) found that reaction times in a phrasal decision task using four-word sequences are faster when the frequency of the whole phrase is higher: for instance, don’t have to worry is read faster than don’t have to wait. Moreover, they observed that this effect extends across the entire frequency range of the individual words or substrings. Similarly, Tremblay et al. (Reference Tremblay, Derwing, Libben and Westbury2011) observed that, in self-paced reading experiments, lexical bundles such as in the middle of the were read faster and recalled with higher accuracy than length-matched sequences such as in the front of the.
Event-related potential and eye-tracking studies have further validated these behavioral conclusions. Tremblay and Baayen (Reference Tremblay, Baayen and Wood2010) showed that the frequency of a four-word sequence continuously modulates the early N1a (a peak at frontal and central sites around 100–150 msec after stimulus onset) and P1 (the earliest visual ERP potential, known to vary with spatial attention, state of arousal, lexical frequency, and probability) components usually associated with frequency effects. Similar effects were replicated with English binomials. Siyanova-Chanturia, Conklin, and Schmitt (Reference Siyanova-Chanturia, Conklin and Schmitt2011) observed that literal (at the end of the day – “in the evening”) and idiomatic (at the end of the day – “eventually”) high-frequency binomials were read faster and with fewer fixations than novel controls (at the end of the war) by both L1 and L2 speakers. In a recent study, Jiang et al. (Reference Jiang, Jiang and Siyanova-Chanturia2020) focused on phrase frequency effects in adults’ and L1 children’s online language comprehension by employing a naturalistic reading task (with Chinese as the target language). Using eye-tracking, they collected reading times for verb–noun combinations varying in phrase frequency. As in previous literature, collocations (attend the meeting) were read faster than control phrases (attend the game). In addition, age was a significant predictor of (general) reading times across the analyses and eye-tracking measures, with the youngest (Grade 3) readers being the slowest, the oldest (adult) readers being the fastest, and Grade 4 readers falling in between.
This frequency effect is consistent across the literature, and it is typically explained in terms of storage: Language users must have some stored representation of these expressions, which are used “holistically,” even if they could be assembled compositionally (Wray, Reference Wray2012, p. 234). In this view, compositional phrases are represented in the same way as simple words and noncompositional phrases: The frequency of a phrase will influence its entrenchment and future processing. The difference between higher- and lower-frequency phrases has to be described as a continuum (the level of activation) and not as a dichotomy (stored versus computed). Snider and Arnon (Reference Snider, Arnon, Divjak and Gries2012) further support this hypothesis: Since it is hard to empirically differentiate compositional and noncompositional phrases, theories should overcome the distinction between “stored” and “computed” forms. However, what counts as the threshold for “frequent” is still an open question.
While these experiments have focused on individual types of formulaic sequences, some works have directly compared the processing of several types of phrases with different properties. For instance, Carrol and Conklin (Reference Carrol and Conklin2020) compared the reading times of three types of formulaic phrases (idioms, binomials, and collocations) relative to control phrases in an eye-tracking experiment. Results revealed a processing advantage for all three types and showed that, while overall phrase frequency contributes much of the processing advantage, different phrases show additional effects according to the specific properties relevant to each type. With the same intent, Jolsvai, McCauley, and Christiansen (Reference Jolsvai, McCauley and Christiansen2020) observed that the meaningfulness of a word sequence was an essential factor in how it was processed in a phrasal decision task, over and above simply how frequently it occurs. These observations support a continuum between idioms and formulaic expressions: The fact that some phrases have a literal compositional meaning and others a figurative noncompositional one does not significantly affect language processing.
3.1.3 The Implications of Processing Multiwords
Evidence for the psychological reality of multiword linguistic units has served to blur the lines between grammar and lexicon, demonstrating the storage of “compositional” phrases and their use in comprehension and production (McCauley & Christiansen, Reference McCauley and Christiansen2019). In particular, idiomatic expressions and other types of multiword expressions represent an interesting test case for how the brain and the mind handle the frequency with which we are exposed to linguistic input in the environment (Cacciari, Corradini, & Ferlazzo, Reference Cacciari, Corradini and Ferlazzo2018).
While these observations are problematic for traditional theories of language, usage-based constructionist perspectives consider these stored multiword sequences as essential building blocks for language learning and use (cf. Section 1.2). The main argument of CxG is that there is no boundary between the lexicon and the grammar: Language is a collection of constructions, form-meaning pairings varying in schematicity and complexity. Following this assumption, the dimension of the lexicon crosses the traditional representational boundaries: It includes not only idiosyncratic lexical items (i.e., words and idioms), but it comprises a large number of expressions, including partially lexicalized patterns as well as regular word forms (such as cats, dogs) and multiword sequences. In this respect, language could be seen as a larger store of prepackaged, or prefabricated, expressions (Bybee Reference Bybee2010), which are accessed and used to comprehend and produce novel expressions. This position is shared by some models of CxG (Croft, Reference Croft2001; Goldberg, Reference Goldberg2006), which consider syntactic productivity as the extension of learned constructions. As Section 4 will introduce, the organization and productivity of language can be explained by analogical inferences from expressions stored in long-term memory rather than by sequential compositional operations (Bybee, Reference Bybee2010; Diessel, Reference Diessel2019). However, from a cognitive-processing perspective, how much repetition is required to form a linguistic chunk has yet to be established. Some multiword phrases are stored in memory, but the factors that drive this need to be clarified. Moreover, it is still critical to identify which sentences are produced using compositional mechanisms and which are not.
The implications of processing multiword units extend beyond the realm of formulaic language processing: In language processing, there is always a balance between direct memory access and compositional parsing (Senaldi & Titone, Reference Senaldi and Titone2024). While constructionist and usage-based approaches have the merit of underlining that even structurally complex and semantically idiosyncratic units play a central role in the lexical organization and linguistic behavior, besides single words, ongoing efforts should focus on formally encoding behavioral evidence in theories and computational models of language processing.
3.2 The Predictive and Shallow Nature of Processing
3.2.1 Shallow Processing
A fundamental assumption that forms the basis of many semantic theories, particularly those supporting the principle of compositionality, is that the language processing system follows a strict and thorough syntactic algorithm to compute the representation for a given linguistic input. This tenet posits that the semantic content of words is recovered from the lexicon and subsequently combined in accordance with syntactic rules to derive the overall meaning of the sentence. However, research in psycholinguistics has provided evidence to suggest that comprehension processes frequently manifest as shallow and underspecified (Ferreira, Bailey, & Ferraro, Reference Ferreira, Bailey and Ferraro2002; Ferreira & Patson, Reference Ferreira and Patson2007; Goldberg & Ferreira, Reference Goldberg and Ferreira2022; Sanford & Sturt, Reference Sanford and Sturt2002). Numerous studies reveal that syntactic structures are not always fully analyzed and exploited to extract meaning; instead, people form representations that are only “good enough” for the communicative purpose, often employing simple heuristic procedures. As a direct consequence, this process can lead to a misinterpretation of the linguistic input.
In the domains of pragmatics and psycholinguistics, several studies have revealed the existence of the so-called semantic illusions, a phenomenon whereby people fail to recognize an inaccuracy or inconsistency in a text. The most famous example is the well-known Moses illusion (Erickson & Mattson, Reference Erickson and Mattson1981):
(5) How many animals of each sort did Moses put on the ark?
When presented with this question, subjects tend to provide the response “two” without noticing that it was Noah, and not Moses, who performed the action in the biblical narrative. Comparable observations can be made for questions such as After an air-crash, where should the survivors be buried? (Barton & Sanford, Reference Barton and Sanford1993) or Can a man marry his widow’s sister? (Sanford, Reference Sanford2002). These cases of lexical misinterpretation highlight the tendency of listeners or readers to process such sentences in a superficial and shallow manner, consequently failing to detect erroneous presuppositions.
Much of the evidence for shallow processing comes from the literature on Good-Enough Processing (Ferreira & Lowder, Reference Ferreira, Lowder and Ross2016). This approach is based on the idea that human cognitive resources are limited, and the brain optimizes comprehension by processing language just enough to achieve understanding without engaging in overly detailed or exhaustive analysis. Good-enough processing relies on heuristics and shortcuts to comprehend language rapidly, even if it leads to occasional misinterpretations or inaccuracies.
A classic example is provided by the sentence the dog was bitten by the man. People often fail to compute the correct event representation, that is, the one in which the man (and not the dog) does the biting (Ferreira, Reference Ferreira2003). Good-enough processing has mostly been investigated through the examination of garden-path sentences, such as While Mary bathed the baby played in the crib. These sentences are particularly challenging to process: In this example, the noun “baby” is initially taken as the object of the verb, and it is only later in the sentence that this interpretation is ruled out and replaced by a subject interpretation of “baby” (i.e., the baby is doing the playing and is not bathed by Mary). Christianson et al. (Reference Christianson, Hollingworth, Halliwell and Ferreira2001) and Ferreira, Christianson, & Hollingworth (Reference Ferreira, Christianson and Hollingworth2001) provided evidence that the correct interpretation of such utterances may not always be computed. Participants were able to correctly infer that the baby was playing in the crib; however, they often held a confident yet incorrect belief that Mary bathed the baby. These findings emphasize that the process of garden-path reanalysis is not a binary, all-or-nothing phenomenon and suggest that the initial assignment of thematic roles for the subordinate-clause verb is not invariably subject to revision (Ferreira, Christianson, & Hollingworth, Reference Ferreira, Christianson and Hollingworth2001).
An additional example documented in the literature pertains to the phenomenon of semantic attraction. Semantic attraction occurs when a particular argument violates its verb’s selectional requirements, yet comprehenders do not detect this violation due to its attraction to another noun within the same sentence:
(6) The bubblegum had been chewing by the boy.
The verb “chewing” is perceived as either syntactically or semantically anomalous. Syntactic cues suggest that the subject noun, “bubblegum,” should be the Agent of the verb. However, this interpretation is semantically anomalous, as inanimate objects do not typically “chew” things. Since the noun “bubblegum” is a highly plausible candidate for the Theme role of the verb, this strong semantic attraction to the Theme interpretation may lead comprehenders to pursue it, even though it contradicts the syntactic structure of the sentence (as this interpretation would require a passive verb form). In an influential study, Kim and Osterhout (Reference Kim and Osterhout2005) recorded ERPs from participants while they read sentences like those presented in this section. The authors found that sentences with attraction violations were associated with a more prominent P600 component and with no modulation of the N400 component when compared to passive and active control sentences.
In the aforementioned cases, people tend to depend more on local linguistic information and global background knowledge rather than compositional meanings derived from fully articulated syntactic representations (McCauley & Christiansen, Reference McCauley and Christiansen2019). The reality of shallow processing challenges the prominence of hierarchical phrase structures as well as the standard generative view that syntax and semantics are consistently and autonomously processed.
3.2.2 Prediction
A complementary perspective to shallow processing is the widely shared hypothesis that “language comprehension is predictive” (Kuperberg & Jaeger, Reference Kuperberg and Jaeger2015, p. 1). Prediction approaches assume that efficient comprehension adopts contextual constraints to anticipate or predict upcoming input, leading to facilitated processing once the expected component is encountered (Ferreira & Lowder, Reference Ferreira, Lowder and Ross2016; Hale, Reference Hale2001; Huettig, Reference Huettig2015; R. Levy, Reference Levy2008; Pickering & Gambi, Reference Pickering and Gambi2018; Pickering & Garrod, Reference Pickering and Garrod2013).
From a theoretical perspective, prediction can occur at different levels, from simple priming (word meaning opens a possibility of interpreting the event) to activation. In the latter case, prediction occurs if a comprehender activates linguistic information before processing the input that carries that information. When the prediction is successful, the subject uses the pre-activated representation when encountering the linguistic chunk: In this scenario, some processing was performed at an early stage, thus explaining why prediction facilitates comprehension. This mechanism contrasts with integration, which occurs when the comprehender combines a new processed linguistic information with the representation of the preceding context (cf. Hagoort, Baggio, & Willems, Reference Hagoort, Baggio and Willems2009). In this case, facilitation effects are not witnessed in the same way, and the processing works in a bottom-up fashion. However, it can be challenging to distinguish prediction from integration and, in particular, to find evidence compatible with prediction but not integration (Pickering & Gambi, Reference Pickering and Gambi2018). Moreover, researchers question whether the prediction mechanism is serial (it allows for the pre-activation of one highly likely candidate) or consists of the parallel pre-activation of multiple candidates (all sharing requisite semantic or orthographic features, and many options are equally likely). Admittedly, the role of prediction in language comprehension is still under debate, and the precise means by which comprehenders derive predictions still needs to be fully defined.
By exploring when information becomes available in the brain, researchers have investigated the circumstances under which people anticipate or expect upcoming input and where this predictive processing is less facilitated. Various works have demonstrated that prediction can manifest at different linguistic levels.
From a phonological perspective, DeLong, Urbach, and Kutas (Reference DeLong, Urbach and Kutas2005) recorded ERPs while participants read sentences such as
(7) The day was breezy, so the boy went outside to fly a kite/an airplane.
The authors observed an N400 effect when the sentence ended with the less predictable an airplane rather than the more predictable a kite. The striking finding was that this effect already occurred at the determiner a or an. This result cannot be explained in terms of integration but only in terms of prediction of the upcoming word and, specifically, of its phonological form (i.e., that it began with a consonant).
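One common way to quantify such graded expectations is surprisal (cf. Hale, Reference Hale2001; R. Levy, Reference Levy2008, cited above): the less probable a word is given its context, the higher its surprisal and, typically, its processing cost. The sketch below computes bigram surprisal over a handful of invented sentences echoing the kite/airplane contrast; it is only an illustration of the measure, and actual estimates come from large corpora or neural language models.

```python
import math
from collections import Counter

# A toy bigram model over invented counts.
corpus = [
    "the boy went outside to fly a kite",
    "the boy went outside to fly a kite",
    "the boy went outside to fly a kite",
    "the boy went outside to fly an airplane",
]
bigrams, contexts = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()
    for prev, cur in zip(tokens, tokens[1:]):
        bigrams[(prev, cur)] += 1
        contexts[prev] += 1

def surprisal(prev, word):
    """-log2 P(word | prev): higher values signal less expected continuations."""
    return -math.log2(bigrams[(prev, word)] / contexts[prev])

print(surprisal("fly", "a"))    # expected determiner: low surprisal (~0.4 bits)
print(surprisal("fly", "an"))   # less expected determiner: higher surprisal (2 bits)
```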
Predictions have been observed also at the syntactic level. Among others, Staub and Clifton (Reference Staub and Clifton2006) found that people read or the subway faster after fragment (a) than after (b).
a. The team took either the train...
b. The team took the train...
The authors concluded that the conjunction either makes the subsequent chunk more predictable by ruling out an analysis in which or starts a new clause.
Indeed, there is ample evidence that contextual predictability influences lexico-semantic processing. Using the visual-world eye-tracking paradigm, Tanenhaus et al. (Reference Tanenhaus, Spivey-Knowlton, Eberhard and Sedivy1995) have shown that comprehenders actively anticipate or predict the imminent arrival of not-yet-encountered information. In this setup, subjects’ eye movements are monitored as they listen to sentences while viewing an array containing pictures of various objects (e.g., a cake, a girl, a tricycle, and a mouse). In their most widely cited experiment, Altmann and Kamide (Reference Altmann and Kamide1999) showed that participants directed their gaze more toward edible objects than inedible ones when presented with sentence fragments like The boy will eat (but not when the verb eat was replaced with other verbs, such as move).
The ongoing debate around prediction during sentence interpretation centers on whether the process is serial or parallel. A serial prediction approach implies that a single highly likely candidate is pre-activated, such as “bucket” in the phrase kick the bucket. A parallel approach, on the other hand, involves the pre-activation of multiple potential candidates that share relevant semantic or orthographic features when several options are roughly equally probable. While it is widely accepted that anticipation and prediction are integral to sentence interpretation, the specific mechanisms underlying these processes remain a subject of ongoing inquiry, as does the implementation of computational models that quantitatively define these mechanisms in relation to compositionality.
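To make the serial/parallel contrast more concrete, the following minimal sketch (in Python) illustrates the two strategies over a toy probability distribution of sentence continuations. The candidate words, their probabilities, and the activation threshold are invented for illustration and are not drawn from any of the studies cited above.

```python
# Toy illustration of serial vs. parallel lexical prediction.
# The candidate words and probabilities are invented placeholders; a real
# model would estimate them from corpus statistics or a language model.

context = "the boy went outside to fly a ..."
candidates = {"kite": 0.74, "plane": 0.12, "drone": 0.08, "flag": 0.06}

def serial_prediction(cands):
    """Pre-activate only the single most likely continuation."""
    best = max(cands, key=cands.get)
    return {best: cands[best]}

def parallel_prediction(cands, threshold=0.05):
    """Pre-activate every candidate above a threshold, keeping a graded
    activation level for each."""
    return {w: p for w, p in cands.items() if p >= threshold}

print(serial_prediction(candidates))    # {'kite': 0.74}
print(parallel_prediction(candidates))  # graded set of plausible continuations
```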
3.2.3 Background Knowledge
Lastly, psycholinguistic evidence supports the idea that knowledge of real-world events is crucial in guiding online sentence processing. For instance, the following sentence fragments
a. The doctor visits
b. The veterinarian visits
activate two different mental images and, accordingly, two different linguistic expectations. To describe this type of stored information, McRae and Matsuki (Reference McRae and Matsuki2009) have introduced the concept of Generalized Event Knowledge (henceforth, GEK). The term “generalized” is employed because GEK comprises knowledge of prototypical event types rather than detailed memories of specific event instances, which differentiates it from exemplar-based models (for more details on these approaches, see Section 4.2). Principally, GEK includes people’s knowledge of typical participants, objects, and settings for events. This generalized knowledge about events arises from “first-hand participation, watching them on television and in movies, listening to others talk about them, and reading about them” (McRae & Matsuki, Reference McRae and Matsuki2009, p. 1418).
Psycholinguistic and neurocognitive research has brought extensive evidence supporting the claim that stored world knowledge plays a crucial role in online language production and comprehension. Lexical priming studies suggest that the processing of isolated words immediately activates knowledge of events of which the words are components. For instance, Ferretti, McRae, & Hatherell (Reference Ferretti, McRae and Hatherell2001) examined priming effects from verbs to typical agents, patients, instruments, and locations; conversely, McRae et al. (Reference McRae, Hare, Elman and Ferretti2005) demonstrated that nouns referring to entities can prime verbs for which these nouns often serve as typical agents (waiter-serving), patients (guitar-strummed), instruments (chainsaw-cutting), and locations (cafeteria-eating). Overall, these findings on event-based priming reinforce the hypothesis that the mental lexicon is organized as an interconnected network of mutual expectations activated by the GEK (Elman, Reference Elman2009, Reference Elman, Calvo and Symons2014).
Event knowledge influences expectations of syntactic structures, as well. McRae, Spivey-Knowlton, & Tanenhaus (Reference McRae, Spivey-Knowlton and Tanenhaus1998), among others, demonstrated that the chunk The cop arrested tends to promote a transitive construction (“the cop arrested X”) over a reduced relative structure (“the cop arrested by the X”). Conversely, the fragment criminal arrested promoted a reduced relative over a main verb, and reading times revealed that expectations for the syntactic continuation are affected by the status of the grammatical subject as a typical Agent (The cop arrested someone) or Patient (The crook arrested by the detective was guilty) of the main verb. Analogous observations have emerged from ERP experiments (Metusalem et al., Reference Metusalem, Kutas and Urbach2012; Paczynski & Kuperberg, Reference Paczynski and Kuperberg2012): Combinations that are more “coherent” with the event scenarios activated by the previous words result in smaller N400 amplitudes.
As a whole, these findings suggest that, during online interpretation, comprehenders tap into general knowledge regarding real-world events: Incoming linguistic input is mapped onto schemas of events, situations, or scenarios based on prior contexts or input. Therefore, the final interpretation of an utterance heavily depends on the background information. To conclude, the predictions made during language comprehension are memory-based, and one’s experience about events and their participants plays a role in generating expectations about the upcoming linguistic input, thereby minimizing the overall processing effort (Elman, Reference Elman, Calvo and Symons2014; McRae & Matsuki, Reference McRae and Matsuki2009).
3.2.4 Implementing Background Knowledge and Prediction
Nowadays, it is widely acknowledged that comprehenders integrate all accessible cues – contextual, semantic, and formal – to incrementally access pertinent prior linguistic and nonlinguistic representations required for interpretation (Goldberg & Ferreira, Reference Goldberg and Ferreira2022). This section illustrated numerous cases where interpretation is driven by noncompositional operations. These phenomena largely align with the main assumptions of CxG, which has directly investigated and formalized some of them.
First, constructionist approaches take into account three levels of semantic contribution for sentence interpretation: the construction, the context, and the rich background information (Michel, Reference Michel2023, p. 566). As introduced in the last section, words are cues that activate event knowledge (Elman, Reference Elman2009, Reference Elman2011). The idea that every lexeme is interpreted against the background of a whole network of concepts in a particular domain has been formalized in Frame Semantics (Fillmore, Reference Fillmore1982; Fillmore & Baker, Reference Fillmore, Baker, Heine and Narrog2010). Specifically, a frame is a schematic representation of an event or scenario together with the participating actors/objects/locations and their (semantic) roles. For instance, the commercial transaction frame includes a set of participant roles that must, at the very least, comprise buyer, seller, goods, and money. According to Fillmore, words and grammatical constructions are subordinate to frames, which means that the meaning associated with a particular word (or grammatical construction) cannot be understood independently of the frame with which it is associated. Following the previous example, verbs like sell and buy are associated with the commercial transaction frame, each representing a different perspective – one from the merchant’s viewpoint and the other from the customer’s. In terms of linguistic structure, frames facilitate the syntax–semantics relationship by serving as interfaces between semantic and syntactic roles, a crucial aspect in explaining noncompositional mechanisms. Nevertheless, other formalizations have been proposed to represent the meaning of constructions. For instance, Radical CxG (Croft, Reference Croft2001) advocates an exemplar semantics model of the syntax–semantics mapping, in which specific situation types are organized in a multidimensional conceptual space. Formal construction types are then said to have a frequency distribution over that conceptual space.
In addition, evidence from this section aligns with the core observations of usage-based and constructionist approaches: Speakers’ linguistic knowledge comes from linguistic experience, that is, lexicon and grammar are shaped by repeated exposure to specific utterances. Even more importantly, language structures at all levels, from morphology to syntax, emerge out of facts of actual language usage (Bybee, Reference Bybee2010), with the effect that linguistic representations are sensitive to context and statistical probabilities (Boyland, Reference Boyland and Eddington2009).
Some theories that share CxG’s main claims have also proposed to translate these observations into quantitative, computational models. For instance, Johns and Jones (Reference Johns and Jones2015) introduced an exemplar model of sentence processing that uses the storage and retrieval of linguistic experiences as its fundamental operations. During the processing of new input, a vectorial representation of the sentence is used as a retrieval cue to activate past linguistic experiences, which are then used to make predictions about forthcoming words and to construct sentence meaning. Recently, Huettig, Audring, & Jackendoff (Reference Huettig, Audring and Jackendoff2022) developed a linguistic perspective that views prediction in terms of pre-activation within the formalism of the Parallel Architecture (Jackendoff, Reference Jackendoff2002, Reference Jackendoff1997), which shares similar assumptions about language with the constructionist approaches (Jackendoff, Reference Jackendoff, Hoffmann and Trousdale2013). On a theoretical note, Michel (Reference Michel2023) suggested Predictive Processing as a cognitive-computational paradigm for CxG. It is also worth mentioning that there is a growing body of research addressing language comprehension and production from a constructional point of view; however, the focus has mostly been on argument structure constructions, with few attempts at morphological or information structure constructions (Hilpert, Reference Hilpert2019, p. 153).
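To give a sense of how a storage-and-retrieval architecture of the kind described by Johns and Jones (Reference Johns and Jones2015) could be operationalized, the fragment below implements a deliberately simplified exemplar retrieval step in Python. The bag-of-words encoding, the toy memory, and the cosine-based activation are illustrative assumptions, not the authors' actual model.

```python
# Minimal sketch of an exemplar-based retrieval step, loosely inspired by the
# description of Johns and Jones (2015); not their actual implementation.
import numpy as np

# Hypothetical store of past "linguistic experiences", each encoded as a vector.
vocab = ["child", "spilled", "milk", "rice", "floor", "dog", "barked"]
idx = {w: i for i, w in enumerate(vocab)}

def encode(sentence):
    """Encode a sentence as a simple bag-of-words vector (a placeholder for
    richer distributional representations)."""
    v = np.zeros(len(vocab))
    for w in sentence.split():
        if w in idx:
            v[idx[w]] += 1.0
    return v

memory = [encode(s) for s in [
    "child spilled milk floor",
    "child spilled rice floor",
    "dog barked",
]]

def retrieve(fragment, store):
    """Use the current fragment as a cue: activate stored exemplars in
    proportion to their cosine similarity to the cue."""
    cue = encode(fragment)
    activations = []
    for exemplar in store:
        denom = np.linalg.norm(cue) * np.linalg.norm(exemplar)
        activations.append(float(cue @ exemplar / denom) if denom else 0.0)
    return activations

# Activations over memory for an incoming fragment; the most activated
# exemplars could then drive predictions about upcoming words.
print(retrieve("child spilled", memory))
```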
In summary, behavioral evidence regarding language comprehension largely aligns with the fundamental tenets of CxG. Nonetheless, while the primary literature in CxG has focused on formalizing linguistic representations, only recent works have endeavored to elucidate the operational mechanisms within the theoretical framework of CxG, and even fewer studies have attempted computational implementations (cf. Section 5.4). The review proposed here suggests that further efforts are needed to elucidate the interplay among diverse sources of information and to delineate the fundamental mechanisms by which comprehenders (i) access the meaning of constructions and (ii) combine constructions and other sources of information into the final interpretation.
3.3 Summary
All the experimental literature summarized in this section argues against a serial account of sentence processing in which syntax proposes a structural interpretation that semantics subsequently cashes out.
First, the evidence for the psychological reality of multiword linguistic units points to a linguistic model in which there is no clear boundary between expressions stored in our memory and expressions generated by compositional mechanisms, as claimed by constructionist approaches: The large number of facilitation effects observed in language processing makes it doubtful that there is a clear way to distinguish different processes for idiomatic, formulaic, and novel expressions. While this is problematic for a strong view of compositionality, the notion of constructions and the importance of frequency in modeling semantic memory imply a redefinition of which linguistic units are combined and how they are combined to yield the final representation of a sentence.
Secondly, language processing can be seen as a process predominantly driven by sequence matching and pattern identification, one that incorporates probabilistic cues, including, importantly, the frequency and predictability of the sentence and its components. Compositionality, devised as a mechanism to aggregate meaning, should be redefined as a complex mechanism able to integrate a network of activated linguistic and contextual information.
Altogether, a linguistic theory of language processing should provide a formalization that (i) integrates formal, semantic, and contextual knowledge, and (ii) implements the predictive nature of comprehension and quantitatively models the basic processing of the good-enough approach, that is, the “interpret whenever possible” principle. Among others, Blache (Reference Blache, Sharp, Sèdes and Lubaszewski2017) proposed that, instead of building a syntactic structure that serves as the support for comprehending a sentence, the processing mechanism delays interpretation until enough information becomes available (i.e., until the density of information – or the cohesion – reaches a certain threshold). This general parsing mechanism makes it possible to integrate different sources of information as they become available, delaying evaluation until a sufficient level of cohesion can be identified.
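A schematic rendering of this delaying strategy may help clarify the idea. In the sketch below, words are buffered and an interpretation step is triggered only when a cohesion score for the buffer crosses a threshold; the scoring function, the threshold, and the example scores are invented placeholders and do not reproduce Blache's actual formalization.

```python
# Schematic sketch of "delayed evaluation": buffer incoming words and only
# trigger an interpretation step once the accumulated cohesion of the buffer
# reaches a threshold. Cohesion scores here are invented; in a real model they
# would come from lexical association or constructional cues.

def cohesion(buffer):
    """Toy cohesion measure: longer, more connected buffers score higher."""
    pairs = {"spilled the": 0.4, "the milk": 0.5, "spilled the milk": 0.9}
    joined = " ".join(buffer)
    return max((score for chunk, score in pairs.items() if chunk in joined),
               default=0.1)

def process(words, threshold=0.8):
    buffer, chunks = [], []
    for w in words:
        buffer.append(w)                   # delay: no structure is built yet
        if cohesion(buffer) >= threshold:  # enough information has accumulated
            chunks.append(" ".join(buffer))
            buffer = []
    if buffer:
        chunks.append(" ".join(buffer))    # leftover material, evaluated at the end
    return chunks

print(process("the child spilled the milk".split()))
```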
4 Explaining Productivity through Analogy
So far, we have discussed various cases in which the principle of compositionality is set aside in favor of noncompositional access to meaning. While the focus has primarily centered on the cognitive processing of highly frequent and predictable linguistic expressions, an equally fundamental aspect to consider is the generation of novel expressions. Indeed, any theory, even the most hostile to compositional processes governed by online rules, must account for the production and comprehension of sentences conveying novel events and concepts. Consequently, even granting that the semantics of multiword sequences can be held in long-term memory, thereby facilitating predictive comprehension, it remains necessary to elucidate the mechanisms that enable the generation of entirely novel, never-encountered-before linguistic structures.
The generative tradition has claimed that linguistic knowledge constitutes a separate cognitive faculty, informationally encapsulated and structured according to its specific principles (Hauser, Chomsky, & Fitch, Reference Hauser, Chomsky and Fitch2002). Within this framework, compositionality is regarded as an innate constraint of this faculty of language, governing the meaning-determining operations (Del Pinal, Reference Del Pinal2015). However, the usage-based constructionist perspectives challenge this idea, claiming that language is no different from any other cognitive domain. Linguistic structures do not result from a specific language function but rather can be explicable as the manifestation of domain-general processes, including categorization, chunking, rich memory storage, crossmodal association, and analogy (Bybee Reference Bybee2010, p. 7). Consequently, these paradigms advocate an alternative approach to address creativity and productivity in language (cf. Section 1.3), rooted in a specific domain-general process, namely, analogy (Bybee, Reference Bybee2010).
Analogical reasoning is recognized as a potent cognitive device that allows one to discover similarities, formulate conceptual categories, and extrapolate them to novel categorical domains (Behrens, Reference Behrens, Hundt, Mollin and Pfenninger2017). In its broadest sense, analogy denotes the ability to think about relational patterns (Holyoak, Gentner, & Kokinov, Reference Holyoak, Gentner, Kokinov, Holyoak, Gentner and Kokinov2001, p. 2). The concept of analogy holds a central position in contemporary cognitive science and is considered a fundamental mechanism in human cognition. Notably, Hofstadter emphasizes the cardinal role of analogy by likening it to the “motor of the car of thought” and “the interstate freeway of cognition,” designating it as “the core of human cognition” (Hofstadter, Reference Hofstadter2009).
This section will introduce the concept of analogy as it has been studied in cognitive science. The specific definitions and the explanation of its main characteristics are essential to understanding how to insert analogical operations into a model of language processing that is coherent with cognitive observations. Subsequently, it will introduce how usage-based constructionist approaches have proposed analogy as a mechanism of language, with a specific focus on the role of analogy in language productivity. Finally, it will summarize how analogy has been computationally implemented in cognitive and distributional models.
4.1 Analogical Reasoning and Its Role in Cognition
Analogy is a domain-general cognitive process that enables a structure mapping between two situations or objects (Gentner, Reference Gentner1983). In the most typical case, a familiar concrete domain, referred to as the base or source, functions as a template by which one can understand and draw new inferences about a less familiar or abstract domain, namely the target (Gentner & Smith, Reference Gentner and Smith2013).
Analogical thinking is pervasive in human thought and speech. People draw on experiential analogies to form mental models of phenomena in the world every day. One example is the often-cited “Rutherford analogy”: A sentence like The atom is like the solar system is acceptable because some aspects of the structure of the atom (notably the fact that electrons orbit the nucleus) can be understood from prior knowledge of the structure of the solar system (i.e., planets orbit the sun).
Over the last three decades, Gentner and colleagues have conducted an extensive investigation into analogy, developing one of the most influential frameworks in cognitive research. The Structure-Mapping Theory of analogy (Gentner, Reference Gentner1983, Reference Gentner1988; Gentner & Markman, Reference Gentner and Markman1997) delineates the set of implicit constraints by which people interpret analogy and similarity. At its core, the theory posits that analogy is characterized by mapping relations between objects, rather than attributes of objects, from base to target. Accordingly, the theory assesses analogies on purely structural grounds, as defined by Falkenhainer, Forbus, and Gentner (Reference Falkenhainer, Forbus and Gentner1989, p. 3): “This structural view of analogy is based on the intuition that analogies are about relations, rather than simple features. No matter what kind of knowledge (causal models, plans, stories, etc.), it is the structural properties (i.e., the interrelationships between the facts) that determine the content of an analogy.”
Going back to the Rutherford analogy, the sentence The atom is like the solar system is interpretable as “The electron revolves around the nucleus, just as the planets revolve around the sun.” However, the nucleus and the sun do not share the same features; that is, the analogy does not imply that “The nucleus is yellow, massive, etc., like the sun.” If that were the case, we would have a literal similarity statement instead, in which a large number of both object attributes and relational predicates are mapped from base to target (Gentner, Reference Gentner1983). In other words, the two situations are analogous because they share the complex relation “to revolve around”; however, the target object does not have to resemble its corresponding base.
Analogical processes, hence, rely on a structure-mapping engine (SME) that identifies relations between representations rather than the mere similarity between their attributes. In this sense, mapping of one entity to another depends on the “syntactic properties of the knowledge representation [describing the entities], and not on the specific content of the domains” (Gentner, Reference Gentner1983, p. 1). To clarify, this process is not triggered simply by surface similarity but requires a great deal of relational or structural alignment knowledge:
Analogy occurs when comparisons exhibit a high degree of relational similarity with very little attribute similarity. As the amount of attribute similarity increases, the comparison shifts toward literal similarity.
Figure 3 provides a concrete illustration of these differences. On the one side, (perceptual) similarity occurs when an observer perceives the resemblance between two objects, such as the medium-sized triangle in (b) and the smallest triangle in (a), which are identical in size. In this case, the two objects match because they share identical perceptible attributes. In contrast, the analogy between the smallest triangle in (a) and the small one in (b) is based on a relational resemblance: While these objects differ in size, they both hold the distinction of being the smallest within their respective sets. However, it is crucial to bear in mind that the demarcation between analogy and similarity is not a strict dichotomy; rather, it exists along a continuum.
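The same contrast can be rendered in a small sketch: an attribute match holds between objects of identical size, while the relational (analogical) match holds between the objects that each play the role of "smallest in its set." The sizes and labels below are invented and merely mimic the configuration described for Figure 3.

```python
# Toy rendering of the attribute vs. relational match described for Figure 3.
# Sizes are arbitrary numbers; each set contains three triangles.
set_a = {"a_small": 1.0, "a_medium": 2.0, "a_large": 3.0}
set_b = {"b_small": 0.5, "b_medium": 1.0, "b_large": 2.5}

# Attribute (perceptual) similarity: two objects match because they share an
# identical perceptible property, here the same size.
attribute_matches = [(x, y) for x, s in set_a.items()
                     for y, t in set_b.items() if s == t]

# Relational similarity: two objects match because they play the same role in
# their respective sets, here being the smallest member.
relational_match = (min(set_a, key=set_a.get), min(set_b, key=set_b.get))

print(attribute_matches)   # [('a_small', 'b_medium')]: identical in size
print(relational_match)    # ('a_small', 'b_small'): same relational role
```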
Since the late 1980s, cognitive works have converged to delineate the general characteristics of analogical thinking across domains. Most theorists agree that analogies can be decomposed into several basic component processes. Specifically, Gentner and Smith (Reference Gentner and Smith2013, p. 670) identify three key stages involved in analogical reasoning.
Retrieval: In this stage, the individual retrieves information from their long-term memory (base) based on a current topic or situation they are dealing with in the working memory (target). The goal is to find a prior analogous situation or case from their memory that is similar in some way to the current situation. This retrieval of past experiences or knowledge is essential for drawing parallels and making analogies.
Mapping: Once a relevant analogous situation has been retrieved, the mapping stage involves aligning and comparing the representations of the base and the target. This alignment process helps identify similarities, differences, and relationships between the elements or components of the two cases. It also allows for projecting inferences from one situation to the other. This mapping process is systematic and structure-consistent, meaning it considers the larger relational systems within the cases.
Evaluation: After the analogical mapping is complete, the individual evaluates the analogy and its associated inferences. This evaluation can involve assessing the validity and relevance of the analogy to the current situation. It also includes judging the quality and reliability of the inferences drawn from the analogy. Effective evaluation is crucial for making informed decisions or solving problems based on the analogical reasoning process.
Holyoak (Reference Holyoak2012) offers an accurate summary of the strategies underlying analogical thinking (p. 10):
In a typical reasoning scenario, one or more relevant analogs stored in long-term memory must be accessed. A familiar analog must be mapped to the target analog to identify systematic correspondences between the two, thereby aligning the corresponding parts of each analog. The resulting mapping allows analogical inferences to be made about the target analog, thus creating new knowledge to fill gaps in understanding. These inferences need to be evaluated and possibly adapted to fit the unique requirements of the target. Finally, in the aftermath of analogical reasoning, learning can result in the generation of new categories and schemas, the addition of new instances to memory, and new understandings of old instances and schemas that allow them to be accessed better in the future.
To summarize, analogical reasoning is not just comparing two analogs based on the similarities we perceive. Instead, it is a complex process of retrieving structured knowledge from long-term memory, representing and manipulating role-filler bindings in working memory, generating new inferences, and finding structured intersections between analogs to form new abstract schemata (Holyoak, Reference Holyoak2012). Besides, an aspect that is generally highlighted is that analogy is an active process that can shape our perception, and it is a central component to learning and transfer.
According to Gentner and Smith (Reference Gentner, Smith and Ramachandran2012), analogical processes can augment and extend knowledge in four ways. In inference projection, spontaneous candidate inferences are made from a well-structured representation to one that is not entirely complete. Schema abstraction, instead, is abstraction of shared relational structure across different exemplars. This structure may be stored in memory as an abstraction and used again for later exemplars. In difference detection, the structural alignment process highlights alignable discrepancies between analogs and makes them more salient. Finally, new representations could be made by relying on re-representation: Even when two potential analogs have nonidentical conceptual relations, they may still be analogous if altering one or both analog representations improves the relational match.
In that sense, analogical reasoning leads to learning in terms of categorization, abstraction, and category extension: “The cognizer needs the ability to compare two structures and notice discrepancies as well as similarity or overlap. When source and target (the new item) match in the relevant respects, the target is categorized as an item belonging to the source category” (Langacker, Reference Langacker1999, p. 4). Gentner, Holyoak, Hofstadter, and numerous other scholars have conducted extensive research on the role of analogy in the process of learning, substantiating their investigations with insights from psychology and, more recently, neurology.
In brief, analogy can be defined as “an inductive mechanism based on structured comparisons of mental representations” (Holyoak, Reference Holyoak2012, p. 1). More broadly, analogical thinking plays a crucial role in creative discovery, problem-solving, categorization, learning, and knowledge transfer. A fundamental aspect of research is its interdisciplinary nature, allowing multiple fields to contribute collectively to our comprehension of cognitive processes. Psychological experiments, naturalistic observations, linguistic analyses, and computer simulations offer diverse perspectives on analogy. For the present discussion, the following section provides an overview of how the analogical mechanism has been proposed as an explanation of language production in linguistic theory.
4.2 Analogy in Language Use
Considerable evidence from cognitive psychology underscores the pivotal role of analogical reasoning as a foundational element of human cognition. Consequently, it is not surprising that analogy has been recognized as a core component of linguistic competence from the earliest times (Blevins & Blevins, Reference Blevins, Blevins, Blevins and Blevins2009). In linguistics, the mechanism of analogy has been conceptualized as a principle governing regularities in language, mostly behind morphological regularization (Anttila, Reference Anttila1977; Hock, Reference Hock, Brain and Richard2003, among others). However, this process has gained renewed attention within usage-based and constructionist approaches, which regard analogy as a fundamental mechanism behind language development and productivity (cf. Behrens, Reference Behrens, Hundt, Mollin and Pfenninger2017 for a systematic review).
As already mentioned, analogical reasoning involves a structural mapping process, wherein parallel structures are aligned to draw inferences about the less familiar structures based on knowledge derived from a more familiar counterpart. Due to the intrinsic relational nature of language, it is possible to align and map identical or similar linguistic structures to which speakers have been exposed (Holyoak, Reference Holyoak2012). Consequently, introducing a novel element within a linguistic construction requires a great deal of relational or structural alignment knowledge and substantial similarity to existing elements, thereby departing from a strong rule-governed view of productivity (cf. Table 3). The commonalities of the two structures could also generate an abstraction (or generalization) of these patterns. The following pages explore previous approaches to analogy-based productivity.
Table 3 Analogy versus rules

| | Analogy | Rules |
| --- | --- | --- |
| Pattern usage | Relies on stored constructions or lexical items | Abstract and independent of specific instances |
| Productivity | Influenced by the number of participant items they can apply to; productivity seen as gradient | Determined by “default” status (i.e., used in typical situations); rules viewed as either productive or unproductive |
| Relation to existing types | Highly influenced by existing types | Applies to entire categories without consideration for their individual items |
| Probabilistic nature | Probabilistic – individual types may vary in their closeness to the best exemplars of a category | Discrete – conforms to a rule or does not |
| Relation to meaning | Constructions relate meaning to form and are grounded in linguistic and extralinguistic contexts | Typically viewed as purely syntactic, no inherent connection to meaning |
According to the constructionist view, language is a large repository of constructions with several levels of abstraction, from specific expressions to general patterns (Dąbrowska, Reference Dąbrowska2017). However, novel utterances also occur. The question, then, is what role our ample stored knowledge plays in language productivity (and creativity, if possible). The argument proposed is that, even when creating novel expressions, people exploit the similarity with already encountered, stored sequences: Some new expressions are “more similar” to existing prefabs, and some are more remote from them. The term analogy is then used to identify the mechanism that, given a new sequence, determines the pattern that best serves as a foundation on which a speaker might articulate new linguistic forms (Ambridge, Reference Ambridge2020a; Bybee, Reference Bybee2010; Diessel, Reference Diessel2019). According to this view, the organization and productivity of language are the result of analogies between form and/or meaning in a structured inventory of constructions (Ibbotson, Reference Ibbotson2013), or, in other words:
Analogy is the process by which novel utterances are created based on previously experienced utterances.
The notion of analogy as a mechanism driving productivity has been explored in various linguistic domains.
Several examples are found in morphology, where substantial evidence suggests that new formations consistently rely on similarity to existing exemplars (Bybee, Reference Bybee2010). Among others, a great deal of research has focused on an apparently simple morphological phenomenon: English past-tense marking. The general idea of these works is that speakers do internalize rules, but these rules are few and cover only regular processes; the remaining patterns are attributed to analogy (Pinker & Prince, Reference Pinker and Prince1988). A series of studies involving acceptability judgments and production, conducted with both adults (Albright & Hayes, Reference Albright and Hayes2003) and children (Ambridge, Reference Ambridge2010; Blything, Ambridge, & Lieven, Reference Blything, Ambridge and Lieven2018), revealed that both the acceptability and the likelihood of producing “regular” past-tense forms for a novel verb (e.g., wiss, bredged, chooled, daped) are determined by the phonological similarity of the verb to existing stored “regular” past-tense forms (e.g., wissed being similar to missed, hissed, and wished; cf. Ambridge, Reference Ambridge2020a, p. 520). The same pattern holds for “irregular” forms (e.g., flept being similar to slept, wept, and crept), as also observed by Bybee and Moder (Reference Bybee and Moder1983).
In word formation, analogy is described as the process of creating a new word, patterned after an existing one, such as the formation of software after hardware (“surface analogy”; cf. Mattiello, Reference Mattiello2017). Of the various word formation processes, compound words represent one of the most studied phenomena, as the productivity of compounding is considerably greater than other word formation processes. Recently, Mattiello (Reference Mattiello2016, Reference Mattiello2017) studied novel analogical compounds in English, where words are formed either through a specific model (e.g., beefcake after cheesecake) or a schema model (e.g., green-collar after white-collar, blue-collar, pink-collar, and other similar compounds).
There is also a large amount of literature in psycholinguistics that has investigated the production, representation, and processing of compounds by means of analogical inferences (cf. Krott, Reference Krott, James and Blevins2009 for an introduction). For instance, Coolen, Van Jaarsveld, and Schreuder (Reference Coolen, Van Jaarsveld and Schreuder1991) suggest that “the interpretability of isolated novel compounds may be determined by the availability of lexicalized compounds that can serve as a model for the interpretation” and the “[r]elations within these lexicalized compounds may be among the first ones that are considered in the interpretation process” (p. 350). Research by Christine Gagné and colleagues (see, for example, Gagné, Reference Gagné2001; Gagné & Shoben, Reference Gagné and Shoben1997) highlights that the frequency of how compound constituents are used in existing compounds with similar interpretations affects how a new compound is interpreted. For example, when presented with the novel compound honey soup (interpreted as “a soup made of honey”), people read it faster if it is preceded by a compound with the same relation, like honey muffin (“a muffin made of honey”) compared to when it is preceded by a compound with a different relation, such as honey insect (“a honey made by insect(s)”). Notably, this phenomenon occurs only when the prime and target compounds share a common modifier (Gagné, Reference Gagné2001).
Behavioral evidence points toward a model of compounding that is far from classic rule-based morphology: “New compound words rarely are formed de novo from two independent words. Rather, they are created through a process of partial analogy in which one element of an existing compound is exchanged” (G. Libben, Reference Libben2014, p. 15). Overall, the compounds seem to invite individuals to search their memories for experiences with exemplars of the referents being identified by a compound and to create a new, ad hoc, category for it on the spot based on the examples found in memory (Chandler, Reference Chandler2017).
At the construction level, some works in CxG have presented evidence about the importance of prior constructions in producing novel combinations; that is, the meaning of a new sequence can derive from an extension of the meaning of a learned construction (Diessel, Reference Diessel2019; Goldberg, Reference Goldberg2019; Hilpert, Reference Hilpert2019). For instance, Boas (Reference Boas2003, pp. 260–284) uses the term “analogical creativity” to argue that the creative use of verbs in novel syntactic contexts could be explained by “item-based analogy,” driven by local similarities between particular verbs. For example, given the sentence “She sneezed the napkin off the table,” a speaker can associate the resultative meaning stemming from the conventionalized [NP V NP XP] syntactic frame of blow (the source) with sneeze (the target). This position is in line with Goldberg’s view on the productivity of constructions, which emphasizes, among other factors, the impact of type frequency and the coherence of a constructional schema (Goldberg, Reference Goldberg2019). Let us consider another example. The meaning of the Ditransitive construction is closely connected with “transfer of possession” as in John gave Mary a goat. Metaphorical extensions of this pattern, such as John gave the goat a kiss or even Cry me a river, are understood by analogy to the core meaning of the construction from which they were extended, which in the case of the ditransitive is something like “X causes Y to receive Z” (Goldberg, Reference Goldberg2006). Goldberg (Reference Goldberg2019, pp. 63–64) directly refers to the theory proposed by Gentner and colleagues: “The formal surface regularities of constructions invite learners to seek other types of regularities across exemplars, through a process of structural alignment, which we recall involves relating two (or more) distinct relational structures.”
In addition, Goldberg stated that by aligning the abstract relational structure of, for instance, I love you and You want a cookie, the shared relational structure, “animate entity experiences attitude toward something,” becomes more salient, and the individual differences also stand out (e.g., a pronoun object versus a lexical noun phrase object) (Goldberg, Reference Goldberg2019, p. 64).
To summarize, the usage-based constructionist approach relies on the fact that learners attend to and retain aspects of both the form and interpretation of utterances. This assumption leads to clustering the instances of constructions in the hyper-dimensional space we use to represent language so that more general constructions can emerge. In other words, the process of aligning exemplars relies on both formal properties and the meaning of the exemplars.
A final consideration that must be accounted for regards the risk of approaching language productivity as only and exclusively determined by stored exemplars and analogy. Indeed, the notion that we store direct linguistic experience and use it to understand a novel expression is comparable to exemplar models of language. Exemplar-based approaches provide both a model of how language is represented and how learning and using language takes place.
Combining exemplar models with the mechanism of analogy as the driving process of productivity can lead, in its most extreme version, to a complete repudiation of any form of abstraction, considering concrete (i.e., experienced) exemplars as the only stored elements. According to Ambridge (Reference Ambridge2020a), forms of which one has never had direct experience are produced and understood through on-the-fly analogical processes with respect to multiple stored exemplars and weighed according to their degree of similarity to the new instance, without reference to any kind of abstraction (such as the concepts of [VERB] [NAME], [SUBJECT], etc.). Chandler (Reference Chandler2017) has supported a similar position (p. 81):
[O]ur knowledge of linguistic categories, and perhaps of language more generally, does not consist of resident linguistic generalizations, a grammar, that have been abstracted away from our experiences with exemplars of linguistic usage. Instead, the phenomena of categories and of categorization appear to be better explained by positing a mechanism and a set of procedures by which we compare current instances of linguistic usage systematically to memories for previous instances of similar usages in order to arrive at a formulation or interpretation of the new instance on the fly.
However, the ensuing discussion (see Ambridge, Reference Ambridge2020b) showed the profound difficulties with such a view; namely, abstraction is necessary for the psychologically realistic storage of linguistic experiences. The version proposed by Goldberg in the CxG framework is more realistic than a radical exemplar-based view of language. While we can affirm that we memorize a great many linguistic units, human brain architecture is shaped to generalize from single items of experience: “language processing requires that our brains recode and compress incoming information. Thus memory traces of experiences, no matter how vivid, are partially abstracted from our experience” (Goldberg, Reference Goldberg2019, p. 16). In that sense, Ambridge’s radical exemplar model is not coherent with our cognitive architecture.
The core assumptions of exemplar-based approaches can be summarized as follows (adapted from Kaplan, Reference Kaplan2017):
Concrete instances, not abstract concepts: Linguistic knowledge is not founded on abstract generalizations; rather, it is rooted in a multitude of specific linguistic encounters or exemplars.
Exemplars as structured entities: Exemplars consist of rich linguistic and extralinguistic information recorded from experience.
Emergence of grammar from exemplar clusters: Categories and grammatical units can emerge from the experience recorded in memory. Exemplars are categorized by similarity to one another, showing prototype effects; generalizations about words and grammatical categories thus arise from the central tendencies within the clusters of exemplars associated with them.
4.3 Analogy in Computational Models of Language
Several approaches within the domain of artificial intelligence (AI) have been proposed to replicate abstraction and analogy-making in computational systems. These strategies range from earlier symbolic or hybrid approaches, like Gentner et al.’s structure-mapping approach and the “active symbol” approach of Hofstadter and colleagues (Hofstadter, Reference Hofstadter and Hofstadter1985; Hofstadter & Mitchell, Reference Hofstadter, Mitchell, Holyoak and Barnden1994), to recent techniques employing deep neural networks and probabilistic program induction (see M. Mitchell, Reference Mitchell2021 for a complete review). Among others, structure mapping theory has been translated into computational form through the Structure Mapping Engine (SME) (Falkenhainer, Forbus, & Gentner, Reference Falkenhainer, Forbus and Gentner1989; Forbus et al., Reference Forbus, Ferguson, Lovett and Gentner2017). Structure Mapping Engine’s input consists of descriptions of two entities or situations, a base and a target, each consisting of a set of logical propositions. This model primarily focuses on the mapping process of analogy-making; thus, the situations to be mapped have already been represented in a logical form. Structure Mapping Engine provides a domain-independent explanation of analogy-making, concentrating on mapping the structure or syntax of its input representations rather than delving into domain-specific semantics. The challenge arises from the fact that human mental representations of real-world situations (including linguistic knowledge) are typically not as rigidly segmented as required by this architecture.
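The flavor of SME's input and output can be conveyed with a deliberately simplified Rutherford example: base and target are encoded as sets of relational propositions, relations are aligned by predicate identity, and an unmatched base relation is projected as a candidate inference. This toy alignment only illustrates the representational format; it does not implement SME's actual algorithm, which scores systematicity over nested relational structure.

```python
# Simplified illustration of structure-mapping-style input: base and target
# are sets of relational propositions (predicate, arg1, arg2). Relations are
# aligned by predicate identity, ignoring object attributes. This is NOT the
# actual SME algorithm, which also evaluates higher-order relational structure.

base = {   # solar system
    ("revolves_around", "planet", "sun"),
    ("more_massive_than", "sun", "planet"),
    ("attracts", "sun", "planet"),
}
target = {  # atom
    ("revolves_around", "electron", "nucleus"),
    ("more_massive_than", "nucleus", "electron"),
}
attributes = {("yellow", "sun"), ("hot", "sun")}  # ignored by the mapping

def align(base_rels, target_rels):
    """Pair base and target propositions sharing the same relation,
    inducing object correspondences (e.g., sun -> nucleus)."""
    mapping = {}
    for pred, b1, b2 in base_rels:
        for pred_t, t1, t2 in target_rels:
            if pred == pred_t:
                mapping[b1], mapping[b2] = t1, t2
    return mapping

correspondences = align(base, target)
print(correspondences)  # object correspondences: planet -> electron, sun -> nucleus

# Candidate inference: a base relation with no target counterpart is projected
# onto the target via the correspondences ("the nucleus attracts the electron").
for pred, a, b in base:
    if a in correspondences and b in correspondences and \
       (pred, correspondences[a], correspondences[b]) not in target:
        print("inferred:", (pred, correspondences[a], correspondences[b]))
```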
In the linguistic domain, at least three exemplar-based computational models have been proposed to replicate analogy in the realm of phonological, morphological, and lexical usage: Nosofsky’s Generalized Context Model (Nosofsky, Reference Nosofsky1990), Daelemans’s Tilburg Memory Based Learning Model (Daelemans & Van Den Bosch, Reference Daelemans and Van Den Bosch2010), and Skousen’s Analogical Model (Skousen, Reference Skousen1989). These models share the assumption that we recognize and interpret the significance of present experiences by directly comparing them with memories of past experiences. This approach involves specific memories rather than schematized representations abstracted from those collective experiences. Consequently, each of the three exemplar-based models suggests a continuous accumulation of rich sensory memories over an individual’s life, along with a procedure for comparing the sensory input of a current experience with the stored representations of one or more of those earlier experiences (Chandler, Reference Chandler2017).
Analogical Modeling (AM) (Skousen, Reference Skousen1989, Reference Skousen1992) seems to be fully compatible with our present comprehension of the psychological abilities and functions governing categorization behavior (Chandler, Reference Chandler2017). This model is based on the idea that past linguistic experiences are stored within the mental lexicon. When the need arises to analyze certain linguistic behaviors (such as pronunciation, morphological relationships, words, etc.), the lexicon itself is accessed. The process involves searching for the stored exemplars that closely resemble the one whose behavior is being predicted. Typically, the behavior of highly similar stored entities predicts the behavior of the one in question, although less similar ones also have a small probability of being applicable.
In detail, AM comprises three main components: (i) the dataset, which consists of exemplars accumulated in long-term memory and used as a basis for performing analogical operations on a current target form; (ii) the core of the AM, an algorithm designed to select from the dataset the exemplar(s) serving as the basis for analogically interpreting the target form; and (iii) decision rule(s) used to choose one or more forms in the analogical set, determining the basis for the analogical interpretation of the target form. In the process of classifying a target form, AM takes into account all exemplars that share certain features with the target (supracontext). However, only those that do not add uncertainty to the classification (homogeneity) are chosen for the final list, namely, the analogical set. Ambridge (Reference Ambridge2020a, p. 521) provides an example of how AM works:
For example, if the target is the novel verb chool (from Albright & Hayes, Reference Albright and Hayes2003), the analogical set contains choose (→ chose) and chew (→ chewed), which narrow down the choice of classification (to either chool → chole or chool → chooled). It does not include, for example, cheat, check, cheer, poop, puke or boot because, although each shares one or more features with the target, they serve only to increase uncertainty regarding classification.
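The quoted example can be mimicked, in a very rough way, by a similarity-weighted vote over stored exemplars. The sketch below deliberately omits AM's actual machinery of supracontexts and homogeneity and uses a crude overlap measure over word-final letter sequences; the exemplar list and encodings are invented for illustration.

```python
# Highly simplified illustration of exemplar-based classification in the
# spirit of Analogical Modeling: a novel form is classified by comparing its
# (crudely encoded) phonological shape to stored exemplars. This omits AM's
# supracontexts and homogeneity and simply weights exemplars by overlap.

exemplars = {
    # stored verb: (final letters, past-tense pattern)
    "miss":  ("iss", "regular"),       # missed
    "wish":  ("ish", "regular"),       # wished
    "hiss":  ("iss", "regular"),       # hissed
    "sleep": ("eep", "vowel-change"),  # slept
    "weep":  ("eep", "vowel-change"),  # wept
}

def overlap(a, b):
    """Count shared trailing characters between two endings."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

def classify(novel_ending):
    """Vote for a past-tense pattern, weighting each exemplar by its overlap."""
    votes = {}
    for _, (ending, pattern) in exemplars.items():
        votes[pattern] = votes.get(pattern, 0) + overlap(novel_ending, ending)
    return max(votes, key=votes.get), votes

print(classify("iss"))  # a novel verb like "wiss": the regular pattern wins
print(classify("eep"))  # a novel verb ending in -eep: the vowel-change pattern wins
```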
While these models have allowed linguists to test the claims and implications of the exemplar approach explicitly against both observationally and experimentally obtained data, none of the three exemplar models has yet been applied to the incremental syntactic interpretation of word strings. Recently, Chandler (Reference Chandler2020), in a commentary on Ambridge (Reference Ambridge2020a), illustrated how to apply AM incrementally to an ambiguous sentence such as They fed her dog biscuits, arguing that a similar computational model could test Ambridge’s hypothesis empirically (p. 571). Within the field of language acquisition, Bod (Reference Bod2009) demonstrated how a computational learning algorithm is able to employ structural analogy in a probabilistic way. This process mimicked children’s language development, going from item-based constructions to abstract constructions, even simulating some errors witnessed in children producing complex questions.
A final consideration regards how the mechanism of analogy can be modeled in Distributional Models (introduced in Section 2.3). Word analogies have been used as a standard intrinsic evaluation task for measuring the quality of word embeddings (O. Levy & Goldberg, Reference Levy and Goldberg2014; Linzen, Dupoux, & Goldberg, Reference Linzen, Dupoux and Goldberg2016; Mikolov, Yih, & Zweig, Reference Mikolov, Yih and Zweig2013) and sentence embeddings (Ushio et al., Reference Ushio, Espinosa Anke, Schockaert and Camacho-Collados2021; Wang, Daille, & Hathout, Reference Wang, Daille and Hathout2021; Zhu & de Melo, Reference Zhu and de Melo2020). The task is usually defined as candidate retrieval: Given a pair of words (Tokyo, Japan) and a third word (Paris), the goal is to identify the underlying relation behind the first pair (IS THE CAPITAL OF) and to find, from a list of candidates, the correct completion that solves the analogy (France, in this case). However, a different perspective has emerged in the last few years from work on visual analogy and deep learning techniques. Researchers in computer vision (Ichien et al., Reference Ichien, Liu and Fu2021; Reed et al., Reference Reed, Zhang, Zhang, Lee, Cortes, Lawrence, Lee, Sugiyama and Garnett2015; Sadeghi, Zitnick, & Farhadi, Reference Sadeghi, Zitnick, Farhadi, Cortes, Lawrence, Lee, Sugiyama and Garnett2015; Upchurch, Snavely, & Bala, Reference Upchurch, Snavely and Bala2016) have built deep-learning neural network architectures that organize visual representations in the same way as distributional semantic vector spaces organize linguistic data. These works consist in recognizing a visual relationship between two images and generating a transformed query image accordingly. Among others, Reed et al. (Reference Reed, Zhang, Zhang, Lee, Cortes, Lawrence, Lee, Sugiyama and Garnett2015) developed a novel deep network trained to perform visual analogies, transforming an image in the same way shown by a pair of example images. For instance, given a 3D image of a car in a frontal pose and its left-rotated version, the network should replicate the same rotation for another object, for example, a truck. What is relevant for language is that this model is directly trained on the objective of analogy completion, that is, it generates an appropriate image to make a valid analogy. Recently, Rambelli et al. (Reference Rambelli, Chersoni, Blache and Lenci2022) proposed a neural network simulating the construction of phrasal embeddings as an analogical process, taking inspiration from word embeddings and computer vision techniques. The authors proposed an analogical model to create the distributional embeddings of new expressions by applying a variant of Reed et al.’s network and evaluated different architectures in terms of generalization and systematicity.
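For word embeddings, the standard evaluation setup relies on the vector-offset method popularized by Mikolov and colleagues, which can be sketched as follows. The three-dimensional vectors below are hand-made toys rather than trained embeddings, and the vocabulary is restricted to a handful of entries for illustration.

```python
# Sketch of the vector-offset method for word analogies
# (Tokyo : Japan :: Paris : ?). The vectors are hand-made toys, not trained
# embeddings; real evaluations use word2vec/GloVe-style vector spaces.
import numpy as np

vecs = {
    "Tokyo":  np.array([0.9, 0.1, 0.8]),
    "Japan":  np.array([0.9, 0.9, 0.8]),
    "Paris":  np.array([0.1, 0.1, 0.2]),
    "France": np.array([0.1, 0.9, 0.2]),
    "Berlin": np.array([0.5, 0.1, 0.4]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve(a, b, c, vocab):
    """Return the word x maximizing cos(x, b - a + c), excluding a, b, c."""
    query = vecs[b] - vecs[a] + vecs[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(vecs[w], query))

print(solve("Tokyo", "Japan", "Paris", vecs))  # expected completion: "France"
```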
To conclude, current computational models of analogical inference in language are still rather rudimentary, and we are nowhere near possessing a model that captures not only the statistical abilities of speakers but also their preferences and limitations.
4.4 Summary
The present section introduced the cognitive mechanism of analogy in the debate on mechanisms of language processing. First, it delineated the main characteristics of analogy. While a comprehensive review of analogy in cognitive science might appear excessive, the primary objective is to inform the reader about the true nature of analogy and the specific characteristics of this cognitive process. Indeed, a thorough description of the mechanisms of analogy could be beneficial for those who aim to incorporate analogy into a linguistic model.
Subsequently, it illustrated the usage-based rationales that support the proposition that analogical processing constitutes the basis of the human ability to create novel utterances. Specifically, the main assumption is that the interpretation of a novel linguistic expression can be derived from stored word sequences, which function as analogical bases. While this perspective has been successfully adopted in recent studies of language acquisition, we agree with Bybee’s thesis that analogy can also apply to adult production to account for novel utterances. The notion of analogy as a mechanism behind language processing has profound implications for linguistic theory, as it attenuates the role of compositionality. For instance, a transparent expression like reading a papyrus could be understood by analogy with the stored, frequent expression reading a book (Rambelli et al., Reference Rambelli, Chersoni, Blache and Lenci2022) instead of by assembling the meanings of the lexemes (which is, however, still possible). Consequently, examples of language productivity could be explained by analogical inferences rather than by sequential compositional operations. Which mechanism (combinatorial or analogical) operates at a given moment remains an open question that the author is still interested in exploring.
The section also included a review of analogical models to underscore the necessity of implementing a computational model designed to process sentence-level constructions, raising the question of how to incorporate analogical mechanisms into this architectural framework.
5 Rethinking Compositionality: A Constructionist Perspective
As we discussed at the beginning of this Element, compositionality is perhaps one of the most potent and well-defended tenets in theoretical linguistics, and for good reason. Indeed, compositionality explains why it is possible to easily comprehend the meaning of a new sentence. Generative tradition has operationalized this concept starting from the assumption that sentences are generated by syntax; thus, semantic composition must follow syntactic composition in every step, from combining individual words into phrases to combining phrases into a sentence (Dowty, Reference Dowty, Barker and Jacobson2007; Szabó, Reference Szabó, Hinzen, Werning and Machery2012). In that sense, the interpretation of a sentence depends on the hierarchical syntactic structure alone, and representations formed during this process are accurate, precise, and detailed (cf. Section 2).
Conversely, usage-based constructionist approaches reject the strict view of compositionality (or simple composition, cf. Culicover & Jackendoff, Reference Culicover and Jackendoff2006; Jackendoff, Reference Jackendoff1997), focusing more on (i) the cognitive abilities involved in compositionality and (ii) the idea that compositionality is just one aspect of the diverse mechanisms of combinatoriality in human language and cognition (Pleyer, Lepic, & Hartmann, Reference Pleyer, Lepic and Hartmann2022). These approaches often assume that compositionality is not a singular concept explaining all forms of meaningful combination in a communication system; rather, human language also includes noncompositional mechanisms of combination. These positions are supported by behavioral evidence from the psycholinguistic literature: Language processing is often underspecified, linguistic information comes from different and heterogeneous sources that may vary depending on usage, and prediction is crucial for efficient language comprehension (cf. Section 3). Therefore, it is not necessarily true that syntactic structure fully determines meaning composition. Interpretation derives from the combination of bottom-up and top-down strategies, and precisely how syntax contributes to meaning composition is an empirical issue. Moreover, by defining the language system not as an innate faculty with its own rules but as a system governed by general cognitive processes, more mechanisms could underlie language comprehension, such as the analogical process (cf. Section 4).
In light of these observations, it becomes necessary to revisit the question: What is compositionality? In order to provide an accurate answer, it is imperative to distinguish between two conceptualizations of compositionality: First, as a property inherent to language (and cognition), and secondly, as a linguistic principle governing the aggregation of meaning from stored units into larger (and typically innovative) utterances. Moreover, the constructionist redefinition of compositionality as a processing principle carries significant implications for the construction of a model of language comprehension. Toward the conclusion of this section, a recent proposition in this regard will be examined.
5.1 Redefining Compositionality as a Property of Natural Language
A common assumption is that human thought and language are compositional by nature (Martin & Baggio, Reference Martin and Baggio2020). Posed this way, the claim implies that both the language of thought and natural language are intrinsically compositional. The relationship between language and thought is a vast and particularly controversial topic, and this Element does not aim to enter this debate. Generally, an influential assumption is that thought is largely prior to and independent of linguistic communication: It is the system of thought (semantics) that shapes language. The connection between language and thought was examined by Jerry Fodor, who claimed that only thought, and not language, is compositional. Addressing the question of which precedes the other – thought or language – he proposed that at least one of them must be compositional, and if only one is compositional, it is the one that has underived semantic content. Fodor suggested that if natural languages lack compositionality, their content derives from the content of thought (Fodor, Reference Fodor2001, p. 234).
From a different field of study, Christiansen and Chater (Reference Christiansen and Chater2016a) mentioned that “compositionality, function argument structure, quantification, aspect, and modality are properties of the thoughts that language may express” (p. 51). In this perspective, if thoughts are compositional, then language itself should be compositional. However, it seems more accurate to say that there is a “capacity” for compositional processing and representation in our mind, which is recruited and expressed in language (Baggio, Reference Baggio, Christiansen and Chater2020, p. 5). Analogously, the usage-based perspectives support the hypothesis that cognitive processes shape language. From this stance, the problem can be rephrased as follows: Just as we can aggregate concepts, we somehow apply this cognitive mechanism to linguistic processing.
This assumption yields a reformulation of what we mean when saying that “language is compositional.” For instance, Dowty (Reference Dowty, Barker and Jacobson2007) proposes to apply the term natural language compositionality “to whatever strategies and principles we discover that natural languages actually do employ to derive the meanings of sentences, on the basis of whatever aspects of syntax and whatever additional information (if any) research shows that they do in fact depend on” (p. 6). In other words, it is one of the possible strategies used to explain productivity, but it is not the only one. A different position is represented by Baggio’s work. The author still considers compositionality a backbone of language; however, his notion shares little with traditional Fregean compositionality. For instance, Baggio, Van Lambalgen, & Hagoort (Reference Baggio, Van Lambalgen, Hagoort, Hinzen, Werning and Machery2012) claimed that the real issue about compositionality and open-ended productivity in natural language is “the balance between storage and computation.” While the centrality of compositionality is diminished, it is evident that human languages have algorithms for building meanings from their parts (Călinescu, Ramchand, & Baggio, Reference Călinescu, Ramchand and Baggio2023). The open question is thus when this computational constraint applies and modulates language processing. One possibility is that “compositionality can often be rescued by increasing the demand on the storage component of the architecture, whereas it must be abandoned if one puts more realistic constraints on storage” (Baggio, Van Lambalgen, & Hagoort, Reference Baggio, Van Lambalgen, Hagoort, Hinzen, Werning and Machery2012, p. 18).
Research on ERPs during sentence processing has provided evidence that novel sentences evoke stronger N400 components in the ERP waveform than sentences composed of more expected combinations. The effect reveals the cognitive effort required to combine the meaning of a word with the current contextual meaning, suggesting that semantic memory stores a large amount of knowledge about event contingencies and concept combinations – the so-called realistic constraints on storage (Baggio & Hagoort, 2011). In other words: “[compositionality] is no longer a principle applying to language or to linguistic theory as a whole, but a computational constraint on one processing phase in the brain’s language system” (Baggio, 2021, p. 15).
In this context, explaining what compositionality is in language should benefit from studies about how systems in the brain realize meaning composition within the bounds of neurophysiological computation. Is the human brain, our computational device, compositional? The challenge is to identify cortical networks and neurophysiological events responsible for composition.
Among others, Hendriks (2020) reviewed the literature on the role of syntax in meaning composition, focusing on children’s acquisition of simple transitive sentences such as The car is pushing the boy. The major conclusion is that children’s production of subject–object word order in languages such as English appears to be ahead of their comprehension of that word order. In other words, syntax plays a lesser role (or perhaps a different role) than the one envisaged by the view of syntax–semantics relations in Formal Semantics and generative syntax. Syntactic structure does not fully determine meaning composition. Instead, syntax is merely one of the sources of information constraining meaning and does not have a special status. Conversely, Mollica et al. (2020) investigated how semantic computation can take place when the syntactic structure is not licensed by the language’s grammar. The authors introduced a novel manipulation aimed at investigating the neural responses to sentences in which word order is disrupted by increasing the number of local word swaps while maintaining local dependency relationships – that is, combinable words remain close to each other.
a. She left the museum and walked to her rooms to save money. (Intact)
b. She left the and museum walked to rooms to her save money. (3swaps)
Using fMRI, they observed that word order degradation did not decrease the magnitude of the blood oxygen level-dependent response in the language network, except when combinable words were placed so far apart that composition among nearby words became doubtful. In other words, even when the syntactic structure is violated, the language regions respond as strongly as they do to syntactically correct input, confirming that some form of composition still occurs. Given these results, the authors argue that “semantic composition,” defined as combining the meanings of the words in a sentence without strict syntactic parsing, is the core computation of the language network (Mollica et al., 2020, pp. 125–126).
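To make the manipulation concrete, the minimal Python sketch below degrades word order by applying a given number of swaps of adjacent words, so that combinable words stay close to one another, roughly in the spirit of the “3swaps” condition above. It is only an illustration of the general idea: Mollica et al. used a more controlled stimulus-generation procedure, and the function name and parameters here are invented for the example.

```python
import random

def local_swaps(sentence, n_swaps=3, seed=0):
    """Degrade word order with n_swaps exchanges of adjacent words.

    Adjacent swaps keep combinable words close to each other (local
    dependency relationships are largely preserved)."""
    rng = random.Random(seed)
    words = sentence.split()
    for _ in range(n_swaps):
        i = rng.randrange(len(words) - 1)  # pick a random adjacent pair
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

print(local_swaps("She left the museum and walked to her rooms to save money."))
```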
To conclude, we could argue that compositionality, in the Fregean sense, cannot be considered the core property of natural language. It remains true that compositional mechanisms exist at the cognitive and brain level, allowing the meanings of expressions to be aggregated. However, compositionality does not rely exclusively on syntactic parsing. In this sense, a reformulation of the classic representation proposed in the generative tradition is needed.
5.2 Redefining Compositionality as a Processing Principle
Any model of comprehension aims to explain how language is processed in real time. Specifically, the central question concerns how individuals construct the meaning of a sentence by accessing the meanings of its component lexical items and integrating those meanings into a coherent configuration. Two questions thus underlie any model of language comprehension: (i) what items are combined, and (ii) how these items are combined and integrated into a final, structured representation. As discussed in Section 2, the strong version of compositionality posits that interpretation is derived from the lexical meanings and the structure of utterances, with syntactic form guiding composition but contributing no semantic content of its own. However, CxG offers a different perspective. First, the semantic primitives of combination are not lexical items but constructions, that is, form–meaning pairs varying in schematicity and complexity. Some constructions have schematic slots that can be filled with other constructions, which in turn might have slots to be filled in. Moreover, some constructions are syntactic patterns associated with a specific meaning (e.g., the Ditransitive construction), so the interpretation of a sentence also depends on syntactic form. As a first, general stance, compositionality in CxG can be defined as follows: “By recognizing the existence of contentful constructions we can save the compositionality in a weakened form: The meaning of an expression is the result of integrating the meaning of the lexical items into the meanings of constructions” (Goldberg, 1995, p. 16).
Therefore, constructions combine freely to form actual expressions as long as they do not conflict (Goldberg, 2003, 2019). For example, a sentence like Liza sent storage a book is unacceptable because the ditransitive construction requires an animate recipient argument, while the word storage refers to an inanimate one (Goldberg, 2003, p. 10). This composition is not limited to inserting a lexical item into an argument structure construction; it extends to constructions of any complexity. On a more formal level, SBCG (cf. Section 2.2.3) offers a unification-based symbolic formalism describing the mechanism by which two signs (constructions) are compared to ensure their features do not conflict. If compatible, these signs are merged to form a new, unified sign that combines the attributes and values of the original signs. This unification process operates under constraints specified within the signs themselves, which thus constitute the language grammar. Even though different CxG approaches define the combination of constructions differently, one thing is clear: Constructions are the rules of compositionality. Syntactic, hierarchical structure with abstract representation does not play a role in comprehension: The specific properties of constructions (which can include surface form constraints such as a specific word order) are all that matter. In this sense, CxG frameworks adopt a “what you see is what you get” approach (Goldberg, 2003, p. 10).
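As an illustration of the unification idea, the following Python sketch treats signs as flat attribute–value dictionaries and blocks composition when a shared attribute takes conflicting values. This is only a toy: SBCG works with typed feature structures and richer constraints, and the specific attributes used here (such as animacy) are assumptions made for the example.

```python
def unify(sign_a, sign_b):
    """Merge two signs (attribute-value dicts) unless a shared attribute
    carries conflicting values, in which case composition is blocked."""
    merged = dict(sign_a)
    for attr, value in sign_b.items():
        if attr in merged and merged[attr] != value:
            return None  # feature clash: the signs cannot combine
        merged[attr] = value
    return merged

# Toy signs: the ditransitive recipient slot requires an animate filler.
recipient_slot = {"role": "recipient", "animacy": "animate"}
storage = {"lemma": "storage", "animacy": "inanimate"}
liza = {"lemma": "Liza", "animacy": "animate"}

print(unify(recipient_slot, storage))  # None -> *Liza sent storage a book
print(unify(recipient_slot, liza))     # merged sign -> acceptable filler
```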
However, another aspect to consider is that people come to the task of interpretation with a vast amount of shared world knowledge and context (Goldberg, 2015). Accordingly, semantic composition is constantly enriched (Jackendoff, 1997) with background knowledge and contextual constraints: The meaning of a sentence may be computed in a good-enough manner, using expectations instead of building a complete, accurate bottom-up analysis, sometimes leading to shallow interpretation or even misinterpretation (e.g., the renowned Moses illusion, cf. Section 3.2). Models that assume online processing relies on chunking support this hypothesis: Instead of building a syntactic structure serving as support for the comprehension of a sentence, they hypothesize a mechanism that incrementally builds chunks at all levels of linguistic structure as rapidly as possible, using all available information predictively to process current input before new information arrives (Blache, 2016; Christiansen & Chater, 2016b, among others). This perspective is shared by usage-based constructionist approaches: “virtually all linguistic expressions, when first constructed, are interpreted with reference to a richly specified situational context, and much of this context is retained as they coalesce to form established units” (Langacker, 1987, p. 455). The notion of compositionality should therefore account for different levels of semantic contribution, from constructional meaning to contextual information and world knowledge. However, most CxG formalisms lack a detailed formalization of how these sources of information are activated and how they contribute to the final interpretation.
A different perspective is offered by Baggio (2021), who retains a syntax-driven form of compositionality. According to his model of language processing, semantic representations may be generated by a syntax-driven processing stream and an “asyntactic” processing stream, either jointly or independently. This framework, with parallel streams for meaning and grammar, treats compositionality as a constraint on computation in the syntax-driven stream only. When complex natural language expressions have multiple meanings, and at least one of those meanings is solely a function of the meanings of the parts and their syntax, the language system can make predictions about upcoming linguistic input based on semantic constraints established by the material already processed. Compositionality is thus preserved, but as a specific constraint on computation.
While the centrality of compositionality has been largely minimized, it remains true that human languages possess algorithms for predictably constructing meanings from their parts. However, it is still unclear how the generativity of meaning should be modeled as a computational constraint that influences language processing and its outputs, even though not all complex meanings are equally governed by compositionality (Călinescu, Ramchand, & Baggio, 2023).
5.3 Compositionality, Analogy, and Productivity
Compositionality is often invoked to explain our ability to compose meanings into endlessly novel configurations. In linguistics, the question of productivity remains a central one: How can a speaker, who has been exposed to a few tens of thousands of sentences, become capable of understanding (and producing) a virtually infinite set of utterances?
The previous section introduced the CxG perspective: A speaker interprets a new sentence by relying on the composition of different constructions, which can be defined as “emergent clusters of lossy memory traces that are aligned within our high- (hyper!) dimensional conceptual space on the basis of shared form, function, and contextual dimensions” (Goldberg, 2019, p. 7). Moreover, the successful production (or reception) of an utterance depends on previously encountered linguistic expressions, and it is likely to slightly modify the linguistic knowledge stored in our long-term memory.
Many approaches to productivity in language assume that computation is called into service in order to avoid storage in memory; that is, it is often assumed that memory and computation stand in an inverse relationship for the sake of efficiency. The usage-based constructionist approach takes a quite different perspective. Partially abstracted from experience, exemplars are retained in memory as part of a rich network of knowledge. While we are not able to recall individual exemplars at will, given that their representations overlap with those of other exemplars, our knowledge of language is formed and continually affected by them. Language is extended creatively (involving new “computations”) not in order to reduce or avoid storage in memory, but in order to express new messages in ever-changing contexts.
This idea of productivity leads to a shift in linguistic description as well. The mechanism underlying language production is not an a priori set of rules but a force that dynamically changes previous inputs while generating novel outputs. A productive use of a construction is supported to the extent that the potential coinage falls within a densely covered existing cluster of cases that exemplifies the construction. When no conventional constructions are available to express an intended message in context, speakers must extend their existing constructions in novel ways. In the absence of conventional formulations, speakers rely on (combinations of) representations that are sufficiently effective for communication (Goldberg, 2019).
Therefore, the usage-based constructionist perspective allows constructional knowledge to be both remarkably specific and flexible (Goldberg, 2024). According to Goldberg (2019), if the cluster of lossy overlapping memory traces that constitutes a construction is very specific, the range of contexts in which it is observed will be narrow. In other words, when observed utterances share similar contexts of use, the resulting learned cluster will be correspondingly narrow and specific. However, even highly specific constructions are occasionally extended flexibly, as speakers must use constructions in constantly changing contexts to convey an open-ended range of messages (Goldberg, 2024).
A tenet of this Element is that a possible mechanism responsible for extending constructions to new cases is the cognitive process of analogy, the “core of cognition” (Hofstadter, 2001; cf. Section 4.2). Returning to the previously mentioned assumptions of Bybee (2010), analogy depends on similarity in form and meaning between constructions, whether these constructions are of a concrete type (as in collocations or fixed structures) or an abstract type: A novel instance is compared to those stored in our long-term memory to infer the new representation. In this perspective, the probability or acceptability of a novel item is gradient and depends on the extent of similarity to prior uses of a construction. In a more radical stance, Ambridge (2020a) proposed to dispense with abstraction entirely: Forms of which one has never had direct experience are produced and understood through “on the fly” analogical processes over multiple stored exemplars (i.e., concrete representations of experiences), weighted according to their degree of similarity to the new instance; comprehenders generalize via analogy to interpret and generate new linguistic experiences. This mechanism could be applied to the comprehension of entire sentences: The evolving syntactic structure of the new sentence emerges on the fly as it is compared to previously interpreted exemplars (Chandler, 2020). In that sense, the mental lexicon could be conceived as a “vast storehouse of triggerable analogies” (Hofstadter, 2001, p. 504): Every lexical expression, when used in speech (whether received or transmitted), could constitute one side of an analogy being made in real time in the speaker’s/listener’s mind.
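To give the gradient-similarity idea a concrete shape, the sketch below scores a novel item by its similarity to its closest stored exemplars, represented here simply as vectors. This is not a published model, merely a toy illustration under the assumption that forms can be encoded as vectors; the function names, the use of cosine similarity, and the choice of the k nearest exemplars are assumptions made for the example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vector encodings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogical_acceptability(novel_item, stored_exemplars, k=5):
    """Gradient acceptability of a novel item: mean similarity to its k
    closest stored exemplars of a construction (its 'analogical support')."""
    sims = sorted((cosine(novel_item, e) for e in stored_exemplars), reverse=True)
    return float(np.mean(sims[:k]))

# Toy run with random vectors standing in for lossy exemplar traces.
rng = np.random.default_rng(0)
exemplars = [rng.normal(size=50) for _ in range(200)]
novel = rng.normal(size=50)
print(analogical_acceptability(novel, exemplars))
```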
However, this assumption does not entirely endorse Ambridge’s radical vision in which there is no abstraction: Any kind of analogy is simply untenable without an abstract structure of some sort (Adger, 2020). As already introduced, even constructions are somehow abstracted from their specific instances, and some schemata are closer to syntactic patterns than to concrete expressions. The hypothesis that analogical mapping onto existing, lexicalized constructions is at the basis of productivity does not exclude a priori the existence of other processes that aggregate meaning. Analogy is one possible mechanism of meaning production within a broader array of cognitive mechanisms that not only contribute to productivity but also generate more creative expressions (e.g., conceptual blending; cf. Hoffmann, 2024). The central proposal is that productivity is explainable as a continuum: Sometimes a novel expression can be interpreted analogically from partially overlapping stored sequences, and sometimes it is the result of a bottom-up compositional computation (formalized as unification or in other ways).
In conclusion, while today’s CxG approaches offer a flexible framework for combining constructions, more effort should be devoted to a comprehensive description of the mechanisms underlying three different steps of language processing (adapted from Kleinschmidt & Jaeger, 2015):
1. How we “recognize the familiar,” that is, how we deal with previously experienced and stored aspects of language (lexicalized constructions, but also their integration with contextual knowledge);
2. How we “generalize to the similar,” that is, how we comprehend a novel situation based on similar previous (linguistic) knowledge, so as not to start from scratch each time a new situation is encountered (productivity); and finally
3. How we “adapt to the novel,” that is, how it is possible to adapt beyond what is expected based on previous experience (creativity).
Reduction of the role of Compositionality
Compositionality is no longer the undoubted principle applying to language or linguistic theory but a computational constraint on processing in the brain’s language system.
Productivity is adaptation, and adaptation is by (relational) similarity
Comprehension can be viewed as a process of retrieval and adaptation: We interpret linguistic stimuli by recovering the constructions in the semantic memory that best share relational features, and, in case these are not found, we infer (or adapt) the interpretation of the input by analogical inference.
5.4 Toward a Constructionist Model of Language Processing
Defining compositionality is not just a theoretical matter; it is a pressing need for developing a cognitive (and computational) model of language processing. The observations and reviewed literature aim to establish a common ground for designing a formal representation of constructions integrated into a usage-based computational model of language processing – specifically, of language comprehension. Although this is a relatively recent line of research, some works have proposed different hypotheses about bridging the gap between linguistic and psycholinguistic theory (Huettig, Audring, & Jackendoff, 2022; Lindes, 2022; Michel, 2023). Examples of frameworks that attempt this integration are presented below.
In terms of representation, Rambelli et al. (2019) provided the formal basis for a constructionist model of language processing. Specifically, they introduced a novel semantic representation of CxG, termed Distributional Construction Grammar (DCxG), which integrates constructions with the vector representations used in Distributional Semantics. The primary objectives of this theoretical proposal were twofold: (i) to offer a comprehensive representation of semantic information within the CxG framework, and (ii) to incorporate distributional vectors into the construction representation, thereby accommodating the more usage-based aspects of meaning (Busso, Pannitto, & Lenci, 2018; Lebani & Lenci, 2017; Levshina & Heylen, 2014; Perek, 2016, 2018). Specifically, each construction is represented as an attribute–value matrix following the Sign-Based CxG formalism (Sag et al., 2012). The sources of meaning are encoded separately in three components, which interact but can still be instantiated separately: constructional meaning; frames, that is, the schematic knowledge describing scenes and situations in terms of their semantic roles; and events, that is, the semantic information concerning particular event instances with their specific participants (McRae & Matsuki, 2009). All three components are associated with a distributional representation (Figure 4).
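The sketch below shows one way such a representation might be encoded, pairing symbolic attribute–value features with distributional vectors for the three components. It is a simplified illustration rather than the authors’ actual formalism; the field names, the example construction, and the vector dimensionality are assumptions made for the example.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ConstructionSign:
    """Toy attribute-value representation of a construction, pairing symbolic
    features with distributional vectors for the three sources of meaning."""
    form: str
    syn: dict                       # syntactic constraints (category, valence, ...)
    constructional_vec: np.ndarray  # meaning contributed by the construction itself
    frame_vec: np.ndarray           # schematic knowledge about the evoked scene
    event_vec: np.ndarray           # knowledge about specific event instances

DIM = 300  # assumed dimensionality of the distributional space
spill_cxn = ConstructionSign(
    form="X spills Y",
    syn={"cat": "clause", "args": ("subj", "obj")},
    constructional_vec=np.zeros(DIM),
    frame_vec=np.zeros(DIM),
    event_vec=np.zeros(DIM),
)
```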
Distributional Construction Grammar stands as one of the few attempts to establish a unified representation of grammar and meaning, grounded in the assumption that language structure and properties emerge from language use. As a linguistic representation, the model derives structures, properties, and meanings from the distributional statistics observed in text corpora. On the modeling side, the framework is designed to incorporate linguistic information and world knowledge into the semantic representation of linguistic input through an incremental and predictive process. Specifically, this representation has been developed as the basis of a computational semantic processing model founded on the interaction between the three informational structures and the mechanisms of meaning access: activation, similarity, and unification (Blache et al., 2023).
While the previous work mostly concerns representation, Blache (2024) uses a similar representation to propose a neurocognitive architecture of language processing, integrating constructions and the Memory, Unification, and Control model (MUC; cf. Hagoort, 2013, 2016), a general framework for sentence comprehension that aims at accounting for the balance between storage and computation (Baggio, Van Lambalgen, & Hagoort, 2012). Without going into the specific details, this framework relies on two mechanisms: prediction and unification. Prediction computes the most likely next sign (construction) given the context; it is always at work, and signs of any granularity can be predicted. While linguistic theories typically rely on a mechanism (derivation, constraint solving, etc.) that linearly aggregates objects of the same categorical level, here the linguistic objects used as basic components can be of any granularity (in line with CxG) and do not necessarily correspond to a category of the same level, meaning that the integration mechanism is no longer linear. Unification, on the other hand, is an operation that compares two structures, assesses their compatibility, and builds a resulting structure merging both. In the case of lexical access, unification is the controlling mechanism for identifying the matching entry. In the case of situation model updating, it implements nonlinear compositionality by integrating the current sign into a structure. This prediction–unification model brings facilitation mechanisms and classical incremental processing together within a single architecture, thanks to a single processing cycle based on the integration of complex multilevel structures. In addition to explaining how facilitation mechanisms can be integrated, the model also offers a new view of the two different ways of building meaning: compositionality and direct access. In this approach, the two mechanisms differ in only one respect: the granularity of the signs to be integrated into the situation model. Meaning is always built compositionally, but this can correspond either to a word-by-word incremental mechanism (the classical view of the compositional principle) or, at the other extreme, to the integration of entire, large pieces of meaning.
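A minimal sketch of such a processing cycle is given below, reusing the toy attribute–value unification from the earlier example. It illustrates the general prediction–unification logic rather than Blache’s implementation; the candidate signs, their scores, and the feature names are assumptions invented for the example.

```python
def unify(a, b):
    """Merge two feature dicts unless a shared attribute takes conflicting values."""
    merged = dict(a)
    for attr, value in b.items():
        if attr in merged and merged[attr] != value:
            return None
        merged[attr] = value
    return merged

def processing_cycle(situation_model, input_sign, predictions):
    """One toy prediction-unification cycle: candidate signs predicted from the
    context (of any granularity) are tried in order of likelihood; the first
    one that unifies with the input is integrated into the situation model in
    a single step, however large it is."""
    for _, candidate in sorted(predictions, key=lambda p: p[0], reverse=True):
        merged = unify(candidate, input_sign)
        if merged is not None:
            updated = unify(situation_model, merged)
            return updated if updated is not None else situation_model
    return situation_model  # nothing unified: the model is left unchanged

# Toy run with invented features: a predicted 'spill' construction absorbs the input.
predictions = [
    (0.7, {"cxn": "spill-transitive", "patient_type": "substance"}),
    (0.3, {"cxn": "drink-transitive", "patient_type": "liquid"}),
]
model = {"agent": "child"}
print(processing_cycle(model, {"lemma": "milk", "patient_type": "substance"}, predictions))
```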
The presented models are far from being a complete representation of language processing, and other works could propose different versions of composition (instead of unification) or the alternation of separate cognitive processes, such as analogy, conceptual blending, and so on (Goldberg & Ferreira, 2022; Hoffmann, 2024; Rambelli et al., 2022). In conclusion, new efforts in CxG should be oriented toward integrating construction formalisms into an architecture that combines the various processing mechanisms observed in the cognitive and neurolinguistic literature. To correctly implement a model of language comprehension, two aspects should be explicitly defined: (i) which type of information resides in the lexicon and in long-term memory, and how such information is represented, and (ii) which principles guide the constraint-based unification that also produces potentially novel combinations. On top of this, such a model should specify how the most creative expressions are interpreted (in contrast with merely productive sentences).
6 Concluding Remarks
This Element has investigated issues at the very foundation of language, issues that are epistemologically complex. The five sections synthesized extensive literature on compositionality and language processing across theoretical, experimental, cognitive, and computational linguistics. On one side, the Element examined the problem of compositionality, designating both the principle formalized in traditional linguistic theories and the broader cognitive ability observed at the brain level. On the other, it portrayed a broad scenario of language processing in which interpretation is shallow, indeterminate, and often driven by contextual expectations and our preexisting linguistic and world knowledge.
However, the central claim of this Element is that, alongside compositionality, there is another mechanism accounting for language productivity: the cognitive process of analogy. I outlined the key features of analogical reasoning as understood in cognitive studies and explored the nature of linguistic analogy to support the proposal that analogical processing underpins the human capacity to create new utterances. Throughout the Element, different observations supported the original hypothesis: CxG appears to be the linguistic theory best suited to characterizing the psycholinguistic evidence about language.
While this work aims to bring together diverse sources from various domains to present a comprehensive view of the complexity behind language comprehension, a considerable amount of work remains to integrate these claims into a unified model of language representation and processing. Above all, some aspects of the formalizations proposed in different constructionist approaches still need to be clarified and require future research. According to Contreras Kallens and Christiansen (2022, p. 10), a crucial step toward rendering CxG a fully adequate linguistic formalism involves “providing an account of what constructions at different levels of abstraction mean, and how that meaning can be acquired through linguistic experience.” Moreover, future efforts should seek to formulate an overarching theory of language comprehension, in which input categories of varying granularity (words or constructions) possess a single representation but engage different mechanisms for accessing meaning (either through composition or direct access).
Another question concerns the coexistence of these different routes to meaning and, specifically, the role of analogy. While usage-based theories widely adopt this theoretical concept, to the best of the author’s knowledge there is no work identifying when analogies take place during language comprehension. The difficulty lies not only in recognizing the occurrences and their timing but also in the absence of resources enabling a comprehensive study of these phenomena. Consequently, the transition from one mechanism to another remains a challenging question in terms of modeling.
In summary, the primary purpose of this Element is to illustrate what it means to rethink a linguistic theory that considers both traditional compositionality and behavioral observations. Today’s challenge is developing linguistic (and computational) models that could address compositional and noncompositional aspects of meaning, using reasonable definitions of compositionality that formally and empirically make the principle nontrivial or nonvacuous. The author hopes that this Element will serve as a valuable resource for students and researchers interested in developing a linguistic architecture capable of modeling the cognitive and linguistic mechanisms involved in sentence interpretation. Future research efforts should move toward delineating a computational model that would integrate formal linguistic theories, usage-based contextual information, and psycholinguistic findings to provide a comprehensive understanding of language comprehension.
Acknowledgments
I thank Alex Bergs and Thomas Hoffmann, the series editors, for having faith in me and for providing support and positive feedback in the reviewing months. I’m also thankful for the comments by an anonymous reviewer, which helped me sharpen the focus of this manuscript.
This Element comes from a part of my PhD thesis, which I could not have done without the support and guidance of my supervisors, Alessandro Lenci and Philippe Blache. I am also grateful to all the people who reviewed my original thesis, Professors Giosué Baggio, Florent Perek, Adele Goldberg, and Aline Villavicencio. They have inspired me with their research, and I was honored they evaluated mine. I also thank Marianna Bolognesi for giving me the time to write these pages during my postdoc.
Thomas Hoffmann
Catholic University of Eichstätt-Ingolstadt
Thomas Hoffmann is Full Professor and Chair of English Language and Linguistics at the Catholic University of Eichstätt-Ingolstadt as well as Furong Scholar Distinguished Chair Professor of Hunan Normal University. His main research interests are usage-based Construction Grammar, language variation and change and linguistic creativity. He has published widely in international journals such as Cognitive Linguistics, English Language and Linguistics, and English World-Wide. His monographs Preposition Placement in English (2011) and English Comparative Correlatives: Diachronic and Synchronic Variation at the Lexicon-Syntax Interface (2019) were both published by Cambridge University Press. His textbook on Construction Grammar: The Structure of English (2022) as well as an Element on The Cognitive Foundation of Post-colonial Englishes: Construction Grammar as the Cognitive Theory for the Dynamic Model (2021) have also both been published with Cambridge University Press. He is also co-editor (with Graeme Trousdale) of The Oxford Handbook of Construction Grammar (2013, Oxford University Press).
Alexander Bergs
Osnabrück University
Alexander Bergs joined the Institute for English and American Studies at Osnabrück University, Germany, in 2006 when he became Full Professor and Chair of English Language and Linguistics. His research interests include, among others, language variation and change, constructional approaches to language, the role of context in language, the syntax/pragmatics interface, and cognitive poetics. His works include several authored and edited books (Social Networks and Historical Sociolinguistics, Modern Scots, Contexts and Constructions, Constructions and Language Change), a short textbook on Synchronic English Linguistics, one on Understanding Language Change (with Kate Burridge) and the two-volume Handbook of English Historical Linguistics (ed. with Laurel Brinton; now available as five-volume paperback) as well as more than fifty papers in high-profile international journals and edited volumes. Alexander Bergs has taught at the Universities of Düsseldorf, Bonn, Santiago de Compostela, Wisconsin-Milwaukee, Catania, Vigo, Thessaloniki, Athens, and Dalian and has organized numerous international workshops and conferences.
About the Series
Construction Grammar is the leading cognitive theory of syntax. The present Elements series will survey its theoretical building blocks, show how Construction Grammar can capture various linguistic phenomena across a wide range of typologically different languages, and identify emerging frontier topics from a theoretical, empirical and applied perspective.