1. Introduction
Pullum & Scholz (2003), Rogers (2003) and Pullum (2007, 2013a, 2020) draw a distinction between theories of syntax that are Generative-Enumerative (GES) and those that are Model-Theoretic (MTS). The former category includes classical Transformational Grammar (TG, the variously revised, extended, ‘standard’ theory, Chomsky 1965, 1977, 1981), most proposals under the ‘Minimalist Program’ (Chomsky 1995), Optimality Theoretic Syntax (OTS, Legendre et al. 2001), Tree-Adjoining Grammar (TAG, Joshi & Schabes 1992) and Combinatory Categorial Grammar (CCG, Steedman 2000). The MTS category includes (with some important caveats (Pullum 2013a: 501)) the following: Lexical-Functional Grammar (LFG, Bresnan 1982), Head-Driven Phrase-Structure Grammar (HPSG, Pollard & Sag 1994), and at least the related ‘sign-based’ version of Construction Grammar (CxG, Boas & Sag 2012), all of which are constraint-based (Pollard 1996). Pullum further argues that only the MTS theories can explain the fact that adults and children acquiring their first language can cope with such features of the input as neologism and gradient grammaticality in language use.
The present paper endorses the core MTS claim as consistent with the related principle of monotonicity in rules embraced by many GES approaches including CCG, TAG and (rather late in the day) Chomskyan Minimalism, as well as by constraint-grammars, which says that structure, once built, cannot be modified. It is an empirical question whether grammars of this constrained kind are more adequate linguistically when expressed in terms of constraints or generative rules. However, it does not follow that they differ with respect to the problems of unseen vocabulary or gradient grammaticality. Both can be handled with mechanisms that are quite standardly used by computational linguists to manage the profligate ambiguity of all natural languages.
2. Why Model Theory?
Model theory is most familiar as the device that connects the sentences of a logic to truth value, T/F, relative to a structure, or model M. For example, the models for first-order predicate logic (FOPL) are structures of (unary, binary, etc.) predicates holding over entities, such as farmer(Giles), donkey(Modestine), walks(Modestine) and owns(Giles, Modestine). The model theory is a small collection of axioms closely related to the syntax of the logic – for example, one that assigns sentences of the syntactic form ‘P & Q’ (where P and Q are themselves sentences of the logic) the value T in the model just in case both P and Q have the value T in the model, and F otherwise.
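The relation between a formula and a model can be made concrete with a minimal sketch. The encoding of formulae as nested tuples, and the names `MODEL` and `holds`, are assumptions made purely for illustration; only the predicates and entities come from the example above, and only conjunction is covered.

```python
# A toy model M: each predicate maps to the set of argument tuples it holds of.
MODEL = {
    "farmer": {("Giles",)},
    "donkey": {("Modestine",)},
    "walks": {("Modestine",)},
    "owns": {("Giles", "Modestine")},
}

def holds(formula, model):
    """Return True (T) or False (F) for a formula relative to the model.

    Formulae are tuples: ("and", P, Q) for 'P & Q', or
    (predicate, arg1, ...) for an atomic sentence. This encoding is
    an illustrative assumption, not a standard notation.
    """
    if formula[0] == "and":
        # 'P & Q' is T just in case both P and Q are T in the model.
        return holds(formula[1], model) and holds(formula[2], model)
    predicate, args = formula[0], tuple(formula[1:])
    return args in model.get(predicate, set())
```

On this sketch, `holds(("and", ("farmer", "Giles"), ("owns", "Giles", "Modestine")), MODEL)` comes out true, while replacing the second conjunct with `("walks", "Giles")` makes the conjunction false.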
Montague (Reference Montague, Hintikka, Moravcsik and Suppes1973) showed that the semantics of natural languages like English could be defined model-theoretically in similar terms over similar structures. (To the working linguist who pays any attention to semantics, this is probably the most familiar use of model theory.)
More radically, Lewis (Reference Lewis1970), Geach (Reference Geach1970) and Montague (Reference Montague and Visentini1970) suggested that the syntax of natural language, though more complex than that of FOPL, could be analyzed as actually isomorphic to a model-theoretic semantics. This suggestion was taken to varying degrees of literality by linguists such as Partee (Reference Partee1975), Bach (Reference Bach1979), Keenan & Faltz (Reference Keenan and Faltz1985), Jacobson (Reference Jacobson, Huck and Ojeda1987), Szabolcsi (Reference Szabolcsi and Szabolcsi1997) and the present author.
In apparent contrast, Chomsky has always defined the business of the syntactician as understanding the relation between the strings of the language(s) and sets of syntactic structures independently accessible to the intuition of native speakers, under the doctrine of the ‘autonomy of syntax’ from semantics and processing (Chomsky 1957, passim).Footnote 1
Model theories can be defined for other logics over other models. In particular, Pullum (Reference Pullum2013a) follows Rogers (Reference Rogers1998, Reference Rogers2003) in taking the tree-structures that a Chomskyan would view as associated a priori with the sentences of natural languages as models, rather than (or perhaps, as well as) structures that are interpretable in their own right in the Montagovian sense. For Pullum and Rogers, a sentence is grammatical if its tree satisfies (‘models’) a grammar defined as an unordered set of formulæ expressed in the language of weak Monadic Second-Order Logic (wMSO). A second-order logic is one in which you can quantify over predicates, as well as over individuals. It is monadic if second-order quantification is limited to unary predicates (equivalently sets or properties), such as donkey, farmer and walks. It is weak if the sets are finite.
WMSO logic is interesting because, unlike full second-order logic, it supports useful models including graphs (Hedman Reference Hedman2004: 388–392). As a consequence, it has many applications in the theory of programming languages (Libkin Reference Libkin2004). For Pullum and Rogers, it acts as a metatheory for the theory of natural languages in terms of their treesets.
It is perhaps not surprising that the treesets constituting natural language should be second-order. The trees reflect the way in which the language puts meaning representations together and are clearly second-order as languages in that sense. In particular, raising and control verbs, as well as generalized quantifiers and relative pronouns, are second-order functions specifying first-order properties (but no higher) as arguments (Chierchia 1985, Chierchia & Turner 1988). For example, ‘seem’ can be argued to denote a second-order predicate $ \lambda p\lambda y. seem\left(p,y\right) $ that combines to its right with a first-order predicate, such as ‘to be drunk’, and a subject to its left, such as ‘they’, to yield a sentence ‘They seem to be drunk’, with a meaning written seem(drunk, them). It is this level that defines the derivational treeset.
It is possible to regard the grammars of such languages as a logic in its own right, as Categorial grammarians of the ‘Type-Logical’ persuasion (Moortgat Reference Moortgat1988, Morrill Reference Morrill1994, Kubota & Levine Reference Kubota and Levine2020) do. This is, in fact, the view in other explicitly model-theoretic syntactic frameworks such as those of Michaelis et al. (Reference Michaelis, Mönnich, Morawietz, Kamp, Rossdeutscher and Rohrer2001), Mönnich (Reference Mönnich, Rogers and Kepser2007) and Graf (Reference Graf2010), who apply model-theoretic approaches to Chomskyan Minimalism, among other GES formalisms. However, such logics are tree-logics, and their semantics is ‘tree-conditional’, specified in terms of membership of the treeset of the language, rather than truth conditional. For the most part, as in the above case, the meaning representations that natural language derivations put together can be translated as expressions in a different, first-order, truth-conditional language, with a first-order model theory, chunks of which the derivation merely glues together.
Nevertheless, sentences like the following seem both syntactically and logically to involve second-order quantification over monadic properties $ p $ :
If so, the connection between MTS and model theoretic semantics is made.Footnote 2
However, Pullum further argues that, in order to reconnect with the nature of human language in use – and, in particular, its acquisition by children and its related resilience in adult speakers to gradient grammaticality, neologism and error in the input – MTS grammars should be defined as sets of constraints on the trees of the language, rather than as productions, as is standard in GES (Pullum Reference Pullum2013b).
This paper agrees that explanatory theories of syntax should be model-theoretically formulated if they are to follow Pullum’s call to re-engage with the psychological questions of the resilience of language in use, and of evolutionarily and cognitively realistic conditions on its acquisition, but argues that they need not be constraint-based in order to achieve those ends, widening the range of syntactic theories that should be recognized as model-theoretic.
3. MTS as Metatheory
Rogers (2003) shows that wMSO logics can be used to define a logic-based equivalent of the infinite hierarchy of increasingly expressive ‘Abstract Families of Languages’ identified by Weir (1988, 1992), including the well-known context-free languages (CFL), ‘Type 2’ in the original hierarchy of Chomsky (1956, passim). These systems, together with the CCG and TAG languages, are properly included in the lowest trans-context-free level among the multi-level hierarchy of sub-context-sensitive Linear Context-Free Rewriting Systems (LCFRS, Vijay-Shanker et al. 1987) and the essentially equivalent Multiple Context Free Grammars (MCFG, Seki et al. 1991). These lowest levels are called LCFRS-2 or 2-MCFG. By contrast, the languages of $ \mathcal{MG} $ , the version of Chomskyan Minimalist Grammars identified by Stabler (1997), reside at the highest level of full LCFRS/MCFG (Michaelis 2001), but all are much less expressive than the context-sensitive and recursively-enumerable languages (respectively identified as ‘Type 1’ and ‘Type 0’ in the original Chomsky hierarchy). Rogers generalizes results of Thatcher (1967) and Doner (1970) to show that this hierarchy corresponds to a series of wMSO logics (Seki et al. 1991), where level $ n $ corresponds to a wMSO defined over $ n $ linear precedence relations $ {<}_1 $ to $ {<}_n $ , corresponding to the $ n $ dimensions of LCFRS trees, and where the set of level $ n $ languages properly includes those of all levels $ <n $ .
The definition of the language hierarchy in terms of wMSO logic, in addition to the standard formalisms of productions and automata, is an exciting and elegant result, which perhaps should have been expected on the basis of the Curry-Howard-Lambek equivalence between logical proof, computation, production systems and category theory (Chomsky Reference Chomsky1956, Mac Lane Reference Mac Lane1971, Lambek Reference Lambek1968).Footnote 3
Pullum gives the following correspondence between abstract families of languages (AFL) and logics in the hierarchy (adapted from Reference Pullum2013a: 500):Footnote 4
A number of points are of interest concerning this hierarchy. First, membership at any level carries a guarantee of polynomial recognition/parsability (Vijay-Shanker & Weir Reference Vijay-Shanker and Weir1994), and hence of the applicability of practical divide-and-conquer parsing algorithms. Second, it affords a partial ranking of linguistic theories in terms of their expressive power.
As Chomsky notes (Reference Chomsky1965: 31, 62), explanatory adequacy of a descriptively adequate theory of grammar is dependent in part on the degree to which it restricts the space of possible grammars to allow only possible human languages.Footnote 5
The expressive power of a theory is directly relevant to this question. CFG (and therefore GPSG, GB under the assumptions of Relativized Minimality (Rizzi Reference Rizzi1990) and Manzini’s (Reference Manzini1992) lexicalized theory of locality (Rogers Reference Rogers1998: 185), and presumably some versions of Minimalism with the related Minimal Link/Shortest Move Condition) cannot capture the non-nested dependencies that are known to characterize some natural language constructions (Shieber Reference Shieber1985) and are therefore descriptively inadequate. Provided that CCG or TAG can be shown to be descriptively adequate to capture the full variety of constructions allowed by human language, then whatever the differences between them, they may immediately be held to be more explanatory than formalisms that reside further up the hierarchy, let alone those at the level of the Type 1 or Type 0 grammars that lurk far outside it. That is to say that, under any of the three views, the hierarchy provides what Chomsky (Reference Chomsky1965: §1.7) called an evaluation measure. For example, if a descriptively adequate account of English grammar at level 3 can be devised that includes the interaction of raising verbs with relativization that Rogers (Reference Rogers2004) analyzes in terms of a level 4-wMSO grammar under TAG assumptions, then it may be preferred.Footnote 6
One might at this point ask what we might be excluding by confining ourselves to grammars that map onto logics that happen to have graphs as models. What kind of grammar corresponds to a logic that is not limited in this way? One answer might be the following: those grammars that include non-monotonic operations that alter structure, such as movement of the kind embraced by early Transformational Theories of the 1970s (Bresnan Reference Bresnan, Culicover, Wasow and Akmajian1977). Such grammars do not seem to correspond to any wMSO logic and would appear to require resource-sensitive ‘Logics of Change’ such as Linear or Dynamic Logic.Footnote 7
Linear Logic (Girard Reference Girard1987), which is well understood in terms of proof-theory, seems reluctant to yield a useful model-theoretic interpretation of the kind available for monotonic logics (cf. Girard Reference Girard, Girard, Lafont and Regnier1995). If so, one interpretation of the implication of MTS for the working syntactician might be the following: Avoid non-monotonicity in rules. A stronger one might be the following: Make sure you are weakly equivalent in generative capacity to some level within the hierarchy of productions/automata/wMSO logics, all of which are inherently monotonic. The latter is, in fact, the definition advocated in the conclusion below. However, Pullum et al. propose a stricter definition, discussed next.
4. MTS vs. Generative-Enumerative Syntax
Rogers’ observation that the levels of the Weir language hierarchy can be mapped to the levels of a hierarchy of wMSO logics, all of which can be assigned a transparent and intuitive model theory, might seem to suggest that model-theoretic syntactic theories should be defined by this hierarchy, whether viewed in terms of production systems, automata or wMSO logic. As the table in (2) indicates, that would make virtually all formally explicit theories of syntax count as card-carrying MTS, including some versions of GB and Chomskyan Minimalism.Footnote 8
However, Rogers defines grammars as sets of wMSO constraints on treesets, which he shows can then be algorithmically converted to more standard grammars in formalisms like GB or TAG. In the latter case, this conversion amounts to generating the entire set of elementary trees that define a language-specific TAG grammar, of the kind developed by hand under the X-TAG project (Bangalore et al. Reference Bangalore, Sarkar, Doran and Hockey1998). The constraint sets can, in principle, be orders of magnitude smaller than the grammars they map onto since many of the constraints may be universal to all grammars, and relatively few be language-specific, such as that this is a VSO language. In principle, the wMSO specification could be a very efficient way of running the large grammars and treesets that are currently maintained for wide-coverage grammars like CCGbank or XTAG, which typically require labor-intensive and time-consuming global changes if the grammar changes.
Pullum (Reference Pullum2013a: 497–498) follows Rogers in placing a further condition for theories of syntax to be accredited as fully model-theoretic – namely, that systems of rules should be replaced by unordered sets of constraints that their tree-structures have to satisfy to count as well-formed. This condition supposedly distinguishes them from the Generative-Enumerative theories. This remains, notwithstanding, a deliberately Chomskyan view of the problem of grammar, according to which the tree corresponding to each sentence of the language is as much of a given as the word-string, with no theoretical role for the semantics that actually determines its form and, ultimately, these constraints. It is the job of the linguist to determine the constraints – a.k.a. significant generalizations concerning the form of such trees – in the style of ‘X-bar’ theory and ‘Relativized Minimality’.
Rogers (Reference Rogers2003) gives a worked example of the partial specification of a fragment of TAG grammar expressed via wMSO constraints. We will instead use Pullum’s (Reference Pullum2020: 7) simpler (first-order) example of a constraint here, in the form of the following formula that can be paraphrased as ‘Every PP node immediately dominates a P node which is its head’:Footnote 9
He reasonably suggests that this formula constitutes a substantive universal constraint on all natural language treesets, which can be further specified for languages like English by a parochial linear precedence rule, saying that if a P node has sister(s), then it precedes them.
Pullum points out that such constraints on trees are not equivalent to GES rules like the following related CFPS production or categorial grammar (CG) category (4a,b):
In particular, while the production (4a) captures the fact that one way of realizing a PP is as immediately dominating P followed by NP, and while the CG category (4b) captures, in addition, the fact that the P(reposition) ‘to’ is the head of PP (since the latter corresponds to its result-type), the constraint (3) says, by contrast, that all PPs have a (less specified) realization.
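The sense in which a constraint is checked against a given tree, rather than used to build one, can be sketched as follows. The `Node` class, its `head` field and the two function names are hypothetical, introduced only for illustration; the sketch paraphrases the prose statement of the constraint and the English precedence rule, and is not a rendering of Pullum's own formula.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    head: Optional["Node"] = None  # the daughter marked as head, if any

def nodes(tree):
    """Yield every node of the tree, root first."""
    yield tree
    for child in tree.children:
        yield from nodes(child)

def pp_has_p_head(tree):
    """Every PP node immediately dominates a P node which is its head."""
    return all(
        any(c.label == "P" and c is node.head for c in node.children)
        for node in nodes(tree)
        if node.label == "PP"
    )

def p_precedes_sisters(tree):
    """Parochial rule: a P node with sisters precedes all of them."""
    return all(
        index == 0
        for node in nodes(tree)
        if len(node.children) > 1
        for index, c in enumerate(node.children)
        if c.label == "P"
    )
```

A tree satisfies the grammar fragment when both functions return true. Neither function says anything about what else a PP may contain, which is the respect in which the constraint is less specified than the production.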
One might think that the choice between constraint-based and rule-based grammars would be an empirical one, depending on which works best in practice for the grammar or module of grammar to hand. The intrinsic algorithmic advantage of Rogers’ own tree-automaton-related constraints is, as he points out (2003: 293, 318–319), hard to realize in practice. The very abstract constraints that are required, which probably have to be identified top-down, from the most universal to the more parochial, do not seem easily compatible with the way linguists usually work, from particular observations to inductive generalization (and, in practice, usually onward to exceptions to the generalization that will lead to its modification).
Nevertheless, Pullum makes constraint-based formalism part of the definition of MTS, as being crucial to issues of language use, such as neologism, error and gradient grammaticality, which have been relegated by GES linguistics to the purgatory of Performance, under an assumption of autonomy of competence grammar.
The idea of Competence Grammar is to some extent justified by the fact that there are many processing algorithms and learning mechanisms for even the simplest grammars. However, total neglect of Performance ignores the obvious fact that, however language developed in humans, grammar and processor must have come into existence as a package deal, for the simple reason that the one is of no use without the other. To that extent, it would at least be prudent for linguists to keep their theories compatible with those algorithms and mechanisms, for which the minimum requirement is recognition time polynomial in the length of the string, a property guaranteed by the wMSO hierarchy.Footnote 10
But Pullum’s criticisms of GES go considerably further in identifying a number of other characteristics of performance mechanisms that should act as constraints on linguistic theorizing. These criticisms are of three kinds.
The first is that generativists have been extremely careless over the years in their use of the term ‘infinite’ (Pullum & Scholz Reference Pullum, Scholz and Van der Hulst2010). They have frequently claimed that infinitude is an intrinsic property of the sets of sentences that constitute human languages, when what they mean is that natural languages are unbounded sets (nor is unboundedness a distinctive property of human languages as is often claimed, since some quite trivial finite-state (type 3) languages are also unbounded). Unboundedness is a completely unsurprising property of any reasonable theory of natural grammar, rather than distinguishing human language. I think Pullum is quite right on this score, but I do not think it distinguishes MTS from GES.
A second kind of criticism, focused on ‘holism’ in grammar and in language acquisition (Pullum 2020), seems to amount to an argument against non-monotonicity. Non-monotonicity means that you cannot determine whether a structure is well-formed according to the grammar by purely local application of either rules or constraints. Similarly, if during language acquisition, a datum – that is, a situation pairing a meaning representation with a string – fails to yield an analysis according to the grammar $ \mathcal{G} $ that the child or the linguist has induced so far, then non-monotonicity in the universal set of rules/constraints that they have to choose from may make it difficult to make a modular change to a more adequate grammar $ {\mathcal{G}}^{\prime } $ . Culicover & Wexler (1977), Wexler & Culicover (1980) and Berwick (1985) had to invoke various ‘Freezing’ and ‘Subset’ Principles to constrain the search space in the face of non-monotonicity, causing Baker (1977, 1979) to advocate the entire elimination of non-monotonic rules from the theory of grammar, consistent with the wMSO logic-based language hierarchy, since monotonicity is, as noted earlier, guaranteed by wMSO logic. However, we have also noted that most modern theories of syntax at least pay lip service to monotonicity, as under various ‘Projection Principles’ and the ‘Inclusiveness Condition’ of Chomsky (1995, passim), suggesting that this property may also be attainable under the GES approach.
The third and most telling group of arguments made against GES relates to its all-or-none rigidity with respect to phenomena like gradient grammaticality, ‘quandaries’ (where there is a meaning for which the grammar fails to provide any fully grammatical realization) and the like. I will give some examples below, but they all come down to the claim that it is part of the definition of a GES that it incorporates a closed lexicon.
Grammatical gradience has always been recognized by linguists and has usually been accounted for in terms of the number and significance of rule violations incurred, or by reordering constraints (Keller 1998, 2001). But it is standard, when trying to prove mathematical properties such as closure under intersection for languages and grammars, to close the grammar under finite (though unbounded) sets of rules and/or lexical items, and to talk about it as generating ‘all and only’ the sentences of the relevant language(s). Similarly, when talking about the acquisition problem in mathematical terms, authors such as Chomsky (1965) have on occasion talked about language learning as the problem of instantaneously identifying a unique grammar, perhaps defined by settings of a finite set of parameters, on the basis of a finite sample of strings of the language alone. On occasion, this model has been taken rather literally as a model of acquisition – for example, by Gold (1967) and Yang (2002).
However, it does not seem necessary to think of an adult or a child who learns a new word that the context and/or rest of the sentence allows them to understand as bearing a known category such as that of a transitive verb – even if they do not know what it means, as when reading ‘Jabberwocky’ – as having changed their grammar. It seems more natural to assume that natural grammars in actual use include a ‘wild card’ for unknown words, rather like the ‘*’ matching any sequence of characters in a regular expression (RE), or perhaps some more phono/morpho-tactically-specific learned RE. The wild card simply allows matching unknown words to be treated generatively, as lexically ambiguous over all preterminal labels (or perhaps just open-class preterminals). This process is fundamental to the account discussed below of language acquisition by the child, to whom all words are initially unknown, but who has access to (noisy, ambiguous) contextually supported meanings that can be associated with originally unknown words. Seen in this light, Pullum’s problem of the Open Lexicon and the problem of Child Language Acquisition both reduce to the problem of ambiguity resolution.
Of course, backing off to lexical wild cards as a last resort for unknown words allows some additional categorial ambiguity into the grammar. But there is already a massive amount of ambiguity in every natural language – of a kind we never allow in the artificial languages of mathematics, logic and computer programming – so a little more lexical ambiguity can hardly matter. In particular, it does not change the wMSO tree-language level of the grammar.
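A minimal sketch of such a wild card follows. The toy lexicon, the category labels and the function names are all hypothetical, chosen only to show how an unknown word can be treated as ambiguous over the open-class preterminals while known words keep their listed categories.

```python
# Hypothetical toy lexicon: word -> set of preterminal categories.
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "chased": {"V"},
    "walks": {"N", "V"},
}

# Assumed inventory of open-class preterminals that the wild card ranges over.
OPEN_CLASS = {"N", "V", "Adj", "Adv"}

def categories(word):
    """Known words get their listed categories; unknown words match the wild card."""
    return LEXICON.get(word.lower(), OPEN_CLASS)

def lexical_ambiguity(sentence):
    """Pair each word of the sentence with its candidate categories, sorted."""
    return [(word, sorted(categories(word))) for word in sentence.split()]
```

For a string such as ‘the tove chased’, only ‘tove’ receives the full open-class set. Choosing among those candidates is then left to whatever mechanism already resolves ambiguity for known words such as ‘walks’.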
The question of exactly how all that proliferating syntactic ambiguity in the grammar is actually resolved is of course an important question for practical parsers. For wide-coverage computational parsers, ambiguity has to be resolved using a statistical model, usually estimated from a treebank or corpus of sentences and associated tree structures that is representative of the language in actual use, or possibly by learning a direct end-to-end transducer between strings and meaning representations. In human sentence processing, the same function is performed by some combination of distributional models at all levels, including semantics and inference about context (Altmann & Steedman Reference Altmann and Steedman1988).
It is the parsing model that does the work of limiting algorithmic search among what for realistic cases are routinely thousands and sometimes millions of alternative syntactically legal parses. Such models also work very well in guessing which of a finite number of preterminal categories is most likely for the wild card on each occasion an unknown word is encountered.Footnote 11
Statistical parsing models also provide a basis for dealing with gradient grammaticality, since actual wide-coverage parsers always give multiple analyses (including ungrammatical ones), ranking them according to their statistical similarity to the training data. For the same reason, they will usually accept alternative versions of quandaries like ‘?his and my book’ vs. ‘??him and me’s book’, for which there is no single correct realization, since both are likely to be similar in varying degrees to examples that have occurred in the training data somewhere.
As Pullum (Reference Pullum2020) points out, a statistical parsing model of this kind is central to Abend et al.’s (Reference Abend, Kwiatkowski, Smith, Goldwater and Steedman2017) CCG-based model of language acquisition, which learns a variational Bayesian model of all possible lexical categories and all possible instantiations of a few universal syntactic rules that it has ever encountered in a single incremental pass through a corpus of transcribed child-directed utterances (CDU) paired with logical forms (including irrelevant distractors) that are unaligned with words, under the regime known as ‘semantic bootstrapping’. For example, let us assume that the first CDU that the child pays attention to is ‘Nice doggies!’, meaning nice dogs, and is uttered in a situation in which, among other things, there are dogs which (the adult surmises) the child likes, and the child has access to that meaning (or whatever corresponds to it in the child’s language of mind).Footnote 12
On the assumption that the child can analyze that meaning, using a universal rule of function application, as made up of the universal predicate nice applied to the entity dogs, they still cannot immediately know whether ‘nice’ is an adjectival predicate of type $ N/N $ meaning nice or a nominal of type $ N $ meaning dogs, or part of some other contextually salient but irrelevant meaning. It will therefore consider multiple equiprobable lexical possibilities for the word ‘nice’, each pairing a different syntactic type with a different semantic concept. (‘Doggies’, of course, is similarly ambiguous between $ N: dogs $ and $ N\hskip0.3em \backslash \hskip0.3em N: nice $ .) However, further exposure to paired sentences and contextually supported meanings will, even in the presence of noise and other distractions, offer many more occasions on which ‘doggies’ could be an $ N $ meaning dogs and ‘nice’ could be $ N/N $ meaning nice than ones on which they could be anything else. The probability mass associated with the wrong initial hypothetical lexical entries will accordingly rapidly be lowered in contrast to that of the correct ones. The same will apply to more complex examples involving syntactic discontinuities, such as ‘What you want is the doggie’, although the meaning representation will be correspondingly more complex, such as $ \lambda x. want\hskip0.5em x\hskip0.5em you\hskip0.5em \wedge \hskip0.5em doggie\hskip0.5em x $ , and the rules for breaking them down into phrases and lexical items will be more diverse, involving rules of movement, feature passing or function composition, according to whatever the child uses for Universal Grammar. To that extent, this mechanism must work cross-linguistically, although demonstrating that fact empirically may require a firmer grasp than we have at present on the linguistic semantics that underlies those universal principles.Footnote 13
More generally, those among the child’s hypothesized pairings of words with syntactic and semantic categories that correspond to the adult grammar will gain high cumulative probability mass because examples consistent with their involvement are frequently encountered. Others that do not correspond to the adult grammar lose probability mass in proportion because such evidence is increasingly rarely encountered. As the child’s probabilistic grammar approaches that of the adult, their certainty in assigning novel words to known categories on the basis of known context increases to the point of allowing them often to do so in their first exposure, in a process referred to as ‘one-trial learning’, which is also characteristic of Abend et al.’s learner in the later stages. The ability of adults to interpret sentences including novel words similarly shows that they are all still language learners in exactly the same sense.
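The shift of probability mass can be illustrated with a deliberately simplified sketch. It is not Abend et al.'s variational Bayesian learner: it ignores syntactic categories altogether, pairs words directly with candidate meanings, and uses an online, EM-style count update. The names `counts`, `SMOOTH`, `observe` and `probability`, and the smoothing value, are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations

# counts[word][meaning]: fractional evidence that the word has that meaning.
counts = defaultdict(lambda: defaultdict(float))
SMOOTH = 0.1  # assumed small prior weight for pairings not yet supported

def observe(words, meanings):
    """Share one unit of evidence per word across all word-meaning pairings,
    weighting each pairing by how well current counts already support it."""
    pairings = list(permutations(meanings, len(words)))
    weights = []
    for pairing in pairings:
        weight = 1.0
        for word, meaning in zip(words, pairing):
            weight *= counts[word][meaning] + SMOOTH
        weights.append(weight)
    total = sum(weights)
    for pairing, weight in zip(pairings, weights):
        for word, meaning in zip(words, pairing):
            counts[word][meaning] += weight / total

def probability(word, meaning):
    """Current relative probability mass for one hypothesized lexical entry."""
    total = sum(counts[word].values())
    return counts[word][meaning] / total if total else 0.0
```

After a single exposure to ‘nice doggies’ paired with the meanings nice and dogs, the two hypotheses for ‘nice’ are equiprobable. A later exposure in which ‘nice’ recurs with a different noun moves mass toward the meaning nice, and that in turn helps to disambiguate ‘doggies’ on its next occurrence.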
The growing fragment of English grammar (or whatever language is being learned) that a linguist would identify as ‘the grammar’ only exists in such systems in a distributional sense. That is to say that the meaning of an English transitive verb is associated with a VO syntactic category with overwhelmingly high probability, but not as high as 1, or certainty. The CCG theory of grammar that constrains it and gives it form is generative and lexicalized, and resides firmly at level 3 of the extended hierarchy of languages defined by wMSO logic, LCFRS and recursion theory. However, such a system does not specify a set of ‘all and only’ the strings of the language it generates because of the generative lexical wild card. That stringset does not have the sharp boundaries – much less the closed lexicon and holistic acquisition properties – that Pullum (Reference Pullum2020: 5) identifies in many current GES theories.
Seen in the light of such systems, Pullum and colleagues’ MTS requirement can be seen as a call for linguists to bring their theories back into line with the requirements of the psychological theory of language acquisition and natural language processing, which are for syntactic-semantic type transparency, for a continuum of grammaticality, and for an open and incrementally-learnable lexicon.
To reach this conclusion is not of course to say that syntacticians should themselves start working on sentence processing and language acquisition, but rather that in order to be true to the nature of language itself, they need to give those who do work on those problems theories of grammar that they can work with. My own suggestion for such a theory is, of course, CCG. But other alternatives are possible, as long as they are similarly model-theoretic syntactically.
5. Conclusion
Model-theoretic syntax is a good idea. Its elegant formulation in terms of a hierarchy of weak monadic second-order logics over trees of successively greater dimensionality – completing the unity of the language hierarchy across phrase-structure productions, automata and logic – suggests a deeper type-transparent connection of syntactic structure to the model-theoretic semantics of natural language itself, and a possible explanation for why universal grammar should be constrained in this way.
It may well also turn out that constraint-based grammars have an empirical edge for some linguistic purposes, as linguists in HPSG (Ginzburg & Sag Reference Ginzburg and Sag2000: 2) and LFG (Bresnan Reference Bresnan2001: vii) have always argued for syntax, and as (as a reviewer points out) may well be the case for morphology and the lexicon. However, it is less clear that a constraint basis for the theory of grammar is a necessary requirement for natural language or that it follows from anything of any linguistic relevance in the definition of MTS.
The part of that definition that is of the utmost importance is that syntactic formalisms, whether constraint-based or generative, should live at some level – the less expressive, the better – of that extended hierarchy of abstract families of languages/grammars. For the working syntactician, model-theoretic syntax carries three attractions: First, it guarantees that the theory is monotonic and semantically type-transparent, reconnecting with the requirements of processing and acquisition. Second, it guarantees that the theory is explanatory, in the sense of excluding classes of phenomena that can only be captured by more expressive theories. Third, it affords a partial evaluation metric on descriptively adequate theories, whether they are expressed in terms of constraints or rules. It also offers a rapprochement in understanding with the psychologists and computer scientists, by many of whom linguistic theory has been abandoned since the early seventies.
Acknowledgements
Thanks to Geoff Pullum for tolerating innumerable questions and emails concerning MTS. Alex Koller, Alex Lascarides, Jim Rogers and Bonnie Webber gave logical advice. The responsibility for any misinterpretation remains with the author. Thanks also to the reviewers for suggesting many improvements to the paper. The project semantax has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 742137).
Competing interest
The author has no conflicts of interest to declare.