1.0 Introduction
Fish swim, birds fly, people speak. For the first two, the standard wisdom is that fish and birds do what they do partly in virtue of being biologically built to do what they do. Mentalist conceptions of linguistics apply similar reasoning to humans and their linguistic behavior. So situated, the goal of linguistics is to describe and explain the mental/brain properties that allow for human linguistic facility. More specifically, just as ornithologists take it for granted that many features of birds are biologically dedicated to efficiently supporting flight, and ichthyologists assume that fish come with many properties to optimize swimming, linguists (of the mentalist variety) propose that humans come with a faculty of language (FL) endowed with linguistically bespoke properties which partly ground the linguistic competence characteristic of humans.
We even know a little about the fine structure of FL due to sixty-five years of research by Generative grammarians. We know, for example, two very general things. First, that part of linguistic competence consists in having acquired a Grammar (G) able to recursively generate an unbounded number of distinct hierarchically organized structures that pair an articulation with a meaning (i.e. <π,λ> pairs). Second, we also know that any (non-pathological) child can acquire the G of any language and that the course of acquisition of that G is more or less the same across all acquirers and all Gs. This does not mean that there are no individual differences. Rather, the targets of acquisition and the time course of their acquisition are largely unaffected by anything other than placement in the appropriate speech community. Put any kid in any English/Swahili/Basque/… speaking environment and the child will acquire facility in English/Swahili/Basque/… in more or less the same way in more or less the same time. And acquiring facility means (at least) being able to pair an articulation π with a meaning λ over an unbounded domain of linguistic objects.
The “unbounded” part above directly implicates the existence of an acquired G; for the only way for a finite entity (the brain) to display an unbounded capacity like the one we find manifest in linguistic behavior (which we know involves dealing with unboundedly many different discrete hierarchically organized objects) is as the expression of a finitely specifiable generative procedure that takes its prior computed outputs as subsequent inputs for further computation. In other words, the unbounded nature of human linguistic facility implicates the existence of Gs (i.e. recursive rule systems) that generate an infinity (i.e. unbounded number) of distinct hierarchically organized objects from a finite, specifiable set of atoms, by combining these atoms together into larger structures that can themselves be further combined into yet larger structures. All of this we know, and we have known it for quite a while, and it should be neither controversial nor tendentious.
What is (rightly) debated and still under active investigation is the exact specification of the recursive procedures found in human Gs. To say that human Gs are recursive leaves open the question of what the relevant generative procedures look like. And this is a very, very, very BIG question. There are an infinite number of possible recursive functions, only a very, very, … very small number of which (maybe just one really!) are attested in natural language grammars. Therefore, not surprisingly, generative research over the last sixty-five years has explored many options and has changed its collective mind repeatedly about the nature of the procedures that FL makes available to generate linguistic structures and establish linguistic dependencies. In what follows, I outline how the mentalist Generativist project has investigated the fine structure of FL and Universal Grammar (UG).Footnote 1 The goal is to appreciate the logic of this roughly seventy-year project and identify how the Minimalist Program (MP) conceptually fits into that project. Here goes.
1.1 Some Salient Facts, Some Obvious Consequences, and the Questions They Raise
The Generative Program began with a focus on two salient facts. The first is that native speakers of a (human) language are linguistically creative in the sense that they are capable of producing and understanding an unbounded number of qualitatively different discrete kinds of linguistic expressions (e.g. phrases and sentences).Footnote 2 The second salient fact, let’s call it linguistic flexibility, is that any human child can acquire any human language if placed in the appropriate speech community. Further, so far as we can tell, the capacity to acquire competence in a specific language L is more or less uniform in the species in that the end state attained (linguistic competence) is (more or less) the same and its course of development is (roughly) uniform regardless of the child and regardless of the language.Footnote 3
These two facts are not subtle. Nobody will win a fancy prize for doing clever and laborious experiments to discover their existence. However, until Chomsky noted them over sixty years ago, they were little noticed, and few bothered to ask how either was possible. A central ambition of the Generative Program has been to address how these two facts could be true. What allows humans to be linguistically creative and linguistically flexible? More specifically, what kinds of minds could support these two related yet different kinds of capacities?
The Generative answer to this pair of questions is now relatively well known. Here is a snapshot version.
Linguistic creativity (LC) is explained by assuming that native speakers of a human language L have a particular kind of knowledge of L. What kind? Native speakers have acquired a grammar (G) (aka, a generative procedure) that recursively specifies an open-ended number of meaning–sound pairs for that language. In other words, linguistic creativity in L (LCL) rests (at least in part) on having a G of L (GL). The fact that GL is recursive explains LC’s open-ended nature, in that creatures endowed with GLs will have knowledge of an unbounded number of linguistic objects. That’s what recursive systems do. They finitely specify a capacity that extends over an unbounded (i.e. infinite) domain. LC reflects the fact that native speakers have internalized such a recursive system, and this recursively specified G is what (at least in part) endows native speakers with the power to produce endlessly many novel sentences/phrases and allows them to understand such novelties upon hearing them. So, to the question: how is it that competent native speakers can be linguistically creative?, we have the answer: in virtue of having internalized a GL, a recursive generative procedure, that specifies the (unboundedly many) objects of L for their particular native language L. So LC is (in part) explained in terms of internalized GLs.Footnote 4
And what of linguistic flexibility (LFL), the capacity humans have to acquire any GL under the right input conditions? The possibility of LFL follows if humans come equipped with a faculty of language (FL) with the power to yield grammars when fed the (linguistically) relevant data. In the simple case, such data will be bits of language L produced/uttered by proficient native speakers of L based on the GLs that they as proficient native speakers have internalized.Footnote 5 So, LFL follows if humans are endemically endowed with FLs that can map the bits of a language L a child is exposed to (and takes in) (aka, “primary linguistic data of L” (PLDL)) onto a grammar of L (GL). Or pictorially:
(1) PLDL → FL → GL
In other words, FL is a function that takes PLDL and maps it onto a GL (i.e. FL(PLDL) = GL). From the perspective of the Generative Program so construed, G and FL are empirical hypotheses about how two facts (i.e. LC and LFL) are possible. From this perspective, the program identifies the focus of inquiry to be GLs (specific generative procedures internalized by native speakers of particular Ls) and FL (the recipe that allows humans to acquire GLs when appropriately linguistically placed).
That GLs exist in native speakers and that FL exists as a human biological endowment are NOT exciting claims. They are close to the conceptual minimum required to accommodate our two very salient facts, LC and LFL. That something like these two cognitive (and ultimately biological) objects exist is really a no-brainer. After all, to say that a native speaker has internalized a GL is to acknowledge that s/he has an unbounded capacity to use and understand L. And to say that someone has an FL is just to say that s/he has a second-order capacity to acquire the first-order capacity specified by a GL. But given that native speakers are quite obviously linguistically creative and given that humans are quite obviously capable of acquiring any GL if exposed to PLDL, the supposition that GLs and FLs “exist” and are legitimate objects of inquiry must be correct. The inferential leaps from LC to GLs and from LFL to FL are very, very short.
The hard empirical question, then, is not whether these objects exist, but what they look like in detail. In other words, the hard part of the Generative Program is specifying what GLs look like (i.e. what kinds of recursive generative procedures they embody) and what the fine structure of FL is (i.e. what principles it must embody to allow for the acquisition of GLs for arbitrary Ls).
Given this framing, an important subsidiary question of interest is the degree to which the structures of GLs and the fine structure of FL are linguistically bespoke or cognitively and/or computationally generic. In other words, a central sub-project of the program will be to determine to what extent (if any) our first- and second-order linguistic facilities require a mental apparatus specifically tuned to the properties of language and to what extent the capacities manifested in linguistic behavior reflect our combined cognitive and computational powers more generally.
In case you haven’t noticed, this last question is quite definitely an empirical one. To date, the Generative answer has been that linguistic proficiency does require specifically linguistic cognition. The minimalist codicil to this general conclusion has been that it only requires a dollop of such, rather than a large heaping shovelful. We will return to this issue anon, but for now, let’s take a quick trip through the history of Generative Grammar so that we can appreciate how Minimalism, the latest step in the Generative Program, fits into the entire Generative Grammar project.Footnote 6
1.2 The First Two Stages of the Generative Program
Again, let’s start with the two big facts (i.e. LC and LFL) and ask how to rationally investigate them. Recall that addressing LC requires saying something about the GLs that a native speaker of L has acquired, in particular a specification of the generative procedures that it embodies (i.e. the particular rules of grammar that characterize a native speaker’s (unbounded) knowledge of/sense of the language L). And addressing LFL requires specifying the fine structure of FL that allows humans to become native speakers of a particular L, which means specifying how a person uses PLDL to acquire their GL.Footnote 7
This description of the research problem immediately suggests a rational order of inquiry. To address LFL questions requires having some GL specimens. After all, the LFL question is how humans acquire grammars, and unless we have some idea of what kinds of grammars humans actually acquire, it will be well-nigh impossible to investigate how humans do what they/we do.Footnote 8 So, as a practical matter, the first step in the Generative Program will be to find some plausible candidate rules of grammar embodied in particular GLs. Not surprisingly, this kind of investigation indeed characterizes a good deal of the first stages of Generative inquiry.
So, the first question on the research agenda should have been (and was): What properties (rules, generative procedures, principles) characterize individual Gs? More particularly, what kinds of recursive rules do GLs incorporate?
We know part of the answer to this last question because of another obvious fact about natural languages: The kinds of linguistic objects that Gs relate are meaning–sound pairings. For example, among the things a native speaker of English knows is that Dogs chase cats does not mean the same thing as Cats chase dogs while Cats are chased by dogs does. There are an unbounded number of such systematic facts that a competent native speaker of a given natural language knows.
Thus we know two important things about any GL: (i) it involves recursive rules and (ii) it produces meaning–sound pairings.
The first fact suggests that linguistic competence consists (in part) in mastery of a system of rules that specifies the natural language mastered. Why a rule system? Because that is the only way to finitely specify an effectively infinite capacity. We cannot just list the objects in the domain of a native speaker’s competence and treat the capacity as akin to looking things up on a giant list because, given LC, the list would have to go on forever. The capacity can only be specified in terms of a finite procedure that describes (i.e. generates) it. Thus, we conclude that linguistic mastery of a language L consists (in part) in acquiring a set of rules (i.e. a GL) that generate the kinds of linguistic objects that a native speaker of L is competent with.
The second fact tells us something more about these GLs. They must specify pairings of meanings with sounds. Thus the rule systems that native speakers have mastered are rules that generate objects with two distinctive properties. GLs consist of generative procedures that tie a specific meaning profile together with a specific sound profile,Footnote 9 and they do this over an effectively infinite domain. So GLs are functions whose range is meaning–sound pairs, viz. an infinite number of objects like this: <m,s>. What’s the domain? Some finite set of “atoms” that can combine again and again to yield more and more complex <m,s> pairs. Let’s call these atoms “morphemes.”
Putting this all together, we know from the basic facts and some very elementary reasoning that native speakers master GLs (recursive procedures) that map morphemes into an unbounded range of <m,s>s. THIS. WE. KNOW. What we don’t know is what the specific rules that Gs contain look like (or, for that matter, what the ‘m’s and ‘s’s look like). And that brings us to our first research question: describe specific rules characteristic of natural language Gs and specify their variety and interactions. The earliest Generative research aimed to provide some candidate rules of specific grammars and show how their interactions would mirror some of the complexities that native speakers’ competence displays. In other words, the first order of business in Generative research involved producing detailed model grammars of the kinds of rules that particular GLs have and how these rules interact. Many different rules were investigated: movement rules, deletion rules, phrase structure rules and binding rules, to name four. And their complex modes of interaction were limned. Consider some details.
1.3 Step 1: Some Possible Rules of GLs
Recall that one of the central facts about natural languages is that they contain a practically infinite number of objects that pair a meaning with a sound.Footnote 10 They also contain dependencies defined over the structures of these objects. In early theories of Generative Grammar, phrase structure (PS) rules recursively specified the infinite class of well-formed “base” structures in a given G. Lexical insertion (LI) rules specified the class of admissible local dependencies in a given G, and transformational (T) rules specified the class of non-local dependencies in a given G.Footnote 11 Let’s consider each in turn.
PS-rules are recursive and their successive application creates bigger and bigger hierarchically organized structures on which LI- and T-rules operate to generate other structures and dependencies.Footnote 12 (2) provides some candidate PS-rules (the ‘(…)’ indicates optional expansion):
(2) a. S → NP aux VP
b. VP → V (NP) (PP)
c. NP → (det) N (PP) (S)
d. PP → P NP
These four rules suffice to generate an unbounded number of hierarchically structured objects. Thus, a sentence like John kissed Mary has the structure in (3) generated using rules (2a,b,c).
(3) [S [NP N] aux [VP V [NP N]]]
LI-rules like those in (4) insert terminals into these structures, yielding the structured phrase marker (PM) in (5):
(4) a. N → John, Mary …
b. V → kiss, …
c. aux → past
(5) [S [NP [N John]] [aux past] [VP [V kiss] [NP [N Mary]]]]Footnote 13
PMs like (5) also reflect local inter-lexical dependencies. Note that replacing kiss with arrive yields an unacceptable sentence: *John arrived Mary. The PS-rules can generate the relevant structure (i.e. (3)), but the LI-rules cannot insert arrive in the V position of (3) because arrive is not lexically marked as transitive. In other words, NP^kiss^NP is a fine local dependency, but NP^arrive^NP is not.
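To make this division of labor concrete, here is a minimal sketch (in Python, my own illustrative encoding rather than anything from the Generative literature) of simplified versions of the PS-rules in (2) and the LI-rules in (4). It builds the phrase marker in (3), fills it in to yield (5), and refuses to insert an intransitive verb into the NP^V^NP frame. The rule tables, the invented 'transitive' feature and the list-based phrase markers are assumptions made purely for illustration, and the optional (recursion-inducing) expansions in (2c,d) are omitted to keep the example small.

```python
# Simplified PS-rules from (2); the optional PP/S expansions are omitted here.
PS_RULES = {
    "S":  ("NP", "aux", "VP"),
    "VP": ("V", "NP"),          # the transitive frame used in (3)
    "NP": ("N",),
}

# Toy LI-rules from (4); 'transitive' is an invented stand-in for the lexical
# marking that licenses (or blocks) insertion into the V position of (3).
LEXICON = {
    "N":   {"John": {}, "Mary": {}},
    "V":   {"kiss": {"transitive": True}, "arrive": {"transitive": False}},
    "aux": {"past": {}},
}

def generate(cat):
    """Expand a category top-down into a nested phrase marker, as in (3)."""
    if cat not in PS_RULES:               # preterminal: an empty slot for LI to fill
        return [cat]
    return [cat] + [generate(c) for c in PS_RULES[cat]]

def lexical_insertion(tree, words):
    """Insert terminals left-to-right, checking that a V with an NP object
    is lexically marked transitive (so *John arrived Mary is blocked)."""
    if len(tree) == 1:                    # a preterminal slot
        word = words.pop(0)
        tree.append(word)
        return LEXICON[tree[0]][word]
    feats = [lexical_insertion(child, words) for child in tree[1:]]
    if tree[0] == "VP" and len(tree) == 3:
        assert feats[0].get("transitive"), "NP^V^NP requires a transitive verb"
    return {}

pm = generate("S")
lexical_insertion(pm, ["John", "past", "kiss", "Mary"])
print(pm)  # ['S', ['NP', ['N', 'John']], ['aux', 'past'],
           #  ['VP', ['V', 'kiss'], ['NP', ['N', 'Mary']]]]  -- cf. (5)
# Swapping in 'arrive' for 'kiss' trips the assertion, mirroring *John arrived Mary.
```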
Given structures like (5), T-rules can apply to rearrange them, thereby coding for a variety of non-local dependencies.Footnote 14 What kind of dependencies? The unit of transformational analysis in early Generativism is the construction. Some examples include: Passive, Wh-questions, Polar questions, Raising, Equi-NP Deletion (aka control), Super Equi, Topicalization, Clefting, Dative Shift (aka Double Object Constructions), Particle Shift, There constructions (aka Existential Constructions), Reflexivization, Pronominalization, Extraposition, among others. Though these rules fall into some natural formal classes (see below), they also contain a great deal of construction-specific information, reflecting construction-specific morphological peccadillos and restrictions. Here’s an illustration.
Consider the Passive rule in (6). ‘X’/‘Y’ in (6) are variables. The rule says that if you can factor a string into the parts on the left of the arrow (viz. the structural description) you can change the structure to the one on the right of the arrow (the structural change). Applied to (5/7a), this yields the derived phrase marker (7b).
(6) X - NP1 - AUX - V - NP2 - Y → X - NP2 - AUX - be+en - V - by NP1 - Y
(7) a. X - [NP John] [aux past] [V kiss] [NP Mary] - Y
b. X - [NP Mary] [aux past] be+en [V kiss] by [NP John] - Y
Note that the rule codes the fact that what was once the object of kiss is now a derived subject. Despite this change in position, Mary is still understood as the kissee. Similarly, John, the former subject of (5) and the kisser, is now the object of the preposition by, and still the kisser. Thus, the passive rule in (6) codes the fact that Mary was kissed by John and John kissed Mary have a common thematic structure as both have a derivation which starts from the same underlying PM in (5). In effect, this proposal tracks the non-local dependency between Mary and kiss in Mary was kissed by John by proposing that the input to this sentence involves a PM where kiss and Mary are locally proximate (as in (5)).
The research focus in this first phase of grammatical investigation was on carefully describing the detailed features of a variety of different constructions, rather than on factoring out their common features.Footnote 15 Observe that (7b) introduces new morphemes into the PM (e.g. be+en, by), in addition to rearranging the nominal expressions. T-rules did quite a bit of this, as we shall see below. What’s important to note for current purposes is the division of labor between PS-, LI- and T-rules. The first generates unboundedly many hierarchical structures, the second “chooses” the right ones for the lexical elements involved (and locally codes their “thematic” properties) and the last rearranges them to produce novel surface forms that retain the “thematic” relations specified in the inputs to the T-rules, even when the relata are no longer in their original proximate positions.Footnote 16 So, for example, in (7b) Mary is still understood as the kissee despite no longer being adjacent to the verb kiss.
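To see what "structural description" and "structural change" amount to mechanically, here is a minimal sketch of the Passive rule in (6) applied to the factored phrase marker (7a): the input is checked against the description, and the change reorders the two NPs while inserting the be+en and by morphology. The flat list of (label, content) pairs and the labels used for the inserted morphemes are illustrative assumptions, not the Standard Theory's actual representations.

```python
# (7a) factored as NP1 - AUX - V - NP2 (the X/Y variables are empty here).
pm = [("NP", "John"), ("aux", "past"), ("V", "kiss"), ("NP", "Mary")]

def passive(factors):
    """Rule (6): if the structural description NP1-AUX-V-NP2 is met, return the
    structural change NP2-AUX-be+en-V-by-NP1; otherwise the rule cannot apply."""
    if [label for label, _ in factors] != ["NP", "aux", "V", "NP"]:
        return None                      # structural description not met
    np1, aux, v, np2 = factors
    return [np2, aux, ("prt", "be+en"), v, ("P", "by"), np1]

print(passive(pm))
# [('NP', 'Mary'), ('aux', 'past'), ('prt', 'be+en'), ('V', 'kiss'),
#  ('P', 'by'), ('NP', 'John')]   -- i.e. (7b), "Mary was kissed by John"
```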
T-rules, despite their individual idiosyncrasies, fall into a few identifiable formal families. For example, control constructions are generated by a T-rule (Equi-NP Deletion) that deletes part of the input structure. Sluicing constructions also delete material but, in contrast to Equi-NP Deletion, they do not require a PM-internal grammatical trigger (aka antecedent) to do so. Movement rules (like Passive in (6) or Raising) rearrange elements in a PM. And T-rules that generate Reflexive and Bound Pronoun constructions neither move nor delete elements but replace the lower of two identical lexical NPs with morphologically appropriate formatives (as we will illustrate presently).
In sum, the first epoch of Generative inquiry provided a budget of actual examples of the kinds of rules that Gs contain (i.e. PS, LI and T) and the kinds of properties these rules had to have to be capable of specifying the kinds of recursion and the kinds of dependencies characteristically found within natural languages. In other words, early Generative work developed a compendium of examples of actual G rules in a variety of languages.
Nor was this all. Early Generative Grammar also provided models for how these different rules interact. Recall that one of the key features of natural languages is that they include effectively unbounded hierarchically organized objects. This means that the rules talk to one another and apply to one another’s outputs to produce an endless series of complex structures and dependencies. Early Generative research started exploring how G rules could interact and it was quickly discovered how complex and subtle G interactions could be. For example, in the Standard Theory, rules apply cyclically (from smaller domains to larger domains that contain these smaller domains) and in a certain fixed order (e.g. PS-rules applying before T-rules). Sometimes the order of rule application is intrinsic (follows from the nature of the rules involved) and sometimes not. Sometimes the application of a rule creates the structural conditions for the application of another (feeding), sometimes it destroys the structures (bleeding) thereby preventing a possible operation from applying. These rule systems could be very complex, and these initial investigations gave linguists a first serious taste of what a sophisticated capacity natural language competence was.
It is worth going through an example to get a feel for this complexity. For illustration, consider some binding data and the rules of Reflexivization and Pronominalization, and their interactions with PS-rules and T-rules like Raising.
Lees and Klima (L&K) (1963) offered the following two rules to account for an interesting array of binding data in English (see data in (10)–(13)).Footnote 17 These rules must apply when they can and are (extrinsically) ordered so that (8) applies before (9).
(8) Reflexivization:
X - NP1 - Y - NP2 - Z → X - NP1 - Y - pronoun+self - Z
(where NP1=NP2, pronoun has the phi-features of NP2, and NP1/NP2 are in the same simplex sentence)
(9) Pronominalization:
X - NP1 - Y - NP2 - Z → X - NP1 - Y - pronoun - Z
As is evident, the two rules are formally very similar. Both apply to identical NPs in a phrase marker and morphologically convert one to a reflexive or to a pronoun. (8), however, only applies to nominals in the same simplex clause (i.e. to “clause-mates”), while (9) is not similarly restricted. As (8) obligatorily applies before (9), Reflexivization will bleed the environment for the application of Pronominalization by changing NP2 to a reflexive (thereby rendering the two NPs no longer “identical”). A consequence of this ordering is that Reflexivization and Pronominalization rules apply in distinct domains. In English, this means that Reflexives and (bound) pronouns must be in complementary distribution.Footnote 18
An illustration should make things clear. Consider the derivation of (10a) (where himself/him are understood as anaphorically dependent on John1). It has the underlying form (10b). We can factor (10b) as in (10c) as per the Reflexivization rule (8). This results in converting (10c) to (10d) with the surface output (10e) carrying a reflexive interpretation. Note that the Reflexivization derivation codes the fact that John is both washer and washee, as well as that John non-locally relates to himself.
(10) a. John1 washed himself/*him
b. John washed John
c. X-John-Y-John-Z
d. X-John-Y-him+self-Z
e. John washed himself
What blocks John washed him with a similar anaphoric reading (i.e. where John is co-valued with him)? To derive this structure Pronominalization must apply to (10c). However, it cannot, as (8) is ordered before (9) and both rules are obligatory (i.e. they must apply when they can apply). But once (8) applies, we get (10d), which no longer has a structural description amenable to (9). Thus, the application of (8) bleeds the grammatical context for the application of (9) and John washed him with a bound reading of the pronoun cannot be derived (i.e. there is no licit grammatical relation between John and him).
This changes in (11). Reflexivization cannot apply to (11c) as the two Johns are in different clauses. As (8) cannot apply, (9) can (indeed, must) as it is not similarly restricted to apply to clause-mates. In sum, the inability to apply (8) allows (and demands) the application of (9). Thus does the L&K theory derive the complementary distribution of reflexives and bound pronouns.
(11) a. John believes that Mary washed *himself/him
b. John believes that Mary washed John
c. X-John-Y-John
d. X-John-Y-him
e. John believes that Mary washed him
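The derivations just sketched can be made concrete with a small toy system in which (8) and (9) are obligatory rewrites tried in a fixed (extrinsic) order, and the clause-mate condition is approximated by tagging each NP with the index of the clause containing it. The string encoding, the clause indices and the crude reflexive morphology are assumptions introduced only to exhibit how (8), by applying first, bleeds (9).

```python
# Each word is (form, clause_index); only NPs get a clause index.
# (10b) "John washed John": both occurrences of John in clause 0.
simple = [("John", 0), ("washed", None), ("John", 0)]
# (11b) "John believes that Mary washed John": the lower John is in clause 1.
embedded = [("John", 0), ("believes", None), ("that", None),
            ("Mary", 1), ("washed", None), ("John", 1)]

def reflexivize(s):
    """Rule (8): for identical clause-mate NPs, turn the lower one into a reflexive."""
    for i, (w1, c1) in enumerate(s):
        for j in range(i + 1, len(s)):
            w2, c2 = s[j]
            if c1 is not None and w1 == w2 and c1 == c2:
                s[j] = ("himself", c2)          # crude reflexive morphology
                return True
    return False

def pronominalize(s):
    """Rule (9): the same rewrite, minus the clause-mate restriction."""
    for i, (w1, c1) in enumerate(s):
        for j in range(i + 1, len(s)):
            if w1 == s[j][0]:
                s[j] = ("him", s[j][1])
                return True
    return False

def derive(s):
    # Extrinsic ordering: (8) is tried before (9); both are obligatory.
    # If (8) applies, the identity (9) needs is destroyed: (8) bleeds (9).
    reflexivize(s) or pronominalize(s)
    return " ".join(w for w, _ in s)

print(derive(simple))    # John washed himself    (so *John washed him is underivable)
print(derive(embedded))  # John believes that Mary washed him   (pronoun forced)
```

Adding a Raising-to-Object step ordered before (8)/(9), one that re-tags a non-finite embedded subject with the matrix clause index, would extend the same toy system to the pattern in (13) and (16) discussed below.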
There is one other feature of note: The binding rules in (8) and (9) also effectively derive a class of (what are now commonly called) Principle C effects, given the background assumption that reflexives and pronouns morphologically obscure an underlying expression which is identical to the antecedent. Thus, the two rules prevent the derivation of structures like (12) in which the bound reflexive/pronoun c-commands its antecedent.Footnote 19
(12) a. *Himself1 kissed Bill1
b. *He1 thinks that John1 is tall
It should be noted that deriving Principle C effects in this way is not particularly deep. The rules derive the effect by stipulating that it should be the higher (actually, leftmost) of two identical NPs that is retained in the structural change of the relevant transformation while the lower (rightmost) one is replaced by a reflexive/pronoun.Footnote 20
The L&K theory can also explain the data in (13) and (16) in the context of a G with a rule like Raising to Object in (14), which, let’s assume, obligatorily applies before (8)/(9).
(13) a. *John1 believes him/he-self1 is intelligent
b. John1 believes he1 is intelligent
(14) Raising to Object:
X - V - C - NP - Y → X - V - NP - C - Y
(where C, the complementizer, is phonetically null and non-finite)Footnote 21
(14) cannot apply to raise the embedded subject in (15) to the matrix clause, as the null complementizer C of the embedded clause is finite. This prevents (8) from applying to derive (13a), as (8) is restricted to NPs that are clause-mates. But, as failure to apply (8) requires the application of (9), the mini-grammar depicted here leads to the derivation of (13b) from (15).
(15) John1 believes C John1 is intelligent
Analogously, (8), (9) and (14) also explain the facts in (16), if (14) is obligatory and must apply when it can.Footnote 22
(16) a. John1 believes himself1 to be intelligent
b. *John1 believes him1 to be intelligent
The L&K analysis can be expanded further to handle yet more data when combined with other rules of G. And this is exactly the point: to investigate the kinds of rules Gs contain by seeing how their interactions derive non-trivial linguistic data sets. This allows us to explore what kinds of rules exist (by proposing some and seeing how they work) and what kinds of interactions rules can have (they can feed and bleed one another, they are ordered, obligatory, etc.).
The L&K analysis above illustrates two important features of these early proposals. First, it (in combination with other rules) compactly summarizes a (practically infinite) set of binding “effects,” patterns of data concerning the relation of anaphoric expressions to their antecedents in a range of phrasal configurations. It doesn’t outline all the data that we now take to be relevant to binding theory (e.g. it does not address the contrast in John1’s mother likes him/*himself1), but many of the data points discussed by L&K have become part of the canonical data set that any theory of binding is responsible for. Thus, the complementary distribution of reflexives and (bound) pronouns in these sentential configurations is now a canonical fact that every subsequent theory of binding has aimed to explain. So too the locality (viz. the clause-mate condition) required between antecedent and anaphor for successful reflexivization, the anti-locality requirement on licit bound pronouns (i.e. bound pronouns and their antecedents cannot be clause-mates) and the prohibition against anaphors c-commanding the antecedents of which they are anaphoric dependents.
The variety of data L&K identifies is also noteworthy. From very early on, the Generative Program understood that both positive and negative data are relevant for understanding how FL and Gs are structured. Positive data is another name for the “good” cases (examples like (10e) and (11e)), where an anaphoric dependency is licensed. Negative data are the * cases (examples like (12a) and (16b)) where the relevant dependency is illicit. Grammars, in short, not only specify what can be done, they also specify what cannot be done. Generativists have discovered that negative data often reveals more about the structure of FL and a particular G than positive data does.Footnote 23
Second, L&K provides a theory of these effects in the two rules (8) and (9). As we shall see, this theory was not retained in later versions of Generative Grammar.Footnote 24 The L&K account relies on machinery (obligatory rule application, bleeding and feeding relations among rules, rule ordering, Raising to Object, etc.) that was replaced in later theory by different kinds of rules with different kinds of properties. The L&K rules themselves are also very complex (e.g. they are extrinsically ordered). Later approaches to binding attempt to isolate the relevant factors and generalize them to other kinds of rules. We return to this anon.
One more terminological point: In what follows, it is useful to distinguish between “effects” and “theory.” As Generative theories changed over the years, discovered effects (e.g. that Reflexivization and Pronominalization are in complementary distribution, that Wh-movement out of islands is illicit, that PRO appears in non-finite subject positions, etc.) have been largely retained, though the theories developed to explain these effects have often changed significantly.Footnote 25 For example, as we will see below, the L&K theory was replaced by Principles A, B and C of the binding theory, yet a central binding effect (viz. the complementarity between Reflexivization and Pronominalization) was retained. This is similar to what we observe in the mature sciences (think ideal gas laws with respect to thermodynamics and later statistical mechanics). What is clearly cumulative in the history of Generative Grammar is the conservation of discovered effects. Theory changes, and deepens. Some theoretical approaches are discarded, some refashioned and some resuscitated after having been abandoned. Effects, however, are largely conserved and a standard boundary condition of theoretical admissibility in later theory is that the new theory with its novel assumptions explain the effects that the older replaced theory explained.
I should also add that for large stretches of theoretical time, basic theory has also been conserved (e.g. some version of the cycle has been with us since almost the inception of the Generative Program). However, the cumulative nature of Generative research is most evident in the preservation of the various discovered effects. In Section 1.5, I list a number of these. It is an impressive group. But first, let’s take a look at how establishing a set of plausible G rules sets the stage for addressing the second Generative question concerning linguistic flexibility.
1.4 Step 2: Categorizing, Simplifying and Unifying the Rules
As noted, the first stage of Generative research yields a bunch of rules describing a bunch of linguistic constructions in addition to providing early models of how the different kinds of rules might interact to generate an unbounded number of <m,s>s within a given natural language L. Here we look at how this prepared the way for research focusing on the second question concerning the nature of linguistic flexibility (LFL): what must FL look like given that it can produce GLs with these kinds of rules and these kinds of interactions? At the risk of stating the obvious (not a risk I worry much about), observe that asking this question only makes practical sense once we have serious candidate GLs, language-specific generative procedures. For the LFL question to be fecund presupposes that we have identified some GL rules with the right properties, for it is GL rules like these that we want FL to target. Given this, it is not surprising that LFL issues awaited (partial) answers to the conceptually prior LC question.
Investigations into FL moved along two tracks: (i) cross-linguistic investigations of GLs different from English to see to what degree the GLs proposed for English carry over to those of other natural languages and (ii) simplification and unification of GLs so as to make them more natural “fits” for FL. The second Generative epoch stretches from roughly the mid-1970s to the early 1990s. Within the Chomsky mentalist version of the Generative Grammar Program, the classical example of this kind of work is Lectures on Government and Binding (LGB; Chomsky (1981)). Our question in this section is: What did LGB accomplish and how did it do this?
LGB was a mature statement of the work that began with Chomsky’s (1973) Conditions on transformations. This work aimed to simplify the rules that GLs contain by distilling out those features of particular G rules that could be attributed to FL more generally. The distilled features were attributed to FL as design features and were dubbed “Universal Grammar” (UG). The basic GB research strategy was to simplify particular GLs by articulating the innate UG principles of FL. Part of this consisted in categorizing the possible rules a GL could contain. Part involved postulating novel theoretical entities (e.g. traces) which served two functions: (i) they allowed the T-rules to be greatly simplified and (ii) they allowed for a partial unification of two, heretofore distinct, parts of the grammar, namely binding and movement.
Articulating FL in this UGish way also had a natural acquisition interpretation relevant to addressing the fact of linguistic flexibility: in learning a particular GL, the language acquisition device (LAD, aka the child) need “abstract”/“induce” only simple rules from the data, with the more recondite forms of knowledge attained by the child (that earlier theory had coded as part of a rule’s structural description) now being traced to built-in (i.e. innate) structural features of FL (aka the principles of UG). As UG principles are innate, they need not be acquired and so are not hostage to the details of the PLD. That’s the logic, and the program was to simplify language-specific rules by offloading many of their most intricate features to endemic properties of FL as embodied in principles of UG.
As noted, rule simplification has an appealing consequence for acquisition. As language-specific rules can (and do) vary, they must be learned. Thus, simplifying them makes them easier to acquire, while enriching UG allows this simplification to occur without (it is hoped) undue empirical loss. That was the logic. Here are some illustrations.
LGB postulated a modular structure for natural language Gs with the following components and derivational flow.
(17) The LGB Grammar
A. Rule Types/Modules
B. The ‘Y’-model (DS → SS → PF/LF)
The general organization of the grammar, the ‘Y’-model (17/B), specifies how/where in the derivation these various rules/conditions apply. The Base Rules (17/1) generate X’-structured objects (17/1a) that syntactically reflect “pure GF-θ” (17/1b) (viz. that all and only thematic positions are filled; so logical subjects and logical objects are, at DS, grammatical subjects and grammatical objects), creating phrase markers analogous to (but not exactly the same as) Deep Structures in the Standard (i.e. Aspects) theory. Targets of movement operations are positions generated by the X’-rules in the base which lexical insertion (LI) rules have not lexically filled.Footnote 26 The output of the base component (the combination of X’-rules and LI-rules) is input to the T-component, the part of the grammar that includes movement operations (and that extends the derivation from DS to SS). At SS, various relations are licensed (case, binding, some ECP trace licensing conditions). Derivations then split, with the grammatical structure relevant to sound interpretation (the phonological form (PF)) separated from that required for meaning interpretation. The latter is then mapped via (possible additional) abstract movement rules (rules that have no overt phonetic realization) to logical form (LF), which is the phrase marker that codes the grammatical information relevant to meaning interpretation.Footnote 27
Let’s consider some of the key theoretical and conceptual innovations in the GB model.
Movement rules are entirely reconceptualized in LGB in two important ways. First, they are radically simplified. The simplification involves stripping movement of its constructional specificities (abstracting away from what was moved (e.g. a Topic, or a Wh-morpheme or a Focused element)) and distilling out the fundamental movement operation (dubbed “Move α”). The rule Move α can move any expression anywhere, subject to one restriction: all G rules, including Move α, are structure preserving in the sense that all the constituency present in the input to the rule is preserved/conserved in the output of the rule.Footnote 28
In concrete terms, this preservation/conservation assumption motivates the second key innovation in LGB: Trace theory. Trace theory has two important theoretical consequences: (i) it is a necessary ingredient in the simplification of movement rules to Move α and (ii) it serves to unify movement and binding theory.
So, simplification, unification and conservation are all pressed into service in developing the GB theory of FL. Let’s consider how unifying and simplifying earlier Standard Theory accounts of G operations gets us to a GB-like theory.
First the process of simplification: LGB replaces complex construction-based rules like Passive, which in the Standard Theory look something like (18), with the simple rule of “Move NP,” this being an instance of Move α with α = NP.
(18) X - NP1 - Y - V - NP2 - Z → X - NP2 - be+en - V - by NP1
(where NP1 and NP2 are clause-mates)
Move NP is simpler in three ways. First, (18) involves the movement of two NPs (note: the structural change on the right of the arrow differs from the structural description on the left in that NP2 has moved to near the front of the string from post-verbal position and NP1 has moved from the left edge to the right and now forms part of a by-phrase). Passivization, when analyzed in Move α terms (aka Move NP when α = NP), involves two applications of the simpler rule rather than one application of the complex one. Second, (18) not only moves NPs around, but it also inserts passive morphology (be + en) as well as a by-phrase. Third, in contrast to (18), an application of Move α (where α=NP) allows any NP to move anywhere. Thus, the Move α analysis of Passive factors out the NP movements from the other features of the passive rule. This effectively eliminates the construction-based conception of rules characteristic of the earlier Standard Theory and replaces it with a far more abstract conception of a G rule; it effectively treats earlier construction-based rules as interactions and combinations of simpler ones.
These simplifications, though theoretically desirable, create empirical problems. How? Rules like Move NP left to themselves wildly overgenerate, deriving all sorts of ungrammatical structures (as we illustrate below).Footnote 29 GB addresses this problem in a theoretically novel way. It eliminates the empirically undesirable consequences by enriching UG. In particular, GB theory targets two related dimensions: it simplifies the rules of GL while enriching the structure of FL/UG. Let’s consider this in more detail.
Move α is the simplest possible kind of movement rule. It says something like “move anything anywhere.” Languages differ in what values they allow α to assume, thus allowing for a natural locus of cross-linguistic variation. So, for example, English moves Wh words to the front of the clause to form interrogatives. Chinese doesn’t. In English α can be Wh, in Chinese it cannot be. Or Romance languages move verbs to tense, while English doesn’t. Thus in Romance α can be V, while in English it can’t. And so on. Again, while so simplifying the rules has some appeal, the trick is to simplify without incurring the empirical costs of overgeneration. GB achieves this (in part) via Trace Theory, which is itself a consequence of the Projection Principle, a more general conservation principle that bars derivations from losing syntactic information. Here’s the story.
In the GB framework, Trace Theory implements the general computational principle that derivations be monotonic. For example, if a verb has a transitive syntax in the base, then it must retain this transitive syntax throughout the derivation. Or, put another way, if some NP is an object of a V at some level of representation, the information that it was must be preserved at every subsequent level of representation. In a word, information can be created but not destroyed; that is, G rules are structurally monotonic in the sense that the structure that is input to a rule is preserved in the structure that is output from that rule. Within GB, the name of this general computational principle is the Projection Principle, and the way it is formally implemented is via Trace Theory.
This monotonicity condition is a novelty. Operations within the prior Standard model are not monotonic. To illustrate, take the simple case of Raising to Subject, which can be schematized along the lines of (19):Footnote 30
(19) X - T(ense)1 - Y - NP - T(ense)2 - Z → X - NP - T1 - Y - T2 - Z
(T2=‘to’)
This rule can apply in a configuration like (20a) to derive a structure like (20b):Footnote 31
(20) a. [TP [T present] [VP seem [TP John [T to] [VP like Mary]]]]
b. [TP John [T present] [VP seem [TP [T to] [VP like Mary]]]]
Note that the information that John had been the subject of the embedded clause prior to the application of (19) is lost, as the embedded TP in (20b) no longer has a subject like it does in (20a).
As noted, Trace Theory is a way of implementing the Projection Principle. How exactly? Movement rules in GB are defined as operations that leave traces in positions from which movement occurs. Given Trace Theory, the representation of (20a) after Raising has applied is (21):
(21) [TP John1 [T present] [VP seem [TP t1 [T to] [VP like Mary]]]]
Here t1 is a trace of the moved John, the co-indexing coding the fact that John was once in the position occupied by its trace. As should be clear, via traces, movement now preserves prior syntactic structure (the subject position in (20a) is retained in (21)). As noted, this kind of information-preserving principle (i.e. that grammatical operations cannot destroy structure) becomes a staple of all later theory.Footnote 32
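A minimal sketch of the contrast at stake: a Standard-Theory-style Raising that simply removes the embedded subject (so the derivation is not monotonic) versus a GB-style Move NP that leaves a co-indexed trace, conserving the structure of the input as the Projection Principle demands. The nested-list phrase markers, the NP wrapping of John and the string-based co-indexing are illustrative assumptions.

```python
import copy

# (20a): [TP [T present] [VP seem [TP [NP John] [T to] [VP like Mary]]]]
base = ["TP", ["T", "present"],
        ["VP", "seem", ["TP", ["NP", "John"], ["T", "to"],
                        ["VP", "like", ["NP", "Mary"]]]]]

def raise_without_trace(tp):
    """Standard-Theory-style Raising: the embedded subject is removed outright,
    so the information that the lower TP ever had a subject is lost (cf. (20b))."""
    tp = copy.deepcopy(tp)
    lower_tp = tp[2][2]
    subject = lower_tp.pop(1)        # delete [NP John] from the embedded clause
    tp.insert(1, subject)            # and make it the matrix subject
    return tp

def move_np(tp, index=1):
    """GB-style Move NP: the same displacement, but a co-indexed trace is left in
    the moved-from position, so the input structure is conserved (cf. (21))."""
    tp = copy.deepcopy(tp)
    lower_tp = tp[2][2]
    cat, word = lower_tp[1]                      # the embedded subject, ['NP', 'John']
    lower_tp[1] = [cat, f"t{index}"]             # trace left behind
    tp.insert(1, [cat, f"{word}{index}"])        # co-indexed mover: ['NP', 'John1']
    return tp

print(raise_without_trace(base))   # the embedded TP now lacks a subject: (20b)
print(move_np(base))               # the embedded TP keeps a co-indexed trace: (21)
```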
Trace Theory is GB’s first step towards simplifying G rules. The second bolder step is to propose that traces require licensing, and the third boldest step is to execute this by using traces to unify binding and movement. Specifically, binding theory expands to include the relation between a moved α and its trace. Executing this unification, given other standard assumptions (particularly that D-structure represents pure GF-θ) requires rethinking binding and replacing construction-specific rules like Reflexivization in favor of a more abstract way of coding the anaphoric dependency. Again, let’s illustrate.
Say we treat Raising as just an instance of Move α, then we need a way of preventing the derivation of unacceptable sentences like (22a) from sentences with the underlying structure in (22b).
(22) a. *John seems likes Mary
b. [TP [T present] [VP seem [TP John [T present] [VP like Mary]]]]
Now, given a rule like (19), this derivation is impossible. Note that the embedded T is not to but present. Thus, (19) cannot apply to (22b) as its structural description is not met (i.e. the structural description of (19) codes its inapplicability to (22b) thus preventing the derivation of (22a)).Footnote 33 But, if we radically simplify movement rules to “move anything anywhere” (i.e. Move α), the restriction coded in (19) is not available and overgeneration problems (e.g. examples like (22a)) emerge.
To recap, given a rule that simply says “Move NP,” there is nothing preventing the rule from applying to (22b) and moving John to the higher subject position. The unification of movement and binding via Trace Theory serves to prevent such overgeneration. How? By treating the relation between a trace and its antecedent as syntactically identical to that between an antecedent and a reflexive anaphor. Specifically, if the trace in (23a) is a kind of “reflexive” then the derived structure is illicit as the trace left by movement is not bound. In effect, (23a) is blocked in basically the same way that (23b) is.
(23) a. [TP John1 [T present] [VP seem [TP t1 [T present] [VP like Mary]]]]
b. [TP John1 [T present] [VP believe [TP he-self/him-self1 [T present] [VP like Mary]]]]
Let’s pause and revel (maybe even wallow!) in the logic on display here: If derivations are monotonic (i.e. obey the Projection Principle) then when move NP (i.e. Move α, with α=NP) applies it leaves a trace in the moved-from position thereby preserving the syntactic structure. Further, if the relation between a moved α and its trace is the same as an anaphor to its antecedent, then the licensing principles that regulate the latter must regulate the former.Footnote 34 So, simplifying derivations by making them monotonic and unifying movement and binding allows for the radical simplification of movement rules (i.e. to a Move α format) without any empirical costs. In other words, simplifying derivations, unifying the modules of the grammar (always a theoretical virtue if possible) serves to advance the simplification of its rules.Footnote 35 The GB virtues of simplification and unification are retained as regulative ideals in contemporary Minimalist thinking.
That’s the basic idea. However, we need to consider a few more details, as reducing (23a) to a binding violation requires reframing the theory of binding. More specifically, it requires that we abstract away from the specifics of the binding constructions and concentrate on the nature of the relations they specify. Here’s what I mean.
The Lees–Klima rule of Reflexivization contrasts with rules like Raising in that the former turns the lower “dependent” into a reflexive while the latter deletes it. Moreover, whereas Reflexivization is a rule that applies exclusively to clause-mates, Raising only applies between clauses. Lastly, whereas Reflexivization is an operation that applies between two identical lexical items (viz. two items introduced by lexical insertion in Deep Structure), Raising does not (in contrast to Equi, for example).Footnote 36 From the perspective of the Standard Theory, then, Raising and Reflexivization could not look more different and unifying them would appear unreasonable. The GB theory, in contrast, by applying quite generally to all nominal expressions, highlights the relevant dependencies that they can enter into (and that differentiate them) and does not get distracted by other (irrelevant) features of the constructions (like their differing morphology or even their formal etiology).
Let me state this another way. The construction specificity of the rules in the Standard Theory has the consequence that most rules look formally different from one another. Thus, unifying Reflexivization and Equi or Equi and Movement does not seem particularly plausible when one considers the formal features of the rules. Only Trace Theory and the abstractions it introduces makes the potential similarities between these various constructions readily visible.
In particular, GB unifies movement and binding via Trace Theory by recasting the rule of Reflexivization. Recasting Reflexivization constructions as Principle A effects allows FL to treat the relation between the nominal that has moved and the trace left by this movement and the relation between the reflexive and the nominal that serves as its antecedent as the same relation. GB accomplishes this by treating A-traces and reflexives as morphemes of the same kind, subject to the same licensing condition. Thus, critically, this unification requires moving from binding rules like Reflexivization to licensing conditions like Principle A. Let’s consider how.
GB binding theory (BT) divides all nominal (overt) expressions into three categories and associates each with a licensing condition. The three are (i) anaphors (e.g. reflexives, reciprocals, PRO), (ii) pronominals (e.g. pronouns, PRO, pro) and (iii) R-expressions (everything else). BT regulates the interpretation and distribution of these expressions. It includes three conditions, Principles A, B and C, and a specification of the relevant domains and licit dependencies:
(24) GB Binding Principles:
Principle A: An anaphor must be bound in its minimal domain.
Principle B: A pronominal must be free (i.e. not bound) in its minimal domain.
Principle C: An R-expression must be free.
(25) α is the minimal domain for β if α is the smallest clause (TP) with a subject distinct from β.Footnote 37
(26) An expression α is bound by β iff β c-commands α, and β and α are co-indexed.
These three principles together capture all the data we noted in (10)–(16). Let’s see how. The relevant examples are recapitulated in (27). (27a,b,e,f) illustrate that bound reflexives and pronouns are in complementary distribution. (27c,d) illustrate that R-expressions cannot be bound at all.
(27) a. John1 likes himself/*him1
b. John1 believes Mary likes *himself/him1
c. *I expect himself1 to like John1
d. *He1 expects me to like John1
e. John1 believes *himself/he1 is intelligent
f. John1 believes himself/*he to be intelligent
How does BT account for these data? Reflexives are categorized as anaphors and so subject to Principle A. Thus, reflexives must be bound in their minimal domains. Pronouns are pronominals subject to Principle B. Thus, a pronoun cannot be bound in its minimal domain. Thus given BT, pronouns and reflexives must be in complementary distribution.Footnote 38 This accounts for the data in the mono-clausal (27a) and the bi-clausal data in (27b). It also accounts for the data in (27f). The structure is provided in (28):
(28) [TP1 John Present [VP believe [TP2 himself/he to be intelligent]]]
The minimal domain for himself/he is the matrix TP1. Why? Because of (25), which requires that the minimal domain for α must have a subject distinct from α. But himself/he is the subject of TP2. The first TP with a distinct subject is the matrix TP1 and this becomes its binding domain. In TP1 the anaphor must be bound and the pronoun must be free. This accounts for the data in (27f).
(27e) requires some complications. Note that we once again witness the complementary distribution of the bound reflexives and pronouns. The minimal domain should then be the embedded clause if BT is to explain these data. Unfortunately, (25) does not yield this. This problem received various analyses within GB, none of which proved entirely satisfactory. The first proposal was to complicate the notion ‘subject’ by extending it to include the finite marker (which has nominal phi/ϕ (i.e. person, number, gender) features).Footnote 39 This allows the finite T to be a subject for himself/he and their complementary distribution follows given the contrary requirements that A and B impose on anaphors and pronominals.Footnote 40
Principle C excludes (27c,d), as in both cases he/himself binds John. (27c) also violates Principle A.
In sum, BT accounts for the same binding effects the earlier L&K theory does, though in a very different way. It divides the class of nominal expressions into three groups, abstracts out the notion of a binding domain, and provides universal licensing conditions relevant to each.Footnote 41 As with the GB movement theory, most of the BT is plausibly part of the native (“universal”) structure of FL and hence need not be acquired on the basis of PLD. What the learner needs to determine (i.e. acquire) is what group a particular nominal expression falls into. Is each other an anaphor, pronominal or R-expression? Once this is determined, where it can appear and what its antecedents can be follow from the innate architecture of FL. Thus, BT radically simplifies FL by distinguishing what binding applies to from what binding is, and this has a natural interpretation in terms of acquisition: knowledge of what belongs in which category must be acquired, while knowledge of what the relevant categories are and how something in a given category behaves is part of FL and hence innate.Footnote 42
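The licensing-condition character of BT can be made concrete with a small sketch that checks Principles A, B and C over toy clause structures, using (25) for the minimal domain and a deliberately crude stand-in for binding as defined in (26): a subject counts as binding any co-indexed nominal in its own or a contained clause. The Nominal encoding, the clause-tree representation and the omission of the finiteness complication just discussed are all my own simplifying assumptions.

```python
from dataclasses import dataclass

@dataclass
class Nominal:
    name: str
    kind: str          # 'anaphor', 'pronominal', or 'r-expr'
    index: int         # co-indexing, as in (26)
    clause: int        # the TP in which the nominal sits
    is_subject: bool   # whether it is that TP's subject

def dominates(ancestor, clause, parents):
    """True if `ancestor` is (or contains) `clause` in the clause tree."""
    while clause is not None:
        if clause == ancestor:
            return True
        clause = parents[clause]
    return False

def minimal_domain(n, parents):
    """(25): the smallest TP with a subject distinct from n; if n is itself the
    subject of its TP, that is the next TP up (None for a matrix subject)."""
    return parents[n.clause] if n.is_subject else n.clause

def binds(a, n, parents):
    """Crude c-command stand-in for these configurations (cf. (26)): a subject
    binds any co-indexed nominal in its own clause or in a clause it contains."""
    return (a is not n and a.index == n.index and a.is_subject
            and dominates(a.clause, n.clause, parents))

def bt_check(nominals, parents):
    """Principle A: an anaphor must be bound in its minimal domain.
       Principle B: a pronominal must be free in its minimal domain.
       Principle C: an R-expression must be free everywhere."""
    for n in nominals:
        dom = minimal_domain(n, parents)
        in_dom = [a for a in nominals
                  if binds(a, n, parents) and dominates(dom, a.clause, parents)]
        if n.kind == 'anaphor' and not in_dom:
            return f"* Principle A: {n.name} is unbound in its domain"
        if n.kind == 'pronominal' and in_dom:
            return f"* Principle B: {n.name} is bound in its domain"
        if n.kind == 'r-expr' and any(binds(a, n, parents) for a in nominals):
            return f"* Principle C: {n.name} is bound"
    return "ok"

parents = {0: None, 1: 0}                       # clause 1 is embedded under clause 0
john = Nominal("John", "r-expr", 1, 0, True)
mary = Nominal("Mary", "r-expr", 2, 1, True)

# (27a) John1 likes himself1/*him1  (object of the matrix clause)
print(bt_check([john, Nominal("himself", "anaphor", 1, 0, False)], parents))        # ok
print(bt_check([john, Nominal("him", "pronominal", 1, 0, False)], parents))         # * B

# (27b) John1 believes Mary likes *himself1/him1  (object of the embedded clause)
print(bt_check([john, mary, Nominal("himself", "anaphor", 1, 1, False)], parents))  # * A
print(bt_check([john, mary, Nominal("him", "pronominal", 1, 1, False)], parents))   # ok

# (27f)/(16): John1 believes himself1/*him1 to be intelligent -- by (25) the embedded
# subject's minimal domain is the matrix TP, so the judgments flip.
print(bt_check([john, Nominal("himself", "anaphor", 1, 1, True)], parents))         # ok
print(bt_check([john, Nominal("him", "pronominal", 1, 1, True)], parents))          # * B
```

Classifying A-traces as anaphors, as discussed next, would let this same Principle A check exclude configurations like (23a) with no new machinery (modulo the finite-clause refinement mentioned above).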
With this as background, let’s return to how GB allows for the unification of binding and movement via Trace Theory. Recall that BT divides all nominal expressions into three groups. Traces are nominal expressions (viz. [NP e]1) and so it is reasonable to suppose that they too are subject to BT. Moreover, as traces determine the θ-roles of their antecedents, they must be related to them for semantic reasons. This would be guaranteed were traces treated like anaphors falling under Principle A. This suffices to assimilate (23a) to (23b) and so it closes the explanatory circle.Footnote 43
So, by generalizing binding considerations to all nominal expressions, and by recasting the binding theory so as to showcase binding domains and binding dependencies, GB makes it natural to unify movement and binding by categorizing traces as anaphoric nominal expressions (a categorization that would be part of FL, hence innate and so in no need of learning). So, simplifying derivations with the Projection Principle leads to Trace Theory, which in turn allows for the unification of movement and binding, which in turn leads to a radical simplification of movement transformations, all without any apparent diminution of empirical coverage.
Let me add one more point concerning linguistic flexibility: recall that one of the big questions concerning language concerns its acquisition by kids on the basis of relatively simple input. The GB story laid the foundations for an answer: The rules were easy to learn because where languages vary the differences are easy to acquire on the basis of simple PLD (e.g. is α = NP or V or Wh or …). GB factors out the intricacies of the Standard Theory rules (e.g. ordering statements and clause-mate conditions) and makes them intrinsic features of FL; hence a child can know them without having to acquire them via PLD. Thus, not only does GB radically simplify and unify the operations in the Standard Theory, a major theoretical accomplishment in itself, it also provides a model for successfully addressing what has been called “Plato’s Problem”: How does knowledge of Gs arise in native speakers despite the relative paucity of data available in the PLD to fix their properties? In other words, how can kids acquire Gs despite the impoverished nature of the PLD?
Let’s end this section. We have illustrated how GB, building on earlier research (and conserving its discovered effects), constructed a more principled theory of FL. Though we looked carefully at binding and movement, the logic outlined above was applied much more broadly. Thus, phrase structure was simplified in terms of X’-theory (pointing towards the elimination of PS-rules altogether in contemporary theory) and island effects were unified under the theory of Subjacency. The latter echoed the discussion above in that it consolidated the view that T-rules are very simple and not construction centered. Rather constructions are complexes of interacting simple basic operations. The upshot is a rich and articulated theory that describes the fixed structure of FL in terms of innate principles of UG. In addition, the very success of GB theory opens a further important question for investigation. Just as research in the Standard Theory paves the way for a fruitful consideration of linguistic universals and “Plato’s Problem,” the success of GB allows for a consideration of “Darwin’s Problem”: How could something like FL have arisen in the species so rapidly and remained so unchanged since its inception? We turn to this in the next chapters, but first, as promised, a (by no means exhaustive) list of effects that sixty years of Generative Grammar research has unearthed.
1.5 Some Effects Generative Grammar Has Discovered over the Last Sixty Years
Here is a partial list of some of the effects that are still being widely investigated (both theoretically and empirically) within Generative research. Some of these effects can be considered analogous to “laws of grammatical structure” which serve as probes into the inner workings of FL. As in the case of L&K’s binding proposal, the effects comprise both negative and positive data and they have served as explanatory targets (and benchmarks) for theories of FL.
These effects also illustrate another distinguishing mark of an emerging science. In the successful sciences, most of the data is carefully constructed, not casually observed. In this sense, it is not “natural” at all, but factitious. The effects enumerated here are similar. They are not thick on the conversational ground. Many of these effects concentrate on what cannot exist (i.e. negative data). Many are only visible in comparatively complex linguistic structures and so are only rarely attested in natural speech or PLD (if at all). Violations of the binding conditions such as John believes himself is intelligent are never attested outside of technical papers in Generative syntax. Thus, in Generative research (as in much of physics, chemistry, biology, etc.) much of the core data used to probe FL is constructed, rather than natural.Footnote 44 To repeat, this is a hallmark of modes of investigation that have made the leap from naturalistic observation to scientific explanation. The kinds of data that drive Generative work are of this constructed kind.Footnote 45
Here, then, is a partial list of some of the more important effects that Generative Grammar has discovered.Footnote 46
(29) A partial list of empirically discovered laws of grammar
4. Minimal distance effects in control configurations
5. Binding effects (A-effects and B-effects)
7. Principle C-effects: an anaphoric element cannot c-command its antecedent
8. CED (condition on extraction domain) effects
9. Fixed subject effects
10. Unaccusativity effects
11. Connectedness effects
12. Obligatory control vs non-obligatory control effects
13. The subject orientation of long-distance anaphors
14. Case effects
15. Theta Criterion effects (Principle of Full Interpretation)
16. NPI (negative polarity item) licensing effects
17. Phrasal headedness effects
18. Clause-mate effects
19. Expletive-associate locality effects
23. Weakest crossover effects
24. Coordinate structure constraint
a. ATB (across-the-board) effects
25. Ellipsis effects
26. A-movement/scrambling obviating WCO (weak crossover) effects
28. Constituency effects
30. Lexical integrity effects
31. Psych verb effects
32. Double object construction effects
33. Predicate-internal subject effects
As in the case of the L&K binding proposal outlined above, just describing these effects involves postulating abstract rules that derive natural language expressions and abstract structures that describe them. Thus, each effect comes together with sets of positive and negative examples and rules/restrictions that describe these data. As in any scientific domain, simply describing the effects already requires quite a bit of theoretical apparatus (e.g. what’s an island, what’s a deletion rule, what’s the difference between A- and A’-movement, what’s case, what’s a clause, etc.). And, as is true elsewhere, the discovery of such effects sets the stage for the next phase of inquiry: explaining why we find these particular effects and seeing what these explanations can tell us about the structure of FL.
1.6 The Minimalist Program and a Novel Research Question
Where are we? Here is a quick recap.
First on the agenda was the problem of linguistic creativity, the fact that native speakers of a given language L have the capacity to understand an unbounded number of different expressions of L. This fact raised the obvious question: How is this possible? The answer: This capacity supervenes on having an internalized finitely specified G that recursively characterizes the linguistic objects of L. So, (part of) the explanation for the fact that native speakers are linguistically creative in their language L is to give a recursive characterization of what constitutes a possible object of L and treat this recursive specification as part of a native speaker’s mental make-up.
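To make the recursion point concrete, here is a minimal toy sketch (in Python, with an invented mini-lexicon and invented helper names such as merge and embed; nothing here is drawn from any actual proposal in the text). It simply illustrates how a finitely specified procedure whose outputs can serve as further inputs characterizes an unbounded set of distinct hierarchically organized objects.

```python
# A purely illustrative toy: a finite stock of atoms plus one binary
# combination operation whose outputs can be re-used as inputs.

LEXICON = {"the", "student", "teacher", "thinks", "that", "left"}

def merge(a, b):
    """Combine two syntactic objects into a new, larger hierarchical object."""
    for x in (a, b):
        if isinstance(x, str) and x not in LEXICON:
            raise ValueError(f"{x!r} is not a lexical atom")
    return (a, b)

def embed(clause):
    """Re-use a previously built clause as the complement of a new clause."""
    return merge(merge("the", "teacher"),
                 merge("thinks", merge("that", clause)))

# Start from a simple clause and keep embedding it: every pass yields a new,
# strictly larger hierarchical object, so no finite list exhausts the set.
clause = merge(merge("the", "student"), "left")
for _ in range(3):
    print(clause)
    clause = embed(clause)
```

Each pass through the loop yields a new, strictly larger bracketing, so no finite enumeration exhausts what the procedure characterizes; this is the sense in which a finitely specified G underwrites an unbounded capacity.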
More specifically, we reviewed how the first period of syntactic research investigated how grammars might be structured so that they could generate an unbounded number of distinct hierarchically organized objects. The research strategy was to propose specific Gs for given Ls whose interacting rules yielded interesting empirical coverage, generating a fair number of acceptable sentences while not generating an interesting number of unacceptable ones. In the process, Generativists discovered an impressive number of effects that served as higher-level targets of explanation for subsequent theory. To say the same thing a little more pompously, early Generative Grammar discovered a bunch of “effects” which catalogued deep-seated generalizations characteristic of the products of human Gs. These effects sometimes fell together as “laws of grammar” taken to reflect the built-in design features of FL. More simply, these effects are plausible reflections of the properties of FL and so can be used to explore the structure of FL.
Or to say this slightly differently, the success in adumbrating properties of particular Gs led to a second stage of research that built on this success and targeted a related, yet different, question: How must humans be built so that they can acquire Gs with the properties we discovered Gs to have? The project, in effect, comes down to specifying the class of possible human Gs.
Importantly, the project of adumbrating the range of possible Gs becomes fruitful once we have a budget of empirically plausible properties of actual Gs! Without some decent examples of actual Gs with their identified properties, it makes little sense to ask how to delimit such Gs. To say this another way, we need to empirically bound the domain of inquiry to make it tractable (and worth investigating), and this is why investigating the properties of FL (properties that will serve to limit the range of possible Gs) only becomes a fertile pursuit once Generativists have identified some features of actual Gs to serve as targets of explanation.
So, we have some empirically plausible features of Gs and we want a theory of FL (or UG) to explain why we find Gs with these properties and not others. Plato’s Problem served as an additional boundary condition on this line of inquiry into FL/UG. Plato’s Problem is the observation that what native speakers know about their languages far exceeds what they could have learned about them by examining the PLD available to them in the course of G acquisition. Conceptually, addressing Plato’s Problem in the context of a budget of identified G effects suggested a two-pronged attack: first, radical simplification of the rules that Gs contain, and second, enrichment of what FL/UG brings to the task of G acquisition. Eliminating the complexity built into Aspects-style rules and factoring out a few simple, very general operations like Move α made the language-particular rules that children must acquire far easier to learn. This simplification, however, threatened generative chaos by allowing for massive overgeneration of ungrammatical structures. The theoretical task was to prevent this. This was accomplished by enriching the innate structure of FL/UG in principled ways. The key theoretical innovation was Trace Theory, motivated by the idea that derivations are information preserving (monotonic). Traces simplified derivations while keeping them information preserving, and this further allowed for the unification of movement and binding. These theoretical moves together addressed the overgeneration problem.Footnote 47
This line of inquiry coalesced around a “standard” model of FL/UG (i.e. GB). In particular, GB provided a substantive model of FL/UG and thereby set the stage for contemporary minimalist investigations. More specifically, just as the success of early Generative inquiry into language-particular Gs allowed us to fruitfully address the question why we have these Gs and not others (answer: because we have a GBish FL/UG), adumbrating an empirically substantive conception of FL/UG now allows Generativists to ask the next obvious question: Why does FL/UG have these GBish properties and not others? Moreover, just as asking the question concerning the limits on human Gs would have been premature (and so idle) without first discovering some of the empirical features of actual Gs, so too investigating why we have the FL/UG we actually have (rather than another with other possible organizing principles) would have been premature without first having some reasonable empirically grounded theory of FL/UG like the one GB delivered.Footnote 48
I want to emphasize, re-emphasize and re-re-emphasize this point before getting into some details in the following chapters. I have often heard the claim that Minimalism offers nothing new to the Generative enterprise, methodologically speaking. I agree with part of this. Generative Grammar has always prized explanation, and so the hallmarks of explanation (i.e. deriving the properties one wants explained from simple, elegant theories) have also always been valued. To wit: We explain the basic features of a given native speaker’s linguistic productivity in L by showing how they result from a GL that generates the unboundedly many different hierarchical meaning/sound (<m,s>) pairs characteristic of that L (or more exactly, that coincides with a native speaker’s “sense” of L) and does not generate any pairs inconsistent with a native speaker’s competence in L. Similarly, we explain why we find the GLs that are empirically attested by showing that our theory of FL/UG (e.g. GB) derives GLs with these properties and does not derive any without them. In both cases we prize simple, elegant theories over more complex, inelegant ones for the reasons that scientific inquiry has always prized the former over the latter. In the first two periods of Generative Grammar, the methodology remained constant, even as the questions addressed changed.
So too as regards the current minimalist stage of Generative inquiry. We still want simple, elegant theories, but now their derivational target is (roughly) GB and the laws of grammar (and concomitant effects) it adumbrates. These are what minimalist theories aim to explain. Or, to be more precise, we now want theories that derive the theoretical principles of GB and/or the associated effects that these GB principles aimed to explain. Note, we have the same methodological standards as ever (simplicity, elegance, naturalness), but we are now entertaining a different explanandum. Moreover, as we have noted above, targeting the principles of GB and its associated laws of grammar for explanation only really makes sense if we take GB to be reasonably well grounded, both empirically and theoretically.
Let me put this point more broadly: There is a reason Minimalism was a brainchild of the mid-1990s. It took that long to make it a substantive project. Minimalism awaited a plausible theory of FL/UG, which in turn awaited plausible Gs of particular Ls. And all of that took about forty years to develop. By the mid-1990s Generative Grammar had an empirically viable (though not perfect) theory of FL/UG (i.e. GB) and so it made sense to investigate its properties and ask why they are the way they are.
So, is Minimalism just the same old, same old or something new? And by now I hope you know the right answer: YES! It is both nothing new and something very different. That is what makes the Minimalist Program (MP) interesting.
1.7 The Minimalist Program: Explaining the Properties of GB
How should one go about explaining why FL/UG has the properties it does? By deriving them from simpler, more natural, more economical assumptions. And this entails assuming that whatever GB’s merits, it is not the fundamental theory of FL. Standard methodological considerations lend credence to this last assumption. GB is simply too complex to be fundamental. And also too linguistically sui generis. Here’s what I mean.
For Generativists, FL is a human-specific cognitive capacity. This entails that the human-specific linguistic capacity evolved in a lineage of pre-human ancestors that were not linguistically proficient (at least not the way we are). In other words, FL, the capacity to acquire and deploy Gs, is cognitively novel in humans. In this evolutionary context, GB is a problematic account of FL’s basic properties precisely because it is too “complex” and too linguistically specific. In particular, the more FL’s properties are linguistically bespoke (rather than cognitively and computationally generic), and the more complex the internal organization of FL, the harder it is to explain how it arose from non-linguistic minds (i.e. minds bereft of FLs). Put more positively, the simpler the structure of FL, and the less linguistically specific its operations and principles, the easier it should be to explain how they could have arisen from a-linguistic minds. And this line of thought has an immediate consequence and suggests a concrete research program. The consequence is that though GB might be a good description of FL, it cannot be the fundamental theory of FL. The fundamental theory must be simpler and less linguistically specific. The program is to develop such a simpler theory that has (roughly) GB and its properties as limit consequences. Let’s flesh these general points out a bit.
Within GB, FL is very complex, and its proposed innate principles and operations are very linguistically specific. The complexity is manifest both in the overall modular architecture of the basic GB theory and in the specific principles and operations characteristic of each module. (30) and (31) reiterate the basic structure of the theory.
(30)–(31) [diagrams reiterating GB’s basic architecture: the Y-model and its component modules; not reproduced here]
Though some critical relations crosscut (many of) the various modules (e.g. government), the modules each have their own special features. For example, X’-theory traffics in notions like specifier, complement, head, maximal projection, adjunct and bar-level. Case theory also singles out heads but distinguishes between those that are case assigning and those that require case. There is also a case filter, case features and case assigning configurations (government). Theta theory also uses government but for the assignment of θ-roles, which are assigned at D-structure by heads and are regulated by the Theta Criterion, a condition that requires every argument to get one and at most one θ-role. Movement exploits another set of concepts and primitives: bounding node/barrier, escape hatch, subjacency principle, antecedent government, head government, γ-marking, γ-checking and more. Last, the construal rules come in four different types: one for PRO, one for local anaphors like reflexives and reciprocals, one for pronouns and one for all the other kinds of DPs, dubbed R-expressions. There is also a specific licensing domain for anaphors and pronouns, indexing procedures for the specification of syntactic antecedence relations and hierarchical requirements (c-command) between an antecedent and its anaphoric dependent. Furthermore, all of these conditions are extrinsically ordered to apply at various derivational levels specified in the Y-model.Footnote 49
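To make the heterogeneity vivid, here is a schematic rendering (my own, purely illustrative; the grouping and labels simply compress the prose above and are not an official GB inventory) of how each module trades in its own stock of primitives.

```python
# Illustrative only: a compressed inventory of module-specific GB vocabulary.
GB_MODULES = {
    "X-bar theory": ["specifier", "complement", "head", "maximal projection",
                     "adjunct", "bar-level"],
    "Case theory": ["case-assigning head", "case-needing nominal", "Case Filter",
                    "case features", "government configuration"],
    "Theta theory": ["theta-role", "D-structure assignment", "Theta Criterion"],
    "Movement/bounding": ["bounding node/barrier", "escape hatch", "Subjacency",
                          "antecedent government", "head government",
                          "gamma-marking", "gamma-checking"],
    "Binding/construal": ["PRO", "local anaphor", "pronoun", "R-expression",
                          "binding domain", "indexing", "c-command"],
}

for module, primitives in GB_MODULES.items():
    print(f"{module}: {len(primitives)} dedicated primitives")
```

Even this compressed listing suggests how much linguistically dedicated vocabulary the GB picture of FL builds in, which is exactly the feature that becomes problematic from the perspective of Darwin’s Problem.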
If the information outlined in (30) and (31) is on the right track, then FL is richly structured with very domain-specific (viz. linguistically tuned) information. And though such linguistic specificity is a positive with regard to addressing Plato’s Problem, it raises difficulties when trying to address Darwin’s Problem. Indeed, the logic of the two problems has them largely pulling in opposite directions. A rich, linguistically specific FL plausibly eases the child’s task by restricting how much the child must glean from the PLD. However, the more cognitively sui generis FL is, the more complicated the evolutionary path to FL. Thus, from the perspective of Darwin’s Problem, we want the operations and principles of FL to be cognitively (or computationally) general and very simple. It is this tension that the Minimalist Program aims to resolve.
The tension is exacerbated when the evolutionary timeline is considered. The consensus opinion is that humans became linguistically capable about 100,000 years ago and that the capacity that evolved has remained effectively unchanged ever since.Footnote 50 Thus, whatever the addition to a-linguistic minds that made them “language ready,” it must have been relatively minor (the addition of at most one or two linguistically bespoke operations/principles). Or, putting this another way, our FL is what you get when you wed (at most) one (or two) linguistically specific features with a cognitively a-linguistic generic brain.
Navigating the narrows between Plato’s Problem and Darwin’s Problem suggests a twofold strategy: (i) Simplify GB by unifying the various FL-internal modules and (ii) Show that this simplified FL can be distilled into largely general cognitive/computational parts plus (at most) one linguistically specific one.Footnote 51
Before proceeding, please note yet again that GB is the target of explanation. In other words, the Minimalist Program takes GB to be a good approximate model of FL’s fine structure. It is not fundamental, but it is still very good, in that Minimalism assumes that GB has largely correctly identified (and described) phenomena (laws) that directly reflect the innate structure of FL. If MP is realizable, then FL is less linguistically parochial than GB supposes, even though it has operations and principles of the kind that GB adumbrates. If MP is realizable, then FL exploits many generic operations and principles (i.e. operations and principles not domain restricted to language) in its linguistic computations and uses these for linguistic ends. On this view, Minimalism takes GB’s answer to Plato’s Problem to be largely correct though it disagrees with GB about how domain-specific the innate architecture of FL is. Borrowing terminology common in physics (perhaps grandiosely), Minimalism takes GB to be a good effective theory of FL but denies that it is the fundamental theory of FL. A useful practical consequence of this is to take the principles of GB to be targets for derivation by the more fundamental principles that minimalist theories will discover.
That’s the program and that’s how the Minimalist Program fits into the overall Generative research program. Has the program been successful? I believe it has been triumphantly so. I try to make this case in the chapters that follow.