1. Introduction
Japanese phonology contains many inconsistencies, seemingly driven by multiple sublexicons and morpheme-specific processes. On the one hand, there are sounds that are contrastive in some portions of the lexicon but exist only as allophones elsewhere. For example, [f] exists only as an allophone of /h/ before [u] in native Japanese words (e.g., /pune/ → hune [Footnote 1] → [fune] ‘boat’) but can appear before any vowel in loanwords from English (e.g., fairu ‘file’, feisubukku ‘Facebook’, fōku ‘fork’, firutā ‘filter’, and futtobōru ‘football’). On the other hand, there are processes that apply in some areas of the lexicon but are inert elsewhere. For example, the famous process of rendaku, or sequential voicing, applies to compounds whose second member is a native Japanese word, but generally does not apply to compounds whose second member is a loanword (Itō and Mester 2003). This aspect of Japanese phonology inspired the introduction of indexed constraints (Fukazawa 1998, Fukazawa et al. 1998, Itō and Mester 1999, 2001), which have remained influential within constraint-based frameworks as a method for modelling various kinds of irregularity. The descriptive utility of indexed constraints is unquestionable, and a method for learning them has been developed by Pater (2004, 2006, 2010) as a supplemental module to the Error-Driven Constraint Demotion algorithm for learning constraint rankings (Tesar and Smolensky 1998). There are, however, some unresolved theoretical questions surrounding indexed constraints. [Footnote 2]
The most prominent of these is the debate in the literature over whether only faithfulness constraints should be eligible for indexation (Fukazawa 1998, Fukazawa et al. 1998, Itō and Mester 1999, 2001) or whether both faithfulness and markedness constraints should be (Pater 2000, 2004, 2006, 2010; Flack 2007, Gouskova 2007, 2012; Jurgec 2010). A related question concerns learnability: how does an algorithm choose which constraint to copy when it has more than one option? In this article, we add to the evidence that markedness constraints should be eligible for indexation by analyzing some data from Japanese. We further argue that indexed markedness constraints are only necessary when exceptional triggering (Pater 2010) occurs across a morpheme boundary, and that they should be restricted to just these circumstances. Using the data from Japanese, we go on to show that learning algorithms should be biased towards indexing faithfulness constraints where there is a choice, since doing so creates indexed markedness constraints in only the desired cases. We then discuss the implications of this bias for the debate over whether to allow the indexation of markedness constraints.
The Japanese phenomenon in question is the sporadic post-nasal voicing that occurs in Japanese counter words, which are combinations of a numeral and a classifier morpheme. Many classifier morphemes begin with a voiceless obstruent, and these morpheme-initial obstruents sometimes become voiced when preceded by the morpheme-final nasal consonant of certain numerals, but not in every case where we would expect it. The initial obstruents of some classifier morphemes, like kai ‘times’ and soku ‘footwear’, become voiced after the numerals san ‘three’, sen ‘thousand’, and man ‘ten thousand’, while the initial obstruents of other classifiers, like satu ‘books’ and kyaku ‘chairs’, never become voiced at all (Itō and Mester 2003: 140). Interestingly, a small subset of classifier morphemes is intermediate between the other two. These classifiers obligatorily become voiced after san ‘three’, but either completely resist voicing after sen ‘thousand’ and man ‘ten thousand’ (e.g., the classifier ken ‘houses’) or else optionally voice after these two numerals (e.g., the classifier hiki ‘small animals’) (Itō and Mester 2003: 140). Even stranger is the fact that all classifier-initial obstruents fail to become voiced after the numeral yon ‘four’, even though it too ends in a nasal and should thus be able to trigger post-nasal voicing (Itō and Mester 2003: 140). These generalizations are summarized in Table 1 below (modified from Itō and Mester 2003: 140), where all instances of obligatory post-nasal voicing are presented in shaded cells. [Footnote 3] Please note that this is not an exhaustive list of counter words in Japanese.
The crucial point here is that not all nasal-final numerals are equal in their ability to trigger post-nasal voicing. We will argue that, without a bias toward indexing faithfulness constraints, the learner can be led to believe that it is the classifier, rather than the numeral, that exceptionally triggers voicing, a hypothesis that turns out to be wrong and must later be corrected.
Before moving on, we would like to acknowledge the comments of two anonymous reviewers on the relationship between [p], [b], and [h] in this data set. One reviewer points out that the presence of /p/ → [p] in this data set's column for yon is problematic for Itō and Mester's (2003) suggestion that the lack of voicing after yon may stem from earlier forms with a synonymous numeral si (see section 2). The other reviewer points out that we never see non-alternating [h] in this data set and asks how this fact would fit into our analysis of the voicing. Unfortunately, we could not investigate these aspects of the data further without straying from the focus of the article, although we suspect that they would be amenable to an analysis using indexed constraints.
The remainder of this article is structured as follows. Section 2 analyzes the Japanese counter word data within the context of the debate over whether to permit indexed markedness constraints. The section shows that indexed markedness constraints are necessary, but also argues that they should be limited to cases where the exceptional triggering of a process crosses a morpheme boundary. Section 3 demonstrates that Pater's (2004, 2006, 2010) Inconsistency Resolution algorithm for learning indexed constraints can handle the Japanese data, though it runs the risk of creating superfluous indexed markedness constraints that end up serving no purpose. Section 4 argues that the accidental proliferation of indexed constraints can be avoided by appealing to a preference for indexing faithfulness constraints over markedness constraints, and discusses the implications this bias has for the debate over whether to allow the indexation of markedness constraints. Finally, section 5 concludes and highlights directions for future research.
2. Japanese Post-Nasal Voicing and Constraint Indexation
Indexed constraints were introduced to account for the common situation in which certain morphemes are treated differently from others, requiring a ranking of two (or possibly more) constraints that contradicts the ranking of those same constraints needed elsewhere in the language. This is precisely the case for the Japanese data above, where the presence and absence of post-nasal voicing require essentially opposite constraint rankings. In Optimality Theory (OT; Prince and Smolensky 2004) and other constraint-based phonological frameworks, the standard constraints that together produce post-nasal voicing are *NC̥ and Ident[voice]-IO, defined in (1) and (2), respectively.
(1) *NC̥: Assign a violation for every sequence of a nasal stop immediately followed by a voiceless obstruent.
(2) Ident[voice]-IO:Footnote 4 Assign a violation for every input segment whose corresponding output segment does not have the same value for the feature [voice].
If *NC̥ is ranked above Ident[voice]-IO, then any underlying voiceless obstruent will become voiced when it immediately follows a nasal stop in the output. This is shown in (3a) for the counter word san-zoku ‘three pairs of footwear’. Under the same ranking, underlying voiceless obstruents surface faithfully when they are not immediately preceded by a nasal stop, as shown in (3b) for the counter word ni-soku ‘two pairs of footwear’ (in fact, they surface faithfully in this context under any constraint ranking).
(3)
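The evaluation in (3) can be mimicked with a minimal strict-domination evaluator. This is a sketch under simplifying assumptions of our own: the candidate set contains only the faithful form and the form with the classifier-initial obstruent voiced, the obstruent inventory is the hypothetical subset in `VOICE`, and a ranking is a list of constraint functions whose violation counts are compared lexicographically. The helper names (`candidates`, `star_nc`, `ident_voice`, `evaluate`) are illustrative only.

```python
# Hypothetical illustrative inventory: voiceless obstruent -> voiced counterpart.
VOICE = {"p": "b", "t": "d", "k": "g", "s": "z"}
NASALS = {"n", "m"}


def candidates(inp):
    """Faithful candidate plus the one with the classifier-initial obstruent voiced."""
    numeral, classifier = inp.split("-")
    cands = [inp]
    if classifier[0] in VOICE:
        cands.append(numeral + "-" + VOICE[classifier[0]] + classifier[1:])
    return cands


def star_nc(inp, out):
    """*NC̥: one violation per nasal stop immediately followed by a voiceless obstruent."""
    s = out.replace("-", "")
    return sum(1 for a, b in zip(s, s[1:]) if a in NASALS and b in VOICE)


def ident_voice(inp, out):
    """Ident[voice]-IO: candidates differ from the input only in voicing,
    so each mismatched segment is one voicing change."""
    return sum(1 for a, b in zip(inp, out) if a != b)


def evaluate(inp, ranking):
    """Strict domination: the lexicographically smallest violation profile wins."""
    return min(candidates(inp), key=lambda out: tuple(c(inp, out) for c in ranking))


RANKING_3 = [star_nc, ident_voice]  # *NC̥ >> Ident[voice]-IO
print(evaluate("san-soku", RANKING_3))  # san-zoku, as in (3a)
print(evaluate("ni-soku", RANKING_3))   # ni-soku, as in (3b)
print(evaluate("san-satu", [ident_voice, star_nc]))  # san-satu under the reverse ranking
```

The last line shows that the faithful [san-satu] requires the reverse ranking, which is the ranking paradox discussed next.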
This is of course not the whole story in the sub-lexicon of Japanese counter words. Take for instance the counter word san-satu ‘three volumes’, wherein the underlying /s/ does not voice to [z] despite immediately following [n]. To generate the mapping /san-satu/ → [san-satu], Ident[voice]-IO must dominate *NC̥, but it is impossible for *NC̥ to both dominate and be dominated by Ident[voice]-IO. Such ranking paradoxes are problematic and serve as the primary motivation for indexed constraints. Indexation enables constraints to occupy more than one position in the hierarchy, and as a result, ranking paradoxes are in many cases no longer an issue. The ranking paradox above can be resolved in one of two ways.
The first option, as illustrated in (4), is to create a copy of the faithfulness constraint Ident[voice]-IO, indexing it such that it applies only to /satu/, and ranking it above *NC̥. This high-ranking indexed constraint exempts /satu/ (and only /satu/) from the pressure to undergo post-nasal voicing and is akin to saying that faithfulness to underlying voice features is more important for /satu/ than for unindexed morphemes such as /soku/.
(4)
Alternatively, as illustrated in (5), we could create a copy of the markedness constraint *NC̥, indexing it such that it applies only to /soku/, and ranking it above Ident[voice]-IO. This high-ranking constraint places /soku/ (and only /soku/) under greater pressure to undergo post-nasal voicing. This is akin to saying that *NC̥ clusters are worse when the voiceless obstruent comes from the morpheme /soku/ rather than an unindexed morpheme.
(5)
Either of these solutions is satisfactory if we limit ourselves to counter words with the numeral san ‘three’ and numerals without a final nasal stop, like ni ‘two’. Once we leave this idealized sphere, it becomes apparent that the indexed faithfulness solution is the correct one for the data at hand. Before showing this, however, a brief note on the evaluation of indexed constraints is in order. The literature on indexed constraints generally adopts the notion of local application, defined in (6), whereby indexed constraints are only violated when the offending material is linked directly to underlying material in the indexed morpheme (Pater 2006, 2010).
(6) Local application of *XL: Assign a violation mark to any instance of the configuration X that contains a phonological exponent of a morpheme specified as L (Pater 2010).
In the case of faithfulness constraints like Ident[voice]-IO, the offending material is an output segment that does not match its input correspondent along some feature. Accordingly, a copy of Ident[voice]-IO indexed to a morpheme can only prevent a change of voicing in segments that are a part of that very morpheme. In contrast, markedness constraints like *NC̥ are violated by a specific configuration of material in an output string, so an indexed markedness constraint can be violated by a substring that crosses a morpheme boundary, as long as that substring contains the output correspondent of an input segment in the indexed morpheme. Accordingly, if a copy of *NC̥ is indexed to a nasal-final numeral (such as san ‘three’), then that numeral can induce post-nasal voicing in an immediately following classifier, since the nasal in the offending NC̥ cluster was provided by the indexed numeral.
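Local application can be made concrete by letting each output segment remember which input morpheme it realizes. In the sketch below (our own illustration; `labelled`, `star_nc`, and `ident_voice` are hypothetical helper names), inputs and outputs are tuples of morphemes with segment-by-segment correspondence. An indexed *NC̥ counts a cluster if either of its two segments comes from the indexed morpheme, whereas an indexed Ident[voice]-IO inspects only segments of that morpheme.

```python
VOICELESS = {"p", "t", "k", "s"}  # illustrative subset of voiceless obstruents
NASALS = {"n", "m"}


def labelled(inp, out):
    """Pair each output segment with the input morpheme it realizes."""
    return [(seg, morph) for morph, form in zip(inp, out) for seg in form]


def star_nc(index=None):
    """*NC̥, or its copy indexed to `index` under local application."""
    def constraint(inp, out):
        segs = labelled(inp, out)
        return sum(
            1
            for (a, ma), (b, mb) in zip(segs, segs[1:])
            if a in NASALS and b in VOICELESS and (index is None or index in (ma, mb))
        )
    return constraint


def ident_voice(index=None):
    """Ident[voice]-IO, or its copy indexed to `index`: only that morpheme's
    segments are inspected. Assumes candidates differ from the input only in voicing."""
    def constraint(inp, out):
        return sum(
            a != b
            for morph, form in zip(inp, out)
            if index is None or morph == index
            for a, b in zip(morph, form)
        )
    return constraint


inp = ("san", "soku")
print(star_nc("san")(inp, ("san", "soku")))       # 1: the nasal comes from san
print(star_nc("soku")(inp, ("san", "soku")))      # 1: the obstruent comes from soku
print(ident_voice("san")(inp, ("san", "zoku")))   # 0: no segment of san changes
print(ident_voice("soku")(inp, ("san", "zoku")))  # 1: the changed segment belongs to soku
```

The asymmetry is visible in the first and third calls: a copy of *NC̥ indexed to san is violated by a cluster that straddles the boundary, while a copy of Ident[voice]-IO indexed to san can never penalize a change inside soku.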
With local application in mind, we can now tackle the counter words with the numeral yon ‘four’, all of which categorically show no post-nasal voicing. Itō and Mester (2003) point out that the lack of voicing after yon ‘four’ may have arisen by historical accident. The language previously used the Sino-Japanese numeral si ‘four’ when counting, but since this numeral is homophonous with si ‘death’, it was replaced by the native Japanese numeral yon ‘four’. The original numeral si ‘four’ did not end in a nasal and so never induced post-nasal voicing, and it would seem that the lack of post-nasal voicing in modern yon ‘four’ forms arose through some kind of faithfulness to the original si ‘four’ forms (see OO-correspondence in Benua 1995). A synchronic OT analysis ideally should not have recourse to historical faithfulness, however, and so we propose the following analysis. First, we rank generic Ident[voice]-IO above generic *NC̥, which accounts for the lack of voicing in forms with yon ‘four’, as shown in (7a). Second, we index *NC̥ to the numeral san ‘three’ and rank the copy above generic Ident[voice]-IO, which accounts for the presence of voicing in some forms with san ‘three’, as shown in (7b). Finally, we index Ident[voice]-IO to classifiers like /satu/ that categorically resist voicing, and rank these copies above *NC̥san to ensure that these classifiers never voice, as shown in (7c).
(7)
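The ranking proposed in (7) can be checked on the three crucial mappings with a small self-contained sketch. The helper names, the two-candidate set, and the obstruent subset are illustrative assumptions of ours; indexed constraints follow local application, with an indexed *NC̥ counting a cluster whenever one of its segments belongs to the indexed morpheme and an indexed Ident[voice]-IO inspecting only its own morpheme.

```python
VOICE = {"p": "b", "t": "d", "k": "g", "s": "z"}  # illustrative voiceless -> voiced map
NASALS = {"n", "m"}


def star_nc(index=None):
    """*NC̥, optionally indexed to one morpheme (local application)."""
    def constraint(inp, out):
        segs = [(s, m) for m, form in zip(inp, out) for s in form]
        return sum(1 for (a, ma), (b, mb) in zip(segs, segs[1:])
                   if a in NASALS and b in VOICE and (index is None or index in (ma, mb)))
    return constraint


def ident_voice(index=None):
    """Ident[voice]-IO, optionally indexed; candidates differ only in voicing."""
    def constraint(inp, out):
        return sum(a != b for m, form in zip(inp, out)
                   if index is None or m == index for a, b in zip(m, form))
    return constraint


def evaluate(inp, ranking):
    """Choose between the faithful and the classifier-voiced candidate."""
    numeral, classifier = inp
    cands = [inp]
    if classifier[0] in VOICE:
        cands.append((numeral, VOICE[classifier[0]] + classifier[1:]))
    return min(cands, key=lambda out: tuple(c(inp, out) for c in ranking))


# (7): Ident[voice]-IOsatu >> *NC̥san >> Ident[voice]-IO >> *NC̥
RANKING_7 = [ident_voice("satu"), star_nc("san"), ident_voice(), star_nc()]

for inp in [("yon", "soku"), ("san", "soku"), ("san", "satu")]:
    print("-".join(inp), "->", "-".join(evaluate(inp, RANKING_7)))
# yon-soku -> yon-soku   (7a)
# san-soku -> san-zoku   (7b)
# san-satu -> san-satu   (7c)
```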
One final quirk of the Japanese data is the fact that the classifiers ken ‘houses’ and hiki ‘animals’ have post-nasal voicing after san ‘three’ but not after sen ‘thousand’ or man ‘ten thousand’. This can be interpreted to mean that such classifiers are less faithful to their underlying voicing features than classifiers like satu ‘volumes’, but at the same time are more faithful to their underlying voicing features than classifiers like soku ‘footwear’. This can be captured by inserting the sub-ranking Ident[voice]-IOken > *NC̥man below *NC̥san but above unindexed Ident[voice]-IO. The tableaux in (8a) and (8b) show how ken ‘houses’ resists voicing after yon ‘four’ and man ‘ten thousand’. That ken nonetheless voices after san ‘three’ is shown in (8c).
(8)
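Adding the sub-ranking Ident[voice]-IOken > *NC̥man to the same kind of sketch reproduces the three-way split among classifiers. The grid below covers only the numerals yon, san, and man and the classifiers soku, ken, and satu (sen ‘thousand’ is left out for brevity); the helpers are redefined inline so that the block runs on its own, and all names and the obstruent subset are our own illustrative assumptions.

```python
VOICE = {"p": "b", "t": "d", "k": "g", "s": "z"}  # illustrative voiceless -> voiced map
NASALS = {"n", "m"}


def star_nc(index=None):
    """*NC̥, optionally indexed to one morpheme (local application)."""
    def constraint(inp, out):
        segs = [(s, m) for m, form in zip(inp, out) for s in form]
        return sum(1 for (a, ma), (b, mb) in zip(segs, segs[1:])
                   if a in NASALS and b in VOICE and (index is None or index in (ma, mb)))
    return constraint


def ident_voice(index=None):
    """Ident[voice]-IO, optionally indexed; candidates differ only in voicing."""
    def constraint(inp, out):
        return sum(a != b for m, form in zip(inp, out)
                   if index is None or m == index for a, b in zip(m, form))
    return constraint


def evaluate(inp, ranking):
    numeral, classifier = inp
    cands = [inp]
    if classifier[0] in VOICE:
        cands.append((numeral, VOICE[classifier[0]] + classifier[1:]))
    return min(cands, key=lambda out: tuple(c(inp, out) for c in ranking))


# (8): Ident-IOsatu >> *NC̥san >> Ident-IOken >> *NC̥man >> Ident-IO >> *NC̥
RANKING_8 = [ident_voice("satu"), star_nc("san"), ident_voice("ken"),
             star_nc("man"), ident_voice(), star_nc()]

NUMERALS, CLASSIFIERS = ("yon", "san", "man"), ("soku", "ken", "satu")
table = {(num, cl): "-".join(evaluate((num, cl), RANKING_8))
         for num in NUMERALS for cl in CLASSIFIERS}
for num in NUMERALS:
    print("  ".join(table[num, cl] for cl in CLASSIFIERS))
```

In this grid, soku voices after both san and man, ken voices only after san, and satu never voices, while nothing voices after yon.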
It is important to note that while this analysis can generate all of the observed surface forms and is thus observationally adequate, it does not reflect the intuition that the default behaviour of classifiers should be to remain voiceless no matter the numeral. Classifiers are part of the Sinitic stratum of the Japanese lexicon, and this stratum freely allows NC̥ clusters (Itō and Mester 2003). The above analysis, however, predicts that post-nasal voicing after the numeral san ‘three’ is the default behaviour. To better reflect the limitation of voicing to a minority of classifiers, we could emulate Itō and Mester's (2003) analysis of the counter word data by adopting Mascaró's (2007) treatment of allomorphy. This approach posits that underlying forms can consist of a partially ordered set of allomorphs and includes a constraint Priority that is violated when the selected allomorph is not the first in the partial order. Supposing that the voiceless allomorphs are preferred over the voiced allomorphs (which we denote by writing, for example, /soku > zoku/), we can generate the forms with yon by ranking Priority above generic *NC̥, as shown in (9a). Note that Ident[voice]-IO is not violated by either candidate, since the voiced and voiceless obstruents are both part of the underlying form. The preference for the voiceless allomorph is overridden in forms with san ‘three’ because *NC̥san dominates Priority, as shown in (9b). Finally, the categorical lack of voicing in other classifiers (like satu ‘volumes’) results from their lack of a dual underlying form; the high-ranking generic Ident[voice]-IO penalizes the candidate with voicing, as shown in (9c). [Footnote 5]
(9)
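The allomorphy alternative in (9) can be sketched by treating an input classifier as an ordered tuple of allomorphs, with Priority violated whenever a non-first allomorph is chosen and Ident[voice]-IO comparing the output with the chosen allomorph. This is our own simplification of the approach described above: the partial order is approximated by a total order, only the NC̥ cluster at the numeral-classifier boundary is checked (which suffices for these forms), and the ranking Ident[voice]-IO > *NC̥san > Priority > *NC̥ is one ranking consistent with the description of (9). All helper names are hypothetical.

```python
VOICE = {"p": "b", "t": "d", "k": "g", "s": "z"}  # illustrative voiceless -> voiced map
NASALS = {"n", "m"}


def candidates(allomorphs):
    """(chosen allomorph, its position in the order, output form) for each option."""
    cands = []
    for position, allomorph in enumerate(allomorphs):
        cands.append((allomorph, position, allomorph))  # faithful to the chosen allomorph
        if allomorph[0] in VOICE:  # Gen may also voice the initial obstruent
            cands.append((allomorph, position, VOICE[allomorph[0]] + allomorph[1:]))
    return cands


def ident_voice(numeral, cand):
    chosen, _, out = cand
    return sum(a != b for a, b in zip(chosen, out))  # measured against the chosen allomorph


def priority(numeral, cand):
    return int(cand[1] > 0)  # violated unless the first-ordered allomorph is selected


def star_nc(index=None):
    """*NC̥ at the numeral-classifier boundary, optionally indexed to the numeral."""
    def constraint(numeral, cand):
        cluster = numeral[-1] in NASALS and cand[2][0] in VOICE
        return int(cluster and (index is None or index == numeral))
    return constraint


def evaluate(numeral, allomorphs, ranking):
    best = min(candidates(allomorphs),
               key=lambda cand: tuple(c(numeral, cand) for c in ranking))
    return numeral + "-" + best[2]


# Ident[voice]-IO >> *NC̥san >> Priority >> *NC̥
RANKING_9 = [ident_voice, star_nc("san"), priority, star_nc()]
print(evaluate("yon", ("soku", "zoku"), RANKING_9))  # yon-soku (9a)
print(evaluate("san", ("soku", "zoku"), RANKING_9))  # san-zoku (9b)
print(evaluate("san", ("satu",), RANKING_9))         # san-satu (9c)
```

In (9b) the winning candidate selects the second allomorph zoku faithfully, so it violates Priority rather than Ident[voice]-IO, mirroring the prose.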
All that being said, whether or not we make use of Priority, we crucially need indexed markedness constraints (*NC̥san, *NC̥sen, and *NC̥man) to account for the varying degrees to which Japanese numerals can trigger voicing in classifier morphemes. This is an interesting result considering the controversy over indexed markedness constraints in the literature. Some researchers restrict indexation to faithfulness constraints only (Fukazawa 1998, Fukazawa et al. 1998, Itō and Mester 1999, 2001), while others allow indexation to apply to both faithfulness and markedness constraints (Pater 2000, 2004, 2006, 2010; Flack 2007, Gouskova 2007, 2012; Jurgec 2010, Jurgec and Bjorkman 2018). The two main arguments for permitting markedness indexation are that it allows us to analyze a number of patterns that would otherwise defy explanation (Pater 2000, Flack 2007, Gouskova 2007, 2012; Jurgec 2010, Jurgec and Bjorkman 2018) and that it also allows us to distinguish between the exceptional blocking of a process and the exceptional triggering of a process, which are the result of indexed faithfulness constraints and indexed markedness constraints, respectively (Pater 2004, 2006, 2010).
The Japanese post-nasal voicing data are an interesting case with regard to this distinction, since they clearly contain exceptional triggering (san ‘three’ triggers voicing where yon ‘four’ does not) and, depending on our analytical choices, can be said to contain exceptional blocking (soku ‘footwear’ voices after san ‘three’ whereas satu ‘volumes’ does not).
On the other end of the debate, there are three main arguments against using indexed markedness constraints. The first argument is that indexed markedness constraints make it difficult to capture the asymmetrical implicational patterns in nativization processes and phoneme inventories (Itō and Mester 1999, 2001). For example, the cross-linguistic generalization that the presence of labial stops in an inventory implies the presence of velar stops but not vice versa (Maddieson 1984) requires a rigid ranking of *Labial > *Velar (Itō and Mester 1999, 2001). The second argument is that in order to evaluate an indexed markedness constraint, we must allow it to somehow access underlying representations, even though markedness constraints are supposed to evaluate surface forms only (Itō and Mester 1999). The final argument is that by allowing both markedness and faithfulness constraints to be indexed, we create a “too many options” problem whereby, given an exceptional item, an indexed markedness constraint or an indexed faithfulness constraint can often achieve the same result, but there is no principled reason to choose one solution over the other (Inkelas and Zoll 2007). In light of these drawbacks, it is important to determine which types of patterns absolutely require indexed markedness constraints, and ideally find a principled way to limit the indexation of markedness constraints to just these cases.
We argue that the Japanese data necessarily require indexed markedness constraints precisely because the triggering of post-nasal voicing occurs across a morpheme boundary. When exceptional triggering does not cross a morpheme boundary, it essentially prevents a marked configuration from appearing within a single morpheme, which can be re-analyzed as the exceptional blocking of an input-output discrepancy in the morphemes that do not undergo the process. In these cases, the choice between indexed faithfulness constraints and indexed markedness constraints depends only on whether we view the undergoers or the resistors of a process as the default (i.e., unindexed) items. When exceptional triggering does cross a morpheme boundary, however, it prevents a marked configuration formed of material from both the exceptional morpheme and some other morpheme, and when the marked configuration is avoided by changing material in the non-exceptional morpheme (as in 7b and 8c), these cases cannot be re-analyzed as exceptional blocking. Only indexed markedness constraints can generate such exceptional behaviour because, unlike indexed faithfulness constraints, they can be violated by material that straddles a morpheme boundary, thus allowing one morpheme to cause changes to material in a different morpheme.
This observation presents a means to reconcile the two perspectives on indexed markedness constraints: we can acknowledge that they are necessary and should be permissible, while also acknowledging that they are powerful and should perhaps be limited to just those cases where exceptional triggering crosses a morpheme boundary. We will argue below that this restriction need not be mere stipulation and can be derived as the consequence of a learning bias. The argument comes in two parts. First, we will demonstrate in section 3 that current algorithms for learning indexed constraints are able to handle the Japanese data, although the order in which the data are presented can cause a learner to pick incorrect indexation targets, thus populating the hierarchy with duplicate constraints that serve no purpose. We then show in section 4 that these mistaken indexations happen because it is harder to identify the source of exceptional triggering than it is to identify the source of exceptional blocking. Furthermore, determining which morphemes are exceptional blockers makes it easier to correctly identify exceptional triggers. We accordingly propose that the search for exceptional blockers should be prioritized over the search for exceptional triggers, which translates into a preference for indexing faithfulness constraints over indexing markedness constraints. By treating indexed markedness constraints as a last resort in this way, a learner will only create indexed markedness constraints when it witnesses exceptional triggering across a morpheme boundary, exactly as desired.
3. Learning the Japanese Data
The current algorithm for learning indexed constraints is Pater's (2004, 2006, 2010) Inconsistency Resolution algorithm, which acts as a supplement to any of the Constraint Demotion learning algorithms for OT (Tesar and Smolensky 1993, 1995, 1998). For the purposes of this article, we use the Error-Driven Constraint Demotion algorithm (EDCD; Tesar and Smolensky 1993, 1995, 1998) as the base. The algorithm is error-driven in that the learner updates its current hypothesis only when given a datum that it is unable to handle. The learner determines whether its current hypothesis is incorrect in a simple way. Every time the learner is presented with an input-output mapping, it runs the input through its current grammar. If the output produced by its current grammar matches the observed output, no learning takes place, since according to the evidence at hand, the learner has the correct grammar. If the output produced by its current grammar does not match the observed output, the learner's current grammar must be adjusted. In order to identify the constraints that must be re-ordered, the learner creates a mark-data pair. A mark-data pair is essentially a row from a comparative tableau (Prince 2002): it compares how well the winning (observed) candidate and the losing (predicted) candidate satisfy each constraint. If the winning candidate incurs fewer violations than the losing candidate on some constraint, then that constraint prefers the winner (marked in the tableau as W). Similarly, if the winning candidate incurs more violations than the losing candidate on some constraint, then that constraint prefers the loser (marked in the tableau as L). If both candidates incur an equal number of violations on some constraint, then the constraint prefers neither (marked as an empty cell in the tableau).
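A mark-data pair can be computed directly from the two candidates' violation profiles. The sketch below assumes violations are supplied as dictionaries from constraint names to counts, writes "*NC" as an ASCII stand-in for *NC̥, and uses the empty string for the blank cell; `mark_data_pair` is a hypothetical helper name. The example profiles are those of /san-soku/ with the observed winner [san-zoku] and the predicted loser *[san-soku].

```python
def mark_data_pair(winner, loser):
    """Compare violation counts: 'W' if the constraint prefers the winner,
    'L' if it prefers the loser, '' if it prefers neither."""
    pair = {}
    for constraint in winner:
        w, l = winner[constraint], loser[constraint]
        pair[constraint] = "W" if w < l else "L" if w > l else ""
    return pair


winner = {"*NC": 0, "Ident[voice]-IO": 1}  # observed [san-zoku]
loser = {"*NC": 1, "Ident[voice]-IO": 0}   # predicted *[san-soku]
print(mark_data_pair(winner, loser))  # {'*NC': 'W', 'Ident[voice]-IO': 'L'}
```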
Every time the learner creates a mark-data pair, it will run the Constraint Demotion process against its accumulated set of pairs. This loop, described informally in (10), will adjust the constraint hierarchy such that it predicts the correct output.
(10)
i. Start at the highest level in the constraint hierarchy.
ii. Find constraints that prefer only winners (i.e., whose columns contain no L marks) and place these in the current level of the constraint hierarchy.
iii. Temporarily delete the mark-data pairs (i.e., the rows) for which there was a W mark in those constraints’ columns, then temporarily delete those constraints’ columns.
iv. If there are remaining mark-data pairs, move one level down in the hierarchy (creating said level if it does not already exist), then repeat (ii) and (iii).
v. When no mark-data pairs remain, place the remaining constraints below the last-constructed level.
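Steps (i) to (v) amount to Recursive Constraint Demotion run over the accumulated pairs. The following is a minimal sketch, assuming each mark-data pair is a dictionary from constraint names to 'W', 'L', or '' (missing entries count as blank); the function name is our own, and returning None when step (ii) finds no constraint is our way of representing the stall that signals inconsistency.

```python
def constraint_demotion(constraints, pairs):
    """Build a stratified hierarchy from mark-data pairs, following steps (i)-(v).
    Works on copies, so deletions are temporary; returns None if it stalls."""
    remaining = list(constraints)
    pending = list(pairs)
    strata = []
    while pending:
        # (ii) constraints whose columns contain no L among the pending pairs
        stratum = [c for c in remaining if all(p.get(c, "") != "L" for p in pending)]
        if not stratum:
            return None  # no ranking is consistent with the pairs
        strata.append(stratum)
        # (iii) drop pairs explained by a W in the new stratum, then its columns
        pending = [p for p in pending if not any(p.get(c, "") == "W" for c in stratum)]
        remaining = [c for c in remaining if c not in stratum]
    # (v) leftover constraints go below the last-constructed level
    if remaining:
        strata.append(remaining)
    return strata


pair_11a = {"*NC": "W", "Ident": "L"}  # /san-soku/: [san-zoku] beats *[san-soku]
print(constraint_demotion(["*NC", "Ident"], [pair_11a]))  # [['*NC'], ['Ident']]
```

Because the function copies its arguments, the caller's full list of pairs is left intact for the next run, which corresponds to the temporary deletion discussed below.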
It is important that the deletion of mark-data pairs be temporary such that deletion lasts only while Constraint Demotion is being run. As soon as the Constraint Demotion loop terminates, the full database of mark-data pairs created up until the present is fully restored, and the loop will be run on this full set every time a new mark-data pair is added. By letting the learner keep track of all evidence that has led it to its current grammar hypothesis, it can avoid entering an infinite loop when given inconsistent data (Tesar 1997).
A reviewer asks why we use EDCD as opposed to Tesar and Smolensky's (1995) Recursive Constraint Demotion (RCD). The two algorithms are highly similar, the main difference being that RCD operates on a pre-determined set of mark-data pairs, typically constructed by the analyst. Our main reason for choosing EDCD over RCD is that we believe the former better reflects the notion that a learner should come equipped only with the minimum of information necessary to succeed. In the Japanese case, this turns out to be just the universal constraint set and a representative list of input-output pairs. Furthermore, as we show below, the constraint(s) selected for indexation by Pater's (2004, 2006, 2010) Inconsistency Resolution algorithm can vary according to the exact set of mark-data pairs. When the learner is working with an incomplete set of mark-data pairs, it might seem that two different indexed constraints can resolve the same ranking paradox, when in fact only one of these is a valid solution in the context of a more complete set of pairs. The goal of this and the next section is to demonstrate that the conditions under which a learner avoids adopting such “eventually incorrect” solutions are also the conditions under which a learner creates indexed markedness constraints only when there is exceptional triggering across a morpheme boundary.
With these reasons for choosing EDCD in mind, let us have it attempt to learn the Japanese data as an illustrative example. For reasons of space, we limit this learning demonstration to counter words with the numerals san ‘three’ and yon ‘four’; the remaining nasal-final numerals sen ‘thousand’ and man ‘ten thousand’ are not crucial to further discussion. Suppose we present the learner with the mappings /san-soku/ → [san-zoku] ‘three pairs of footwear’ and /san-satu/ → [san-satu] ‘three volumes’ in alternation. These mappings require opposite rankings of the constraints *NC̥ and Ident[voice]-IO, so if mark-data pairs are permanently deleted in the learning loop, the learner will perpetually need to create either the mark-data pair in (11a) or the one in (11b), and will have access to only one of these pairs at a time. When the learner creates (11a), it sees that the constraint *NC̥ prefers only winners, and so the learner responds by ranking *NC̥ above Ident[voice]-IO. When the learner creates (11b), it sees that the constraint Ident[voice]-IO prefers only winners, and so the learner responds by ranking Ident[voice]-IO above *NC̥. The learner will therefore constantly switch between two contradictory rankings.
(11)
If, on the other hand, mark-data pairs are only temporarily deleted, the learner will eventually create both pairs and stall, since neither constraint prefers only winners according to the current set. Inconsistency detection is the general name given to the ability of a learner to use a set of mark-data pairs to notice when no ranking exists that generates the currently observed data. This ability has found various uses in the literature on learning in OT in addition to the learning of exceptions (e.g., Prince 2002, Tesar et al. 2003, McCarthy 2005, Tesar and Prince 2007, Merchant 2008, Merchant and Tesar 2008, Akers 2012, Tesar 2012, 2014a, 2014b, 2016), though it is outside the scope of this article to discuss all such uses.
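The contrast between permanent and temporary deletion can be simulated by feeding a learner the pairs (11a) and (11b) in alternation, with their marks as we read them off the prose. In this simplified sketch the stream of pairs is given directly rather than generated from errors; the hypothetical `forgetful` learner ranks from the newest pair only, while the `accumulating` learner re-runs demotion on every pair seen so far and reports None once no consistent ranking exists.

```python
def demote(constraints, pairs):
    """Recursive constraint demotion over mark dictionaries; None signals inconsistency."""
    remaining, pending, strata = list(constraints), list(pairs), []
    while pending:
        stratum = [c for c in remaining if all(p.get(c, "") != "L" for p in pending)]
        if not stratum:
            return None
        strata.append(stratum)
        pending = [p for p in pending if not any(p.get(c, "") == "W" for c in stratum)]
        remaining = [c for c in remaining if c not in stratum]
    return strata + ([remaining] if remaining else [])


CONSTRAINTS = ["*NC", "Ident"]
PAIR_11A = {"*NC": "W", "Ident": "L"}  # /san-soku/: [san-zoku] beats *[san-soku]
PAIR_11B = {"*NC": "L", "Ident": "W"}  # /san-satu/: [san-satu] beats *[san-zatu]
STREAM = [PAIR_11A, PAIR_11B, PAIR_11A, PAIR_11B]


def forgetful(stream):
    """Permanent deletion: only the newest pair is available."""
    return [demote(CONSTRAINTS, [pair]) for pair in stream]


def accumulating(stream):
    """Temporary deletion: every pair seen so far is kept."""
    return [demote(CONSTRAINTS, stream[: i + 1]) for i in range(len(stream))]


for flip, stall in zip(forgetful(STREAM), accumulating(STREAM)):
    print(flip, stall)
```

The forgetful learner alternates between the two contradictory rankings indefinitely, whereas the accumulating learner detects the inconsistency as soon as it holds both pairs, which is the point at which inconsistency resolution can step in.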
Either constraint, when indexed appropriately, is sufficient to resolve the inconsistency. Suppose we choose the first option and create a copy of *NC̥ indexed to soku. The set of mark-data pairs that results from adding this indexed constraint is shown in (12a). The mark-data pair set is no longer inconsistent, so we can resume constraint demotion. The constraint *NC̥soku is the only one that prefers only winners, so we place it at the top. This eliminates the first mark-data pair (row i), and now Ident[voice]-IO prefers only winners, so we can place it beneath *NC̥soku. Doing so eliminates the remaining mark-data pair (row ii), so we place the remaining constraint *NC̥ at the bottom, which gives us the ranking *NC̥soku > Ident[voice]-IO > *NC̥. The result of choosing the second option, creating a copy of Ident[voice]-IO and indexing it to satu, is shown in (12b). We invite the reader to verify that running the constraint demotion steps on this set of mark-data pairs will yield the ranking Ident[voice]-IOsatu > *NC̥ > Ident[voice]-IO.
(12)
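Both resolutions in (12) can be reproduced by computing each indexed copy's column from its definition under local application and re-running the demotion loop. In this sketch, which is our own, constraint names such as "*NC_soku" and "Ident_satu" are parsed into locally applied copies of *NC̥ and Ident[voice]-IO, the obstruent inventory is an illustrative subset, and the two mark-data pairs are stored as (input, winner, loser) triples so that a new constraint's marks can be computed for old pairs.

```python
VOICELESS, NASALS = {"p", "t", "k", "s"}, {"n", "m"}  # illustrative inventories


def violations(name, inp, out):
    """'*NC', 'Ident', or a locally applied copy such as '*NC_soku' or 'Ident_satu'."""
    base, _, index = name.partition("_")
    if base == "*NC":
        segs = [(s, m) for m, form in zip(inp, out) for s in form]
        return sum(1 for (a, ma), (b, mb) in zip(segs, segs[1:])
                   if a in NASALS and b in VOICELESS and (not index or index in (ma, mb)))
    # Ident: candidates differ from the input only in voicing
    return sum(a != b for m, form in zip(inp, out)
               if not index or m == index for a, b in zip(m, form))


def mark(name, row):
    """'W', 'L', or '' for one (input, winner, loser) triple."""
    inp, winner, loser = row
    w, l = violations(name, inp, winner), violations(name, inp, loser)
    return "W" if w < l else "L" if w > l else ""


def demote(constraints, rows):
    """Recursive constraint demotion; None means no consistent ranking exists."""
    remaining, pending, strata = list(constraints), list(rows), []
    while pending:
        stratum = [c for c in remaining if all(mark(c, r) != "L" for r in pending)]
        if not stratum:
            return None
        strata.append(stratum)
        pending = [r for r in pending if not any(mark(c, r) == "W" for c in stratum)]
        remaining = [c for c in remaining if c not in stratum]
    return strata + ([remaining] if remaining else [])


ROWS = [
    (("san", "soku"), ("san", "zoku"), ("san", "soku")),  # row i:  /san-soku/
    (("san", "satu"), ("san", "satu"), ("san", "zatu")),  # row ii: /san-satu/
]
print(demote(["*NC", "Ident"], ROWS))                # None: the inconsistent set
print(demote(["*NC", "Ident", "*NC_soku"], ROWS))    # (12a)
print(demote(["*NC", "Ident", "Ident_satu"], ROWS))  # (12b)
```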
Not only does this example illustrate how inconsistency resolution works, it is also an example of the “too many options” problem pointed out by Inkelas and Zoll (2007): either an indexed markedness constraint or an indexed faithfulness constraint can eliminate the inconsistency that the learner has found, and the data encountered so far offer no reason to choose one over the other. When more than one constraint is found to be eligible for indexation, Pater (2004, 2006, 2010) gives two additional search heuristics. First, the learner selects the constraint that prefers the winner for the smallest set of morphemes (Pater 2004, 2006, 2010). Second, if there is still more than one choice, the learner selects a markedness constraint rather than a faithfulness constraint (Pater 2004, 2006). If there is still more than one choice, the learner will select randomly (Pater 2004). We adopt these heuristics for the remainder of this section, though we will discuss them further in the next section, where we propose an alternative.
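One way to implement the search for an indexation target is sketched below, under assumptions of our own: a copy of *NC̥ or Ident[voice]-IO indexed to a single morpheme counts as eligible if, under local application, it prefers the winner in at least one unexplained mark-data pair and the loser in none. Because every copy here is indexed to one morpheme, the smallest-set heuristic never breaks a tie in this sketch; the `bias` parameter implements the markedness-first heuristic (or, for comparison, the reverse preference argued for in this article), and any remaining ties are broken by list order rather than randomly. The functions `eligible` and `choose` are hypothetical names.

```python
VOICELESS, NASALS = {"p", "t", "k", "s"}, {"n", "m"}  # illustrative inventories


def violations(name, inp, out):
    """'*NC', 'Ident', or a locally applied copy such as '*NC_soku' or 'Ident_satu'."""
    base, _, index = name.partition("_")
    if base == "*NC":
        segs = [(s, m) for m, form in zip(inp, out) for s in form]
        return sum(1 for (a, ma), (b, mb) in zip(segs, segs[1:])
                   if a in NASALS and b in VOICELESS and (not index or index in (ma, mb)))
    return sum(a != b for m, form in zip(inp, out)
               if not index or m == index for a, b in zip(m, form))


def mark(name, row):
    inp, winner, loser = row
    w, l = violations(name, inp, winner), violations(name, inp, loser)
    return "W" if w < l else "L" if w > l else ""


def eligible(constraints, rows):
    """Single-morpheme copies that prefer the winner somewhere and the loser nowhere."""
    morphemes = sorted({morph for inp, _, _ in rows for morph in inp})
    options = []
    for base in ("*NC", "Ident"):
        for morph in morphemes:
            name = f"{base}_{morph}"
            marks = [mark(name, row) for row in rows]
            if name not in constraints and "W" in marks and "L" not in marks:
                options.append(name)
    return options


def choose(options, bias="markedness"):
    """Apply the markedness-first (or faithfulness-first) bias; ties keep list order."""
    preferred = "*NC" if bias == "markedness" else "Ident"
    ranked = sorted(options, key=lambda name: not name.startswith(preferred))
    return ranked[0] if ranked else None


ROWS = [
    (("san", "soku"), ("san", "zoku"), ("san", "soku")),
    (("san", "satu"), ("san", "satu"), ("san", "zatu")),
]
options = eligible(["*NC", "Ident"], ROWS)
print(options)                          # ['*NC_soku', 'Ident_satu']
print(choose(options))                  # *NC_soku
print(choose(options, "faithfulness"))  # Ident_satu
```

Adding the /yon-soku/ pair and the already-created *NC_soku to this sketch leaves Ident_satu as the only eligible copy, in line with the walkthrough that follows.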
Both *NC̥ and Ident[voice]-IO prefer the winner for all instances of one out of the three morphemes encountered thus far, and so the bias towards indexing markedness constraints is the deciding factor at this point. Accordingly, the algorithm opts for creating *NC̥soku and will build the constraint hierarchy *NC̥soku > Ident[voice]-IO > *NC̥. Recall, however, that the counter yon ‘four’ does not trigger voicing in any classifier. This ranking therefore cannot handle the mapping /yon+soku/ → [yon-soku] ‘four pairs of footwear’. The learner instead predicts *[yon-zoku], which, when added to the set of mark-data pairs, creates yet another inconsistent set, summarized in (13).
(13)
No markedness constraint prefers the winner for all instances of a given morpheme, but Ident[voice]-IO prefers the winner for all instances of satu ‘volumes’. Creating the indexed constraint Ident[voice]-IOsatu and placing it at the top of the hierarchy eliminates one row and one column of the data set, as shown in (14), where the cells deleted on this round are marked in grey. The remaining rows and columns are inconsistent, so the learner stalls again.
(14)
Now *NC̥ prefers the winner for all instances of san ‘three’, so it may be indexed to this morpheme. The new constraint *NC̥san is the only constraint that prefers only winners, and the learner therefore places it on the next level of the hierarchy, directly below Ident[voice]-IOsatu. The resulting data set is shown in (15), where the cells deleted on this round are marked in grey.
(15)
This leaves us with a single mark-data pair, whose only W mark is in the column for general Ident[voice]-IO. The learner thus places Ident[voice]-IO below *NC̥san, and since this leaves us with no more mark-data pairs, the remaining constraints *NC̥ and *NC̥soku get placed at the bottom of the hierarchy below Ident[voice]-IO. At this point, the learner has essentially found the target grammar and will only make an incorrect prediction when presented with a mapping in which san ‘three’ is specifically followed by a new “resisting” classifier that behaves like satu. Such mappings will render the mark-data pair set inconsistent, causing the learner to create an additional copy of Ident[voice]-IO since this is the only constraint that prefers the winner for the newly observed morpheme. Placing this new copy at the top of the hierarchy alongside Ident[voice]-IOsatu eliminates the inconsistency introduced by the newly observed mapping.
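As a check on the walkthrough, the sketch below evaluates the three mappings under the final hierarchy Ident[voice]-IOsatu > *NC̥san > Ident[voice]-IO > {*NC̥, *NC̥soku} with strict domination. Segments carry their morpheme of origin so that indexed copies apply locally. Treating only /p t k s h/ as voiceless and considering just two candidates per input are simplifying assumptions, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Seg = Tuple[str, str, str]  # (input segment, output segment, morpheme)
Constraint = Callable[[List[Seg]], int]
VOICELESS = set("ptksh")  # simplifying assumption: everything else counts as voiced
NASALS = set("mn")
VOICING = {"s": "z", "t": "d", "k": "g"}


def star_nc(index: Optional[str] = None) -> Constraint:
    """*NC̥, optionally indexed: a nasal + voiceless obstruent pair counts
    if either segment belongs to the indexed morpheme (local application)."""
    def count(cand: List[Seg]) -> int:
        return sum(1 for a, b in zip(cand, cand[1:])
                   if a[1] in NASALS and b[1] in VOICELESS
                   and (index is None or index in (a[2], b[2])))
    return count


def ident_voice(index: Optional[str] = None) -> Constraint:
    """Ident[voice]-IO, optionally indexed: only segments of the indexed morpheme count."""
    def count(cand: List[Seg]) -> int:
        return sum(1 for i, o, m in cand
                   if (i in VOICELESS) != (o in VOICELESS)
                   and (index is None or m == index))
    return count


def candidates(num: str, cls: str) -> Dict[str, List[Seg]]:
    """The faithful candidate and the one with a voiced classifier-initial obstruent."""
    voiced = VOICING.get(cls[0], cls[0]) + cls[1:]

    def build(out_cls: str) -> List[Seg]:
        return ([(s, s, num) for s in num] +
                [(i, o, cls) for i, o in zip(cls, out_cls)])
    return {f"{num}-{cls}": build(cls), f"{num}-{voiced}": build(voiced)}


def optimum(num: str, cls: str, hierarchy: List[Constraint]) -> str:
    """Strict domination: compare violation vectors lexicographically."""
    cands = candidates(num, cls)
    return min(cands, key=lambda out: [c(cands[out]) for c in hierarchy])


# Final hierarchy from the walkthrough; the bottom two are mutually unranked.
HIERARCHY = [ident_voice("satu"), star_nc("san"), ident_voice(),
             star_nc(), star_nc("soku")]

for num, cls in [("san", "soku"), ("san", "satu"), ("yon", "soku")]:
    print(f"/{num}+{cls}/ -> [{optimum(num, cls, HIERARCHY)}]")
# /san+soku/ -> [san-zoku]
# /san+satu/ -> [san-satu]
# /yon+soku/ -> [yon-soku]
```

Removing `star_nc("san")` from the list makes /san+soku/ surface faithfully as [san-soku], which shows that the indexed copy is what licenses voicing after san.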
The above shows that Pater's (2004, 2006, 2010) Inconsistency Resolution algorithm can handle the Japanese counter word data, but the solution it finds for the data has an interesting quirk. Namely, if the learner has so far encountered only mappings without any voicing, presenting it with a mapping that contains both voicing and a new classifier will lead it to create a copy of the markedness constraint *NC̥ indexed to the classifier it just encountered. A copy of this markedness constraint indexed to the classifier does indeed resolve the inconsistency at that exact time, but further learning eventually reveals that the classifier was not actually the trigger of the voicing. Consequently, further copies of *NC̥ are then created that correctly identify some of the numerals as the triggers of voicing, and any copies of *NC̥ indexed to classifiers get relegated to the bottom of the hierarchy, serving no ultimate purpose.
For reasons of economy and elegance it is worthwhile to ask if we can ensure that such superfluous constraints are never created in the first place. We argue in the next section that reversing Pater's markedness bias (i.e., preferring indexed faithfulness constraints when there is a choice) does just that. This is because, while it is possible to misidentify the source of exceptional triggering, it is impossible to misidentify the source of exceptional resistance. We also argue that implementing such a faithfulness bias has beneficial theoretical consequences. These include the fact that indexed markedness constraints are only created when there is exceptional triggering across a morpheme boundary, which is desirable in light of the controversy over permitting the indexation of markedness constraints.
4. Indexed Markedness as a Last Resort Only
Recall from the previous section that Pater's (2004, 2006, 2010) Inconsistency Resolution algorithm includes two criteria for selecting an indexation target in the event that its main search criterion (find a constraint that prefers only winning candidates relative to some morpheme) produces more than one option. First, the learner selects the constraint that prefers the winner for the smallest set of morphemes (Pater 2004, 2006, 2010). Second, if there is a tie for the smallest set of indexable morphemes, the learner selects a markedness constraint rather than a faithfulness constraint, when possible (Pater 2004, 2006). The latter heuristic is incompatible with our claim from section 2 that indexed markedness constraints should be limited to exactly those cases where an indexed faithfulness constraint is incapable of generating the observed mappings. While we could simply stipulate that Pater's bias should be replaced by a faithfulness bias in order to accommodate the above claim, this section provides two further reasons for considering a faithfulness bias. First, we show how a faithfulness bias can handle the cases that Pater (2004, 2006) argues require a markedness bias. Second, we show that the faithfulness bias circumvents the danger of incorrectly identifying exceptional triggers (as described in the previous section) and has several other positive theoretical consequences.
Before discussing the markedness bias, however, we would like to briefly comment on the preference for index classes with fewer members in Pater's (2004, 2006, 2010) algorithm. This preference ensures that the learned constraint ranking reflects the language's overall phonology by keeping the number of indexed morphemes small (i.e., as much of the phonology as possible is due to general constraints). It is, however, based on the fact that Pater's indexation operation creates a single copy of a constraint and indexes it to all morphemes for which it prefers only winners. We instead assume that a separate indexed constraint is made for each exceptional morpheme. This idea goes back at least as far as Itō and Mester (1999), who argue in an appendix that large-scale stratal indices such as “foreign” or “Sino-Japanese” are likely a group of lexically specific indices occupying the same spot along the hierarchy. Other arguments for completely individualizing indexation include Coetzee and Pater's (2011) demonstration that doing so allows us to match lexically specific rates of variation with near-perfect accuracy, and Moore-Cantwell and Pater's (2016) demonstration that doing so can account for gradience in phonotactic well-formedness while still capturing the more fixed pronunciations of individual words. We suggest another reason for treating indexation as an individualized operation: during online learning, there is no guarantee that all the words belonging to some necessary index class are present in the current mark-data set, and until the learner has seen a sufficiently large amount of data, it is possible for the exceptions to outnumber the regular items.
We do, however, leave open the possibility of including a post-learning operation for conflating indexed constraints where possible. Such an operation is included in the learning algorithms of Itō and Mester (1999), Pater (2005), and Coetzee and Pater (2006).
Moving on to the markedness bias: Pater (2004, 2005) includes it in his algorithm in order to address the subset problem explored elsewhere in the Optimality Theoretic literature by Smolensky (1996) and Prince and Tesar (2004). Put simply, the subset problem arises when a learner adopts a constraint hierarchy that accepts all the output forms that the target grammar does, but also accepts more; if the learner continues to see only output forms produced by the target subset grammar, it will never encounter evidence that it has chosen an overly permissive superset grammar. Prince and Tesar (2004) consequently propose that learners should rank markedness constraints as high as possible and rank faithfulness constraints as low as possible, which ensures that the learned grammar permits the smallest set of output forms consistent with the observed data (i.e., the least permissive language). Pater (2004, 2006) suggests that his bias towards indexing markedness constraints reflects the spirit of this solution to the subset problem. We do not believe that this is necessarily the case. To see why, consider a single markedness constraint M and a single faithfulness constraint F that together can produce some alternation (i.e., when they are ranked as M > F). A learner that is shown both undergoers and non-undergoers of this process will learn the ranking MX > F > M under a bias towards indexing markedness constraints and will learn the ranking FX > M > F under a bias towards indexing faithfulness constraints. Notice that the default ranking from the markedness bias (F > M) is in fact more permissive than the default ranking from the faithfulness bias (M > F).
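The permissiveness comparison can be made concrete with a toy tableau in which the faithful output violates M and the changed output violates F. In the sketch below (constraint labels as in the text; the candidate names and functions are hypothetical), an unindexed input, which stands in for any novel item, surfaces faithfully under the markedness-bias grammar but is repaired under the faithfulness-bias grammar.

```python
from typing import Dict, List

CANDIDATES = ["faithful", "changed"]


def violations(candidate: str, indexed: bool) -> Dict[str, int]:
    """Toy profile: the faithful output violates M, the changed output violates F.
    MX and FX are copies indexed to X, so they only count for indexed inputs."""
    m = int(candidate == "faithful")
    f = int(candidate == "changed")
    return {"M": m, "F": f, "MX": m * indexed, "FX": f * indexed}


def winner(ranking: List[str], indexed: bool) -> str:
    """Strict domination over the two toy candidates."""
    return min(CANDIDATES, key=lambda c: [violations(c, indexed)[k] for k in ranking])


MARKEDNESS_BIAS = ["MX", "F", "M"]    # undergoers carry the index
FAITHFULNESS_BIAS = ["FX", "M", "F"]  # non-undergoers carry the index

for name, ranking in [("markedness bias", MARKEDNESS_BIAS),
                      ("faithfulness bias", FAITHFULNESS_BIAS)]:
    print(name, "| indexed:", winner(ranking, True),
          "| unindexed:", winner(ranking, False))
```

Under the markedness-bias grammar, unindexed items keep their M-violating faithful form, so the default grammar admits the larger set of surface forms.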
That being said, Pater (2004, 2006) presents a case where a learner can seemingly be tricked into adopting a superset grammar unless it is biased towards indexing markedness constraints. In this special case, which Pater (2004, 2006) calls exceptional blocking by markedness, exceptional items fail to undergo a process demanded by a ranking M1 > F in order to avoid violating a different markedness constraint M2. This situation can be modelled by indexing exceptional items to a copy of M2, but the same result is also achievable by indexing exceptional items to a copy of F instead. The faithfulness solution, however, misses the generalization that the exceptional morphemes are exceptional precisely because they are in some sense affected more strongly by M2 than are regular items. The problem, then, is that we want all exceptional items to potentially violate M2 by virtue of having a particular phonological shape, but we can in theory create exceptions that do not have this profile. In our view, this is not necessarily an issue. Exceptions are created only as needed in our algorithm, and so exceptions with the wrong phonological profile will simply never arise. Nonetheless, for transparency's sake we discuss a concrete example from Pater (2004, 2006) more thoroughly in the paragraphs below.
To illustrate the problem, Pater (2006) presents a hypothetical language in which codas are deleted except in a handful of monosyllabic items. The ranking NoCoda > Max-IO is responsible for the deletion of codas. Furthermore, placing MinimalWord (all words must be minimally bimoraic) below NoCoda ensures that even monosyllabic words drop their codas. Tableaux for a monosyllable and a disyllable are provided in (16a) and (16b), respectively.
(16)
Given this basic ranking, placing either the indexed faithfulness constraint Max-IOX or the indexed markedness constraint MinimalWordX above NoCoda could generate exceptional monosyllabic words that keep their coda. However, the first option, an indexed faithfulness account, fails to capture the generalization that all exceptions are monosyllabic. A learner that opts for Max-IOX has chosen a superset grammar that can generate exceptional monosyllabic and polysyllabic items as opposed to a subset grammar that can generate only exceptional monosyllabic items. As shown in (17a), a disyllabic word indexed to Max-IOX would retain its coda. A disyllabic word indexed to MinimalWordX, however, would still delete its coda because it has two moras with or without its coda, as shown in (17b).
(17)
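Pater's hypothetical coda language can be simulated in the same way. The sketch assumes CV(C) syllables, one mora per vowel and one per coda consonant, and an arbitrary order between MinimalWord and Max-IO (irrelevant for these two-candidate sets). /pak/ is a hypothetical monosyllable, while /papak/ is the disyllable discussed in the next paragraph. The sketch reproduces the contrast in (17): indexed to Max-IOX, /papak/ keeps its coda; indexed to MinimalWordX, it still loses it.

```python
from typing import Dict, List

VOWELS = set("aeiou")
BASE = ["NoCoda", "MinWd", "Max-IO"]  # NoCoda >> MinimalWord, Max-IO
FAITH_X = ["Max-IOX"] + BASE          # exceptions indexed to Max-IO
MARK_X = ["MinWdX"] + BASE            # exceptions indexed to MinimalWord


def closed(syl: str) -> bool:
    return syl[-1] not in VOWELS


def violations(inp: List[str], out: List[str], indexed: bool) -> Dict[str, int]:
    no_coda = sum(closed(s) for s in out)
    max_io = len("".join(inp)) - len("".join(out))  # number of deleted segments
    # Assumption: one mora per vowel plus one per (moraic) coda consonant.
    moras = sum(1 + closed(s) for s in out)
    min_wd = int(moras < 2)
    return {"NoCoda": no_coda, "Max-IO": max_io, "MinWd": min_wd,
            "Max-IOX": max_io * indexed, "MinWdX": min_wd * indexed}


def winner(inp: List[str], ranking: List[str], indexed: bool = False) -> str:
    """Compare the faithful candidate with the coda-deleting one."""
    deleted = [s[:-1] if closed(s) else s for s in inp]
    best = min([inp, deleted],
               key=lambda out: [violations(inp, out, indexed)[c] for c in ranking])
    return "".join(best)


print(winner(["pak"], BASE))                  # regular monosyllable
print(winner(["pa", "pak"], FAITH_X, True))   # as in (17a)
print(winner(["pa", "pak"], MARK_X, True))    # as in (17b)
```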
Pater (2004, 2006) therefore argues that, when there is a choice, markedness constraints should be chosen as indexation targets over faithfulness constraints, but his analysis hinges on letting Richness of the Base (ROTB; Prince and Smolensky 2004) apply to indices. According to this assumption, the rich base contains inputs of the shape /X/Y, where X ranges over an infinitude of phonological structures and Y is either empty (i.e., the input is unindexed) or is one of an infinitude of indices. We contend, however, that ROTB does not apply to indices in the same way that it does to the phonological content of underlying forms. While there is no restriction on which and how many indices can exist, indices are not present a priori in the universal rich base and arise only where necessary through learning (see footnote 6). For example, the rich base contains the input /papak/ (and an infinitude of other unindexed input strings) but does not also contain inputs of the shape /papak/X, where X stands in for an infinitude of possible indices. In the absence of exceptional items larger than one syllable, a learner that has opted to copy Max-IO will not create copies for any disyllabic or larger items, since there is no evidence that such copies are necessary. This means that the learner's grammar will effectively be the desired subset grammar, whether or not the learner is aware that all exceptions are monosyllables. Knowledge of this shared property among exceptions is therefore not essential to learning appropriate indexed constraints, and furthermore the speaker could still be aware of the shared monosyllabicity without it being encoded directly into the grammar. Such knowledge would resemble the ability to find words that rhyme.
In this way, cases of blocking by markedness can also be analyzed as blocking by faithfulness, and consequently we do not require a bias towards indexing markedness constraints.
To summarize the argumentation thus far, we have provided a theoretical reason to prefer faithfulness indexation over markedness indexation (indexed markedness constraints are powerful and need restriction) and have shown that the arguments given for the reverse preference do not categorically point to its necessity. We can go a bit further now and show that the faithfulness bias has other positive consequences, thus solidifying its beneficial status. Recall from the last section that the first inconsistency our learner encountered was the set of mark-data pairs in (18). This set was created after seeing both an undergoer and a non-undergoer of post-nasal voicing that contain the same numeral. The set is inconsistent because no constraint prefers only winners (they both have an L in their column somewhere).
(18)
According to the main search criterion of the Inconsistency Resolution algorithm, this learning “dead end” can be escaped in two ways. One way is to copy Ident[voice]-IO and index it to satu ‘volumes’, since the constraint prefers only winners for all instances of that morpheme. The other way is to copy *NC̥ and index it to soku ‘footwear’, again since the constraint prefers only winners for all instances of that morpheme. The first option correctly identifies satu as an exceptional blocker of voicing, but the latter option incorrectly identifies soku as an exceptional undergoer of voicing. With its preference for indexing markedness constraints, Pater's (2004, 2006, 2010) algorithm will opt for the latter escape hatch, but this will eventually be corrected once the learner has enough data to realize that the voicing was actually exceptionally triggered by san ‘three’. We contend that, due to the mechanics of local application (indexed constraints are violated only when the offending material is linked to the indexed morpheme; Pater 2006, 2010), a bias towards indexing faithfulness constraints reduces the risk of such misidentification.
The misidentification occurs precisely because indexed markedness constraints are readily violated by phonological material that crosses a morpheme boundary. For example, *NC̥X is violated by a sequence of a nasal and a voiceless obstruent when at least one of the two segments is linked back to the indexed morpheme. Indexed faithfulness constraints, on the other hand, are violated only when the entirety of some “offensive configuration” is linked back to the indexed morpheme. For example, Ident[voice]-IOX is violated only when an input element from the indexed morpheme changes its voicing value in the output. Accordingly, when creating indexed faithfulness constraints, it is typically clear which morpheme is resisting the regularly imposed phonological change (see footnote 7), but when creating indexed markedness constraints, it is often unclear which morpheme is triggering the regularly absent process. Biasing indexation towards faithfulness constraints amounts to assuming, as much as possible, that exceptional items are exceptional resistors. Such a strategy is consistent with attempting to keep the relations between markedness constraints identical for all morphemes in a language, thus ensuring that the learned hierarchy reflects the language's overall phonotactics as much as possible. By seeking out opportunities for indexing faithfulness constraints first, the learner starts by looking for all items in its current data set that resist a process, which makes it easier to find the exceptional triggers of that same process, if any exist.
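The asymmetry in local application can be shown by listing, for each violation, the morphemes an indexed copy could be linked to. In this sketch (hypothetical helper names; voicing simplified to membership in /p t k s h/), the single *NC̥ violation of *[san-soku] is shared by san and soku, whereas the Ident[voice]-IO violation of [san-zoku] belongs to soku alone.

```python
from typing import List, Set, Tuple

Seg = Tuple[str, str, str]  # (input segment, output segment, morpheme)
VOICELESS = set("ptksh")
NASALS = set("mn")


def segs(*parts: Tuple[str, str, str]) -> List[Seg]:
    """Build a candidate from (morpheme, input string, output string) triples."""
    return [(i, o, m) for m, inp, out in parts for i, o in zip(inp, out)]


def nc_loci(cand: List[Seg]) -> List[Set[str]]:
    """For each *NC̥ violation, every morpheme an indexed *NC̥X could target."""
    return [{a[2], b[2]} for a, b in zip(cand, cand[1:])
            if a[1] in NASALS and b[1] in VOICELESS]


def ident_loci(cand: List[Seg]) -> List[Set[str]]:
    """For each Ident[voice]-IO violation, the one morpheme an indexed copy could target."""
    return [{m} for i, o, m in cand if (i in VOICELESS) != (o in VOICELESS)]


san_soku = segs(("san", "san", "san"), ("soku", "soku", "soku"))
san_zoku = segs(("san", "san", "san"), ("soku", "soku", "zoku"))
print(nc_loci(san_soku))     # one violation, two possible index targets
print(ident_loci(san_zoku))  # one violation, a single possible index target
```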
Aside from this small gain in learning efficacy, the bias towards faithfulness indexation gives us two further advantages. First, the bias alleviates the “too many options” problem raised by Inkelas and Zoll (2007), according to which an indexed markedness constraint or an indexed faithfulness constraint can often achieve the same result, but there is no principled reason to choose one solution over the other. Since the learner considers indexed markedness solutions only when no indexed faithfulness solution is available, it inherently faces fewer decisions with ambiguous answers. Second, by considering indexed markedness only as a last resort, we more easily generate the asymmetrical implicational patterns in nativization processes and phoneme inventories that are discussed by Itō and Mester (1999, 2001).
Itō and Mester (1999, 2001) show that where n changes are applicable to a given word, there are 2^n logically possible outputs, yet languages often permit only up to n + 1 different outputs. This is because, rather than being independent, the changes enter into an asymmetrical implicational relationship. If we assign a numerical order to the changes, with a higher number meaning less importance to the grammar, we typically observe that applying change x cannot be done without also applying changes up to and including x − 1 (i.e., all the changes that are more important than change x). Conversely, one can apply the changes up to and including x − 1 without also having to apply change x. For example, the French word [ʒɔ̃ɡlœːʁ] ‘juggler’ can undergo four changes when being nativized by a German speaker: ʒ → j, ɔ̃ → ɔŋ, œː → øː, and ʁ → ɐ (Itō and Mester 2001). This would give 2^4 = 16 logically possible pronunciations, although only five are actually produced by German speakers, giving us the following implicational hierarchy: if ʒ → j then ʁ → ɐ, if ʁ → ɐ then œː → øː, and if œː → øː then ɔ̃ → ɔŋ (Itō and Mester 2001). Furthermore, Itō and Mester (1999, 2001) point out that phoneme inventories tend to be constructed according to implicational scales such that the presence of a more marked sound along a given scale implies the presence of all less marked sounds on that scale. For example, Maddieson (1984: 35) found that “an implicational hierarchy can be set up such that the presence of /p/ implies the strong likelihood of the presence of /k/, which similarly implies presence of /t/”. Itō and Mester (1999, 2001) analyze the above asymmetrical implications by having a rigid hierarchy of markedness constraints.
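The count of 16 logically possible versus five attested outputs follows from requiring each change to imply the next more important one. The sketch below enumerates all combinations of the four changes and keeps those that respect the hierarchy. It only checks the implicational condition and is not a model of Itō and Mester's constraint-based analysis; the names are ours.

```python
from itertools import product
from typing import List, Tuple

# French [ʒɔ̃ɡlœːʁ] as a segment list (from the example above).
SOURCE = ["ʒ", "ɔ̃", "ɡ", "l", "œː", "ʁ"]
# Changes ordered from most to least important: change k+1 implies change k.
CHANGES: List[Tuple[str, str]] = [
    (SOURCE[1], "ɔŋ"),  # nasal vowel decomposed
    (SOURCE[4], "øː"),  # front rounded vowel replaced
    (SOURCE[5], "ɐ"),   # final rhotic vocalized
    (SOURCE[0], "j"),   # initial fricative replaced
]


def respects_hierarchy(applied: Tuple[bool, ...]) -> bool:
    """A change may apply only if the next more important change also applies."""
    return all(applied[k - 1] for k in range(1, len(applied)) if applied[k])


def realize(applied: Tuple[bool, ...]) -> str:
    mapping = {src: tgt for (src, tgt), on in zip(CHANGES, applied) if on}
    return "".join(mapping.get(seg, seg) for seg in SOURCE)


logically_possible = list(product([False, True], repeat=len(CHANGES)))
licit = [realize(a) for a in logically_possible if respects_hierarchy(a)]
print(len(logically_possible), len(licit))  # 16 5
for form in licit:
    print(form)
```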
By allowing indexed markedness constraints only as a last resort in our learning algorithm, we maximally preserve a language's markedness hierarchy, making it more likely for the observed asymmetrical implications to emerge.
5. Discussion and Conclusion
A synchronic analysis of post-nasal voicing in the Japanese counter word data requires both indexed faithfulness constraints and indexed markedness constraints, thus adding to the evidence in favour of allowing markedness constraints to be indexed. We furthermore argued that indexed markedness constraints are only necessary when exceptional triggering occurs across a morpheme boundary (as in the Japanese data), and in light of the various arguments that have been raised against indexed markedness constraints, we argue that they should be restricted to just these circumstances. Current learning algorithms can handle the Japanese data, but they run the risk of generating redundant constraints that serve no purpose in the final grammar, precisely because they are biased to choose markedness constraints over faithfulness constraints when there is more than one option for indexation. An algorithm biased towards indexing faithfulness constraints does not encounter this problem, and in fact will only create indexed markedness when there is exceptional triggering across a morpheme boundary, exactly as desired. Furthermore, such a faithfulness bias can mitigate the “too many options” problem (Inkelas and Zoll 2007) faced by theories that permit the indexation of both faithfulness and markedness constraints, and can more easily generate the asymmetrical implicational patterns present in nativization processes and phoneme inventories (Itō and Mester 1999, 2001).
Like any analysis, though, our proposal that indexation is biased towards faithfulness constraints comes with limitations, the two most important of which are that we assume the learner starts with a full constraint set and already knows the underlying form of all morphemes. Recent work has investigated potential methods for inducing the constraints themselves (e.g., Hayes and Wilson 2008) without relying on the standard assumption that the constraint set is innate or universal. Furthermore, learners clearly do not come equipped with knowledge of underlying representations, and recent work has shown that, under the assumption that the learning target is an output-driven map, inconsistency detection makes it possible to learn constraint rankings and underlying forms side by side (Merchant 2008, Merchant and Tesar 2008, Akers 2012, Tesar 2012, 2014a, 2014b, 2016). It is interesting that a continually updated set of mark-data pairs is versatile enough to signal that a word might be an exception or that a word may need a different underlying form, and future work may be able to create a hybrid approach to resolving inconsistency that both updates underlying forms and indexes constraints as appropriate. Three further limitations of the current proposal are its incorrect predictions with respect to the productivity of a phonological alternation, its reliance on the strict dominance relations between constraints in standard OT, and its reliance on exceptionality being an all-or-nothing property of a morpheme, each of which is discussed in the paragraphs below.
The proposed algorithm will converge on an indexed faithfulness grammar where one is available, a bias which we have argued is desirable. That being said, there are still cases in which we would prefer the indexed markedness solution, and these have to do with the proper prediction of default phonology. Cases of exceptional triggering that do not cross a morpheme boundary can be generated with indexed markedness (MX > F > M), but they can easily be re-analyzed as cases of exceptional blocking (FX > M > F) if the exceptions are re-labelled as regular items and vice versa. The only diagnostic between the two analyses would be the number of items that undergo versus resist the alternation. The proposed algorithm, however, always picks the indexed faithfulness solution no matter the ratio between undergoers and resistors, which makes the rather unintuitive prediction that in cases where all but one word in a language resists an alternation, the one undergoer will be labelled the default by virtue of never causing the creation of an indexed copy of the faithfulness constraint in question, and every other morpheme will be labelled an exception by virtue of triggering the creation of such an indexed constraint. If this were truly how default phonology was determined, we would expect speakers in wug tests to apply an alternation to nonce words wherever possible as long as at least one word of the language exhibits it, whereas we normally expect them to extend only productive alternations. As a tentative solution, we could implement the proposal of Burness (2016), who suggests that when an alternation is discovered to be unproductive, the exceptional undergoers should store all of their allomorphs in their underlying forms.
Storing the allomorphs in the underlying form exempts the exceptional undergoers from the relevant faithfulness constraint(s), thus allowing the faithfulness constraint(s) to be ranked higher than would be the case otherwise, letting the constraint(s) reflect the behaviour of regular morphemes. This of course raises the question of how an algorithm would create multiple underlying forms for a single morpheme. Multiple underlying forms are perhaps not the only possible remedy, but the incorrect predictions vis-à-vis productivity are a crucial weakness of the current proposal and will require further research.
On the subject of productivity, a reviewer asks how many counter words exist in Japanese, and whether the numeral-classifier combinations that exhibit exceptional phonology are simply stored in the lexicon. Yang (2016) proposes a specific threshold number of exceptions past which a rule becomes unproductive and should be abandoned. This threshold is equal to the number N of potential undergoers divided by the natural log of N. For example, a rule applicable to 100 items can tolerate roughly 21 exceptions (100/ln 100 ≈ 21.7) before it becomes more worthwhile to simply memorize what happens to each item individually. Does the rule of post-nasal voicing after san ‘three’ meet this standard for productivity? This is difficult to answer since counter words are generally written in kanji (the logographic component of the Japanese writing system), which do not indicate voicing. One means to investigate the rule's productivity (or lack thereof) would be to conduct a wug test in which native Japanese speakers are taught new objects that require a nonce classifier and then are asked to count them. We leave such an investigation for future research.
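Yang's threshold is simple to compute. A minimal sketch, assuming that a rule counts as productive when its number of exceptions does not exceed N / ln N (the function names are ours):

```python
import math


def tolerance_threshold(n: int) -> float:
    """Maximum number of exceptions a rule over n potential undergoers tolerates: n / ln n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return n / math.log(n)


def is_productive(n: int, exceptions: int) -> bool:
    return exceptions <= tolerance_threshold(n)


print(round(tolerance_threshold(100), 2))  # 21.71
```

With these definitions, `is_productive(100, 21)` is True and `is_productive(100, 22)` is False. Applying the test to the Japanese counters would require counts of potential undergoers and exceptions that are not available here.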
The Error-Driven Constraint Demotion algorithm (Tesar and Smolensky 1998) and the Inconsistency Resolution algorithm (Pater 2004, 2006, 2010) both assume that constraints are related to each other through strict dominance. Recently, however, the field has been moving from frameworks where constraints are in strict dominance relationships towards frameworks where constraints are assigned numerical weights. Such weight-based frameworks are able to model variable rates of application as well as gradient acceptability, and include Stochastic OT (Boersma 1997), Harmonic Grammar (Legendre et al. 1990; see Pater 2009 for a recent overview), Noisy Harmonic Grammar (Coetzee and Pater 2011), and Maximum Entropy grammars (Goldwater and Johnson 2003, Hayes and Wilson 2008). It is therefore worthwhile to ask how indexed constraints would be learned in a weight-based framework. The learning algorithms for the above weight-based frameworks are relatively similar, all using iterative procedures that converge on the optimal constraint weights. As they are currently formulated, however, these algorithms have no way of distinguishing cases in which a morpheme is inconsistent with itself (e.g., /pad/ variably surfaces as [pat] or [pad]) from cases in which a morpheme is inconsistent with another morpheme (e.g., /pad/ generally surfaces as [pat] whereas /kad/ generally surfaces as [kad]). The former intra-lexical inconsistency can be captured by noisy evaluation and arguably does not necessitate constraint indexation, but the latter cross-lexical inconsistency cannot always be resolved with noisy evaluation and very likely requires one or more indexed constraints.
Further research is needed to determine how a learning algorithm for frameworks using constraint weights could find exceptional morphemes when the cross-lexical inconsistency caused by these exceptions is obscured by intra-lexical inconsistency.
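The intra- versus cross-lexical distinction can be illustrated in a Maximum Entropy grammar, one of the weighted frameworks mentioned above. In this sketch the constraint names *VoicedCoda and Ident[voice], the tableau builder, and all weights are hypothetical. Because /pad/ and /kad/ have identical violation profiles, every choice of weights assigns them identical distributions: variation within one morpheme can be modelled, but a consistent difference between the two cannot be expressed without an indexed constraint.

```python
import math
from typing import Dict

Profile = Dict[str, int]


def maxent(weights: Dict[str, float], tableau: Dict[str, Profile]) -> Dict[str, float]:
    """P(candidate) is proportional to exp(-sum of weight * violations)."""
    scores = {c: math.exp(-sum(weights[k] * v for k, v in prof.items()))
              for c, prof in tableau.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


def final_devoicing_tableau(stem: str) -> Dict[str, Profile]:
    """Hypothetical tableau for /<stem>d/ under two general constraints."""
    return {stem + "t": {"*VoicedCoda": 0, "Ident[voice]": 1},
            stem + "d": {"*VoicedCoda": 1, "Ident[voice]": 0}}


for w in [{"*VoicedCoda": 2.0, "Ident[voice]": 1.0},
          {"*VoicedCoda": 0.5, "Ident[voice]": 3.0}]:
    pad = maxent(w, final_devoicing_tableau("pa"))
    kad = maxent(w, final_devoicing_tableau("ka"))
    # Whatever the weights, P([pat] | /pad/) equals P([kat] | /kad/).
    print(round(pad["pat"], 3), round(kad["kat"], 3))
```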
Finally, the learning simulations above assume that exceptionality is an all-or-nothing property of a morpheme, but recent work has found that the exceptional status of a morpheme can be “revoked” in the presence of other morphemes (Jurgec 2014, Gouskova and Linzen 2015, Jurgec and Bjorkman 2018). For example, some Dutch speakers produce [ɹ] in recent loans from English like the name Op[ɹ]ah, but when derivational morphology is added to those loanwords, these same speakers produce the native segment [ʁ], as in the diminutive Op[ʁ]ah-tje (Jurgec 2014, Jurgec and Bjorkman 2018). Jurgec and Bjorkman (2018) analyze these data in OT and propose that indexed constraints specify a property and a domain such that they apply only when all morphemes within the specified domain have the specified property. We suspect that it will not be too difficult to modify current constraint indexing algorithms such that, after determining that an indexed constraint is necessary, they can determine the domain within which that indexed constraint must apply. Whether our proposed faithfulness bias will be necessary in such a modified algorithm remains to be seen.