
Error-driven learning in Optimality Theory and Harmonic Grammar: a comparison*

Published online by Cambridge University Press: 16 January 2017

Giorgio Magri*
Affiliation: CNRS, Université Paris 8

Abstract

Error-driven learning in Optimality Theory (OT) admits guarantees of efficiency, stochastic tolerance and noise robustness which hold independently of any substantive assumptions about the constraints. This paper shows that the Harmonic Grammar (HG) learner used in the current literature does not admit such constraint-independent guarantees. The HG theory of error-driven learning thus needs to be substantially restricted to specific constraint sets.
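To make the object of comparison concrete, the following is a minimal sketch, not taken from the paper, of the Perceptron-style error-driven reweighting rule for HG discussed in the literature cited below (e.g. Boersma & Pater 2016; Magri 2015): whenever the current weights fail to prefer the intended winner over a competing loser, each weight is shifted by the difference between the loser's and the winner's violations of the corresponding constraint, optionally truncated at zero to keep weights non-negative. The function names and the toy numbers are illustrative assumptions, not the paper's formulation.

# Illustrative sketch only: a Perceptron-style error-driven HG update.
# Candidates are violation vectors over the constraint set; all names are hypothetical.

def harmony(weights, violations):
    # Harmony of a candidate: the negated weighted sum of its constraint violations.
    return -sum(w * v for w, v in zip(weights, violations))

def hg_update(weights, winner, loser, rate=1.0, truncate=True):
    # If the current weights fail to prefer the intended winner, shift each weight
    # by the violation difference (loser minus winner), scaled by the learning rate.
    # truncate=True clips weights at zero, as in truncated-Perceptron variants.
    if harmony(weights, winner) > harmony(weights, loser):
        return weights  # the intended winner already wins: no error, no update
    updated = [w + rate * (l - n) for w, n, l in zip(weights, winner, loser)]
    return [max(0.0, w) for w in updated] if truncate else updated

# Toy example with two constraints: the intended winner violates C2 once, the
# competing loser violates C1 once; the initial weights prefer the loser.
weights = [0.0, 2.0]
weights = hg_update(weights, winner=[0, 1], loser=[1, 0])
print(weights)  # [1.0, 1.0]: C1 is promoted and C2 demoted by the violation difference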

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 

Footnotes

* Parts of this paper were presented at the 21st Manchester Phonology Meeting in 2013 and at the 11th Old World Conference in Phonology in 2014. I wish to thank Paul Boersma and Joe Pater for useful discussion. Three anonymous reviewers and the associate editor of the journal also provided me with detailed and valuable suggestions. The research reported in this paper was supported by a grant from the Fyssen Research Foundation, as well as by a Marie Curie Intra-European Fellowship within the 7th European Community Framework Programme.

Appendices providing more technical details and simulation results can be found in supplementary online materials at https://doi.org/10.1017/S0952675716000221.

References

Bane, Max, Riggle, Jason & Sonderegger, Morgan (2010). The VC dimension of constraint-based grammars. Lingua 120. 1194–1208.
Bíró, Tamás S. (2006). Finding the right words: implementing Optimality Theory with simulated annealing. PhD dissertation, University of Groningen.
Block, H. D. (1962). The perceptron: a model of brain functioning. Reviews of Modern Physics 34. 123–135.
Boersma, Paul (1997). How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21. 43–58.
Boersma, Paul (1998). Functional phonology. PhD dissertation, University of Amsterdam. Published, The Hague: Holland Academic Graphics.
Boersma, Paul (2009). Some correct error-driven versions of the Constraint Demotion Algorithm. LI 40. 667–686.
Boersma, Paul & Hayes, Bruce (2001). Empirical tests of the Gradual Learning Algorithm. LI 32. 45–86.
Boersma, Paul & van Leussen, Jan-Willem (2014). Fast evaluation and learning in multi-level parallel constraint grammars. Ms, University of Amsterdam.
Boersma, Paul & Pater, Joe (2016). Convergence properties of a Gradual Learning Algorithm for Harmonic Grammar. In McCarthy, John J. & Pater, Joe (eds.) Harmonic Grammar and Harmonic Serialism. London: Equinox. 389–434.
Cesa-Bianchi, Nicolò & Lugosi, Gábor (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.
Chomsky, Noam (1965). Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.
Coetzee, Andries W. & Kawahara, Shigeto (2013). Frequency biases in phonological variation. NLLT 31. 47–89.
Coetzee, Andries W. & Pater, Joe (2008). Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic. NLLT 26. 289–337.
Coetzee, Andries W. & Pater, Joe (2011). The place of variation in phonological theory. In Goldsmith, John, Riggle, Jason & Yu, Alan (eds.) The handbook of phonological theory. 2nd edn. Malden, Mass. & Oxford: Wiley-Blackwell. 401–434.
Collins, Michael (2002). Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Hajič, Jan & Matsumoto, Yuji (eds.) Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: Association for Computational Linguistics. 1–8.
Cristianini, Nello & Shawe-Taylor, John (2000). An introduction to Support Vector Machines and other kernel-based methods. Cambridge: Cambridge University Press.
Eisner, Jason (2000). Easy and hard constraint ranking in Optimality Theory: algorithms and complexity. In Eisner, Jason, Karttunen, Lauri & Thériault, Alain (eds.) Finite-state phonology: Proceedings of the 5th Workshop of the ACL Special Interest Group in Computational Phonology (SIGPHON). 22–33.
Frank, Robert & Kapur, Shyam (1996). On the use of triggers in parameter setting. LI 27. 623–660.
Freund, Yoav & Schapire, Robert E. (1999). Large margin classification using the perceptron algorithm. Machine Learning 37. 277–296.
Fürnkranz, Johannes & Hüllermeier, Eyke (2010). Preference learning. Berlin & Heidelberg: Springer.
Gibson, Edward & Wexler, Kenneth (1994). Triggers. LI 25. 407–454.
Hayes, Bruce (2004). Phonological acquisition in Optimality Theory: the early stages. In Kager et al. (2004). 158–203.
Heinz, Jeffrey (2011). Computational phonology. Part I: Foundations. Language and Linguistics Compass 5. 140–152.
Jäger, Gerhard & Rosenbach, Anette (2006). The winner takes it all – almost: cumulativity in grammatical variation. Linguistics 44. 937–971.
Jarosz, Gaja (2010). Implicational markedness and frequency in constraint-based computational models of phonological learning. Journal of Child Language 37. 565–606.
Jarosz, Gaja (2013). Learning with hidden structure in Optimality Theory and Harmonic Grammar: beyond Robust Interpretive Parsing. Phonology 30. 27–71.
Jesney, Karen & Tessier, Anne-Michelle (2011). Biases in Harmonic Grammar: the road to restrictive learning. NLLT 29. 251–290.
Kager, René, Pater, Joe & Zonneveld, Wim (eds.) (2004). Constraints in phonological acquisition. Cambridge: Cambridge University Press.
Keller, Frank (2000). Gradience in grammar: experimental and computational aspects of degrees of grammaticality. PhD dissertation, University of Edinburgh.
Kivinen, Jyrki (2003). Online learning of linear classifiers. In Mendelson, Shahar & Smola, Alexander J. (eds.) Advanced lectures on machine learning. Berlin & Heidelberg: Springer. 235–257.
Klasner, Norbert & Simon, Hans Ulrich (1995). From noise-free to noise-tolerant and from on-line to batch learning. In Maass, Wolfgang (ed.) Proceedings of the 8th Annual Conference on Computational Learning Theory (COLT). New York: ACM. 250–257.
Legendre, Géraldine, Miyata, Yoshiro & Smolensky, Paul (1998a). Harmonic Grammar: a formal multi-level connectionist theory of linguistic well-formedness: an application. In Proceedings of the 12th Annual Conference of the Cognitive Science Society. Hillsdale: Erlbaum. 884–891.
Legendre, Géraldine, Miyata, Yoshiro & Smolensky, Paul (1998b). Harmonic Grammar: a formal multi-level connectionist theory of linguistic well-formedness: theoretical foundations. In Proceedings of the 12th Annual Conference of the Cognitive Science Society. Hillsdale: Erlbaum. 388–395.
Legendre, Géraldine, Sorace, Antonella & Smolensky, Paul (2006). The Optimality Theory–Harmonic Grammar connection. In Smolensky & Legendre (2006: vol. 2). 903–966.
Levelt, Clara C., Schiller, Niels O. & Levelt, Willem J. (2000). The acquisition of syllable types. Language Acquisition 8. 237–264.
Magri, Giorgio (2012a). Constraint promotion: not only convergent, but also efficient. CLS 48. 471–485.
Magri, Giorgio (2012b). Convergence of error-driven ranking algorithms. Phonology 29. 213–269.
Magri, Giorgio (2013a). The complexity of learning in Optimality Theory and its implications for the acquisition of phonotactics. LI 44. 433–468.
Magri, Giorgio (2013b). HG has no computational advantages over OT: toward a new toolkit for computational OT. LI 44. 569–609.
Magri, Giorgio (2015). How to keep the HG weights non-negative: the truncated Perceptron reweighting rule. Journal of Language Modelling 3. 345–375.
Magri, Giorgio (2016). Noise robustness and stochastic tolerance of OT error-driven ranking algorithms. Journal of Logic and Computation 26. 959–988.
Magri, Giorgio (forthcoming). Idempotency in Optimality Theory. JL.
Magri, Giorgio & Storme, Benjamin (forthcoming). A closer look at Boersma & Hayes’ Ilokano metathesis test case. CLS 49.
Minsky, Marvin L. & Papert, Seymour A. (1969). Perceptrons: an introduction to computational geometry. Cambridge, Mass.: MIT Press.
Mohri, Mehryar & Rostamizadeh, Afshin (2013). Perceptron mistake bounds. https://arxiv.org/abs/1305.0208.
Mohri, Mehryar, Rostamizadeh, Afshin & Talwalkar, Ameet (2012). Foundations of machine learning. Cambridge, Mass.: MIT Press.
Novikoff, Albert B. J. (1962). On convergence proofs on Perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata. Vol. 12. New York: Polytechnic Institute of Brooklyn. 615–622.
Pater, Joe (2008). Gradual learning and convergence. LI 39. 334–345.
Pater, Joe (2009). Weighted constraints in generative linguistics. Cognitive Science 33. 999–1035.
Prince, Alan & Smolensky, Paul (2004). Optimality Theory: constraint interaction in generative grammar. Malden, Mass. & Oxford: Blackwell.
Prince, Alan & Tesar, Bruce (2004). Learning phonotactic distributions. In Kager et al. (2004). 245–291.
Riggle, Jason (2009). The complexity of ranking hypotheses in Optimality Theory. Computational Linguistics 35. 47–59.
Rosenblatt, Frank (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65. 386–408.
Rosenblatt, Frank (1962). Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Washington, DC: Spartan.
Shalev-Shwartz, Shai & Singer, Yoram (2005). A new perspective on an old Perceptron algorithm. In Auer, Peter & Meir, Ron (eds.) Learning theory. Berlin & Heidelberg: Springer. 264–278.
Smolensky, Paul & Legendre, Géraldine (eds.) (2006). The harmonic mind: from neural computation to optimality-theoretic grammar. 2 vols. Cambridge, Mass.: MIT Press.
Staubs, Robert, Becker, Michael, Potts, Christopher, Pratt, Patrick, McCarthy, John J. & Pater, Joe (2010). OT-Help 2.0. Software package. http://web.linguist.umass.edu/~OTHelp/.
Tesar, Bruce (2004). Using inconsistency detection to overcome structural ambiguity. LI 35. 219–253.
Tesar, Bruce (2013). Output-driven phonology: theory and learning. Cambridge: Cambridge University Press.
Tesar, Bruce & Smolensky, Paul (1998). Learnability in Optimality Theory. LI 29. 229–268.
Tesar, Bruce & Smolensky, Paul (2000). Learnability in Optimality Theory. Cambridge, Mass.: MIT Press.
Wexler, Kenneth & Culicover, Peter W. (1980). Formal principles of language acquisition. Cambridge, Mass.: MIT Press.
Supplementary material: Magri supplementary material (File, 3.1 MB).