Error-driven learning in Optimality Theory and Harmonic Grammar: a comparison*

Giorgio Magri

doi:10.1017/S0952675716000221

Published online by Cambridge University Press: 16 January 2017

Affiliation: CNRS, Université Paris 8

Abstract

OT error-driven learning admits guarantees of efficiency, stochastic tolerance and noise robustness which hold independently of any substantive assumptions on the constraints. This paper shows that the HG learner used in the current literature does not admit such constraint-independent guarantees. The HG theory of error-driven learning thus needs to be substantially restricted to specific constraint sets.

Type: Articles
Information: Phonology, Volume 33, Issue 3, December 2016, pp. 493–532

DOI: https://doi.org/10.1017/S0952675716000221
Copyright: © Cambridge University Press 2017

Footnotes

Parts of this paper were presented at the 21st Manchester Phonology Meeting in 2013 and at the 11th Old World Conference in Phonology in 2014. I wish to thank Paul Boersma and Joe Pater for useful discussion. Three anonymous reviewers and the associate editor of the journal also provided me with detailed and valuable suggestions. The research reported in this paper was supported by a grant from the Fyssen Research Foundation, as well as by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme.

Appendices providing more technical details and simulation results can be found in supplementary online materials at https://doi.org/10.1017/S0952675716000221.
