
A neural approach for inducing multilingual resources and natural language processing tools for low-resource languages

Published online by Cambridge University Press:  06 August 2018

O. ZENNAKI
Affiliation:
CEA, LIST, Vision and Content Engineering Laboratory, Gif-sur-Yvette, France e-mails: [email protected], [email protected] Laboratory of Informatics of Grenoble, Univ. Grenoble-Alpes, Grenoble, France e-mail: [email protected]
N. SEMMAR
Affiliation:
CEA, LIST, Vision and Content Engineering Laboratory, Gif-sur-Yvette, France e-mails: [email protected], [email protected]
L. BESACIER
Affiliation:
Laboratory of Informatics of Grenoble, Univ. Grenoble-Alpes, Grenoble, France e-mail: [email protected]

Abstract

This work focuses on the rapid development of linguistic annotation tools for low-resource languages, that is, languages with no labeled training data. We experiment with several cross-lingual annotation projection methods based on recurrent neural network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target languages. More precisely, our approach has the following characteristics: (a) it does not use word alignment information; (b) it does not assume any knowledge about the target languages, the one requirement being that the source and target languages are not too syntactically divergent, which makes it applicable to a wide range of low-resource languages; (c) it provides truly multilingual taggers (one tagger for N languages). We investigate both unidirectional and bidirectional RNN models and propose a method to include external information (for instance, low-level information from part-of-speech tags) in the RNN to train higher-level taggers (for instance, Super Sense taggers). We demonstrate the validity and genericity of our model using parallel corpora obtained by manual or automatic translation. Our experiments induce cross-lingual part-of-speech and Super Sense taggers. We also apply our approach in a weakly supervised context, where it shows excellent potential for very low-resource settings (fewer than 1k training utterances).
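The abstract does not spell out how a word representation can be shared across languages using only a parallel corpus and no word alignment. The sketch below illustrates one possible construction, stated here as an assumption and not as the authors' confirmed method: each word is mapped to a binary vector with one dimension per sentence pair, set to 1 when the word occurs in that pair. Source and target words then live in the same space, so a tagger trained on source-side vectors could in principle be applied to target-side vectors. The function names `occurrence_vectors` and `cosine` and the toy corpus are hypothetical.

```python
from math import sqrt


def occurrence_vectors(sentences):
    """Map each word to a binary vector with one dimension per sentence pair.

    vectors[w][i] == 1 iff word w occurs in sentence i of this side of the
    parallel corpus. No word alignment is used, only sentence indices.
    (Assumed construction for illustration; not taken from the abstract.)
    """
    n = len(sentences)
    vectors = {}
    for i, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            vectors.setdefault(word, [0] * n)[i] = 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy parallel corpus (hypothetical data): line i on each side is a translation pair.
english = ["the cat sleeps", "the dog eats", "a cat eats"]
french = ["le chat dort", "le chien mange", "un chat mange"]

en_vecs = occurrence_vectors(english)
fr_vecs = occurrence_vectors(french)

# Words that occur in the same sentence pairs get the same vector,
# whichever language they belong to.
sim_cat_chat = cosine(en_vecs["cat"], fr_vecs["chat"])
sim_cat_chien = cosine(en_vecs["cat"], fr_vecs["chien"])
print(sim_cat_chat, sim_cat_chien)
```

In this toy corpus, "cat" and "chat" occur in exactly the same sentence pairs and so receive identical vectors, while "cat" and "chien" never co-occur and are orthogonal. On a real corpus the vectors would be very high-dimensional and sparse, and frequent function words would overlap with many unrelated words, so this sketch only shows the principle.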

Type
Article
Copyright
Copyright © Cambridge University Press 2018 

