
Transfer learning for Turkish named entity recognition on noisy text

Published online by Cambridge University Press:  28 January 2020

Emre Kağan Akkaya
Affiliation: Department of Computer Engineering, Hacettepe University, Turkey

Burcu Can*
Affiliation: Department of Computer Engineering, Hacettepe University, Turkey
*Corresponding author. E-mail: [email protected]

Abstract

In this article, we investigate the use of deep neural networks with different word representation techniques for named entity recognition (NER) on Turkish noisy text. We argue that valuable latent features for NER can, in fact, be learned without using any hand-crafted features and/or domain-specific resources such as gazetteers and lexicons. In this regard, we utilize character-level, character n-gram-level, morpheme-level, and orthographic character-level word representations. Since noisy data with NER annotation are scarce for Turkish, we introduce a transfer learning model, an extension of the Bi-LSTM-CRF architecture that incorporates an additional conditional random field (CRF) layer trained simultaneously on a larger (but formal) corpus and a noisy corpus, in order to learn infrequent entity types. This allows us to learn from both formal and informal/noisy text, further improving the performance of our model on rarely seen entity types. We experimented on Turkish as a morphologically rich language and English as a relatively morphologically poor language. We obtained an entity-level F1 score of 67.39% on Turkish noisy data and 45.30% on English noisy data, outperforming the current state-of-the-art models on noisy text. The English scores are lower than the Turkish scores because of the severe sparsity introduced into the data by users' writing styles. The results show that using subword information contributes significantly to learning latent features for morphologically rich languages.
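The following is a minimal sketch, not the authors' implementation, of the transfer-learning idea described in the abstract: a Bi-LSTM encoder shared between a formal corpus and a noisy corpus, with a separate task-specific output layer for each. Plain linear heads stand in for the two CRF layers, and the word-level embeddings, class names, and dimensions are illustrative assumptions (the paper combines character, character n-gram, morpheme, and orthographic representations).

```python
# Hypothetical PyTorch sketch of a shared Bi-LSTM encoder with two
# domain-specific output heads (formal vs. noisy tag sets). In the paper,
# CRF layers sit on top of the encoder; simple linear heads are used here
# so the example stays self-contained and runnable.
import torch
import torch.nn as nn

class SharedBiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim,
                 n_tags_formal, n_tags_noisy):
        super().__init__()
        # Illustrative word-level embeddings (the paper uses subword features).
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # One output head per corpus/tag set; the encoder weights are shared.
        self.head_formal = nn.Linear(2 * hidden_dim, n_tags_formal)
        self.head_noisy = nn.Linear(2 * hidden_dim, n_tags_noisy)

    def forward(self, token_ids, domain):
        h, _ = self.bilstm(self.embed(token_ids))
        head = self.head_formal if domain == "formal" else self.head_noisy
        return head(h)  # per-token tag scores (emission scores for a CRF)

# Usage: alternate batches from the formal and the noisy corpus, routing
# each batch through the matching head while sharing the encoder.
model = SharedBiLSTMTagger(vocab_size=10000, emb_dim=100, hidden_dim=128,
                           n_tags_formal=9, n_tags_noisy=13)
scores = model(torch.randint(0, 10000, (2, 7)), domain="noisy")
print(scores.shape)  # torch.Size([2, 7, 13])
```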

Type: Article
Copyright: © Cambridge University Press 2020


Supplementary material

Kağan Akkaya and Can Supplementary Materials (File, 35.2 KB)