Sentiment analysis in Turkish: Supervised, semi-supervised, and unsupervised techniques

Published online by Cambridge University Press:  17 April 2020

Cem Rıfkı Aydın*
Affiliation:
Department of Computer Engineering, Boğaziçi University, Istanbul 34342, Turkey
Tunga Güngör
Affiliation:
Department of Computer Engineering, Boğaziçi University, Istanbul 34342, Turkey
*Corresponding author. E-mail: [email protected]

Abstract

Although many studies on sentiment analysis have been carried out for widely spoken languages, research on this topic is still immature for Turkish. Most work on Turkish focuses on supervised models, which require comprehensive annotated corpora. The few unsupervised methods rely on sentiment lexicons that are either translated from English lexicons or built from corpora. This results in improper word polarities, because the characteristics of the language and the domain are ignored. In this paper, we develop unsupervised (domain-independent) and semi-supervised (domain-specific) methods for Turkish, which are based on a set of antonym word pairs as seeds. We make a comprehensive analysis of supervised methods under several feature weighting schemes. We then form an ensemble of supervised classifiers and also combine the unsupervised and supervised methods. Since Turkish is an agglutinative language, we perform morphological analysis and use different word forms. The methods were tested on two Turkish datasets with different styles and also on English datasets to show the portability of the approaches across languages. We observed that the combination of the unsupervised and supervised approaches outperforms the other methods, and we obtained a significant improvement over the state-of-the-art results for both Turkish and English.

Type
Article
Copyright
© Cambridge University Press 2020
