
To use or not to use: Feature selection for sentiment analysis of highly imbalanced data

Published online by Cambridge University Press:  07 August 2017

SANDRA KÜBLER
Affiliation:
Department of Linguistics, Indiana University, Bloomington, IN 47405, USA
CAN LIU
Affiliation:
Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
ZEESHAN ALI SAYYED
Affiliation:
Department of Computer Science, Indiana University, Bloomington, IN 47405, USA

Abstract

We investigate feature selection methods for machine learning approaches to sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. Machine learning approaches to such tasks commonly use word or part-of-speech n-grams as features. This results in a large feature set, of which only a small subset may be good indicators of sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of the extreme skew in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 percentage points.
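
As a rough illustration of what an inherently multi-class information gain score can look like, the following minimal sketch scores a binary "n-gram present / absent" feature against all rating classes at once, using the textbook definition IG(f) = H(C) - [P(f) H(C|f) + P(not f) H(C|not f)]. It is not the authors' implementation: the function names, the toy reviews, and the rating labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, feature):
    """Multi-class information gain of a binary presence/absence feature.

    docs   -- list of token sets, one per review (hypothetical toy format)
    labels -- list of class labels, one per review
    """
    all_counts = Counter(labels)
    # Class counts among reviews that contain the feature.
    with_f = Counter(lab for doc, lab in zip(docs, labels) if feature in doc)
    # Class counts among reviews that do not contain it.
    without_f = all_counts - with_f
    n = len(labels)
    n_with = sum(with_f.values())
    conditional = (n_with / n) * entropy(with_f.values()) \
        + ((n - n_with) / n) * entropy(without_f.values())
    return entropy(all_counts.values()) - conditional


def top_features(docs, labels, k):
    """Rank every feature in the vocabulary by information gain, keep the top k."""
    vocabulary = set().union(*docs)
    scored = {f: information_gain(docs, labels, f) for f in vocabulary}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical toy data: three rating classes, two reviews each.
docs = [{"delicious", "easy"}, {"delicious"},
        {"bland"}, {"bland", "easy"},
        {"okay", "easy"}, {"okay"}]
labels = [4, 4, 1, 1, 3, 3]

ig_delicious = information_gain(docs, labels, "delicious")
ig_easy = information_gain(docs, labels, "easy")
```

In this toy set, "easy" occurs once in every class and so carries no information about the rating, whereas "delicious" occurs only in one class and receives a clearly positive score. Because the score is computed over the full class distribution, no ensemble of one-vs-rest binary selectors is needed.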

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 

Footnotes

This work is based on research supported by the U.S. Office of Naval Research (ONR) via grant #N00014-10-1-0140.
