A cross-corpus study of subjectivity identification using unsupervised learning†

DONG WANG; YANG LIU

doi:10.1017/S1351324911000234

Published online by Cambridge University Press: 16 August 2011

Affiliation (DONG WANG and YANG LIU): Department of Computer Science, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas

Abstract

In this study, we investigate unsupervised generative learning methods for subjectivity detection across different domains. We create an initial training set using simple lexicon information and then evaluate two iterative learning methods, each built on a naive Bayes base classifier, that learn from unannotated data. The first method is self-training, which adds high-confidence instances to the training set in each iteration. The second is a calibrated EM (expectation-maximization) method, in which we calibrate the posterior probabilities from EM so that the class distribution is similar to that in the real data. We evaluate both approaches on three domains: movie data, news resources, and meeting dialogues, and find that in some cases the unsupervised learning methods achieve performance close to that of the fully supervised setup. We perform a thorough analysis of contributing factors, such as the self-labeling accuracy of the initial training set in unsupervised learning, the accuracy of the examples added in self-training, and the size of the initial training set in the different methods. Our experiments and analysis reveal inherent differences across domains and identify the factors that explain the models' behavior.
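As a rough illustration of the self-training procedure summarized in the abstract, the following Python sketch seeds a naive Bayes classifier from a lexicon and then repeatedly adds confidently labeled sentences to the training set. It is a minimal sketch under stated assumptions, not the authors' implementation: the seeding rule (two or more lexicon hits means subjective, zero hits means objective), the 0.9 confidence threshold, the add-one smoothing, and all function names are hypothetical choices made for this example.

```python
import math
from collections import Counter

def lexicon_seed(sentences, lexicon, min_hits=2):
    """Assumed seeding rule: >= min_hits lexicon words -> 'subj', none -> 'obj',
    anything in between stays unlabeled."""
    seed_docs, seed_labels, unlabeled = [], [], []
    for sentence in sentences:
        tokens = sentence.lower().split()
        hits = sum(t in lexicon for t in tokens)
        if hits >= min_hits:
            seed_docs.append(tokens)
            seed_labels.append("subj")
        elif hits == 0:
            seed_docs.append(tokens)
            seed_labels.append("obj")
        else:
            unlabeled.append(tokens)
    return seed_docs, seed_labels, unlabeled

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with add-one smoothing over a fixed vocabulary."""
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + len(vocab)
        log_like[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_like

def posterior(doc, model):
    """Class posteriors for one tokenized sentence; out-of-vocabulary words are skipped."""
    log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
              for c in log_prior}
    top = max(scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}

def self_train(seed_docs, seed_labels, unlabeled, threshold=0.9, max_iter=10):
    """Each iteration moves unlabeled sentences whose top posterior reaches the
    threshold into the training set, then retrains; stops when nothing is added."""
    vocab = {w for d in seed_docs + unlabeled for w in d}
    docs, labels, pool = list(seed_docs), list(seed_labels), list(unlabeled)
    model = train_nb(docs, labels, vocab)
    for _ in range(max_iter):
        remaining, added = [], 0
        for d in pool:
            label, prob = max(posterior(d, model).items(), key=lambda kv: kv[1])
            if prob >= threshold:
                docs.append(d)
                labels.append(label)
                added += 1
            else:
                remaining.append(d)
        pool = remaining
        if added == 0:
            break
        model = train_nb(docs, labels, vocab)
    return model, docs, labels
```

A typical call would pass raw sentences and a subjectivity word list to `lexicon_seed`, then hand the three returned lists to `self_train`. The threshold controls the trade-off the abstract alludes to: a higher value adds fewer but more accurately labeled examples per iteration.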

Type: Articles
Information: Natural Language Engineering, Volume 18, Issue 3, July 2012, pp. 375–397

DOI: https://doi.org/10.1017/S1351324911000234
Copyright: © Cambridge University Press 2011
