Classifying Korean comparative sentences for comparison analysis

SEON YANG; YOUNGJOONG KO

doi:10.1017/S1351324913000211

Published online by Cambridge University Press: 09 September 2013

SEON YANG: Department of Computer Engineering, Dong-A University, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, South Korea
YOUNGJOONG KO*: Department of Computer Engineering, Dong-A University, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, South Korea
*Corresponding author.

Abstract

Comparisons rank objects according to their relative superiority or inferiority, and they can strongly affect a variety of evaluation processes. The Web facilitates qualitative and quantitative comparisons through online debates, discussion forums, product comparison sites and similar venues, so comparison analysis is becoming increasingly useful in many application areas. This study develops a method for classifying sentences in Korean text documents into several comparative types to facilitate their analysis. We divide the study into two tasks: (1) extracting comparative sentences from text documents and (2) classifying comparative sentences into seven types. In the first task, we examine a large number of real comparative sentences, drawing on previous studies, and construct a lexicon of comparative expressions. Sentences that contain elements of the lexicon are treated as comparative sentence candidates, and machine learning techniques are then used to eliminate non-comparative sentences from the candidates. In the second task, we first classify the comparative sentences roughly using keywords and then apply transformation-based learning to correct errors in this initial classification. Experimental results suggest that our method may be suitable for practical use: we obtained an F1-score of 90.23% in the first task, an accuracy of 81.67% in the second task, and an overall accuracy of 88.59% for the integrated system combining both tasks.
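
The keyword-then-correct pipeline described in the abstract can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are assumed to be already split into morphemes (for example by a Korean morphological analyzer), the keyword lexicon and the three labels (superlative, equality, greater_lesser) are hypothetical stand-ins for the paper's lexicon and seven types, the machine-learning filter for non-comparative candidates is omitted, and the learner is a generic Brill-style transformation-based learner with a single hypothetical rule template, "change label A to B if word w occurs". The names KEYWORD_TYPES, is_candidate, initial_label, learn_rules and classify are illustrative only.

```python
from collections import Counter

# Hypothetical keyword lexicon: a few real Korean comparative markers mapped to
# illustrative labels. These are NOT the paper's lexicon or its seven types.
KEYWORD_TYPES = [
    ("가장", "superlative"),     # "most"
    ("제일", "superlative"),     # "the most / number one"
    ("만큼", "equality"),        # "as much as"
    ("처럼", "equality"),        # "like, as"
    ("보다", "greater_lesser"),  # "than"
]
DEFAULT_LABEL = "greater_lesser"


def is_candidate(tokens):
    """Task 1, first step: a sentence is a candidate if it contains a lexicon keyword."""
    keywords = {key for key, _ in KEYWORD_TYPES}
    return any(tok in keywords for tok in tokens)


def initial_label(tokens):
    """Task 2, first step: rough keyword classification; the first matching keyword wins."""
    for key, label in KEYWORD_TYPES:
        if key in tokens:
            return label
    return DEFAULT_LABEL


def apply_rule(rule, tokens, label):
    """Rule (from_label, to_label, trigger): relabel from_label as to_label if trigger occurs."""
    from_label, to_label, trigger = rule
    return to_label if label == from_label and trigger in tokens else label


def learn_rules(data, max_rules=10, min_gain=1):
    """Greedy transformation-based learning over (tokens, gold_label) pairs."""
    labels = sorted({gold for _, gold in data})
    current = [initial_label(tokens) for tokens, _ in data]
    rules = []
    for _ in range(max_rules):
        gains = Counter()
        for (tokens, gold), cur in zip(data, current):
            for word in sorted(set(tokens)):
                for to_label in labels:
                    if to_label == cur:
                        continue
                    if to_label == gold:   # the rule would fix an error
                        gains[(cur, to_label, word)] += 1
                    elif cur == gold:      # the rule would break a correct label
                        gains[(cur, to_label, word)] -= 1
        if not gains:
            break
        # Largest net error reduction wins; ties go to the smallest rule tuple.
        best, gain = max(sorted(gains.items()), key=lambda item: item[1])
        if gain < min_gain:
            break
        rules.append(best)
        current = [apply_rule(best, tokens, cur) for (tokens, _), cur in zip(data, current)]
    return rules


def classify(tokens, rules):
    """Initial keyword label, then the learned rules in the order they were learned."""
    label = initial_label(tokens)
    for rule in rules:
        label = apply_rule(rule, tokens, label)
    return label


# Toy, made-up training data (morpheme lists with gold labels), for illustration only.
train = [
    (["A", "가", "B", "보다", "빠르다"], "greater_lesser"),
    (["A", "가", "어떤", "폰", "보다", "빠르다"], "superlative"),
    (["C", "가", "어떤", "노트북", "보다", "가볍다"], "superlative"),
    (["A", "는", "B", "만큼", "비싸다"], "equality"),
    (["A", "가", "가장", "싸다"], "superlative"),
]
rules = learn_rules(train)
print(rules)  # [('greater_lesser', 'superlative', '어떤')]
print(classify(["D", "가", "어떤", "차", "보다", "빠르다"], rules))  # superlative
```

In the toy data the keyword "보다" (than) initially labels "faster than any phone" as a greater/lesser comparison, and the learner recovers the superlative reading through the trigger "어떤" (any). Each round keeps the rule with the largest net error reduction and the loop stops once no rule fixes at least min_gain errors, which is the usual stopping criterion in transformation-based learning; the paper's actual rule templates and thresholds are not given in this abstract.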

Type: Articles
Information: Natural Language Engineering, Volume 20, Issue 4, October 2014, pp. 557–581

DOI: https://doi.org/10.1017/S1351324913000211
Copyright: Copyright © Cambridge University Press 2013
