Published online by Cambridge University Press: 09 September 2013
Comparisons rank objects by their superiority or inferiority, and they can strongly influence a variety of evaluation processes. The Web facilitates qualitative and quantitative comparisons via online debates, discussion forums, product comparison sites, etc., and comparison analysis is becoming increasingly useful in many application areas. This study develops a method for classifying sentences in Korean text documents into several comparative types to facilitate their analysis. We divide the study into two tasks: (1) extracting comparative sentences from text documents and (2) classifying the comparative sentences into seven types. In the first task, we examine many actual comparative sentences, referring to previous studies, and construct a lexicon of comparisons. Sentences that contain elements from the lexicon are regarded as comparative sentence candidates. We then use machine learning techniques to eliminate non-comparative sentences from the candidates. In the second task, we first classify the comparative sentences roughly using keywords and then apply a transformation-based learning method to correct the initial classification errors. Experimental results suggest that our method could be suitable for practical use: we obtained an F1-score of 90.23% in the first task, an accuracy of 81.67% in the second task, and an overall accuracy of 88.59% for the integrated system that combines both tasks.
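As a rough illustration of the two-task pipeline described above, the following Python sketch shows lexicon-based candidate extraction, keyword-based initial classification, and a transformation-style correction step. It is a minimal sketch under stated assumptions, not the method evaluated in this study: the three-entry lexicon, the type labels, the `Rule` structure, and all function names are hypothetical, and the correction rule is written by hand rather than learned. The machine learning filter that removes non-comparative candidates is omitted.

```python
from dataclasses import dataclass

# Hypothetical mini-lexicon (illustrative only, not the lexicon built in the study).
# Maps a comparison keyword to an assumed initial type label.
LEXICON = {
    "보다": "greater_or_lesser",  # "than"
    "가장": "superlative",         # "most"
    "같": "equality",             # stem of "same"
}


@dataclass(frozen=True)
class Rule:
    """A transformation-style rule: relabel when a trigger string is present."""
    from_label: str
    trigger: str
    to_label: str


def extract_candidates(sentences):
    """Task 1 (first half): keep sentences containing any lexicon element."""
    return [s for s in sentences if any(k in s for k in LEXICON)]


def initial_label(sentence):
    """Task 2 (first half): rough label from the first matching keyword."""
    for keyword, label in LEXICON.items():
        if keyword in sentence:
            return label
    return None


def apply_rules(sentence, label, rules):
    """Task 2 (second half): apply correction rules in order."""
    for rule in rules:
        if label == rule.from_label and rule.trigger in sentence:
            label = rule.to_label
    return label
```

For example, with the hand-written rule `Rule("greater_or_lesser", "가장", "superlative")`, a sentence containing both "보다" and "가장" is first labeled by the earlier keyword and then relabeled by the rule. In a transformation-based learner, such rules would instead be induced from annotated data by repeatedly selecting the rule that most reduces classification errors.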