Natural Language Engineering: Volume 18 - Issue 3

In Memoriam: David L. Waltz
John Tait
Published online by Cambridge University Press:

25 May 2012, p. 291
- Article
- - You have access
- PDF
- HTML
- Export citation
David L. Waltz died on March 22, 2012 after suffering from brain cancer.
Dave was a good friend to Natural Language Engineering, and provided some sage advice when Roberto Garigliano and I started working on the proposed journal in the early 1990s; he subsequently agreed to serve as a founding editorial board member.

Recentred local profiles for authorship attribution
ROBERT LAYTON, PAUL WATTERS, RICHARD DAZELEY
Published online by Cambridge University Press:

09 June 2011, pp. 293-312
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Authorship attribution methods aim to determine the author of a document, by using information gathered from a set of documents with known authors. One method of performing this task is to create profiles containing distinctive features known to be used by each author. In this paper, a new method of creating an author or document profile is presented that detects features considered distinctive, compared to normal language usage. This recentreing approach creates more accurate profiles than previous methods, as demonstrated empirically using a known corpus of authorship problems. This method, named recentred local profiles, determines authorship accurately using a simple ‘best matching author’ approach to classification, compared to other methods in the literature. The proposed method is shown to be more stable than related methods as parameter values change. Using a weighted voting scheme, recentred local profiles is shown to outperform other methods in authorship attribution, with an overall accuracy of 69.9% on the ad-hoc authorship attribution competition corpus, representing a significant improvement over related methods.

Exploring patterns in dictionary definitions for synonym extraction
TONG WANG, GRAEME HIRST
Published online by Cambridge University Press:

11 July 2011, pp. 313-342
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Automatic determination of synonyms and/or semantically related words has various applications in Natural Language Processing. Two mainstream paradigms to date, lexicon-based and distributional approaches, both exhibit pros and cons with regard to coverage, complexity, and quality. In this paper, we propose three novel methods—two rule-based methods and one machine learning approach—to identify synonyms from definition texts in a machine-readable dictionary. Extracted synonyms are evaluated in two extrinsic experiments and one intrinsic experiment. Evaluation results show that our pattern-based approach achieves best performance in one of the experiments and satisfactory results in the other, comparable to corpus-based state-of-the-art results.

Semantic composition of AT-LOCATION relation with other relations
HAKKI C. CANKAYA, EDUARDO BLANCO, DAN MOLDOVAN
Published online by Cambridge University Press:

18 August 2011, pp. 343-374
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
This paper presents a method for the composition of at-location with other semantic relations. The method is based on inference axioms that combine two semantic relations yielding another relation that otherwise is not expressed. An experimental study conducted on PropBank, WordNet, and eXtended WordNet shows that inferences have high accuracy. The method is applicable to combining other semantic relations and it is beneficial to many semantically intense applications.

A cross-corpus study of subjectivity identification using unsupervised learning †
DONG WANG, YANG LIU
Published online by Cambridge University Press:

16 August 2011, pp. 375-397
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
In this study, we investigate using unsupervised generative learning methods for subjectivity detection across different domains. We create an initial training set using simple lexicon information and then evaluate two iterative learning methods with a base naive Bayes classifier to learn from unannotated data. The first method is self-training, which adds instances with high confidence into the training set in each iteration. The second is a calibrated EM (expectation-maximization) method where we calibrate the posterior probabilities from EM such that the class distribution is similar to that in the real data. We evaluate both approaches on three different domains: movie data, news resource, and meeting dialogues, and we found that in some cases the unsupervised learning methods can achieve performance close to the fully supervised setup. We perform a thorough analysis to examine factors, such as self-labeling accuracy of the initial training set in unsupervised learning, the accuracy of the added examples in self-training, and the size of the initial training set in different methods. Our experiments and analysis show inherent differences across domains and impacting factors explaining the model behaviors.

Evaluating vector space models with canonical correlation analysis *
SAMI VIRPIOJA, MARI-SANNA PAUKKERI, ABHISHEK TRIPATHI, TIINA LINDH-KNUUTILA, KRISTA LAGUS
Published online by Cambridge University Press:

20 September 2011, pp. 399-436
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Vector space models are used in language processing applications for calculating semantic similarities of words or documents. The vector spaces are generated with feature extraction methods for text data. However, evaluation of the feature extraction methods may be difficult. Indirect evaluation in an application is often time-consuming and the results may not generalize to other applications, whereas direct evaluations that measure the amount of captured semantic information usually require human evaluators or annotated data sets. We propose a novel direct evaluation method based on canonical correlation analysis (CCA), the classical method for finding linear relationship between two data sets. In our setting, the two sets are parallel text documents in two languages. A good feature extraction method should provide representations that reflect the semantic content of the documents. Assuming that the underlying semantic content is independent of the language, we can study feature extraction methods that capture the content best by measuring dependence between the representations of a document and its translation. In the case of CCA, the applied measure of dependence is correlation. The evaluation method is based on unsupervised learning, it is language- and domain-independent, and it does not require additional resources besides a parallel corpus. In this paper, we demonstrate the evaluation method on a sentence-aligned parallel corpus. The method is validated by showing that the obtained results with bag-of-words representations are intuitive and agree well with the previous findings. Moreover, we examine the performance of the proposed evaluation method with indirect evaluation methods in simple sentence matching tasks, and a quantitative manual evaluation of word translations. The results of the proposed method correlate well with the results of the indirect and manual evaluations.

NLE volume 18 issue 3 Cover and Front matter
Published online by Cambridge University Press:

25 May 2012, pp. f1-f2
- Article
- - You have access
- PDF
- Export citation

NLE volume 18 issue 3 Cover and Back matter
Published online by Cambridge University Press:

25 May 2012, pp. b1-b8
- Article
- - You have access
- PDF
- Export citation

Natural Language Processing

Refine listing

Actions for selected content:

Natural Language Engineering, Volume 18 - Issue 3 - July 2012

Obituary

In Memoriam: David L. Waltz

Articles

Recentred local profiles for authorship attribution

Exploring patterns in dictionary definitions for synonym extraction

Semantic composition of AT-LOCATION relation with other relations

A cross-corpus study of subjectivity identification using unsupervised learning †

Evaluating vector space models with canonical correlation analysis *

Front Cover (OFC, IFC) and matter

NLE volume 18 issue 3 Cover and Front matter

Back Cover (IBC, OBC) and matter

NLE volume 18 issue 3 Cover and Back matter

Natural Language Processing

Refine listing

Actions for selected content:

Save Search

Natural Language Engineering, Volume 18 - Issue 3 - July 2012

Obituary

Articles

Front Cover (OFC, IFC) and matter

Back Cover (IBC, OBC) and matter