Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus

Published online by Cambridge University Press:  21 September 2005

L. BENTIVOGLI
Affiliation:
ITC-irst, Via Sommarive, 18-38050 Povo, Trento, Italy
E. PIANTA
Affiliation:
ITC-irst, Via Sommarive, 18-38050 Povo, Trento, Italy

Abstract

In this article we illustrate and evaluate an approach to creating high-quality linguistically annotated resources by exploiting aligned parallel corpora. The approach rests on the assumption that if a text in one language has been annotated and its translation has not, the annotations can be transferred from the source text to the target text using word alignment as a bridge. The transfer approach has been tested and extensively applied in the creation of the MultiSemCor corpus, an English/Italian parallel corpus built on the English SemCor corpus. In MultiSemCor the texts are aligned at the word level and annotated with word senses drawn from a shared sense inventory. A number of experiments have been carried out to evaluate the different steps of the methodology, and the results suggest that the transfer approach is one promising solution to the resource bottleneck. First, it leads to the creation of a parallel corpus, which is a crucial resource per se. Second, it allows existing (mostly English) annotated resources to be exploited to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.
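
The transfer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `transfer_annotations`, the toy sentences, the placeholder sense labels, and the one-to-one alignment are all assumptions made for illustration. The sketch assumes the word alignment is already available as pairs of (source index, target index) and simply copies each source sense label onto its aligned target token.

```python
from typing import List, Optional, Tuple


def transfer_annotations(
    source_senses: List[Optional[str]],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Project sense labels from source tokens onto aligned target tokens.

    source_senses[i] is the sense label of source token i (None if unannotated).
    alignment holds (source_index, target_index) pairs.
    Target tokens with no aligned, annotated source token stay None.
    """
    target_senses: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        sense = source_senses[src_idx]
        # Keep the first label that reaches a target token; later links do not overwrite it.
        if sense is not None and target_senses[tgt_idx] is None:
            target_senses[tgt_idx] = sense
    return target_senses


# Hypothetical toy data: the sense labels are placeholders, not real inventory identifiers.
english = ["the", "dog", "barks"]
italian = ["il", "cane", "abbaia"]
english_senses = [None, "dog#n#1", "bark#v#1"]
alignment = [(0, 0), (1, 1), (2, 2)]

italian_senses = transfer_annotations(english_senses, alignment, len(italian))
print(list(zip(italian, italian_senses)))
```

In this simplified setting, unaligned or unannotated words receive no label, which mirrors the intuition that the quality of the transferred annotation depends on the quality and coverage of the word alignment.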

Type
Papers
Copyright
2005 Cambridge University Press
