1 Graphs and natural language processing
Graph structures naturally model connections. In natural language processing (NLP), connections are ubiquitous, at anything from small to web scale. We find them between words – as grammatical, collocation or semantic relations – contributing to the overall meaning and maintaining the cohesive structure of the text and the unity of the discourse. We find them between concepts in ontologies or other knowledge repositories: since the early days of artificial intelligence, associative or semantic networks have been proposed and used as knowledge stores, because they naturally capture language units and the relations between them, and allow for a variety of inference and reasoning processes that simulate some of the functionalities of the human mind. We find them between complete texts or web pages, and between entities in social networks, where they model relations at web scale. Beyond the more frequently encountered ‘regular’ graphs, hypergraphs have also appeared in our field, to model relations between more than two units.
Graphs have been rigorously studied, both mathematically and computationally, providing a well-developed theoretical and practical base for the many natural language processing problems that map into this framework.
In syntax, part-of-speech tagging was tackled using graph clustering (Biemann 2006) and dependency parsing using minimum spanning trees (McDonald et al. 2005). Related to parsing is the task of prepositional phrase attachment, which found interesting solutions in semi-supervised learning (Toutanova, Manning and Ng 2004). Min-cut algorithms have also been used in text processing applications, for instance for the problem of coreference (Nicolae and Nicolae 2006).
In semantics, graphs have been used to construct semantic classes (Widdows and Dorow 2002) through networks of words built from very large corpora. On similar word networks, work has also been done on understanding lexical network properties (Ferrer i Cancho and Solé 2001), or on extracting words that stand in certain semantic relations such as synonymy (Weale, Brew and Fosler-Lussier 2009). A significant amount of effort has also been put into the measurement of semantic distance using path-based algorithms on semantic networks (Lin 1998) or random walks (Ramage, Rafferty and Manning 2009). These random-walk algorithms have been successfully applied to other problems in semantics, such as word sense disambiguation (Sinha and Mihalcea 2007; Agirre and Soroa 2009) and name disambiguation (Minkov, Cohen and Ng 2006), among others. The problems of sentiment and subjectivity analysis were also tackled using graph-based methods, such as clustering over graphs to identify the polarity of adjectives (Hatzivassiloglou and McKeown 1997), or min-cut algorithms for sentence-level subjectivity analysis (Pang and Lee 2004).
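As an illustration of the random-walk family of methods mentioned above, the sketch below runs a personalized PageRank (a random walk with restart) over a tiny hand-built word graph: mass concentrates on nodes close to the seed words, giving a relatedness score. The graph, the word list and the parameter values are invented for the example; they are not taken from any of the cited systems.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=50):
    """Random walk with restart over an undirected word graph.

    graph: dict mapping each word to a list of neighbour words
    seeds: words at which the walker restarts (the query context)
    Returns a stationary-probability score for every node.
    """
    nodes = list(graph)
    # Restart distribution: uniform over the seed words.
    restart = {w: (1.0 / len(seeds) if w in seeds else 0.0) for w in nodes}
    rank = dict(restart)
    for _ in range(iters):
        # With probability (1 - damping) the walker jumps back to a seed.
        nxt = {w: (1 - damping) * restart[w] for w in nodes}
        # With probability damping it moves to a random neighbour.
        for w in nodes:
            share = damping * rank[w] / len(graph[w])
            for v in graph[w]:
                nxt[v] += share
        rank = nxt
    return rank

# Toy semantic network (invented): scores are highest near the seed.
g = {
    "bank": ["money", "river"],
    "money": ["bank", "finance"],
    "finance": ["money"],
    "river": ["bank", "water"],
    "water": ["river"],
}
scores = personalized_pagerank(g, seeds=["money"])
```

Seeding the restart distribution with the words of a given context is what turns plain PageRank into a context-sensitive relatedness measure: words reachable over short, well-connected paths from the seeds end up with higher stationary probability.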
There are also a number of natural language processing applications that found successful solutions in the use of graph-based methods. These include text summarization (Erkan and Radev 2004; Mihalcea and Tarau 2004), semi-supervised passage retrieval (Otterbacher, Erkan and Radev 2005), keyword extraction (Mihalcea and Tarau 2004), text mining (Kozareva and Hovy 2011), deriving semantic classes (Kozareva, Riloff and Hovy 2008), topic identification (Syed, Finin and Joshi 2008; Coursey, Mihalcea and Moen 2009), topic segmentation (Malioutov and Barzilay 2006), machine translation (Zens and Ney 2005), cross-language information retrieval (Monz and Dorr 2005), and question answering (Molla 2006).
2 Overview of the issue
The four papers in the current special issue each showcase and exploit a different facet of graphs for a different task in natural language processing.
Kotlerman et al. present a novel twist on graphs as a framework for the organization of knowledge. Textual Entailment Graphs extend the notion of an ontology as a network of concepts to larger text units that convey complex information. The entailment relation replaces the is-a relation of ontologies, and text fragments replace concepts, yielding a structure that organizes text units by subsumption of the information they convey.
Jadidinejad et al. exploit the graph structure of knowledge repositories for the computation of semantic relatedness between texts. Previously, ontologies were used to provide bag-of-concept representations for given texts, and these unstructured collections were used for similarity/relatedness computations. The approach presented in this paper shows that the structure itself is useful: using the graph structure of the ontology through a clique-based semantic kernel can lead to improvements in semantic relatedness estimations.
Fernández et al. and Mitra et al. reveal and exploit sub-structures – communities of nodes representing words with related meanings – in word co-occurrence graphs. Fernández et al. apply frequency-based filtering, ranking and clustering algorithms to form and expose communities in word co-occurrence graphs. These communities are taken to approximate word senses, which, together with bilingual dictionaries, help perform sense-level translations.
Mitra et al. start with word-specific co-occurrence graphs, which are clustered using the Chinese Whispers algorithm to form sense-specific clusters. This method is applied on text collections representing disjoint time frames. Changes in the clusters obtained from data from different time spans are analysed and interpreted as effects of diachronic sense changes.
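To illustrate the kind of clustering involved, here is a minimal sketch of the Chinese Whispers label-propagation algorithm on a toy ego network for the word ‘mouse’. The example graph, the node names and the parameter choices are ours, for illustration only; they are not taken from Mitra et al.

```python
import random
from collections import Counter

def chinese_whispers(graph, iterations=20, seed=0):
    """Chinese Whispers label propagation (after Biemann 2006), sketched.

    graph: dict mapping each node to a list of neighbour nodes.
    Every node starts in its own class; at each step a node adopts the
    label that is most frequent among its neighbours, so densely
    connected regions converge to a shared label (a 'sense' cluster).
    """
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # each node is its own class
    nodes = list(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order
        for node in nodes:
            counts = Counter(labels[n] for n in graph[node])
            labels[node] = counts.most_common(1)[0][0]
    return labels

# Invented co-occurrence neighbourhood of 'mouse':
# one dense region per sense (animal vs. computing).
g = {
    "cat": ["rat", "pet"], "rat": ["cat", "pet"], "pet": ["cat", "rat"],
    "click": ["cursor", "screen"], "cursor": ["click", "screen"],
    "screen": ["click", "cursor"],
}
labels = chinese_whispers(g)
```

Because the number of clusters is not fixed in advance, running such a procedure on co-occurrence graphs built from different time slices yields cluster inventories whose births, splits and merges can then be compared across time.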
Acknowledgements
We would like to thank all the authors who submitted their papers to our special issue. We are also grateful to the reviewers, who spent time over several rounds of revisions to provide feedback and suggestions to the authors. The editorial staff of the Journal of Natural Language Engineering has been extremely helpful along the way, assisting us with information regarding the entire process of putting together a special issue and reminding us of deadlines. Rada Mihalcea’s contribution has been partially supported by National Science Foundation CAREER award #1361274.