Search

3 results

A resampling-based method to evaluate NLI models
Felipe de Souza Salvatore, Marcelo Finger, Roberto Hirata, Jr., Alexandre G. Patriota
Journal: Natural Language Engineering / Volume 30 / Issue 4 / July 2024
Published online by Cambridge University Press: 09 June 2023, pp. 793-820
The recent progress of deep learning techniques has produced models capable of achieving high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, an increasing number of adversarial evaluation schemes have appeared. These works use a similar evaluation method: they construct a new NLI test set based on sentences with known logic and semantic properties (the adversarial set), train a model on a benchmark NLI dataset, and evaluate it in the new set. Poor performance on the adversarial set is identified as a model limitation. The problem with this evaluation procedure is that it may only indicate a sampling problem. A machine learning model can perform poorly on a new test set because the text patterns presented in the adversarial set are not well represented in the training sample. To address this problem, we present a new evaluation method, the Invariance under Equivalence test (IE test). The IE test trains a model with sufficient adversarial examples and checks the model’s performance on two equivalent datasets. As a case study, we apply the IE test to the state-of-the-art NLI models using synonym substitution as the form of adversarial examples. The experiment shows that, despite their high predictive power, these models usually produce different inference outputs for equivalent inputs, and, more importantly, this deficiency cannot be solved by adding adversarial observations in the training data.

A survey of methods for revealing and overcoming weaknesses of data-driven Natural Language Understanding
Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro
Journal: Natural Language Engineering / Volume 29 / Issue 1 / January 2023
Published online by Cambridge University Press: 22 April 2022, pp. 1-31
Recent years have seen a growing number of publications that analyse Natural Language Understanding (NLU) datasets for superficial cues, whether they undermine the complexity of the tasks underlying those datasets and how they impact those models that are optimised and evaluated on this data. This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets and the methods proposed to reveal and alleviate those weaknesses for the English language. We summarise and discuss the findings and conclude with a set of recommendations for possible future research directions. We hope that it will be a useful resource for researchers who propose new datasets to assess the suitability and quality of their data to evaluate various phenomena of interest, as well as those who propose novel NLU approaches, to further understand the implications of their improvements with respect to their model’s acquired capabilities.

Explainable lexical entailment with semantic graphs
Adam Kovacs, Kinga Gemes, Andras Kornai, Gabor Recski
Journal: Natural Language Engineering / Volume 29 / Issue 5 / September 2023
Published online by Cambridge University Press: 28 February 2022, pp. 1223-1246
We present novel methods for detecting lexical entailment in a fully rule-based and explainable fashion, by automatic construction of semantic graphs, in any language for which a crowd-sourced dictionary with sufficient coverage and a dependency parser of sufficient accuracy are available. We experiment and evaluate on both the Semeval-2020 lexical entailment task (Glavaš et al. (2020). Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 24–35) and the SherLIiC lexical inference dataset of typed predicates (Schmitt and Schütze (2019). Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 902–914). Combined with top-performing systems, our method achieves improvements over the previous state-of-the-art on both benchmarks. As a standalone system, it offers a fully interpretable model of lexical entailment that makes detailed error analysis possible, uncovering future directions for improving both the semantic parsing method and the inference process on semantic graphs. We release all components of our system as open source software.
