Search

3 results

Constructing ensembles for hate speech detection
Izzet Emre Kucukkaya, Cagri Toraman
Journal:

Natural Language Processing ,

Published online by Cambridge University Press:

13 September 2024, pp. 1-26
- Article
- - You have access
  - Open access
- PDF
- HTML
- Export citation
Hate speech against individuals and groups with certain demographics is a major issue in social media. Supervised models for hate speech detection mostly utilize labeled data collections to understand textual semantics. However, hate speech detection is a complex task that involves several aspects, including topic and writing style. The complexity of hate speech can be represented by an ensemble of models learned from different aspects of data. Moreover, ensemble members or base models can be modified to give attention to particular aspects of hate speech. In this study, we extract different aspects of hate speech to construct ensembles, thereby improving the performance of hate speech detection by ensemble learning. We conduct detailed experiments on five datasets in multiple languages to generalize our observations. The experimental results, supported by statistical significance tests, show that the performance of hate speech detection can be improved by capturing multiple aspects of hate speech. Our ensemble construction approach outperforms the baselines in terms of the F1 score of the Hate class in 80% of the cases, and the Offensive class in 75% of the cases. We also compare our approach with state-of-the-art ensemble methods from shared tasks and find that our highest-performing method can improve the performance of the Hate class in two out of three datasets. We further discuss our approach and experimental results in terms of ensemble parameters and writing style among ensemble members.

Hate speech detection in low-resourced Indian languages: An analysis of transformer-based monolingual and multilingual models with cross-lingual experiments
Koyel Ghosh, Apurbalal Senapati
Journal:

Natural Language Processing ,

Published online by Cambridge University Press:

27 August 2024, pp. 1-22
- Article
- - You have access
  - Open access
- PDF
- HTML
- Export citation
Warning: This paper is based on hate speech detection and may contain examples of abusive/ offensive phrases.
Cyberbullying, online harassment, etc., via offensive comments are pervasive across different social media platforms like ™Twitter, ™Facebook, ™YouTube, etc. Hateful comments must be detected and eradicated to prevent harassment and violence on social media. In the Natural Language Processing (NLP) domain, the most prevalent task is comment classification, which is challenging, and language models based on transformers are at the forefront of this advancement. This paper intends to analyze the performance of language models based on transformers like BERT, ALBERT, RoBERTa, and DistilBERT on the Indian hate speech datasets over binary classification. Here, we utilize the existing datasets, i.e., HASOC (Hindi and Marathi) and HS-Bangla. So, we evaluate several multilingual language models like MuRIL-BERT, XLM-RoBERTa, etc., few monolingual language models like RoBERTa-Hindi, Maha-BERT (Marathi), Bangla-BERT (Bangla), Assamese-BERT (Assamese), etc., and perform cross-lingual experiment also. For further analyses, we perform multilingual, monolingual, and cross-lingual experiments on our Hate Speech Assamese (HS-Assamese) (Indo-Aryan language family) and Hate Speech Bodo (HS-Bodo) (Sino-Tibetan language family) dataset (HS dataset version 2) also and achieved a promising result. The motivation of the cross-lingual experiment is to encourage researchers to learn about the power of the transformer. Note that no pre-trained language models are currently available for Bodo or any other Sino-Tibetan languages.

SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media
Xuanyu Su, Yansong Li, Paula Branco, Diana Inkpen
Journal:

Natural Language Engineering , First View

Published online by Cambridge University Press:

03 August 2023, pp. 1-20
- Article
- - You have access
  - Open access
- PDF
- HTML
- Export citation
Anti-Asian speech during the COVID-19 pandemic has been a serious problem with severe consequences. A hate speech wave swept social media platforms. The timely detection of Anti-Asian COVID-19-related hate speech is of utmost importance, not only to allow the application of preventive mechanisms but also to anticipate and possibly prevent other similar discriminatory situations. In this paper, we address the problem of detecting Anti-Asian COVID-19-related hate speech from social media data. Previous approaches that tackled this problem used a transformer-based model, BERT/RoBERTa, trained on the homologous annotated dataset and achieved good performance on this task. However, this requires extensive and annotated datasets with a strong connection to the topic. Both goals are difficult to meet without employing reliable, vast, and costly resources. In this paper, we propose a robust semi-supervised model, SSL-GAN-RoBERTa, that learns from a limited heterogeneous dataset and whose performance is further enhanced by using vast amounts of unlabeled data from another related domain. Compared with the RoBERTa baseline model, the experimental results show that the model has substantial performance gains in terms of Accuracy and Macro-F1 score in different scenarios that use data from different domains. Our proposed model achieves state-of-the-art performance results while efficiently using unlabeled data, showing promising applicability to other complex classification tasks where large amounts of labeled examples are difficult to obtain.

Search Results

Refine search

Refine search

Actions for selected content:

3 results

Constructing ensembles for hate speech detection

Hate speech detection in low-resourced Indian languages: An analysis of transformer-based monolingual and multilingual models with cross-lingual experiments

SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media

Search Results

Refine search

Refine search

Actions for selected content:

Save Search

3 results

Constructing ensembles for hate speech detection

Hate speech detection in low-resourced Indian languages: An analysis of transformer-based monolingual and multilingual models with cross-lingual experiments

SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media