Leveraging transfer learning for hate speech detection in Portuguese social media posts

Gil Ramos; Fernando Batista; Ricardo Ribeiro; Pedro Fialho; Sérgio Moro; António Fonseca; Rita Guerra; Paula Carvalho; Catarina Marques; Cláudia Silva

Scientific journal paper Q1

Gil Ramos (Ramos, G.); Fernando Batista (Batista, F.); Ricardo Ribeiro (Ribeiro, R.); Pedro Fialho (Fialho, P.); Sérgio Moro (Moro, S.); António Fonseca (Fonseca, A.); Rita Guerra (Guerra, R.); Paula Carvalho (Carvalho, P.); Catarina Marques (Marques, C.); Cláudia Silva (Silva, C.); et al.

Journal Title

IEEE Access

Year (definitive publication)

2024

Language

English

Country

United States of America

Web of Science®

Times Cited: 10

(Last checked: 2026-07-25 12:57)

Article Impact Index: 1.4

Scopus

Times Cited: 14

(Last checked: 2026-07-22 22:31)

Article Impact Index: 1.3

Google Scholar

Times Cited: 19

(Last checked: 2026-07-23 19:10)

Overton

This publication is not indexed in Overton

Abstract

The rapid rise of social media has brought new forms of digital communication, along with a worrying increase in online hate speech (HS), which in turn has led researchers to develop several Natural Language Processing methods for its detection. Although significant strides have been made in automating HS detection, research focusing on European Portuguese remains scarce, as is the case for several under-resourced languages. To address this gap, we explore the efficacy of various transfer learning models, which the literature has shown to perform better on this task than other Deep Learning models. We employ BERT-like models pre-trained on Portuguese text, such as BERTimbau and mDeBERTa, as well as the GPT, Gemini and Mistral generative models, to detect HS in Portuguese online discourse. Our study relies on two corpora, one of YouTube comments and one of tweets, both annotated as HS or non-HS. Our findings show that the best model for the YouTube corpus was a variant of BERTimbau retrained with European Portuguese tweets and fine-tuned for the HS task, with an F-score of 87.1% for the positive class, outperforming the baseline models by more than 20% and improving on base BERTimbau by 1.8%. The best model for the Twitter corpus was GPT-3.5, with an F-score of 50.2% for the positive class. We also assess the impact of using in-domain and mixed-domain training sets, and of providing context in generative model prompts, on model performance.
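The abstract reports results as an F-score for the positive (HS) class. As a minimal sketch of how such a score is typically computed, the snippet below assumes the balanced F1 variant (the abstract does not state which F-score is used). The function name `positive_class_f1`, the label strings, and the toy labels are illustrative assumptions, not material from the paper.

```python
def positive_class_f1(gold, predicted, positive="HS"):
    """Return F1 for the positive class only (assumed balanced F-score)."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives for `positive`.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not data from the corpora).
gold = ["HS", "HS", "non-HS", "non-HS", "HS"]
predicted = ["HS", "non-HS", "non-HS", "HS", "HS"]
score = positive_class_f1(gold, predicted)
print(f"Positive-class F1: {score:.3f}")
```

Scoring only the positive class keeps the figure sensitive to missed or spurious HS predictions, which a score averaged over both classes can mask when non-HS examples dominate.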

Keywords

Hate speech, Transfer learning, Transformer models, Generative models, Text classification

Fields of Science and Technology Classification

Computer and Information Sciences - Natural Sciences
Other Natural Sciences - Natural Sciences
Civil Engineering - Engineering and Technology
Electrical Engineering, Electronic Engineering, Information Engineering - Engineering and Technology
Materials Engineering - Engineering and Technology

Funding Records

Funding Reference	Funding Entity
101049306	Comissão Europeia

Publication Identifiers

Scopus (source: Ciência_Iscte)	2-s2.0-85199862643
DOI (source: author)	10.1109/ACCESS.2024.3430848
WoS (source: Ciência_Iscte)	WOS:001278992400001
Ciência_Iscte ID	ci-pub-104797
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/32083

Other Publication Details

Online Publication Year	2024
Publisher	IEEE
Indexes	Web of Science®; Scopus
ISSN	2169-3536 (print) 2169-3536 (online)
Volume	12
Pages	101374 - 101389
Peer Reviewed	Yes
