Gender detection of Twitter users based on multiple information sources

Marco Vicente; Fernando Batista; João Paulo Carvalho

Book chapter Q4

Book title

Interactions Between Computational Intelligence and Mathematics Part 2. Studies in Computational Intelligence

Year (final publication)

2019

Language

English

Country

Switzerland

Web of Science®

This publication is not indexed in Web of Science®

Scopus

Number of citations: 17

(Last checked: 2026-05-30 20:15)

Article Impact Index: 2.5

Google Scholar

Number of citations: 51

(Last checked: 2026-06-02 21:23)

Overton

This publication is not indexed in Overton

Abstract

Twitter provides a simple way for users to express feelings, ideas, and opinions; it makes user-generated content and the associated metadata available to the community, and provides easy-to-use web and application programming interfaces to access the data. User profile information is important for many studies, but essential attributes, such as gender and age, are not provided when accessing a Twitter account. However, clues about the user profile, such as age, gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of a user from unstructured information, including the username, screen name, description, and profile picture, or from the user-generated content. We performed experiments using an English labelled dataset containing 6.5M tweets from 65K users and a Portuguese labelled dataset containing 5.8M tweets from 58K users. We created four distinct classifiers, trained using a supervised approach, each considering a group of features extracted from one of four sources: user name and screen name, user description, tweet content, and profile picture. Activity-related features, such as the number of accounts followed and the number of followers, were discarded, since they were found not to be indicative of gender. A final classifier that combines the predictions of the four individual classifiers achieves the best performance: 93.2% accuracy on the English data and 96.9% accuracy on the Portuguese data.
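The abstract states only that a final classifier combines the predictions of the four per-source classifiers. As a minimal sketch of that kind of stacked combination, assuming each base classifier outputs a probability for one gender class and assuming a logistic-regression meta-classifier (a common choice, not necessarily the one used in the paper), the combination step could look like the following. The source names, toy probabilities, learning rate, and epoch count are all hypothetical.

```python
import math

# Hypothetical names for the four feature sources listed in the abstract.
SOURCES = ("names", "description", "tweets", "picture")

# Hypothetical toy data: each row holds the P(class 1) output by the four
# per-source classifiers for one user, followed by the true 0/1 label.
TOY_TRAIN = [
    ((0.9, 0.6, 0.8, 0.9), 1),
    ((0.7, 0.5, 0.9, 0.8), 1),
    ((0.4, 0.5, 0.7, 0.9), 1),
    ((0.1, 0.4, 0.2, 0.1), 0),
    ((0.3, 0.5, 0.1, 0.2), 0),
    ((0.6, 0.5, 0.3, 0.2), 0),
]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_classifier(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-classifier by batch gradient descent.

    base_probs: list of tuples, one per user, with one probability per source.
    labels:     list of 0/1 labels (which gender is 1 is an arbitrary choice).
    Returns (weights, bias).
    """
    n_features = len(base_probs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step along the mean gradient over all users.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, probs, threshold=0.5):
    """Combine one user's per-source probabilities into a final 0/1 label."""
    score = sigmoid(sum(w * p for w, p in zip(weights, probs)) + bias)
    return int(score >= threshold)


if __name__ == "__main__":
    X = [x for x, _ in TOY_TRAIN]
    y = [label for _, label in TOY_TRAIN]
    w, b = train_meta_classifier(X, y)
    for name, weight in zip(SOURCES, w):
        print(f"{name:12s} weight = {weight:+.2f}")
    print("new user:", predict(w, b, (0.8, 0.5, 0.9, 0.9)))
```

Training the meta-classifier on the base classifiers' outputs lets it learn how much to trust each source. In practice those outputs would normally come from held-out (for example, cross-validated) predictions, so that the meta-classifier is not fitted to overconfident scores the base models produce on their own training data.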

Keywords

Gender classification, Twitter users, Gender database, Text mining

Fields of Science and Technology classification

Computer and Information Sciences - Natural Sciences

Funding records

Funding reference	Funding entity
PTDC/IVC-ESCT/4919/2012	Fundação para a Ciência e a Tecnologia
UID/CEC/50021/2013	Fundação para a Ciência e a Tecnologia
SFRH/BSAB/136312/2018	Fundação para a Ciência e a Tecnologia

Related projects

This publication is an output of the following project(s):

Extracção Inteligente de Informação de Redes Sociais Públicas (Intelligent Information Extraction from Public Social Networks)

Publication identifiers

Scopus (source: author)	2-s2.0-85056266882
Other ID (source: ORCID)	cv-prod-id-839982
DOI (source: author)	10.1007/978-3-030-01632-6_3
Scopus (source: Ciência_Iscte)	2-s2.0-85056266882
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/16796
Ciência_Iscte ID	ci-pub-50916

Other publication details

Year of online publication	2018
Publisher	Springer International Publishing
Indexing	Scopus
ISSN	1860-949X (print) 1860-9503 (online)
ISBN	978-3-030-01632-6 (print) 978-3-030-01632-6 (online)
Volume	794
Series		Issue/Tome
Pages	39-54	Total pages	--
Edition	--
Peer reviewed	Yes
Editors	Kóczy, László T.; Medina-Moreno, Jesús; Ramírez-Poussa, Eloísa