Improving Twitter gender classification using multiple classifiers

Marco Vicente; Fernando Batista; João Paulo Carvalho

Publication in conference proceedings

Marco Vicente (Vicente, M.); Fernando Batista (Batista, F.); João Paulo Carvalho (Carvalho, J. P.);

Proceedings of ESCIM 2016

Year (final publication)

2016

Language

English

Country

Spain

Web of Science®

This publication is not indexed in Web of Science®

Scopus

This publication is not indexed in Scopus

Google Scholar

Number of citations: 3

(Last checked: 2026-06-02 21:23)

Overton

This publication is not indexed in Overton

Abstract

User profile information is important for many studies, but essential attributes such as gender and age are not requested when a Twitter account is created. However, clues about the user's profile, such as age, gender, behaviors, and preferences, can be extracted from other content the user provides. This paper focuses on inferring the user's gender from unstructured information, including the user name, screen name, description, and picture, or from user-generated content. Our experiments use a labelled English dataset containing 6.5M tweets from 65K users and a labelled Portuguese dataset containing 5.8M tweets from 58K users. We use supervised approaches with four groups of features extracted from different sources: user name and screen name, user description, tweet content, and profile picture. A final classifier that combines the predictions of the four partial classifiers achieves 93.2% accuracy for English and 96.9% accuracy for Portuguese data.
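
The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the final classifier merges the four partial predictions, so the weighted average below is a simple stand-in, not the authors' method. The source names, the `combine_predictions` function, and the convention of scoring each source as a probability of "female" are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical labels for the four feature groups listed in the abstract.
SOURCES = ("names", "description", "tweets", "picture")


def combine_predictions(
    partial: Dict[str, Optional[float]],
    weights: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Combine per-source P(female) scores with a weighted average.

    `partial` maps a source name to that partial classifier's estimated
    probability that the user is female, or None when the source gives
    no prediction (for example, an empty description). Returns "female",
    "male", or None if no source produced a score.
    """
    weights = weights or {}
    total = 0.0
    norm = 0.0
    for source in SOURCES:
        p = partial.get(source)
        if p is None:
            continue  # skip sources without a prediction
        w = weights.get(source, 1.0)  # unlisted sources default to weight 1
        total += w * p
        norm += w
    if norm == 0.0:
        return None
    return "female" if total / norm >= 0.5 else "male"


if __name__ == "__main__":
    scores = {"names": 0.9, "description": None, "tweets": 0.6, "picture": 0.4}
    print(combine_predictions(scores))
```

Skipping missing sources lets the combiner still produce a label for users with, say, no description or a default picture. A trained meta-classifier over the four partial outputs would be a natural alternative to fixed weights.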

Keywords

Gender classification, Twitter users, Gender database, Text mining

Funding records

Funding reference	Funding entity
PTDC/IVC-ESCT/4919/2012	Fundação para a Ciência e a Tecnologia
UID/CEC/50021/2013	Fundação para a Ciência e a Tecnologia

Related projects

This publication is an output of the following project(s):

Extracção Inteligente de Informação de Redes Sociais Públicas

Publication identifiers

Other ID (source: ORCID)	cv-prod-id-840060
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/23218
Ciência_Iscte ID	ci-pub-30885

Other publication details

Online publication year	2016
Publisher	Universidad de Cádiz
Indexing	--
ISSN	--
ISBN	978-84-617-5119-8 (print) 978-84-617-5119-8 (online)
Volume
Article number
Pages	121 - 127	Total pages	7
Peer reviewed	Yes
Medium	Both (print and digital)
Editors	Kóczy, L., and Medina, J.
Event title	8th European Symposium on Computational Intelligence and Mathematics
Event organizer	Széchenyi István University
City	Sofia
Event type	Conference
Event classification	European
Event year	2016
Publication type at event	Full paper
Publication date (online)
Publication date (print)
