Improving Twitter gender classification using multiple classifiers

Marco Vicente; Fernando Batista; João Paulo Carvalho

Presentation at a scientific event

Marco Vicente (M. Vicente); Fernando Batista (Batista, F.); João Paulo Carvalho (João P. Carvalho)

Event Title

Proc. of the 8th European Symposium on Computational Intelligence and Mathematics (ESCIM 2016)

Year (final publication)

2016

Language

English

Country

Abstract

User profile information is important for many studies, but essential attributes such as gender and age are not provided when a Twitter account is created. However, clues about the user profile, such as age, gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of the user from unstructured information, including the user name, screen name, description, and picture, or from user-generated content. Our experiments use a labelled English dataset containing 6.5M tweets from 65K users and a labelled Portuguese dataset containing 5.8M tweets from 58K users. We use supervised approaches, considering four groups of features extracted from different sources: user name and screen name, user description, content of the tweets, and profile picture. A final classifier that combines the predictions of the four partial classifiers achieves 93.2% accuracy for English and 96.9% accuracy for Portuguese data.
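
The combination step described in the abstract can be illustrated with a minimal late-fusion sketch. This is not the authors' final classifier, which the abstract does not specify. It assumes each partial classifier emits a probability that the user is female, that a source may be missing (for example, an empty description), and that the scores are fused by weighted averaging. The source names, the `combine_predictions` function, the weights, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical names for the four feature groups listed in the abstract.
SOURCES = ("names", "description", "tweets", "picture")


def combine_predictions(
    partial: Dict[str, Optional[float]],
    weights: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Fuse per-source P(female) scores into one label by weighted averaging.

    `partial` maps a source name to that partial classifier's probability
    that the user is female, or None when the source is unavailable.
    Returns None when no source yields a usable prediction.
    """
    # Assumption: equal weights unless the caller supplies its own.
    weights = weights or {s: 1.0 for s in SOURCES}
    total, norm = 0.0, 0.0
    for source in SOURCES:
        p = partial.get(source)
        if p is None:
            continue  # skip sources with no prediction
        w = weights.get(source, 0.0)
        total += w * p
        norm += w
    if norm == 0.0:
        return None
    # Assumption: a 0.5 decision threshold on the averaged score.
    return "female" if total / norm >= 0.5 else "male"


# Example: the description is missing, so only three scores are averaged.
user = {"names": 0.9, "description": None, "tweets": 0.6, "picture": 0.4}
label = combine_predictions(user)
```

A trained meta-classifier over the four partial outputs would be another plausible reading of "final classifier"; weighted averaging is used here only because it needs no training data to demonstrate.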

Acknowledgements

Keywords

Publication Identifiers

Ciência_Iscte ID

ci-pub-30886

Other Publication Details

Peer Reviewed	Yes
Dissemination Medium	Both (print and digital)
City	Sofia, Bulgaria
Event Type	Conference
Event Classification	European
Presentation Type at Event	Oral Presentation
Publication Date (online)
Publication Date (print)