Improving Twitter gender classification using multiple classifiers

Marco Vicente; Fernando Batista; João Paulo Carvalho

Publication in conference proceedings

Marco Vicente (Vicente, M.); Fernando Batista (Batista, F.); João Paulo Carvalho (Carvalho, J. P.)

Proceedings of ESCIM 2016

Year (definitive publication)

2016

Language

English

Country

Spain

Web of Science®

This publication is not indexed in Web of Science®

Scopus

This publication is not indexed in Scopus

Google Scholar

Times Cited: 3

(Last checked: 2026-01-29 21:29)

Overton

This publication is not indexed in Overton

Abstract

User profile information is important for many studies, but essential attributes such as gender and age are not provided when a Twitter account is created. However, clues about a user's profile, such as age, gender, behaviors, and preferences, can be extracted from other content the user provides. The main focus of this paper is inferring a user's gender from unstructured information, including the user name, screen name, description, and picture, or from user-generated content. Our experiments use a labelled English dataset containing 6.5M tweets from 65K users and a labelled Portuguese dataset containing 5.8M tweets from 58K users. We use supervised approaches, considering four groups of features extracted from different sources: user name and screen name, user description, content of the tweets, and profile picture. A final classifier that combines the predictions of these four partial classifiers achieves 93.2% accuracy for English and 96.9% accuracy for Portuguese data.
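
The abstract does not say how the four partial predictions are combined. As a rough illustration only, the sketch below assumes each partial classifier outputs a probability that the user is female and merges them with a weighted average, skipping sources a user lacks. The source names, the `combine_predictions` function, the weights, and the averaging rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical names for the four feature groups listed in the abstract.
SOURCES = ("names", "description", "tweets", "picture")


def combine_predictions(
    partial: Dict[str, Optional[float]],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, float]:
    """Combine per-source P(female) estimates with a weighted average.

    `partial` maps a source name to that partial classifier's probability
    that the user is female, or None when the source is unavailable
    (for example, a user without a description). Weighted averaging is an
    illustrative assumption, not the combination method used in the paper.
    """
    weights = weights or {s: 1.0 for s in SOURCES}
    total = 0.0
    norm = 0.0
    for source in SOURCES:
        p = partial.get(source)
        if p is None:
            continue  # skip sources with no prediction
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for {source}: {p}")
        w = weights.get(source, 0.0)
        total += w * p
        norm += w
    if norm == 0.0:
        raise ValueError("no usable partial prediction")
    p_female = total / norm
    label = "female" if p_female >= 0.5 else "male"
    return label, p_female


if __name__ == "__main__":
    # One user with no description; the other three sources disagree.
    example = {"names": 0.9, "description": None, "tweets": 0.6, "picture": 0.3}
    print(combine_predictions(example))
```

A trained meta-classifier over the four partial outputs would be another plausible reading of "final classifier"; the averaging rule here is simply the shortest way to show the idea.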

Keywords

Gender classification, Twitter users, Gender database, Text mining

Funding Records

Funding Reference	Funding Entity
PTDC/IVC-ESCT/4919/2012	Fundação para a Ciência e a Tecnologia
UID/CEC/50021/2013	Fundação para a Ciência e a Tecnologia

Related Projects

This publication is an output of the following project(s):

Intelligent Mining of Public Social Networks’ Influence in Society

Publication Identifiers

Other ID (source: ORCID)	cv-prod-id-840060
Ciência_Iscte ID	ci-pub-30885
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/23218

Other Publication Details

Online Publication Year	2016
Publisher	Universidad de Cádiz
Indexes	--
ISSN	--
ISBN	978-84-617-5119-8 (print) 978-84-617-5119-8 (online)
Pages	121-127
Total Pages	7
Peer Reviewed	Yes
Dissemination Mean	Both (printed and digital)
Editors	Kóczy, L., and Medina, J.
Event Title	8th European Symposium on Computational Intelligence and Mathematics
Event Organizer	Széchenyi István University
City	Sofia
Event Type	Conference
Event Classification	European
Event Year	2016
Event Publication Type	Full Paper