Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference format, IEEE (Institute of Electrical and Electronics Engineers) reference format, BibTeX and RIS.

Export Reference (APA)
Vicente, M., Batista, F., & Carvalho, J. P. (2016). Improving Twitter gender classification using multiple classifiers. In L. Kóczy & J. Medina (Eds.), Proceedings of ESCIM 2016 (pp. 121-127). Sofia: Universidad de Cádiz.
Export Reference (IEEE)
M. Vicente, F. Batista, and J. P. Carvalho, "Improving Twitter gender classification using multiple classifiers," in Proc. of ESCIM 2016, L. Kóczy and J. Medina, Eds., Sofia: Universidad de Cádiz, 2016, pp. 121-127.
Export BibTeX
@inproceedings{vicente2016_1715940112945,
	author = "Vicente, M. and Batista, F. and Carvalho, J. P.",
	title = "Improving Twitter gender classification using multiple classifiers",
	booktitle = "Proceedings of ESCIM 2016",
	year = "2016",
	editor = "Kóczy, L. and Medina, J.",
	pages = "121--127",
	publisher = "Universidad de Cádiz",
	address = "Sofia",
	organization = "Széchenyi István University",
	url = "http://escim2016.uca.es/proceedings/"
}
Export RIS
TY  - CPAPER
TI  - Improving Twitter gender classification using multiple classifiers
T2  - Proceedings of ESCIM 2016
AU  - Vicente, M.
AU  - Batista, F.
AU  - Carvalho, J. P.
PY  - 2016
SP  - 121
EP  - 127
CY  - Sofia
UR  - http://escim2016.uca.es/proceedings/
AB  - User profile information is important for many studies, but essential information, such as gender and age, is not provided when creating a Twitter account. However, clues about the user profile, such as age and gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of the user from unstructured information, including the user name, screen name, description and picture, or from user-generated content. Our experiments use an English labelled dataset containing 6.5M tweets from 65K users, and a Portuguese labelled dataset containing 5.8M tweets from 58K users. We use supervised approaches, considering four groups of features extracted from different sources: user name and screen name, user description, content of the tweets, and profile picture. A final classifier that combines the predictions of the four partial classifiers achieves 93.2% accuracy for English and 96.9% accuracy for Portuguese data.
ER  -
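
For readers who want to consume the RIS export programmatically, the following is a minimal sketch of reading one record into a dictionary. It assumes the common RIS line shape (a two-character tag, two spaces, a hyphen, then the value) and is not tied to any particular reference manager; the names `RIS_LINE`, `parse_ris` and `sample` are hypothetical, and the sample is a shortened copy of the record above.

```python
import re

# Assumed RIS line shape: "TAG  - value" (two-character tag, two spaces,
# a hyphen, then an optional value; the closing "ER  -" line has no value).
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text):
    """Parse a single RIS record into a dict mapping each tag to a list of values."""
    record = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if match is None:
            continue  # skip blank or malformed lines
        tag, value = match.group(1), match.group(2) or ""
        if tag == "ER":
            break  # end-of-record marker
        # Tags such as AU can repeat, so every tag collects a list.
        record.setdefault(tag, []).append(value)
    return record


# Shortened copy of the record above, used only for illustration.
sample = """TY  - CPAPER
TI  - Improving Twitter gender classification using multiple classifiers
AU  - Vicente, M.
AU  - Batista, F.
AU  - Carvalho, J. P.
PY  - 2016
ER  -
"""

record = parse_ris(sample)
print(record["AU"])
```

Storing every tag as a list keeps repeated tags such as `AU` in their original order, at the cost of single-valued tags like `PY` also being one-element lists.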