
Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.

Export Reference (APA)

De Angelis, L., & Dias, J. G. (2014). Mining categorical sequences from data using a hybrid clustering method. European Journal of Operational Research, 234(3), 720-730. https://doi.org/10.1016/j.ejor.2013.11.002

Export Reference (IEEE)

L. De Angelis and J. G. Dias, "Mining categorical sequences from data using a hybrid clustering method," European Journal of Operational Research, vol. 234, no. 3, pp. 720-730, 2014.

Export BibTeX

@article{angelis2014_1785152823409,
	author = "De Angelis, L. and Dias, J. G.",
	title = "Mining categorical sequences from data using a hybrid clustering method",
	journal = "European Journal of Operational Research",
	year = "2014",
	volume = "234",
	number = "3",
	doi = "10.1016/j.ejor.2013.11.002",
	pages = "720--730",
	url = "http://www.sciencedirect.com/science/article/pii/S0377221713009016"
}

Export RIS

TY  - JOUR
TI  - Mining categorical sequences from data using a hybrid clustering method
T2  - European Journal of Operational Research
VL  - 234
IS  - 3
AU  - De Angelis, L.
AU  - Dias, J. G.
PY  - 2014
SP  - 720
EP  - 730
SN  - 0377-2217
DO  - 10.1016/j.ejor.2013.11.002
UR  - http://www.sciencedirect.com/science/article/pii/S0377221713009016
AB  - The identification of different dynamics in sequential data has become an everyday need in scientific fields such as marketing, bioinformatics, finance, or social sciences. Unlike cross-sectional or static data, these observations (also known as stream data, temporal data, longitudinal data, or repeated measures) are more challenging because data dependency has to be incorporated into the clustering process. In this research we focus on clustering categorical sequences. The method proposed here combines model-based and heuristic clustering. In the first step, the categorical sequences are transformed by an extension of the hidden Markov model into a probabilistic space, where a symmetric Kullback-Leibler distance can operate. Then, in the second step, the sequences are clustered by applying hierarchical clustering to the matrix of distances. This paper illustrates the enormous potential of this type of hybrid approach using a synthetic data set as well as the well-known Microsoft dataset with website users' search patterns and a survey on job career dynamics.
ER  -
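The abstract above describes a two-step hybrid procedure. The following is a minimal sketch of the second step only, under stated assumptions: each sequence is assumed to be already summarized as a discrete probability vector (a stand-in for the paper's extended hidden Markov model, which is not reproduced here); the symmetric distance is assumed to be KL(p || q) + KL(q || p), one common symmetrization that may differ from the paper's; and the hierarchical clustering is assumed to use average linkage. The function names and example profiles are hypothetical, and this is not the authors' implementation.

```python
import math
from itertools import combinations


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); it is an assumption of this sketch.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Assumed symmetrization: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def average_linkage(points, n_clusters):
    """Agglomerative clustering (average linkage) on symmetric KL distances.

    Returns a list of clusters, each a sorted list of point indices.
    """
    n = len(points)
    # Full pairwise distance matrix, the input to the hierarchical step.
    dist = [[symmetric_kl(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # b > a, so deleting b after updating a keeps index a valid.
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical profiles: two sequences dominated by state 0, two by state 2.
profiles = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
print(average_linkage(profiles, 2))  # [[0, 1], [2, 3]]
```

Average linkage is only one choice of merge criterion; single or complete linkage would slot into the same loop by replacing the mean with a minimum or maximum over the pairwise distances.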