Scientific journal paper Q1
Mining categorical sequences from data using a hybrid clustering method
Luca de Angelis (De Angelis, L.); José G. Dias (Dias, J. G.)
Journal Title
European Journal of Operational Research
Year (definitive publication)
2014
Language
English
Country
Netherlands
More Information
Web of Science®

Times Cited: 23

(Last checked: 2024-08-24 11:46)

Scopus

Times Cited: 24

(Last checked: 2024-08-22 11:32)

Google Scholar

Times Cited: 42

(Last checked: 2024-08-22 23:43)

Abstract
The identification of different dynamics in sequential data has become an everyday need in scientific fields such as marketing, bioinformatics, finance, and the social sciences. In contrast to cross-sectional or static data, this type of data (also known as stream data, temporal data, longitudinal data, or repeated measures) is more challenging because the clustering process has to account for the dependency between observations. In this research, we focus on clustering categorical sequences. The proposed method combines model-based and heuristic clustering. In the first step, an extension of the hidden Markov model transforms the categorical sequences into a probabilistic space in which a symmetric Kullback-Leibler distance can operate. In the second step, the sequences are clustered by applying hierarchical clustering to the resulting distance matrix. This paper illustrates the enormous potential of this type of hybrid approach using a synthetic data set, the well-known Microsoft data set of website users' search patterns, and a survey on job career dynamics.
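The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: as a stated simplification, each sequence is summarized by a smoothed first-order Markov transition matrix rather than by the paper's extension of the hidden Markov model, and the symmetric Kullback-Leibler distance is taken as an unweighted sum over the rows of two such matrices. The function names (`transition_matrix`, `symmetric_kl`, `cluster_sequences`), the smoothing constant `alpha`, the average-linkage choice, and the toy sequences are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def transition_matrix(seq, n_states, alpha=1.0):
    """Row-stochastic first-order transition matrix with additive smoothing.

    Assumption: a plain Markov chain stands in for the paper's extended HMM.
    """
    counts = np.full((n_states, n_states), alpha)
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler distance, summed over all rows (unweighted)."""
    return float(np.sum(p * np.log(p / q) + q * np.log(q / p)))


def cluster_sequences(seqs, n_states, n_clusters):
    # Step 1: map every categorical sequence into a probabilistic space.
    mats = [transition_matrix(s, n_states) for s in seqs]

    # Pairwise symmetric KL distances between the fitted models.
    n = len(mats)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(mats[i], mats[j])

    # Step 2: hierarchical clustering on the distance matrix.
    tree = linkage(squareform(dist), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (invented): two "sticky" sequences and two alternating ones.
sequences = [
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
]
labels = cluster_sequences(sequences, n_states=2, n_clusters=2)
print(labels)
```

The additive smoothing keeps every transition probability strictly positive, so the logarithms in the distance are always finite. In this toy example the two sticky sequences should end up in one cluster and the two alternating sequences in the other.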
Acknowledgements
--
Keywords
Data mining, Sequential data, Hidden Markov models, Clustering, Categorical data
  • Economics and Business - Social Sciences
Funding Records
Funding Reference: PTDC/EGE-GES/103223/2008; Funding Entity: Fundação para a Ciência e a Tecnologia
Funding Reference: PEst-OE/EGE/UI0315/2014; Funding Entity: Fundação para a Ciência e a Tecnologia