Ciência-IUL
Communications
Detailed Description of the Communication
An MML embedded approach for estimating the number of clusters
Event Title
17th conference of the International Federation of Classification Societies
Year (final publication)
2022
Language
English
Country
Portugal
More Information
Web of Science®
This publication is not indexed in Web of Science®
Scopus
This publication is not indexed in Scopus
Google Scholar
This publication is not indexed in Google Scholar
Abstract
Assuming that the data originate from a finite mixture of multinomial
distributions, we study the performance of an integrated Expectation-Maximization
(EM) algorithm that uses the Minimum Message Length (MML) criterion to select
the number of mixture components. Rather than selecting one among a set of
pre-estimated candidate models (which requires running EM several times), this
EM-MML approach seamlessly integrates estimation and model selection in a
single algorithm. Comparisons are provided with EM combined with well-known
information criteria (e.g. the Bayesian Information Criterion). We resort to synthetic
data examples and a real application. A clear advantage of EM-MML is its computation
time; in addition, the solution it yields for the real data is more parsimonious,
which reduces the risk of overestimating the model order and improves interpretability.
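To illustrate how estimation and model selection can be combined in a single run, the following is a minimal sketch and not the authors' implementation. It assumes a Figueiredo-Jain-style MML weight update, in which a component is dropped once its expected count falls below half the number of its free parameters. It also assumes that every variable has the same number of categories and, for brevity, uses plain EM rather than a component-wise variant. The function name `em_mml_categorical` and all of its parameters are hypothetical.

```python
import numpy as np

def em_mml_categorical(X, n_categories, k_max=10, n_iter=200, seed=0):
    """Sketch of EM with an MML-style weight update (assumed, not from the source).

    X: integer array of shape (n, J) with values in 0..n_categories-1.
    Starts with k_max components and removes those whose weight reaches zero.
    """
    rng = np.random.default_rng(seed)
    n, J = X.shape
    onehot = np.eye(n_categories)[X]              # shape (n, J, C)
    n_params = J * (n_categories - 1)             # free parameters per component
    alpha = np.full(k_max, 1.0 / k_max)           # mixing weights
    theta = rng.dirichlet(np.ones(n_categories), size=(k_max, J))  # (K, J, C)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation
        log_lik = np.einsum("ijc,kjc->ik", onehot, np.log(theta))
        log_post = log_lik + np.log(alpha)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step for the weights: subtract n_params / 2 and truncate at zero,
        # so weakly supported components are annihilated (assumed update rule)
        new_alpha = np.maximum(0.0, resp.sum(axis=0) - n_params / 2.0)
        if not new_alpha.any():
            break                                  # keep the last valid model
        keep = new_alpha > 0
        alpha = new_alpha[keep] / new_alpha[keep].sum()
        resp = resp[:, keep]

        # M-step for the category probabilities, lightly smoothed to avoid log(0)
        theta = np.einsum("ik,ijc->kjc", resp, onehot) + 1e-6
        theta /= theta.sum(axis=2, keepdims=True)

    return alpha, theta
```

The number of surviving components, `len(alpha)`, serves as the estimate of the number of clusters. By contrast, a selection based on an information criterion would fit one model per candidate number of components and compare them afterwards.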
Acknowledgements
--
Keywords
finite mixture model, EM algorithm, model selection, minimum message length, categorical data