Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference format, IEEE (Institute of Electrical and Electronics Engineers) reference format, BibTeX and RIS.

Export Reference (APA)
Cardoso, M. G. M. S. (2017). Clustering aggregated data: the use of distances on distribution laws. 10th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2017).
Export Reference (IEEE)
M. G. M. S. Cardoso, "Clustering aggregated data: the use of distances on distribution laws," in 10th Int. Conf. of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2017), 2017.
Export BibTeX
@misc{cardoso2017_1716239059732,
	author = "Cardoso, M. G. M. S.",
	title = "Clustering aggregated data: the use of distances on distribution laws",
	year = "2017",
	url = "http://cmstatistics.org/CMStatistics2017/"
}
Export RIS
TY  - CPAPER
TI  - Clustering aggregated data: the use of distances on distribution laws
T2  - 10th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2017)
AU  - Cardoso, M. G. M. S.
PY  - 2017
UR  - http://cmstatistics.org/CMStatistics2017/
AB  - Big data can be reduced into aggregated data using common summary statistics. The use of clustering procedures for aggregated data should, naturally, rely on adequate distances to compute heterogeneity between aggregated observations (e.g. considering histogram data). In this setting, distances on distribution laws can be particularly useful, although little work has been done in this area. A clustering analysis is conducted illustrating the use of three distances based on distribution laws on aggregated data. The data set considered originates from the European Social Survey and regards human values across regions in Europe. A K-Medoids algorithm is used. The results obtained are compared based on several indicators, e.g. within-between clusters average distance, average silhouette width, Calinski and Harabasz index and Dunn index. A congruent evaluation is conducted, using the same distances both to build the clusters and to evaluate them. In addition, indicators are computed based on the commonly used Euclidean distance. Discussion refers to the choice of a particular clustering solution.
ER  -
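The abstract outlines a pipeline: represent each aggregated observation as a histogram, measure heterogeneity with a distance on distribution laws, cluster with K-Medoids, and score the partition with indicators such as the average silhouette width. The following is a minimal sketch of that pipeline under stated assumptions, not the author's implementation. The abstract does not name its three distances, so the Hellinger distance is used here only as one example of a distance between discrete distributions. The clustering routine is the simple alternating (Voronoi-iteration) k-medoids variant rather than PAM. All function names and the example counts are hypothetical.

```python
import math
from typing import Callable, Sequence

# Assumption: each aggregated observation is a histogram, i.e. counts of
# answers falling into the same ordered categories (bins).
Histogram = Sequence[float]


def normalize(h: Histogram) -> list[float]:
    """Convert raw bin counts into relative frequencies that sum to 1."""
    total = float(sum(h))
    return [x / total for x in h]


def hellinger(p: Histogram, q: Histogram) -> float:
    """Hellinger distance between two discrete distributions on the same bins.

    Ranges from 0 (identical distributions) to 1 (disjoint supports).
    """
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def distance_matrix(data: Sequence[Histogram],
                    dist: Callable[[Histogram, Histogram], float]) -> list[list[float]]:
    """Pairwise distances between all observations."""
    return [[dist(x, y) for y in data] for x in data]


def assign(D: list[list[float]], medoids: list[int]) -> list[int]:
    """Label each observation with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def k_medoids(D: list[list[float]], k: int,
              max_iter: int = 100) -> tuple[list[int], list[int]]:
    """Alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix."""
    n = len(D)
    # Deterministic start: the most central observation, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(D, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(D[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(D, medoids)


def average_silhouette(D: list[list[float]], labels: list[int]) -> float:
    """Average silhouette width; needs at least two clusters. Singletons score 0."""
    n = len(D)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(D[i][j] for j in own) / len(own)  # mean within-cluster distance
        b = min(sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


if __name__ == "__main__":
    # Hypothetical answer counts over four ordered categories for five regions.
    counts = [[10, 20, 40, 30], [12, 18, 42, 28], [40, 30, 20, 10],
              [38, 32, 18, 12], [25, 25, 25, 25]]
    D = distance_matrix([normalize(h) for h in counts], hellinger)
    medoids, labels = k_medoids(D, k=2)
    print("medoids:", medoids, "labels:", labels)
    print("average silhouette width:", round(average_silhouette(D, labels), 3))
```

Because the clustering and the silhouette both work on a precomputed distance matrix, passing a different function with the same signature to `distance_matrix` (for example a Euclidean distance on the relative frequencies) lets the same partition be built or re-scored with another distance, in the spirit of the congruent and Euclidean-based evaluations the abstract describes.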