Evaluation of the Gower coefficient modifications in hierarchical clustering
DOI:
https://doi.org/10.51936/eqvy9516Abstract
This paper thoroughly examines three recently introduced modifications of the Gower coefficient, which were determined for data with mixed-type variables in hierarchical clustering. On the contrary to the original Gower coefficient, which only recognizes if two categories match or not in the case of nominal variables, the examined modifications offer three different approaches to measuring the similarity between categories. The examined dissimilarity measures are compared and evaluated regarding the quality of their clusters measured by three internal indices (Dunn, silhouette, McClain) and regarding their classification abilities measured by the Rand index. The comparison is performed on 810 generated datasets. In the analysis, the performance of the similarity measures is evaluated by different data characteristics (the number of variables, the number of categories, the distance of clusters, etc.) and by different hierarchical clustering methods (average, complete, McQuitty and single linkage methods). As a result, two modifications are recommended for the use in practice.
Downloads
Published
Issue
Section
License
Copyright (c) 2024 Zdeněk Šulc, Martin Matějka, Jiří Procházka, Hana Řezanková (Author)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.