From MFKP_wiki

Jump to: navigation, search


K-groups: a generalization of K-means clustering

Songzi Li, Maria L. Rizzo

We propose a new class of distribution-based clustering algorithms, called k-groups, based on energy distance between samples. The energy distance clustering criterion assigns observations to clusters according to a multi-sample energy statistic that measures the distance between distributions. The energy distance determines a consistent test for equality of distributions, and it is based on a population distance that characterizes equality of distributions. The k-groups procedure therefore generalizes the k-means method, which separates clusters that have different means. We propose two k-groups algorithms: k-groups by first variation; and k-groups by second variation. The implementation of k-groups is partly based on Hartigan and Wong's algorithm for k-means. The algorithm is generalized from moving one point on each iteration (first variation) to moving $m$ $(m > 1)$ points. For univariate data, we prove that Hartigan and Wong's k-means algorithm is a special case of k-groups by first variation. The simulation results from univariate and multivariate cases show that our k-groups algorithms perform as well as Hartigan and Wong's k-means algorithm when clusters are well-separated and normally distributed. Moreover, both k-groups algorithms perform better than k-means when data does not have a finite first moment or data has strong skewness and heavy tails. For non--spherical clusters, both k-groups algorithms performed better than k-means in high dimension, and k-groups by first variation is consistent as dimension increases. In a case study on dermatology data with 34 features, both k-groups algorithms performed better than k-means.


(12 Nov 2017) 
Key: INRMM:14485739

Keywords

         

Article-Level Metrics (Altmetrics)
arXiv code


Available versions (may include free-access full text)

arXiv (abstract), arXiv (PDF)

Versions of the publication are also available in Google Scholar.
Google Scholar code: GScluster:8572499804935328473

Works citing this publication (including grey literature)

An updated list of who cited this publication is available in Google Scholar.
Google Scholar code: GScites:8572499804935328473

Further search for available versions

Search in ResearchGate (or try with a fuzzier search in ResearchGate)
Search in Mendeley (or try with a fuzzier search in Mendeley)

Publication metadata

Bibtex, RIS, RSS/XML feed, Json, Dublin Core

Digital preservation of this INRMM-MiD record

Internet Archive

Meta-information Database (INRMM-MiD).
This database integrates a dedicated meta-information database in CiteULike (the CiteULike INRMM Group) with the meta-information available in Google Scholar, CrossRef and DataCite. The Altmetric database with Article-Level Metrics is also harvested. Part of the provided semantic content (machine-readable) is made even human-readable thanks to the DCMI Dublin Core viewer. Digital preservation of the meta-information indexed within the INRMM-MiD publication records is implemented thanks to the Internet Archive.
The library of INRMM related pubblications may be quickly accessed with the following links.
Search within the whole INRMM meta-information database:
Search only within the INRMM-MiD publication records:
Full-text and abstracts of the publications indexed by the INRMM meta-information database are copyrighted by the respective publishers/authors. They are subject to all applicable copyright protection. The conditions of use of each indexed publication is defined by its copyright owner. Please, be aware that the indexed meta-information entirely relies on voluntary work and constitutes a quite incomplete and not homogeneous work-in-progress.
INRMM-MiD was experimentally established by the Maieutike Research Initiative in 2008 and then improved with the help of several volunteers (with a major technical upgrade in 2011). This new integrated interface is operational since 2014.