DECWA: Density-Based Clustering using Wasserstein Distance

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among clustering methods, state-of-the-art density-based methods have proven effective for arbitrary-shaped clusters.

Despite their encouraging results, they struggle to find low-density clusters, to separate nearby clusters with similar densities, and to handle high-dimensional data. We propose a new characterization of clusters and a new clustering algorithm based on spatial density and a probabilistic approach.

First, sub-clusters are built using spatial density, represented as the probability density function (p.d.f.) of pairwise distances between points. A method is then proposed to agglomerate similar sub-clusters by using both their density (p.d.f.) and their spatial distance.

The key idea we propose is to use the Wasserstein metric, a powerful tool for measuring the distance between the p.d.f.s of sub-clusters. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
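
As an illustration of the key idea, the following minimal Python sketch compares sub-clusters through the distribution of their pairwise distances. It is not the authors' implementation: the function names and the toy data are hypothetical, the raw empirical distribution of distances stands in for an estimated p.d.f., and the spatial-distance part of the agglomeration criterion is not shown. The 1-Wasserstein distance between two one-dimensional empirical distributions is computed as the area between their cumulative distribution functions.

```python
import math
from bisect import bisect_right
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two empirical 1-D distributions.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and
    F_v are the empirical cumulative distribution functions.
    """
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Toy sub-clusters (made-up data): two tight squares far apart,
# and one spread-out square.
dense_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
dense_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
sparse_c = [(0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)]

d_a = pairwise_distances(dense_a)
d_b = pairwise_distances(dense_b)
d_c = pairwise_distances(sparse_c)

w_ab = wasserstein_1d(d_a, d_b)  # same density, different location
w_ac = wasserstein_1d(d_a, d_c)  # same location, different density

print(f"W(a, b) = {w_ab:.3f}")
print(f"W(a, c) = {w_ac:.3f}")
```

In this toy setting, the two tight squares have near-identical pairwise-distance distributions even though they lie far apart, while the spread-out square differs markedly from both. This is why a density comparison alone cannot decide a merge and has to be combined with the spatial distance between sub-clusters.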
