
Distance-Weighted k-Means Clustering for Class-Imbalance Problem

EasyChair Preprint no. 11145

4 pages
Date: October 23, 2023


This paper proposes a distance-weighted k-means clustering technique to address class imbalance. K-means is a popular method for grouping data points into clusters, but its performance can degrade when classes are imbalanced, because it updates each cluster centroid as the plain average of all data points assigned to that cluster. To tackle this problem, the proposed model computes the distances between all data points and the cluster centroids when updating the centroids, and uses these distances to compute a weighted average that becomes the new centroid. Repeating this process iteratively aims to improve clustering results on imbalanced classes. Experiments on real data show that the proposed model outperforms existing approaches that compute cluster centroids using either the mean or the median. Specifically, on the silhouette coefficient, a metric that quantifies cohesion within clusters and separation between clusters (it ranges from -1 to 1, and higher is better), the mean- and median-based comparison models yielded negative values, whereas the proposed model achieved 0.1919, indicating better clustering quality.
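The abstract describes the centroid update only in words and does not state the weight function, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It runs a Lloyd-style k-means loop in which only the centroid update rule changes: the mean and coordinate-wise median baselines the abstract compares against, and a distance-weighted average whose weight `1 / (1 + d)` is a hypothetical placeholder. It also computes the mean silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its cluster and b(i) is the smallest mean distance from i to the members of any other cluster. All function names and the toy data are illustrative.

```python
import math
import random
from statistics import median

def dist(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_update(points, centroid):
    """Baseline: coordinate-wise mean (standard k-means)."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def median_update(points, centroid):
    """Baseline: coordinate-wise median."""
    return tuple(float(median(c)) for c in zip(*points))

def distance_weighted_update(points, centroid, weight=lambda d: 1.0 / (1.0 + d)):
    """Weighted average of the cluster's points, each weighted by a function of its
    distance to the current centroid. ASSUMPTION: the abstract does not specify the
    weight function; 1 / (1 + d) is a hypothetical placeholder."""
    ws = [weight(dist(p, centroid)) for p in points]
    total = sum(ws)
    return tuple(sum(w * x for w, x in zip(ws, coord)) / total for coord in zip(*points))

def kmeans(data, k, update, max_iter=100, tol=1e-6, seed=0):
    """Lloyd-style loop; only the centroid update rule is swapped."""
    centroids = random.Random(seed).sample(data, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in data]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new.append(update(members, centroids[j]) if members else centroids[j])
        shift = max(dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:  # stop once no centroid moves noticeably
            break
    return centroids, labels

def silhouette(data, labels):
    """Mean silhouette coefficient in [-1, 1]; NaN if fewer than two non-empty
    clusters. Points in singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return float("nan")
    total = 0.0
    for i, p in enumerate(data):
        same = [dist(p, q) for j, q in enumerate(data) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: s(i) = 0
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, lab in zip(data, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)

# Toy imbalanced data (hypothetical): 20 majority points, 3 minority points.
majority = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(4)]
minority = [(5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
data = majority + minority
for name, upd in [("mean", mean_update), ("median", median_update),
                  ("distance-weighted", distance_weighted_update)]:
    _, labels = kmeans(data, 2, upd)
    print(name, round(silhouette(data, labels), 4))
```

Passing a different `weight` callable changes only the update rule. With weights of exactly 1/d over a fixed set of members, repeating the update is Weiszfeld's iteration toward the geometric median, so the choice of weight function can move the method anywhere between mean-like and median-like behaviour.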

Keyphrases: class imbalance, cluster centroid, K-means clustering

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
  author = {Hyesoo Shin and Ki Yong Lee},
  title = {Distance-Weighted k-Means Clustering for Class-Imbalance Problem},
  howpublished = {EasyChair Preprint no. 11145},
  year = {EasyChair, 2023}}