Distance-Weighted k-Means Clustering for Class-Imbalance Problem
EasyChair Preprint 11145
4 pages • Date: October 23, 2023

Abstract

In this paper, a distance-weighted k-means clustering technique is proposed to address the issue of class imbalance. K-means clustering is a popular method for grouping data points into clusters, but its performance can degrade when the classes are imbalanced, because each cluster centroid is updated as the plain average of all data points in that cluster. To tackle this problem, the proposed model calculates the distances between the data points and the cluster centroids at each centroid update and uses these distances to compute a weighted average, which becomes the new centroid. The goal is to improve clustering results between imbalanced classes through this iterative process. Experimental results on real data show that the proposed model outperforms existing approaches that compute cluster centroids using either the mean or the median. Specifically, on the silhouette coefficient, a metric that quantifies cohesion within clusters and separation between clusters, the comparison models using the mean or the median yielded negative values, while the proposed model achieved 0.1919, indicating superior clustering quality.

Keyphrases: K-means clustering, class imbalance, cluster centroid
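The centroid update described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not state the weighting formula, so several choices here are assumptions made only for illustration. These are the inverse-distance weight 1/(1 + d), the restriction of the weighted average to the points currently assigned to each cluster, the farthest-first initialization, and the function name `distance_weighted_kmeans`.

```python
import numpy as np


def distance_weighted_kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Illustrative k-means variant with a distance-weighted centroid update.

    Assumption (not from the paper): each point assigned to a cluster gets
    weight 1 / (1 + d), where d is its Euclidean distance to the current
    centroid, so points far from the centroid pull on it less.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Assumed initialization: one random point, then farthest-first picks.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        c = np.array(centroids)
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Distances between every point and every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        new_centroids = centroids.copy()
        for j in range(k):
            mask = labels == j
            if not mask.any():
                continue  # keep the old centroid if the cluster is empty
            w = 1.0 / (1.0 + dists[mask, j])
            # Weighted average of the cluster's points.
            new_centroids[j] = (w[:, None] * X[mask]).sum(axis=0) / w.sum()

        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    # Final assignment against the converged centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)
```

Setting every weight to 1 recovers the ordinary mean update of standard k-means, which is the baseline the abstract compares against. The returned labels could then be scored with a silhouette coefficient, for example scikit-learn's `silhouette_score`, to mirror the evaluation the abstract reports.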