
Abstract: In the k-means problem with penalties, we are given a data set $${\cal D} \subseteq \mathbb{R}^\ell$$ of n points, an integer k, and a penalty cost $$p_j$$ for each point $$j \in {\cal D}$$. The goal is to choose a set $$CS \subseteq \mathbb{R}^\ell$$ with $$|CS| \le k$$ and a penalized subset $${\cal D}_p \subseteq {\cal D}$$ to minimize the sum of the total squared distance from the points in $${\cal D} \backslash {\cal D}_p$$ to CS and the total penalty cost of the points in $${\cal D}_p$$, namely $$\sum\nolimits_{j \in {\cal D}\backslash {\cal D}_p} d^2(j, CS) + \sum\nolimits_{j \in {\cal D}_p} p_j$$. We employ the primal-dual technique to give a pseudo-polynomial time algorithm with an approximation ratio of (6.357+ε) for the k-means problem with penalties, improving the previous best approximation ratio of (19.849+ε) for this problem given by Feng et al. in Proceedings of FAW (2019).
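To make the objective concrete, the following minimal sketch evaluates it for a fixed set of centers. It is not the primal-dual algorithm of the paper; it only illustrates the cost function. It relies on the observation that, once CS is fixed, each point contributes independently, so the best penalized subset consists of exactly those points whose penalty is smaller than their squared distance to the nearest center. The function name `penalized_kmeans_cost` and the sample data are hypothetical and chosen purely for illustration.

```python
import numpy as np


def penalized_kmeans_cost(points, penalties, centers):
    """Evaluate the k-means-with-penalties objective for fixed centers.

    Hypothetical helper for illustration only: it evaluates the objective,
    it does not search for good centers.
    """
    points = np.asarray(points, dtype=float)        # shape (n, l)
    penalties = np.asarray(penalties, dtype=float)  # shape (n,)
    centers = np.asarray(centers, dtype=float)      # shape (m, l), m <= k

    # Squared Euclidean distance from every point to its nearest center.
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2).min(axis=1)

    # With the centers fixed, penalizing point j is better exactly when
    # p_j < d^2(j, CS); this mask is the resulting penalized subset D_p.
    penalized = penalties < sq_dist

    # Each point pays the cheaper of its penalty and its squared distance.
    cost = np.where(penalized, penalties, sq_dist).sum()
    return float(cost), penalized


# Illustrative data: two nearby points and one outlier, a single center.
cost, penalized = penalized_kmeans_cost(
    points=[[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]],
    penalties=[5.0, 5.0, 3.0],
    centers=[[0.5, 0.0]],
)
print(cost, penalized.tolist())
```

In this toy instance the outlier is penalized because its penalty (3) is far below its squared distance to the center (190.25), while the two nearby points are served by the center at a squared distance of 0.25 each.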
linear program, \(k\)-means problem with penalties, JV algorithm, Approximation algorithms, approximation algorithm, Computational aspects of data analysis and big data
