
Most existing data analysis methods depend on data integrity. In practice, however, the collection and organization process often yields incomplete and unbalanced datasets, which degrades analysis accuracy. Existing interpolation algorithms often overlook feature importance, which leads either to cumbersome procedures or to underuse of the information in the data. This study introduces a clustering-based interpolation method aimed at improving the accuracy and efficiency of missing-data processing. We first clarify the data interpolation problem to be solved and, because the data themselves carry information that is valuable for interpolation, propose a scheme that combines clustering with interpolation. The method uses the Lp norm as the similarity measure in the K-means clustering algorithm and introduces a controllable weighting formula based on the current data segmentation. Clustering and interpolation are synchronized by iteratively updating the variables to optimize a cost function. Experimental results on real datasets demonstrate significant improvements of the proposed algorithm over traditional interpolation techniques, particularly in data labeling and classification tasks.
feature weights, incomplete data, Lp-norm, data interpolation, Electrical engineering. Electronics. Nuclear engineering, data labeling, Clustering, TK1-9971
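The combination of Lp-norm K-means, feature weights, and interpolation described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `lp_kmeans_impute`, its parameters, the weight update (borrowed from the classic feature-weighted k-means rule, w_j proportional to (1/D_j)^(1/(beta-1))), and the use of cluster means as centers (the exact minimizer only for p = 2) are all assumptions made for illustration. Missing entries are marked with NaN.

```python
import numpy as np

def lp_kmeans_impute(X, k, p=2.0, beta=2.0, n_iter=50, seed=0, eps=1e-12):
    """Illustrative joint clustering and imputation (hypothetical helper).

    Alternates four steps on the weighted cost
        sum_i sum_j w_j**beta * |x_ij - c_{l_i, j}|**p
    until the labels stop changing. Requires beta > 1.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, d = X.shape

    # Initial fill: column means over the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    centers = filled[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full(d, 1.0 / d)
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # 1) Assignment under the feature-weighted Lp cost.
        diff = np.abs(filled[:, None, :] - centers[None, :, :]) ** p  # (n, k, d)
        new_labels = (diff * weights ** beta).sum(axis=2).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # 2) Center update. Assumption: the mean is used for every p,
        #    although it minimizes the cost exactly only when p = 2.
        for c in range(k):
            members = filled[labels == c]
            if len(members):  # an empty cluster keeps its previous center
                centers[c] = members.mean(axis=0)

        # 3) Imputation: a missing entry takes its cluster center's value,
        #    which minimizes |x_ij - c_j|**p for any p.
        filled[missing] = centers[labels][missing]

        # 4) Feature weights from per-feature dispersion D_j
        #    (assumed rule, not the paper's weighting formula).
        D = (np.abs(filled - centers[labels]) ** p).sum(axis=0) + eps
        inv = (1.0 / D) ** (1.0 / (beta - 1.0))
        weights = inv / inv.sum()

    return filled, labels, weights
```

Calling `lp_kmeans_impute(X, k=2, p=1.5)` returns the completed matrix, the cluster labels, and the feature weights. Observed entries are never modified, and the result depends on the random initialization, as with ordinary K-means.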
