
In this paper, we propose a distributed ADMM algorithm on the Spark platform for kernel regression on large-scale data. Because the full kernel matrix of a large-scale dataset is difficult to compute and store, we use the Nyström sampling method to approximate it, and we solve the kernel regression problem with the approximate matrix. To verify the effectiveness of the algorithm, we performed numerical experiments on the Spark big data platform. The experimental results show that, balancing accuracy against computational cost, a sampling ratio of 2–5% gives the most reasonable approximation of the kernel matrix. The approximate kernel matrix method can handle problems that are intractable with the exact kernel matrix. Approximate kernel regression can therefore be applied to large-scale data problems, greatly reducing the computational cost while still achieving satisfactory accuracy.
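The Nyström step described above can be illustrated with a minimal single-machine sketch. It is not the distributed ADMM solver from the paper: it assumes an RBF kernel, uniform sampling of landmark points, and a closed-form ridge solve in place of ADMM. The function names and the parameters `gamma`, `lam`, and `seed` are hypothetical choices made for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise squared distances, clamped at 0 to absorb rounding error.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, m, gamma=0.5, seed=0):
    # Sample m landmark rows uniformly without replacement (assumed scheme).
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)      # n x m block of the kernel matrix
    W = C[idx]                            # m x m landmark kernel
    W = 0.5 * (W + W.T)                   # enforce exact symmetry
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10 * vals.max()      # drop numerically null directions
    # Z satisfies K ~= Z @ Z.T = C @ pinv(W) @ C.T
    return C @ (vecs[:, keep] / np.sqrt(vals[keep]))

def fit_ridge(Z, y, lam=1e-2):
    # Ridge regression in the Nystrom feature space (closed form, not ADMM).
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
```

With `n` samples and `m` landmarks, only an `n x m` block of the kernel matrix is ever formed, so a sampling ratio of a few percent cuts storage from `n * n` entries to `n * m`. Predictions on the training points are `Z @ w`, where `w = fit_ridge(Z, y)`.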
