
arXiv: 2011.09655
Federated Learning (FL) has been widely accepted as the solution for privacy-preserving machine learning without collecting raw data. While new technologies proposed in the past few years do evolve the FL area, unfortunately, the evaluation results presented in these works fall short in integrity and are hardly comparable because of the inconsistent evaluation metrics and experimental settings. In this paper, we propose a holistic evaluation framework for FL called FedEval, and present a benchmarking study on seven state-of-the-art FL algorithms. Specifically, we first introduce the core evaluation taxonomy model, called FedEval-Core, which covers four essential evaluation aspects for FL: Privacy, Robustness, Effectiveness, and Efficiency, with various well-defined metrics and experimental settings. Based on the FedEval-Core, we further develop an FL evaluation platform with standardized evaluation settings and easy-to-use interfaces. We then provide an in-depth benchmarking study between the seven well-known FL algorithms, including FedSGD, FedAvg, FedProx, FedOpt, FedSTC, SecAgg, and HEAgg. We comprehensively analyze the advantages and disadvantages of these algorithms and further identify the suitable practical scenarios for different algorithms, which is rarely done by prior work. Lastly, we excavate a set of take-away insights and future research directions, which are very helpful for researchers in the FL area.
Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Distributed, Parallel, and Cluster Computing (cs.DC), Cryptography and Security (cs.CR), Machine Learning (cs.LG)
Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Distributed, Parallel, and Cluster Computing (cs.DC), Cryptography and Security (cs.CR), Machine Learning (cs.LG)
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 1 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
