SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

Song, Lushan; Lin, Guopeng; Wang, Jiaxuan; Wu, Haoqi; Ruan, Wenqiang; Han, Weili

arXiv.org e-Print Archive

Preprint . 2020

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2020

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

Publication: Article, Preprint; 01 Jan 2020; Embargo end date: 01 Jan 2020; Publisher: arXiv

doi: 10.48550/arxiv.2012.03386

arXiv: 2012.03386

Abstract

Gathering high-quality training data from multiple data sources while preserving privacy is a crucial challenge in training high-performance machine learning models. Solutions to this problem could break down the barriers between isolated data corpora and thereby enlarge the range of data available for processing. To this end, both academic researchers and industrial vendors have recently proposed two mainstream families of solutions, both based mainly on software constructions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated Learning (FL for short). Each family has its own advantages and limitations when evaluated against the following five criteria: security, efficiency, data distribution, the accuracy of trained models, and application scenarios. To demonstrate the research progress and discuss insights on future directions, we thoroughly investigate the protocols and frameworks of both MPL and FL. First, we define the problem of Training machine learning Models over Multiple data sources with Privacy Preservation (TMMPP for short). Then, we compare recent studies of TMMPP in terms of their technical routes, the number of parties supported, data partitioning, threat model, and machine learning models supported, to show their advantages and limitations. Next, we investigate and evaluate five popular FL platforms. Finally, we discuss potential directions for resolving the problem of TMMPP in the future.

19 pages, 4 figures

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Cryptography and Security (cs.CR), Machine Learning (cs.LG)

Related research products (8)

- federated: software on GitHub (IsRelatedTo)
- Paddle: software on GitHub (IsRelatedTo)
- PySyft: software on GitHub (IsRelatedTo)
- tf-encrypted: software on GitHub (IsRelatedTo)
- fresco: software on GitHub (IsRelatedTo)
- FATE: software on GitHub (IsRelatedTo)
- federated-averaging-tutorials: software on GitHub (IsRelatedTo)
- PaddleFL: software on GitHub (IsRelatedTo)

Impact by BIP!

- selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access route: Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
