Tailwind: Fast and atomic RDMA-based replication

Publication · Conference object · 01 Jan 2018 · France, Spain · English · Publisher: USENIX Association · Funded by: NSF | CAREER: Safe and Efficien..., EC | BigStorage, NSF | CRII: CSR: Large-scale Sy...

Authors: Taleb, Yacine; Stutsman, Ryan; Antoniu, Gabriel; Cortés, Toni

handle: 2117/189504

Abstract

Replication is essential for fault tolerance. However, in in-memory systems, it is a source of high overhead. Remote direct memory access (RDMA) is attractive for creating redundant copies of data, since it is low-latency and has no CPU overhead at the target. However, existing approaches still result in redundant data copying and active receivers. To ensure atomic data transfers, receivers check and apply only fully received messages. Tailwind is a zero-copy recovery-log replication protocol for scale-out in-memory databases. Tailwind is the first replication protocol that eliminates all CPU-driven data copying and fully bypasses target server CPUs, thus leaving backups idle. Tailwind ensures all writes are atomic by leveraging a protocol that detects incomplete RDMA transfers. Tailwind substantially improves replication throughput and response latency compared with conventional RPC-based replication. In symmetric systems where servers both serve requests and act as replicas, Tailwind also improves normal-case throughput by freeing server CPU resources for request processing. We implemented and evaluated Tailwind on RAMCloud, a low-latency in-memory storage system. Experiments show Tailwind improves RAMCloud’s normal-case request processing throughput by 1.7×. It also cuts median and 99th-percentile write latencies by 2× and 3×, respectively.

This work has been supported by the BigStorage project, funded by the European Union under the Marie Skłodowska-Curie Actions (H2020-MSCA-ITN-2014-642963), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316), and by Generalitat de Catalunya (contract 2014-SGR-1051). This material is based upon work supported by the National Science Foundation under Grant Nos. CNS-1566175 and CNS-1750558. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. This work was supported in part by Facebook and VMware.

Peer Reviewed

Countries

France, Spain

Related Organizations

Institut Mines-Télécom
France
Inria Rennes - Bretagne Atlantique Research Centre
France
Universitat Politècnica de Catalunya
Spain
University of Rennes 1
France
French Institute for Research in Computer Science and Automation
France

Keywords

Tolerància als errors (Informàtica), Ordinadors -- Memòries, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors, [INFO.INFO-OS] Computer Science [cs]/Operating Systems [cs.OS], In-memory storage, RDMA, [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Replication, [INFO] Computer Science [cs], Fault-tolerant computing, Computer storage devices

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.