Publication · Preprint · 2019

Repair Pipelining for Erasure-Coded Storage: Algorithms and Evaluation

Li, Xiaolu; Yang, Zuoru; Li, Jinhong; Li, Runhui; Lee, Patrick P. C.; Huang, Qun; Hu, Yuchong
Open Access · English
  • Published: 05 Aug 2019
Abstract
We propose repair pipelining, a technique that speeds up repair in general erasure-coded storage. By carefully scheduling the repair of failed data in small-size units across storage nodes in a pipelined manner, repair pipelining reduces the single-block repair time to approximately the normal read time of a single block in homogeneous environments. We further extend the repair pipelining algorithms to heterogeneous environments and to multi-block repair operations. We implement a repair pipelining prototype, called ECPipe, and integrate it as middleware into three open-source distributed storage systems. Experime...
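The pipelining idea in the abstract can be illustrated with a minimal Python sketch. It is not ECPipe's implementation, and all function names and the time model are hypothetical. It assumes a single-parity XOR code (general Reed-Solomon repair would also multiply each helper's slice by a GF(2^8) decoding coefficient before adding), homogeneous links of equal bandwidth, and that network transfer is the only cost (disk I/O and computation are ignored). Under these assumptions, conventional repair makes the requestor download k whole blocks, while the pipelined chain of k helpers moves s slices over k hops in (s + k - 1) slot times.

```python
import os
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR: addition in GF(2^8), the combine step of a linear code."""
    return bytes(x ^ y for x, y in zip(a, b))


def conventional_repair_time(k: int, block_mb: float, bw_mb_s: float) -> float:
    """Assumed model: the requestor downloads k whole blocks over its own link."""
    return k * block_mb / bw_mb_s


def pipelined_repair_time(k: int, s: int, block_mb: float, bw_mb_s: float) -> float:
    """Assumed model: chain helper_1 -> ... -> helper_k -> requestor (k hops).

    The block is cut into s slices and each hop moves one slice per time slot.
    Slices overlap across hops, so the last slice reaches the requestor after
    (s + k - 1) slots.
    """
    slot = (block_mb / s) / bw_mb_s
    return (s + k - 1) * slot


def pipelined_repair(helpers: list[bytes], s: int) -> bytes:
    """Rebuild a lost block slice by slice along a helper chain (XOR code only).

    For each slice, helper j XORs its own slice into the partial result received
    from helper j-1 and forwards it; the last helper's output is the repaired
    slice. The loops run sequentially here; in a real pipeline, different slices
    occupy different hops at the same time.
    """
    size = len(helpers[0])
    slice_len = -(-size // s)  # ceiling division, so there are at most s slices
    repaired = bytearray()
    for lo in range(0, size, slice_len):
        hi = min(lo + slice_len, size)
        partial = bytes(hi - lo)  # the first helper starts from an all-zero slice
        for block in helpers:
            partial = xor_bytes(partial, block[lo:hi])
        repaired += partial
    return bytes(repaired)


# Usage: k = 4 data blocks plus one XOR parity block. Lose block 2 and repair it
# from the 3 surviving data blocks and the parity block (k helpers in total).
k = 4
data = [os.urandom(64) for _ in range(k)]
parity = reduce(xor_bytes, data)
lost = 2
helpers = [blk for i, blk in enumerate(data) if i != lost] + [parity]
assert pipelined_repair(helpers, s=8) == data[lost]
```

With illustrative (not measured) numbers of 64 MB blocks, 100 MB/s links, and k = 4, the model gives 2.56 s for conventional repair, about 0.67 s for pipelined repair with s = 64 slices, and 0.64 s for a normal single-block read. The pipelined time approaches the read time as s grows, which matches the abstract's claim under these simplifying assumptions.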
Subjects
free text keywords: Computer Science - Distributed, Parallel, and Cluster Computing