Powered by OpenAIRE graph
Report
Data sources: ZENODO

CEPH RGW MULTISITE CONSISTENCY MONITORING SERVICE

Authors: Grabowski, Dawid; Bocchi, Enrico; Lekshmanan, Abhishek


Abstract

The long-term storage and availability of vast datasets, such as those generated by the Large Hadron Collider (LHC), are critical to CERN’s scientific mission. The Ceph distributed storage system, with its RADOS Gateway (RGW) S3-compatible object storage interface, provides a scalable and resilient solution. To ensure high availability and disaster recovery, RGW can be deployed in a multisite replication configuration, for instance, between the Meyrin and Prévessin data centers. However, maintaining perfect data consistency across geographically distributed sites presents a significant challenge. Latency, network partitions, or software bugs can lead to replication inconsistencies, where data exists at one site but is missing or outdated at another. This project addresses this challenge through the development of the Ceph RGW Multisite Consistency Monitor, a comprehensive tool designed to detect and diagnose replication discrepancies. The tool operates in two distinct modes: a non-intrusive Passive Monitoring Mode that listens to real-time S3 operations via Ceph’s Kafka-based bucket notifications, and an Active Testing Mode that generates a controlled S3 workload (PUT/DELETE operations) to stress-test the replication pipeline and validate consistency under load. The system leverages the AWS S3 command-line interface for object manipulation and a high-performance C++ component for real-time Kafka event processing. By comparing the ground truth of performed S3 operations with the stream of replication notifications, the monitor can pinpoint specific inconsistencies, such as missing notifications, extra notifications, and orphaned synchronization events. The tool produces detailed JSON reports and human-readable summaries, providing storage administrators with the necessary diagnostics to maintain the integrity of CERN’s distributed storage infrastructure.
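The comparison step described above — matching the ground truth of performed S3 operations against the observed notification stream — can be illustrated with a minimal sketch. Python is used here for brevity (the project’s real-time component is C++), the function name is hypothetical, and the exact semantics of “orphaned synchronization events” are an assumption for illustration:

```python
def classify_discrepancies(operations, notifications, sync_events):
    """Compare performed S3 operations (ground truth) against the
    replication notification stream and classify inconsistencies.

    Events are modeled as (bucket, key, op) tuples. This is a sketch,
    not the monitor's actual data model.
    """
    ops = set(operations)
    notes = set(notifications)
    syncs = set(sync_events)
    return {
        # operations performed but never announced on the Kafka stream
        "missing_notifications": sorted(ops - notes),
        # notifications with no corresponding performed operation
        "extra_notifications": sorted(notes - ops),
        # sync events referencing no known operation (assumed semantics)
        "orphaned_sync_events": sorted(syncs - ops),
    }


# Hypothetical example data
report = classify_discrepancies(
    operations=[("bkt", "obj1", "PUT"), ("bkt", "obj2", "PUT")],
    notifications=[("bkt", "obj1", "PUT"), ("bkt", "obj3", "DELETE")],
    sync_events=[("bkt", "obj4", "PUT")],
)
```

In the real tool this classification would feed the JSON report and human-readable summary mentioned in the abstract.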
