
This Tech Note presents an evaluation of scalable solutions for streaming time-series data, critical for real-time analysis in large-scale national research facilities like the NSF Laser Interferometer Gravitational-Wave Observatory (LIGO). The study assesses various time-series databases (ClickHouse, InfluxDB, TimescaleDB) and communication protocols (Kafka, Arrow Flight), focusing on query performance, data ingestion, and scalability. ClickHouse and Kafka emerged as preferred solutions, providing high performance and flexibility for environments with large-scale data requirements. The evaluation is based on use cases from facilities like LIGO, aiming to improve real-time data processing capabilities in NSF Major Facilities.
This project is supported by the U.S. National Science Foundation Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering under Grant #2127548.
NSF Major Facilities, Data Processing, Scalability, Data Streaming, Time Series Databases
