
With the prevalence of cloud computing, more and more enterprises are migrating applications to cloud infrastructures. Logs are the key to helping customers understand the status of their applications running on the cloud. They are vital for various scenarios, such as service stability assessment, root cause analysis and user activity profiling. Therefore, it is essential to manage the massive amount of logs collected on the cloud and tap their value. Although various log storages have been widely used in the past few decades, it is still a non-trivial problem to design a cost-effective log storage for cloud applications. It faces challenges of heavy write throughput of tens of millions of log records per second, retrieval on PB-level logs and massive hundreds of thousands of tenants. Traditional log processing systems cannot satisfy all these requirements. To address these challenges, we propose the cloud-native log database LogStore. It combines shared-nothing and shared-data architecture, and utilizes highly scalable and low-cost cloud object storage, while overcoming the bandwidth limitations and high latency of using remote storage when writing a large number of logs. We also propose a multi-tenant management method that physically isolates tenant data to ensure compliance and flexible data expiration policies, and uses a novel traffic scheduling algorithm to mitigate the impact of traffic skew and hotspots among tenants. In addition, we design an efficient column index structure LogBlock to support queries with full-text search, and combined several query optimization techniques to reduce query latency on cloud object storage. LogStore has been deployed in Alibaba Cloud on a large scale (more than 500 machines), processing logs of more than 100 GB per second, and has been running stably for more than two years.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 14 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
