
Stream data processing is a widely used technology for analysing IoT-generated data shortly after it is produced and delivering timely insights about it. Executing such analysis on geo-distributed platforms enables shorter delays between data production and processing and fewer disturbances due to potential instability of long-distance networks, while retaining the ability to scale the processing capacity up and down according to demand. However, current stream processing engines were designed for environments made of homogeneous servers connected by high-speed network links. We experimentally study the performance of Apache Flink coupled with the Gesscale auto-scaler in conditions which resemble those of geo-distributed platforms. We demonstrate that Flink’s backpressure mechanism should not be used as the only trigger for rescaling operations in heterogeneous network conditions. Raw performance, as well as performance predictability, also degrades quickly in the presence of stateful data processing operators and/or high network latency between the processing nodes.
Geo-distributed infrastructure, [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Data Stream Processing, Elasticity, Stateful operators
