
handle: 11568/1305327
Abstract In the stream processing paradigm, a huge volume of data is continuously processed by standing queries that extract insights from raw inputs. Such queries often keep an internal state (representing useful information about the stream history) to produce results. A notable example of a state paradigm is the sliding window, where the computation is periodically repeated over the most recently received data (e.g., inputs received in the last ten seconds, sliding every half a second). Furthermore, such a state is replicated per distinct key, which is a user-defined attribute used to partition the same physical stream into logical sub-streams. The combination of several keys (often millions in real-world scenarios) and the window size makes the overall state of a streaming query huge and potentially greater than the available memory. This is particularly critical on small low-end devices, where stream analyses might be run closer to the data producers, as in the Edge computing paradigm. In this paper, we focus on this problem and on how a popular Key-Value Store (KVS) can be integrated into the WindFlow stream processing library. The idea is to design a new set of persistent operators able to transparently move state objects to secondary memory without manual user intervention. The paper shows our design and implementation, and an experimental evaluation based on a set of benchmarks.
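To make the state-size argument concrete, the following is a minimal in-memory sketch of keyed sliding-window state, using the window parameters from the abstract's example (ten seconds, sliding every half a second). The class `KeyedSlidingWindows` and its methods are hypothetical names chosen for illustration; they are not WindFlow's API and do not reflect the paper's persistent operators, which target the case where this state no longer fits in memory.

```python
from collections import defaultdict, deque

WINDOW_S = 10.0  # window length from the abstract's example
SLIDE_S = 0.5    # slide from the abstract's example (how often fire() would be called)


class KeyedSlidingWindows:
    """Hypothetical in-memory keyed sliding-window state (illustration only)."""

    def __init__(self, window_s=WINDOW_S):
        self.window_s = window_s
        # One buffer per distinct key: key -> deque of (timestamp, value).
        self.buffers = defaultdict(deque)

    def insert(self, key, ts, value):
        """Buffer one input under its key; timestamps are assumed non-decreasing per key."""
        self.buffers[key].append((ts, value))

    def fire(self, now):
        """Evict expired inputs and return the per-key sum over (now - window_s, now]."""
        results = {}
        for key, buf in self.buffers.items():
            while buf and buf[0][0] <= now - self.window_s:
                buf.popleft()
            results[key] = sum(v for t, v in buf if t <= now)
        return results

    def state_size(self):
        """Total number of buffered inputs across all keys."""
        return sum(len(buf) for buf in self.buffers.values())


# Usage: two keys, one input per key every half second for 15 seconds.
windows = KeyedSlidingWindows()
for i in range(30):
    for key in ("a", "b"):
        windows.insert(key, i * SLIDE_S, 1)
results = windows.fire(now=14.5)
```

In this sketch the retained state is roughly the number of keys multiplied by the number of inputs per key that fall inside one window, which is the quantity the abstract identifies as potentially exceeding available memory when keys number in the millions.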
Data stream processing; Larger-than-memory state; Multicores; Scaleup architectures; Stream processing engines
