
Data-intensive scientific workflows are composed of many tasks with data precedence constraints, which lead to communication schemes expressed by means of intermediate files. In such scenarios, the storage layer is often a bottleneck that limits overall application scalability, because large volumes of data are generated at runtime at high I/O rates. To alleviate the storage pressure, applications take advantage of in-memory runtime distributed file systems that act as a fast, distributed cache, which greatly enhances I/O performance. In this paper, we present scalability results for MemFS, a distributed in-memory runtime file system. MemFS takes an opposite approach to data locality by scattering all data among the nodes, leading to well-balanced storage and network traffic, and thus making the system both highly performant and scalable. Our results show that MemFS is platform-independent, performing equally well on both private clusters and commercial clouds. On such platforms, running on up to 1024 cores, MemFS shows excellent horizontal scalability (using more nodes), while the vertical scalability (using more cores per node) is limited only by the network bandwidth. Furthermore, for this challenge we show how MemFS is able to scale elastically, at runtime, based on the application storage demands. In our experiments, we have successfully used up to 1 TB of memory when running a large instance of the Montage workflow.
