Architectural Scalability of Neural Network Inference Using Task-based Programming

The internal structure of interactions in a hidden network can be inferred using a maximum likelihood estimate based on a record of its external behavior, within the framework of the kinetic Ising model. Beyond its origins in statistical physics, solutions to this problem can model the internal structure of a hidden neural network based on activity recordings from a laboratory setting, or the training process of an artificial neural network in the context of machine learning. The primary obstacle to its practical application is that the amount of computational work required grows rapidly with the dimensions of the represented network, but the vast majority of the operations can be evaluated independently in parallel. In this paper, we investigate the performance characteristics of a proxy application that models this growth, with the purpose of examining its suitability as a candidate application for future exascale platforms. While the application offers an abundance of parallelizable computation, the practical scalability of a particular implementation depends on how its underlying data structure is distributed in memory, and on the resulting interactions with the memory system of the target architecture. We investigate three different programming strategies that cover different trade-offs in terms of memory access: a process-based implementation that partitions the global workload into parallel parts that are strictly sequenced internally, an implementation that combines thread parallelism with statically scheduled iterations, and a task-based implementation that exposes all the work as potentially parallel work units and schedules their sequencing at run-time. We find that, across this trade-off, the implementations utilize computing platforms of growing size comparably well and display near-linear speedup on our test system, which makes the application a promising candidate for extreme-scale computations.
For the present test systems, however, scheduling the computation at run-time comes with an overhead that is not amortized by the gains from additional scheduling flexibility, suggesting that the process-based implementation provides the most favorable scalability on present architectures.
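The abstract does not spell out the estimator. The following is a minimal Python sketch of why the work parallelizes so well, assuming the common textbook form of the kinetic Ising model: spins s_j(t) in {-1, +1}, synchronous Glauber updates, local field h_i(t) = sum_j J_ij s_j(t), and no external field. These modeling choices are our assumptions and not necessarily the paper's. Under them, the log-likelihood for neuron i is the sum over t of s_i(t+1) h_i(t) - ln(2 cosh h_i(t)), and its gradient with respect to J_ij is the sum over t of [s_i(t+1) - tanh h_i(t)] s_j(t). Each row J_i depends only on the read-only activity record, so all rows can be fitted independently. The names `simulate`, `fit_row`, `fit_all` and the example coupling matrix are hypothetical. The thread pool only illustrates the independence of rows; it is not one of the paper's three implementations, and CPython threads give no real speedup for this pure-Python loop.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(J, steps, seed=0):
    """Generate a spin record from a kinetic Ising model with synchronous
    Glauber updates and no external field (assumed model form)."""
    rng = random.Random(seed)
    n = len(J)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    record = [s]
    for _ in range(steps - 1):
        h = [sum(J[i][j] * s[j] for j in range(n)) for i in range(n)]
        # P(s_i(t+1) = +1 | s(t)) = (1 + tanh h_i) / 2
        s = [1 if rng.random() < 0.5 * (1.0 + math.tanh(hi)) else -1 for hi in h]
        record.append(s)
    return record


def row_log_likelihood(i, Ji, record):
    """Average log-likelihood of neuron i's transitions given its couplings Ji."""
    total = 0.0
    for t in range(len(record) - 1):
        h = sum(Jij * sj for Jij, sj in zip(Ji, record[t]))
        total += record[t + 1][i] * h - math.log(2.0 * math.cosh(h))
    return total / (len(record) - 1)


def fit_row(i, record, lr=0.4, iters=150):
    """Estimate row i of J by gradient ascent on the concave log-likelihood.
    The averaged Hessian has norm at most n, so lr < 2/n keeps the ascent
    monotone (lr=0.4 assumes n <= 4). Only reads `record`, so rows are independent."""
    n = len(record[0])
    T = len(record) - 1
    Ji = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for t in range(T):
            s = record[t]
            h = sum(Jij * sj for Jij, sj in zip(Ji, s))
            err = record[t + 1][i] - math.tanh(h)
            for j in range(n):
                grad[j] += err * s[j]
        Ji = [Jij + lr * g / T for Jij, g in zip(Ji, grad)]
    return Ji


def fit_all(record, workers=4):
    """Fit every row of J as an independent work unit."""
    n = len(record[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: fit_row(i, record), range(n)))


if __name__ == "__main__":
    # Hypothetical 4-neuron coupling matrix used only for illustration.
    J_true = [
        [0.0, 0.8, -0.5, 0.0],
        [-0.6, 0.0, 0.4, 0.7],
        [0.5, -0.7, 0.0, 0.3],
        [0.0, 0.6, -0.8, 0.0],
    ]
    record = simulate(J_true, 1000, seed=1)
    for row in fit_all(record):
        print([round(x, 2) for x in row])
```

Because the rows share no mutable state, the same decomposition maps naturally onto separate processes, statically scheduled thread loops, or run-time-scheduled tasks; in this sketch the cost per row grows with both the number of neurons and the length of the record.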

Keywords

trade-off leads, exascale platforms, performance characteristics
