Downloads provided by UsageCounts
Lossless data compression is a promising software approach for reducing the bandwidth requirements of scientific applications on accelerator clusters without introducing approximation errors. Suitable compressors must compact floating-point data effectively while saturating the system interconnect, so that compression does not introduce unnecessary latency. We present ndzip-gpu, a novel, highly efficient GPU parallelization scheme for the block compressor ndzip, which has recently set a new milestone in CPU floating-point compression speeds. By combining intra-block parallelism with efficient memory access patterns, ndzip-gpu achieves high resource utilization in decorrelating multi-dimensional data via the Integer Lorenzo Transform. We further introduce a novel, efficient warp-cooperative primitive for vertical bit packing, providing a high-throughput data reduction and expansion step. Using a representative set of scientific data, we compare the performance of ndzip-gpu against five other existing GPU compressors. While observing that the effectiveness of any compressor strongly depends on the characteristics of the dataset, we demonstrate that ndzip-gpu offers the best average compression ratio for the examined data. On Nvidia Turing, Volta and Ampere hardware, it achieves the highest single-precision throughput by a significant margin while maintaining a favorable trade-off between data reduction and throughput in the double-precision case.
accelerator, gpgpu, floating-point, data compression
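The two steps named in the abstract, decorrelation by an integer Lorenzo transform and vertical bit packing, can be illustrated with a small sequential sketch. This is not the ndzip-gpu implementation: it assumes a one-dimensional block of 32 single-precision values (the paper targets multi-dimensional data), runs on the CPU rather than cooperatively across a GPU warp, and all function names (`float_bits`, `lorenzo_forward_1d`, `pack_vertical`, and so on) are hypothetical. It also omits refinements a practical scheme would likely need, such as remapping negative residuals so that they produce leading zero bits.

```python
import struct

MASK32 = 0xFFFFFFFF  # residual arithmetic is done modulo 2**32


def float_bits(x):
    """Reinterpret a single-precision float as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_float(u):
    """Inverse of float_bits."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def lorenzo_forward_1d(values):
    """1D integer Lorenzo transform: each residual is value minus predecessor."""
    out, prev = [], 0
    for v in values:
        out.append((v - prev) & MASK32)
        prev = v
    return out


def lorenzo_inverse_1d(residuals):
    """Undo lorenzo_forward_1d with a running (prefix) sum modulo 2**32."""
    out, acc = [], 0
    for r in residuals:
        acc = (acc + r) & MASK32
        out.append(acc)
    return out


def pack_vertical(words):
    """Transpose 32 words into 32 bit planes and drop the all-zero planes.

    Returns (header, kept): bit b of header is set when plane b was kept.
    """
    assert len(words) == 32
    header, kept = 0, []
    for bit in range(32):
        plane = 0
        for i, w in enumerate(words):
            plane |= ((w >> bit) & 1) << i  # bit `bit` of word i -> bit i of plane
        if plane:
            header |= 1 << bit
            kept.append(plane)
    return header, kept


def unpack_vertical(header, kept):
    """Restore the 32 words from the header and the non-zero bit planes."""
    it = iter(kept)
    planes = [next(it) if (header >> bit) & 1 else 0 for bit in range(32)]
    words = []
    for i in range(32):
        w = 0
        for bit in range(32):
            w |= ((planes[bit] >> i) & 1) << bit
        words.append(w)
    return words


def compress_block(floats):
    """Hypothetical block encoder: bit cast, decorrelate, then bit-pack."""
    return pack_vertical(lorenzo_forward_1d([float_bits(x) for x in floats]))


def decompress_block(header, kept):
    """Hypothetical block decoder: exact inverse of compress_block."""
    return [bits_float(u) for u in lorenzo_inverse_1d(unpack_vertical(header, kept))]
```

Calling `compress_block` on 32 floats returns a 32-bit header plus at most 32 plane words, and `decompress_block` recovers the input bit for bit. For smooth data the residuals are small, so many high-order planes are zero and are never stored; for a constant block, only the planes occupied by the first residual remain.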
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 20 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |

| Usage | Count |
| --- | --- |
| views | 1 |
| downloads | 20 |

Views provided by UsageCounts