Publication · Preprint · 2018

Mosaic: An Application-Transparent Hardware-Software Cooperative Memory Manager for GPUs

Rachata Ausavarungnirun, Joshua Landgraf, Vance Miller, Saugata Ghose, Jayneel Gandhi, Christopher J. Rossbach, Onur Mutlu
Open Access · English
  • Published: 30 Apr 2018
Abstract
Modern GPUs face a trade-off in how the page size used for memory management affects address translation and demand paging. Support for multiple page sizes can help relax the page size trade-off so that address translation and demand paging optimizations work together synergistically. However, existing page coalescing and splintering policies require costly base page migrations that undermine the benefits multiple page sizes provide. In this paper, we observe that GPGPU applications present an opportunity to support multiple page sizes without costly data migration, as the applications perform most of their memory allocation en masse (i.e., they allocate a large...
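The key idea in the abstract — because GPGPU applications allocate memory en masse, base pages can be placed so that a large-page frame fills with pages from a single allocation, letting the frame be promoted to a large page in place, with no data migration — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual mechanism: the page sizes, class names, and single-owner-per-frame policy are all assumptions made for the example.

```python
# Sketch of contiguity-conserving allocation with in-place coalescing.
# Assumed sizes: 4 KB base pages inside 2 MB large-page frames.
BASE_PAGE = 4 * 1024
LARGE_FRAME = 2 * 1024 * 1024
PAGES_PER_FRAME = LARGE_FRAME // BASE_PAGE  # 512 base pages per frame

class Frame:
    """One large-page frame, subdivided into base pages."""
    def __init__(self, frame_id):
        self.frame_id = frame_id
        self.owner = None       # policy: base pages of only one owner per frame
        self.used = 0           # base pages handed out so far
        self.coalesced = False  # promoted to a single large page?

class Allocator:
    def __init__(self, num_frames):
        self.frames = [Frame(i) for i in range(num_frames)]

    def allocate(self, owner, num_base_pages):
        """Hand out base pages so each large frame holds pages of only
        one owner. Returns a list of (frame_id, base_page_index) pairs."""
        granted = []
        remaining = num_base_pages
        for f in self.frames:
            if remaining == 0:
                break
            if f.owner not in (None, owner) or f.used == PAGES_PER_FRAME:
                continue  # frame belongs to someone else, or is full
            f.owner = owner
            take = min(remaining, PAGES_PER_FRAME - f.used)
            granted.extend((f.frame_id, p)
                           for p in range(f.used, f.used + take))
            f.used += take
            remaining -= take
            # En-masse allocation tends to fill whole frames, so the
            # frame's base pages are already contiguous and uniformly
            # owned: it can be coalesced with no data movement.
            if f.used == PAGES_PER_FRAME:
                f.coalesced = True
        if remaining:
            raise MemoryError("out of frames")
        return granted

# A bulk allocation that fills one frame and spills 10 pages into the next.
alloc = Allocator(num_frames=4)
pages = alloc.allocate("appA", PAGES_PER_FRAME + 10)
```

In this sketch the first frame coalesces into a large page the moment it fills, while the partially used second frame stays at the base page size — mirroring the abstract's point that coalescing decisions can be made at allocation time rather than via later page migration.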
Subjects
Free-text keywords: Computer Science - Operating Systems; Computer Science - Hardware Architecture
Funded by
NSF | CSR: Medium: Collaborative Research: Enabling GPUs as First-Class Computing Engines
Project
  • Funder: National Science Foundation (NSF)
  • Project Code: 1409723
  • Funding stream: Directorate for Computer & Information Science & Engineering | Division of Computer and Network Systems
References (first 15 of 104 shown)

[1] Advanced Micro Devices, Inc., “AMD Accelerated Processing Units,” http://www.amd.com/us/products/technologies/apu/Pages/apu.aspx.

[2] Advanced Micro Devices, Inc., “OpenCL: The Future of Accelerated Application Performance Is Now,” https://www.amd.com/Documents/FirePro_OpenCL_Whitepaper.pdf.

[3] N. Agarwal, D. Nellans, M. O'Connor, S. W. Keckler, and T. F. Wenisch, “Unlocking Bandwidth for GPUs in CC-NUMA Systems,” in HPCA, 2015.

[4] J. Ahn, S. Jin, and J. Huh, “Revisiting Hardware-Assisted Page Walks for Virtualized Systems,” in ISCA, 2012.

[5] J. Ahn, S. Jin, and J. Huh, “Fast Two-Level Address Translation for Virtualized Systems,” IEEE TC, 2015.

[6] R. Ausavarungnirun, K. Chang, L. Subramanian, G. Loh, and O. Mutlu, “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” in ISCA, 2012.

[7] R. Ausavarungnirun, S. Ghose, O. Kayıran, G. H. Loh, C. R. Das, M. T. Kandemir, and O. Mutlu, “Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance,” in PACT, 2015.

[8] R. Ausavarungnirun, J. Landgraf, V. Miller, S. Ghose, J. Gandhi, C. J. Rossbach, and O. Mutlu, “Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes,” in MICRO, 2017.

[9] R. Ausavarungnirun, V. Miller, J. Landgraf, S. Ghose, J. Gandhi, A. Jog, C. Rossbach, and O. Mutlu, “MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency,” in ASPLOS, 2018.

[10] A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, “Analyzing CUDA Workloads Using a Detailed GPU Simulator,” in ISPASS, 2009.

[11] T. W. Barr, A. L. Cox, and S. Rixner, “Translation Caching: Skip, Don't Walk (the Page Table),” in ISCA, 2010.

[12] T. W. Barr, A. L. Cox, and S. Rixner, “SpecTLB: A Mechanism for Speculative Address Translation,” in ISCA, 2011.

[13] A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift, “Efficient Virtual Memory for Big Memory Servers,” in ISCA, 2013.

[14] A. Bhattacharjee, “Large-Reach Memory Management Unit Caches,” in MICRO, 2013.

[15] A. Bhattacharjee, D. Lustig, and M. Martonosi, “Shared Last-level TLBs for Chip Multiprocessors,” in HPCA, 2011.
