Memory Access Characterization of OpenMP Workloads on a Multi-core NUMA Machine

Report English OPEN
Pousa Ribeiro, Christiane; Carissimi, Alexandre; Méhaut, Jean-François
  • Publisher: HAL CCSD
  • Subject: [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
    acm: Hardware_MEMORYSTRUCTURES

Nowadays, on hierarchical shared-memory multiprocessors with Non-Uniform Memory Access (NUMA), a large number of cores access the memory banks. These accesses put additional stress on the memory banks, causing load-balancing issues, memory contention and...
