On today's hierarchical shared-memory multiprocessors with Non-Uniform Memory Access (NUMA), a large number of cores access the memory banks. These accesses put more stress on the memory banks, causing load-balancing issues, memory contention and...