publication . Article . 2014

CÓMPUTO DE ALTO DESEMPEÑO PARA OPERACIONES VECTORIALES EN BLAS-1 // INCREASED COMPUTATIONAL PERFORMANCE FOR VECTOR OPERATIONS ON BLAS-1

José Antonio Muñoz Gómez; Abimael Jiménez Pérez; Gustavo Rodríguez Gómez
Open Access Spanish
  • Published: 01 Jun 2014
  • Journal: Publicaciones en Ciencias y Tecnología (ISSN: 2477-9660)
  • Publisher: Universidad Centroccidental Lisandro Alvarado
Abstract
The Level 1 Basic Linear Algebra Subprograms library (BLAS-1) is considered a programming standard in scientific computing. In this work, we analyze several code optimization techniques for increasing the computational performance of BLAS-1. In particular, we take a combinatorial approach that explores possible implementations combining loop unrolling at different depths with vector data programming using MMX and SSE on Intel processors. Using the main functions of BLAS-1, we measured numerically a performance increase, expressed in megaflops, of up to 52% compared to the optimized BLAS-1 in the ATLAS library. // RESUMEN: La biblio...
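As an illustration of the two techniques named in the abstract, the sketch below applies loop unrolling at depth 4 and SSE vector instructions to the BLAS-1 operation y ← αx + y (saxpy). It is not the authors' code: the function names `saxpy_ref`, `saxpy_unroll4`, and `saxpy_sse` are hypothetical, unit stride is assumed, the MMX variant is omitted, and the SSE path is compiled only when the compiler defines `__SSE__`.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_loadu_ps, ... */
#endif

/* Reference loop: y <- alpha*x + y, unit stride (assumption). */
void saxpy_ref(size_t n, float alpha, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Same operation unrolled to depth 4; a scalar loop handles the
   remaining n % 4 elements. */
void saxpy_unroll4(size_t n, float alpha, const float *x, float *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    for (; i < n; ++i)
        y[i] += alpha * x[i];
}

/* SSE variant: four single-precision elements per instruction.
   Unaligned loads and stores are used so no alignment is assumed.
   Without SSE support only the scalar loop below is compiled. */
void saxpy_sse(size_t n, float alpha, const float *x, float *y)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(alpha);   /* broadcast alpha */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    for (; i < n; ++i)                      /* scalar remainder */
        y[i] += alpha * x[i];
}
```

All three functions compute the same result. Which unroll depth and instruction set runs fastest depends on the processor and compiler, which is the kind of comparison the abstract describes.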
Subjects
free text keywords: Scientific computing, BLAS-1, unroll technique, vector programming, cómputo científico, técnica de unroll, programación vectorial, Technology, T, Science, Q