
handle: 2117/428367
GPT transformers are useful for a wide range of applications and have brought significant advances in natural language processing tasks. However, their operational costs are substantial, as shown by prior work highlighting the financial implications of deploying these models [1]. Matrix-matrix multiplications (MMM), with their intensive data movement and heavy arithmetic on weights, exemplify the computational demands of these architectures. These observations have motivated recent efforts in the research community to devise specialized formats and algorithms that mitigate these costs. Such innovations include reduced bit-widths, exemplified by the Machine Learning eXchange (MLX) formats (essentially small floats); specialized hardware such as the systolic arrays of TPUs; model pruning of up to 40%; and, more recently, ternary and binary LLMs (see BitNets [2]). We introduce a PDK-agnostic generator of ASIC kernels for MMM units targeting emerging small floating-point formats, and we evaluate the resulting units. Concretely, our contributions are: the automated generation of circuits for any floating-point format, with automated pipelining; a systolic array architecture proposal (together, these two form the foundation of the MMM units); a framework that automates the translation of such matrix units from a high-level language (Python) to silicon; the generation of 4 arithmetic formats × 2 accumulator configurations × 4 PDKs = 32 chips; and their performance and efficiency evaluation. All of this is provided as open source.
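To make the two building blocks named above concrete (a parameterized floating-point format and a systolic array computing an MMM), here is a minimal Python sketch. It is not the authors' generator, which emits hardware; it is a behavioral model under stated assumptions. The function names `decode_minifloat`, `systolic_mmm` and `reference_mmm` are hypothetical. The decoder assumes an IEEE-754-like layout (1 sign bit, `exp_bits` exponent bits, `man_bits` mantissa bits, bias 2^(exp_bits-1) - 1, subnormals supported, Inf/NaN encodings ignored). The array uses an output-stationary dataflow, which is an assumption since the abstract does not specify one, and Python's binary64 float stands in for a wider accumulator.

```python
def decode_minifloat(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a small IEEE-754-style float (assumed layout: sign|exponent|mantissa).

    Assumptions: bias = 2**(exp_bits - 1) - 1, subnormals supported,
    Inf/NaN encodings are not special-cased (treated as ordinary values).
    """
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def systolic_mmm(A, B):
    """Cycle-level model of an output-stationary n x m systolic array.

    Row i of A enters from the left skewed by i cycles and moves right;
    column j of B enters from the top skewed by j cycles and moves down.
    PE (i, j) keeps its own accumulator for C[i][j].
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    assert len(B) == k, "inner dimensions must match"
    acc = [[0.0] * m for _ in range(n)]    # stationary partial sums
    a_reg = [[0.0] * m for _ in range(n)]  # operand registers moving right
    b_reg = [[0.0] * m for _ in range(n)]  # operand registers moving down
    for t in range(k + n + m - 2):         # cycles until the last PE finishes
        new_a = [[0.0] * m for _ in range(n)]
        new_b = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:  # left edge: inject skewed row of A (zero padding)
                    s = t - i
                    new_a[i][j] = A[i][s] if 0 <= s < k else 0.0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:  # top edge: inject skewed column of B (zero padding)
                    s = t - j
                    new_b[i][j] = B[s][j] if 0 <= s < k else 0.0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):  # every PE does one multiply-accumulate per cycle
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


def reference_mmm(A, B):
    """Plain triple-loop MMM used to check the systolic model."""
    return [[sum(A[i][s] * B[s][j] for s in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


E, M = 4, 3  # hypothetical choice: an E4M3-like layout
A = [[decode_minifloat(x, E, M) for x in row] for row in [[0x38, 0x40], [0xB8, 0x30]]]
B = [[decode_minifloat(x, E, M) for x in row] for row in [[0x40, 0x38], [0x38, 0xC0]]]
print(systolic_mmm(A, B))  # → [[4.0, -3.0], [-1.5, -2.0]]
```

The skew by row and column index ensures that PE (i, j) sees A[i][s] and B[s][j] in the same cycle, t = s + i + j, which is why the array needs k + n + m - 2 cycles in total. In an actual generated unit, the multiplier and accumulator widths would follow the chosen arithmetic format and accumulator configuration rather than binary64.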
floating point, UPC subject areas::Computer science::Computer architecture, ASIC, Matrix-Matrix Multiplications, Large Language Models (LLM), transformers, High performance computing, Open-Source Silicon (OSS), Generative Pre-Trained (GPT), arithmetic, Intensive computing (Computer science)
