Name: Efficient Allocation of Image Recognition and LLM Tasks on Multi-GPU System
Keywords: Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Distributed, Parallel, and Cluster Computing (cs.DC)

Publication type: Part of book or chapter of book, Article, Preprint
Date: 01 Jan 2025
Embargo end date: 01 Jan 2025
Language: English
Publisher: Springer Nature Switzerland

Authors: Lawenda, Marcin; Samborski, Krzesimir; Khloponin, Kyrylo; Szustak, Łukasz

doi: 10.1007/978-3-031-85703-4_5, 10.48550/arxiv.2503.15252

arXiv: http://arxiv.org/abs/2503.15252


Abstract

This work evaluates the performance of parallelizing the learning and tuning processes for image classification models and large language models. For a machine learning model in image recognition, various parallelization methods are developed for different hardware and software scenarios: simple data parallelism, distributed data parallelism, and distributed processing. The presented strategies are described in detail, highlighting the challenges and benefits of their application. The impact of different dataset types on the tuning process of large language models is also investigated. Experiments show to what extent the task type affects the iteration time in a multi-GPU environment, offering insights into optimal data utilization strategies for improving model performance. The study leverages the built-in parallelization mechanisms of PyTorch that can facilitate these tasks, and it incorporates performance profiling to evaluate the impact of memory and communication operations during the training/tuning procedure. Test scenarios are developed and run with numerous benchmarks on the NVIDIA H100 architecture, with efficiency reported through selected metrics.


Impact by BIP!

citations: 1. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

