How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchma

SOVEREIGN Research Kernel

Data sources: ZENODO

Publication: Report | Under curation | English | Publisher: Zenodo

Authors: SOVEREIGN Research Kernel

doi: 10.5281/zenodo.20435241

Abstract

Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer)

Research goal: How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchmark for LLaMA-2 models at 7B, 13B, and 70B scales?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.0/10.
