What is the performance gap in code generation tasks (HumanEval) between multimodal models and text-only LLMs

SOVEREIGN Research Kernel

Report

Data sources: ZENODO

Publication | Report | Under curation | English | Publisher: Zenodo

Authors: SOVEREIGN Research Kernel

doi: 10.5281/zenodo.20440838

Abstract

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up

Research goal: What is the performance gap in code generation tasks (HumanEval) between multimodal models and text-only LLMs when provided with additional visual context or pseudocode diagrams?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.7/10.
