Foundation model evaluation study MMLU HellaSwag ARC WinoGrande TruthfulQA scores

Data sources: ZENODO

Publication type: Report. Status: Under curation. Language: English. Publisher: Zenodo

Authors: SOVEREIGN Research Kernel;

doi: 10.5281/zenodo.20440412

Abstract

Recent advancements in Natural Language Processing (NLP) technologies have been driven at an unprecedented pace by the development of Large Language Models (LLMs). However, challenges remain, such as generating responses that are misaligned with the intent of the question or producing incorrect answers. This paper analyzes various Prompt Engineering techniques for large-scale language models and identifies methods that can optimize response performance across different datasets without the need for extensive retraining or fine-tuning. In particular, we examine prominent Prompt Engineering tech

Research goal: Foundation model evaluation study MMLU HellaSwag ARC WinoGrande TruthfulQA scores

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.8/10.