Leaderboard Spanish Language Benchmark for Artificial Intelligence Models (TELEIA)

TELEIA Datasets Leaderboard

This dataset contains the answers of different LLMs to the TELEIA (Spanish Language Benchmark for Artificial Intelligence Models) dataset.

LLMs evaluated:
Yi-6B-Chat
Meta-Llama-3-8B-Instruct
Llama-2-7b-chat-hf
gemma-7b-it
Mistral-7B-Instruct-v0.1
occiglot-7b-es-en-instruct
GPT3.5
GPT4

Files:
TELEIA_Cervantes_AVE_results.xlsx: vocabulary and grammatical structures, following the format of the Cervantes AVE exam
TELEIA_PCE_results.xlsx: test on morphology and semantics resembling the style of the PCE exam, consisting of short questions or sentences to be completed
TELEIA_SIELE_results.xlsx: different texts with questions related to them, based on the reading comprehension task of the SIELE exam

Each .xlsx file contains a sheet with the results of each model, with the following columns:
question: question from TELEIA
option_a: possible answer from TELEIA
option_b: possible answer from TELEIA
option_c: possible answer from TELEIA
option_d: possible answer from TELEIA
correct_answer: correct answer from TELEIA
llm_question: complete question posed to the LLM
tokens_in: list of tokens that make up the question
tokens_in_count: number of tokens in the question
llm_answer: raw answer from the LLM
llm_answer_filtered: answer from the LLM in the format {A,B,C,D}
tokens_out: list of tokens that make up the raw answer
tokens_out_count: number of tokens in the raw answer
word_count: number of words in the raw answer
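The columns described above are enough to score a model: compare llm_answer_filtered against correct_answer row by row. The following is a minimal sketch of that comparison, not the authors' scoring code. It assumes that correct_answer holds a letter in the same {A,B,C,D} format as llm_answer_filtered, which the description does not state explicitly. The helper names normalize_choice and accuracy are hypothetical, and the sample rows are made up for illustration.

```python
from typing import Iterable, Mapping, Optional

VALID_CHOICES = {"A", "B", "C", "D"}


def normalize_choice(value: object) -> Optional[str]:
    """Return 'A'..'D' for a usable answer, else None.

    Assumption: answers are stored as single letters, possibly with
    stray whitespace or lowercase. Anything else counts as unusable.
    """
    if value is None:
        return None
    text = str(value).strip().upper()
    return text if text in VALID_CHOICES else None


def accuracy(rows: Iterable[Mapping[str, object]]) -> float:
    """Fraction of rows whose filtered LLM answer matches the gold answer."""
    total = 0
    correct = 0
    for row in rows:
        gold = normalize_choice(row.get("correct_answer"))
        if gold is None:
            continue  # skip rows without a usable gold label
        total += 1
        if normalize_choice(row.get("llm_answer_filtered")) == gold:
            correct += 1
    return correct / total if total else 0.0


# Made-up rows for illustration only; not taken from the TELEIA files.
sample = [
    {"correct_answer": "A", "llm_answer_filtered": "A"},
    {"correct_answer": "B", "llm_answer_filtered": "c"},
    {"correct_answer": "D", "llm_answer_filtered": None},
    {"correct_answer": "C", "llm_answer_filtered": " C "},
]
print(accuracy(sample))  # 2 of the 4 sample rows match
```

With pandas and openpyxl installed, rows for each sheet could be obtained with pandas.read_excel(path, sheet_name=None), which returns one DataFrame per sheet, followed by DataFrame.to_dict("records"). Empty cells then arrive as NaN, which normalize_choice treats as an unusable answer and therefore counts as incorrect.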

Funded by

EC| SMARTY