
TELEIA Datasets Leaderboard

This dataset contains the answers of different LLMs to the TELEIA (Spanish Language Benchmark for Artificial Intelligence Models) dataset.

LLMs evaluated:
- Yi-6B-Chat
- Meta-Llama-3-8B-Instruct
- Llama-2-7b-chat-hf
- gemma-7b-it
- Mistral-7B-Instruct-v0.1
- occiglot-7b-es-en-instruct
- GPT3.5
- GPT4

Files:
- TELEIA_Cervantes_AVE_results.xlsx: vocabulary and grammatical structures, following the format of the Cervantes AVE exam
- TELEIA_PCE_results.xlsx: test on morphology and semantics in the style of the PCE exam, consisting of short questions or sentences to be completed
- TELEIA_SIELE_results.xlsx: different texts with questions related to them, based on the reading comprehension task of the SIELE exam

Each .xlsx file contains a sheet with the results of each model and the following columns:
- question: question from TELEIA
- option_a: possible answer from TELEIA
- option_b: possible answer from TELEIA
- option_c: possible answer from TELEIA
- option_d: possible answer from TELEIA
- correct_answer: correct answer from TELEIA
- llm_question: complete question posed to the LLM
- tokens_in: list of tokens that make up the question
- tokens_in_count: number of tokens that make up the question
- llm_answer: raw answer from the LLM
- llm_answer_filtered: answer from the LLM in the format {A,B,C,D}
- tokens_out: list of tokens that make up the raw answer
- tokens_out_count: number of tokens that make up the raw answer
- word_count: number of words that make up the raw answer
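The column layout described above lends itself to a simple accuracy computation: compare llm_answer_filtered with correct_answer row by row. The following Python sketch is illustrative only and is not part of the dataset. It assumes that both columns hold a single letter that can be compared after trimming whitespace and upper-casing, and that each sheet name identifies one model. The helper names (normalize, accuracy, accuracy_per_model) are hypothetical, and the loader assumes pandas and openpyxl are installed.

```python
from typing import Dict, Iterable, Mapping


def normalize(answer: object) -> str:
    """Reduce a cell value to a trimmed, upper-case string for comparison."""
    return str(answer).strip().upper()


def accuracy(rows: Iterable[Mapping[str, object]]) -> float:
    """Fraction of rows whose filtered LLM answer equals the correct answer.

    Assumption: correct_answer and llm_answer_filtered use the same letter
    encoding (A, B, C, D). Returns 0.0 for an empty input.
    """
    rows = list(rows)
    if not rows:
        return 0.0
    hits = sum(
        normalize(r["llm_answer_filtered"]) == normalize(r["correct_answer"])
        for r in rows
    )
    return hits / len(rows)


def accuracy_per_model(path: str) -> Dict[str, float]:
    """Score every sheet of one results workbook.

    Assumption: each sheet corresponds to one model and its name labels it.
    """
    import pandas as pd  # imported lazily so the helpers above need only the stdlib

    # sheet_name=None makes read_excel return a dict: sheet name -> DataFrame
    sheets = pd.read_excel(path, sheet_name=None)
    return {name: accuracy(df.to_dict("records")) for name, df in sheets.items()}
```

With the files downloaded locally, a call such as accuracy_per_model("TELEIA_PCE_results.xlsx") would return one accuracy value per sheet. Rows where the filtered answer is missing or not one of the four letters count as incorrect under this sketch.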
