ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite
Versions: 2

Yemeni Proverbs

Authors: Thmer, Nasser; AL-LAITH, ALI; Shoaib, Muhammad; Alhuzali, Hassan


Abstract

Yemeni Proverbs: A Benchmark Corpus for Figurative and Cultural Language Modeling

This dataset contains 5,252 Yemeni Arabic proverbs, each paired with its explanation in Modern Standard Arabic (MSA). The corpus was compiled between January and June 2024 from four printed proverb anthologies and three publicly accessible digital repositories, through manual transcription of the printed materials and structured extraction from the digital sources. Every entry was manually verified to ensure that each proverb is correctly paired with its original explanation, and duplicate and incomplete records were removed during preprocessing.

Each record includes the following fields:
  • id: unique integer identifier
  • proverb: the proverb in dialectal Yemeni Arabic (UTF-8 encoded)
  • explanation: the explanation in Modern Standard Arabic, transcribed from the source
  • source: title of the printed anthology or name of the digital repository
  • city: geographic origin, if explicitly stated in the source (otherwise null)
  • url: direct link to the online source, where applicable (null for printed sources)

The corpus preserves dialectal orthography and adds no new explanatory annotations: all explanations were transcribed directly from the original sources. Geographic metadata is available for approximately 27% of entries; where the source materials did not state a location explicitly, no geographic inference was performed.

The dataset is intended to support research in:
  • Figurative language understanding
  • Dialect-aware Arabic NLP
  • Culturally grounded language modeling
  • Evaluation of generative models on non-MSA input
  • Computational folkloristics

This repository contains:
  • Yemeni_proverbs.json (primary dataset file, UTF-8 encoded)

The dataset is distributed under the Creative Commons Attribution 4.0 (CC BY 4.0) license. Users are responsible for consulting the original publishers for access to the full source documents under their respective terms.
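The record layout described for this dataset can be checked programmatically before use. The following Python sketch is a minimal, hedged example that assumes Yemeni_proverbs.json holds a single top-level JSON array of objects keyed by the documented field names (id, proverb, explanation, source, city, url); the description does not state the top-level structure, so that is an assumption. The helper names (parse_proverbs, load_proverbs, geographic_coverage, duplicate_proverbs) are hypothetical and are not part of the dataset release.

```python
import json
from collections import Counter
from typing import Any, Optional, TypedDict


class ProverbRecord(TypedDict):
    """One corpus entry, mirroring the fields listed in the dataset description."""
    id: int
    proverb: str          # dialectal Yemeni Arabic
    explanation: str      # MSA explanation transcribed from the source
    source: str           # anthology title or repository name
    city: Optional[str]   # None when the source gives no location
    url: Optional[str]    # None for printed sources


TEXT_FIELDS = ("proverb", "explanation", "source")


def _validate(item: Any) -> ProverbRecord:
    """Check one raw JSON object and normalize it to a ProverbRecord."""
    if not isinstance(item, dict):
        raise ValueError(f"record is not a JSON object: {item!r}")
    rec_id = item.get("id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rec_id, int) or isinstance(rec_id, bool):
        raise ValueError(f"invalid or missing id: {rec_id!r}")
    for field in TEXT_FIELDS:
        value = item.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"record {rec_id}: missing or empty {field!r}")
    # Assumption: city and url are nullable, and an absent key is treated as null.
    for field in ("city", "url"):
        value = item.get(field)
        if value is not None and not isinstance(value, str):
            raise ValueError(f"record {rec_id}: {field!r} must be a string or null")
    return ProverbRecord(
        id=rec_id,
        proverb=item["proverb"],
        explanation=item["explanation"],
        source=item["source"],
        city=item.get("city"),
        url=item.get("url"),
    )


def parse_proverbs(text: str) -> list[ProverbRecord]:
    """Parse corpus text, assuming a top-level JSON array of record objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    records = [_validate(item) for item in data]
    # The description states that id is a unique identifier.
    ids = [r["id"] for r in records]
    if len(ids) != len(set(ids)):
        raise ValueError("id values are not unique")
    return records


def load_proverbs(path: str) -> list[ProverbRecord]:
    """Read a UTF-8 file such as Yemeni_proverbs.json and parse it."""
    with open(path, encoding="utf-8") as fh:
        return parse_proverbs(fh.read())


def geographic_coverage(records: list[ProverbRecord]) -> float:
    """Fraction of records whose city field is set (non-null, non-empty)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["city"]) / len(records)


def duplicate_proverbs(records: list[ProverbRecord]) -> list[str]:
    """Proverb strings occurring more than once (exact match, no normalization)."""
    counts = Counter(r["proverb"] for r in records)
    return [text for text, n in counts.items() if n > 1]
```

If the released file matches the assumed layout, calling load_proverbs("Yemeni_proverbs.json") and passing the result to geographic_coverage should return a value near the roughly 27% stated in the description, and duplicate_proverbs should return an empty list. Because the duplicate check compares raw strings, it will not catch near-duplicates that differ only in diacritics or spelling variants of the dialect.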
