
Herein is a dataset of 98k limericks scraped from The Omnificent English Dictionary In Limerick Form (OEDILF). It is a subset of the full dataset, filtered to pass a basic test of standard limerick form (i.e., five lines, no emojis, and no symbols). Each limerick was written by a human contributor and has passed through a rigorous moderation process.

This dataset is released alongside two companion papers: "BPoMP: The Benchmark of Poetic Minimal Pairs - Limericks, Rhyme, and Narrative Coherence" (Abdibayev, Riddell, Rockmore, RANLP 2021) and "Automating the Detection of Poetic Features: The Limerick as Model Organism" (Abdibayev, Riddell, Igarashi, Rockmore, SIGHUM 2021). It is intended primarily for NLP researchers interested in the formal structure of poetry and, more generally, in computational poetics.

Each limerick is accompanied by metadata: author information, its ID on the OEDILF website, and an "is_limerick" field indicating whether the limerick was recognized by our custom filter, which checks for formal limerick properties (building this tagger was a goal of the SIGHUM paper, and the field reflects the results reported there; see the paper for details). Because every entry is a human-written, moderated limerick, "is_limerick"=True is a true positive, while "is_limerick"=False is (almost surely) a false negative. Our filter identifies 70% of the entries as limericks, and we provide the tagging as a benchmark for the community to improve upon.

We hope the NLP community will use this dataset to study the poetic knowledge of language models trained on large corpora, as many of their properties remain a mystery to the community at large. We are excited for the possibilities ahead!

UPDATE: We have released a new version of the dataset that contains all of the limericks we planned to publish. The previous version (v2) was generated with code that contained a bug, which reduced the number of available limericks.
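The basic form test used to select this subset (five lines, no emojis, no symbols) can be approximated with a few lines of code. The following is a minimal Python sketch under our own assumptions, not the filter actually used to build the dataset, and it does not reproduce the "is_limerick" tagger from the SIGHUM paper. We assume "five lines" means five non-empty lines and "symbols" means any character whose Unicode general category begins with "S" (math, currency, modifier, and other symbols), which also covers most emojis. The function name `passes_basic_form_test` is hypothetical.

```python
import unicodedata


def passes_basic_form_test(text: str) -> bool:
    """Hypothetical approximation of the dataset's basic form test.

    Assumptions (not taken from the dataset authors' code):
      * "five lines" = exactly five non-empty lines;
      * "no symbols/emojis" = no character whose Unicode general
        category starts with "S" (Sm, Sc, Sk, So); most emojis are "So".
    """
    # Count only lines that contain something other than whitespace.
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if len(lines) != 5:
        return False

    # Reject any symbol-category character anywhere in the text.
    for ch in text:
        if unicodedata.category(ch).startswith("S"):
            return False
    return True


if __name__ == "__main__":
    poem = """There once was a man from Peru
Who dreamed he was eating his shoe.
He woke with a fright
In the middle of the night
To find that his dream had come true."""

    print(passes_basic_form_test(poem))          # True
    print(passes_basic_form_test(poem + " 😀"))  # False: emoji is category "So"
```

Letters, spaces, and ordinary punctuation (categories L*, Z*, P*) pass this check, so apostrophes and hyphens common in limericks are kept; a stricter or looser notion of "symbol" would only require changing the category test.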
limericks, natural language processing, computational poetics, nlp, limerick, poetry