Powered by OpenAIRE graph
ZENODO
Dataset, 2024
License: CC BY
Data sources: ZENODO; Datacite

SlangTrack Dataset

Authors: Aloraini, Afnan


Abstract

The SlangTrack (ST) Dataset is a novel, carefully curated resource for slang detection in natural language processing. It focuses on words that have both slang and non-slang senses, framing slang detection as a binary classification task that distinguishes between the two. By providing examples of each usage, the dataset supports fine-grained linguistic and computational analysis for NLP researchers and practitioners.

Key Features:
• Unique Words: 48,508
• Total Tokens: 310,170
• Average Post Length: 34.6 words
• Average Sentences per Post: 3.74

With several sentences of context per post on average, the examples give classifiers enough surrounding text for slang detection and semantic analysis.

Target Word Selection:
The target words were chosen to support fine-grained analysis. Each word in the dataset:
• appears in both the slang SD wordlist and the Corpus of Historical American English (COHA);
• has between 2 and 8 distinct senses, including both slang and non-slang meanings;
• was cross-referenced against trusted resources: Green's Dictionary of Slang, Urban Dictionary, the Online Slang Dictionary, and the Oxford English Dictionary;
• has at least one slang sense and one dominant non-slang sense;
• is not a proper noun, keeping the focus on general vocabulary.

Data Sources and Collection:
1. Corpus of Historical American English (COHA): Historical examples were extracted from the cleaned version of COHA (CCOHA). The data spans 1980–2010, capturing how the target words were used over time.
2. Twitter: Twitter was selected for its dynamic, real-time communication, which offers rich examples of contemporary slang and informal language. For each target word, 1,000 examples were collected from tweets posted between 2010 and 2020, reflecting modern usage.

Dataset Scope:
The final dataset comprises ten target words that meet the selection criteria above. Each word:
• demonstrates semantic diversity, balancing slang and non-slang senses;
• is represented in both historical (COHA) and modern (Twitter) contexts.

The SlangTrack Dataset is a public resource for research on slang detection, semantic evolution, and informal language processing. By combining historical and contemporary sources, it provides a platform for studying the nuances of slang in natural language.
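The target-word selection criteria in the abstract amount to a filter over candidate words. The following is a minimal Python sketch of that filter, not the author's actual pipeline. The `Sense` and `Candidate` types, the `dominant` flag, and the set-based lookups for the SD wordlist and COHA vocabulary are hypothetical. The sketch also assumes each candidate's sense inventory has already been compiled by cross-referencing the slang and standard dictionaries named in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    gloss: str
    is_slang: bool
    dominant: bool = False  # hypothetical flag: marks the dominant non-slang sense


@dataclass
class Candidate:
    word: str
    senses: list[Sense]  # assumed pre-compiled from the slang/standard dictionaries
    is_proper_noun: bool = False


def meets_criteria(c: Candidate, sd_wordlist: set[str], coha_vocab: set[str]) -> bool:
    """Check one candidate against the selection criteria as modeled in this sketch."""
    # Must appear in both the slang SD wordlist and COHA.
    if c.word not in sd_wordlist or c.word not in coha_vocab:
        return False
    # Must have between 2 and 8 distinct senses (deduplicated by gloss here).
    n_senses = len({s.gloss for s in c.senses})
    if not 2 <= n_senses <= 8:
        return False
    # Needs at least one slang sense and one dominant non-slang sense.
    has_slang = any(s.is_slang for s in c.senses)
    has_dominant_standard = any(s.dominant and not s.is_slang for s in c.senses)
    if not (has_slang and has_dominant_standard):
        return False
    # Proper nouns are excluded.
    return not c.is_proper_noun


def select_targets(candidates: list[Candidate], sd_wordlist: set[str], coha_vocab: set[str]) -> list[str]:
    """Return the words of all candidates that pass every criterion, in input order."""
    return [c.word for c in candidates if meets_criteria(c, sd_wordlist, coha_vocab)]


if __name__ == "__main__":
    # Hypothetical placeholder words, not entries from the actual dataset.
    sd = {"word_a", "word_b", "word_c"}
    coha = {"word_a", "word_b"}
    candidates = [
        Candidate("word_a", [Sense("standard sense", False, True), Sense("slang sense", True)]),
        Candidate("word_b", [Sense("slang sense only", True)]),  # fails: only one sense
        Candidate("word_c", [Sense("standard", False, True), Sense("slang", True)]),  # fails: not in COHA
    ]
    print(select_targets(candidates, sd, coha))  # prints ['word_a']
```

Deduplicating senses by gloss is only one simple reading of "distinct senses". A real pipeline would more likely align equivalent senses across the different dictionaries before counting them.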

EOSC Subjects

Twitter Data

Impact by BIP!
  • Selected citations: 0. Citations derived from selected sources; an alternative to the "Influence" indicator, which also reflects overall impact based on the underlying citation network.
  • Popularity: Average. The "current" impact/attention (the "hype") of the article in the research community, based on the underlying citation network.
  • Influence: Average. The overall/total impact of the article, based on the underlying citation network (diachronically).
  • Impulse: Average. The initial momentum of the article directly after its publication, based on the underlying citation network.