Powered by OpenAIRE graph
ZENODO
Dataset . 2022
License: CC BY
Data sources: Datacite
Versions: 3

TweetNERD - End to End Entity Linking Benchmark for Tweets

Authors: Mishra, Shubhanshu; Saini, Aman; Makki, Raheleh; Mehta, Sneha; Haghighi, Aria; Mollahosseini, Ali;

Abstract

This is the dataset described in the paper TweetNERD - End to End Entity Linking Benchmark for Tweets, accepted to the Datasets and Benchmarks Track of the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS).

Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets spanning 2010-2021, for benchmarking NERD systems on Tweets. It is the largest and most temporally diverse open-source benchmark dataset for NERD on Tweets and can be used to facilitate research in this area.

UPDATE: The new version adds ~125K Tweets, bringing the total dataset size to ~465K Tweets.

The TweetNERD dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. The license applies only to the data files in this dataset; see the Data usage policy below. More details are available at https://github.com/twitter-research/TweetNERD

Usage

The dataset is split across the following tab-separated files:
• OOD.public.tsv: the OOD split of the data described in the paper.
• Academic.public.tsv: the Academic split of the data described in the paper.
• part_*.public.tsv: the remaining data, split into parts in no particular order.

Each file has the following format:

| tweet_id | phrase | start | end | entityId | score |
|---|---|---|---|---|---|
| 22 | twttr | 20 | 25 | Q918 | 3 |
| 21 | twttr | 20 | 25 | Q918 | 3 |
| 1457198399032287235 | Diwali | 30 | 38 | Q10244 | 3 |
| 1232456079247736833 | NO_PHRASE | -1 | -1 | NO_ENTITY | -1 |

For Tweets without any entity, the phrase, start, end, entityId, and score columns are set to NO_PHRASE, -1, -1, NO_ENTITY, and -1, respectively.
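As a rough illustration of the file format above, the following Python sketch reads one of the `.public.tsv` files and maps each column's missing-value marker to `None`. It assumes the first line of each file is a header containing the column names shown above and that values are never quoted; `Annotation` and `parse_rows` are hypothetical names for this example, not part of any TweetNERD tooling.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Annotation:
    tweet_id: str             # kept as a string, matching the documented column type
    phrase: Optional[str]     # None when the file has NO_PHRASE
    start: Optional[int]      # None when the file has -1
    end: Optional[int]        # None when the file has -1
    entity_id: Optional[str]  # None when the file has NO_ENTITY
    score: Optional[int]      # None when the file has -1


def _opt(value: str, missing: str) -> Optional[str]:
    # Map a string column's missing-value marker to None.
    return None if value == missing else value


def _opt_int(value: str) -> Optional[int]:
    # Integer columns use -1 as their missing-value marker.
    return None if value == "-1" else int(value)


def parse_rows(lines: Iterable[str]) -> Iterator[Annotation]:
    """Yield one Annotation per data row (assumes a header row of column names)."""
    # QUOTE_NONE treats any quote characters in phrases as literal text (an assumption).
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield Annotation(
            tweet_id=row["tweet_id"],
            phrase=_opt(row["phrase"], "NO_PHRASE"),
            start=_opt_int(row["start"]),
            end=_opt_int(row["end"]),
            entity_id=_opt(row["entityId"], "NO_ENTITY"),
            score=_opt_int(row["score"]),
        )


if __name__ == "__main__":
    sample = (
        "tweet_id\tphrase\tstart\tend\tentityId\tscore\n"
        "22\ttwttr\t20\t25\tQ918\t3\n"
        "1232456079247736833\tNO_PHRASE\t-1\t-1\tNO_ENTITY\t-1\n"
    )
    for annotation in parse_rows(io.StringIO(sample)):
        print(annotation)
```

With a real file, pass `open(path, newline="", encoding="utf-8")` to `parse_rows`; the UTF-8 file encoding is an assumption, not something the dataset description states.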
Description of the file columns:

| Column | Type | Missing value | Description |
|---|---|---|---|
| tweet_id | string | | ID of the Tweet |
| phrase | string | NO_PHRASE | Entity phrase |
| start | int | -1 | Start offset of the phrase in the text, using UTF-16BE encoding |
| end | int | -1 | End offset of the phrase in the text, using UTF-16BE encoding |
| entityId | string | NO_ENTITY | Entity ID. If not missing, it is NOT FOUND, AMBIGUOUS, or a Wikidata ID of the form Q{numbers}, e.g. Q918 |
| score | int | -1 | Number of annotators who agreed on the phrase, start, end, and entityId |

To use the dataset, take the tweet_id column and retrieve the Tweet text using the Twitter API (see the Data usage policy section below).

Data stats

| Split | Number of rows | Number of unique Tweets |
|---|---|---|
| OOD | 34102 | 25000 |
| Academic | 51685 | 30119 |
| part_0 | 11830 | 10000 |
| part_1 | 35681 | 25799 |
| part_2 | 34256 | 25000 |
| part_3 | 36478 | 25000 |
| part_4 | 37518 | 24999 |
| part_5 | 36626 | 25000 |
| part_6 | 34001 | 24984 |
| part_7 | 34125 | 24981 |
| part_8 | 32556 | 25000 |
| part_9 | 32657 | 25000 |
| part_10 | 32442 | 25000 |
| part_11 | 32033 | 24972 |
| part_12 | 76559 | 25000 |
| part_13 | 67240 | 24920 |
| part_14 | 67745 | 25000 |
| part_15 | 67652 | 25000 |
| part_16 | 65739 | 25000 |
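Because start and end are UTF-16BE offsets, they cannot be used directly as Python string indices when a Tweet contains characters outside the Basic Multilingual Plane, such as most emoji, which occupy two UTF-16 code units but only one Python character. The sketch below assumes the offsets count UTF-16 code units and that end is exclusive (consistent with twttr spanning 20 to 25 in the format example); `phrase_from_offsets` is a hypothetical helper, not part of the dataset's tooling.

```python
def phrase_from_offsets(text: str, start: int, end: int) -> str:
    """Slice `text` using offsets measured in UTF-16 code units.

    Assumes `end` is exclusive. Each UTF-16 code unit is 2 bytes in
    UTF-16BE, and Python's "utf-16-be" codec emits no byte-order mark.
    """
    if start < 0 or end < start:
        # -1 marks a missing phrase in TweetNERD; there is nothing to slice.
        raise ValueError(f"invalid offsets: start={start}, end={end}")
    encoded = text.encode("utf-16-be")
    return encoded[2 * start : 2 * end].decode("utf-16-be")


if __name__ == "__main__":
    # Party-popper emoji: 1 Python character, but 2 UTF-16 code units.
    tweet = "\U0001F389 Happy Diwali"
    print(phrase_from_offsets(tweet, 9, 15))  # Diwali
    print(tweet[9:15])  # iwali (naive code-point slicing is off by one)
```

Encoding the whole string and slicing bytes keeps the helper short; for many lookups on the same Tweet, encoding once and reusing the bytes would avoid repeated work.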
Please cite the following if you use TweetNERD in your paper:

@dataset{TweetNERD_Zenodo_2022_6617192,
  author = {Mishra, Shubhanshu and Saini, Aman and Makki, Raheleh and Mehta, Sneha and Haghighi, Aria and Mollahosseini, Ali},
  title = {{TweetNERD - End to End Entity Linking Benchmark for Tweets}},
  month = jun,
  year = 2022,
  note = {{Data usage policy Use of this dataset is subject to you obtaining lawful access to the [Twitter API](https://developer.twitter.com/en/docs/twitter-api), which requires you to agree to the [Developer Terms Policies and Agreements](https://developer.twitter.com/en/developer-terms/).}},
  publisher = {Zenodo},
  version = {0.0.0},
  doi = {10.5281/zenodo.6617192},
  url = {https://doi.org/10.5281/zenodo.6617192}
}

@inproceedings{TweetNERDNeurips2022,
  author = {Mishra, Shubhanshu and Saini, Aman and Makki, Raheleh and Mehta, Sneha and Haghighi, Aria and Mollahosseini, Ali},
  booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
  pages = {},
  title = {TweetNERD - End to End Entity Linking Benchmark for Tweets},
  volume = {2},
  year = {2022},
  eprint = {arXiv:2210.08129},
  doi = {10.48550/arXiv.2210.08129}
}

Data usage policy

Use of this dataset is subject to you obtaining lawful access to the [Twitter API](https://developer.twitter.com/en/docs/twitter-api), which requires you to agree to the [Developer Terms Policies and Agreements](https://developer.twitter.com/en/developer-terms/).

Keywords

Twitter, Social Media, Tweet, Entity Linking, Named Entity Recognition, Wikidata

EOSC Subjects

Twitter Data

  • BIP! impact: 0 selected citations; popularity: Average; influence: Average; impulse: Average
  • OpenAIRE UsageCounts: 114 views, 112 downloads