An automated benchmark dataset for Named Entity Recognition (NER) and Named Entity Linking (NEL) tools, based on Greek Wikipedia events pages.

Note: This dataset includes data from the following sources:
- Wikipedia (el.wikipedia.org)

Description

The dataset is provided as three JSON-formatted subsets (train, validation and test) in a 70-20-10 split. The current version of the dataset contains 18,617 events annotated with 40,798 entity mentions and 36,189 links to elWikipedia (and Wikidata IDs). The annotations belong to 8 entity types: person, organization, location, GPE, event, facility, product and work of art.

Overall dataset statistics

| Split | Docs | Tokens | Sentences | Surface Mentions | Valid Links | Red Links |
| --- | --- | --- | --- | --- | --- | --- |
| Train | 13,031 | 332,077 | 16,927 | 28,593 | 25,365 | 3,228 |
| Validation | 3,722 | 94,746 | 4,844 | 8,168 | 7,240 | 928 |
| Test | 1,862 | 47,450 | 2,427 | 4,037 | 3,584 | 453 |
| Total | 18,617 | 474,361 | 24,200 | 40,798 | 36,189 | 4,609 |

Example

A record example is given below.

    {
      "json_file": "February 2012_39_0 events",
      "text": "Sudan and South Sudan sign non-aggression pact.",
      "ground_truth_mentions": [
        {"start": 0, "end": 4, "surface_mention": "Sudan", "mention_type": "GPE"},
        {"start": 10, "end": 20, "surface_mention": "South Sudan", "mention_type": "GPE"}
      ],
      "ground_truth_links": [
        {"enwiki": "Sudan", "wikidata": "Q1049"},
        {"enwiki": "South_Sudan", "wikidata": "Q958"}
      ]
    }

Code

https://gitlab.isl.ics.forth.gr/debatelab/elwiki_events_benchmark

Acknowledgments

This work has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No 4195.
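For readers who want to consume the records, the following is a minimal Python sketch of how the mention offsets in the example record from the description might be used. It assumes that "end" is an inclusive character offset, which is what the example suggests ("Sudan" spans 0 to 4) but which the description does not state explicitly. The helper name `extract_mentions` is hypothetical and is not taken from the linked repository.

```python
import json

# Record copied from the example in the dataset description.
RECORD_JSON = """
{
  "json_file": "February 2012_39_0 events",
  "text": "Sudan and South Sudan sign non-aggression pact.",
  "ground_truth_mentions": [
    {"start": 0, "end": 4, "surface_mention": "Sudan", "mention_type": "GPE"},
    {"start": 10, "end": 20, "surface_mention": "South Sudan", "mention_type": "GPE"}
  ],
  "ground_truth_links": [
    {"enwiki": "Sudan", "wikidata": "Q1049"},
    {"enwiki": "South_Sudan", "wikidata": "Q958"}
  ]
}
"""


def extract_mentions(record):
    """Return (span_text, mention_type, offsets_match) for each mention.

    Assumption: "end" is an inclusive character offset, as the example
    suggests, so the slice used here is text[start:end + 1].
    """
    text = record["text"]
    results = []
    for mention in record["ground_truth_mentions"]:
        span = text[mention["start"]:mention["end"] + 1]
        matches = span == mention["surface_mention"]
        results.append((span, mention["mention_type"], matches))
    return results


record = json.loads(RECORD_JSON)
mentions = extract_mentions(record)
for span, mention_type, matches in mentions:
    print(f"{span!r} [{mention_type}] offsets match surface mention: {matches}")
```

Comparing each sliced span with its "surface_mention" field is a cheap sanity check when loading the subsets. If the check fails on real records, the inclusive-offset assumption should be revisited.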
Benchmarking, Named Entity Linking, Named Entity Recognition
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |

| Usage | Count |
| --- | --- |
| views | 30 |
| downloads | 1 |

Views provided by UsageCounts
Downloads provided by UsageCounts