This dataset contains pairs of news articles drawn from the first half of 2020 and annotated for seven aspects of similarity:

- GEO: How similar is the geographic focus (places, cities, countries, etc.) of the two articles?
- ENT: How similar are the named entities (e.g., people, companies, organizations, products, named living beings) appearing in the two articles, excluding the locations considered above?
- TIME: Are the two articles relevant to, or describing, similar time periods?
- NAR: How similar are the narrative schemas presented in the two articles?
- OVERALL: Overall, are the two articles covering the same substantive news story (excluding style, framing, and tone)?
- STYLE: Do the articles have similar writing styles?
- TONE: Do the articles have similar tones?

Further details are provided in Chen et al. (2022). SemEval-2022 Task 8: Multilingual news article similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). https://aclanthology.org/2022.semeval-1.155/

The data in this repository consists of pairs of URLs and their annotations. The text of the webpages is generally available via the Internet Archive in this special collection: https://archive.org/details/2020-multilingual-news-article-similarity . A script to download and process the webpages is available at https://github.com/euagendas/semeval_8_2022_ia_downloader .
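As a rough sketch of how annotated pairs like these might be processed, the snippet below parses a small mock annotation table and averages one aspect score across pairs. The column names, score values, and file layout here are hypothetical and do not necessarily match the released files; consult the repository data and the downloader script above for the actual format.

```python
import csv
import io

# Mock annotation data. Column names and the numeric scale are
# hypothetical stand-ins for the seven annotated aspects.
sample = """pair_id,url1,url2,geo,ent,time,nar,overall,style,tone
1,https://example.com/a,https://example.com/b,2.0,3.0,1.0,2.5,2.0,3.0,3.0
2,https://example.com/c,https://example.com/d,4.0,4.0,3.5,4.0,4.0,2.0,3.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
aspects = ["geo", "ent", "time", "nar", "overall", "style", "tone"]

# Mean score per aspect across all annotated pairs.
means = {a: sum(float(r[a]) for r in rows) / len(rows) for a in aspects}
print(means["overall"])  # → 3.0
```

For the real dataset, the same pattern applies after downloading the article texts with the linked `semeval_8_2022_ia_downloader` script and joining them to the annotation rows by URL.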
This research has received funding from the Volkswagen Foundation. We thank Media Cloud for access to increased API volume, and the Internet Archive, which made it possible for all participants to have access to the same data. We are deeply grateful to the annotators and task participants: thank you. The webpages annotated in this dataset are mostly available via the Internet Archive in this special collection: https://archive.org/details/2020-multilingual-news-article-similarity
References: Chen et al. (2022). SemEval-2022 Task 8: Multilingual news article similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022).
Keywords: semantic, news, multilingual, similarity, NLP, SemEval
| indicator | description | value |
| selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | The "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | The overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | The initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 58 |
| downloads | | 86 |

Views and downloads provided by UsageCounts