RepLab Summarization Dataset

This package contains the dataset generated in the research published in the paper:

"Javier Rodríguez-Vidal, Jorge Carrillo-de-Albornoz, Enrique Amigó, Laura Plaza, Julio Gonzalo and Felisa Verdejo. 2019. Automatic Generation of Entity-Oriented Summaries for Reputation Management. Ambient Intelligence & Humanized Computing."

The dataset is available for research purposes. If you use it, please cite us.

This README file contains:
1) A brief description of the corpus
2) A description of the contents of each directory in this package

1. Description of RepLab Summarization Dataset

The RepLab summarization dataset contains company data from the RepLab 2013 dataset (http://nlp.uned.es/replab2013/), in which Twitter users discuss different topics related to the companies. Each topic consists of a different number of tweets posted by Twitter users. The collection comprises tweets about 31 entities from two domains: automotive and banking. As a result, our subset of RepLab 2013 comprises 71,303 English and Spanish tweets. For each entity, tweets are grouped in topics, and for each topic three different summaries are manually generated: abstractive English, abstractive Spanish and extractive. Please see the paper for further details.

2. Description of the contents of this package

./entities: This directory includes the information about each organization that is needed to create a summary. Each .xml file corresponds to an entity and includes the following information:

-"Corpus entity": Id of the entity.
-"cluster": Each one of the topics of the entity.
-"label": Name of the topic.
-"priority": Level of relevance of the topic: Alert (the highest priority, a reputation alert, i.e., an issue that requires an immediate response from the entity), Midly_important (an intermediate priority, relevant for the entity) or unimportant (the lowest priority).
-"tweet": Information about the tweets.
    -"id": Id of the tweet.
    -"date": When the tweet was written.
    -"followers": Number of followers of the author of the tweet.
    -"polarity": Polarity of the tweet.
    -"text": Text of the tweet.
-"summary": Information about the summary:
    -"abstract_EN": Abstractive summary in English.
    -"abstract_ES": Abstractive summary in Spanish.
    -"tweet": Id of the tweet(s) selected for the extractive summary (if this field is empty, the extractive summary is one of the tweets in the topic).
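The field list above can be read with a few lines of Python. The following is only a minimal sketch under stated assumptions: this README does not say which fields are XML elements and which are attributes, so the layout in SAMPLE (a "Corpus" root with an "entity" attribute, "cluster" elements with "label" and "priority" attributes, and "tweet" elements with the tweet text as element content) is hypothetical, as are the sample values and the function name parse_entity. Check a real file in ./entities and adjust the lookups before using it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The real files in ./entities may name or nest
# these fields differently; every value below is invented.
SAMPLE = """<Corpus entity="E001">
  <cluster label="recall" priority="Alert">
    <tweet id="1" date="2012-06-01" followers="120" polarity="NEGATIVE">Brand X recalls cars</tweet>
    <tweet id="2" date="2012-06-02" followers="15" polarity="NEUTRAL">More on the Brand X recall</tweet>
    <summary>
      <abstract_EN>Brand X recalls cars.</abstract_EN>
      <abstract_ES>La marca X retira coches.</abstract_ES>
      <tweet>1</tweet>
    </summary>
  </cluster>
</Corpus>"""


def parse_entity(xml_text):
    """Return the entity id and its topics as plain Python structures."""
    root = ET.fromstring(xml_text)
    topics = []
    for cluster in root.iter("cluster"):
        # Direct children only, so the <tweet> ids inside <summary> are excluded.
        tweets = {
            t.get("id"): (t.text or "").strip()
            for t in cluster.findall("tweet")
        }
        summary = cluster.find("summary")
        abstract_en = abstract_es = ""
        extractive_ids = []
        if summary is not None:
            abstract_en = summary.findtext("abstract_EN", default="").strip()
            abstract_es = summary.findtext("abstract_ES", default="").strip()
            # Ids of the tweets selected for the extractive summary; may be empty.
            extractive_ids = [
                (t.text or "").strip()
                for t in summary.findall("tweet")
                if (t.text or "").strip()
            ]
        topics.append({
            "label": cluster.get("label"),
            "priority": cluster.get("priority"),
            "tweets": tweets,
            "abstract_EN": abstract_en,
            "abstract_ES": abstract_es,
            "extractive_ids": extractive_ids,
        })
    return {"entity": root.get("entity"), "topics": topics}


if __name__ == "__main__":
    entity = parse_entity(SAMPLE)
    for topic in entity["topics"]:
        print(topic["label"], topic["priority"], topic["extractive_ids"])
```

To read a file from disk rather than a string, ET.parse(path).getroot() can replace ET.fromstring(xml_text). The sketch leaves the extractive id list empty when the field is not filled, since the fallback in that case is described only briefly above.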

Related Organizations

Universidad Nacional de Costa Rica
Costa Rica

Keywords

Summarization, Twitter, Microblogs, Online Reputation Management, Search with diversity

EOSC Subjects

Twitter Data
