research data . Dataset . 2017

Webis-TLDR-17 Corpus

Syed, Shahbaz; Voelske, Michael; Potthast, Martin; Stein, Benno
Open Access English
  • Published: 07 Nov 2017
  • Publisher: Zenodo
Abstract
This corpus contains preprocessed posts from the Reddit dataset, suitable for abstractive summarization using deep learning. The format is a JSON file where each line is a JSON object representing a post. The schema of each post is shown below:
  • author: string (nullable = true)
  • body: string (nullable = true)
  • normalizedBody: string (nullable = true)
  • content: string (nullable = true)
  • content_len: long (nullable = true)
  • summary: string (nullable = true)
  • summary_len: long (nullable = true)
  • id: string (nullable = true)
  • subreddit: string (nullable = true)
  • subreddit_id: string (nullable = true)
  • title: string (nullable = true)
Specifically, the content and summary fiel...
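Because each line of the file is a standalone JSON object, the corpus can be streamed one post at a time instead of being loaded whole. The following is a minimal sketch of that approach, not code shipped with the corpus: the function names `iter_posts` and `content_summary_pairs` are hypothetical, the file path is whatever the downloaded file is called locally, and only the `content` and `summary` field names are taken from the schema above. Since every field is nullable, posts that lack either value are skipped.

```python
import json
from typing import Dict, Iterator, Tuple


def iter_posts(path: str) -> Iterator[Dict]:
    """Yield one post (a dict) per non-empty line of a line-delimited JSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def content_summary_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (content, summary) pairs, skipping posts where either is null or empty."""
    for post in iter_posts(path):
        content = post.get("content")
        summary = post.get("summary")
        if content and summary:
            yield content, summary
```

Streaming line by line keeps memory use flat regardless of file size, which matters for a corpus intended for deep learning pipelines.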
Subjects
free text keywords: tl;dr, Abstractive Summarization, Social Media Dataset
Download from
Open Access
Zenodo
Dataset . 2017
Provider: Datacite
Open Access
Zenodo
Dataset . 2017
Provider: Zenodo