Scalable Micro-planned Generation of Discourse from Structured Data

Publication: Article, Preprint
01 Jan 2020; Embargo end date: 01 Jan 2018
English
Publisher: MIT Press - Journals
Journal: Computational Linguistics, volume 45, pages 737-763 (ISSN: 0891-2017, eISSN: 1530-9312)

Authors: Laha, Anirban; Jain, Parag; Mishra, Abhijit; Sankaranarayanan, Karthik

doi: 10.1162/coli_a_00363 , 10.48550/arxiv.1810.02889

arXiv: 1810.02889

Abstract

We present a framework for generating natural language descriptions from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-stage pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps.
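
To make the three stages in the abstract concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the paper's modules are learned from monolingual corpora, whereas this sketch uses a fixed possessive template. The names `Triple`, `canonicalize`, `simple_sentence`, and `build_paragraph`, the choice of triples as the canonical form, and the example row are all assumptions made for illustration. For simplicity, stage (iii) here works directly on the triples rather than on the generated sentence strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical canonical form: one atomic (subject, relation, object) entry."""
    subject: str
    relation: str
    obj: str


def canonicalize(row: dict[str, str], key_column: str) -> list[Triple]:
    """Stage (i): turn one table row into atomic triples keyed on one column."""
    subject = row[key_column]
    return [Triple(subject, col, val) for col, val in row.items() if col != key_column]


def simple_sentence(t: Triple) -> str:
    """Stage (ii): one simple sentence per triple (fixed template, an assumption)."""
    return f"{t.subject}'s {t.relation} is {t.obj}."


def build_paragraph(triples: list[Triple], pronoun: str = "its", per_sentence: int = 2) -> str:
    """Stage (iii): compound clauses with 'and' and replace repeated subjects.

    Assumes every triple shares the same subject.
    """
    sentences = []
    for start in range(0, len(triples), per_sentence):
        clauses = []
        for offset, t in enumerate(triples[start:start + per_sentence]):
            # Only the very first mention keeps the full subject name.
            owner = f"{t.subject}'s" if start + offset == 0 else pronoun
            clauses.append(f"{owner} {t.relation} is {t.obj}")
        text = " and ".join(clauses) + "."
        sentences.append(text[0].upper() + text[1:])
    return " ".join(sentences)


# Made-up example row; "Acme" and its attributes are illustrative only.
row = {"company": "Acme", "founding year": "1990", "sector": "retail", "headquarters": "Springfield"}
triples = canonicalize(row, key_column="company")
simple = [simple_sentence(t) for t in triples]
paragraph = build_paragraph(triples)
print(paragraph)
# Acme's founding year is 1990 and its sector is retail. Its headquarters is Springfield.
```

Keeping each stage as a separate function mirrors the modularity the abstract emphasizes: any one stage could be swapped for a learned component without touching the others.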

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Computational linguistics. Natural language processing, P98-98.5, Computation and Language (cs.CL)

Impact by BIP!

selected citations: 9 (derived from selected sources; an alternative to the "Influence" indicator)
popularity: Top 10% (the "current" impact or attention of the article in the research community at large, based on the underlying citation network)
influence: Average (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
impulse: Top 10% (the initial momentum of the article directly after its publication, based on the underlying citation network)