Input Repair via Synthesis and Lightweight Error Feedback

Publication: Article, Preprint. Date: 01 Jan 2022. Embargo end date: 01 Jan 2022. Publisher: arXiv. Journal: CoRR, volume abs/2208.08235

Authors: Lukas Kirschner; Ezekiel O. Soremekun; Rahul Gopinath; Andreas Zeller

doi: 10.48550/arxiv.2208.08235

arXiv: 2208.08235


Abstract

Often, input data ostensibly conforms to a given input format but cannot be parsed by a conforming program, for instance, due to human error or data corruption. In such cases, a data engineer is tasked with input repair, i.e., she has to manually repair the corrupt data such that it follows the given format and hence can be processed by the conforming program. Such manual repair can be time-consuming and error-prone. In particular, input repair is challenging without an input specification (e.g., an input grammar) or program analysis. In this work, we show that incorporating lightweight failure feedback (e.g., input incompleteness) into parsers is sufficient to repair any corrupt input data with maximal closeness to the semantics of the input data. We propose an approach (called FSYNTH) that leverages lightweight error feedback and input synthesis to repair invalid inputs. FSYNTH is grammar-agnostic, and it does not require program analysis. Given a conforming program and any invalid input, FSYNTH provides a set of repairs prioritized by the distance of the repair from the original input. We evaluate FSYNTH on 806 (real-world) invalid inputs using four well-known input formats, namely INI, TinyC, SExp, and cJSON. In our evaluation, we found that FSYNTH recovers 91% of valid input data. FSYNTH is also highly effective and efficient in input repair: it repairs 77% of invalid inputs within four minutes. It is up to 35% more effective than DDMax, the previously best-known approach. Overall, our approach addresses several limitations of DDMax, both in terms of what it can repair and in terms of the set of repairs offered.
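The idea in the abstract, a parser that reports lightweight feedback plus a search that synthesizes repairs ordered by distance from the original input, can be illustrated with a small sketch. This is not the FSYNTH implementation: the toy bracket language, the feedback labels `valid`, `incomplete`, and `incorrect`, and the names `check` and `repairs` are all assumptions made for illustration, and the distance used here is a plain count of inserted and deleted characters.

```python
import heapq

# Hypothetical toy format: balanced round and square brackets.
PAIRS = {"(": ")", "[": "]"}
ALPHABET = "()[]"


def check(text):
    """Toy 'conforming program' that only reports lightweight feedback:
    'valid', 'incomplete' (a viable prefix), or 'incorrect'."""
    stack = []
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif not stack or stack.pop() != ch:
            return "incorrect"
    return "incomplete" if stack else "valid"


def repairs(broken, max_cost=3):
    """Yield (cost, repaired_text) pairs in order of increasing cost.

    Cost counts inserted plus deleted characters. The search never looks
    inside the parser; it only asks check() whether a prefix is viable.
    """
    # State: (cost so far, negated input position, repaired prefix).
    queue = [(0, 0, "")]
    seen = set()
    while queue:
        cost, neg_pos, prefix = heapq.heappop(queue)
        pos = -neg_pos
        if (pos, prefix) in seen:
            continue
        seen.add((pos, prefix))
        if pos == len(broken) and check(prefix) == "valid":
            yield cost, prefix
            continue
        if pos < len(broken):
            nxt = broken[pos]
            # Keep the next input character if the prefix stays viable.
            if check(prefix + nxt) != "incorrect":
                heapq.heappush(queue, (cost, -(pos + 1), prefix + nxt))
            # Or delete it, at a cost of one edit.
            if cost + 1 <= max_cost:
                heapq.heappush(queue, (cost + 1, -(pos + 1), prefix))
        # Or synthesize a character that keeps the prefix viable.
        if cost + 1 <= max_cost:
            for ch in ALPHABET:
                if check(prefix + ch) != "incorrect":
                    heapq.heappush(queue, (cost + 1, -pos, prefix + ch))


if __name__ == "__main__":
    for cost, text in repairs("([)", max_cost=2):
        print(cost, repr(text))
```

Because candidates are expanded cheapest first, the repairs come out ordered by their edit cost, which mirrors the prioritization described in the abstract. The `max_cost` bound is a design choice of this sketch that keeps the search finite.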

Related Organizations

University of Luxembourg
Luxembourg
Saarland University
Germany
CISPA Helmholtz Center for Information Security
Germany
University of Sydney
Australia

Keywords

Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Programming Languages, D.2, Programming Languages (cs.PL)

3 Research products

Comparison of measured and computed portal dose for IMRT treatment (2006, IsAmongTopNSimilarDocuments)
Debugging inputs (2020, IsAmongTopNSimilarDocuments)
fsynth-artifact software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.