publication . Preprint . 2020

Equivalence of Dataflow Graphs via Rewrite Rules Using a Graph-to-Sequence Neural Model

Kommrusch, Steve; Barollet, Théo; Pouchet, Louis-Noël;
Open Access English
  • Published: 17 Feb 2020
Abstract
In this work we target the problem of provably computing the equivalence between two programs represented as dataflow graphs. To this end, we formalize the problem of equivalence between two programs as finding a set of semantics-preserving rewrite rules from one into the other, such that after the rewrite the two programs are structurally identical, and therefore trivially equivalent. We then develop the first graph-to-sequence neural network system for program equivalence, trained to produce such rewrite sequences from a carefully crafted automatic example generation algorithm. We extensively evaluate our system on a rich multi-type linear algebra expression l...
Subjects
arXiv: Computer Science::Programming Languages
free text keywords: Computer Science - Machine Learning, Computer Science - Formal Languages and Automata Theory, Statistics - Machine Learning
Download from
54 references, page 1 of 4

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jefrey Dean, Matthieu Devin, et al. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).

[2] Umair Z Ahmed, Pawan Kumar, Amey Karkare, Purushottam Kar, and Sumit Gulwani. 2018. Compilation error repair: for the student programs, from the student programs. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering Education and Training. ACM, 78-87.

[3] Christophe Alias and Denis Barthou. 2004. On the recognition of algorithm templates. Electronic Notes in Theoretical Computer Science 82, 2 (2004), 395-409.

[4] Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. ACM Comput. Surv. 51, 4, Article 81 (July 2018), 37 pages. https://doi.org/10.1145/3212695 [OpenAIRE]

[5] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https://openreview.net/forum?id=BJOFETxR-

[6] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. Code2Vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 3, POPL, Article 40 (Jan. 2019), 29 pages. https: //doi.org/10.1145/3290353 [OpenAIRE]

[7] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

[8] Wenlei Bao, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, and Ponnuswamy Sadayappan. 2016. Polycheck: Dynamic verification of iteration space transformations on afine programs. In ACM SIGPLAN Notices, Vol. 51. ACM, 539-554.

[9] Denis Barthou, Paul Feautrier, and Xavier Redon. 2002. On the equivalence of two systems of afine recurrence equations. In Euro-Par 2002 Parallel Processing.

[10] Rohan Bavishi, Michael Pradel, and Koushik Sen. 2017. Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts. http://tubiblio.ulb.tu-darmstadt.de/101419/

[11] Daniel Beck, Gholamreza Hafari, and Trevor Cohn. 2018. Graph-toSequence Learning using Gated Graph Neural Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 273-283. http://aclweb.org/anthology/P18-1026

[12] Yves Bertot and Pierre Castéran. 2013. Interactive theorem proving and program development: CoqâĂŹArt: the calculus of inductive constructions. Springer Science & Business Media. [OpenAIRE]

[13] Pavol Bielik, Veselin Raychev, and Martin Vechev. 2016. PHOG: Probabilistic Model for Code. In Proceedings of The 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research), Maria Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. PMLR, New York, New York, USA, 2933-2942. http://proceedings.mlr.press/ v48/bielik16.pdf [OpenAIRE]

[14] Joseph Tobin Buck and Edward A Lee. 1993. Scheduling dynamic dataflow graphs with bounded memory using the token flow model. In 1993 IEEE international conference on acoustics, speech, and signal processing, Vol. 1. IEEE, 429-432.

[15] Jerry R Burch, Edmund M Clarke, Kenneth L McMillan, David L Dill, and Lain-Jinn Hwang. 1992. Symbolic model checking: 1020 states and beyond. Information and computation 98, 2 (1992), 142-170.

54 references, page 1 of 4
Abstract
In this work we target the problem of provably computing the equivalence between two programs represented as dataflow graphs. To this end, we formalize the problem of equivalence between two programs as finding a set of semantics-preserving rewrite rules from one into the other, such that after the rewrite the two programs are structurally identical, and therefore trivially equivalent. We then develop the first graph-to-sequence neural network system for program equivalence, trained to produce such rewrite sequences from a carefully crafted automatic example generation algorithm. We extensively evaluate our system on a rich multi-type linear algebra expression l...
Subjects
arXiv: Computer Science::Programming Languages
free text keywords: Computer Science - Machine Learning, Computer Science - Formal Languages and Automata Theory, Statistics - Machine Learning
Download from
54 references, page 1 of 4

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jefrey Dean, Matthieu Devin, et al. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).

[2] Umair Z Ahmed, Pawan Kumar, Amey Karkare, Purushottam Kar, and Sumit Gulwani. 2018. Compilation error repair: for the student programs, from the student programs. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering Education and Training. ACM, 78-87.

[3] Christophe Alias and Denis Barthou. 2004. On the recognition of algorithm templates. Electronic Notes in Theoretical Computer Science 82, 2 (2004), 395-409.

[4] Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. ACM Comput. Surv. 51, 4, Article 81 (July 2018), 37 pages. https://doi.org/10.1145/3212695 [OpenAIRE]

[5] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https://openreview.net/forum?id=BJOFETxR-

[6] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. Code2Vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 3, POPL, Article 40 (Jan. 2019), 29 pages. https: //doi.org/10.1145/3290353 [OpenAIRE]

[7] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

[8] Wenlei Bao, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, and Ponnuswamy Sadayappan. 2016. Polycheck: Dynamic verification of iteration space transformations on afine programs. In ACM SIGPLAN Notices, Vol. 51. ACM, 539-554.

[9] Denis Barthou, Paul Feautrier, and Xavier Redon. 2002. On the equivalence of two systems of afine recurrence equations. In Euro-Par 2002 Parallel Processing.

[10] Rohan Bavishi, Michael Pradel, and Koushik Sen. 2017. Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts. http://tubiblio.ulb.tu-darmstadt.de/101419/

[11] Daniel Beck, Gholamreza Hafari, and Trevor Cohn. 2018. Graph-toSequence Learning using Gated Graph Neural Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 273-283. http://aclweb.org/anthology/P18-1026

[12] Yves Bertot and Pierre Castéran. 2013. Interactive theorem proving and program development: CoqâĂŹArt: the calculus of inductive constructions. Springer Science & Business Media. [OpenAIRE]

[13] Pavol Bielik, Veselin Raychev, and Martin Vechev. 2016. PHOG: Probabilistic Model for Code. In Proceedings of The 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research), Maria Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. PMLR, New York, New York, USA, 2933-2942. http://proceedings.mlr.press/ v48/bielik16.pdf [OpenAIRE]

[14] Joseph Tobin Buck and Edward A Lee. 1993. Scheduling dynamic dataflow graphs with bounded memory using the token flow model. In 1993 IEEE international conference on acoustics, speech, and signal processing, Vol. 1. IEEE, 429-432.

[15] Jerry R Burch, Edmund M Clarke, Kenneth L McMillan, David L Dill, and Lain-Jinn Hwang. 1992. Symbolic model checking: 1020 states and beyond. Information and computation 98, 2 (1992), 142-170.

54 references, page 1 of 4
Powered by OpenAIRE Research Graph
Any information missing or wrong?Report an Issue