C, Lambda Calculus and Compiler Verification - a study in Haskell of purely-functional techniques for a formal specification of imperative programming languages and an epistemically-sound verification of their compilers

Formal verification of a compiler is a long-standing problem in computer science and, although recent years have seen substantial achievements in the area, most of the proposed solutions do not scale very well with the complexity of modern software development environments. In this thesis, I present a formal semantic model of the popular C programming language described in the ANSI/ISO/IEC Standard 9899:1990, in the form of a mapping of C programs to computable functions expressed in a suitable variant of lambda calculus. The specification is formulated in a highly readable functional style and is accompanied by a complete Haskell implementation of the compiler, covering all aspects of the translation from a parse tree of a C program down to the actual sequence of executable machine instructions, resolving issues of separate compilation, allowing for optimising program transformations and providing rigorous guarantees of the implementation's conformance to a formal definition of the language. In order to achieve these goals, I revisit the challenge of compiler verification from its very philosophical foundations. Beginning with the basic epistemic notions of knowledge, correctness and proof, I show that a fully rigorous solution of the problem requires a constructive formulation of the correctness criteria in terms of the translation process itself, in contrast with the more popular extensional approaches to compiler verification, in which correctness is generally defined as commutativity of the system with respect to a formal semantics of the source and target languages, effectively formalising various aspects of the compiler independently of each other and separately from its eventual implementation. I argue that a satisfactory judgement of correctness should always constitute a direct formal description of the job performed by the software being judged, instead of an axiomatic definition of some abstract property such as commutativity of the translation system, since the later approach fails to establish a crucial causal connection between a judgement of correctness and a knowledge of it. The primary contribution of this thesis is the new notion of linear correctness, which strives to provide a constructive formulation of a compiler's validity criteria by deriving its judgement directly from a formal description of the language translation process itself. The approach relies crucially on a denotational semantic model of the source and target languages, in which the domains of program meanings are unified with the actual intermediate program representation of the underlying compiler implementation. By defining the concepts of a programming language, compiler and compiler correctness in category theoretic terms, I show that every linearly correct compiler is also valid in the more traditional sense of the word. Further, by presenting a complete verified translation of the standard C programming language within this framework, I demonstrate that linear correctness is a highly effective approach to the problem of compiler verification and that it scales particularly well with the complexity of modern software development environments.

Country

Australia

Related Organizations

UNSW Sydney
Australia

Keywords

Purely functional intermediate representations, C, Haskell, Language semantics, Compiler verification, Programming languages, ANSI/ISO/IEC 9899:1990, 004, Lambda calculus

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average

Green

Related to Research communities

north-american-studies