[LLVMdev] Argument Lowering Redux

Wed Aug 6 01:05:04 PDT 2014

On 4 Aug 2014, at 20:27, Vadim Chugunov <vadimcn at gmail.com> wrote:

> Couldn't LLVM provide an early IR transform pass that lowers "high-level" argument definitions into the current target-dependent form, converting by-value structs into sret arguments as needed?   It seems to me that, at least for structs, all information that such a pass would require, is representable in the current LLVM IR.  Of course, under this proposal, unions would need to be re-introduced into IR in some form (perhaps as structs tagged with a "union" flag?).  However, if they are immediately lowered into structs, the rest of optimization pipeline would not need to change.

This is the approach taken by WHIRL (which is well worth studying by anyone looking at how to design a compiler IR - it's not perfect, but does have some nice ideas).  The front end for Pro64-derived compilers generates a very high-level representation, containing C types and C-like flow control structures (and some things like multiple entry points for Fortran).  This is then progressively lowered towards something that looks more like assembly.  Different optimisations are run at different layers.
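As a rough illustration of a single lowering step of the kind the quoted message describes (by-value struct to sret), here is the transformation written out by hand in C.  This is only a sketch: the names struct big, make_big and make_big_lowered are made up for the example, and whether a given struct is actually returned through memory depends on the target ABI.

```c
/* Hypothetical aggregate; large enough that many ABIs return it in memory. */
struct big { long a, b, c, d; };

/* High-level form: the front end simply returns the struct by value. */
struct big make_big(long seed) {
    struct big v = { seed, seed + 1, seed + 2, seed + 3 };
    return v;
}

/* Hand-lowered form: the caller owns the storage and passes its address
 * as an explicit first argument, which is roughly what an sret parameter
 * expresses in LLVM IR. */
void make_big_lowered(struct big *result, long seed) {
    result->a = seed;
    result->b = seed + 1;
    result->c = seed + 2;
    result->d = seed + 3;
}

/* Helper used by both callers below. */
static long sum_fields(const struct big *v) {
    return v->a + v->b + v->c + v->d;
}

/* Caller of the high-level form. */
long sum_by_value(long seed) {
    struct big v = make_big(seed);
    return sum_fields(&v);
}

/* Caller of the lowered form: it must allocate the result slot itself. */
long sum_lowered(long seed) {
    struct big v;
    make_big_lowered(&v, seed);
    return sum_fields(&v);
}
```

Both callers compute the same value; the difference is only in who allocates the result and how it is passed, which is exactly the part that is target-dependent.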

There are a few problems with adopting this for LLVM without significant changes:

- The LLVM IR type system is not rich enough to express unions or the difference between _Complex float and struct {float i,r;} in C (for example).

- The IR modification infrastructure is not set up to make it easy to change the type of a value.  Changing the type signature of a function requires creating a new function and then copying all of the instructions into it.  This would be very expensive.
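To make the first point concrete, here is a small C11 sketch.  The names float_pair, int_or_float and KIND are hypothetical, chosen for the example.  It shows that _Complex float and a two-float struct have the same size and bytes yet remain distinct C types (which an ABI is free to pass differently), and that union members share storage, which a plain struct type cannot express.

```c
#include <assert.h>   /* static_assert (C11) */
#include <complex.h>  /* I */
#include <string.h>   /* memcmp */

/* Hypothetical types, for illustration only. */
struct float_pair { float r, i; };
union int_or_float { int i; float f; };

/* Same size as far as layout is concerned... */
static_assert(sizeof(float _Complex) == sizeof(struct float_pair),
              "complex float and the two-float struct have the same size");

/* ...but the C type system still tells them apart. */
#define KIND(x) _Generic((x),   \
    float _Complex:    1,       \
    struct float_pair: 2,       \
    default:           0)

int kind_of_complex(void) {
    float _Complex z = 0.0f;
    return KIND(z);
}

int kind_of_pair(void) {
    struct float_pair p = { 0.0f, 0.0f };
    return KIND(p);
}

/* Returns 1 when both objects hold identical bytes (real part first). */
int same_bytes(void) {
    float _Complex z = 1.0f + 2.0f * I;
    struct float_pair p = { 1.0f, 2.0f };
    return memcmp(&z, &p, sizeof z) == 0;
}

/* Returns 1 when both union members start at the same address. */
int members_overlap(void) {
    union int_or_float u;
    return (void *)&u.i == (void *)&u.f;
}
```

Once both types have been reduced to "two floats" in the IR, the distinction that KIND sees is gone, so a pass running on the IR has nothing left from which to recover the right calling convention.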

The LLVM model has a notion of canonical forms, which are fragile, undocumented, implicit contracts between producers and consumers of IR.  These serve roughly the same purpose as the layers in WHIRL.  They specify (in the loosest possible sense of the word) the IR that the back end expects to correspond to particular C types and, unfortunately, knowledge of this is very leaky and ends up having to permeate the entire optimisation stack.

David