[llvm-dev] What was the IR made for precisely?

Tue Nov 1 04:31:05 PDT 2016

On 28 Oct 2016, at 21:25, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> ----- Original Message -----
>> From: "Chris Lattner via llvm-dev" <llvm-dev at lists.llvm.org>
>> To: "David Chisnall" <David.Chisnall at cl.cam.ac.uk>
>> Cc: llvm-dev at lists.llvm.org, "ジョウェットジェームス" <b3i4zz1gu1 at docomo.ne.jp>
>> Sent: Friday, October 28, 2016 2:13:06 PM
>> Subject: Re: [llvm-dev] What was the IR made for precisely?
>> 
>> 
>>> On Oct 28, 2016, at 1:21 AM, David Chisnall
>>> <David.Chisnall at cl.cam.ac.uk> wrote:
>>> 
>>> On 28 Oct 2016, at 02:43, ジョウェットジェームス <b3i4zz1gu1 at docomo.ne.jp>
>>> wrote:
>>>> 
>>>> I would need to sum up all the rules and ABIs and sizes for all
>>>> the targets I need and generate different IR for each, am I
>>>> correct?
>>> 
>>> This is a long-known limitation of LLVM IR and there are a lot of
>>> proposals to fix it.  It would be great if the LLVM Foundation
>>> would fund someone to do the work, as it isn’t a sufficiently high
>>> priority for any of the large LLVM consumers and would make a huge
>>> difference to the utility of LLVM for a lot of people.
>> …
>>> I think it would be difficult to do it within the timescale of the
>>> GSoC unless the student was already an experienced LLVM developer.
>>> It would likely involve designing some good APIs (difficult!),
>>> refactoring a bunch of Clang code, and creating a new LLVM
>>> library.  I’ve not seen a GSoC project on this scale succeed in
>>> any of the open source projects that I’ve been involved with.  If
>>> we had a good design doc and a couple of engaged mentors then it
>>> might stand a chance.
>> 
>> Is there a specific design that you think would work?  One of the
>> major problems with this sort of proposal is that you need the
>> entire clang type system to do this, which means it depends on a
>> huge chunk of the Clang AST.  At that point, this isn’t a small
>> library that clang uses, this is a library layered on top of Clang
>> itself.
> 
> Given that ABIs are defined in terms of C (and sometimes now C++) language constructs, I think that something like this is the best of all bad options. Really, however, it depends only on the AST and CodeGen, and maybe those (along with 'Basic', etc.) could be made into a separately-compilable library. Along with an easy ASTBuilder for C types and function declarations we should be able to satisfy this use case.

Indeed.  Today, I can pick up the MIPS, ARM, x86-64, or any other ABI specification, and it defines how each C type maps to an in-memory type and where arguments go.  We currently have no standard for how any of this is represented in IR, so I have to look at what clang generates if I want to generate C-compatible IR (and this is not stable over time: the contract between clang and the x86 back end has changed at least once that I remember).  The minimum that you need to be able to usefully interoperate with C is:

- The ability to map each of the C types (int, long, float, double, and so on) to the corresponding LLVM type.

- The ability to generate an LLVM struct that corresponds to a particular C struct (including loads from and stores to struct members).

- The ability to construct functions that have a C API signature and call functions that have such a signature.
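As a rough illustration of the first point, the following C sketch derives the LLVM scalar type for a few C arithmetic types from `sizeof` on whatever target it is compiled for. The names (`map_c_scalar`, `MAP_INT`, `print_scalar`) are hypothetical, and the sketch assumes IEEE `float`/`double` while ignoring `long double`, `_Bool`, and pointers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

enum LLVMKind { LLVM_INT, LLVM_FLOAT, LLVM_DOUBLE };

struct LLVMScalar {
    enum LLVMKind kind;
    unsigned bits; /* width in bits; for LLVM_INT this is N in iN */
};

/* Hypothetical helper: the LLVM scalar type a C arithmetic type lowers to
   on the host, derived purely from its size. Assumes IEEE float/double;
   long double, _Bool and pointers are out of scope. */
static struct LLVMScalar map_c_scalar(size_t size_bytes, int is_floating) {
    struct LLVMScalar t;
    t.bits = (unsigned)(size_bytes * CHAR_BIT);
    if (!is_floating) {
        t.kind = LLVM_INT; /* LLVM iN carries no signedness */
    } else {
        assert(t.bits == 32 || t.bits == 64);
        t.kind = (t.bits == 32) ? LLVM_FLOAT : LLVM_DOUBLE;
    }
    return t;
}

/* Print the mapping in IR syntax, e.g. "long -> i64". */
static void print_scalar(const char *c_name, struct LLVMScalar t) {
    if (t.kind == LLVM_INT)
        printf("%s -> i%u\n", c_name, t.bits);
    else
        printf("%s -> %s\n", c_name, t.kind == LLVM_FLOAT ? "float" : "double");
}

#define MAP_INT(T)   map_c_scalar(sizeof(T), 0)
#define MAP_FLOAT(T) map_c_scalar(sizeof(T), 1)
```

For example, `print_scalar("long", MAP_INT(long))` prints `long -> i64` on LP64 targets such as x86-64 Linux or macOS but `long -> i32` on 64-bit Windows (LLP64), which is why the same front end has to emit different IR per target.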

We’ve discussed possible APIs for this in the Cambridge LLVM Socials a couple of times.  I think that the best proposal was along the following lines:

A new CABIBuilder would handle constructing C ABI elements.  This would have the primitive C types as static fields and would allow you to construct a C struct type by passing C types (primitives or other structs, optionally with array sizes).  From this it would construct an LLVM struct and provide IRBuilder-like methods for constructing GEPs to specific fields (and probably loads from and stores to bitfields).
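The layout computation such a builder would perform for ordinary (non-bitfield, non-packed) structs is simple enough to sketch. The C code below is an illustration under stated assumptions, not Clang's implementation: `CField`, `layout_struct`, and the 16-field limit are made up for the example, and each field's size and alignment are assumed to come from the target's C ABI.

```c
#include <assert.h>
#include <stddef.h>

/* One field of a C struct as a hypothetical CABIBuilder might see it:
   element size and alignment, plus an array count (1 for scalars). */
struct CField {
    size_t size;
    size_t align;
    size_t count;
};

struct CLayout {
    size_t offsets[16]; /* byte offset of each field; limit is arbitrary */
    size_t size;        /* total size including tail padding */
    size_t align;       /* alignment of the struct itself */
};

static size_t align_to(size_t value, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (value + align - 1) & ~(align - 1);
}

/* Common C layout rule for non-bitfield structs: place each field at the
   next offset satisfying its alignment, then round the total size up to
   the largest field alignment. Bitfields and #pragma pack are ignored. */
static struct CLayout layout_struct(const struct CField *fields, size_t n) {
    struct CLayout l = { {0}, 0, 1 };
    size_t offset = 0;
    assert(n <= sizeof l.offsets / sizeof l.offsets[0]);
    for (size_t i = 0; i < n; i++) {
        offset = align_to(offset, fields[i].align);
        l.offsets[i] = offset;
        offset += fields[i].size * fields[i].count;
        if (fields[i].align > l.align)
            l.align = fields[i].align;
    }
    l.size = align_to(offset, l.align);
    return l;
}
```

For `struct { char c; double d; int a[3]; }` on x86-64 this gives offsets 0, 8 and 16 and a size of 32, which is the layout LLVM typically assigns to `{ i8, double, [3 x i32] }` under the x86-64 data layout. The builder would then only need to remember which LLVM element index corresponds to which C field in order to emit the GEPs.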

The same approach would be used for functions and calls.  Once you’ve built the CFunctionType from C structs and primitives for arguments, you would have an analogue of IRBuilder’s CreateCall / CreateInvoke that would take IR values of the types that correspond to the C types and marshal them correctly.
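Marshalling correctly is where most of the per-target knowledge lives. As one example, here is a heavily simplified sketch of the System V x86-64 classification for small structs whose members are naturally aligned integer/pointer or `float`/`double` scalars. It ignores `long double`, vectors, unions, bitfields, nested structs, and running out of argument registers, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum ArgClass { CLASS_NONE, CLASS_INTEGER, CLASS_SSE, CLASS_MEMORY };

/* A scalar member of a flat C struct: byte offset, size, and whether it
   is float/double (SSE class) or an integer/pointer (INTEGER class). */
struct ScalarField {
    size_t offset;
    size_t size;
    int is_floating;
};

/* Merge two classes sharing an eightbyte (subset of the psABI rules):
   NONE yields to the other class; mixing INTEGER and SSE gives INTEGER. */
static enum ArgClass merge(enum ArgClass a, enum ArgClass b) {
    if (a == b || b == CLASS_NONE) return a;
    if (a == CLASS_NONE) return b;
    return CLASS_INTEGER;
}

/* Classify a struct of total size `size` into at most two eightbytes.
   Returns the number of eightbytes written to `out`, or 0 with
   out[0] = CLASS_MEMORY when the struct must be passed in memory. */
static size_t classify_struct(const struct ScalarField *f, size_t n,
                              size_t size, enum ArgClass out[2]) {
    size_t words = (size + 7) / 8;
    out[0] = out[1] = CLASS_NONE;
    if (size > 16) { /* more than two eightbytes: passed in memory */
        out[0] = CLASS_MEMORY;
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        /* Natural alignment means a field never straddles an eightbyte. */
        size_t word = f[i].offset / 8;
        assert(word < words && (f[i].offset + f[i].size - 1) / 8 == word);
        out[word] = merge(out[word], f[i].is_floating ? CLASS_SSE : CLASS_INTEGER);
    }
    return words;
}
```

Under these rules `struct { double x, y; }` occupies two SSE registers, `struct { int a; float b; }` occupies one integer register, and a 24-byte `struct { long a, b, c; }` is passed in memory. Clang typically expresses these in IR as two `double` parameters, a single `i64` parameter, and a pointer parameter marked `byval`, respectively, and a CABIBuilder-style CreateCall would have to produce exactly those shapes.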

On the other side of the call (constructing a C ABI function by passing a set of C types to the builder), you’d get an LLVM Function that takes the arguments in whatever form the back end expects and then stores them into allocas, which would be handed back to the front end for use in the callee’s body, so the front-end author would never need to look at the explicit parameters.
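The callee side can be illustrated in plain C as well. In this sketch (again with hypothetical names, and assuming the x86-64 case where an 8-byte `struct { int; float; }` arrives as one 64-bit integer) the prologue copies the coerced parameter into a local struct, which plays the role of the alloca, so the body only ever touches named fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The C-level struct the front end cares about. */
struct Pair {
    int32_t a;
    float b;
};

/* The sketch relies on the struct fitting exactly in one eightbyte. */
static_assert(sizeof(struct Pair) == sizeof(uint64_t), "Pair must be 8 bytes");

/* Caller side of a hypothetical lowering: pack the struct into one 64-bit
   integer, mirroring the single INTEGER eightbyte x86-64 would use. */
static uint64_t pack_pair(struct Pair p) {
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);
    return bits;
}

/* Callee prologue: store the coerced parameter into a local (the C analogue
   of an alloca) so the body sees an ordinary struct, never the raw integer. */
static float callee_body(uint64_t coerced) {
    struct Pair local; /* plays the role of the alloca */
    memcpy(&local, &coerced, sizeof local);
    return (float)local.a + local.b; /* body uses named fields only */
}
```

A builder would emit the IR equivalent: an `alloca` of the struct type, a store of the incoming `i64` into it, and then hand that alloca back to the front end.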

You’d need a small subset of Clang’s AST for this (none of the stuff for builtins, nothing for C++ / Objective-C, and so on) and several parts of CodeGen (in particular, CGTargetInfo contains a bunch of stuff that really should be in LLVM, for example with respect to variadics).  It’s a big refactoring job, and a lot of it would probably end up duplicated in both clang and LLVM (though it should be easy to automate the testing).

Another alternative is to expose these APIs from Clang itself, so if you need them then you will have to link clang’s Basic, AST and CodeGen libraries (which together come to only about 10MB in a release build and could be dynamically linked if they’re used by multiple tools).  This approach would also make it easier to extend the interfaces to allow header parsing and C++ interop (which would be nice for using things like shared_ptr across FFI boundaries).

David