[LLVMdev] Some additions to the C bindings

Fri Oct 9 16:56:42 PDT 2009

On Thu, Oct 8, 2009 at 5:20 AM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
>
> Thanks.  Let me start by talking a bit about my project.
>
> I'm working on a compiler/language that supports run-time code
> generation and compile-time code execution.  Besides the obvious
> benefits of easier JITting, I also get the benefits of C++ templates
> and metaprogramming without all of the headaches.
>
> To make this work, the compiler actually compiles functions down into
> function generators, outputting calls to the LLVM C-bindings that
> generate a "regular" function.  The programmer can then either leave
> them in that form for run-time JITting, or have the compiler JIT and
> execute those function generators in order to get "regular" functions.
>  Either or both can be exposed as public functions and left in place
> by the optimizer.  The function generator gets its own set of
> parameters, and multiple functions with variations can be generated at
> compile time or runtime.
>
> He can also put compile-time expressions inside the body of functions,
> so that when the function generator runs, the compile-time expressions
> are evaluated and used for function generation.  Those compile-time
> expressions can use global variables and/or the function generator
> parameters.
>
> Anyway, this scheme means that extensive LLVM capability needs to be
> available to generated code, since it's the generated code that
> creates all of the "regular" functions.  Generated code has a much
> easier time calling the C bindings than the C++ API.

You're already doing something a bit more complicated than me :) This
does seem a bit more advanced than what llvm-c is intended for,
though. Is there a reason why you can't make a C++ library to do all
this advanced stuff, and just expose some C hooks for your generated
code?

> I'm using it to support renaming functions and still allowing
> generated code to look up those functions by name; basically searching
> for all global strings containing the function name, and replacing all
> uses of them with uses of the new function name.
>
> I would like to do away with that, though, but I haven't quite managed
> to get rid of all cases where LLVMGetNamedFunction is called by
> generated code.
>
> Also, I've gotten the impression from other developers that the
> C-bindings are considered incomplete and that there is a general
> desire to expose more functionality, and eventually all LLVM
> functionality, through them.

While it's lacking in some areas, it's intentional that not all of
LLVM is exposed through llvm-c. I learned that after my patches to
expose APInt/APFloat were turned down :) LLVM is a large
object-oriented project, and maintaining a mapping between the C and
C++ APIs would be pretty challenging, especially since LLVM promises
to never remove anything from llvm-c until 3.0. In order to ease
development, it's really designed to just provide the minimum
interface for getting data into LLVM. If you want to do something
advanced like modify the bitcode, you really should be writing against
the C++ API.

> This supports the "address-of" operator.  Any Value that is a LoadInst
> can have its address taken.  I need the pointer operand of the
> LoadInst to get the address Value.
>
> I figured GetOperand was a good starting point, and could support most
> of the operand use cases out there.

I'm not sure I understand. The load instruction takes an address as
an argument and puts the loaded value into a register, so you must
already have the address. Or am I misinterpreting what you're
saying?
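
If the point is that the front end keeps only the loaded Value on its
evaluation stack and wants to get back to the address later, then
reading the load's pointer operand would do it. Here is a toy sketch
in plain C. The types and the address_of helper are made up for
illustration and are not part of llvm-c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction kinds. */
enum value_kind { VALUE_ALLOCA, VALUE_LOAD };

/* Hypothetical stand-in for an IR value with at most one operand. */
struct value {
    enum value_kind kind;
    struct value *operand; /* for a load: the pointer being read */
};

/* Returns the address a load reads from, or NULL if v is not a load
 * and therefore has no address that can be recovered this way. */
static struct value *address_of(const struct value *v)
{
    assert(v != NULL);
    if (v->kind != VALUE_LOAD)
        return NULL;
    return v->operand;
}
```

With a slot value of kind VALUE_ALLOCA and a load whose operand points
at it, address_of on the load gives back the slot, and address_of on
the slot itself gives NULL.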

> When I've parsed an int literal and put it on my evaluation stack as a
> Value, there's a case where I need to get it back as an int.
> Specifically, the LLVMBuildExtractValue function requires an int, not
> a Constant, to represent the member.  I believe that GEP does as well
> when applied to a struct.

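GEP doesn't need a constant index in general, although struct field
indices do have to be constant i32s. With an array, for example:

%0 = alloca [2 x i32]
%1 = alloca i32
store i32 0, i32* %1
%2 = load i32* %1
%3 = getelementptr [2 x i32]* %0, i32 0, i32 %2
%4 = load i32* %3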

extractvalue should only be used if you're working with first-class
array or struct values, and you need to know the indexes statically.
If you don't, then you really should be using GEPs and let the
optimizations do their thing.
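
As a rough C analogy for the difference (a sketch only; struct pair,
element_address and second_field_address are made-up names): a GEP
over an array with a run-time index just computes base plus index
times element size, while picking a member whose position is known
statically, as extractvalue requires, corresponds to a fixed offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the IR type { i32, i32 }. */
struct pair {
    int32_t first;
    int32_t second;
};

/* Roughly what a GEP over [n x i32] with a run-time index computes:
 * base address + index * sizeof(element). No memory is read here. */
static int32_t *element_address(int32_t *base, size_t n, size_t index)
{
    assert(index < n); /* a real GEP does no bounds check; sketch only */
    return (int32_t *)((char *)base + index * sizeof(int32_t));
}

/* Roughly what selecting a statically known member computes: the
 * offset is fixed at compile time, so no run-time index is involved. */
static int32_t *second_field_address(struct pair *p)
{
    return (int32_t *)((char *)p + offsetof(struct pair, second));
}
```

Neither helper reads memory. Like a GEP, each one only produces an
address, and the load is a separate step.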

> In order to do away with include files, I'm supporting importing
> modules in bitcode form.  To call a function from an imported module,
> I need to put an external into the compiled module, and it really
> ought to have the same function and argument attributes as the
> original.  And I want to be able to do that while JITting at runtime
> as well.

If I understand correctly, why aren't the functions already marked
external? If they aren't, then an optimizer could theoretically
optimize them away. It may also be more appropriate for the frontend
to pass the function information through some different channel,
rather than directly processing the bitcode. Does anyone else have any
experience with doing this?