[LLVMdev] Some additions to the C bindings

Fri Oct 9 17:38:48 PDT 2009

On Fri, Oct 9, 2009 at 6:56 PM, Erick Tryzelaar
<idadesub at users.sourceforge.net> wrote:
>
> You're already doing something a bit more complicated than me :) This
> does seem a bit more advanced than what llvm-c is intended for,
> though. Is there a reason why you can't make a C++ library to do all
> this advanced stuff, and just expose some C hooks for your generated
> code?

I suppose not.  It seemed easier for me and advantageous for y'all for
me to get these functions added.  But I can ship my own bridge library
as part of my stdlib.

>
>
>> I'm using it to support renaming functions and still allowing
>> generated code to look up those functions by name; basically searching
>> for all global strings containing the function name, and replacing all
>> uses of them with uses of the new function name.
>>
>> I would like to do away with that, though, but I haven't quite managed
>> to get rid of all cases where LLVMGetNamedFunction is called by
>> generated code.
>>
>> Also, I've gotten the impression from other developers that the
>> C-bindings are considered incomplete and that there is a general
>> desire to expose more functionality, and eventually all LLVM
>> functionality, through them.
>
>
> While it's lacking in some areas, it's intentional that not all of
> llvm is exposed through llvm-c. I learned that after my patches to
> expose APInt/APFloat were turned down :) LLVM's a large object-oriented
> project, and maintaining a mapping between the C and C++ APIs
> would be pretty challenging, especially since llvm promises to never
> remove anything from llvm-c until 3.0. In order to ease development,
> it's really designed to just provide the minimum interface for getting
> data into llvm. If you want to do something advanced like modify the
> bytecode, you really should be writing against the c++ api.

Then the assumptions under which I submitted the patch were wrong.  I
guess it does make sense to ship my own bridge library, then.
Actually, it might be better for me to compile it with llvm-gcc and
ship it as bitcode, come to think of it... one more place that the
optimizer can do its thing.

>
>
>> This supports the "address-of" operator.  Any Value that is a LoadInst
>> can have its address taken.  I need the pointer operand of the
>> LoadInst to get the address Value.
>>
>> I figured GetOperand was a good starting point, and could support most
>> of the operand use cases out there.
>
>
> I'm not sure if I understand. The load instruction takes an address as
> an argument and stores the value into a register, therefore you must
> already have the address. Or am I misinterpreting what you're
> saying?

When I parse an expression, it gets turned into a Value and stored
away for further processing.  (Actually, it gets turned into calls
into LLVM for creating that Value object when the function generator
is run, but anyway...)  At that point, I don't keep separate track of
what went into the Value... I can examine the Value itself to get that
information, or do without it.

Any value that lives in memory is represented by a LoadInst from a
pointer to that memory.  To take the address, I get the pointer back
out of the LoadInst.  Anything that isn't a LoadInst cannot have its
address taken.  I end up with about the same rules that C and C++ have
for when an address can be taken.

>
>
>> When I've parsed an int literal and put it on my evaluation stack as a
>> Value, there's a case where I need to get it back as an int.
>> Specifically, the LLVMBuildExtractValue function requires an int, not
>> a Constant, to represent the member.  I believe that GEP does as well
>> when applied to a struct.
>
>
> GEP doesn't need to take a constant to work.
>
> %0 = alloca [2 x i32]
> %1 = alloca i32
> store i32 0, i32* %1
> %2 = load i32* %1
> %3 = getelementptr [2 x i32]* %0, i32 0, i32 %2
> %4 = load i32* %3
>
> extractvalue should only be used if you're using value arrays or
> structs, and you need to statically know the indexes. If you don't,
> then you really should be using GEPs and let the optimizations do
> their thing.

That works in most cases.  Perhaps it should be that way in all cases.
I wanted to be able to work with struct values without having to
spill them first.  Not that it would make any real difference in the
optimized code.

>> In order to do away with include files, I'm supporting importing
>> modules in bitcode form.  To call a function from an imported module,
>> I need to put an external into the compiled module, and it really
>> ought to have the same function and argument attributes as the
>> original.  And I want to be able to do that while JITting at runtime
>> as well.
>
>
> If I understand correctly, why aren't the functions already marked
> external? If they aren't then an optimizer could theoretically
> optimize them away. It may also be more appropriate to pass the
> function information through some different channel by the frontend,
> rather than directly processing the bytecode. Anyone else have any
> experience with doing this?

The functions are marked external in the imported module.  But I must
create a matching declaration in the module I'm compiling in order to
create calls to them.

Functions that are not marked external are not imported.

Also, functions in the imported module can be JITted and called at
compile time.  Public function/type generators would be used
extensively this way, and would let you ship the equivalent of
template functions/classes in compiled form, something you *still*
can't do with most existing C++ compilers.

Anyway, consider the patch withdrawn (except for that one bit you
already committed).  Thank you for looking at it and telling me more
about the motivation behind the C bindings' current state.