[LLVMdev] code generation order revisited.

Thu Jun 12 09:52:00 PDT 2008

On Jun 12, 2008, at 11:38, Hendrik Boom wrote:

> On Tue, 06 May 2008 16:06:35 -0400, Gordon Henriksen wrote:
>
>> On 2008-05-06, at 13:42, Hendrik Boom wrote:
>>
>>> One more question. I hope you're not getting tired of me already.  
>>> Does generating LLVM code have to proceed in any particular order?
>>>
>>> Of course, if I am writing LLVM assembler by appending characters  
>>> to the end of a sequential file, I'd have to write everything in  
>>> the order prescribed by the assembler syntax.
>>>
>>> But if I'm using the C interface to build an LLVM parse tree, does  
>>> that have to be in any particular time-order?  Can I, for example,  
>>> define a few functions, start scattering code into them, decide I'd  
>>> like to declare some more local variables in one of them, generate  
>>> code for another, return to the first one and stick in a new basic  
>>> block at its start, discover I should have declared some more  
>>> global variables, and so forth?
>>>
>>> That could be very convenient.
>>
>> Yes, you can absolutely do this.
>
> I think I may have found an exception to this -- the API seems to  
> require me to have all the fields for a struct ready before I  
> construct the struct.  I don't have the ability to make a struct  
> type, use it to declare some variables, and still contribute fields  
> to it during the rest of the compilation.
>
> Is there a reason for this limitation other than no one thinking of  
> it? Does it need to have all the type information early in building  
> the parse tree?  I can't really imagine that.  I for one could do  
> without this limitation.

You really can't do this, since LLVM types are shape-isomorphic: two  
types with the same structure are one and the same type. Observe what  
happens to the types of @x and @y:

     gordon$ cat input.ll
     %xty = type {i32}
     %yty = type {i32}
     @x = external constant %xty
     @y = external constant %yty

     gordon$ llvm-as < input.ll | llvm-dis
     ; ModuleID = '<stdin>'
             %xty = type { i32 }
             %yty = type { i32 }
     @x = external constant %xty ; <%xty*> [#uses=0]
     @y = external constant %xty ; <%xty*> [#uses=0]

(This is not a side-effect of llvm-as or llvm-dis, but a fundamental  
property of the LLVM 'Type' class.)

The only type that is not shape-isomorphic is 'opaque'. Each mention  
of 'opaque' in LLVM IR is a distinct type:

     gordon$ cat input2.ll
     %xty = type opaque
     %yty = type opaque
     @x = external constant %xty
     @y = external constant %yty

     gordon$ llvm-as < input2.ll | llvm-dis
     ; ModuleID = '<stdin>'
         %xty = type opaque
         %yty = type opaque
     @x = external constant %xty ; <%xty*> [#uses=0]
     @y = external constant %yty ; <%yty*> [#uses=0]

> I won't even ask to be able to contribute more fields at link time,  
> though that would be useful, too.  Such link-time-assembled  
> structures would resemble the DXD dummy control sections  
> that PL/1 used on OS/360.

This is absolutely possible if fields are reached through byte offsets  
that are only resolved at link time:

     @Type.field.offs = external constant i32
...
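     ; Byte offset of the field, supplied by whichever module defines it.
     %Type.field.offs = load i32* @Type.field.offs
     %obj.start = bitcast %object* %obj to i8*
     ; One index: step %Type.field.offs bytes past the start of the object.
     %obj.field = getelementptr i8* %obj.start, i32 %Type.field.offs
     %field.ptr = bitcast i8* %obj.field to %field*
     %field.val = load %field* %field.ptr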

This is completely analogous to opaque data types in C. You can use  
any of the following techniques:

     typedef struct OpaqueFoo *FooRef;   /* like %object = type opaque in LLVM */

     typedef void *FooRef;               /* like %object = type i8 in LLVM */

     typedef struct {
       struct Vtable *VT;
     } Base;
     typedef Base *FooRef;               /* like %object = type { %vtable* } in LLVM */

— Gordon