[LLVMdev] Suggestion: Support union types in IR

Chris Lattner clattner at apple.com
Wed Dec 31 12:41:59 PST 2008


On Dec 30, 2008, at 12:41 PM, Talin wrote:
> I've been thinking about how to represent unions or "disjoint types"  
> in LLVM IR. At the moment, the only way I know to achieve this right  
> now is to create a struct that is as large as the largest type in  
> the union and then bitcast it to access the fields contained within.  
> However, that requires that the frontend know the sizes of all of  
> the various low-level types (the "size_t" problem, which has been  
> discussed before), otherwise you get problems trying to mix pointer  
> and non-pointer types.
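The workaround described above can be sketched in C. This assumes a C11 compiler and a hypothetical union of a pointer and a 32-bit integer; `ptr_or_i32_storage` and its accessors are illustrative names only. The storage size is `max(sizeof(void *), sizeof(int32_t))`, which differs between 32- and 64-bit targets. That is the "size_t problem": a frontend has to know this number before it can emit the type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Larger of two sizes, evaluated at compile time. */
#define MAX_SIZE(a, b) ((a) > (b) ? (a) : (b))

/* Emulated "union { void *p; int32_t i; }": opaque storage that is as
   large as the largest member. The array length depends on
   sizeof(void *), i.e. on the target, which a target-independent
   frontend cannot compute on its own. */
typedef struct {
    _Alignas(void *) _Alignas(int32_t)
    unsigned char bytes[MAX_SIZE(sizeof(void *), sizeof(int32_t))];
} ptr_or_i32_storage;

/* Members are accessed by reinterpreting the storage. memcpy is used
   instead of a pointer cast to stay within C's aliasing rules; in IR
   this step would be the bitcast described above. */
static void store_i32(ptr_or_i32_storage *s, int32_t v) {
    memcpy(s->bytes, &v, sizeof v);
}

static int32_t load_i32(const ptr_or_i32_storage *s) {
    int32_t v;
    memcpy(&v, s->bytes, sizeof v);
    return v;
}

static void store_ptr(ptr_or_i32_storage *s, void *v) {
    memcpy(s->bytes, &v, sizeof v);
}

static void *load_ptr(const ptr_or_i32_storage *s) {
    void *v;
    memcpy(&v, s->bytes, sizeof v);
    return v;
}
```

A C compiler can fill in `sizeof(void *)` for its target. An IR generator that wants to stay target-neutral has no equivalent, and this is why it needs a union type.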

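That's an interesting point.  As others have pointed out, we've  
resisted having a union type because it isn't strictly needed for the  
current set of front-ends.  If a front-end is trying to generate  
target-independent IR though, I can see the utility.  The "gep trick"  
(computing a type's size with a getelementptr off a null pointer)  
yields the size as a constant value, not something that can be used  
inside a type definition, so it won't work for type generation.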

> It seems to me that adding a union type to the IR would be a logical  
> extension to the language. The syntax for declaring a union would be  
> similar to that of declaring a struct. To access a union member, you  
> would use GetElementPointer, just as if it were a struct. The only  
> difference is that in this case, the GEP doesn't actually modify the  
> address, it merely returns the input argument as a different type.  
> In all other ways, unions would be treated like structs, except that  
> the size of the union would always be the size of the largest  
> member, and all of the fields within the union would be located  
> at relative offset zero.

Yes, your proposal makes sense.  For syntax, I'd suggest:  u{ i32, float }

> Unions could of course be combined with other types:
>
>    ; %p has type {{i32|float}, i1}*
>    %n = getelementptr {{i32|float}, i1}* %p, i32 0, i32 0, i32 1
>
> So in the above example, the GEP returns a pointer to the float field.
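For comparison, C's own `union` already follows the layout rules proposed above. Every member sits at offset zero, and the union is at least as large as its largest member, rounded up for alignment. The sketch below assumes a C11 compiler and mirrors the struct-containing-a-union example. The names `union int_or_float`, `struct tagged_value` and `float_field` are hypothetical, and `&p->value.f` plays the role of the `0, 0, 1` GEP. It illustrates the intended semantics only and does not use any LLVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the proposed u{ i32, float }: both members
   share offset zero. */
union int_or_float {
    int32_t i;
    float f;
};

/* A struct whose first field is the union, followed by a flag. */
struct tagged_value {
    union int_or_float value;
    _Bool flag;
};

/* Every member of a union starts at offset zero... */
_Static_assert(offsetof(union int_or_float, i) == 0, "i at offset 0");
_Static_assert(offsetof(union int_or_float, f) == 0, "f at offset 0");

/* ...and the union is at least as large as its largest member. */
_Static_assert(sizeof(union int_or_float) >= sizeof(int32_t), "fits i32");
_Static_assert(sizeof(union int_or_float) >= sizeof(float), "fits float");

/* Analogue of "gep 0, 0, 1": the float member (index 1) of the union
   in field 0 of the struct. Because every offset along the path is
   zero, the result has the same address as the struct itself; only
   the pointer's type changes. */
static float *float_field(struct tagged_value *p) {
    return &p->value.f;
}
```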

I don't have a specific problem with adding this.  The cost of doing  
so is that it adds (a small amount of) complexity to a lot of places  
that walk the type graphs.  The only pass that I predict will be  
difficult to update to handle this is the BasicAA pass, which reasons  
about symbolic (not concrete) offsets and should return mustalias in  
the appropriate cases.  Also, to validate this, I think llvm-gcc  
should start generating this for C unions where possible.

If you're interested in implementing this and seeing all the details  
of the implementation through to the end, I don't see significant  
problems.  I think adding a simple union type would make more sense  
than adding first-class support for a *discriminated* union.

-Chris