[LLVMdev] Suggestion: Support union types in IR

Thu May 7 23:10:09 PDT 2009

Talin wrote:
> Chris Lattner wrote:
>> On Dec 30, 2008, at 12:41 PM, Talin wrote:
>>   
>>> I've been thinking about how to represent unions or "disjoint types"  
>>> in LLVM IR. At the moment, the only way I know to achieve this right  
>>> now is to create a struct that is as large as the largest type in  
>>> the union and then bitcast it to access the fields contained within.  
>>> However, that requires that the frontend know the sizes of all of  
>>> the various low-level types (the "size_t" problem, which has been  
>>> discussed before), otherwise you get problems trying to mix pointer  
>>> and non-pointer types.
>>>     
>> That's an interesting point.  As others have pointed out, we've  
>> resisted having a union type because it isn't strictly needed for the  
>> current set of front-ends.  If a front-end is trying to generate  
>> target-independent IR though, I can see the utility.  The "gep trick"  
>> won't work for type generation.
>>
>>   
>>> It seems to me that adding a union type to the IR would be a logical  
>>> extension to the language. The syntax for declaring a union would be  
>>> similar to that of declaring a struct. To access a union member, you  
>>> would use GetElementPointer, just as if it were a struct. The only  
>>> difference is that in this case, the GEP doesn't actually modify the  
>>> address, it merely returns the input argument as a different type.  
>>> In all other ways, unions would be treated like structs, except that  
>>> the size of the union would always be the size of the largest  
>>> member, and all of the fields within the union would be located  
>>> located at relative offset zero.
>>>     
>> Yes, your proposal makes sense, for syntax, I'd suggest:  u{ i32, float}
>>
>>   
>>> Unions could of course be combined with other types:
>>>
>>>    {{int|float}, bool} *
>>>    n = getelementptr i32 0, i32 0, i32 1
>>>
>>> So in the above example, the GEP returns a pointer to the float field.
>>>     
>> I don't have a specific problem with adding this.  The cost of doing  
>> so is that it adds (a small amount of) complexity to a lot of places  
>> that walk the type graphs.  The only pass that I predict will be  
>> difficult to update to handle this is the BasicAA pass, which reasons  
>> about symbolic (not concrete) offsets and should return mustalias in  
>> the appropriate cases.  Also, to validate this, I think llvm-gcc  
>> should start generating this for C unions where possible.
>>
>> If you're interested in implementing this and seeing all the details  
>> of the implementation through to the end, I don't see significant  
>> problems.  I think adding a simple union type would make more sense  
>> than adding first-class support for a *discriminated* union.
>>   
> So, I spent a little bit of time looking at how to code this. Rather 
> than defining a brand-new LLVM type, I think the best approach is to 
> generalize the existing StructType. We can change the "isPacked" flag 
> (which is stored in the SubclassData of the type) to a 3-valued enum 
> called StructLayout, which has 3 members: UnpackedLayout, PackedLayout, 
> and UnionLayout.
> 
> After that, it's mostly a case of hunting down everywhere that calls 
> "isPacked", and adding the appropriate case. :)

Hi Talin, thanks for looking in to this.

I think there's already a lot of code that makes the assumption that the 
different members of a StructType represent distinct storage. Why is it 
easier to co-opt StructType than to create a new 'UnionType' under 
CompositeType? What's the tradeoff?

Nick