[llvm-commits] RFC: initial union syntax support

Thu May 21 08:08:40 PDT 2009

Hi Nick,

>>>>> Then it would seem I misunderstood the purpose of unions. I thought 
>>>>> the problem was that it was impossible to declare a type which 
>>>>> would be "as large as the largest of any of these" without having 
>>>>> accurate TargetData. The union type was supposed to do that and 
>>>>> nothing more.
>>>>
>>>> I sent an example earlier showing that you can do this already without
>>>> union types.
>>>
>>> Close. Your trick does perform a ptrtoint which requires knowing what
>>> int size is large enough. Fortunately in your case it's indexing off of
>>> null so it's very unlikely that it won't fit in 16 bits or less, but
>>> it's still not as good as a first-class union.
>>
>> if the union is bigger than accessible memory, then you aren't going to
>> be able to allocate one anyway.  Conclusion: doing GEP of null can be
>> assumed to not result in pointer overflow.  Thus the only problem is
>> ptrtoint.  You can do ptrtoint to i64 on all platforms, which solves
>> the problem since you can't alloca an amount that doesn't fit in i64
>> anyway.  That said, this whole technique is pretty ugly.
> 
> Sure.
> 
>>> I've been thinking about the original suggestion and the reasons I 
>>> objected to it. It seems that the original suggestion was to think about 
>>> a union as a structure where the offset into each element is zero 
>>> instead of being contiguous to each other. That makes the original 
>>> proposal make a whole lot more sense to me than it did originally.
>>>
>>> Despite Chris' message to the contrary, I still think u{i32, i32} 
>>> shouldn't be allowed (rather, it should be folded to u{i32} by the 
>>> getter). We could provide an accessor that returns the element number 
>>> for a given Type* and the only drawback is that it means doing an 
>>> extra lookup through a small list. Allowing GEP makes sense, and 
>>> unions should certainly be first class aggregates.
>>
>> another possibility is to not introduce new union types, but instead
>> to enhance the alloca instruction to take a list of types : it would
>> then allocate enough memory for all of the types in the list.  The
>> return type could be that of the first type in the list.
> 
> That's only good for stack variables. It doesn't work for globals.

Good point.  How about allowing the "align" parameter for allocas and
globals to be a ConstantExpr of integer type, rather than a ConstantInt?
(If it doesn't resolve to a ConstantInt at codegen time, when the target
is known, then codegen can abort).  As I pointed out in a previous mail,
you can get the size of a "union type" in a target independent way as a
ConstantExpr, by taking the size of each type (using the GEP trick) and
calculating the maximum using constant expressions.  You can also get
the alignment as a constant expression using a variant of the trick (I
explained how in yet another email).  So the only thing currently
stopping you declaring locals or globals of a "union type" in a
target-independent way is the fact that "align" is required to be a
ConstantInt.  I think this approach would be much easier to implement
than a new "union type".

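To make the idea concrete, here is a rough C analogue (not LLVM IR, and
not the code from the earlier mails).  It assumes a "union" of int and
double; sizeof stands in for the GEP-of-null size trick, and offsetof on
a char-plus-value struct stands in for the alignment variant.  All the
names (MAX_CONST, probe_int, probe_double, union_storage and the
accessors) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Maximum of two integer constant expressions; the result is still an
   integer constant expression, so the compiler folds it. */
#define MAX_CONST(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical probe structs: the offset of `value` after a single char
   is the alignment the compiler gives that type inside a struct. */
struct probe_int    { char pad; int value; };
struct probe_double { char pad; double value; };

enum {
    /* "Size of the union": the largest member size. */
    UNION_SIZE  = MAX_CONST(sizeof(int), sizeof(double)),
    /* "Alignment of the union": the strictest member alignment. */
    UNION_ALIGN = MAX_CONST(offsetof(struct probe_int, value),
                            offsetof(struct probe_double, value))
};

static_assert(UNION_SIZE >= sizeof(int) && UNION_SIZE >= sizeof(double),
              "storage must hold every member");

/* A global "of union type": raw bytes whose size and alignment are both
   given by constant expressions rather than literal integers. */
static _Alignas(UNION_ALIGN) unsigned char union_storage[UNION_SIZE];

/* Accessors copy through memcpy so that no aliasing rules are broken. */
static void store_int(int v)       { memcpy(union_storage, &v, sizeof v); }
static void store_double(double v) { memcpy(union_storage, &v, sizeof v); }

static int load_int(void)
{
    int v;
    memcpy(&v, union_storage, sizeof v);
    return v;
}

static double load_double(void)
{
    double v;
    memcpy(&v, union_storage, sizeof v);
    return v;
}
```

The point of the sketch is only that the size and the alignment of the
storage are both ordinary constant expressions which get folded once the
target is known; no new type is needed to declare the object.
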
Ciao,

Duncan.