[llvm-commits] RFC: initial union syntax support

Thu May 21 12:56:51 PDT 2009

On Thu, May 21, 2009 at 10:08 AM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Nick,
>
>>>>>> Then it would seem I misunderstood the purpose of unions. I thought
>>>>>> the problem was that it was impossible to declare a type which
>>>>>> would be "as large as the largest of any of these" without having
>>>>>> accurate TargetData. The union type was supposed to do that and
>>>>>> nothing more.
>>>>>
>>>>> I sent an example earlier showing that you can do this already without
>>>>> union types.
>>>>
>>>> Close. Your trick does perform a ptrtoint which requires knowing what
>>>> int size is large enough. Fortunately in your case it's indexing off of
>>>> null so it's very unlikely that it won't fit in 16 bits or less, but
>>>> it's still not as good as a first-class union.
>>>
>>> if the union is bigger than accessible memory, then you are not going to
>>> be able to allocate one anyway.  Conclusion: doing GEP of null can be
>>> assumed to not result in pointer overflow.  Thus the only problem is
>>> ptrtoint.  You can do ptrtoint to i64 on all platforms, which solves
>>> the problem since you can't alloca an amount that doesn't fit in i64
>>> anyway.  That said, this whole technique is pretty ugly.
>>
>> Sure.
>>
>>>> I've been thinking about the original suggestion and the reasons I
>>>> objected to it. It seems that the original suggestion was to think about
>>>> a union as a structure where the offset into each element is zero
>>>> instead of being contiguous to each other. That makes the original
>>>> proposal make a whole lot more sense to me than it did originally.
>>>>
>>>> Despite Chris' message to the contrary, I still think u{i32, i32}
>>>> shouldn't be allowed (rather, it should be folded to u{i32} by the
>>>> getter). We could provide an accessor that returns the element number
>>>> for a given Type* and the only drawback is that it means doing an
>>>> extra lookup through a small list. Allowing GEP makes sense, and
>>>> unions should certainly be first class aggregates.
>>>
>>> another possibility is to not introduce new union types, but instead
>>> to enhance the alloca instruction to take a list of types : it would
>>> then allocate enough memory for all of the types in the list.  The
>>> return type could be that of the first type in the list.
>>
>> That's only good for stack variables. It doesn't work for globals.
>
> good point.  How about allowing the "align" parameter for alloca's and
> globals to be a ConstantExpr of integer type, rather than a ConstantInt?
> (If it doesn't resolve to a ConstantInt at codegen time, when the target
> is known, then codegen can abort).  As I pointed out in a previous mail,
> you can get the size of a "union type" in a target independent way as a
> ConstantExpr, by taking the size of each type (using the GEP trick) and
> calculating the maximum using constant expressions.  You can also get
> the alignment as a constant expression using a variant of the trick (I
> explained how in yet another email).  So the only thing currently
> stopping you declaring locals or globals of a "union type" in a target
> independent way is the fact that "align" is required to be a ConstantInt.
> I think this approach would be much easier to implement than a new
> "union type".

This works in some cases, but I think the key consideration is how
easy it is to reason about types and memory in analysis/optimizations.
All of these already have to handle zero-length struct elements,
which meant almost nothing needed adjusting for unions.  A union type
makes it very explicit what is going on.  Lists of sizes and
alignments and bitcasts all over (even just to get said size and
alignment, and this would require a ConstantExpr for alignof(type))
seem to just be a lot of extra operations to express something that
can be expressed very simply in an easy to understand way.
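
As a rough illustration of the contrast, here is a sketch in C rather than
LLVM IR.  The member types and the names Example, MAX2, MANUAL_SIZE and
MANUAL_ALIGN are made up for this example and do not come from the patch.
It assumes a C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical member types, chosen only for illustration. */
union Example { int i; double d; char c[3]; };

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* The "lists of sizes and alignments" route: fold the maximum by hand. */
#define MANUAL_SIZE  MAX2(sizeof(int), MAX2(sizeof(double), sizeof(char[3])))
#define MANUAL_ALIGN MAX2(_Alignof(int), MAX2(_Alignof(double), _Alignof(char[3])))

/* The union route: the type itself already carries this information. */
static_assert(sizeof(union Example) >= MANUAL_SIZE,
              "union holds its largest member");
static_assert(_Alignof(union Example) >= MANUAL_ALIGN,
              "union is aligned for every member");

size_t manual_size(void) { return MANUAL_SIZE; }
size_t union_size(void)  { return sizeof(union Example); }
```

The hand-folded maximum has to be repeated wherever the storage is
declared, while the union type states the same thing once.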

The fact that bitcasts are equivalent to:
  ; yields union {i32, float}
  %agg1 = insertvalue union {i32, float} %agg, i32 1, 0
  ; yields float
  %result = extractvalue union {i32, float} %agg1, 1
is an interesting side effect (and the one I am least happy about,
though it could be cleaned up very simply).
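
The same equivalence can be sketched in C, assuming a 32-bit IEEE 754
float.  The names IntOrFloat, pun_via_union and pun_via_memcpy are
hypothetical and exist only for this example; they are not part of the
proposal.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical C analogue of union {i32, float}. */
union IntOrFloat { uint32_t i; float f; };

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Write member 0, then read member 1: the insertvalue/extractvalue pair. */
float pun_via_union(uint32_t bits) {
    union IntOrFloat u;
    u.i = bits;
    return u.f;
}

/* Reinterpret the same bits directly: the counterpart of a bitcast. */
float pun_via_memcpy(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Both functions return the same float for any input bit pattern, which is
the sense in which the two forms are interchangeable.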

Andrew