[LLVMdev] [PATCH] - Union types, attempt 2

Fri Jan 15 15:19:40 PST 2010

On Fri, Jan 15, 2010 at 3:13 PM, Talin <viridia at gmail.com> wrote:

> On Fri, Jan 15, 2010 at 11:02 AM, Dan Gohman <gohman at apple.com> wrote:
>
>>
>> On Jan 13, 2010, at 12:11 PM, Talin wrote:
>> >
>> > It depends on whether unions can be passed around as SSA values.
>> > I can think of situations where you would want to.
>>
>> I'm skeptical that you *really* want to (i.e. that you wouldn't
>> be better off just writing helper functions in your front-end
>> which do the addressing and load/store and then moving on).
>> But, I'm not really interested in getting in the way here.
>>
> Let me give you a use case then:
>
> Say I have a function which returns either a floating-point number or an
> error code (like divide by zero or something). The way that I would
> represent this return result is:
>
>    { i1, union { float, i32 } }
>
> In other words, what we have is a small struct that contains a one-bit
> discriminator field, followed by a union of float and i32. The discriminator
> field tells us what type is stored in the union - 0 = float, 1 = i32, so
> this is a typical 'tagged' union. (We can also have untagged or "C-style"
> unions, as long as the programmer has some other means of knowing what type
> is stored in the union.)
>
> Using a union here (as opposed to using bitcast) solves a number of
> problems:
>
> 1) The size of the struct is automatically calculated by taking the largest
> field of the union. Without unions, your frontend would have to calculate
> the size of each possible field, as well as their alignment, and use that to
> figure the maximum structure size. If your front-end is target-agnostic, you
> may not even know how to calculate the correct struct size.
>
> 2) The struct is small enough to be returned as a first-class SSA value,
> and with a union you can use it directly. Since bitcast cannot be applied to
> an aggregate value, without a union you would have to alloca some temporary
> memory to hold the function result, store the result into it, then use a
> combination of GEP and bitcast to get a correctly-typed pointer to the
> second field, and finally load the value. With a union, you can simply
> extract the second field without ever having to muck about with pointers and
> allocas.
>
> 3) The union provides an additional layer of type safety, since you can
> only extract types which are declared in the union, and not any arbitrary
> type that you could get with a bitcast. (Although I consider this a
> relatively minor point since type safety isn't a major concern in IR.)
>
> 4) It's possible that some future version of the optimizer could use the
> additional type information provided by the union, which a bitcast does not
> carry. Perhaps an optimizer which knows that all of the union members are
> numbers and not pointers could make some additional assumptions...
>
5) Something I forgot to mention - by allowing GEP and extractvalue to work
with unions, we can handle unions nested inside structs and vice versa with
a single GEP instruction. This is my main argument against having special
instructions for dealing with unions.

For example, in the case of { i1, union { float, i32 } }* we can use a GEP
with indices [0, 1, 0] to get access to the float field in a single GEP
instruction.
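
As a rough illustration of what that index path means, here is a C analogue
(not LLVM IR, and not part of the patch). The names Result, Payload,
result_float_ptr and result_float_offset are hypothetical and exist only for
this sketch. The three steps in the accessor correspond loosely to the indices
[0, 1, 0]: step through the pointer, select struct field 1, select union
member 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analogue of the IR type { i1, union { float, i32 } }. */
typedef union {
    float   f;     /* union member 0 */
    int32_t code;  /* union member 1 */
} Payload;

typedef struct {
    bool    is_error;  /* struct field 0: the discriminator */
    Payload payload;   /* struct field 1: the union */
} Result;

/* One address computation, loosely mirroring GEP indices [0, 1, 0]:
 * index 0 steps through the pointer, 1 selects the union field,
 * and the final 0 selects the float member of the union. */
float *result_float_ptr(Result *r) {
    assert(r != NULL);
    return &r->payload.f;
}

/* The same path expressed as a byte offset from the start of Result. */
size_t result_float_offset(void) {
    return offsetof(Result, payload) + offsetof(Payload, f);
}
```

The compiler folds the whole path into one offset, which is the same property
I want from a single GEP over mixed structs and unions.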

So just as GEP allows chaining together operations on structs, pointers and
arrays, we can also chain them together with operations on unions. This can
be quite powerful I think.
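
To make the float-or-error-code use case from the quoted message a bit more
concrete, here is a minimal sketch in C rather than IR. Everything named here
(DivResult, checked_div, div_value, ERR_DIVIDE_BY_ZERO) is made up for
illustration. The point is that the compiler works out the size and alignment
of the union, and the small struct is returned and consumed by value without
any explicit temporary or pointer cast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error code, assumed for this sketch only. */
enum { ERR_DIVIDE_BY_ZERO = 1 };

/* Tagged union: the discriminator says which union member is live. */
typedef struct {
    bool is_error;           /* false = float value, true = error code */
    union {
        float   value;
        int32_t error_code;
    } u;                     /* size and alignment chosen by the compiler */
} DivResult;

/* Returns the small struct by value, filling in exactly one union member. */
DivResult checked_div(float a, float b) {
    DivResult r;
    if (b == 0.0f) {
        r.is_error = true;
        r.u.error_code = ERR_DIVIDE_BY_ZERO;
    } else {
        r.is_error = false;
        r.u.value = a / b;
    }
    return r;
}

/* Extracts the float directly from the by-value result. */
float div_value(DivResult r) {
    assert(!r.is_error);  /* caller must check the discriminator first */
    return r.u.value;
}
```

A caller would test is_error and then read either u.value or u.error_code,
which is roughly what a pair of extractvalue operations on the union-typed
result would do.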

-- 
-- Talin