[LLVMdev] [PATCH] - Union types, attempt 2

Fri Jan 15 15:13:36 PST 2010

On Fri, Jan 15, 2010 at 11:02 AM, Dan Gohman <gohman at apple.com> wrote:

>
> On Jan 13, 2010, at 12:11 PM, Talin wrote:
> >
> > It depends on whether or not unions can be passed around as SSA values or
> > not. I can think of situations where you would want to.
>
> I'm skeptical that you *really* want to (i.e. that you wouldn't
> be better off just writing helper functions in your front-end
> which do the addressing and load/store and then moving on).
> But, I'm not really interested in getting in the way here.
>
Let me give you a use case, then:

Say I have a function which returns either a floating-point number or an
error code (like divide by zero or something). The way that I would
represent this return result is:

   { i1, union { float, i32 } }

In other words, what we have is a small struct that contains a one-bit
discriminator field, followed by a union of float and i32. The discriminator
field tells us what type is stored in the union - 0 = float, 1 = i32, so
this is a typical 'tagged' union. (We can also have untagged or "C-style"
unions, as long as the programmer has some other means of knowing what type
is stored in the union.)
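For readers more at home in C, here is a rough analogue of that return type. It is only a sketch: the names `float_or_error`, `safe_divide`, `get_value`, `get_error` and the error code `ERR_DIVIDE_BY_ZERO` are hypothetical, and a C `bool` tag stands in for the IR's `i1` discriminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C analogue of the IR type { i1, union { float, i32 } }.
   is_error plays the role of the i1 discriminator: false = float, true = i32. */
typedef struct {
    bool is_error;
    union {
        float value;      /* meaningful only when is_error is false */
        int32_t err_code; /* meaningful only when is_error is true */
    } payload;
} float_or_error;

/* Illustrative error code; not taken from any real API. */
enum { ERR_DIVIDE_BY_ZERO = 1 };

/* Returns the quotient, or an error code when the divisor is zero.
   The tagged union travels by value: no out-pointer, no heap memory. */
static float_or_error safe_divide(float num, float den) {
    float_or_error r;
    if (den == 0.0f) {
        r.is_error = true;
        r.payload.err_code = ERR_DIVIDE_BY_ZERO;
    } else {
        r.is_error = false;
        r.payload.value = num / den;
    }
    return r;
}

/* Accessors check the discriminator before reading the union member. */
static float get_value(float_or_error r) {
    assert(!r.is_error);
    return r.payload.value;
}

static int32_t get_error(float_or_error r) {
    assert(r.is_error);
    return r.payload.err_code;
}
```

A caller inspects `is_error` (or goes through the accessors) before touching `payload`, which is exactly the tagged-union discipline described above.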

Using a union here (as opposed to using bitcast) solves a number of
problems:

1) The size of the struct is automatically calculated from the largest
field of the union. Without unions, your front-end would have to calculate
the size and alignment of each possible field and use them to figure out
the maximum structure size. If your front-end is target-agnostic, you may
not even know how to calculate the correct struct size.
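To make point 1 concrete, the sketch below shows the layout arithmetic a front-end would otherwise have to do by hand. The helper names (`type_layout`, `round_up`, `union_layout`, `struct_layout`) are hypothetical, and the rules encoded (pad each field to its alignment, pad the total to the largest alignment) are the usual C-style layout rules, assumed here rather than taken from LLVM's own data-layout code. The per-type sizes and alignments must be supplied by the caller for a particular target.

```c
#include <assert.h>
#include <stddef.h>

/* Size and alignment of one type on the target being compiled for.
   These numbers are the target-specific facts the front-end must know. */
typedef struct {
    size_t size;
    size_t align;
} type_layout;

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* A union's size is its largest member's size rounded up to the largest
   member alignment; its alignment is the largest member alignment. */
static type_layout union_layout(const type_layout *members, size_t count) {
    type_layout u = {0, 1};
    for (size_t i = 0; i < count; i++) {
        if (members[i].size > u.size) u.size = members[i].size;
        if (members[i].align > u.align) u.align = members[i].align;
    }
    u.size = round_up(u.size, u.align);
    return u;
}

/* Lay out struct fields in order, padding each to its alignment, then pad
   the total to the struct's alignment so arrays of it stay aligned. */
static type_layout struct_layout(const type_layout *fields, size_t count) {
    type_layout s = {0, 1};
    for (size_t i = 0; i < count; i++) {
        s.size = round_up(s.size, fields[i].align) + fields[i].size;
        if (fields[i].align > s.align) s.align = fields[i].align;
    }
    s.size = round_up(s.size, s.align);
    return s;
}
```

With assumed layouts of 1 byte for `i1` and 4 bytes (4-byte aligned) for both `float` and `i32`, the union comes out at 4 bytes and the whole struct at 8. Replacing `float` with an 8-byte, 8-aligned member grows the struct to 16 bytes, which is the kind of target-dependent answer a target-agnostic front-end cannot compute on its own.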

2) The struct is small enough to be returned as a first-class SSA value, and
with a union you can use it directly. Since bitcast only works on pointers,
in order to use it you would have to alloca some temporary memory to hold
the function result, store the result into it, then use a combination of GEP
and bitcast to get a correctly-typed pointer to the second field, and
finally load the value. With a union, you can simply extract the second
field without ever having to muck about with pointers and allocas.
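Point 2 has a loose C analogue as well. In the sketch below, copying the raw bits through a local `float` with `memcpy` stands in for the alloca/store/GEP/bitcast/load sequence, while the union version simply names the member. The type and function names are hypothetical, and the code assumes `float` and `int32_t` are both 4 bytes, which the `static_assert` checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The payload slot must be able to hold either member bit-for-bit. */
static_assert(sizeof(float) == sizeof(int32_t), "payload slot must fit a float");

/* Without unions: the payload is declared as a single 32-bit integer slot. */
typedef struct {
    bool is_error;
    int32_t raw;
} result_no_union;

/* With a union: each member can be named directly. */
typedef struct {
    bool is_error;
    union {
        float value;
        int32_t err_code;
    } payload;
} result_with_union;

/* Reinterpreting through memory: copy the raw bits into a float temporary.
   This plays the role of alloca + store + GEP/bitcast + load in the IR. */
static float value_via_memory(result_no_union r) {
    float f;
    memcpy(&f, &r.raw, sizeof f);
    return f;
}

/* Reading the union member directly, with no explicit temporary. */
static float value_via_union(result_with_union r) {
    return r.payload.value;
}
```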

3) The union provides an additional layer of type safety, since you can only
extract types which are declared in the union, and not any arbitrary type
that you could get with a bitcast. (Although I consider this a relatively
minor point since type safety isn't a major concern in IR.)

4) It's possible that some future version of the optimizer could use the
additional type information provided by the union, which the bitcast does
not. Perhaps an optimizer that knows all of the union members are
numbers and not pointers could make some additional assumptions...

-- 
-- Talin