[LLVMdev] Union type, is it really used or necessary?

Nick Lewycky nicholas at mxc.ca
Wed Jul 21 00:46:06 PDT 2010

Neal N. Wang wrote:
> On Tue, Jul 20, 2010 at 2:46 PM, Talin <viridia at gmail.com
> <mailto:viridia at gmail.com>> wrote:
>     On Tue, Jul 20, 2010 at 8:34 AM, Chris Lattner <clattner at apple.com
>     <mailto:clattner at apple.com>> wrote:
>         On Jul 20, 2010, at 1:36 AM, Anton Korobeynikov wrote:
>          >> used to make the code manipulating the union type "well
>         typed". This
>          >> approach seems work very well, is there really a need to
>         keep union type in
>          >> LLVM?
>          > I think in its current state the unions should be removed
>         from LLVM IR
>          > in next release. It's pretty much unfinished and noone is
>         willing to
>          > work on them.
>         I agree.
>     Unfortunately I wasn't able to take the union stuff much farther
>     than I did. Partly that was because my LLVM-related work has been on
>     hiatus for the last 4 months or so due to various issues going on in
>     my personal life. But it was also partly because I had reached the
>     limit of my knowledge in this area, I wasn't able to delve deeply
>     enough into the code generation side of LLVM to really understand
>     what needed to be done to support unions.
>     As far as converting a union into a C struct that is large enough to
>     hold all possible types of the union, there are two minor problems
>     associated with this approach:
>     1) For frontends that generate target-agnostic code, it is difficult
>     to calculate how large this struct should be. (Which is larger, 3
>     int32s or two pointers? You don't know unless your frontend knows
>     the size of a pointer.) In my case, I finally decided to abandon my
>     goal of making my frontend completely target-neutral. While it's
>     relatively easy to write a frontend that is 99% target-neutral with
>     LLVM, that last 1% cannot be eliminated.
> This is indeed a problem if a front-end or any pass has to compute the
> size of a type.  For example, Sometimes I need to find out the size of a
> type in my pass, I then call TargetData.getTypeStorageSize() to get the
> size of a particular type.  This practice will introduce
> architecture-dependent LLVM code.  IMHO, LLVM cannot avoid this problem
> anyway, unless such function is removed or returns a ConstantExpr.
> Probably, LLVM has a function that returns a ConstantExpr type size, I'm
> just ignorant in this aspect.

:-)  It's ConstantExpr::getSizeOf(Ty).

You can then pass that into an alloca and allocate that number of bytes.

> Another thought is can you delay the computing of the maximum storage of
> a union type by using a max operator?

Sure, but that's annoying. The max(%X, %Y) becomes
'select i1 (icmp ugt %X, %Y), %X, %Y', or in code:
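   // Target-independent sizes, left symbolic until a target is known.
   Constant *SizeX = ConstantExpr::getSizeOf(Ty1);
   Constant *SizeY = ConstantExpr::getSizeOf(Ty2);
   // Unsigned compare: true when Ty1 is strictly larger than Ty2.
   Constant *GT = ConstantExpr::getICmp(ICmpInst::ICMP_UGT, SizeX, SizeY);
   Constant *Max = ConstantExpr::getSelect(GT, SizeX, SizeY);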
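
For comparison, here is a rough plain C++ sketch of what that select 
computes. It is not LLVM API code: ThreeInts, TwoPtrs, umax and 
unionStorageSize are names made up for this illustration, and it 
evaluates sizeof on the host compiler, whereas the ConstantExpr version 
stays symbolic until a target is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical union members from the thread: 3 x i32 versus 2 x i8*.
struct ThreeInts { std::int32_t v[3]; };
struct TwoPtrs   { char *p[2]; };

// Mirrors 'select (icmp ugt X, Y), X, Y': pick X only when it is
// strictly larger, otherwise pick Y.
constexpr std::size_t umax(std::size_t x, std::size_t y) {
  return x > y ? x : y;
}

// Storage size needed to hold either of two members of the given sizes.
inline std::size_t unionStorageSize(std::size_t a, std::size_t b) {
  std::size_t m = umax(a, b);
  assert(m >= a && m >= b && "storage must hold either member");
  return m;
}

// Host-specific answer to "which is larger, 3 int32s or two pointers?"
constexpr std::size_t HostUnionSize =
    umax(sizeof(ThreeInts), sizeof(TwoPtrs));
```

Using 'ult' with the same operand order would select the smaller size, 
which is why the predicate has to be 'ugt' here.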

> Your example can be represented as "struct { max([3xi32], [2xi8*],...)
> }", this approach will avoid deciding the size in front-ends. But again
> allowing TargetData.getTypeStorageSize() can compromise the
> architecture-neutrality goal.
>     2) Extracting the values from the union require pointer casting,
>     which means that the union cannot be an SSA value - it has to have
>     an address. This probably isn't a big issue in languages like C++
>     which use unions infrequently, but other languages which use
>     algebraic type systems might suffer a loss of performance due to the
>     need to store union types in memory.
> Can mem2reg alleviate  this problem?

If the memory is alloca'd then mem2reg should take care of it, yes. Note 
that the constant expression needs to be resolved to a concrete number 
at some point for this to take place, which in practise means that the 
TargetData will need to be added and an instcombine run will need to 
take place before mem2reg can do its work.
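
To make the "it has to have an address" point concrete, here is a plain 
C++ analogue of the lowering discussed above. It is only a sketch under 
my own assumptions: LoweredUnion, storeInts and loadInt are hypothetical 
names, the two members are the 3 x i32 and 2 x i8* from the example, and 
the size is computed with the host's sizeof rather than a ConstantExpr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Larger of the two member sizes, evaluated on the host.
constexpr std::size_t kUnionSize =
    sizeof(std::int32_t[3]) > sizeof(char *[2]) ? sizeof(std::int32_t[3])
                                                : sizeof(char *[2]);

// Hypothetical lowered union: opaque bytes big enough for either member.
struct LoweredUnion {
  alignas(std::max_align_t) unsigned char bytes[kUnionSize];
};

// Writing a member goes through the object's address, much like a
// pointer cast followed by a store.
inline void storeInts(LoweredUnion &u, const std::int32_t (&v)[3]) {
  std::memcpy(u.bytes, v, sizeof(v));
}

// Reading a member back is likewise a load through the address.
inline std::int32_t loadInt(const LoweredUnion &u, std::size_t i) {
  assert(i < 3 && "index out of range for the 3 x i32 member");
  std::int32_t out;
  std::memcpy(&out, u.bytes + i * sizeof(std::int32_t), sizeof(out));
  return out;
}
```

Every access is a round trip through memory, which is the cost Talin 
describes and the thing the optimizer has to remove once the size is a 
concrete number.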

