[llvm-dev] RFC: Representing unions in TBAA

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 1 16:36:53 PST 2017

Clang generates tbaa trees that are not conservatively correct in the face
of unions, for any definition of correct you have :)

We mostly rely on basicaa to say things are must-alias, but if it says
may-alias, in the case of unions, tbaa will often say "noalias".

On Wed, Mar 1, 2017, 4:00 PM Flamedoge <code.kchoi at gmail.com> wrote:

> Pardon my exclamation, are you saying TBAA will return NoAlias between
> union members under current implementation?
> If so, this is definitely a concern.
> Kevin
> On Wed, Mar 1, 2017 at 3:30 PM, Daniel Berlin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our
> current TBAA tree for union generation is definitely irretrievably broken.
> I'll be honest here. I'm pretty sure your proposal doesn't go far enough.
> But truthfully,  I would rather see us come closer to a representation we
> know works, which is GCC's.
> Let me try to simplify what you are suggesting, and what we have.
> Our current representation is basically inverted from GCC, but we don't
> allow things that would enable it to work.
> Given
> union {int a, short b};
> GCC's will be:
>  union
>   /   \
> short int
> Everything is implicitly a subset of alias set 0 (which for C/C++ is char)
> just to avoid representing it.
> Our metadata has no child links, and only a single parent link.
> You can't represent the above form because you can't make a single short
> node a child  of every union/struct it needs to be (lack of multiple
> parents means you can't just frob them all to offset zero).
> Your suggestion is to invert this as a struct
> short  int
>    |    /
> union
> We don't allow multiple parents, however.
> Because of that, you suggest we follow all nodes for unions, special
> casing union-type nodes somehow
> Let me suggest something different:
> We know the current structure fails us in a number of ways.
> Instead of trying to shoehorn this into our current structure, I suggest:
> we stop messing around and just have a GCC style tree, and represent the
> children instead of the parents.
> We make contained types descendants instead of ancestors.
> We allow multiple children at each offset and for scalar type nodes.x`
> We could do that without changing to children, but honestly,  i strongly
> believe nobody who ever looks at this really understands it right now.
> (If someone really does like the ancestor form, i'd love to understand why
> :P)
> So if we are going to change it, we should change it to something that is
> easier to understand.
> something like:
> scalar type node -> {name, children nodes}
> struct node -> {name, array of {offset, child node} }
> Paths are separate from the tbaa tree itself, and are just:
> path node -> {struct node, offset} or whatever.
> unions are just scalar type nodes with multiple children, not struct nodes
> with special-cased offset zero.
> This also has a well-defined upgrade path.
> For "old-style" DAGs, like exist now, we know we have to regen the
> metadata, and that it is wrong (So we could, for example, simply disable it
> for correctness reasons)
> .
> Anything with a "new-style" DAG, we know is usable.
> In this representation, the most generic tbaa node is just the one that
> contains the other.
> If neither contains the other, you can just make one that has both as
> children and use that.
> (instead of now, where you'd have to have multiple parents again).
> (The above also may be better for other languages)
> --Dan
> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com> wrote:
> Seems like the comments have stopped.  I'll try to get a patch together.
> Then we can continue the discussion from there.
> Hubert, as for the issue with the llvm optimizations losing the TBAA
> information, it should be the responsibility to make sure the aliasing is
> changed in the correct way.  One function related to this has already been
> mentioned:  getMostGenericTBAA.
> Later,
> Steven Perron
> From:        Hubert Tong <hubert.reinterpretcast at gmail.com>
> To:        Steven Perron/Toronto/IBM at IBMCA
> Cc:        Daniel Berlin <dberlin at dberlin.org>, llvm-dev <
> llvm-dev at lists.llvm.org>, Sanjoy Das <sanjoy at playingwithpointers.com>
> Date:        2017/02/15 07:44 AM
> Subject:        Re: [llvm-dev] RFC: Representing unions in TBAA
> ------------------------------
> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron <*perrons at ca.ibm.com*
> <perrons at ca.ibm.com>> wrote:
> 3) How should we handle a reference directly through a union, and a
> reference that is not through the union?
> My solution was to look for each member of the union overlaps the given
> offset, and see if any of those members aliased the other reference.  If no
> member aliases the other reference, then the answer is no alias.  Otherwise
> the answer is may alias.  The run time for this would be proportional to
>  "distance to the root" * "number of overlapping members".  This could be
> slow if there are unions with many members or many unions of unions.
> Another option is to say that they do not alias.  This would mean that all
> references to unions must be explicitly through the union.
> From what I gather from the thread so far, the access through the union
> may be lost because of LLVM transformations. I am not sure how, in the face
> of that, TBAA could indicate NoAlias safely (without the risk of
> functional-correctness issues in correct programs) between types which
> overlap within any union (within some portion of the program).
> As for the standards, it is definitely not true that all references to
> unions must be explicitly through the union. However, if you are trying to
> perform union-based type punning (under C11), then it appears that it is
> intended that the read must be through the union.
> This would be the least restrictive aliasing allowing the most
> optimization.  The implementation would be simple.  I believe we make the
> parent of the TBAA node for the union to be "omnipotent char".  This might
> be similar to treating the union TBAA node more like a scalar node instead
> of a struct-path.  Then the traversal of the TBAA nodes will be quick.
> I'll have to work this out a bit more, but, if this is good enough to meet
> the requirements of the standard, I can try to think this through a little
> more.  I'll need Hubert and Daniel to comment on that since I am no expert
> on the C and C++ standards.
> The third option is to be pessimistic and say "may alias" all of the time
> (conservatively correct), and rely on other alias analysis to improve it.
> This will have good compile time, but could hinder optimization.
> Personally I do not like this option.  Most of the time it will not have a
> negative effect, but there will be a reasonable number of programs where
> this will hurt optimization more that it needs to.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170301/e93bd273/attachment-0001.html>

More information about the llvm-dev mailing list