[llvm-dev] RFC: Representing unions in TBAA

Krzysztof Parzyszek via llvm-dev llvm-dev at lists.llvm.org
Fri Apr 7 13:28:16 PDT 2017


I'm asking if people are ok with it. :)
We've had customer reports that can be tracked down to this issue, so 
this is something we'd really like to working (at least in terms of 
correctness).

I can come up with a patch.

-Krzysztof


On 4/7/2017 3:25 PM, Daniel Berlin wrote:
> Not familiar with clang enough to know.
> Staring at it, it looks a bit annoying to do.
> You'd basically just want to stop decorating loads/stores with tbaa info
> if it's a union.
>
>
> On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>     Can we turn off TBAA info for union member accesses in clang before
>     this gets fixed?
>
>     -Krzysztof
>
>
>     On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote:
>
>         So, https://bugs.llvm.org/show_bug.cgi?id=32056
>         <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example showing
>         our current TBAA tree for union generation is definitely
>         irretrievably
>         broken.
>         I'll be honest here. I'm pretty sure your proposal doesn't go
>         far enough.
>         But truthfully,  I would rather see us come closer to a
>         representation
>         we know works, which is GCC's.
>         Let me try to simplify what you are suggesting, and what we have.
>         Our current representation is basically inverted from GCC, but
>         we don't
>         allow things that would enable it to work.
>
>         Given
>         union {int a, short b};
>
>         GCC's will be:
>
>          union
>           /   \
>         short int
>
>
>         Everything is implicitly a subset of alias set 0 (which for C/C++ is
>         char) just to avoid representing it.
>
>         Our metadata has no child links, and only a single parent link.
>
>         You can't represent the above form because you can't make a
>         single short
>         node a child  of every union/struct it needs to be (lack of multiple
>         parents means you can't just frob them all to offset zero).
>
>         Your suggestion is to invert this as a struct
>
>         short  int
>            |    /
>         union
>
>         We don't allow multiple parents, however.
>         Because of that, you suggest we follow all nodes for unions, special
>         casing union-type nodes somehow
>
>         Let me suggest something different:
>
>         We know the current structure fails us in a number of ways.
>         Instead of trying to shoehorn this into our current structure, I
>         suggest: we stop messing around and just have a GCC style tree, and
>         represent the children instead of the parents.
>         We make contained types descendants instead of ancestors.
>         We allow multiple children at each offset and for scalar type
>         nodes.x`
>
>         We could do that without changing to children, but honestly,  i
>         strongly
>         believe nobody who ever looks at this really understands it
>         right now.
>         (If someone really does like the ancestor form, i'd love to
>         understand
>         why :P)
>
>         So if we are going to change it, we should change it to
>         something that
>         is easier to understand.
>
>         something like:
>
>         scalar type node -> {name, children nodes}
>         struct node -> {name, array of {offset, child node} }
>
>         Paths are separate from the tbaa tree itself, and are just:
>         path node -> {struct node, offset} or whatever.
>
>         unions are just scalar type nodes with multiple children, not struct
>         nodes with special-cased offset zero.
>
>         This also has a well-defined upgrade path.
>
>         For "old-style" DAGs, like exist now, we know we have to regen the
>         metadata, and that it is wrong (So we could, for example, simply
>         disable
>         it for correctness reasons)
>         .
>         Anything with a "new-style" DAG, we know is usable.
>
>         In this representation, the most generic tbaa node is just the
>         one that
>         contains the other.
>         If neither contains the other, you can just make one that has
>         both as
>         children and use that.
>         (instead of now, where you'd have to have multiple parents again).
>
>         (The above also may be better for other languages)
>
>         --Dan
>
>
>
>
>         On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron
>         <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>
>         <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote:
>
>             Seems like the comments have stopped.  I'll try to get a patch
>             together.  Then we can continue the discussion from there.
>
>             Hubert, as for the issue with the llvm optimizations losing
>         the TBAA
>             information, it should be the responsibility to make sure the
>             aliasing is changed in the correct way.  One function related to
>             this has already been mentioned:  getMostGenericTBAA.
>
>             Later,
>             Steven Perron
>
>
>
>             From:        Hubert Tong <hubert.reinterpretcast at gmail.com
>         <mailto:hubert.reinterpretcast at gmail.com>
>             <mailto:hubert.reinterpretcast at gmail.com
>         <mailto:hubert.reinterpretcast at gmail.com>>>
>             To:        Steven Perron/Toronto/IBM at IBMCA
>             Cc:        Daniel Berlin <dberlin at dberlin.org
>         <mailto:dberlin at dberlin.org>
>             <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>,
>         llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>             <mailto:llvm-dev at lists.llvm.org
>         <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das
>             <sanjoy at playingwithpointers.com
>         <mailto:sanjoy at playingwithpointers.com>
>         <mailto:sanjoy at playingwithpointers.com
>         <mailto:sanjoy at playingwithpointers.com>>>
>             Date:        2017/02/15 07:44 AM
>             Subject:        Re: [llvm-dev] RFC: Representing unions in TBAA
>
>         ------------------------------------------------------------------------
>
>
>
>             On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron
>             <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com
>         <mailto:perrons at ca.ibm.com>>> wrote:
>             3) How should we handle a reference directly through a
>         union, and a
>             reference that is not through the union?
>
>             My solution was to look for each member of the union
>         overlaps the
>             given offset, and see if any of those members aliased the other
>             reference.  If no member aliases the other reference, then the
>             answer is no alias.  Otherwise the answer is may alias.  The run
>             time for this would be proportional to  "distance to the root" *
>             "number of overlapping members".  This could be slow if
>         there are
>             unions with many members or many unions of unions.
>
>             Another option is to say that they do not alias.  This would
>         mean
>             that all references to unions must be explicitly through the
>         union.
>             From what I gather from the thread so far, the access
>         through the
>             union may be lost because of LLVM transformations. I am not sure
>             how, in the face of that, TBAA could indicate NoAlias safely
>             (without the risk of functional-correctness issues in correct
>             programs) between types which overlap within any union
>         (within some
>             portion of the program).
>
>             As for the standards, it is definitely not true that all
>         references
>             to unions must be explicitly through the union. However, if
>         you are
>             trying to perform union-based type punning (under C11), then it
>             appears that it is intended that the read must be through
>         the union.
>
>             This would be the least restrictive aliasing allowing the most
>             optimization.  The implementation would be simple.  I believe we
>             make the parent of the TBAA node for the union to be "omnipotent
>             char".  This might be similar to treating the union TBAA
>         node more
>             like a scalar node instead of a struct-path.  Then the
>         traversal of
>             the TBAA nodes will be quick.  I'll have to work this out a bit
>             more, but, if this is good enough to meet the requirements
>         of the
>             standard, I can try to think this through a little more.
>         I'll need
>             Hubert and Daniel to comment on that since I am no expert on
>         the C
>             and C++ standards.
>
>             The third option is to be pessimistic and say "may alias"
>         all of the
>             time (conservatively correct), and rely on other alias
>         analysis to
>             improve it.  This will have good compile time, but could hinder
>             optimization.  Personally I do not like this option.  Most
>         of the
>             time it will not have a negative effect, but there will be a
>             reasonable number of programs where this will hurt
>         optimization more
>             that it needs to.
>
>
>
>
>
>
>         _______________________________________________
>         LLVM Developers mailing list
>         llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
>
>     --
>     Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>     hosted by The Linux Foundation
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation


More information about the llvm-dev mailing list