[llvm-dev] RFC: Representing unions in TBAA

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Fri Apr 7 13:29:54 PDT 2017


Ah.
IMHO, yes we should disable it until it's correct.


On Fri, Apr 7, 2017 at 1:28 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org
> wrote:

> I'm asking if people are ok with it. :)
> We've had customer reports that can be tracked down to this issue, so this
> is something we'd really like to working (at least in terms of correctness).
>
> I can come up with a patch.
>
> -Krzysztof
>
>
> On 4/7/2017 3:25 PM, Daniel Berlin wrote:
>
>> Not familiar with clang enough to know.
>> Staring at it, it looks a bit annoying to do.
>> You'd basically just want to stop decorating loads/stores with tbaa info
>> if it's a union.
>>
>>
>> On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>     Can we turn off TBAA info for union member accesses in clang before
>>     this gets fixed?
>>
>>     -Krzysztof
>>
>>
>>     On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote:
>>
>>         So, https://bugs.llvm.org/show_bug.cgi?id=32056
>>         <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example
>> showing
>>         our current TBAA tree for union generation is definitely
>>         irretrievably
>>         broken.
>>         I'll be honest here. I'm pretty sure your proposal doesn't go
>>         far enough.
>>         But truthfully,  I would rather see us come closer to a
>>         representation
>>         we know works, which is GCC's.
>>         Let me try to simplify what you are suggesting, and what we have.
>>         Our current representation is basically inverted from GCC, but
>>         we don't
>>         allow things that would enable it to work.
>>
>>         Given
>>         union {int a, short b};
>>
>>         GCC's will be:
>>
>>          union
>>           /   \
>>         short int
>>
>>
>>         Everything is implicitly a subset of alias set 0 (which for C/C++
>> is
>>         char) just to avoid representing it.
>>
>>         Our metadata has no child links, and only a single parent link.
>>
>>         You can't represent the above form because you can't make a
>>         single short
>>         node a child  of every union/struct it needs to be (lack of
>> multiple
>>         parents means you can't just frob them all to offset zero).
>>
>>         Your suggestion is to invert this as a struct
>>
>>         short  int
>>            |    /
>>         union
>>
>>         We don't allow multiple parents, however.
>>         Because of that, you suggest we follow all nodes for unions,
>> special
>>         casing union-type nodes somehow
>>
>>         Let me suggest something different:
>>
>>         We know the current structure fails us in a number of ways.
>>         Instead of trying to shoehorn this into our current structure, I
>>         suggest: we stop messing around and just have a GCC style tree,
>> and
>>         represent the children instead of the parents.
>>         We make contained types descendants instead of ancestors.
>>         We allow multiple children at each offset and for scalar type
>>         nodes.x`
>>
>>         We could do that without changing to children, but honestly,  i
>>         strongly
>>         believe nobody who ever looks at this really understands it
>>         right now.
>>         (If someone really does like the ancestor form, i'd love to
>>         understand
>>         why :P)
>>
>>         So if we are going to change it, we should change it to
>>         something that
>>         is easier to understand.
>>
>>         something like:
>>
>>         scalar type node -> {name, children nodes}
>>         struct node -> {name, array of {offset, child node} }
>>
>>         Paths are separate from the tbaa tree itself, and are just:
>>         path node -> {struct node, offset} or whatever.
>>
>>         unions are just scalar type nodes with multiple children, not
>> struct
>>         nodes with special-cased offset zero.
>>
>>         This also has a well-defined upgrade path.
>>
>>         For "old-style" DAGs, like exist now, we know we have to regen the
>>         metadata, and that it is wrong (So we could, for example, simply
>>         disable
>>         it for correctness reasons)
>>         .
>>         Anything with a "new-style" DAG, we know is usable.
>>
>>         In this representation, the most generic tbaa node is just the
>>         one that
>>         contains the other.
>>         If neither contains the other, you can just make one that has
>>         both as
>>         children and use that.
>>         (instead of now, where you'd have to have multiple parents again).
>>
>>         (The above also may be better for other languages)
>>
>>         --Dan
>>
>>
>>
>>
>>         On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron
>>         <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>
>>         <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote:
>>
>>             Seems like the comments have stopped.  I'll try to get a patch
>>             together.  Then we can continue the discussion from there.
>>
>>             Hubert, as for the issue with the llvm optimizations losing
>>         the TBAA
>>             information, it should be the responsibility to make sure the
>>             aliasing is changed in the correct way.  One function related
>> to
>>             this has already been mentioned:  getMostGenericTBAA.
>>
>>             Later,
>>             Steven Perron
>>
>>
>>
>>             From:        Hubert Tong <hubert.reinterpretcast at gmail.com
>>         <mailto:hubert.reinterpretcast at gmail.com>
>>             <mailto:hubert.reinterpretcast at gmail.com
>>         <mailto:hubert.reinterpretcast at gmail.com>>>
>>             To:        Steven Perron/Toronto/IBM at IBMCA
>>             Cc:        Daniel Berlin <dberlin at dberlin.org
>>         <mailto:dberlin at dberlin.org>
>>             <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>,
>>         llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org
>> >
>>             <mailto:llvm-dev at lists.llvm.org
>>         <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das
>>             <sanjoy at playingwithpointers.com
>>         <mailto:sanjoy at playingwithpointers.com>
>>         <mailto:sanjoy at playingwithpointers.com
>>
>>         <mailto:sanjoy at playingwithpointers.com>>>
>>             Date:        2017/02/15 07:44 AM
>>             Subject:        Re: [llvm-dev] RFC: Representing unions in
>> TBAA
>>
>>         ------------------------------------------------------------
>> ------------
>>
>>
>>
>>             On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron
>>             <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com
>>         <mailto:perrons at ca.ibm.com>>> wrote:
>>             3) How should we handle a reference directly through a
>>         union, and a
>>             reference that is not through the union?
>>
>>             My solution was to look for each member of the union
>>         overlaps the
>>             given offset, and see if any of those members aliased the
>> other
>>             reference.  If no member aliases the other reference, then the
>>             answer is no alias.  Otherwise the answer is may alias.  The
>> run
>>             time for this would be proportional to  "distance to the
>> root" *
>>             "number of overlapping members".  This could be slow if
>>         there are
>>             unions with many members or many unions of unions.
>>
>>             Another option is to say that they do not alias.  This would
>>         mean
>>             that all references to unions must be explicitly through the
>>         union.
>>             From what I gather from the thread so far, the access
>>         through the
>>             union may be lost because of LLVM transformations. I am not
>> sure
>>             how, in the face of that, TBAA could indicate NoAlias safely
>>             (without the risk of functional-correctness issues in correct
>>             programs) between types which overlap within any union
>>         (within some
>>             portion of the program).
>>
>>             As for the standards, it is definitely not true that all
>>         references
>>             to unions must be explicitly through the union. However, if
>>         you are
>>             trying to perform union-based type punning (under C11), then
>> it
>>             appears that it is intended that the read must be through
>>         the union.
>>
>>             This would be the least restrictive aliasing allowing the most
>>             optimization.  The implementation would be simple.  I believe
>> we
>>             make the parent of the TBAA node for the union to be
>> "omnipotent
>>             char".  This might be similar to treating the union TBAA
>>         node more
>>             like a scalar node instead of a struct-path.  Then the
>>         traversal of
>>             the TBAA nodes will be quick.  I'll have to work this out a
>> bit
>>             more, but, if this is good enough to meet the requirements
>>         of the
>>             standard, I can try to think this through a little more.
>>         I'll need
>>             Hubert and Daniel to comment on that since I am no expert on
>>         the C
>>             and C++ standards.
>>
>>             The third option is to be pessimistic and say "may alias"
>>         all of the
>>             time (conservatively correct), and rely on other alias
>>         analysis to
>>             improve it.  This will have good compile time, but could
>> hinder
>>             optimization.  Personally I do not like this option.  Most
>>         of the
>>             time it will not have a negative effect, but there will be a
>>             reasonable number of programs where this will hurt
>>         optimization more
>>             that it needs to.
>>
>>
>>
>>
>>
>>
>>         _______________________________________________
>>         LLVM Developers mailing list
>>         llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>
>>
>>     --
>>     Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>     hosted by The Linux Foundation
>>
>>     _______________________________________________
>>     LLVM Developers mailing list
>>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>
>>
>>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170407/215e72b3/attachment.html>


More information about the llvm-dev mailing list