[llvm-dev] RFC: Representing unions in TBAA

Xin Tong via llvm-dev llvm-dev at lists.llvm.org
Sun Apr 9 18:40:45 PDT 2017


I would like to see it disabled as well until fixed, we have problem
compiling HHVM with clang due to this union+TBAA problem.

-Xin
-- 
Software Engineer
Employee of Facebook Inc.

On Fri, Apr 7, 2017 at 1:29 PM, Daniel Berlin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Ah.
> IMHO, yes we should disable it until it's correct.
>
>
> On Fri, Apr 7, 2017 at 1:28 PM, Krzysztof Parzyszek
> <kparzysz at codeaurora.org> wrote:
>>
>> I'm asking if people are ok with it. :)
>> We've had customer reports that can be tracked down to this issue, so this
>> is something we'd really like to working (at least in terms of correctness).
>>
>> I can come up with a patch.
>>
>> -Krzysztof
>>
>>
>> On 4/7/2017 3:25 PM, Daniel Berlin wrote:
>>>
>>> Not familiar with clang enough to know.
>>> Staring at it, it looks a bit annoying to do.
>>> You'd basically just want to stop decorating loads/stores with tbaa info
>>> if it's a union.
>>>
>>>
>>> On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev
>>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>
>>>     Can we turn off TBAA info for union member accesses in clang before
>>>     this gets fixed?
>>>
>>>     -Krzysztof
>>>
>>>
>>>     On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote:
>>>
>>>         So, https://bugs.llvm.org/show_bug.cgi?id=32056
>>>         <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example
>>> showing
>>>         our current TBAA tree for union generation is definitely
>>>         irretrievably
>>>         broken.
>>>         I'll be honest here. I'm pretty sure your proposal doesn't go
>>>         far enough.
>>>         But truthfully,  I would rather see us come closer to a
>>>         representation
>>>         we know works, which is GCC's.
>>>         Let me try to simplify what you are suggesting, and what we have.
>>>         Our current representation is basically inverted from GCC, but
>>>         we don't
>>>         allow things that would enable it to work.
>>>
>>>         Given
>>>         union {int a, short b};
>>>
>>>         GCC's will be:
>>>
>>>          union
>>>           /   \
>>>         short int
>>>
>>>
>>>         Everything is implicitly a subset of alias set 0 (which for C/C++
>>> is
>>>         char) just to avoid representing it.
>>>
>>>         Our metadata has no child links, and only a single parent link.
>>>
>>>         You can't represent the above form because you can't make a
>>>         single short
>>>         node a child  of every union/struct it needs to be (lack of
>>> multiple
>>>         parents means you can't just frob them all to offset zero).
>>>
>>>         Your suggestion is to invert this as a struct
>>>
>>>         short  int
>>>            |    /
>>>         union
>>>
>>>         We don't allow multiple parents, however.
>>>         Because of that, you suggest we follow all nodes for unions,
>>> special
>>>         casing union-type nodes somehow
>>>
>>>         Let me suggest something different:
>>>
>>>         We know the current structure fails us in a number of ways.
>>>         Instead of trying to shoehorn this into our current structure, I
>>>         suggest: we stop messing around and just have a GCC style tree,
>>> and
>>>         represent the children instead of the parents.
>>>         We make contained types descendants instead of ancestors.
>>>         We allow multiple children at each offset and for scalar type
>>>         nodes.x`
>>>
>>>         We could do that without changing to children, but honestly,  i
>>>         strongly
>>>         believe nobody who ever looks at this really understands it
>>>         right now.
>>>         (If someone really does like the ancestor form, i'd love to
>>>         understand
>>>         why :P)
>>>
>>>         So if we are going to change it, we should change it to
>>>         something that
>>>         is easier to understand.
>>>
>>>         something like:
>>>
>>>         scalar type node -> {name, children nodes}
>>>         struct node -> {name, array of {offset, child node} }
>>>
>>>         Paths are separate from the tbaa tree itself, and are just:
>>>         path node -> {struct node, offset} or whatever.
>>>
>>>         unions are just scalar type nodes with multiple children, not
>>> struct
>>>         nodes with special-cased offset zero.
>>>
>>>         This also has a well-defined upgrade path.
>>>
>>>         For "old-style" DAGs, like exist now, we know we have to regen
>>> the
>>>         metadata, and that it is wrong (So we could, for example, simply
>>>         disable
>>>         it for correctness reasons)
>>>         .
>>>         Anything with a "new-style" DAG, we know is usable.
>>>
>>>         In this representation, the most generic tbaa node is just the
>>>         one that
>>>         contains the other.
>>>         If neither contains the other, you can just make one that has
>>>         both as
>>>         children and use that.
>>>         (instead of now, where you'd have to have multiple parents
>>> again).
>>>
>>>         (The above also may be better for other languages)
>>>
>>>         --Dan
>>>
>>>
>>>
>>>
>>>         On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron
>>>         <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>
>>>         <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote:
>>>
>>>             Seems like the comments have stopped.  I'll try to get a
>>> patch
>>>             together.  Then we can continue the discussion from there.
>>>
>>>             Hubert, as for the issue with the llvm optimizations losing
>>>         the TBAA
>>>             information, it should be the responsibility to make sure the
>>>             aliasing is changed in the correct way.  One function related
>>> to
>>>             this has already been mentioned:  getMostGenericTBAA.
>>>
>>>             Later,
>>>             Steven Perron
>>>
>>>
>>>
>>>             From:        Hubert Tong <hubert.reinterpretcast at gmail.com
>>>         <mailto:hubert.reinterpretcast at gmail.com>
>>>             <mailto:hubert.reinterpretcast at gmail.com
>>>         <mailto:hubert.reinterpretcast at gmail.com>>>
>>>             To:        Steven Perron/Toronto/IBM at IBMCA
>>>             Cc:        Daniel Berlin <dberlin at dberlin.org
>>>         <mailto:dberlin at dberlin.org>
>>>             <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>,
>>>         llvm-dev <llvm-dev at lists.llvm.org
>>> <mailto:llvm-dev at lists.llvm.org>
>>>             <mailto:llvm-dev at lists.llvm.org
>>>         <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das
>>>             <sanjoy at playingwithpointers.com
>>>         <mailto:sanjoy at playingwithpointers.com>
>>>         <mailto:sanjoy at playingwithpointers.com
>>>
>>>         <mailto:sanjoy at playingwithpointers.com>>>
>>>             Date:        2017/02/15 07:44 AM
>>>             Subject:        Re: [llvm-dev] RFC: Representing unions in
>>> TBAA
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>>             On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron
>>>             <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com
>>>         <mailto:perrons at ca.ibm.com>>> wrote:
>>>             3) How should we handle a reference directly through a
>>>         union, and a
>>>             reference that is not through the union?
>>>
>>>             My solution was to look for each member of the union
>>>         overlaps the
>>>             given offset, and see if any of those members aliased the
>>> other
>>>             reference.  If no member aliases the other reference, then
>>> the
>>>             answer is no alias.  Otherwise the answer is may alias.  The
>>> run
>>>             time for this would be proportional to  "distance to the
>>> root" *
>>>             "number of overlapping members".  This could be slow if
>>>         there are
>>>             unions with many members or many unions of unions.
>>>
>>>             Another option is to say that they do not alias.  This would
>>>         mean
>>>             that all references to unions must be explicitly through the
>>>         union.
>>>             From what I gather from the thread so far, the access
>>>         through the
>>>             union may be lost because of LLVM transformations. I am not
>>> sure
>>>             how, in the face of that, TBAA could indicate NoAlias safely
>>>             (without the risk of functional-correctness issues in correct
>>>             programs) between types which overlap within any union
>>>         (within some
>>>             portion of the program).
>>>
>>>             As for the standards, it is definitely not true that all
>>>         references
>>>             to unions must be explicitly through the union. However, if
>>>         you are
>>>             trying to perform union-based type punning (under C11), then
>>> it
>>>             appears that it is intended that the read must be through
>>>         the union.
>>>
>>>             This would be the least restrictive aliasing allowing the
>>> most
>>>             optimization.  The implementation would be simple.  I believe
>>> we
>>>             make the parent of the TBAA node for the union to be
>>> "omnipotent
>>>             char".  This might be similar to treating the union TBAA
>>>         node more
>>>             like a scalar node instead of a struct-path.  Then the
>>>         traversal of
>>>             the TBAA nodes will be quick.  I'll have to work this out a
>>> bit
>>>             more, but, if this is good enough to meet the requirements
>>>         of the
>>>             standard, I can try to think this through a little more.
>>>         I'll need
>>>             Hubert and Daniel to comment on that since I am no expert on
>>>         the C
>>>             and C++ standards.
>>>
>>>             The third option is to be pessimistic and say "may alias"
>>>         all of the
>>>             time (conservatively correct), and rely on other alias
>>>         analysis to
>>>             improve it.  This will have good compile time, but could
>>> hinder
>>>             optimization.  Personally I do not like this option.  Most
>>>         of the
>>>             time it will not have a negative effect, but there will be a
>>>             reasonable number of programs where this will hurt
>>>         optimization more
>>>             that it needs to.
>>>
>>>
>>>
>>>
>>>
>>>
>>>         _______________________________________________
>>>         LLVM Developers mailing list
>>>         llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>>
>>>
>>>     --
>>>     Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>     hosted by The Linux Foundation
>>>
>>>     _______________________________________________
>>>     LLVM Developers mailing list
>>>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>>
>>>
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>


More information about the llvm-dev mailing list