[llvm-dev] RFC: Representing unions in TBAA
Krzysztof Parzyszek via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 7 13:28:16 PDT 2017
I'm asking if people are ok with it. :)
We've had customer reports that can be tracked down to this issue, so
this is something we'd really like to working (at least in terms of
correctness).
I can come up with a patch.
-Krzysztof
On 4/7/2017 3:25 PM, Daniel Berlin wrote:
> Not familiar with clang enough to know.
> Staring at it, it looks a bit annoying to do.
> You'd basically just want to stop decorating loads/stores with tbaa info
> if it's a union.
>
>
> On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
> Can we turn off TBAA info for union member accesses in clang before
> this gets fixed?
>
> -Krzysztof
>
>
> On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote:
>
> So, https://bugs.llvm.org/show_bug.cgi?id=32056
> <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example showing
> our current TBAA tree for union generation is definitely
> irretrievably
> broken.
> I'll be honest here. I'm pretty sure your proposal doesn't go
> far enough.
> But truthfully, I would rather see us come closer to a
> representation
> we know works, which is GCC's.
> Let me try to simplify what you are suggesting, and what we have.
> Our current representation is basically inverted from GCC, but
> we don't
> allow things that would enable it to work.
>
> Given
> union {int a, short b};
>
> GCC's will be:
>
> union
> / \
> short int
>
>
> Everything is implicitly a subset of alias set 0 (which for C/C++ is
> char) just to avoid representing it.
>
> Our metadata has no child links, and only a single parent link.
>
> You can't represent the above form because you can't make a
> single short
> node a child of every union/struct it needs to be (lack of multiple
> parents means you can't just frob them all to offset zero).
>
> Your suggestion is to invert this as a struct
>
> short int
> | /
> union
>
> We don't allow multiple parents, however.
> Because of that, you suggest we follow all nodes for unions, special
> casing union-type nodes somehow
>
> Let me suggest something different:
>
> We know the current structure fails us in a number of ways.
> Instead of trying to shoehorn this into our current structure, I
> suggest: we stop messing around and just have a GCC style tree, and
> represent the children instead of the parents.
> We make contained types descendants instead of ancestors.
> We allow multiple children at each offset and for scalar type
> nodes.x`
>
> We could do that without changing to children, but honestly, i
> strongly
> believe nobody who ever looks at this really understands it
> right now.
> (If someone really does like the ancestor form, i'd love to
> understand
> why :P)
>
> So if we are going to change it, we should change it to
> something that
> is easier to understand.
>
> something like:
>
> scalar type node -> {name, children nodes}
> struct node -> {name, array of {offset, child node} }
>
> Paths are separate from the tbaa tree itself, and are just:
> path node -> {struct node, offset} or whatever.
>
> unions are just scalar type nodes with multiple children, not struct
> nodes with special-cased offset zero.
>
> This also has a well-defined upgrade path.
>
> For "old-style" DAGs, like exist now, we know we have to regen the
> metadata, and that it is wrong (So we could, for example, simply
> disable
> it for correctness reasons)
> .
> Anything with a "new-style" DAG, we know is usable.
>
> In this representation, the most generic tbaa node is just the
> one that
> contains the other.
> If neither contains the other, you can just make one that has
> both as
> children and use that.
> (instead of now, where you'd have to have multiple parents again).
>
> (The above also may be better for other languages)
>
> --Dan
>
>
>
>
> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron
> <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>
> <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote:
>
> Seems like the comments have stopped. I'll try to get a patch
> together. Then we can continue the discussion from there.
>
> Hubert, as for the issue with the llvm optimizations losing
> the TBAA
> information, it should be the responsibility to make sure the
> aliasing is changed in the correct way. One function related to
> this has already been mentioned: getMostGenericTBAA.
>
> Later,
> Steven Perron
>
>
>
> From: Hubert Tong <hubert.reinterpretcast at gmail.com
> <mailto:hubert.reinterpretcast at gmail.com>
> <mailto:hubert.reinterpretcast at gmail.com
> <mailto:hubert.reinterpretcast at gmail.com>>>
> To: Steven Perron/Toronto/IBM at IBMCA
> Cc: Daniel Berlin <dberlin at dberlin.org
> <mailto:dberlin at dberlin.org>
> <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>,
> llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> <mailto:llvm-dev at lists.llvm.org
> <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das
> <sanjoy at playingwithpointers.com
> <mailto:sanjoy at playingwithpointers.com>
> <mailto:sanjoy at playingwithpointers.com
> <mailto:sanjoy at playingwithpointers.com>>>
> Date: 2017/02/15 07:44 AM
> Subject: Re: [llvm-dev] RFC: Representing unions in TBAA
>
> ------------------------------------------------------------------------
>
>
>
> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron
> <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com
> <mailto:perrons at ca.ibm.com>>> wrote:
> 3) How should we handle a reference directly through a
> union, and a
> reference that is not through the union?
>
> My solution was to look for each member of the union
> overlaps the
> given offset, and see if any of those members aliased the other
> reference. If no member aliases the other reference, then the
> answer is no alias. Otherwise the answer is may alias. The run
> time for this would be proportional to "distance to the root" *
> "number of overlapping members". This could be slow if
> there are
> unions with many members or many unions of unions.
>
> Another option is to say that they do not alias. This would
> mean
> that all references to unions must be explicitly through the
> union.
> From what I gather from the thread so far, the access
> through the
> union may be lost because of LLVM transformations. I am not sure
> how, in the face of that, TBAA could indicate NoAlias safely
> (without the risk of functional-correctness issues in correct
> programs) between types which overlap within any union
> (within some
> portion of the program).
>
> As for the standards, it is definitely not true that all
> references
> to unions must be explicitly through the union. However, if
> you are
> trying to perform union-based type punning (under C11), then it
> appears that it is intended that the read must be through
> the union.
>
> This would be the least restrictive aliasing allowing the most
> optimization. The implementation would be simple. I believe we
> make the parent of the TBAA node for the union to be "omnipotent
> char". This might be similar to treating the union TBAA
> node more
> like a scalar node instead of a struct-path. Then the
> traversal of
> the TBAA nodes will be quick. I'll have to work this out a bit
> more, but, if this is good enough to meet the requirements
> of the
> standard, I can try to think this through a little more.
> I'll need
> Hubert and Daniel to comment on that since I am no expert on
> the C
> and C++ standards.
>
> The third option is to be pessimistic and say "may alias"
> all of the
> time (conservatively correct), and rely on other alias
> analysis to
> improve it. This will have good compile time, but could hinder
> optimization. Personally I do not like this option. Most
> of the
> time it will not have a negative effect, but there will be a
> reasonable number of programs where this will hurt
> optimization more
> that it needs to.
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
>
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
More information about the llvm-dev
mailing list