[llvm-dev] RFC: Representing unions in TBAA

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Fri Apr 7 13:25:13 PDT 2017

Not familiar with clang enough to know.
Staring at it, it looks a bit annoying to do.
You'd basically just want to stop decorating loads/stores with tbaa info if
it's a union.

On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Can we turn off TBAA info for union member accesses in clang before this
> gets fixed?
> -Krzysztof
> On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote:
>> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing
>> our current TBAA tree for union generation is definitely irretrievably
>> broken.
>> I'll be honest here. I'm pretty sure your proposal doesn't go far enough.
>> But truthfully,  I would rather see us come closer to a representation
>> we know works, which is GCC's.
>> Let me try to simplify what you are suggesting, and what we have.
>> Our current representation is basically inverted from GCC, but we don't
>> allow things that would enable it to work.
>> Given
>> union {int a, short b};
>> GCC's will be:
>>  union
>>   /   \
>> short int
>> Everything is implicitly a subset of alias set 0 (which for C/C++ is
>> char) just to avoid representing it.
>> Our metadata has no child links, and only a single parent link.
>> You can't represent the above form because you can't make a single short
>> node a child  of every union/struct it needs to be (lack of multiple
>> parents means you can't just frob them all to offset zero).
>> Your suggestion is to invert this as a struct
>> short  int
>>    |    /
>> union
>> We don't allow multiple parents, however.
>> Because of that, you suggest we follow all nodes for unions, special
>> casing union-type nodes somehow
>> Let me suggest something different:
>> We know the current structure fails us in a number of ways.
>> Instead of trying to shoehorn this into our current structure, I
>> suggest: we stop messing around and just have a GCC style tree, and
>> represent the children instead of the parents.
>> We make contained types descendants instead of ancestors.
>> We allow multiple children at each offset and for scalar type nodes.x`
>> We could do that without changing to children, but honestly,  i strongly
>> believe nobody who ever looks at this really understands it right now.
>> (If someone really does like the ancestor form, i'd love to understand
>> why :P)
>> So if we are going to change it, we should change it to something that
>> is easier to understand.
>> something like:
>> scalar type node -> {name, children nodes}
>> struct node -> {name, array of {offset, child node} }
>> Paths are separate from the tbaa tree itself, and are just:
>> path node -> {struct node, offset} or whatever.
>> unions are just scalar type nodes with multiple children, not struct
>> nodes with special-cased offset zero.
>> This also has a well-defined upgrade path.
>> For "old-style" DAGs, like exist now, we know we have to regen the
>> metadata, and that it is wrong (So we could, for example, simply disable
>> it for correctness reasons)
>> .
>> Anything with a "new-style" DAG, we know is usable.
>> In this representation, the most generic tbaa node is just the one that
>> contains the other.
>> If neither contains the other, you can just make one that has both as
>> children and use that.
>> (instead of now, where you'd have to have multiple parents again).
>> (The above also may be better for other languages)
>> --Dan
>> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com
>> <mailto:perrons at ca.ibm.com>> wrote:
>>     Seems like the comments have stopped.  I'll try to get a patch
>>     together.  Then we can continue the discussion from there.
>>     Hubert, as for the issue with the llvm optimizations losing the TBAA
>>     information, it should be the responsibility to make sure the
>>     aliasing is changed in the correct way.  One function related to
>>     this has already been mentioned:  getMostGenericTBAA.
>>     Later,
>>     Steven Perron
>>     From:        Hubert Tong <hubert.reinterpretcast at gmail.com
>>     <mailto:hubert.reinterpretcast at gmail.com>>
>>     To:        Steven Perron/Toronto/IBM at IBMCA
>>     Cc:        Daniel Berlin <dberlin at dberlin.org
>>     <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org
>>     <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das
>>     <sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpoin
>> ters.com>>
>>     Date:        2017/02/15 07:44 AM
>>     Subject:        Re: [llvm-dev] RFC: Representing unions in TBAA
>>     ------------------------------------------------------------
>> ------------
>>     On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron
>>     <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote:
>>     3) How should we handle a reference directly through a union, and a
>>     reference that is not through the union?
>>     My solution was to look for each member of the union overlaps the
>>     given offset, and see if any of those members aliased the other
>>     reference.  If no member aliases the other reference, then the
>>     answer is no alias.  Otherwise the answer is may alias.  The run
>>     time for this would be proportional to  "distance to the root" *
>>     "number of overlapping members".  This could be slow if there are
>>     unions with many members or many unions of unions.
>>     Another option is to say that they do not alias.  This would mean
>>     that all references to unions must be explicitly through the union.
>>     From what I gather from the thread so far, the access through the
>>     union may be lost because of LLVM transformations. I am not sure
>>     how, in the face of that, TBAA could indicate NoAlias safely
>>     (without the risk of functional-correctness issues in correct
>>     programs) between types which overlap within any union (within some
>>     portion of the program).
>>     As for the standards, it is definitely not true that all references
>>     to unions must be explicitly through the union. However, if you are
>>     trying to perform union-based type punning (under C11), then it
>>     appears that it is intended that the read must be through the union.
>>     This would be the least restrictive aliasing allowing the most
>>     optimization.  The implementation would be simple.  I believe we
>>     make the parent of the TBAA node for the union to be "omnipotent
>>     char".  This might be similar to treating the union TBAA node more
>>     like a scalar node instead of a struct-path.  Then the traversal of
>>     the TBAA nodes will be quick.  I'll have to work this out a bit
>>     more, but, if this is good enough to meet the requirements of the
>>     standard, I can try to think this through a little more.  I'll need
>>     Hubert and Daniel to comment on that since I am no expert on the C
>>     and C++ standards.
>>     The third option is to be pessimistic and say "may alias" all of the
>>     time (conservatively correct), and rely on other alias analysis to
>>     improve it.  This will have good compile time, but could hinder
>>     optimization.  Personally I do not like this option.  Most of the
>>     time it will not have a negative effect, but there will be a
>>     reasonable number of programs where this will hurt optimization more
>>     that it needs to.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170407/bc7f872f/attachment.html>

More information about the llvm-dev mailing list