[llvm-dev] RFC: Representing unions in TBAA

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 9 09:57:30 PST 2017


On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote:
> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing 
> our current TBAA tree for union generation is definitely irretrievably 
> broken.
> I'll be honest here. I'm pretty sure your proposal doesn't go far enough.
> But truthfully,  I would rather see us come closer to a representation 
> we know works, which is GCC's.
> Let me try to simplify what you are suggesting, and what we have.
> Our current representation is basically inverted from GCC, but we 
> don't allow things that would enable it to work.
>
> Given
> union {int a, short b};
>
> GCC's will be:
>
>  union
>   /   \
> short int
>
>
> Everything is implicitly a subset of alias set 0 (which for C/C++ is 
> char) just to avoid representing it.
>
> Our metadata has no child links, and only a single parent link.
>
> You can't represent the above form because you can't make a single 
> short node a child  of every union/struct it needs to be (lack of 
> multiple parents means you can't just frob them all to offset zero).
>
> Your suggestion is to invert this as a struct
>
> short  int
>    |    /
> union
>
> We don't allow multiple parents, however.
> Because of that, you suggest we follow all nodes for unions, special 
> casing union-type nodes somehow
>
> Let me suggest something different:
>
> We know the current structure fails us in a number of ways.
> Instead of trying to shoehorn this into our current structure, I 
> suggest: we stop messing around and just have a GCC style tree, and 
> represent the children instead of the parents.
> We make contained types descendants instead of ancestors.
> We allow multiple children at each offset and for scalar type nodes.x`
>
> We could do that without changing to children, but honestly,  i 
> strongly believe nobody who ever looks at this really understands it 
> right now.
> (If someone really does like the ancestor form, i'd love to understand 
> why :P)
>
> So if we are going to change it, we should change it to something that 
> is easier to understand.
>
> something like:
>
> scalar type node -> {name, children nodes}
> struct node -> {name, array of {offset, child node} }
>
> Paths are separate from the tbaa tree itself, and are just:
> path node -> {struct node, offset} or whatever.
>
> unions are just scalar type nodes with multiple children, not struct 
> nodes with special-cased offset zero.
>
> This also has a well-defined upgrade path.
>
> For "old-style" DAGs, like exist now, we know we have to regen the 
> metadata, and that it is wrong (So we could, for example, simply 
> disable it for correctness reasons)
> .
> Anything with a "new-style" DAG, we know is usable.
>
> In this representation, the most generic tbaa node is just the one 
> that contains the other.
> If neither contains the other, you can just make one that has both as 
> children and use that.
> (instead of now, where you'd have to have multiple parents again).

You mean making a 'scalar type node', because those represent 'or'?

  -Hal

>
> (The above also may be better for other languages)
>
> --Dan
>
>
>
>
> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com 
> <mailto:perrons at ca.ibm.com>> wrote:
>
>     Seems like the comments have stopped.  I'll try to get a patch
>     together.  Then we can continue the discussion from there.
>
>     Hubert, as for the issue with the llvm optimizations losing the
>     TBAA information, it should be the responsibility to make sure the
>     aliasing is changed in the correct way. One function related to
>     this has already been mentioned:  getMostGenericTBAA.
>
>     Later,
>     Steven Perron
>
>
>
>     From: Hubert Tong <hubert.reinterpretcast at gmail.com
>     <mailto:hubert.reinterpretcast at gmail.com>>
>     To: Steven Perron/Toronto/IBM at IBMCA
>     Cc: Daniel Berlin <dberlin at dberlin.org
>     <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org
>     <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das
>     <sanjoy at playingwithpointers.com
>     <mailto:sanjoy at playingwithpointers.com>>
>     Date: 2017/02/15 07:44 AM
>     Subject: Re: [llvm-dev] RFC: Representing unions in TBAA
>     ------------------------------------------------------------------------
>
>
>
>     On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron
>     <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote:
>     3) How should we handle a reference directly through a union, and
>     a reference that is not through the union?
>
>     My solution was to look for each member of the union overlaps the
>     given offset, and see if any of those members aliased the other
>     reference. If no member aliases the other reference, then the
>     answer is no alias. Otherwise the answer is may alias.  The run
>     time for this would be proportional to  "distance to the root" *
>     "number of overlapping members".  This could be slow if there are
>     unions with many members or many unions of unions.
>
>     Another option is to say that they do not alias.  This would mean
>     that all references to unions must be explicitly through the union.
>     From what I gather from the thread so far, the access through the
>     union may be lost because of LLVM transformations. I am not sure
>     how, in the face of that, TBAA could indicate NoAlias safely
>     (without the risk of functional-correctness issues in correct
>     programs) between types which overlap within any union (within
>     some portion of the program).
>
>     As for the standards, it is definitely not true that all
>     references to unions must be explicitly through the union.
>     However, if you are trying to perform union-based type punning
>     (under C11), then it appears that it is intended that the read
>     must be through the union.
>
>     This would be the least restrictive aliasing allowing the most
>     optimization.  The implementation would be simple.  I believe we
>     make the parent of the TBAA node for the union to be "omnipotent
>     char".  This might be similar to treating the union TBAA node more
>     like a scalar node instead of a struct-path.  Then the traversal
>     of the TBAA nodes will be quick. I'll have to work this out a bit
>     more, but, if this is good enough to meet the requirements of the
>     standard, I can try to think this through a little more.  I'll
>     need Hubert and Daniel to comment on that since I am no expert on
>     the C and C++ standards.
>
>     The third option is to be pessimistic and say "may alias" all of
>     the time (conservatively correct), and rely on other alias
>     analysis to improve it.  This will have good compile time, but
>     could hinder optimization.  Personally I do not like this option.
>     Most of the time it will not have a negative effect, but there
>     will be a reasonable number of programs where this will hurt
>     optimization more that it needs to.
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170309/9d4848b8/attachment.html>


More information about the llvm-dev mailing list