[llvm-dev] RFC: Representing unions in TBAA

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Sun May 14 08:37:15 PDT 2017


On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote:
> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing 
> our current TBAA tree for union generation is definitely irretrievably 
> broken.
> I'll be honest here. I'm pretty sure your proposal doesn't go far enough.
> But truthfully,  I would rather see us come closer to a representation 
> we know works, which is GCC's.
> Let me try to simplify what you are suggesting, and what we have.
> Our current representation is basically inverted from GCC, but we 
> don't allow things that would enable it to work.
>
> Given
> union {int a, short b};
>
> GCC's will be:
>
>  union
>   /   \
> short int
>
>
> Everything is implicitly a subset of alias set 0 (which for C/C++ is 
> char) just to avoid representing it.
>
> Our metadata has no child links, and only a single parent link.
>
> You can't represent the above form because you can't make a single 
> short node a child  of every union/struct it needs to be (lack of 
> multiple parents means you can't just frob them all to offset zero).
>
> Your suggestion is to invert this as a struct
>
> short  int
>    |    /
> union
>
> We don't allow multiple parents, however.
> Because of that, you suggest we follow all nodes for unions, special 
> casing union-type nodes somehow

Now that I've spent a bunch of time looking at this, I'd like to voice 
support for Steven's original proposal. In the context of what we have, 
it makes sense, seems sound, and should fix the representational mapping 
problem we currently have. Something completely different (e.g. closer 
to what GCC uses) can work too, but this seems unnecessary (the proposed 
extension to the current semantics seem equivalently expressive).

What you call "special casing of union-type nodes" does not actually 
seem all that special.  The rule seems quite simple: if you some across 
duplicate offsets, then search all of them. This should not be difficult 
to implement or use, and seems no less efficient than any other way of 
encoding the concept of disjunction in the hierarchy.

  -Hal

>
> Let me suggest something different:
>
> We know the current structure fails us in a number of ways.
> Instead of trying to shoehorn this into our current structure, I 
> suggest: we stop messing around and just have a GCC style tree, and 
> represent the children instead of the parents.
> We make contained types descendants instead of ancestors.
> We allow multiple children at each offset and for scalar type nodes.x`
>
> We could do that without changing to children, but honestly,  i 
> strongly believe nobody who ever looks at this really understands it 
> right now.
> (If someone really does like the ancestor form, i'd love to understand 
> why :P)
>
> So if we are going to change it, we should change it to something that 
> is easier to understand.
>
> something like:
>
> scalar type node -> {name, children nodes}
> struct node -> {name, array of {offset, child node} }
>
> Paths are separate from the tbaa tree itself, and are just:
> path node -> {struct node, offset} or whatever.
>
> unions are just scalar type nodes with multiple children, not struct 
> nodes with special-cased offset zero.
>
> This also has a well-defined upgrade path.
>
> For "old-style" DAGs, like exist now, we know we have to regen the 
> metadata, and that it is wrong (So we could, for example, simply 
> disable it for correctness reasons)
> .
> Anything with a "new-style" DAG, we know is usable.
>
> In this representation, the most generic tbaa node is just the one 
> that contains the other.
> If neither contains the other, you can just make one that has both as 
> children and use that.
> (instead of now, where you'd have to have multiple parents again).
>
> (The above also may be better for other languages)
>
> --Dan
>
>
>
>
> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com 
> <mailto:perrons at ca.ibm.com>> wrote:
>
>     Seems like the comments have stopped.  I'll try to get a patch
>     together.  Then we can continue the discussion from there.
>
>     Hubert, as for the issue with the llvm optimizations losing the
>     TBAA information, it should be the responsibility to make sure the
>     aliasing is changed in the correct way. One function related to
>     this has already been mentioned:  getMostGenericTBAA.
>
>     Later,
>     Steven Perron
>
>
>
>     From: Hubert Tong <hubert.reinterpretcast at gmail.com
>     <mailto:hubert.reinterpretcast at gmail.com>>
>     To: Steven Perron/Toronto/IBM at IBMCA
>     Cc: Daniel Berlin <dberlin at dberlin.org
>     <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org
>     <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das
>     <sanjoy at playingwithpointers.com
>     <mailto:sanjoy at playingwithpointers.com>>
>     Date: 2017/02/15 07:44 AM
>     Subject: Re: [llvm-dev] RFC: Representing unions in TBAA
>     ------------------------------------------------------------------------
>
>
>
>     On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron
>     <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote:
>     3) How should we handle a reference directly through a union, and
>     a reference that is not through the union?
>
>     My solution was to look for each member of the union overlaps the
>     given offset, and see if any of those members aliased the other
>     reference. If no member aliases the other reference, then the
>     answer is no alias. Otherwise the answer is may alias.  The run
>     time for this would be proportional to  "distance to the root" *
>     "number of overlapping members".  This could be slow if there are
>     unions with many members or many unions of unions.
>
>     Another option is to say that they do not alias.  This would mean
>     that all references to unions must be explicitly through the union.
>     From what I gather from the thread so far, the access through the
>     union may be lost because of LLVM transformations. I am not sure
>     how, in the face of that, TBAA could indicate NoAlias safely
>     (without the risk of functional-correctness issues in correct
>     programs) between types which overlap within any union (within
>     some portion of the program).
>
>     As for the standards, it is definitely not true that all
>     references to unions must be explicitly through the union.
>     However, if you are trying to perform union-based type punning
>     (under C11), then it appears that it is intended that the read
>     must be through the union.
>
>     This would be the least restrictive aliasing allowing the most
>     optimization.  The implementation would be simple.  I believe we
>     make the parent of the TBAA node for the union to be "omnipotent
>     char".  This might be similar to treating the union TBAA node more
>     like a scalar node instead of a struct-path.  Then the traversal
>     of the TBAA nodes will be quick. I'll have to work this out a bit
>     more, but, if this is good enough to meet the requirements of the
>     standard, I can try to think this through a little more.  I'll
>     need Hubert and Daniel to comment on that since I am no expert on
>     the C and C++ standards.
>
>     The third option is to be pessimistic and say "may alias" all of
>     the time (conservatively correct), and rely on other alias
>     analysis to improve it.  This will have good compile time, but
>     could hinder optimization.  Personally I do not like this option.
>     Most of the time it will not have a negative effect, but there
>     will be a reasonable number of programs where this will hurt
>     optimization more that it needs to.
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/e265919b/attachment.html>


More information about the llvm-dev mailing list