<div class="socmaildefaultfont" dir="ltr" style="font-family:Arial, Helvetica, sans-serif;font-size:10.5pt" ><div dir="ltr" >Hello all,</div>
<div dir="ltr" > </div>
<div dir="ltr" >I started implementing Daniel's idea, and I noticed a couple things that should be addressed.</div>
<div dir="ltr" > </div>
<div dir="ltr" >1) The root node: Clang currently generates a root node named "Simple C++ TBAA" (or something similar). From the comments it looks like this is used to determine whether or not two TBAA DAGs from different translation units can be merged into a single on during LTO. I plan on keeping that node, but making it the universal "sink" node as before. This means that, in the new interpretation of the graph, it will be a child of everything instead of a parent of everything. However, since it is still the sink, it can still work in LTO as before. This is just a guess, because I have not looked at the code that uses this.</div>
<div dir="ltr" > </div>
<div dir="ltr" >2) char: Daniel suggested omitting the aliasing for because we know it aliases everything. The problem with this is that llvm does not alias the current "omnipotent char" to everything. As is pointed out in the comments:</div>
<div dir="ltr" > </div>
<div dir="ltr" > // Define the root of the tree for user-accessible memory. C and C++ <br> // give special powers to char and certain similar types. However, <br> // these special powers only cover user-accessible memory, and doesn't <br> // include things like vtables. </div>
<div dir="ltr" > </div>
<div dir="ltr" >Is there another option on how to handle vtables? If not, we may need to add a node for "omnipotent char", which will have lots of children. This will make precomputing the transitive closure particularly important.</div>
<div dir="ltr" > </div>
<div dir="ltr" >Later,</div>
<div dir="ltr" >Steven Perron</div>
<div dir="ltr" > </div>
<blockquote data-history-content-modified="1" data-history-expanded="1" dir="ltr" style="border-left:solid #aaaaaa 2px; margin-left:5px; padding-left:5px; direction:ltr; margin-right:0px" >----- Original message -----<br>From: Hal Finkel <hfinkel@anl.gov><br>To: Daniel Berlin <dberlin@dberlin.org><br>Cc: Steven Perron/Toronto/IBM@IBMCA, llvm-dev <llvm-dev@lists.llvm.org><br>Subject: Re: [llvm-dev] RFC: Representing unions in TBAA<br>Date: Thu, Mar 9, 2017 2:41 PM<br> <br><!--Notes ACF
<meta content="text/html; charset=utf8" http-equiv="Content-Type" >-->
<p> </p>
<div>On 03/09/2017 01:15 PM, Daniel Berlin wrote:</div>
<blockquote cite="mid:CAF4BwTWgK+fNoKXG9FAMaL0V0JRtM6VgkL+o5p8guDeTKc3C=g@mail.gmail.com" type="cite" ><!--Notes ACF
<meta http-equiv="Content-Type" content="text/html; charset=utf8" >-->
<div dir="ltr" >
<div>
<div>On Thu, Mar 9, 2017 at 9:57 AM, Hal Finkel <span dir="ltr" ><<a href="mailto:hfinkel@anl.gov" moz-do-not-send="true" target="_blank" >hfinkel@anl.gov</a>></span> wrote:
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex" ><div bgcolor="#FFFFFF" text="#000000" ><div><div>
<div>On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote:</div>
<blockquote type="cite" ><div dir="ltr" >So, <a href="https://bugs.llvm.org/show_bug.cgi?id=32056" moz-do-not-send="true" target="_blank" >https://bugs.llvm.org/show_<wbr>bug.cgi?id=32056</a> is an example showing our current TBAA tree for union generation is definitely irretrievably broken.
<div>I'll be honest here. I'm pretty sure your proposal doesn't go far enough.</div>
<div>But truthfully, I would rather see us come closer to a representation we know works, which is GCC's.</div>
<div>Let me try to simplify what you are suggesting, and what we have.</div>
<div>Our current representation is basically inverted from GCC, but we don't allow things that would enable it to work.</div>
<div> </div>
<div>Given</div>
<div>union {int a, short b};</div>
<div> </div>
<div>GCC's will be:</div>
<div> </div>
<div> union</div>
<div> / \</div>
<div>short int</div>
<div> </div>
<div> </div>
<div>Everything is implicitly a subset of alias set 0 (which for C/C++ is char) just to avoid representing it.</div>
<div> </div>
<div>Our metadata has no child links, and only a single parent link.</div>
<div> </div>
<div>You can't represent the above form because you can't make a single short node a child of every union/struct it needs to be (lack of multiple parents means you can't just frob them all to offset zero).</div>
<div> </div>
<div>Your suggestion is to invert this as a struct</div>
<div> </div>
<div>short int</div>
<div> | / </div>
<div>union</div>
<div> </div>
<div>We don't allow multiple parents, however.</div>
<div>Because of that, you suggest we follow all nodes for unions, special casing union-type nodes somehow</div>
<div> </div>
<div>Let me suggest something different:</div>
<div> </div>
<div>We know the current structure fails us in a number of ways.</div>
<div>Instead of trying to shoehorn this into our current structure, I suggest: we stop messing around and just have a GCC style tree, and represent the children instead of the parents.</div>
<div>We make contained types descendants instead of ancestors.</div>
<div>We allow multiple children at each offset and for scalar type nodes.x`</div>
<div> </div>
<div>We could do that without changing to children, but honestly, i strongly believe nobody who ever looks at this really understands it right now.</div>
<div>(If someone really does like the ancestor form, i'd love to understand why :P)</div>
<div><br>So if we are going to change it, we should change it to something that is easier to understand.</div>
<div> </div>
<div>something like:<br> </div>
<div>scalar type node -> {name, children nodes} </div>
<div>struct node -> {name, array of {offset, child node} }</div>
<div> </div>
<div>Paths are separate from the tbaa tree itself, and are just:</div>
<div>path node -> {struct node, offset} or whatever.</div>
<div> </div>
<div>unions are just scalar type nodes with multiple children, not struct nodes with special-cased offset zero.</div>
<div> </div>
<div>This also has a well-defined upgrade path.</div>
<div> </div>
<div>For "old-style" DAGs, like exist now, we know we have to regen the metadata, and that it is wrong (So we could, for example, simply disable it for correctness reasons)</div>
<div>.</div>
<div>Anything with a "new-style" DAG, we know is usable.</div>
<div> </div>
<div>In this representation, the most generic tbaa node is just the one that contains the other.</div>
<div>If neither contains the other, you can just make one that has both as children and use that.</div>
<div>(instead of now, where you'd have to have multiple parents again).</div></div></blockquote></div></div>You mean making a 'scalar type node', because those represent 'or'?</div></blockquote>
<div> </div>
<div>Yes.</div>
<div>I would probably stop calling them scalar type nodes and struct type nodes too, while we are bike-shedding a bit :)</div>
<div> </div>
<div>The terminology seems to be based on what we think a type system that generates those kinds of nodes look like.</div>
<div>But uh</div>
<div>1. That's wrong a lot of the time, because each frontend may want to use different types of nodes to be conservatively correct, and because different things can be represented, correctly, in multiple ways (IE, as a stupid example, you could represent any type as a struct node where all pieces go to the same place).</div>
<div> </div>
<div>In fact, it's not even true for the few things that do generate TBAA trees right now.</div>
<div> </div>
<div>2. It gives you no idea what they do in LLVM, which is, IMHO, what anyone reading those docs cares about.</div>
<div> </div>
<div>I'd probably call them based on something related to their actual semantic in llvm's tbaa tree is.</div>
<div> </div>
<div>IE i'd call scalar type nodes "or-type nodes" or "any of" nodes or something that gives people an idea that it means the aliasing result is may-alias if any path from that node is may-alias.</div>
<div>I'd call struct type nodes "offset-based nodes", becuase that's really what they are.</div>
<div> </div>
<div>(access-path nodes are actually access paths, so that seems find)</div>
<div> </div></div></div></div></blockquote><br>I certainly agree. We should pick terminology that makes sense independent of C/C++.<br><br> -Hal<br>
<blockquote cite="mid:CAF4BwTWgK+fNoKXG9FAMaL0V0JRtM6VgkL+o5p8guDeTKc3C=g@mail.gmail.com" type="cite" ><div dir="ltr" ><div><div><div> </div>
<div> </div>
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex" ><div bgcolor="#FFFFFF" text="#000000" ><br> -Hal<br>
<blockquote type="cite" ><div dir="ltr" ><div> </div>
<div><span>(The above also may be better for other languages)</span></div>
<div> </div>
<div><span>--Dan</span></div>
<div> </div>
<div> </div>
<div> </div></div>
<div>
<div><span>On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <span dir="ltr" ><<a href="mailto:perrons@ca.ibm.com" moz-do-not-send="true" target="_blank" >perrons@ca.ibm.com</a>></span> wrote:</span>
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex" ><span><font face="sans-serif" size="2" >Seems like the comments have stopped. I'll try to get a patch together. Then we can continue the discussion from there. </font><br><br><font face="sans-serif" size="2" >Hubert, as for the issue with the llvm optimizations losing the TBAA information, it should be the responsibility to make sure the aliasing is changed in the correct way. One function related to this has already been mentioned: getMostGenericTBAA.</font><br><br><font face="sans-serif" size="2" >Later,</font><br><font face="sans-serif" size="2" >Steven Perron</font><br><br><br><br><font color="#5f5f5f" face="sans-serif" size="1" >From: </font><font face="sans-serif" size="1" >Hubert Tong <<a href="mailto:hubert.reinterpretcast@gmail.com" moz-do-not-send="true" target="_blank" >hubert.reinterpretcast@gmail.<wbr>com</a>></font><br><font color="#5f5f5f" face="sans-serif" size="1" >To: </font><font face="sans-serif" size="1" >Steven Perron/Toronto/IBM@IBMCA</font><br><font color="#5f5f5f" face="sans-serif" size="1" >Cc: </font><font face="sans-serif" size="1" >Daniel Berlin <<a href="mailto:dberlin@dberlin.org" moz-do-not-send="true" target="_blank" >dberlin@dberlin.org</a>>, llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true" target="_blank" >llvm-dev@lists.llvm.org</a>>, Sanjoy Das <<a href="mailto:sanjoy@playingwithpointers.com" moz-do-not-send="true" target="_blank" >sanjoy@playingwithpointers.co<wbr>m</a>></font><br><font color="#5f5f5f" face="sans-serif" size="1" >Date: </font><font face="sans-serif" size="1" >2017/02/15 07:44 AM</font><br><span><font color="#5f5f5f" face="sans-serif" size="1" >Subject: </font><font face="sans-serif" size="1" >Re: [llvm-dev] RFC: Representing unions in TBAA</font></span></span>
<hr noshade="noshade" ><span> </span>
<div><div><br><br><br><span><font size="3" >On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron <</font><a href="mailto:perrons@ca.ibm.com" moz-do-not-send="true" target="_blank" ><font color="blue" size="3" ><u>perrons@ca.ibm.com</u></font></a><font size="3" >> wrote:</font><br><font face="sans-serif" size="2" >3) How should we handle a reference directly through a union, and a reference that is not through the union? </font><br><br><font face="sans-serif" size="2" >My solution was to look for each member of the union overlaps the given offset, and see if any of those members aliased the other reference. If no member aliases the other reference, then the answer is no alias. Otherwise the answer is may alias. The run time for this would be proportional to "distance to the root" * "number of overlapping members". This could be slow if there are unions with many members or many unions of unions.</font><br><br><font face="sans-serif" size="2" >Another option is to say that they do not alias. This would mean that all references to unions must be explicitly through the union.</font><br><font size="3" >From what I gather from the thread so far, the access through the union may be lost because of LLVM transformations. I am not sure how, in the face of that, TBAA could indicate NoAlias safely (without the risk of functional-correctness issues in correct programs) between types which overlap within any union (within some portion of the program).</font><br><br><font size="3" >As for the standards, it is definitely not true that all references to unions must be explicitly through the union. However, if you are trying to perform union-based type punning (under C11), then it appears that it is intended that the read must be through the union.</font><br><font size="3" > </font><br><font face="sans-serif" size="2" >This would be the least restrictive aliasing allowing the most optimization. The implementation would be simple. I believe we make the parent of the TBAA node for the union to be "omnipotent char". This might be similar to treating the union TBAA node more like a scalar node instead of a struct-path. Then the traversal of the TBAA nodes will be quick. I'll have to work this out a bit more, but, if this is good enough to meet the requirements of the standard, I can try to think this through a little more. I'll need Hubert and Daniel to comment on that since I am no expert on the C and C++ standards.</font><br><br><font face="sans-serif" size="2" >The third option is to be pessimistic and say "may alias" all of the time (conservatively correct), and rely on other alias analysis to improve it. This will have good compile time, but could hinder optimization. Personally I do not like this option. Most of the time it will not have a negative effect, but there will be a reasonable number of programs where this will hurt optimization more that it needs to.</font></span><br><br><br> </div></div><span> </span></blockquote><span> </span></div></div>
<fieldset> </fieldset>
<div><span><font face="Default Monospace,Courier New,Courier,monospace" size="2" >______________________________<wbr>_________________<br>LLVM Developers mailing list<br><a href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true" target="_blank" >llvm-dev@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin" target="_blank" >http://lists.llvm.org/cgi-bin</a>/<wbr>mailman/listinfo/llvm-dev</font></span></div></blockquote><span><font color="#888888" > </font></span>
<div><span><font color="#888888" ><font face="Default Monospace,Courier New,Courier,monospace" size="2" >--<br>Hal Finkel<br>Lead, Compiler Technology and Programming Languages<br>Leadership Computing Facility<br>Argonne National Laboratory</font></font></span></div><span><font color="#888888" > </font></span></div></blockquote></div></div></div></blockquote>
<div><font face="Default Monospace,Courier New,Courier,monospace" size="2" >--<br>Hal Finkel<br>Lead, Compiler Technology and Programming Languages<br>Leadership Computing Facility<br>Argonne National Laboratory</font></div></blockquote>
<div dir="ltr" > </div></div><BR>