<br><br><div class="gmail_quote">On Thu, Feb 14, 2013 at 3:13 PM, Arnold Schwaighofer <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="word-wrap:break-word"><div><div class="h5"><br><div><div>On Feb 14, 2013, at 1:55 PM, Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>> wrote:</div><br><blockquote type="cite">

<br><br><div class="gmail_quote">On Thu, Feb 14, 2013 at 2:49 PM, Arnold Schwaighofer <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div style="word-wrap:break-word"><br><div><div><div>On Feb 14, 2013, at 12:34 PM, Renato Golin <<a href="mailto:renato.golin@linaro.org" target="_blank">renato.golin@linaro.org</a>> wrote:</div><br><blockquote type="cite">


<div dir="ltr">On 14 February 2013 17:46, Arnold Schwaighofer <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">The existing implementation already relies on runtime checks (it has to make sure that an unknown object and a known object do not overlap). Yes, AA will conservatively return MayAlias/PartialAlias if it does not know two objects. You just have to make sure that you actually query it with that unknown object.<br>


</div></blockquote><div><br></div><div>I was expecting this, and I'm planning to be extra conservative to begin with. So, for now, MayAlias and PartialAlias are reasons to stop trying. Once we can get the code to understand basic independent objects inside a structure, we can specialize for the other cases.</div>


<div><br></div></div></div></div></blockquote><div><br></div></div><div>Okay. But understand that LLVM IR semantics does not give you much leeway in "understanding independent objects inside a structure". That is what you need TBAA for.</div>


</div></div></blockquote><div><br></div><div>TBAA is not a good answer to this general problem.  It falls down pretty quickly when you start to ask it about fields inside structures.</div><div>Proper structure aliasing (IE aliasing on pieces of structs), including pointer analysis, in languages where things can point to fields inside structures (like GEP does) is not only possible to do on LLVM IR, it's actually been done before :)</div>


<div><br></div><div>See, e.g., <a href="http://www.cs.ucsb.edu/~benh/research/downloads.html" target="_blank">http://www.cs.ucsb.edu/~benh/research/downloads.html</a> (the field-sensitive version, and the flow-sensitive/field-sensitive versions).</div>


<div>Both were done on earlier versions of LLVM, but could be made to work today.</div><div><br></div><div><br></div></div>

</blockquote></div><br><div><br></div><div><br></div></div></div><div>What I meant by "understanding independent objects inside a structure" is that just by looking at an LLVM type, say struct { int A[100]; int B[100]; }, and seeing two GEP accesses that index into the first and second fields of that type, you cannot necessarily say that the accesses do not overlap without fully evaluating the address computation:</div>

</div></blockquote><div><br></div><div>Yes, you need to fully evaluate it, but that's not really related to TBAA :)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="word-wrap:break-word"><div><br></div><div>struct { int A[100]; int B[100]; } S;</div><div><br></div><div>ptr1 = getelementptr S, 0, 1, x</div><div>ptr2 = getelementptr S, 0, 0, y</div><div><br></div><div>ptr1 and ptr2 may alias for some x and y (for example, x = 0 and y = 100): the IR makes no guarantee that you cannot run past the end of an array within a struct, as long as you stay within the struct (<a href="http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds" target="_blank">http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds</a>).</div>

</div></blockquote><div>Yup.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>If you can symbolically reason about x and y (that is, the full address computation), then yes, you can make more precise statements.</div>

<div><br></div><div>Correct me if I am wrong, but this is how I understand the semantics of a GEP. I would love to be able to treat a type like a tree :).</div><div><br></div><div>I have not had a look at what Ben Hardekopf is doing. Does he handle the case I mention above correctly?</div>

</div></blockquote><div><br></div><div>Yes.  Hardekopf's work is an extension of work done by David Pearce, who I worked with to get this right in all the weird edge cases, and implemented in GCC.  See <a href="http://dl.acm.org/citation.cfm?id=1290524">http://dl.acm.org/citation.cfm?id=1290524</a></div>

<div><br></div><div>(David made this work, Ben Hardekopf made it work *really fast*)</div><div><br></div><div>Even though C/C++ does not legally allow what LLVM IR permits here (indexing past the end of one field into the next), it was common enough in real code that we had to assume it can happen anyway.</div>
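<div><br></div><div>To make the overlap in the quoted GEP example concrete, here is a minimal C sketch of the offset arithmetic. It only computes byte offsets (it never dereferences out of bounds), it assumes the two int arrays are laid out back to back with no padding, and the helper names offset_in_A, offset_in_B and same_address are made up for illustration.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the struct in the quoted example. */
struct S {
    int A[100];
    int B[100];
};

/* Byte offset, relative to the start of S, that a GEP-style index into
   field A would compute. idx may exceed 99, mirroring what the IR permits. */
static size_t offset_in_A(size_t idx) {
    return offsetof(struct S, A) + idx * sizeof(int);
}

/* Same computation for field B. */
static size_t offset_in_B(size_t idx) {
    return offsetof(struct S, B) + idx * sizeof(int);
}

/* Nonzero when the two indexed accesses resolve to the same byte offset. */
static int same_address(size_t idx_a, size_t idx_b) {
    /* Both accesses must stay inside the struct. */
    assert(offset_in_A(idx_a) < sizeof(struct S));
    assert(offset_in_B(idx_b) < sizeof(struct S));
    return offset_in_A(idx_a) == offset_in_B(idx_b);
}
```

<div>With this layout, same_address(100, 0) holds, which is the x = 0, y = 100 case from the quoted example: the two field indices differ, yet the computed addresses are identical.</div>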

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>I know that the existing TBAA can not handle cases with struct fields, but an improved one might. </div>

</div></blockquote></div><br><div>TBAA is type based, and language specific.   At least in C/C++/most other languages i'm aware of, the TBAA tree will always say the above accesses can alias because they validly point to accesses of the same type.</div>

<div><br></div><div>Your example above is the C equivalent of</div><div><br></div><div>struct a {</div><div>int b[100];</div><div>int c[100];</div><div>};</div><div><br></div><div>struct a foo;</div><div>int (*x)[100] = &foo.b;</div>

<div>int (*y)[100] = &foo.c;</div><div><br></div><div>TBAA will *never* say that *x and *y are no-alias, because they both validly access an int *.</div><div>The same is true for </div><div><br></div><div>int *z1 = *x;</div>

<div>int *z2 = *y;</div><div><br></div><div>TBAA will never say *z1 and *z2 are no-alias, because they both validly access ints.</div><div><br></div><div>So i can't see how you could extend TBAA to account for your example, since it's only about types, and both of these are the same type, but maybe i'm missing something?  <br>

</div><div>If you mean you could make TBAA evaluate address computations and calculate whether the resulting addresses ever access the same objects in memory, that isn't TBAA anymore, that is pointer analysis.  At least for inclusion-based versions, it's O(N^3) worst case :P </div>
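<div><br></div><div>As a rough illustration of why a type-only query cannot separate z1 from z2, here is a toy sketch in C. It is not LLVM's TBAA metadata or API; the enum toy_type and the function toy_tbaa_may_alias are hypothetical, and the only rule modeled is "same type may alias, and char may alias anything".</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type tags for a toy type-based query (not LLVM's TBAA). */
enum toy_type { TOY_CHAR, TOY_INT, TOY_FLOAT, TOY_TYPE_COUNT };

/* Type-only rule: two accesses may alias when their types match, or when
   either side is a character type (which C allows to alias anything).
   No addresses are consulted, so field identity is invisible here. */
static bool toy_tbaa_may_alias(enum toy_type a, enum toy_type b) {
    assert(a < TOY_TYPE_COUNT && b < TOY_TYPE_COUNT);
    return a == b || a == TOY_CHAR || b == TOY_CHAR;
}
```

<div>Both *z1 and *z2 are int accesses, so the query is toy_tbaa_may_alias(TOY_INT, TOY_INT), and the answer is may-alias no matter which field each pointer came from.</div>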

<br><div>In any case the point of field sensitive pointer analysis above is *exactly* to say "these both point to different fields in the structure, and the bit size of the access guarantees they will never overrun".</div>

<div><br></div><div>In the case of pointer loops that iterate over your example type, it will properly discover that it overruns into the next field, and properly mark it as pointing-to that field as well.</div><div><br></div>
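<div>A very small sketch of that idea in C, using byte ranges instead of types. This is only interval arithmetic under assumed index bounds, not the Pearce or Hardekopf algorithm; struct byte_range, access_range and may_touch_B are illustrative names, and the layout assumes no padding between the two arrays.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct S {
    int A[100];
    int B[100];
};

/* Half-open byte range [lo, hi) that an access may touch, relative to S. */
struct byte_range {
    size_t lo;
    size_t hi;
};

/* Bytes covered by int-sized accesses at base_offset + i * sizeof(int),
   for every index i in [min_idx, max_idx]. */
static struct byte_range access_range(size_t base_offset,
                                      size_t min_idx, size_t max_idx) {
    assert(min_idx <= max_idx);
    struct byte_range r = { base_offset + min_idx * sizeof(int),
                            base_offset + (max_idx + 1) * sizeof(int) };
    return r;
}

/* True when the range intersects the bytes occupied by field B. */
static bool may_touch_B(struct byte_range r) {
    size_t b_lo = offsetof(struct S, B);
    size_t b_hi = b_lo + sizeof(((struct S *)0)->B);
    return r.lo < b_hi && b_lo < r.hi;
}
```

<div>A loop over A with indices 0..99 yields a range that stops exactly where B starts, so may_touch_B is false. Widen the bound to 0..100 and the range spills into B, so the pointer has to be marked as pointing to that field as well.</div>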

<div><br></div>