<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">On Feb 14, 2013, at 10:21 PM, Daniel Berlin &lt;<a href="mailto:dberlin@dberlin.org">dberlin@dberlin.org</a>&gt; wrote:<div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;"><div style="word-wrap:break-word"><div>What I meant by "understanding independent objects inside a structure" is that, just by looking at an LLVM type, say struct { int A[100]; int B[100]; }, and seeing two GEP accesses that index into the first and second fields of the type, you cannot necessarily say that the accesses do not overlap without fully evaluating the address computation:</div>

</div></blockquote><div><br></div><div>Yes, you need to fully evaluate it, but that's not really related to TBAA :)</div></div></blockquote></div><div><br></div><div><br></div><div>I was thinking of type trees that encode fields and so give more guarantees than LLVM IR currently conveys. Admittedly, I have not thought through whether this could even remotely work for C.</div><div><br></div><div>In LLVM IR:</div><div><br></div><div>struct { int A[100]; int B[100]; } S;</div><div><br></div><div>ptr = gep S, 0, 0, x</div><div>ptr2 = gep S, 0, 1, y</div><div><br></div><div>= load ptr, !tbaa !"structS::A"</div><div>= load ptr2, !tbaa !"structS::B"</div><div><br></div><div>Using this, you could tell that ptr and ptr2 do not alias without knowing anything about x and y, assuming certain language guarantees, which I don't know we could assume :).</div><div><br></div><div>Where the type DAG looks something like:</div><div><br></div><div>           /->structS::A &lt;- int</div><div>structS->structS::B &lt;- int</div><div><br></div><div><blockquote type="cite"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
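<div>A rough C sketch of how a query over such an access-path type DAG might look. Everything here is hypothetical: the node table, reaches and may_alias are invented for illustration and are not LLVM's TBAA implementation, and the answer is only sound if the language guarantees that an access tagged structS::A never runs over into structS::B:</div>
<pre>
```c
/* Hypothetical access-path type DAG: each node lists up to two parents. */
enum { STRUCT_S, INT_TY, FIELD_A, FIELD_B, NUM_NODES };

struct TypeNode {
    const char *name;
    int parents[2]; /* index into nodes, or -1 for "no parent" */
};

const struct TypeNode nodes[NUM_NODES] = {
    { "structS",    { -1, -1 } },
    { "int",        { -1, -1 } },
    { "structS::A", { STRUCT_S, INT_TY } },
    { "structS::B", { STRUCT_S, INT_TY } },
};

/* Nonzero if node "to" is "from" itself or one of its ancestors. */
int reaches(int from, int to) {
    if (from == -1) return 0;
    if (from == to) return 1;
    return reaches(nodes[from].parents[0], to)
        || reaches(nodes[from].parents[1], to);
}

/* Two tags may alias only if one lies on the other's ancestor chain. */
int may_alias(int a, int b) {
    return reaches(a, b) || reaches(b, a);
}
```
</pre>
<div>Under these assumptions, may_alias(FIELD_A, FIELD_B) is zero because the two field nodes are siblings, while may_alias(FIELD_A, INT_TY) and may_alias(FIELD_A, STRUCT_S) are nonzero.</div>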

<div style="word-wrap:break-word"><div><br></div><div>struct { int A[100]; int B[100]; } S;</div><div><br></div><div>ptr1 = getelementptr S, 0, 1, x</div><div>ptr2 = getelementptr S, 0, 0, y</div><div><br></div><div>ptr1 and ptr2 may alias for some x and y, for example x = 0 and y = 100. The IR makes no guarantee that you cannot run past the end of an array within a struct, as long as you stay within the struct (see <a href="http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds" target="_blank">http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds</a>).</div>
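<div><br></div><div>A minimal C sketch of the arithmetic behind this example, assuming the usual layout with no padding between the two int arrays. The names gep_offset and may_overlap are made up for illustration and are not LLVM APIs; the code only computes byte offsets and never performs the out-of-bounds access itself:</div>
<pre>
```c
/* Illustrative only: models the byte offsets a GEP into this struct computes. */
struct S { int A[100]; int B[100]; };

struct S s; /* any object of the type works; only its field offsets are used */

/* Byte offset of "gep S, 0, field, idx" from the start of s.
   field 0 selects A, any other value selects B (hypothetical helper). */
long gep_offset(int field, long idx) {
    char *start = (char *)s.A; /* A is the first member, so this is the start of s */
    char *base = (field == 0) ? (char *)s.A : (char *)s.B;
    return (long)(base - start) + idx * (long)sizeof(int);
}

/* Nonzero when B[x] and A[y] would land on the same address. */
int may_overlap(long x, long y) {
    return gep_offset(1, x) == gep_offset(0, y);
}
```
</pre>
<div>With these definitions, may_overlap(0, 100) is nonzero, since B[0] and A[100] name the same bytes, while may_overlap(0, 0) is zero.</div>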

</div></blockquote><div>Yup.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>If you can symbolically reason about x and y, that is, about the full address computation, then yes, you can make more precise statements.</div>

<div><br></div><div>Correct me if I am wrong, but this is how I understand the semantics of a GEP. I would love to be able to treat a type like a tree :).</div><div><br></div><div>I have not had a look at what Ben Hardekopf is doing. Does he handle the case I mention above correctly?</div>

</div></blockquote><div><br></div><div>Yes.  Hardekopf's work is an extension of work done by David Pearce, who I worked with to get this right in all the weird edge cases, and implemented in GCC.  See <a href="http://dl.acm.org/citation.cfm?id=1290524">http://dl.acm.org/citation.cfm?id=1290524</a></div>

<div><br></div><div>(David made this work, Ben Hardekopf made it work *really fast*)</div><div><br></div><div>Even though C/C++ does not legally allow what LLVM IR permits here, it was common enough in practice that we had to assume it can happen anyway.</div>

<div> </div></div></blockquote><div><br></div>Okay.</div><div><br><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>I know that the existing TBAA can not handle cases with struct fields, but an improved one might. </div>

</div></blockquote></div><br><div>TBAA is type-based and language-specific. At least in C/C++ and most other languages I'm aware of, the TBAA tree will always say the above accesses can alias, because they are valid accesses to the same type.</div>

<div><br></div><div>Your example above is the C equivalent of</div><div><br></div><div>struct a {</div><div>int b[100];</div><div>int c[100];</div><div>};</div><div><br></div><div>struct a foo;</div><div>int (*x)[100] = &amp;foo.b;</div>

<div>int (*y)[100] = &amp;foo.c;</div><div><br></div></blockquote></div><div><blockquote type="cite"><div>TBAA will *never* say that *x and *y are no-alias, because they both validly access an int *.</div><div>The same is true for </div><div><br></div><div>int *z1 = *x;</div>

<div>int *z2 = *y;</div><div><br></div><div>TBAA will never say *z1 and *z2 are no-alias, because they both validly access ints.</div><div><br></div><div>So I can't see how you could extend TBAA to account for your example, since it's only about types, and both of these are the same type, but maybe I'm missing something?  <br></div></blockquote><div><br></div>I was thinking of type DAGs or trees that encode the full access path: in the example above, struct_a::b or struct_a::c.</div><div><br><blockquote type="cite"><div>

</div><div>If you mean you could make TBAA evaluate address computations and calculate whether the resulting addresses ever access the same objects in memory, this isn't TBAA anymore; that is pointer analysis. At least for inclusion-based versions, it's O(N^3) worst case :P </div>

<br></blockquote><div><br></div>No, that is not what I had in mind for TBAA. I was looking for a more powerful TBAA system that encodes more knowledge about the access (guarantees that subfields of a structure do not overlap).</div><div><br></div><div><br><blockquote type="cite"><div>In any case the point of field-sensitive pointer analysis above is *exactly* to say "these both point to different fields in the structure, and the bit size of the access guarantees they will never overrun".</div>

<div><br></div></blockquote><div><br></div>Okay, I was unsure about the bit size of access part :).</div><div><br></div><div><br></div><div><blockquote type="cite"><div>In the case of pointer loops that iterate over your example type, it will properly discover that it overruns into the next field, and properly mark it as pointing-to that field as well.</div>

</blockquote></div><div><br></div><div>So for the following example, it will conservatively say that arrayidx and arrayidx2 may alias?</div><div><br></div><div>%struct.anon = type { [256 x i64], i64, [256 x i64] }</div><div><br></div><div>@Foo = common global %struct.anon zeroinitializer, align 8</div><div><br></div><div>define void @foo(i32 %x, i32 %y) {</div><div><div>  %arrayidx = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 0, i32 %x</div><div>  store i64 0, i64* %arrayidx, align 8</div><div>  %arrayidx2 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 2, i32 %y</div><div>  store i64 0, i64* %arrayidx2, align 8</div><div>  ret void</div><div>}</div><div><br></div><div>Thanks,</div><div>Arnold</div></div></body></html>