<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 17, 2017 at 4:49 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-m_8289637616720673898gmail-">
<br>
<div class="gmail-m_8289637616720673898gmail-m_5209512876004350455moz-cite-prefix">On 08/17/2017 04:49 PM, Daniel Berlin
via llvm-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
For two given access sequences we can determine if the
accessed<br>
objects are allowed to overlap by the rules of the input<br>
language.</blockquote>
<div><br>
</div>
<div>Sadly, this is where this becomes "unlikely to want to
use to replace TBAA", at least for me. It may still be a
thing we want anyway.</div>
</div>
</div>
</div>
</blockquote>
<br></span>
The rules proposed by Ivan for handling C/C++ seem pretty generic.
</div></blockquote><div><br></div><div>Uh?</div><div><br></div><div>"</div><br style="font-size:12.8px"><span style="font-size:12.8px"> [X...] is allowed to overlap with [S1..., X..., S2...]</span><br style="font-size:12.8px"><span style="font-size:12.8px"> and the most generic access sequence is [X...].</span><br style="font-size:12.8px"><br style="font-size:12.8px"><span style="font-size:12.8px"> [X1..., X2...] is allowed to overlap with [S1..., X1...]</span><br style="font-size:12.8px"><span style="font-size:12.8px"> with the most generic access sequence to be [X1...].</span><br style="font-size:12.8px"><br style="font-size:12.8px"><span style="font-size:12.8px"> [X1..., U, U::m1, X2...] is allowed to overlap with</span><br style="font-size:12.8px"><span style="font-size:12.8px"> [S1..., X1..., U, U::m2, S2...]</span><br style="font-size:12.8px"><span style="font-size:12.8px"> for a union U of an unknown effective type, provided m1 != m2</span><br style="font-size:12.8px"><span style="font-size:12.8px"> and the most generic access sequence is [X1..., U].</span><br style="font-size:12.8px"><br style="font-size:12.8px"><span style="font-size:12.8px">If neither of the given sequences contains the leading access</span><br style="font-size:12.8px"><span style="font-size:12.8px">type of the other, then they are allowed to overlap if the</span><br style="font-size:12.8px"><span style="font-size:12.8px">leading access type of one sequence is a direct or indirect field</span><br style="font-size:12.8px"><span style="font-size:12.8px">of the final access type of the other sequence and then the most</span><br style="font-size:12.8px"><span style="font-size:12.8px">generic access sequence consists of a single element, which is</span><br style="font-size:12.8px"><span style="font-size:12.8px">that final access type.</span><br style="font-size:12.8px"><br style="font-size:12.8px"><span style="font-size:12.8px">For the purpose of determining whether one type is a direct or</span><br style="font-size:12.8px"><span style="font-size:12.8px">indirect member of another type unions are considered to have no</span><br style="font-size:12.8px"><span style="font-size:12.8px">members as accesses to members of unions are only allowed to</span><br style="font-size:12.8px"><div><span style="font-size:12.8px">overlap if they have the base union object explicitly specified."</span></div><div><br></div><div><br></div><div><span style="font-size:12.8px"></span>This sounds super specific to me. It talks about unions, and how they behave in C/C++, etc.</div><div>It has initial sequence rules, etc.</div><div>These would make no sense for "Rust", "Go", or "Haskell" at all.</div><div> <br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"> We generally explain our current TBAA rules by saying that they're
generic but motivated by C/C++ rules.</div></blockquote><div><br></div><div>We do say that but that's not really what our implementation does in any way. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"> I could say the same thing
about this proposed system with its proposed rules.</div></blockquote><div>It is, IMHO, nowhere near as generic.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-m_8289637616720673898gmail-"> <br></span></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-m_8289637616720673898gmail-">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>This scheme is really an encoding of C/C++ TBAA info so
it can be read by LLVM and requires that *LLVM* have some
set of rules that it enforces about that scheme.</div>
<div>But that scheme is still very language specific in how
it is used.</div>
<div>GCC still has something in between this and LLVM, where
the language rules are a bit encoded (but not as much as
you have).<br>
</div>
<div><br>
</div>
<div>We (and gcc) have deliberately avoided such schemes, in
favor of transforming the info into abstract set trees
that then tag loads and stores.</div>
<div><br>
</div>
<div>The encoding of "struct path" tbaa, is just a way of
trading space vs time in that encoding. We trade walking
time for the space of transitive closure, etc.</div>
</div>
</div>
</div>
</blockquote>
<br></span>
If the provided statistic of 15% holds up, maybe which way we go in
this trade-off space doesn't matter much?</div></blockquote><div><br></div><div>Sure, that part i'm indifferent about. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-m_8289637616720673898gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>None of the TBAA in LLVM really has any *real* relation
to the original type system rules, and that is
deliberate. I've argued for years that calling it "TBAA"
just badly confuses people, and i believe here is a good
example :)</div>
<div><br>
</div>
<div>So i don't think what you've written can be used to
replace TBAA.</div>
</div>
</div>
</div>
</blockquote>
<br></span>
So *if* we just take the proposed rules for C/C++ as the rules for
the scheme in general, I'm not sure this is true. </div></blockquote><div><br></div><div>Okay, let me try to rephrase a little more precisely:<br><div>It is almost certainly true that we *could* make *any* particular lowering work, at different costs to the front ends and middle ends. IE i totally agree that the power of all of these systems (with small extensions) are equal. and also, for the record, i certainly think the rules/scheme described here would almost certainly produce better optimization results than what is currently implemented.</div><div><br></div>*But*</div><div>I don't see how you use this as a general scheme without ending up forcing the other frontends to lower things to look as if it was a C/C++ based type system.</div><div>This is definitely *not* true today.</div><div><br></div><div>They only have to lower things to look like a tree of sets. Having been down this path before, it's definitely easier to lower things to look like trees of sets than like groups of C/C++ structs/bitfields/etc that follow certain rules.</div><div><br></div><div>This is one of the reasons I'm honestly having a lot of trouble seeing how this is is definitively better than what IBM's proposal was, which seemed both more generic and incremental, and handled most concerns.</div><div><br></div><div>The only practical difference i see is that this just changes the scheme from one where the language rules are mostly in the front ends to one where the language rules are also partially in llvm. To me that isn't necessarily better, just different (heck, i'm even more used to that system because it's a push closer to gcc :P)</div><div><br></div><div>So overall I don't see one as particularly better than the other.</div><div>You seem to.</div><div>So i'd really like to understand your perspective on this in terms of the pro/con that you are seeing.</div><div><br></div><div>I mean, i'm really not even opposed (though others may be) in principle to saying "hey, let's just have a set of language specific tbaa passes with semi-common metadata".</div><div>I'd just like to understand why we would change tradeoffs, etc.</div><div><br></div></div></div></div>