<div dir="ltr"><div>+hal</div><div><br></div>Hi,<div><br></div><div>I'm not an expert on all of our AA bits, but I've been in the area some. If I'm wrong, I'm sure someone will jump in and we'll both end up learning something. :)</div><div><br></div><div>> <span style="font-size:12.8px">Why the current TBAA implementation returns NoAlias for pairs of</span><br style="font-size:12.8px"><span style="font-size:12.8px">> accesses for which it has no idea if they really </span><span class="m_-3396648458328114042m_9066362456197145343m_-5061944160054058013m_-3648170215437888668m_648271161869807912gmail-m_956479450418697647gmail-il" style="font-size:12.8px;background-color:rgb(255,255,255)">alias</span><span style="font-size:12.8px">?</span><br><div><div><br></div><div>For the same reason that LLVM assumes many other things it can't prove. The language says "if you do ${x}, the behavior is undefined," so optimizing on the assumption that the user isn't doing ${x} is OK.</div><div><br></div><div>That said, optimizing on aliasing assumptions has been contentious in the past, so frontends that emit TBAA data (e.g. clang) generally support flags like -fno-strict-aliasing to relax this behavior.</div><div><br></div><div>> <span style="font-size:12.8px">that I would expect to a) generate [...]</span><span style="font-size:12.8px"> warnings and [...]</span></div><div><br></div><div>FWIW, it's generally nontrivial to emit helpful warnings from LLVM. 
This is because LLVM doesn't have the AST that the IR was generated from, so it has no source-level information to speak of (except the bits we encode in debuginfo, but that doesn't exactly make for great diags).</div><div><br></div><div>In other words, a warning emitted by LLVM would probably be as helpful as "hey, somewhere in the function '_Z3fooi', or one of the functions that we happened to inline into it, we found some pretty obvious type punning."</div><div><br></div><div>> <span style="font-size:12.8px">it's hard to imagine how a pair of accesses can be</span></div><span style="font-size:12.8px">> aliasing and non-aliasing at the same time</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">As you've noted, this is an artifact of taking two logically independent pieces of information and merging them into one. </span><span style="font-size:12.8px">While this might not be ideal from a theoretical point of view, I don't see why an optimization would care about the difference between "proven NoAlias" and "proven illegal to alias."</span><span style="font-size:12.8px"> In either case, it's allowed to optimize based on the assumption that those things don't alias. 
So, from a practical standpoint, I don't understand what making that distinction would buy us.</span></div><div><span style="font-size:12.8px"><br></span></div><div><div>> <span style="font-size:12.8px">we are trying </span><span style="font-size:12.8px">to use BasicAA as a faster replacement for</span></div><div><span style="font-size:12.8px">> TBAA in simple cases</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">AIUI, the idea of putting BasicAA in front of TBAA isn't an efficiency hack: its purpose is to catch "obvious" type punning, and handle it gracefully for the user (your 'b)' above).</span></div></div></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">You might find Hal's recent work on TySan (<a href="https://reviews.llvm.org/D32197" target="_blank">https://reviews.llvm.org/D321<wbr>97</a>) interesting; it's a sanitizer that tries to catch and complain about type punning at run-time. :)</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">George</span></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 11, 2017 at 1:50 AM, Ivan A. Kosarev via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>

<br>

I'm trying to get the whole picture of what we are doing in terms<br>

of alias analysis in LLVM. Being new to this I may be missing<br>

some well known things so my apologies if I seem to ask obvious<br>

things.<br>

<br>

Lots of questions arise in my head as I read related<br>

bug reports, sources and documentation, of which the simplest<br>

looks to be:<br>

<br>

Why the current TBAA implementation returns NoAlias for pairs of<br>

accesses for which it has no idea if they really alias?<br>

<br>

Let me elaborate on this. My understanding of what the TBAA<br>

implementation is supposed to respond is whether a given pair of<br>

accesses is allowed to alias by the rules of the input language.<br>

(Which in turn leads me to another question, that is, whether<br>

we are really trying to perform C/C++ alias analysis in a<br>

[language-neutral] codegen pass, but let's postpone this for<br>

later.)<br>

<br>

This consideration assumes that what the TBAA implementation<br>

actually returns as a result is orthogonal to the<br>

Must/No/MayAlias set that is used as the result type of various<br>

AA, including TBAA. I would expect that the result of the<br>

complete alias analysis includes both the information on whether<br>

given accesses alias and, as a separate element, whether they are<br>

allowed to alias by the rules of the language. Then we may have<br>

combinations like (MustAlias, AllowedAlias) that seem to be the<br>

common case and combinations like (MustAlias, NotAllowedAlias)<br>

that I would expect to a) generate the breaks-alias-rules kind of<br>

warnings and b) proceed further as any other MustAlias case.<br>

<br>

The latter combination is what I would expect for illegal type<br>

puns, for whatever definition of "illegal". Note that for such<br>

cases we currently get MustAlias from BasicAA and NoAlias from<br>

TBAA. In some comments this is considered okay whereas I have to<br>

admit that I don't understand how that may possibly be okay,<br>

meaning it's hard to imagine how a pair of accesses can be<br>

aliasing and non-aliasing at the same time. To me it appears<br>

that the root cause of this inconsistency is not that any of<br>

these analyses is incorrect, but that we are trying to<br>

represent their responses in an inadequate form and we are trying<br>

to use BasicAA as a faster replacement for TBAA in simple cases.<br>

<br>

Any thoughts are highly appreciated.<br>

<br>

Thanks,<span class="m_-3396648458328114042m_9066362456197145343m_-5061944160054058013m_-3648170215437888668m_648271161869807912HOEnZb"><font color="#888888"><br>

<br>


</font></span></blockquote></div><br></div></div>
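<div>For anyone following along, here is a minimal sketch of the kind of code this thread is about, assuming a C++ frontend that emits TBAA metadata (e.g. clang without -fno-strict-aliasing). The function names are made up for illustration. In store_both, the language rules say an int object can't be accessed through a float lvalue, so the optimizer may assume the two stores don't overlap and fold the return value to 1. If a caller passes pointers to the same storage, the behavior is undefined. float_bits shows the well-defined way to get at the bytes instead.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical example, not taken from LLVM or clang sources.
// Under the strict-aliasing rules, *i and *f are assumed not to overlap,
// so a compiler using TBAA may return the constant 1 without reloading *i.
// Passing pointers to the same storage here is undefined behavior.
int store_both(int *i, float *f) {
  assert(i != nullptr && f != nullptr);
  *i = 1;
  *f = 0.0f;
  return *i;
}

// Well-defined alternative to type punning through a pointer cast:
// copy the object representation into an integer of the same size.
std::uint32_t float_bits(float value) {
  static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}
```

<div>With distinct objects, store_both behaves the same whether or not TBAA is used; the two answers only diverge for the type-punned call, which is exactly the "proven illegal to alias" case discussed above.</div>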