<div dir="ltr"><div>Hi Alina,</div><div><br></div><div>Thanks for your reply! That makes sense to me. <br></div><div>Regarding the other point I mentioned (the missing AAQI parameters in some of the function calls), do you think they make sense, or is it intentional that you use the non-batchAA version there?</div><div>My current understanding is that changing them to use BatchAA might expose more issues that triggers the `isValueEqualInPotentialCycles` bug, but if you have the bug fixed, make them use BatchAA as well could probably improve performance a bit.</div><div><br></div><div>Best,</div><div>Haoran<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Alina Sbirlea <<a href="mailto:asbirlea@google.com">asbirlea@google.com</a>> 于2020年10月23日周五 上午11:21写道：<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Some clarifications to the description in <a href="https://reviews.llvm.org/D89991" rel="noreferrer" target="_blank">https://reviews.llvm.org/D89991</a>:</div><div>The clearing of the cache is not supposed to be done in BatchAA. The patch introducing BatchAA was not a refactoring patch, and it did not accidentally miss the lines to clear those caches.</div><div><br></div><div>Nikita flagged the issue and I looked into it. The condition that's broken in BatchAA is the `isValueEqualInPotentialCycles`. This condition can be satisfied in a query and a result cached, but the condition broken in a subsequent query. Working out a fix is not straightforward.</div><div>Yes, clearing the whole cache is a big hammer we can use, and, yes, it will lose all the performance benefits.</div><div><br></div><div>For visibility, I uploaded a WIP fix:</div><div><a href="https://reviews.llvm.org/D90062" rel="noreferrer" target="_blank">https://reviews.llvm.org/D90062</a><br></div><div>There are still problems I have yet to solve in the above patch, hence the lack of reviewers. However, please feel free to test it and let me know if it resolves your failure.</div><div><br></div><div>Best,</div><div>Alina</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 22, 2020 at 6:47 AM Haoran Xu <<a href="mailto:haoranxu510@gmail.com" target="_blank">haoranxu510@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I don't know anything about LLVM internals, but solely from the commit message of your diff I think it's very likely. <br></div><div>So my current understanding is that the ``shrink_and_clear()`` is required to produce correct aliasing result, but if we naively do it for AAQI (as my patch proposed), we probably lose most of the performance benefits of BatchedAA. <br></div><div><br></div><div>However, this is only my understanding after spending 2 hours reading the code from zero background knowledge, so I guess I can't say much about it.</div><div> <br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Nikita Popov <<a href="mailto:nikita.ppv@gmail.com" target="_blank">nikita.ppv@gmail.com</a>> 于2020年10月22日周四 上午6:41写道：<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I suspect that you found the same issue as <a href="https://github.com/llvm/llvm-project/commit/ddd0f083184feb489684f87cf28d5eb10e0a244e" target="_blank">https://github.com/llvm/llvm-project/commit/ddd0f083184feb489684f87cf28d5eb10e0a244e</a>. We're trying to find a solution to that, but it's not entirely simple.<br></div><div><br></div><div>Nikita<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 22, 2020 at 3:33 PM Haoran Xu via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I managed to bisected out the diff introducing the bug: <a href="https://github.com/llvm/llvm-project/commit/bfc779e491099213d74c057b5727bde976c7ba02" target="_blank">https://github.com/llvm/llvm-project/commit/bfc779e491099213d74c057b5727bde976c7ba02</a></div><div><br></div><div>And I managed to figured out a fix: it seems (to me) that the diff is mostly a code refactoring diff, but the author accidentally removed two lines of functional logic, resulting in the bug.</div><div>After adding the two lines back, the miscompiled code works fine now. However, I'm not certain of the performance implications of clearing the whole AAQI.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="font-family:monospace">diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp<br>index 232132958f7..9a8ca46093d 100644<br>--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp<br>+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp<br>@@ -836,6 +836,8 @@ AliasResult BasicAAResult::alias(const MemoryLocation &LocA,<br>   AliasResult Alias = aliasCheck(LocA.Ptr, LocA.Size, LocA.AATags, LocB.Ptr,<br>                                  LocB.Size, LocB.AATags, AAQI);<br> <br>+  AAQI.AliasCache.shrink_and_clear();<br>+  AAQI.IsCapturedCache.shrink_and_clear();<br>   VisitedPhiBBs.clear();<br>   return Alias;<br> }</span></div></blockquote><div><br></div><div>I also noticed that the author forgot to change a few function calls to use his new API, but I'm not sure if it is a correctness issue or not. At least it's not causing the brainfxxk code miscompilation.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="font-family:monospace">diff --git a/llvm/lib/Analysis/AliasAnalysis.cpp b/llvm/lib/Analysis/AliasAnalysis.cpp<br>index 06a33f659c1..94a9d12eb17 100644<br>--- a/llvm/lib/Analysis/AliasAnalysis.cpp<br>+++ b/llvm/lib/Analysis/AliasAnalysis.cpp<br>@@ -231,7 +231,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase *Call,<br> <br>   // If Loc is a constant memory location, the call definitely could not<br>   // modify the memory location.<br>-  if (isModSet(Result) && pointsToConstantMemory(Loc, /*OrLocal*/ false))<br>+  if (isModSet(Result) && pointsToConstantMemory(Loc, AAQI, /*OrLocal*/ false))<br>     Result = clearMod(Result);<br> <br>   return Result;<br>@@ -308,7 +308,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase *Call1,<br> <br>       // ModRefC1 indicates what Call1 might do to Call2ArgLoc, and we use<br>       // above ArgMask to update dependence info.<br>-      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc);<br>+      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc, AAQI);<br>       ArgMask = intersectModRef(ArgMask, ModRefC1);<br> <br>       // Conservatively clear IsMustAlias unless only MustAlias is found.<br>@@ -349,7 +349,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase *Call1,<br>       // might Mod Call1ArgLoc, then we care about either a Mod or a Ref by<br>       // Call2. If Call1 might Ref, then we care only about a Mod by Call2.<br>       ModRefInfo ArgModRefC1 = getArgModRefInfo(Call1, Call1ArgIdx);<br>-      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc);<br>+      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc, AAQI);<br>       if ((isModSet(ArgModRefC1) && isModOrRefSet(ModRefC2)) ||<br>           (isRefSet(ArgModRefC1) && isModSet(ModRefC2)))<br>         R = intersectModRef(unionModRef(R, ArgModRefC1), Result);</span></div></blockquote><div><br></div><div>I'll submit a patch for review, but I would really appreciate any comments here as well.</div><div><br></div><div>Thanks.<br></div><div><br></div><div> Best,</div><div>Haoran<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Haoran Xu <<a href="mailto:haoranxu510@gmail.com" target="_blank">haoranxu510@gmail.com</a>> 于2020年10月22日周四 上午12:12写道：<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi David,</div><div><br></div><div>I just tried creduce, but it generated a code with completely undefined behavior (out of range accesses, etc).</div><div>The "interesting" criteria I used is "timeout in -O1 but finishes in 20s in -O0". However, it turns out that the undefined behavior in creduce-generated code just makes the criteria "happens to" pass.<br></div><div><br></div><div>Best,</div><div>Haoran<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Haoran Xu <<a href="mailto:haoranxu510@gmail.com" target="_blank">haoranxu510@gmail.com</a>> 于2020年10月21日周三 下午10:32写道：<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I was just able to determine the offending IR code before and after the transformation. I'm now almost certain it's a bug in LLVM.</div><div><br></div><div>Before transformation, we have the following IR (I renamed all %xxx for brevity):</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="font-family:monospace">%1 = load i8, i8* %0, align 1<br>%2 = add i8 %1, -1<br>store i8 %2, i8* %0, align 1</span></div></blockquote><div>The above IR is inside a loop, so the value in %0 can be different in each run.<br></div><div><br></div><div>The optimization pass changed the IR above to the following:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="font-family:monospace">store i8 %3, i8* %0, align 1</span></div></blockquote><div>where %3 is defined by</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="font-family:monospace">%4 = load i8, i8* %0, align 1<br>%3 = add i8 %4, -1</span></div></blockquote><div>in an earlier piece of IR.</div><div><br></div><div>Apparently the pass treated %3 the same thing as %2 and it fired CSE, without realizing that the content in %0 may have been changed by the loop.<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> 于2020年10月21日周三 下午10:18写道：<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Might be worth running the c source file through creduce or similar to narrow it down a bit that way too.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 21, 2020 at 9:12 PM Haoran Xu via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>A further bisect using opt's <span style="font-family:monospace">-opt-bisect-limit</span> option shows that the following pass is causing the issue:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>BISECT: running pass (39) Early CSE w/ MemorySSA on function (main)<br></div></blockquote><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Haoran Xu <<a href="mailto:haoranxu510@gmail.com" target="_blank">haoranxu510@gmail.com</a>> 于2020年10月21日周三 下午9:00写道：<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I did a simple bisect on clang version, and it seems like clang 8.0.0 works correctly, but clang 9.0.0 failed to compile the code correctly.</div><div><a href="https://godbolt.org/z/676Grr" target="_blank">https://godbolt.org/z/676Grr</a>  <- if you change the clang version to 8.0.0, you will see the expected output in 'output' section.</div><div>I don't have the ability to bisect on clang git history. I would greatly appreciate it if any one is willing to do that.<br></div><div><br></div><div>Thanks!<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Haoran Xu <<a href="mailto:haoranxu510@gmail.com" target="_blank">haoranxu510@gmail.com</a>> 于2020年10月21日周三 下午8:47写道：<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hello,</div><div><br></div><div>I'm really amazed to find out that 

under -O3, a simple piece of C code generated from a brainfxxk-to-C 

transpiler is miscompiled. <br></div><div>As one probably know, the C code transpiled from brainfxxk only contains 3 kind of statements: <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>(1) ++(*ptr) / --(*ptr) <br></div><div>(2) ++ptr / --ptr <br></div><div>(3) while (*ptr) { ... }</div></blockquote><div> where ptr is a uint8_t*. <br></div><div>So

 it seems very clear to me that the code contains no undefined behavior 

(the pointer is uint8_t* and unsigned integer overflow is not UD). <br></div><div><br> </div><div>After further investigation, it seems like clang compiled this loop:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div> <span style="font-family:monospace">while (*ptr) {<br>                         --(*ptr);<br>                    ++ptr;<br>                       ++(*ptr);<br>                    --ptr;<br>                     }</span></div></blockquote><div> to an unconditional infinite loop under -O3, resulting in the bug. The code snippet above seems completely benign to me. <br></div><div><br></div><div>I attached the offending program. With <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>clang a.c -O0</div></blockquote><div>it

 worked fine (it should print out an ASCII-art picture of mandelbrot 

fracture). However, with -O1 or -O3, it goes into a dead loop (in the 

code snippet above) after printing out a few characters.</div><div><br></div><div>I also tried UndefinedBehaviorSanitizer. Strangely, when compiling using <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>clang a.c -O3  -fsanitize=undefined</div></blockquote><div>the code worked again, with no infinite loop, and no undefined behavior reported.</div><div><br></div><div>So it seems to me a LLVM optimizer bug. I would greatly appreciate if any one is willing to investigate.</div><div><br></div><div>Best,</div><div>Haoran<div><br><br></div></div></div>

</blockquote></div>

</blockquote></div>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

</blockquote></div>

</blockquote></div>

</blockquote></div>

</blockquote></div>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

</blockquote></div></div>

</blockquote></div>

</blockquote></div>

</blockquote></div>