<div dir="ltr"><div><div>Hi Evgeny,<br></div><div><br></div>I started looking at the log files that you attached, and I'm confused. The code that is supposedly causing the perf regression is created by the loop vectorizer, right? Except the bad code is not in the "vector.body", so is there something peculiar about this benchmark such that the hot loop is not the vector loop? But there's another mystery: there are no vector ops in the "vector.body"!<br><br></div>Although I'd hope that InstCombine and friends can clean up any mess made by the vectorizer, maybe the faster way to solve this problem is to stop the vectorizer from doing bogus (and in this case, harmful) work? I don't know the vectorizer code, so I can't offer much help on that side. Could someone check how the post-vectorizer IR differs between A53 and an x86 target, since x86 didn't regress?<br><br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 24, 2017 at 1:55 PM, Evgeny Astigeevich <span dir="ltr"><<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">





<div link="blue" vlink="purple" lang="EN-US">
<div class="m_2569825986882614929WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi Sanjay,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Thank you for your analysis. It’s interesting that the x86 machine is not affected. Maybe the x86 backend is smarter than the AArch64 backend, or it might be due to
 micro-architectural differences.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I don’t mind keeping the changes on trunk.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">What I’d like to know is who will/should be involved in solving the issue. What kind of help/support is needed? Should we (ARM Compilation Tools) start digging
 into the issue and fix it because the issue affects our future ARM Compiler 6 releases?
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">We can provide help with validating that patches fix the issue and don’t introduce new ones.<u></u><u></u></span></p><span class="">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Kind regards,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Evgeny Astigeevich<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Senior Compiler Engineer<br>
Compilation Tools<br>
ARM</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
</span><div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Sanjay Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]
<br>
<b>Sent:</b> Tuesday, January 24, 2017 4:55 PM<br>
<b>To:</b> Mehdi Amini<br>
<b>Cc:</b> Evgeny Astigeevich; llvm-dev; nd<br>
<b>Subject:</b> Re: [llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<u></u><u></u></span></p>
</div>
</div><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Tue, Jan 24, 2017 at 9:24 AM, Mehdi Amini <<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On Jan 24, 2017, at 7:18 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>> wrote:<u></u><u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><br>
<br>
<u></u><u></u></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">On Mon, Jan 23, 2017 at 10:53 PM, Mehdi Amini<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><wbr>wrote:<u></u><u></u></span></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<u></u><u></u></span></p>
</div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">All targets are likely affected in some way by the icmp+shl fold introduced with r292492. It's a basic pattern that occurs in lots of code.
 Did you see any perf wins on your targets with this commit?<u></u><u></u></span></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">Sadly, it is also likely that many (all?) targets are negatively impacted on the particular test (SingleSource/Benchmarks/<wbr>Shootout/sieve) that
 you have pointed out here because the IR is now decidedly worse.<br>
<br>
IMO, we should not revert the commit because it exposed shortcomings in the optimizer. It's an "obvious" fold/canonicalization, and the related 'nuw' variant of this fold has existed in trunk since:<br>
<a href="https://reviews.llvm.org/rL285729" target="_blank">https://reviews.llvm.org/<wbr>rL285729</a><u></u><u></u></span></p>
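<p class="MsoNormal">As a rough sanity check on why this fold is considered safe, here is a minimal Python sketch (not LLVM code). It brute-forces the pattern from the D28406 title, icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1, over a hypothetical 8-bit signed type. The helper names (shl_nsw, count_mismatches) and the 8-bit width are assumptions made for illustration. Inputs where the shift would overflow are skipped, because nsw lets the optimizer assume they do not occur.</p>

```python
# Brute-force model of the fold on a hypothetical 8-bit signed integer type.
BITS = 8
SMIN = -(2 ** (BITS - 1))
SMAX = 2 ** (BITS - 1) - 1


def shl_nsw(x, c1):
    """Shift x left by c1; return None if the signed result overflows."""
    y = x * 2 ** c1
    return y if SMAX >= y >= SMIN else None


def count_mismatches():
    """Count inputs where the compare before and after the fold disagree."""
    mismatches = 0
    for c1 in range(BITS):  # every in-range shift amount
        for c0 in range(SMIN, SMAX + 1):
            for x in range(SMIN, SMAX + 1):
                shifted = shl_nsw(x, c1)
                if shifted is None:
                    continue  # nsw violated: the fold may assume this away
                before = shifted > c0
                # Python's >> on ints is an arithmetic (floor) shift.
                after = x > (c0 >> c1)
                if before != after:
                    mismatches += 1
    return mismatches


if __name__ == "__main__":
    print(count_mismatches())
```

<p class="MsoNormal">If the sketch is right, the count should be zero, which is consistent with the view that the fold itself is sound and that the regression comes from what later passes do with the resulting IR.</p>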
</div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">We need to dissect what analysis/folds are missing to restore the IR to the better form that existed before, but this is probably going to be a long process because we treat
 min/max like an optimization fence.<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><u></u><u></u></span></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">If this is gonna be a long process to recover, this looks like something to be reverted in the 4.0 branch (unless I missed that there is a correctness fix involved?).<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">Nope - this is just about perf, not correctness. Of course, the intent was that this transform should only improve perf, so I wonder if we can
 pin any other perf changes on this commit.<br>
<br>
I'm new to using the LNT site, but this should be the full set of results for the A53 machine in question with a baseline (r292491) before this patch and current (r292522):<br>
<a href="http://llvm.org/perf/db_default/v4/nts/107364" target="_blank">http://llvm.org/perf/db_<wbr>default/v4/nts/107364</a><br>
<br>
If these are reliable results, we have 2 perf wins (puzzle, gramschmidt) on the A53 machine. How do we determine the importance of the sieve benchmark vs. the rest of the suite?<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">An x86 machine doesn't show any regressions from this change:<br>
<a href="http://llvm.org/perf/db_default/v4/nts/107353" target="_blank">http://llvm.org/perf/db_<wbr>default/v4/nts/107353</a><u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
Are there target-scope-based guidelines for when something is bad enough to revert?<u></u><u></u></span></p>
</div>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I don’t think we have any guidelines.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I think my suggestion was more about other regressions that we would discover after the release; it was more of a “maturity” call: if we just notice a problem with the commit right before the release, it may not have been in tree long enough
 to get enough scrutiny.<u></u><u></u></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">That makes sense. I have no stake in any particular branch, so I have no objection to reverting it from the release branch if that's what people would like to do. My preference is to keep it in trunk, though, because it should be a win in theory
 and reverting there would make it harder to find and debug problems like this one.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><br>
<br>
<br>
 <u></u><u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">(Also, I thought this thread included a compile-time regression, which on re-reading it doesn’t.)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">— <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="color:#888888">Mehdi<u></u><u></u></span></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<p class="MsoNormal"><br>
<br>
<u></u><u></u></p>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
Also, we've absolutely destroyed perf (-48%) on the sieve benchmark on that A53 target since the baseline (r256803). There are multiple things to fix before we can truly recover?<br>
<br>
Regardless of whether we revert or not, I am looking at how to claw back the IR from the r292492 regression. Here's one step towards that:<br>
<a href="https://reviews.llvm.org/D29053" target="_blank">https://reviews.llvm.org/<wbr>D29053</a><u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">If we get lucky, we may be able to sidestep the min/max problem by folding harder before we reach that point in the optimization pipeline.<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
<br>
 <u></u><u></u></span></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">— <u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888">Mehdi<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888"><u></u> <u></u></span></p>
</div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888"><br>
<br>
</span><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u><u></u></span></p>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">On Mon, Jan 23, 2017 at 11:13 AM, Evgeny Astigeevich<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.<wbr>Astigeevich@arm.com</a>><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>wrote:<u></u><u></u></span></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I confirm there is no change in the IR if the hack is disabled in the sources.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">David wrote that these instructions are created by SCEV.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Are other targets affected by the changes, e.g. X86?</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<div style="border:none;border-left:solid windowtext 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid windowtext 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Sanjay
 Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><br>
<b>Sent:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Sunday, January 22, 2017 10:45 PM</span><u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
<b>To:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Evgeny Astigeevich<br>
<b>Cc:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>llvm-dev; nd<br>
<b>Subject:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<u></u><u></u></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><u></u> <u></u></span></p>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<p class="MsoNormal">I tried an experiment to remove the integer min/max bailouts from InstCombine, and it doesn't appear to change the IR in the attachment, so I doubt there's going to be any improvement.<br>
<br>
If I haven't messed up this example, this is amazing:<br>
<a href="https://godbolt.org/g/yzoxeY" target="_blank">https://godbolt.org/g/yzoxeY</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Sun, Jan 22, 2017 at 1:06 PM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Thank you for the information.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I’ll build clang without the hack and re-run the benchmark tomorrow.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">-Evgeny</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<div style="border:none;border-left:solid windowtext 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid windowtext 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Sanjay
 Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><br>
<b>Sent:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Sunday, January 22, 2017 8:00 PM<br>
<b>To:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Evgeny Astigeevich<br>
<b>Cc:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>llvm-dev; nd</span><u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><br>
<b>Subject:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<u></u><u></u></p>
</div>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">> Do you mean to remove the hack in InstCombiner::visitICmpInst()?</span><u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Yes. Although (this just came up in D28625 too) we might need to remove multiple versions of that
 in order to unlock optimization:</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4338" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCompares.cpp#L4338</a></span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L470" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCasts.cpp#L470</a></span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstructionCombining.cpp#L803" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstructionCombining.cpp#L803</a></span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L409" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineSimplifyDemanded.<wbr>cpp#L409</a></span><u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
Similar for FP:<br>
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4780" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCompares.cpp#L4780</a></span><br>
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L1376" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCasts.cpp#L1376</a></span><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Sun, Jan 22, 2017 at 12:40 PM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi Sanjay,</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The benchmark source file:<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><a href="http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup" target="_blank">http://www.llvm.org/<wbr>viewvc/llvm-project/test-<wbr>suite/trunk/SingleSource/<wbr>Benchmarks/Shootout/sieve.c?<wbr>view=markup</a></span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Clang options used to produce the initial IR: clang -DNDEBUG  -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer
 -O3 -DNDEBUG   -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Opt options: opt -O3 -o /dev/null -print-before-all -print-after-all sieve.ll >& sieve.log</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I used the IR (in attached sieve.zip) created with the r292487 version.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The attached sieve contains the output of ‘-print-before-all -print-after-all’  for r292487 and rL292492.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span></span>If it's possible, can you
 remove that check locally, rebuild,<u></u><u></u></p>
<p class="MsoNormal">> and try the benchmark again on your system? I'd love to know<u></u><u></u></p>
<p class="MsoNormal">> if that change alone would solve the problem.<u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Do you mean to remove the hack in InstCombiner::visitICmpInst()?</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<div style="border:none;border-left:solid windowtext 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid windowtext 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Sanjay
 Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><br>
<b>Sent:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Friday, January 20, 2017 6:16 PM<br>
<b>To:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Evgeny Astigeevich<br>
<b>Cc:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>llvm-dev; Renato Golin;<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a><wbr>;<span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a><br>
<b>Subject:</b><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines</span><u></u><u></u></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">Thanks for letting me know about this problem!<br>
<br>
There's no 'shl nsw' visible in the earlier (r292487) code, so it would be better to see exactly what the IR looks like before that added transform fires.<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
But I see a red flag:<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%smax = select i1 %11, i64 %10, i64 8193<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">The new icmp transform allowed us to create an smax, but we have this hack in InstCombiner::visitICmpInst():<br>
<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>// Test if the ICmpInst instruction is used exclusively by a select as<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>// part of a minimum or maximum operation. If so, refrain from doing<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>// any other folding. This helps out other analyses which understand<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>// non-obfuscated minimum and maximum idioms, such as ScalarEvolution<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>// and CodeGen. And in this case, at least one of the comparison<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>// operands has at least one user besides the compare (the select),<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>// which would often largely negate the benefit of folding anyway.<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">...so that prevented folding the icmp into the earlier math.<u></u><u></u></p>
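<p class="MsoNormal">To make the trade-off concrete, here is a small Python sketch (an illustration only, not InstCombine code). It models the smax select shown above, taking %10 = 3 * %indvar + 6 from the quoted loop, next to a hypothetical version in which the compare has been folded into the earlier math so that it tests %indvar directly. The threshold 2729 is derived here from 3*i + 6 > 8193 and ignores overflow, since Python integers are unbounded. The function names are made up for this example.</p>

```python
def smax_idiom(indvar):
    """Model of: %11 = icmp sgt %10, 8193 ; %smax = select %11, %10, 8193."""
    t10 = 3 * indvar + 6
    # The compare and the select use the same operands: a recognizable smax.
    return t10 if t10 > 8193 else 8193


def folded_compare(indvar):
    """Hypothetical form after folding the compare into the earlier math."""
    t10 = 3 * indvar + 6
    # 3*i + 6 > 8193 is equivalent to i > 2729 when nothing overflows.
    # The compare now tests indvar while the select still picks t10.
    return t10 if indvar > 2729 else 8193


def all_agree(limit=10000):
    """Check that both forms equal max(3*i + 6, 8193) on a sample range."""
    return all(
        smax_idiom(i) == folded_compare(i) == max(3 * i + 6, 8193)
        for i in range(-limit, limit)
    )


if __name__ == "__main__":
    print(all_agree())
```

<p class="MsoNormal">Both forms compute the same value, but only the first compares the same operand that it selects. That shape is presumably what analyses such as ScalarEvolution look for when matching min/max, which would explain why the bail-out exists.</p>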
</div>
<p class="MsoNormal">I am actively working on trying to get rid of that bail-out by improving min/max value tracking and icmp/select folding. In fact, we might be able to remove it right now, but I
 don't know the history of that code or what cases it was supposed to help.<u></u><u></u></p>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">If it's possible, can you remove that check locally, rebuild, and try the benchmark again on your system? I'd love to know if that change alone would solve the problem.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<u></u><u></u></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi,<br>
<br>
We found that today's 17.30%/11.37% performance regressions in LNT SingleSource/Benchmarks/<wbr>Shootout/sieve on LNT-AArch64-A53-O3__clang_DEV_<wbr>_aarch64 and LNT-Thumb2v7-A15-O3__clang_<wbr>DEV__thumbv7 (<a href="http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen" target="_blank">http://llvm.org/perf/db_<wbr>default/v4/nts/daily_report/<wbr>2017/1/20?filter-machine-<wbr>regex=aarch64%7Carm%7Cthumb%<wbr>7Cgreen</a>)
 are caused by changes [rL292492] in InstCombine:<br>
<br>
<a href="https://reviews.llvm.org/D28406" target="_blank">https://reviews.llvm.org/<wbr>D28406</a><span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>"[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1"<br>
<br>
The Loop Vectorizer generates code with more instructions:<br>
<br>
==== Loop Vectorizer from rL292492  ====<br>
for.body5:                                        ; preds = %for.inc16.for.body5_crit_<wbr>edge, %for.cond.preheader<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%2 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%3 = shl i64 %indvar, 1<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%4 = add i64 %3, 4<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%5 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%6 = shl i64 %indvar, 1<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%7 = add i64 %6, 4<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%8 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%9 = mul i64 %indvar, 3<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%10 = add i64 %9, 6<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%11 = icmp sgt i64 %10, 8193<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%smax = select i1 %11, i64 %10, i64 8193<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%12 = mul i64 %indvar, -2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%13 = add i64 %12, -5<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%14 = add i64 %smax, %13<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%15 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%16 = udiv i64 %14, %15<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%17 = add i64 %16, 1<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%tobool7 = icmp eq i8 %1, 0<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>br i1 %tobool7, label %for.inc16, label %if.then<br>
==============================<wbr>==<br>
<br>
The code generated by the Loop Vectorizer before the changes:<br>
<br>
==== Loop Vectorizer from rL292487 ====<br>
for.body5:                                        ; preds = %for.inc16.for.body5_crit_<wbr>edge, %for.cond.preheader<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%2 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%3 = shl i64 %indvar, 1<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%4 = add i64 %3, 4<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%5 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%6 = shl i64 %indvar, 1<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%7 = add i64 %6, 4<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%8 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%9 = mul i64 %indvar, -2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%10 = add i64 %9, 8188<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%11 = add i64 %indvar, 2<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%12 = udiv i64 %10, %11<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%13 = add i64 %12, 1<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>%tobool7 = icmp eq i8 %1, 0<br>
 <span class="m_2569825986882614929m-1289848491661567520apple-converted-space"> </span>br i1 %tobool7, label %for.inc16, label %if.then<br>
==============================<wbr>==<br>
<br>
I have not yet investigated why the Vectorizer's behaviour changed.<br>
<br>
Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM<u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><span class="m_2569825986882614929m-1289848491661567520gmail-"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">______________________________<wbr>_________________</span></span><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
<span class="m_2569825986882614929m-1289848491661567520gmail-">LLVM Developers mailing list</span><br>
<span class="m_2569825986882614929m-1289848491661567520gmail-"><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a></span><br>
<span class="m_2569825986882614929m-1289848491661567520gmail-"><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a></span><u></u><u></u></span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div></div></div>
</div>
</div>

</blockquote></div><br></div>