<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 24, 2017 at 9:24 AM, Mehdi Amini <span dir="ltr"><<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><blockquote type="cite"><div>On Jan 24, 2017, at 7:18 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>> wrote:</div><br class="m_-1289848491661567520Apple-interchange-newline"><div><br class="m_-1289848491661567520Apple-interchange-newline"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">On Mon, Jan 23, 2017 at 10:53 PM, Mehdi Amini<span class="m_-1289848491661567520Apple-converted-space"> </span><span dir="ltr"><<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>></span><span class="m_-1289848491661567520Apple-converted-space"> </span><wbr>wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><br><div><span class="m_-1289848491661567520gmail-"><blockquote type="cite"><div>On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_-1289848491661567520gmail-m_-7725396102183704639Apple-interchange-newline"><div><div dir="ltr"><div><div>All targets are likely 
affected in some way by the icmp+shl fold introduced with r292492. It's a basic pattern that occurs in lots of code. Did you see any perf wins on your targets with this commit?<br><br></div>Sadly, it is also likely that many (all?) targets are negatively impacted on the particular test (SingleSource/Benchmarks/Shoot<wbr>out/sieve) that you have pointed out here because the IR is now decidedly worse.<br><br>IMO, we should not revert the commit because it exposed shortcomings in the optimizer. It's an "obvious" fold/canonicalization, and the related 'nuw' variant of this fold has existed in trunk since:<br><a href="https://reviews.llvm.org/rL285729" target="_blank">https://reviews.llvm.org/rL285<wbr>729</a><br><br></div>We need to dissect what analysis/folds are missing to restore the IR to the better form that existed before, but this is probably going to be a long process because we treat min/max like an optimization fence.<span class="m_-1289848491661567520Apple-converted-space"> </span><br></div></div></blockquote><div><br></div></span><div>If this is gonna be a long process to recover, this looks like something to be reverted in the 4.0 branch (unless I missed that there is a correctness fix involved?).</div><div><br></div></div></div></blockquote><div><br></div><div>Nope - this is just about perf, not correctness. Of course, the intent was that this transform should only improve perf, so I wonder if we can pin any other perf changes from this commit.<br><br>I'm new to using the LNT site, but this should be the full set of results for the A53 machine in question with a baseline (r292491) before this patch and current (r292522) :<br><a href="http://llvm.org/perf/db_default/v4/nts/107364" target="_blank">http://llvm.org/perf/db_<wbr>default/v4/nts/107364</a><br><br>If these are reliable results, we have 2 perf wins (puzzle, gramschmidt) on the A53 machine. How do we determine the importance of the sieve benchmark vs. 
the rest of the suite?<br><br></div><div>An x86 machine doesn't show any regressions from this change:<br><a href="http://llvm.org/perf/db_default/v4/nts/107353" target="_blank">http://llvm.org/perf/db_<wbr>default/v4/nts/107353</a><br></div><div><br>Are there target-scope-based guidelines for when something is bad enough to revert?<br></div></div></div></blockquote><div><br></div></span><div>I don’t think we have any guidelines.</div><div><br></div><div>I think my suggestion was more about other regressions that we might discover after the release; it was more of a “maturity” call: if we just notice a problem with the commit right before the release, it may not have been in tree long enough to get enough scrutiny.</div></div></div></blockquote><div><br></div><div>That makes sense. I have no stake in any particular branch, so I have no objection to reverting it on the release branch if that's what people would like to do. My preference is to keep it in trunk, though, because it should be a win in theory, and reverting there would make it harder to find and debug problems like this one.<br></div><div><br><br><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><br></div><div>(Also, I thought this thread included a compile-time regression, which on re-reading it doesn’t.)</div><div><br></div><div>-- </div><span class="HOEnZb"><font color="#888888"><div>Mehdi</div></font></span><div><div class="h5"><div><br></div><br><blockquote type="cite"><div><div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div><br>Also, we've absolutely destroyed perf (-48%) on the sieve benchmark on that A53 target since the baseline (r256803). 
Are there multiple things to fix before we can truly recover?<br><br>Regardless of whether we revert or not, I am looking at how to claw back the IR from the r292492 regression. Here's one step towards that:<br><a href="https://reviews.llvm.org/D29053" target="_blank">https://reviews.llvm.org/<wbr>D29053</a><br></div><div><br></div><div>If we get lucky, we may be able to sidestep the min/max problem by folding harder before we reach that point in the optimization pipeline.<br></div><div><br><br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div><div></div><div>-- </div><span class="m_-1289848491661567520gmail-HOEnZb"><font color="#888888"><div>Mehdi</div><div><br></div><div><br></div><br></font></span><blockquote type="cite"><div><div><div class="m_-1289848491661567520gmail-h5"><div dir="ltr"><br><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 23, 2017 at 11:13 AM, Evgeny Astigeevich<span class="m_-1289848491661567520Apple-converted-space"> </span><span dir="ltr"><<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.<wbr>Astigeevich@arm.com</a>></span><span class="m_-1289848491661567520Apple-converted-space"> </span>wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div lang="EN-US"><div class="m_-1289848491661567520gmail-m_-7725396102183704639m_5791940775744498920WordSection1"><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">I confirm there is no change in the IR if the hack is disabled in the sources.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">David wrote that these instructions are created by SCEV.<u></u><u></u></span></p><p 
class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Are other targets affected by the changes, e.g. X86?<u></u><u></u></span></p><span><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Kind regards,<br>Evgeny Astigeevich<br>Senior Compiler Engineer<br>Compilation Tools<br>ARM</span><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p></span><div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;padding:0cm 0cm 0cm 4pt"><div><div style="border-width:1pt medium medium;border-style:solid none none;padding:3pt 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:10pt;font-family:tahoma,sans-serif">From:</span></b><span style="font-size:10pt;font-family:tahoma,sans-serif"><span class="m_-1289848491661567520Apple-converted-space"> </span>Sanjay Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]<span class="m_-1289848491661567520Apple-converted-space"> </span><br><b>Sent:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Sunday, January 22, 2017 10:45 PM</span></p><div><div class="m_-1289848491661567520gmail-m_-7725396102183704639h5"><br><b>To:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Evgeny Astigeevich<br><b>Cc:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>llvm-dev; nd<br><b>Subject:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<u></u><u></u></div></div><div><br 
class="m_-1289848491661567520gmail-m_-7725396102183704639webkit-block-placeholder"></div></div></div><div><div class="m_-1289848491661567520gmail-m_-7725396102183704639h5"><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal">I tried an experiment to remove the integer min/max bailouts from InstCombine, and it doesn't appear to change the IR in the attachment, so I doubt there's going to be any improvement.<br><br>If I haven't messed up this example, this is amazing:<br><a href="https://godbolt.org/g/yzoxeY" target="_blank">https://godbolt.org/g/yzoxeY</a><u></u><u></u></p></div><div><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal">On Sun, Jan 22, 2017 at 1:06 PM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<u></u><u></u></p><div><div><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Thank you for information.</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">I’ll build clang without the hack and re-run the benchmark tomorrow.</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">-Evgeny</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;padding:0cm 0cm 0cm 4pt"><div><div style="border-width:1pt medium medium;border-style:solid none none;padding:3pt 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:10pt;font-family:tahoma,sans-serif">From:</span></b><span style="font-size:10pt;font-family:tahoma,sans-serif"><span 
class="m_-1289848491661567520Apple-converted-space"> </span>Sanjay Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]<span class="m_-1289848491661567520Apple-converted-space"> </span><br><b>Sent:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Sunday, January 22, 2017 8:00 PM<br><b>To:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Evgeny Astigeevich<br><b>Cc:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>llvm-dev; nd</span><u></u><u></u></p><div><div><p class="MsoNormal"><br><b>Subject:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<u></u><u></u></p></div></div></div></div><div><div><p class="MsoNormal"> <u></u><u></u></p><div><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">> Do you mean to remove the hack in InstCombiner::visitICmpInst()?</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Yes. 
Although (this just came up in D28625 too) we might need to remove multiple versions of that in order to unlock optimization:</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4338" target="_blank">https://github.com/llvm-mirror<wbr>/llvm/blob/master/lib/Transfor<wbr>ms/InstCombine/InstCombineComp<wbr>ares.cpp#L4338</a></span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L470" target="_blank">https://github.com/llvm-mirror<wbr>/llvm/blob/master/lib/Transfor<wbr>ms/InstCombine/InstCombineCast<wbr>s.cpp#L470</a></span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstructionCombining.cpp#L803" target="_blank">https://github.com/llvm-mirror<wbr>/llvm/blob/master/lib/Transfor<wbr>ms/InstCombine/InstructionComb<wbr>ining.cpp#L803</a></span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L409" target="_blank">https://github.com/llvm-mirror<wbr>/llvm/blob/master/lib/Transfor<wbr>ms/InstCombine/InstCombineSimp<wbr>lifyDemanded.cpp#L409</a></span><u></u><u></u></p><div><p class="MsoNormal" style="margin-bottom:12pt"><br>Similar for FP:<br><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4780" 
target="_blank">https://github.com/llvm-mirror<wbr>/llvm/blob/master/lib/Transfor<wbr>ms/InstCombine/InstCombineComp<wbr>ares.cpp#L4780</a></span><br><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L1376" target="_blank">https://github.com/llvm-mirror<wbr>/llvm/blob/master/lib/Transfor<wbr>ms/InstCombine/InstCombineCast<wbr>s.cpp#L1376</a></span><u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p><div><p class="MsoNormal">On Sun, Jan 22, 2017 at 12:40 PM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<u></u><u></u></p><div><div><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Hi Sanjay,</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">The benchmark source file:<span class="m_-1289848491661567520Apple-converted-space"> </span><a href="http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup" target="_blank">http://www.llvm.org/<wbr>viewvc/llvm-project/test-<wbr>suite/trunk/SingleSource/<wbr>Benchmarks/Shootout/sieve.c?<wbr>view=markup</a></span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Clang options used to produce the initial IR: clang -DNDEBUG  -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG   -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Opt options: opt -O3 -o 
/dev/null -print-before-all -print-after-all sieve.ll >& sieve.log</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">I used the IR (in attached sieve.zip) created with the r292487 version.</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">The attached sieve contains the output of ‘-print-before-all -print-after-all’  for r292487 and rL292492.</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">><span class="m_-1289848491661567520Apple-converted-space"> </span></span>If it's possible, can you remove that check locally, rebuild,<u></u><u></u></p><p class="MsoNormal">> and try the benchmark again on your system? 
I'd love to know<u></u><u></u></p><p class="MsoNormal">> if that change alone would solve the problem.<u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Do you mean to remove the hack in InstCombiner::visitICmpInst()?</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:12pt"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)">Kind regards,<br>Evgeny Astigeevich<br>Senior Compiler Engineer<br>Compilation Tools<br>ARM</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11pt;font-family:calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;padding:0cm 0cm 0cm 4pt"><div><div style="border-width:1pt medium medium;border-style:solid none none;padding:3pt 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:10pt;font-family:tahoma,sans-serif">From:</span></b><span style="font-size:10pt;font-family:tahoma,sans-serif"><span class="m_-1289848491661567520Apple-converted-space"> </span>Sanjay Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]<span class="m_-1289848491661567520Apple-converted-space"> </span><br><b>Sent:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Friday, January 20, 2017 6:16 PM<br><b>To:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Evgeny Astigeevich<br><b>Cc:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>llvm-dev; Renato Golin;<span class="m_-1289848491661567520Apple-converted-space"> </span><a href="mailto:t.p.northover@gmail.com" 
target="_blank">t.p.northover@gmail.com</a><wbr>;<span class="m_-1289848491661567520Apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a><br><b>Subject:</b><span class="m_-1289848491661567520Apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines</span><u></u><u></u></p></div></div><div><div><p class="MsoNormal"> <u></u><u></u></p><div><div><div><div><div><div><div><div><p class="MsoNormal">Thanks for letting me know about this problem!<br><br>There's no 'shl nsw' visible in the earlier (r292487) code, so it would be better to see exactly what the IR looks like before that added transform fires.<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12pt"><br>But I see a red flag:<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%smax = select i1 %11, i64 %10, i64 8193<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12pt">The new icmp transform allowed us to create an smax, but we have this hack in InstCombiner::visitICmpInst():<br><br> <span class="m_-1289848491661567520Apple-converted-space"> </span>// Test if the ICmpInst instruction is used exclusively by a select as<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>// part of a minimum or maximum operation. If so, refrain from doing<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>// any other folding. This helps out other analyses which understand<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>// non-obfuscated minimum and maximum idioms, such as ScalarEvolution<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>// and CodeGen. 
And in this case, at least one of the comparison<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>// operands has at least one user besides the compare (the select),<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>// which would often largely negate the benefit of folding anyway.<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12pt">...so that prevented folding the icmp into the earlier math.<u></u><u></u></p></div><p class="MsoNormal">I am actively working on trying to get rid of that bail-out by improving min/max value tracking and icmp/select folding. In fact, we might be able to remove it right now, but I don't know the history of that code or what cases it was supposed to help.<u></u><u></u></p></div></div></div><div><div><div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">If it's possible, can you remove that check locally, rebuild, and try the benchmark again on your system? I'd love to know if that change alone would solve the problem.<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p><div><div><div><div><div><p class="MsoNormal"> <u></u><u></u></p><div><p class="MsoNormal">On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<u></u><u></u></p><p class="MsoNormal" style="margin-bottom:12pt">Hi,<br><br>We found that today's 17.30%/11.37% performance regressions in LNT SingleSource/Benchmarks/Shooto<wbr>ut/sieve on LNT-AArch64-A53-O3__clang_DEV_<wbr>_aarch64 and LNT-Thumb2v7-A15-O3__clang_DEV<wbr>__thumbv7 (<a href="http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen" target="_blank">http://llvm.org/perf/db_defau<wbr>lt/v4/nts/daily_report/2017/1/<wbr>20?filter-machine-regex=aarch6<wbr>4%7Carm%7Cthumb%7Cgreen</a>) are caused by changes [rL292492] in InstCombine:<br><br><a 
href="https://reviews.llvm.org/D28406" target="_blank">https://reviews.llvm.org/D2840<wbr>6</a><span class="m_-1289848491661567520Apple-converted-space"> </span>"[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1"<br><br>The Loop Vectorizer generates code with more instructions:<br><br>==== Loop Vectorizer from rL292492  ====<br>for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge<wbr>, %for.cond.preheader<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%2 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%3 = shl i64 %indvar, 1<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%4 = add i64 %3, 4<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%5 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%6 = shl i64 %indvar, 1<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%7 = add i64 %6, 4<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%8 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%9 = mul i64 %indvar, 3<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%10 = add i64 %9, 6<br> <span class="m_-1289848491661567520Apple-converted-space"> 
</span>%11 = icmp sgt i64 %10, 8193<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%smax = select i1 %11, i64 %10, i64 8193<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%12 = mul i64 %indvar, -2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%13 = add i64 %12, -5<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%14 = add i64 %smax, %13<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%15 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%16 = udiv i64 %14, %15<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%17 = add i64 %16, 1<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%tobool7 = icmp eq i8 %1, 0<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>br i1 %tobool7, label %for.inc16, label %if.then<br>==============================<wbr>==<br><br>The code generated by the Loop Vectorizer before the changes:<br><br>==== Loop Vectorizer from rL292487 ====<br>for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge<wbr>, %for.cond.preheader<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%2 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> 
</span>%3 = shl i64 %indvar, 1<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%4 = add i64 %3, 4<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%5 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%6 = shl i64 %indvar, 1<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%7 = add i64 %6, 4<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%8 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%9 = mul i64 %indvar, -2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%10 = add i64 %9, 8188<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%11 = add i64 %indvar, 2<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%12 = udiv i64 %10, %11<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%13 = add i64 %12, 1<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>%tobool7 = icmp eq i8 %1, 0<br> <span class="m_-1289848491661567520Apple-converted-space"> </span>br i1 %tobool7, label %for.inc16, label %if.then<br>==============================<wbr>==<br><br>I have not investigated yet why the behaviour of the Vectorizer is changed.<br><br>Kind regards,<br>Evgeny Astigeevich<br>Senior Compiler Engineer<br>Compilation Tools<br>ARM<u></u><u></u></p></div><p class="MsoNormal"> <u></u><u></u></p></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div><p class="MsoNormal"> <u></u><u></u></p></div></div></div></div></div></div></div></div><p class="MsoNormal"><u></u> <u></u></p></div></div></div></div></div></div></blockquote></div><br></div></div></div><span class="m_-1289848491661567520gmail-">______________________________<wbr>_________________<br>LLVM Developers mailing list<br><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br><a 
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a></span></div></blockquote></div></div></blockquote></div></div></blockquote></div></div></div><br></div></blockquote></div><br></div></div>
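<div><br>As a side note on the fold discussed in this thread ("icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1", from D28406): the following is a minimal Python sketch, not LLVM code, that brute-forces the equivalence on 8-bit signed values. The 8-bit width and the helper names (fold_holds, count_mismatches) are assumptions chosen for illustration; inputs where the shift would overflow are skipped because 'nsw' makes that result poison.</div>

```python
# Brute-force check of the icmp+shl fold on 8-bit signed values.
# The 8-bit width is an assumption made to keep the search small.
I8 = range(-128, 128)


def fold_holds(x, c1, c0):
    """Compare 'icmp sgt (shl nsw x, c1), c0' with 'icmp sgt x, (c0 >> c1)'."""
    shifted = x << c1
    if shifted not in I8:
        # With 'nsw' an overflowing shift is poison, so the fold promises nothing.
        return True
    # Python's >> on ints is an arithmetic (sign-preserving) shift.
    return (shifted > c0) == (x > (c0 >> c1))


def count_mismatches():
    """Count (c1, x, c0) triples where the two comparisons disagree."""
    return sum(
        not fold_holds(x, c1, c0)
        for c1 in range(1, 8)
        for x in I8
        for c0 in I8
    )


print(count_mismatches())
```

<div>A count of 0 means the two comparisons agree whenever the shift does not overflow. It says nothing about whether the fold is profitable for later passes, which is the actual question in this thread.</div>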