<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Helvetica;
        panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
span.m-1289848491661567520apple-converted-space
        {mso-style-name:m_-1289848491661567520apple-converted-space;}
span.m-1289848491661567520gmail-
        {mso-style-name:m_-1289848491661567520gmail-;}
span.hoenzb
        {mso-style-name:hoenzb;}
span.m-1289848491661567520gmail-hoenzb
        {mso-style-name:m_-1289848491661567520gmail-hoenzb;}
span.EmailStyle21
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:2.0cm 42.5pt 2.0cm 3.0cm;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi Sanjay,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thank you for your analysis. It’s interesting that the x86 machine is not affected. Maybe the x86 backend is smarter than the AArch64 backend, or it might be due to
 micro-architectural differences.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I don’t mind keeping the changes on trunk.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">What I’d like to know is who will/should be involved in solving the issue. What kind of help/support is needed? Should we (ARM Compilation Tools) start digging
 into the issue and fixing it, since it affects our future ARM Compiler 6 releases?
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">We can provide help with validating that patches fix the issue and don’t introduce new ones.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kind regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Evgeny Astigeevich<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Senior Compiler Engineer<br>
Compilation Tools<br>
ARM</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Sanjay Patel [mailto:spatel@rotateright.com]
<br>
<b>Sent:</b> Tuesday, January 24, 2017 4:55 PM<br>
<b>To:</b> Mehdi Amini<br>
<b>Cc:</b> Evgeny Astigeevich; llvm-dev; nd<br>
<b>Subject:</b> Re: [llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Tue, Jan 24, 2017 at 9:24 AM, Mehdi Amini <<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>> wrote:<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On Jan 24, 2017, at 7:18 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>> wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">On Mon, Jan 23, 2017 at 10:53 PM, Mehdi Amini<span class="m-1289848491661567520apple-converted-space"> </span><<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>><span class="m-1289848491661567520apple-converted-space"> </span>wrote:<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">All targets are likely affected in some way by the icmp+shl fold introduced with r292492. It's a basic pattern that occurs in lots of code.
 Did you see any perf wins on your targets with this commit?<o:p></o:p></span></p>
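<p class="MsoNormal" style="margin-bottom:12.0pt">To make the fold concrete, here is a minimal sketch, assuming it has the shape icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, (C0 >> C1). The exact predicates and conditions handled by r292492 are not spelled out in this thread, so that shape and the helper name fold_holds are assumptions. The Python snippet brute-forces a small bit width and checks that both comparisons agree whenever the shift does not overflow.</p>
<pre>
```python
def fold_holds(bits=8):
    """Brute-force check of an assumed icmp+shl fold for the sgt predicate.

    Assumption: the fold rewrites (x shl nsw c1) sgt c0 into x sgt (c0 >> c1),
    where >> is an arithmetic (sign-preserving) shift.
    """
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    for c1 in range(1, bits - 1):
        scale = 2 ** c1  # shifting left by c1 is multiplying by 2**c1
        for x in range(lo, hi + 1):
            shifted = x * scale
            if not (hi >= shifted >= lo):
                continue  # signed overflow: 'nsw' would make the result poison
            for c0 in range(lo, hi + 1):
                before = shifted > c0
                after = x > (c0 >> c1)  # Python's >> rounds toward -infinity
                if before != after:
                    return False
    return True
```
</pre>
<p class="MsoNormal" style="margin-bottom:12.0pt">Calling fold_holds() returns True for the 8-bit case. The slt direction would need the constant rounded up rather than down, which is why the sketch sticks to sgt.</p>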
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">Sadly, it is also likely that many (all?) targets are negatively impacted on the particular test (SingleSource/Benchmarks/Shootout/sieve) that
 you have pointed out here because the IR is now decidedly worse.<br>
<br>
IMO, we should not revert the commit because it exposed shortcomings in the optimizer. It's an "obvious" fold/canonicalization, and the related 'nuw' variant of this fold has existed in trunk since:<br>
<a href="https://reviews.llvm.org/rL285729" target="_blank">https://reviews.llvm.org/rL285729</a><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">We need to dissect what analysis/folds are missing to restore the IR to the better form that existed before, but this is probably going to be a long process because we treat
 min/max like an optimization fence.<span class="m-1289848491661567520apple-converted-space"> </span><o:p></o:p></span></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">If this is gonna be a long process to recover, this looks like something to be reverted in the 4.0 branch (unless I missed that there is a correctness fix involved?).<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">Nope - this is just about perf, not correctness. Of course, the intent was that this transform should only improve perf, so I wonder if we can
 pin any other perf changes on this commit.<br>
<br>
I'm new to using the LNT site, but this should be the full set of results for the A53 machine in question with a baseline (r292491) before this patch and current (r292522):<br>
<a href="http://llvm.org/perf/db_default/v4/nts/107364" target="_blank">http://llvm.org/perf/db_default/v4/nts/107364</a><br>
<br>
If these are reliable results, we have 2 perf wins (puzzle, gramschmidt) on the A53 machine. How do we determine the importance of the sieve benchmark vs. the rest of the suite?<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">An x86 machine doesn't show any regressions from this change:<br>
<a href="http://llvm.org/perf/db_default/v4/nts/107353" target="_blank">http://llvm.org/perf/db_default/v4/nts/107353</a><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
Are there target-scope-based guidelines for when something is bad enough to revert?<o:p></o:p></span></p>
</div>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I don’t think we have any guidelines.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I think my suggestion was more about other regressions that we would discover after the release; it was more of a “maturity” call: if we just notice a problem with the commit right before the release, it may not have been in tree long enough
 to get enough scrutiny.<o:p></o:p></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">That makes sense. I have no stake in any particular branch, so I have no objection to reverting it from the release branch if that's what people would like to do. My preference is to keep it in trunk though because it should be a win in theory
 and reverting there would make it harder to find and debug problems like this one.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><br>
<br>
<br>
 <o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">(Also I thought this thread included a compile time regression, which on re-read it doesn’t).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">— <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="color:#888888">Mehdi<o:p></o:p></span></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
Also, we've absolutely destroyed perf (-48%) on the sieve benchmark on that A53 target since the baseline (r256803). There are multiple things to fix before we can truly recover?<br>
<br>
Regardless of whether we revert or not, I am looking at how to claw back the IR from the r292492 regression. Here's one step towards that:<br>
<a href="https://reviews.llvm.org/D29053" target="_blank">https://reviews.llvm.org/D29053</a><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">If we get lucky, we may be able to sidestep the min/max problem by folding harder before we reach that point in the optimization pipeline.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
<br>
 <o:p></o:p></span></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">— <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888">Mehdi<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif";color:#888888"><br>
<br>
</span><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p></o:p></span></p>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">On Mon, Jan 23, 2017 at 11:13 AM, Evgeny Astigeevich<span class="m-1289848491661567520apple-converted-space"> </span><<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>><span class="m-1289848491661567520apple-converted-space"> </span>wrote:<o:p></o:p></span></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I confirm there is no change in the IR if the hack is disabled in the sources.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">David wrote that these instructions are created by SCEV.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Are other targets affected by the changes, e.g. X86?</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<div style="border:none;border-left:solid windowtext 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid windowtext 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span class="m-1289848491661567520apple-converted-space"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Sanjay
 Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>]<span class="m-1289848491661567520apple-converted-space"> </span><br>
<b>Sent:</b><span class="m-1289848491661567520apple-converted-space"> </span>Sunday, January 22, 2017 10:45 PM</span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
<b>To:</b><span class="m-1289848491661567520apple-converted-space"> </span>Evgeny Astigeevich<br>
<b>Cc:</b><span class="m-1289848491661567520apple-converted-space"> </span>llvm-dev; nd<br>
<b>Subject:</b><span class="m-1289848491661567520apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<o:p></o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I tried an experiment to remove the integer min/max bailouts from InstCombine, and it doesn't appear to change the IR in the attachment, so I doubt there's going to be any improvement.<br>
<br>
If I haven't messed up this example, this is amazing:<br>
<a href="https://godbolt.org/g/yzoxeY" target="_blank">https://godbolt.org/g/yzoxeY</a><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Sun, Jan 22, 2017 at 1:06 PM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thank you for the information.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I’ll build clang without the hack and re-run the benchmark tomorrow.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">-Evgeny</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<div style="border:none;border-left:solid windowtext 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid windowtext 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span class="m-1289848491661567520apple-converted-space"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Sanjay
 Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>]<span class="m-1289848491661567520apple-converted-space"> </span><br>
<b>Sent:</b><span class="m-1289848491661567520apple-converted-space"> </span>Sunday, January 22, 2017 8:00 PM<br>
<b>To:</b><span class="m-1289848491661567520apple-converted-space"> </span>Evgeny Astigeevich<br>
<b>Cc:</b><span class="m-1289848491661567520apple-converted-space"> </span>llvm-dev; nd</span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><br>
<b>Subject:</b><span class="m-1289848491661567520apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<o:p></o:p></p>
</div>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> Do you mean to remove the hack in InstCombiner::visitICmpInst()?</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Yes. Although (this just came up in D28625 too) we might need to remove multiple versions of that
 in order to unlock optimization:</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4338" target="_blank">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4338</a></span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L470" target="_blank">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L470</a></span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstructionCombining.cpp#L803" target="_blank">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstructionCombining.cpp#L803</a></span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L409" target="_blank">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L409</a></span><o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"><br>
Similar for FP:<br>
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4780" target="_blank">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4780</a></span><br>
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L1376" target="_blank">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L1376</a></span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Sun, Jan 22, 2017 at 12:40 PM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi Sanjay,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The benchmark source file:<span class="m-1289848491661567520apple-converted-space"> </span><a href="http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup" target="_blank">http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup</a></span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Clang options used to produce the initial IR: clang -DNDEBUG  -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer
 -O3 -DNDEBUG   -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Opt options: opt -O3 -o /dev/null -print-before-all -print-after-all sieve.ll >& sieve.log</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I used the IR (in attached sieve.zip) created with the r292487 version.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The attached sieve.zip contains the output of ‘-print-before-all -print-after-all’ for r292487 and rL292492.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">><span class="m-1289848491661567520apple-converted-space"> </span></span>If it's possible, can you
 remove that check locally, rebuild,<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">> and try the benchmark again on your system? I'd love to know<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">> if that change alone would solve the problem.<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Do you mean to remove the hack in InstCombiner::visitICmpInst()?</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<div style="border:none;border-left:solid windowtext 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid windowtext 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span class="m-1289848491661567520apple-converted-space"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Sanjay
 Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>]<span class="m-1289848491661567520apple-converted-space"> </span><br>
<b>Sent:</b><span class="m-1289848491661567520apple-converted-space"> </span>Friday, January 20, 2017 6:16 PM<br>
<b>To:</b><span class="m-1289848491661567520apple-converted-space"> </span>Evgeny Astigeevich<br>
<b>Cc:</b><span class="m-1289848491661567520apple-converted-space"> </span>llvm-dev; Renato Golin;<span class="m-1289848491661567520apple-converted-space"> </span><a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>;<span class="m-1289848491661567520apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a><br>
<b>Subject:</b><span class="m-1289848491661567520apple-converted-space"> </span>Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks for letting me know about this problem!<br>
<br>
There's no 'shl nsw' visible in the earlier (r292487) code, so it would be better to see exactly what the IR looks like before that added transform fires.<o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"><br>
But I see a red flag:<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%smax = select i1 %11, i64 %10, i64 8193<o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">The new icmp transform allowed us to create an smax, but we have this hack in InstCombiner::visitICmpInst():<br>
<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>// Test if the ICmpInst instruction is used exclusively by a select as<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>// part of a minimum or maximum operation. If so, refrain from doing<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>// any other folding. This helps out other analyses which understand<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>// non-obfuscated minimum and maximum idioms, such as ScalarEvolution<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>// and CodeGen. And in this case, at least one of the comparison<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>// operands has at least one user besides the compare (the select),<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>// which would often largely negate the benefit of folding anyway.<o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">...so that prevented folding the icmp into the earlier math.<o:p></o:p></p>
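<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">For readers who want to convince themselves that the new icmp transform from D28406 is sound on its own, here is a minimal sketch that checks it exhaustively on 8-bit integers. It models only the arithmetic. The helper names (shl_nsw, fold_holds_everywhere) are made up for this illustration and are not LLVM APIs, and any extra preconditions the real patch places on the constants are not modelled.<o:p></o:p></p>
<pre>
```python
# Exhaustive check, on 8-bit signed integers, of the fold named in D28406:
#   icmp sgt (shl nsw X, C1), C0  --&gt;  icmp sgt X, (C0 &gt;&gt; C1)
# Assumption: the helper names below are illustrative, not LLVM APIs.

INT8_MIN, INT8_MAX = -128, 127


def shl_nsw(x, c1):
    """Return x shifted left by c1, or None when the result leaves i8.

    A 'shl nsw' that wraps in the signed sense yields poison in LLVM IR,
    so the fold only has to hold for inputs where no signed wrap occurs.
    """
    wide = x * 2 ** c1
    if wide > INT8_MAX or INT8_MIN > wide:
        return None
    return wide


def fold_holds_everywhere():
    """Compare both sides of the fold for every i8 X, C0 and shift 0..7."""
    for c1 in range(8):
        for c0 in range(INT8_MIN, INT8_MAX + 1):
            for x in range(INT8_MIN, INT8_MAX + 1):
                shifted = shl_nsw(x, c1)
                if shifted is None:
                    continue  # poison input: nothing to check
                before = shifted > c0
                after = x > (c0 >> c1)  # Python's shift right is arithmetic
                if before != after:
                    return False
    return True
```
</pre>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">Calling fold_holds_everywhere() should return True, which suggests the regression comes from how later passes consume the folded compare rather than from the fold itself.<o:p></o:p></p>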
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I am actively working on removing that bail-out by improving min/max value tracking and icmp/select folding. In fact, we might be able to remove it right now, but I don't know the history of that code or which cases it was supposed to help.<o:p></o:p></p>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">If it's possible, can you remove that check locally, rebuild, and try the benchmark again on your system? I'd love to know if that change alone would solve the problem.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich &lt;<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>&gt; wrote:<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">Hi,<br>
<br>
We found that today's 17.30% and 11.37% performance regressions in LNT SingleSource/Benchmarks/Shootout/sieve, on LNT-AArch64-A53-O3__clang_DEV__aarch64 and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 respectively (<a href="http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen" target="_blank">http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen</a>),
 are caused by the InstCombine changes in rL292492:<br>
<br>
<a href="https://reviews.llvm.org/D28406" target="_blank">https://reviews.llvm.org/D28406</a><span class="m-1289848491661567520apple-converted-space"> </span>"[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1"<br>
<br>
The Loop Vectorizer generates code with more instructions:<br>
<br>
==== Loop Vectorizer from rL292492  ====<br>
for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%2 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%3 = shl i64 %indvar, 1<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%4 = add i64 %3, 4<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%5 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%6 = shl i64 %indvar, 1<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%7 = add i64 %6, 4<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%8 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%9 = mul i64 %indvar, 3<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%10 = add i64 %9, 6<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%11 = icmp sgt i64 %10, 8193<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%smax = select i1 %11, i64 %10, i64 8193<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%12 = mul i64 %indvar, -2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%13 = add i64 %12, -5<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%14 = add i64 %smax, %13<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%15 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%16 = udiv i64 %14, %15<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%17 = add i64 %16, 1<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%tobool7 = icmp eq i8 %1, 0<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>br i1 %tobool7, label %for.inc16, label %if.then<br>
================================<br>
<br>
The code generated by the Loop Vectorizer before the changes:<br>
<br>
==== Loop Vectorizer from rL292487 ====<br>
for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%2 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%3 = shl i64 %indvar, 1<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%4 = add i64 %3, 4<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%5 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%6 = shl i64 %indvar, 1<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%7 = add i64 %6, 4<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%8 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%9 = mul i64 %indvar, -2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%10 = add i64 %9, 8188<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%11 = add i64 %indvar, 2<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%12 = udiv i64 %10, %11<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%13 = add i64 %12, 1<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>%tobool7 = icmp eq i8 %1, 0<br>
 <span class="m-1289848491661567520apple-converted-space"> </span>br i1 %tobool7, label %for.inc16, label %if.then<br>
================================<br>
<br>
I have not yet investigated why the Vectorizer's behaviour changed.<br>
<br>
Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM<o:p></o:p></p>
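<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">To make the difference between the two dumps concrete, the sketch below transcribes the final arithmetic of each for.body5 block into Python with 64-bit wrap-around. It is one reading of the IR above, not output from any LLVM tool. The function names are invented here, and the interpretation of the result as an iteration count is an assumption.<o:p></o:p></p>
<pre>
```python
# Transcription of the last few instructions of each for.body5 dump.
# Assumption: function names are illustrative; values are modelled as i64.

MOD = 2 ** 64


def wrap(value):
    """Reduce a Python int to an unsigned 64-bit value."""
    return value % MOD


def as_signed(value):
    """Reinterpret an unsigned 64-bit value as signed, as 'icmp sgt' does."""
    return value - MOD if value >= 2 ** 63 else value


def count_r292487(indvar):
    """Older dump: ((-2 * indvar + 8188) udiv (indvar + 2)) + 1."""
    numerator = wrap(indvar * -2 + 8188)
    return wrap(numerator // wrap(indvar + 2) + 1)


def count_r292492(indvar):
    """Newer dump: the numerator goes through smax(3 * indvar + 6, 8193)."""
    candidate = wrap(indvar * 3 + 6)
    smax = candidate if as_signed(candidate) > 8193 else 8193
    numerator = wrap(smax + wrap(indvar * -2 - 5))
    return wrap(numerator // wrap(indvar + 2) + 1)


def first_mismatch(limit):
    """Return the smallest indvar below limit where the dumps disagree."""
    for indvar in range(limit):
        if count_r292487(indvar) != count_r292492(indvar):
            return indvar
    return None
```
</pre>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">Under this model the two versions agree for every indvar from 0 through 4094, and first_mismatch(5000) returns 4095. So the newer code computes the same value over that range at the cost of the extra mul, add, icmp, select and add instructions.<o:p></o:p></p>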
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><o:p> </o:p></span></p>
</div>
</div>
</div>
<p class="MsoNormal"><span class="m-1289848491661567520gmail-"><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif"">_______________________________________________</span></span><span style="font-size:9.0pt;font-family:"Helvetica","sans-serif""><br>
<span class="m-1289848491661567520gmail-">LLVM Developers mailing list</span><br>
<span class="m-1289848491661567520gmail-"><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a></span><br>
<span class="m-1289848491661567520gmail-"><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></span><o:p></o:p></span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</body>
</html>