<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<meta name="Generator" content="Microsoft Word 14 (filtered medium)">


<style><!--


/* Font Definitions */


@font-face


        {font-family:Calibri;


        panose-1:2 15 5 2 2 2 4 3 2 4;}


@font-face


        {font-family:Tahoma;


        panose-1:2 11 6 4 3 5 4 4 2 4;}


/* Style Definitions */


p.MsoNormal, li.MsoNormal, div.MsoNormal


        {margin:0cm;


        margin-bottom:.0001pt;


        font-size:12.0pt;


        font-family:"Times New Roman","serif";}


a:link, span.MsoHyperlink


        {mso-style-priority:99;


        color:blue;


        text-decoration:underline;}


a:visited, span.MsoHyperlinkFollowed


        {mso-style-priority:99;


        color:purple;


        text-decoration:underline;}


p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph


        {mso-style-priority:34;


        margin-top:0cm;


        margin-right:0cm;


        margin-bottom:0cm;


        margin-left:36.0pt;


        margin-bottom:.0001pt;


        font-size:12.0pt;


        font-family:"Times New Roman","serif";}


span.EmailStyle17


        {mso-style-type:personal-reply;


        font-family:"Calibri","sans-serif";


        color:#1F497D;}


.MsoChpDefault


        {mso-style-type:export-only;


        font-family:"Calibri","sans-serif";}


@page WordSection1


        {size:612.0pt 792.0pt;


        margin:2.0cm 42.5pt 2.0cm 3.0cm;}


div.WordSection1


        {page:WordSection1;}


--></style>


</head>


<body lang="EN-US" link="blue" vlink="purple">


<div class="WordSection1">


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi Sanjay,<o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The benchmark source file:


<a href="http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup">


http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup</a><o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Clang options used to produce the initial IR: clang -DNDEBUG  -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG   -w -Werror=date-time -c sieve.c


 -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux<o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Opt options: opt -O3 -o /dev/null -print-before-all -print-after-all sieve.ll &gt;&amp; sieve.log<o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I used the IR (in the attached sieve.zip) created with r292487.<o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The attached sieve.zip also contains the output of ‘-print-before-all -print-after-all’ for r292487 and r292492.<o:p></o:p></span></p>
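<p class="MsoNormal">One possible way to compare the two logs is to split each one on the pass banners and report the first dump that differs. The sketch below is only an illustration: it assumes the logs contain banner lines of the form "*** IR Dump Before/After &lt;pass name&gt; ***", and the helper names split_dumps and first_divergence are hypothetical.</p>

<pre>
```python
import re

# Assumed banner format of -print-before-all / -print-after-all output.
BANNER = re.compile(r"^\*\*\* IR Dump (Before|After) (.+?) \*\*\*")


def split_dumps(log_text):
    """Split a log into a list of (when, pass_name, ir_text) tuples."""
    dumps = []
    current = None
    for line in log_text.splitlines():
        match = BANNER.match(line)
        if match:
            # A banner starts a new dump; collect its body lines in a list.
            current = (match.group(1), match.group(2), [])
            dumps.append(current)
        elif current is not None:
            current[2].append(line)
    return [(when, name, "\n".join(body)) for when, name, body in dumps]


def first_divergence(log_a, log_b):
    """Return (when, pass_name) of the first dump that differs, else None."""
    for dump_a, dump_b in zip(split_dumps(log_a), split_dumps(log_b)):
        if dump_a != dump_b:
            return dump_a[0], dump_a[1]
    return None
```
</pre>

<p class="MsoNormal">With two log files (the names here are placeholders), first_divergence(open("old.log").read(), open("new.log").read()) would name the first pass after which the IR of the two revisions no longer matches. It compares dumps textually, so a pure renumbering of values also counts as a difference.</p>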


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>


</span>If it's possible, can you remove that check locally, rebuild,<o:p></o:p></p>


<p class="MsoNormal">> and try the benchmark again on your system? I'd love to know<o:p></o:p></p>


<p class="MsoNormal">> if that change alone would solve the problem.<o:p></o:p></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Do you mean removing the min/max bail-out (the hack) in InstCombiner::visitICmpInst()?<o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kind regards,<br>


Evgeny Astigeevich<br>


Senior Compiler Engineer<br>


Compilation Tools<br>


ARM<br>


<br>


<o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>


<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">


<div>


<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">


<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Sanjay Patel [mailto:spatel@rotateright.com]


<br>


<b>Sent:</b> Friday, January 20, 2017 6:16 PM<br>


<b>To:</b> Evgeny Astigeevich<br>


<b>Cc:</b> llvm-dev; Renato Golin; t.p.northover@gmail.com; hfinkel@anl.gov<br>


<b>Subject:</b> Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<o:p></o:p></span></p>


</div>


</div>


<p class="MsoNormal"><o:p> </o:p></p>


<div>


<div>


<div>


<div>


<div>


<div>


<div>


<div>


<p class="MsoNormal">Thanks for letting me know about this problem!<br>


<br>


There's no 'shl nsw' visible in the earlier (r292487) code, so it would be better to see exactly what the IR looks like before that added transform fires.<o:p></o:p></p>
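<p class="MsoNormal">For reference, the fold added by D28406 rewrites "icmp sgt (shl nsw X, C1), C0" as "icmp sgt X, C0 &gt;&gt; C1". The sketch below brute-forces that equivalence on a small bit width. It is an illustration under stated assumptions rather than the InstCombine code: "&gt;&gt;" is taken to be an arithmetic shift right, inputs where the nsw flag is violated are skipped, and the function names are hypothetical.</p>

<pre>
```python
def shl_nsw(x, c1, bits=8):
    """Return x << c1, or None when the signed result does not fit in `bits`
    bits (the case where 'shl nsw' yields poison)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    y = x << c1
    return y if lo <= y <= hi else None


def check_fold(bits=8):
    """Exhaustively compare both sides of the fold.

    Returns (checked, mismatches) over all signed `bits`-bit X and C0 and
    all shift amounts C1 in [0, bits).
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    checked = mismatches = 0
    for x in range(lo, hi + 1):
        for c1 in range(bits):
            y = shl_nsw(x, c1, bits)
            if y is None:
                continue  # nsw violated: the fold makes no promise here
            for c0 in range(lo, hi + 1):
                checked += 1
                # Python's >> on ints is an arithmetic (floor) shift.
                if (y > c0) != (x > (c0 >> c1)):
                    mismatches += 1
    return checked, mismatches
```
</pre>

<p class="MsoNormal">Calling check_fold() with the default 8-bit width reports no mismatches, which matches the intuition that X * 2^C1 &gt; C0 holds exactly when X &gt; floor(C0 / 2^C1) as long as the shift does not overflow.</p>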


</div>


<p class="MsoNormal" style="margin-bottom:12.0pt"><br>


But I see a red flag:<br>


  %smax = select i1 %11, i64 %10, i64 8193<o:p></o:p></p>


</div>


<p class="MsoNormal" style="margin-bottom:12.0pt">The new icmp transform allowed us to create an smax, but we have this hack in InstCombiner::visitICmpInst():<br>


<br>


  // Test if the ICmpInst instruction is used exclusively by a select as<br>


  // part of a minimum or maximum operation. If so, refrain from doing<br>


  // any other folding. This helps out other analyses which understand<br>


  // non-obfuscated minimum and maximum idioms, such as ScalarEvolution<br>


  // and CodeGen. And in this case, at least one of the comparison<br>


  // operands has at least one user besides the compare (the select),<br>


  // which would often largely negate the benefit of folding anyway.<o:p></o:p></p>


</div>


<p class="MsoNormal" style="margin-bottom:12.0pt">...so that prevented folding the icmp into the earlier math.<o:p></o:p></p>
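<p class="MsoNormal">To make the cost of the extra smax concrete, the sketch below evaluates the two instruction chains from the for.body5 blocks in the original report (ending in %13 for rL292487 and %17 for rL292492) with i64 wrap-around. Two things are assumptions, not statements from the thread: that these values are the inner loop's trip count, and that the relevant range is indvar 0..4094, taken from the benchmark's inner loop "for (k = i + i; k &lt;= 8192; k += i)" with i = indvar + 2. The function names are hypothetical.</p>

<pre>
```python
WORD = 1 << 64  # i64 arithmetic wraps modulo 2**64


def u64(v):
    """Wrap a Python int to an unsigned 64-bit value."""
    return v % WORD


def s64(v):
    """Reinterpret the low 64 bits of v as a signed value."""
    v %= WORD
    return v - WORD if v >= WORD // 2 else v


def count_r292487(indvar):
    """%13 in the rL292487 block: ((-2*indvar + 8188) udiv (indvar + 2)) + 1."""
    return u64(u64(-2 * indvar + 8188) // u64(indvar + 2) + 1)


def count_r292492(indvar):
    """%17 in the rL292492 block, including the smax select."""
    v10 = s64(3 * indvar + 6)               # %9, %10
    smax = v10 if v10 > 8193 else 8193      # %11, %smax
    v14 = u64(smax + (-2 * indvar - 5))     # %12, %13, %14
    return u64(v14 // u64(indvar + 2) + 1)  # %15, %16, %17


def first_mismatch(limit):
    """Return the first indvar in [0, limit) where the two values differ."""
    for indvar in range(limit):
        if count_r292487(indvar) != count_r292492(indvar):
            return indvar
    return None
```
</pre>

<p class="MsoNormal">Under those assumptions, first_mismatch(4095) finds no difference, so over that range the smax form computes the same value as the shorter rL292487 sequence while needing a multiply, a compare and a select more. The two only diverge from indvar = 4095 onwards, where the older expression wraps around.</p>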


</div>


<p class="MsoNormal">I am actively working on removing that bail-out by improving min/max value tracking and icmp/select folding. In fact, we might be able to remove it right now, but I don't know the history of that code or which cases it was meant to help.<o:p></o:p></p>


</div>


</div>


</div>


<div>


<div>


<div>


<div>


<p class="MsoNormal"><o:p> </o:p></p>


</div>


<div>


<p class="MsoNormal">If it's possible, can you remove that check locally, rebuild, and try the benchmark again on your system? I'd love to know if that change alone would solve the problem.<o:p></o:p></p>


</div>


<div>


<p class="MsoNormal"><o:p> </o:p></p>


<div>


<div>


<div>


<div>


<div>


<p class="MsoNormal"><o:p> </o:p></p>


<div>


<p class="MsoNormal">On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich &lt;<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>&gt; wrote:<o:p></o:p></p>


<p class="MsoNormal">Hi,<br>


<br>


We found that today's 17.30% and 11.37% performance regressions in LNT SingleSource/Benchmarks/Shootout/sieve on LNT-AArch64-A53-O3__clang_DEV__aarch64 and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7, respectively (<a href="http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen" target="_blank">http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen</a>), are caused by the InstCombine change rL292492:<br>


<br>


<a href="https://reviews.llvm.org/D28406" target="_blank">https://reviews.llvm.org/D28406</a> "[InstCombine] icmp sgt (shl nsw X, C1), C0 --&gt; icmp sgt X, C0 &gt;&gt; C1"<br>


<br>


The Loop Vectorizer generates code with more instructions:<br>


<br>


==== Loop Vectorizer from rL292492  ====<br>


for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader<br>


  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>


  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>


  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>


  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>


  %2 = add i64 %indvar, 2<br>


  %3 = shl i64 %indvar, 1<br>


  %4 = add i64 %3, 4<br>


  %5 = add i64 %indvar, 2<br>


  %6 = shl i64 %indvar, 1<br>


  %7 = add i64 %6, 4<br>


  %8 = add i64 %indvar, 2<br>


  %9 = mul i64 %indvar, 3<br>


  %10 = add i64 %9, 6<br>


  %11 = icmp sgt i64 %10, 8193<br>


  %smax = select i1 %11, i64 %10, i64 8193<br>


  %12 = mul i64 %indvar, -2<br>


  %13 = add i64 %12, -5<br>


  %14 = add i64 %smax, %13<br>


  %15 = add i64 %indvar, 2<br>


  %16 = udiv i64 %14, %15<br>


  %17 = add i64 %16, 1<br>


  %tobool7 = icmp eq i8 %1, 0<br>


  br i1 %tobool7, label %for.inc16, label %if.then<br>


================================<br>


<br>


The code generated by the Loop Vectorizer before the changes:<br>


<br>


==== Loop Vectorizer from rL292487 ====<br>


for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader<br>


  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>


  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>


  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>


  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>


  %2 = add i64 %indvar, 2<br>


  %3 = shl i64 %indvar, 1<br>


  %4 = add i64 %3, 4<br>


  %5 = add i64 %indvar, 2<br>


  %6 = shl i64 %indvar, 1<br>


  %7 = add i64 %6, 4<br>


  %8 = add i64 %indvar, 2<br>


  %9 = mul i64 %indvar, -2<br>


  %10 = add i64 %9, 8188<br>


  %11 = add i64 %indvar, 2<br>


  %12 = udiv i64 %10, %11<br>


  %13 = add i64 %12, 1<br>


  %tobool7 = icmp eq i8 %1, 0<br>


  br i1 %tobool7, label %for.inc16, label %if.then<br>


================================<br>


<br>


I have not yet investigated why the Vectorizer's behaviour changed.<br>


<br>


Kind regards,<br>


Evgeny Astigeevich<br>


Senior Compiler Engineer<br>


Compilation Tools<br>


ARM<br>


<br>


<o:p></o:p></p>


</div>


<p class="MsoNormal"><o:p> </o:p></p>


</div>


</div>


</div>


</div>


</div>


</div>


</div>


</div>


</div>


</div>


</div>


</div>


</body>


</html>