<div dir="ltr"><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">> Do you mean to remove the hack in InstCombiner::visitICmpInst()?</span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><br></span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">Yes. Although (this just came up in D28625 too) we might need to remove multiple versions of that in order to unlock optimization:</span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4338" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCompares.cpp#L4338</a><br></span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L470" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCasts.cpp#L470</a><br></span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstructionCombining.cpp#L803">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstructionCombining.cpp#L803</a><br></span></p><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L409">https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L409</a><br></span></p><div class="gmail_extra"><br>Similar for FP:<br><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCompares.cpp#L4780" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCompares.cpp#L4780</a></span><br><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><a href="https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L1376" target="_blank">https://github.com/llvm-<wbr>mirror/llvm/blob/master/lib/<wbr>Transforms/InstCombine/<wbr>InstCombineCasts.cpp#L1376</a></span><br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jan 22, 2017 at 12:40 PM, Evgeny Astigeevich <span dir="ltr"><<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div class="gmail-m_4123102878076098443gmail-m_-9153536727343648257WordSection1">
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">Hi Sanjay,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">The benchmark source file:
<a href="http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup" target="_blank">
http://www.llvm.org/viewvc/llv<wbr>m-project/test-suite/trunk/Sin<wbr>gleSource/Benchmarks/Shootout/<wbr>sieve.c?view=markup</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">Clang options used to produce the initial IR: clang -DNDEBUG -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG -w -Werror=date-time -c sieve.c
-S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">Opt options: opt -O3 -o /dev/null -print-before-all -print-after-all sieve.ll >& sieve.log<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">I used the IR (in attached sieve.zip) created with the r292487 version.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">The attached sieve contains the output of ‘-print-before-all -print-after-all’ for r292487 and rL292492.<u></u><u></u></span></p><span class="gmail-m_4123102878076098443gmail-">
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">>
</span>If it's possible, can you remove that check locally, rebuild,<u></u><u></u></p>
<p class="MsoNormal">> and try the benchmark again on your system? I'd love to know<u></u><u></u></p>
<p class="MsoNormal">> if that change alone would solve the problem.<u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
</span><p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">Do you mean to remove the hack in InstCombiner::visitICmpInst()?<u></u><u></u></span></p><span class="gmail-m_4123102878076098443gmail-">
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM<br>
<br>
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
</span><div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0cm 0cm 0cm 4pt">
<div>
<div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10pt;font-family:"tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"tahoma","sans-serif""> Sanjay Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]
<br>
<b>Sent:</b> Friday, January 20, 2017 6:16 PM<br>
<b>To:</b> Evgeny Astigeevich<br>
<b>Cc:</b> llvm-dev; Renato Golin; <a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>; <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a><br>
<b>Subject:</b> Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines<u></u><u></u></span></p>
</div>
</div><div><div class="gmail-m_4123102878076098443gmail-h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">Thanks for letting me know about this problem!<br>
<br>
There's no 'shl nsw' visible in the earlier (r292487) code, so it would be better to see exactly what the IR looks like before that added transform fires.<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12pt"><br>
But I see a red flag:<br>
%smax = select i1 %11, i64 %10, i64 8193<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12pt">The new icmp transform allowed us to create an smax, but we have this hack in InstCombiner::visitICmpInst():<br>
<br>
// Test if the ICmpInst instruction is used exclusively by a select as<br>
// part of a minimum or maximum operation. If so, refrain from doing<br>
// any other folding. This helps out other analyses which understand<br>
// non-obfuscated minimum and maximum idioms, such as ScalarEvolution<br>
// and CodeGen. And in this case, at least one of the comparison<br>
// operands has at least one user besides the compare (the select),<br>
// which would often largely negate the benefit of folding anyway.<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12pt">...so that prevented folding the icmp into the earlier math.<u></u><u></u></p>
</div>
<p class="MsoNormal">I am actively working on trying to get rid of that bail-out by improving min/max value tracking and icmp/select folding. In fact, we might be able to remove it right now, but I don't know the history of that code or what cases it was supposed
to help.<u></u><u></u></p>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">If it's possible, can you remove that check locally, rebuild, and try the benchmark again on your system? I'd love to know if that change alone would solve the problem.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich <<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>> wrote:<u></u><u></u></p>
<p class="MsoNormal">Hi,<br>
<br>
We found that today's 17.30%/11.37% performance regressions in LNT SingleSource/Benchmarks/Shooto<wbr>ut/sieve on LNT-AArch64-A53-O3__clang_DEV_<wbr>_aarch64 and LNT-Thumb2v7-A15-O3__clang_DEV<wbr>__thumbv7 (<a href="http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen" target="_blank">http://llvm.org/perf/db_defau<wbr>lt/v4/nts/daily_report/2017/1/<wbr>20?filter-machine-regex=<wbr>aarch64%7Carm%7Cthumb%7Cgreen</a>)
are caused by changes [rL292492] in InstCombine:<br>
<br>
<a href="https://reviews.llvm.org/D28406" target="_blank">https://reviews.llvm.org/D2840<wbr>6</a> "[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1"<br>
<br>
The Loop Vectorizer generates code with more instructions:<br>
<br>
==== Loop Vectorizer from rL292492 ====<br>
for.body5: ; preds = %for.inc16.for.body5_crit_edge<wbr>, %for.cond.preheader<br>
%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>
%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>
%2 = add i64 %indvar, 2<br>
%3 = shl i64 %indvar, 1<br>
%4 = add i64 %3, 4<br>
%5 = add i64 %indvar, 2<br>
%6 = shl i64 %indvar, 1<br>
%7 = add i64 %6, 4<br>
%8 = add i64 %indvar, 2<br>
%9 = mul i64 %indvar, 3<br>
%10 = add i64 %9, 6<br>
%11 = icmp sgt i64 %10, 8193<br>
%smax = select i1 %11, i64 %10, i64 8193<br>
%12 = mul i64 %indvar, -2<br>
%13 = add i64 %12, -5<br>
%14 = add i64 %smax, %13<br>
%15 = add i64 %indvar, 2<br>
%16 = udiv i64 %14, %15<br>
%17 = add i64 %16, 1<br>
%tobool7 = icmp eq i8 %1, 0<br>
br i1 %tobool7, label %for.inc16, label %if.then<br>
==============================<wbr>==<br>
<br>
The code generated by the Loop Vectorizer before the changes:<br>
<br>
==== Loop Vectorizer from rL292487 ====<br>
for.body5: ; preds = %for.inc16.for.body5_crit_edge<wbr>, %for.cond.preheader<br>
%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]<br>
%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]<br>
%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]<br>
%2 = add i64 %indvar, 2<br>
%3 = shl i64 %indvar, 1<br>
%4 = add i64 %3, 4<br>
%5 = add i64 %indvar, 2<br>
%6 = shl i64 %indvar, 1<br>
%7 = add i64 %6, 4<br>
%8 = add i64 %indvar, 2<br>
%9 = mul i64 %indvar, -2<br>
%10 = add i64 %9, 8188<br>
%11 = add i64 %indvar, 2<br>
%12 = udiv i64 %10, %11<br>
%13 = add i64 %12, 1<br>
%tobool7 = icmp eq i8 %1, 0<br>
br i1 %tobool7, label %for.inc16, label %if.then<br>
==============================<wbr>==<br>
<br>
I have not investigated yet why the behaviour of the Vectorizer is changed.<br>
<br>
Kind regards,<br>
Evgeny Astigeevich<br>
Senior Compiler Engineer<br>
Compilation Tools<br>
ARM<br>
<br>
<u></u><u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></div></div>
</div>
</div>
</blockquote></div><br></div></div>