[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
David Majnemer via llvm-dev
llvm-dev at lists.llvm.org
Sun Jan 22 14:46:54 PST 2017
On Fri, Jan 20, 2017 at 10:16 AM, Sanjay Patel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Thanks for letting me know about this problem!
>
> There's no 'shl nsw' visible in the earlier (r292487) code, so it would be
> better to see exactly what the IR looks like before that added transform
> fires.
>
> But I see a red flag:
> %smax = select i1 %11, i64 %10, i64 8193
>
> The new icmp transform allowed us to create an smax, but we have this hack
> in InstCombiner::visitICmpInst():
>
> // Test if the ICmpInst instruction is used exclusively by a select as
> // part of a minimum or maximum operation. If so, refrain from doing
> // any other folding. This helps out other analyses which understand
> // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
> // and CodeGen. And in this case, at least one of the comparison
> // operands has at least one user besides the compare (the select),
> // which would often largely negate the benefit of folding anyway.
>
> ...so that prevented folding the icmp into the earlier math.
>
> I am actively working on trying to get rid of that bail-out by improving
> min/max value tracking and icmp/select folding. In fact, we might be able
> to remove it right now, but I don't know the history of that code or what
> cases it was supposed to help.
>
I think the primary reason is SCEV. It has SCEVSMax/SCEVUMax expressions.
Our generic select handling for things like knownbits is going to be less
precise.
>
> If it's possible, can you remove that check locally, rebuild, and try the
> benchmark again on your system? I'd love to know if that change alone would
> solve the problem.
>
>
> On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich <
> Evgeny.Astigeevich at arm.com> wrote:
>
>> Hi,
>>
>> We found that today's 17.30%/11.37% performance regressions in LNT
>> SingleSource/Benchmarks/Shootout/sieve on LNT-AArch64-A53-O3__clang_DEV__aarch64
>> and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 (
>> http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/
>> 20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen) are caused by
>> changes [rL292492] in InstCombine:
>>
>> https://reviews.llvm.org/D28406 "[InstCombine] icmp sgt (shl nsw X, C1),
>> C0 --> icmp sgt X, C0 >> C1"
>>
>> The Loop Vectorizer generates code with more instructions:
>>
>> ==== Loop Vectorizer from rL292492 ====
>> for.body5: ; preds =
>> %for.inc16.for.body5_crit_edge, %for.cond.preheader
>> %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [
>> 0, %for.cond.preheader ]
>> %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
>> %for.cond.preheader ]
>> %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
>> %for.cond.preheader ]
>> %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
>> %for.cond.preheader ]
>> %2 = add i64 %indvar, 2
>> %3 = shl i64 %indvar, 1
>> %4 = add i64 %3, 4
>> %5 = add i64 %indvar, 2
>> %6 = shl i64 %indvar, 1
>> %7 = add i64 %6, 4
>> %8 = add i64 %indvar, 2
>> %9 = mul i64 %indvar, 3
>> %10 = add i64 %9, 6
>> %11 = icmp sgt i64 %10, 8193
>> %smax = select i1 %11, i64 %10, i64 8193
>> %12 = mul i64 %indvar, -2
>> %13 = add i64 %12, -5
>> %14 = add i64 %smax, %13
>> %15 = add i64 %indvar, 2
>> %16 = udiv i64 %14, %15
>> %17 = add i64 %16, 1
>> %tobool7 = icmp eq i8 %1, 0
>> br i1 %tobool7, label %for.inc16, label %if.then
>> ================================
>>
>> The code generated by the Loop Vectorizer before the changes:
>>
>> ==== Loop Vectorizer from rL292487 ====
>> for.body5: ; preds =
>> %for.inc16.for.body5_crit_edge, %for.cond.preheader
>> %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [
>> 0, %for.cond.preheader ]
>> %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
>> %for.cond.preheader ]
>> %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
>> %for.cond.preheader ]
>> %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
>> %for.cond.preheader ]
>> %2 = add i64 %indvar, 2
>> %3 = shl i64 %indvar, 1
>> %4 = add i64 %3, 4
>> %5 = add i64 %indvar, 2
>> %6 = shl i64 %indvar, 1
>> %7 = add i64 %6, 4
>> %8 = add i64 %indvar, 2
>> %9 = mul i64 %indvar, -2
>> %10 = add i64 %9, 8188
>> %11 = add i64 %indvar, 2
>> %12 = udiv i64 %10, %11
>> %13 = add i64 %12, 1
>> %tobool7 = icmp eq i8 %1, 0
>> br i1 %tobool7, label %for.inc16, label %if.then
>> ================================
>>
>> I have not investigated yet why the behaviour of the Vectorizer is
>> changed.
>>
>> Kind regards,
>> Evgeny Astigeevich
>> Senior Compiler Engineer
>> Compilation Tools
>> ARM
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170122/d35587d7/attachment.html>
More information about the llvm-dev
mailing list