[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Fri Jan 20 09:11:53 PST 2017

Hi,

We found that today's performance regressions in LNT SingleSource/Benchmarks/Shootout/sieve, 17.30% on LNT-AArch64-A53-O3__clang_DEV__aarch64 and 11.37% on LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 (http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen), are caused by the InstCombine change rL292492:

https://reviews.llvm.org/D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1"
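The fold relies on the fact that, when `shl nsw X, C1` does not overflow, X * 2^C1 > C0 holds exactly when X > floor(C0 / 2^C1), and an arithmetic shift right computes that floor. A minimal Python sketch that checks this exhaustively on a narrow width (8 bits, an assumption chosen to keep the search small; `check_fold` is a hypothetical helper, not part of LLVM):

```python
def check_fold(bits=8):
    """Exhaustively compare both sides of the fold for all iN values (N = bits)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    checked = 0
    for c1 in range(bits):                      # shift amounts valid for iN
        for x in range(lo, hi + 1):
            shifted = x << c1                   # exact value of X * 2**C1
            if not (lo <= shifted <= hi):
                continue                        # shl nsw would be poison; fold need not hold
            for c0 in range(lo, hi + 1):
                before = shifted > c0           # icmp sgt (shl nsw X, C1), C0
                after = x > (c0 >> c1)          # icmp sgt X, C0 >> C1 (Python >> floors, like ashr)
                if before != after:
                    return False, (x, c1, c0)   # counterexample found
                checked += 1
    return True, checked


if __name__ == "__main__":
    print(check_fold(8))
```

The check reports no counterexample, so the transform itself looks sound; the question is only how it interacts with later passes.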

After this change, the Loop Vectorizer generates more instructions in for.body5:

==== Loop Vectorizer from rL292492  ====
for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader
  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]
  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]
  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]
  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]
  %2 = add i64 %indvar, 2
  %3 = shl i64 %indvar, 1
  %4 = add i64 %3, 4
  %5 = add i64 %indvar, 2
  %6 = shl i64 %indvar, 1
  %7 = add i64 %6, 4
  %8 = add i64 %indvar, 2
  %9 = mul i64 %indvar, 3
  %10 = add i64 %9, 6
  %11 = icmp sgt i64 %10, 8193
  %smax = select i1 %11, i64 %10, i64 8193
  %12 = mul i64 %indvar, -2
  %13 = add i64 %12, -5
  %14 = add i64 %smax, %13
  %15 = add i64 %indvar, 2
  %16 = udiv i64 %14, %15
  %17 = add i64 %16, 1
  %tobool7 = icmp eq i8 %1, 0
  br i1 %tobool7, label %for.inc16, label %if.then
================================

The code generated by the Loop Vectorizer before the change (rL292487):

==== Loop Vectorizer from rL292487 ====
for.body5:                                        ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader
  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]
  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]
  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]
  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]
  %2 = add i64 %indvar, 2
  %3 = shl i64 %indvar, 1
  %4 = add i64 %3, 4
  %5 = add i64 %indvar, 2
  %6 = shl i64 %indvar, 1
  %7 = add i64 %6, 4
  %8 = add i64 %indvar, 2
  %9 = mul i64 %indvar, -2
  %10 = add i64 %9, 8188
  %11 = add i64 %indvar, 2
  %12 = udiv i64 %10, %11
  %13 = add i64 %12, 1
  %tobool7 = icmp eq i8 %1, 0
  br i1 %tobool7, label %for.inc16, label %if.then
================================
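The two listings differ only in the instructions feeding the udiv (%9 to %17 versus %9 to %13). As a sanity check, not an explanation of the regression, here is a minimal Python sketch that evaluates both expressions exactly as written above, modelling i64 wrap-around; `old_count` and `new_count` are hypothetical helper names. For %indvar from 0 to 4094 the two agree (4095 at %indvar = 0), and they first differ at %indvar = 4095, where the old udiv operand goes negative. So, over that range, the rL292492 output appears to compute the same value while carrying an extra smax (icmp sgt + select) and additional mul/add instructions.

```python
MASK64 = (1 << 64) - 1  # i64 arithmetic wraps modulo 2**64


def as_signed(v):
    """Reinterpret a 64-bit pattern as a signed i64 (needed for icmp sgt)."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def old_count(indvar):
    """%13 from rL292487: udiv(-2*indvar + 8188, indvar + 2) + 1."""
    n = (indvar * -2 + 8188) & MASK64                  # %9, %10
    return (n // ((indvar + 2) & MASK64) + 1) & MASK64  # %11, %12, %13


def new_count(indvar):
    """%17 from rL292492: udiv(smax(3*indvar + 6, 8193) - 2*indvar - 5, indvar + 2) + 1."""
    a = as_signed(indvar * 3 + 6)                      # %9, %10
    smax = a if a > 8193 else 8193                     # %11, %smax (icmp sgt + select)
    n = (smax + indvar * -2 - 5) & MASK64              # %12, %13, %14
    return (n // ((indvar + 2) & MASK64) + 1) & MASK64  # %15, %16, %17


if __name__ == "__main__":
    agree = all(old_count(i) == new_count(i) for i in range(4095))
    print(agree, old_count(0), old_count(4095) == new_count(4095))
```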

I have not yet investigated why the Vectorizer's behaviour changed.

Kind regards,
Evgeny Astigeevich
Senior Compiler Engineer
Compilation Tools
ARM
