[PATCH] D15792: [AArch64] Turn off PredictableSelectIsExpensive on the Cortex-A57

Wed Jan 13 03:02:29 PST 2016

flyingforyou added a comment.

Hi James.

I ran SPEC with the ref input, as Evandro and you suggested. With the ref input, I see the same results you saw.

> [AArch64] Conditional selects are expensive on out-of-order cores.

>  Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.

>  This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.

However, bzip2 (in both SPEC INT 2000 and 2006) and 456.hmmer show a 2-3% improvement when I set `PredictableSelectIsExpensive` to false.

On a commercial benchmark, we see large improvements on a few workloads (8-12% on a GS6 dev board, which uses Cortex-A57).

When the code below is compiled with `-O3 -ffast-math -mcpu=cortex-a57`, `optimizeSelectInst` (in CodeGenPrepare.cpp) runs because the `PredictableSelectIsExpensive` flag is set to true.

  float tmp1 = sqrt(tmp);
  tmp4 = tmp1 == 0 ? 1 : 1 / tmp3;

The code above is translated into the following:

   0:   1e21c000        fsqrt   s0, s0
   4:   1e202008        fcmp    s0, #0.0
   8:   54000061        b.ne    14 <_Z4testfff+0x14>
   c:   1e2e1000        fmov    s0, #1.000000000000000000e+00
  10:   d65f03c0        ret
  14:   1e2e1000        fmov    s0, #1.000000000000000000e+00
  18:   1e211800        fdiv    s0, s0, s1
  1c:   d65f03c0        ret
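For anyone who wants to reproduce the listing, here is one plausible self-contained version of the snippet. The signature is an assumption inferred from the mangled name `_Z4testfff` (`test(float, float, float)`), with `tmp4` guessed to be the third parameter since the snippet assigns it without declaring it. It could be compiled with something like `clang++ --target=aarch64-linux-gnu -O3 -ffast-math -mcpu=cortex-a57 -c` and disassembled with `objdump -d`; the exact instructions will depend on the LLVM version.

```cpp
#include <cmath>

// Hypothetical reconstruction of test(float, float, float), i.e. _Z4testfff.
// Parameter names follow the snippet; treating tmp4 as a parameter is a guess.
float test(float tmp, float tmp3, float tmp4) {
  float tmp1 = std::sqrt(tmp);
  // A select whose false operand is an expensive fdiv. With
  // PredictableSelectIsExpensive set, CodeGenPrepare may lower this
  // select as a branch around the division.
  tmp4 = tmp1 == 0 ? 1 : 1 / tmp3;
  return tmp4;
}
```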

If `tmp1 == 0` is always or mostly false, we end up executing the division (`fdiv s0, s0, s1`) anyway.

In this case, the branches introduced by the select optimization can become a performance bottleneck.
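To make the trade-off concrete, here is a toy expected-cost model of the two lowerings. All cycle counts and the misprediction model are hypothetical placeholders, not measured Cortex-A57 figures. The select form always pays for the division plus an `fcsel`; the branch form pays for the branch, for the division only when the condition is false, and for a penalty on mispredictions.

```cpp
// Toy expected-cost model for lowering `cond ? 1 : 1 / x`.
// All cycle counts are hypothetical placeholders, not Cortex-A57 data.
// The fcmp is common to both forms and is left out of both costs.
struct Costs {
  double fdiv;        // latency of the division
  double fcsel;       // cost of the conditional select
  double branch;      // cost of the conditional branch when predicted correctly
  double mispredict;  // extra penalty per mispredicted branch
};

// Select form: the division always executes, then fcsel picks the result.
double selectCost(const Costs& c) { return c.fdiv + c.fcsel; }

// Branch form: the division executes only when the condition is false.
// pTrue    = probability that the condition (tmp1 == 0) is true
// missRate = fraction of executions on which the branch is mispredicted
double branchCost(const Costs& c, double pTrue, double missRate) {
  return c.branch + (1.0 - pTrue) * c.fdiv + missRate * c.mispredict;
}
```

With the hypothetical costs `{10, 1, 2, 15}`, the select form costs 11. The branch form costs 12 when the condition is never true (`pTrue = 0`, `missRate = 0`), but about 2.25 when it is true 99% of the time (`missRate = 0.01`). The flag alone cannot tell these two cases apart; that requires the branch probability, which is what profile data would provide.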

I remember that you mentioned PGO when I uploaded some earlier flag-tuning changes.

I think that with PGO information, this case could be handled easily.

Can we just turn this flag off for `cortex-a57`? Without PGO information, the compiler can't handle the case above.

Junmo.

http://reviews.llvm.org/D15792