[PATCH] D15792: [AArch64] Turn off PredictableSelectIsExpensive on the Cortex-A57
Junmo Park via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 13 03:02:29 PST 2016
flyingforyou added a comment.
Hi James.
I ran SPEC with the ref inputs as Evandro and you suggested, and I see the same results you saw.
> [AArch64] Conditional selects are expensive on out-of-order cores.
> Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.
> This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.
However, bzip2 (part of both SPEC INT 2000 and 2006) and 456.hmmer show a 2-3% improvement when I set `PredictableSelectIsExpensive` to false.
On a commercial benchmark, we see large improvements on a few workloads (8-12% on a GS6 dev board, which uses Cortex-A57).
When the code below is compiled with `-O3 -ffast-math -mcpu=cortex-a57`, `optimizeSelectInst` (CodeGenPrepare.cpp) turns the select into a branch because the `PredictableSelectIsExpensive` flag is set to true.
float tmp1 = sqrt(tmp);
tmp4 = tmp1 == 0 ? 1 : 1 / tmp3;
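For reference, here is a self-contained version of the snippet above. The function name `test` and its `(float, float, float)` signature are inferred from the `_Z4testfff` symbol in the disassembly; the parameter names beyond `tmp`/`tmp3` and the unused third parameter are assumptions, not the original benchmark source.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reconstruction: the signature is inferred from the mangled
// name _Z4testfff, i.e. test(float, float, float). The third argument is
// not used by the snippet, so it is left unnamed.
float test(float tmp, float tmp3, float /*unused*/) {
    float tmp1 = std::sqrt(tmp);            // float overload of sqrt -> fsqrt
    float tmp4 = tmp1 == 0 ? 1 : 1 / tmp3;  // the select under discussion
    return tmp4;
}
```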
The code above is compiled to the following:
0: 1e21c000 fsqrt s0, s0
4: 1e202008 fcmp s0, #0.0
8: 54000061 b.ne 14 <_Z4testfff+0x14>
c: 1e2e1000 fmov s0, #1.000000000000000000e+00
10: d65f03c0 ret
14: 1e2e1000 fmov s0, #1.000000000000000000e+00
18: 1e211800 fdiv s0, s0, s1
1c: d65f03c0 ret
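To make the trade-off concrete, the two possible lowerings can be written side by side. `test_branch` mirrors the control flow in the disassembly above (compare, then branch around the `fdiv`), while `test_select` computes the division unconditionally and then picks a result, roughly what a branchless `fcsel`-style lowering would do. Both function names are hypothetical, and a compiler is free to lower either source form either way.

```cpp
#include <cassert>
#include <cmath>

// Branchy form: mirrors the disassembly, where the fdiv only executes on
// the b.ne path (tmp1 != 0).
float test_branch(float tmp, float tmp3) {
    float tmp1 = std::sqrt(tmp);
    if (tmp1 == 0)
        return 1.0f;        // fmov s0, #1.0 ; ret
    return 1.0f / tmp3;     // fmov s0, #1.0 ; fdiv s0, s0, s1 ; ret
}

// Intended branchless form: both candidate values are computed, then one
// is selected, so the division always executes.
float test_select(float tmp, float tmp3) {
    float tmp1 = std::sqrt(tmp);
    float quotient = 1.0f / tmp3;
    return tmp1 == 0 ? 1.0f : quotient;
}
```

When `tmp1 == 0` is almost always false, `test_branch` still reaches the division on nearly every call, so turning the select into a branch saves little work while adding a branch that must be predicted.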
If `tmp1 == 0` is always or mostly false, the division (`fdiv s0, s0, s1`) has to run anyway.
In that case, the branches introduced by the select optimization can become a performance bottleneck.
I remember you mentioned PGO when I uploaded an earlier flag-tuning patch.
With PGO information, I think this case could be handled easily.
Can we just turn this flag off for cortex-a57? Without PGO information, the heuristic can't handle the case above.
Junmo.
http://reviews.llvm.org/D15792