[all-commits] [llvm/llvm-project] 16a72a: [AArch64] Enable the select optimize pass for AArch64

Sat Dec 3 08:09:13 PST 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 16a72a0f87487f2a07bb2a4101c79e4d311151a0
      https://github.com/llvm/llvm-project/commit/16a72a0f87487f2a07bb2a4101c79e4d311151a0
  Author: David Green <david.green at arm.com>
  Date:   2022-12-03 (Sat, 03 Dec 2022)

  Changed paths:
    M llvm/include/llvm/Analysis/TargetTransformInfo.h
    M llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
    M llvm/lib/Analysis/TargetTransformInfo.cpp
    M llvm/lib/CodeGen/SelectOptimize.cpp
    M llvm/lib/Target/AArch64/AArch64.td
    M llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
    M llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
    M llvm/test/CodeGen/AArch64/O3-pipeline.ll
    A llvm/test/CodeGen/AArch64/selectopt.ll

  Log Message:
  -----------
  [AArch64] Enable the select optimize pass for AArch64

This enabled the select optimize patch for ARM Out of order AArch64
cores. It is trying to solve a problem that is difficult for the
compiler to fix. The criteria for when a csel is better or worse than a
branch depends heavily on whether the branch is well predicted and the
amount of ILP in the loop (as well as other criteria like the core in
question and the relative performance of the branch predictor).  The
pass seems to do a decent job though, with the inner loop heuristics
being well implemented and doing a better job than I had expected in
general, even without PGO information.

I've been doing quite a bit of benchmarking. The headline numbers are
these for SPEC2017 on a Neoverse N1:
  500.perlbench_r   -0.12%
  502.gcc_r         0.02%
  505.mcf_r         6.02%
  520.omnetpp_r     0.32%
  523.xalancbmk_r   0.20%
  525.x264_r        0.02%
  531.deepsjeng_r   0.00%
  541.leela_r       -0.09%
  548.exchange2_r   0.00%
  557.xz_r          -0.20%

Running benchmarks with a combination of the llvm-test-suite plus
several versions of SPEC gave between a 0.2% and 0.4% geomean
improvement depending on the core/run. The instruction count went down
by 0.1% too, which is a good sign, but the results can be a little
noisy.  Some issues from other benchmarks I had ran were improved in
rGca78b5601466f8515f5f958ef8e63d787d9d812e. In summary well predicted
branches will see in improvement, badly predicted branches may get
worse, and on average performance seems to be a little better overall.

This patch enables the pass for AArch64 under -O3 for cores that will
benefit for it. i.e. not in-order cores that do not fit into the "Assume
infinite resources that allow to fully exploit the available
instruction-level parallelism" cost model. It uses a subtarget feature
for specifying when the pass will be enabled, which I have enabled under
cpu=generic as the performance increases for out of order cores seems
larger than any decreases for inorder, which were minor.

Differential Revision: https://reviews.llvm.org/D138990