[all-commits] [llvm/llvm-project] e947f9: [LSR][TTI][RISCV] Enable terminator folding for RI...

Wed Nov 29 12:14:19 PST 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e947f953370abe8ffc8713b8f3250a3ec39599fe
      https://github.com/llvm/llvm-project/commit/e947f953370abe8ffc8713b8f3250a3ec39599fe
  Author: Philip Reames <preames at rivosinc.com>
  Date:   2023-11-29 (Wed, 29 Nov 2023)

  Changed paths:
    M llvm/include/llvm/Analysis/TargetTransformInfo.h
    M llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
    M llvm/include/llvm/CodeGen/BasicTTIImpl.h
    M llvm/lib/Analysis/TargetTransformInfo.cpp
    M llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
    M llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
    M llvm/test/CodeGen/RISCV/branch-on-zero.ll
    M llvm/test/CodeGen/RISCV/riscv-codegenprepare-asm.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
    M llvm/test/CodeGen/RISCV/rvv/sink-splat-operands-i1.ll
    M llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll
    M llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
    M llvm/test/Transforms/LoopStrengthReduce/RISCV/lsr-cost-compare.ll

  Log Message:
  -----------
  [LSR][TTI][RISCV] Enable terminator folding for RISC-V

If looking for a miscompile revert candidate, look here!

The transform being enabled prefers comparing to a loop invariant
exit value for a secondary IV over using an otherwise dead primary
IV.  This increases register pressure (by requiring the exit value
to be live through the loop), but reduces the number of instructions
within the loop by one.

On RISC-V which has a large number of scalar registers, this is
generally a profitable transform.  We loose the ability to use a beqz
on what is typically a count down IV, and pay the cost of computing
the exit value on the secondary IV in the loop preheader, but save
an add or sub in the loop body.  For anything except an extremely
short running loop, or one with extreme register pressure, this is
profitable.  On spec2017, we see a 0.42% geomean improvement in
dynamic icount, with no individual workload regressing by more than
0.25%.

Code size wise, we trade a (possibly compressible) beqz and a (possibly
compressible) addi for a uncompressible beq.  We also add instructions
in the preheader.  Net result is a slight regression overall, but
neutral or better inside the loop.

Previous versions of this transform had numerous cornercase correctness
bugs.  All of them ones I can spot by inspection have been fixed, and I
have run this through all of spec2017, but there may be further issues
lurking.  Adding uses to an IV is a fraught thing to do given poison
semantics, so this transform is somewhat inherently risky.

This patch is a reworked version of D134893 by @eop.  That patch has
been abandoned since May, so I picked it up, reworked it a bit, and
am landing it.