[PATCH] D87976: Support the division-by-constant strength reduction for more integer types

Sat Sep 26 17:43:25 PDT 2020

nagisa created this revision.
Herald added subscribers: llvm-commits, luismarques, s.egerton, lenary, PkmX, simoncook, hiraditya, kristof.beyls.
Herald added a project: LLVM.
nagisa added a comment.
nagisa edited the summary of this revision.
nagisa edited the summary of this revision.
nagisa updated this revision to Diff 293031.
Herald added subscribers: kerbowa, apazos, sameer.abuasal, pzheng, Jim, jocewei, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, niosHD, sabuasal, johnrusso, rbar, asb, fedor.sergeev, nhaehnle, jvesely, nemanjai, jyknight.
nagisa updated this revision to Diff 294528.
nagisa published this revision for review.
nagisa added reviewers: eli.friedman, RKSimon, craig.topper, nikic.
Herald added a subscriber: MaskRay.

Still working on adding the tests, but I believe this is “done” otherwise.

nagisa added a comment.

Update the pre-existing tests

nagisa added a comment.

Allow non-legal shift types too

The division-by-constant strength reduction into a multiply-shift sequence of instructions can be
applied on almost all targets at any integer width to gain a significant throughput boost for the
operation, at a (fairly significant) cost in code size.
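
As a minimal illustration of the multiply-shift idea (a sketch written for this summary, not code from the patch), unsigned 32-bit division by 10 can be expressed as one widening multiply and one shift. The function name `udiv10` is hypothetical; the constant `0xCCCCCCCD` is ceil(2^35 / 10).

```cpp
#include <cstdint>

// Unsigned 32-bit division by the constant 10 without a divide instruction.
// The product is formed in 64 bits so that no high bits are lost, and the
// quotient is recovered by shifting the product right by 35.
constexpr uint32_t udiv10(uint32_t x) {
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xCCCCCCCDull) >> 35);
}
```

The multiply here is twice as wide as the operand, which is why the existing lowering is tied to what the target can multiply natively.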

LLVM already has this optimisation, but it would only fire on integers
with bit-widths supported natively. For example, on x86_64 divisions of up to 64 bits would trigger
the optimisation, while on i686 64-bit divisions would not be strength-reduced.

This commit adjusts the lowering code to apply this strength reduction even on integer bit-widths
not natively supported by the target. Ideally this would have been implemented via fallback
lowerings for `ISD::MULHU` and `ISD::MULHS` (not all of the backends support them), but I found
that to require significant refactoring, and it still failed to work on some backends, such as ARM
or RISC-V without the M extension, regardless.

However, targets universally support `ISD::MUL` at any bit-width, so we just take the upper
half of a regular `MUL` result. This will likely be sub-optimal in the sense that some of the
instructions may not do anything useful, but even with those instructions present the resulting
lowering should be significantly better than conventional software division implementations.
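
The "upper half of a regular `MUL`" approach can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the SelectionDAG code in the patch: `mulhu32` and `udiv7` are hypothetical names, and 32-bit operands with a 64-bit multiply stand in for whatever width the target lacks natively. Division by 7 is used because its magic constant does not fit the simple multiply-then-shift form and needs the usual add-and-halve fixup.

```cpp
#include <cstdint>

// Upper 32 bits of the full 64-bit product, i.e. what a 32-bit MULHU would
// return, obtained from an ordinary multiply at twice the width.
constexpr uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Unsigned 32-bit division by the constant 7.
// 0x24924925 is the low 32 bits of the 33-bit magic number for 7; the
// missing top bit is accounted for by the subtract/halve/add sequence.
constexpr uint32_t udiv7(uint32_t x) {
  uint32_t q = mulhu32(x, 0x24924925u);
  uint32_t t = ((x - q) >> 1) + q;  // q <= x, so this cannot wrap around
  return t >> 2;
}
```

When the doubled width is itself not legal for the target, the wide multiply gets expanded further, which is where the instructions that "may not do anything useful" are likely to come from.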

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D87976

Files:
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/test/CodeGen/AArch64/srem-seteq.ll
  llvm/test/CodeGen/AArch64/srem-vector-lkk.ll
  llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
  llvm/test/CodeGen/AArch64/urem-seteq.ll
  llvm/test/CodeGen/AArch64/urem-vector-lkk.ll
  llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
  llvm/test/CodeGen/AMDGPU/udiv64.ll
  llvm/test/CodeGen/AMDGPU/urem64.ll
  llvm/test/CodeGen/BPF/32-bit-subreg-alu.ll
  llvm/test/CodeGen/BPF/sdiv_error.ll
  llvm/test/CodeGen/PowerPC/loop-instr-form-prepare.ll
  llvm/test/CodeGen/PowerPC/srem-lkk.ll
  llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
  llvm/test/CodeGen/PowerPC/urem-lkk.ll
  llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
  llvm/test/CodeGen/RISCV/div.ll
  llvm/test/CodeGen/RISCV/srem-lkk.ll
  llvm/test/CodeGen/RISCV/srem-vector-lkk.ll
  llvm/test/CodeGen/RISCV/urem-lkk.ll
  llvm/test/CodeGen/RISCV/urem-vector-lkk.ll
  llvm/test/CodeGen/SPARC/rem.ll
  llvm/test/CodeGen/X86/divide-by-constant.ll
  llvm/test/CodeGen/X86/divmod128.ll
  llvm/test/CodeGen/X86/i128-sdiv.ll
  llvm/test/CodeGen/X86/pr44812.ll
  llvm/test/CodeGen/X86/rotate-extract-vector.ll
  llvm/test/CodeGen/X86/srem-vector-lkk.ll
  llvm/test/CodeGen/X86/urem-seteq-nonzero.ll
  llvm/test/CodeGen/X86/urem-vector-lkk.ll
  llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
  llvm/test/CodeGen/X86/vector-idiv-sdiv-256.ll
  llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll
  llvm/test/CodeGen/X86/vector-idiv-udiv-128.ll
  llvm/test/CodeGen/X86/vector-idiv-udiv-256.ll
  llvm/test/CodeGen/X86/vector-idiv-udiv-512.ll
  llvm/test/CodeGen/X86/vector-idiv-v2i32.ll
  llvm/test/CodeGen/X86/vshli-simplify-demanded-bits.ll