[PATCH] D136714: [WIP][LV] Reassociate start value of recurrence to end

Tue Oct 25 13:58:00 PDT 2022

reames created this revision.
reames added reviewers: craig.topper, fhahn, paulwalker-arm, david-arm.
Herald added subscribers: kosarev, StephenFan, frasercrmck, kerbowa, luismarques, apazos, sameer.abuasal, s.egerton, dmgreen, Jim, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, niosHD, sabuasal, bollu, simoncook, johnrusso, rbar, asb, hiraditya, kristof.beyls, arichardson, jvesely, nemanjai, mcrosier.
Herald added a project: All.
reames requested review of this revision.
Herald added subscribers: pcwang-thead, vkmr, MaskRay.
Herald added a project: LLVM.

This is more of a discussion item than an actual patch right now. I'm curious what folks think about profitability on various targets.

Our current lowering of a reduction in the vectorizer creates the starting vector value as a splat of the identity element and then inserts the original scalar start value as the low element. An alternative is to use only the splat of the identity element and defer applying the scalar start value until after the loop.
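A minimal C++ sketch of the two lowerings described above, assuming an integer add reduction (identity 0), a fixed vector factor of 4, and a trip count that is a multiple of the vector factor. Plain arrays stand in for vector registers, and the function names are hypothetical; this is not the LoopVectorize implementation. For integer add both orders give the same result. For floating-point reductions, deferring the start value is a further reassociation and would only be legal where the reduction is already allowed to be unordered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Illustrative model only: fixed vector factor, integer add recurrence.
constexpr std::size_t VF = 4;
using Vec = std::array<int64_t, VF>;

// Horizontal reduction of all lanes, performed once after the vector loop.
int64_t reduceAdd(const Vec &v) {
  return std::accumulate(v.begin(), v.end(), int64_t{0});
}

// Vector loop body shared by both strategies: lane-wise accumulation.
// Assumes data.size() is a multiple of VF (no scalar epilogue is modeled).
Vec vectorLoop(Vec acc, const std::vector<int64_t> &data) {
  assert(data.size() % VF == 0);
  for (std::size_t i = 0; i < data.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      acc[lane] += data[i + lane];
  return acc;
}

// Current lowering: splat the identity, then insert the scalar start
// into lane 0 before entering the loop.
int64_t startInLowLane(int64_t start, const std::vector<int64_t> &data) {
  Vec init{};       // splat of the identity (0 for add)
  init[0] = start;  // scalar-to-vector insert
  return reduceAdd(vectorLoop(init, data));
}

// Deferred strategy: splat the identity only; the scalar start stays
// live across the loop and is combined after the final reduction.
int64_t startDeferred(int64_t start, const std::vector<int64_t> &data) {
  Vec init{};  // splat of the identity, no insert
  return start + reduceAdd(vectorLoop(init, data));
}
```

With `data = {1, 2, 3, 4, 5, 6, 7, 8}` and `start = 10`, both functions return 46; they differ only in where the start value enters the computation.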

RISC-V has an interesting quirk in its reduction instructions: the start value of the recurrence must be provided as an explicit operand. As a result, the deferred-add strategy fits the ISA slightly better and generally allows us to get rid of one scalar-to-vector insert. It's worth noting that this comes at the cost of extending the live range of the scalar start value.
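A simplified C++ model of the RISC-V point, assuming the semantics of the RVV `vredsum.vs` instruction (element 0 of the result is `vs1[0]` plus the sum of the elements of `vs2`) and ignoring masking and tail policy. The `Lowering` counter and the function names are hypothetical; the move counts illustrate the claim in the text rather than measure anything from the patch. In the current form the identity still has to be placed in `vs1[0]` for the final reduction, while the deferred form puts the start value there instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t VF = 4;
using Vec = std::array<int64_t, VF>;

// Counts scalar-to-vector moves (e.g. insertelement in IR, vmv.s.x on RISC-V).
struct Lowering {
  int scalarToVectorMoves = 0;

  // Write a scalar into lane 0 of a vector register.
  Vec moveScalarToLane0(Vec v, int64_t x) {
    ++scalarToVectorMoves;
    v[0] = x;
    return v;
  }
};

// Model of vredsum.vs: result = vs1[0] + sum(vs2[*]).
int64_t vredsum(const Vec &vs2, const Vec &vs1) {
  int64_t sum = vs1[0];
  for (int64_t x : vs2) sum += x;
  return sum;
}

// Lane-wise accumulation; assumes data.size() is a multiple of VF.
Vec vectorLoop(Vec acc, const std::vector<int64_t> &data) {
  assert(data.size() % VF == 0);
  for (std::size_t i = 0; i < data.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      acc[lane] += data[i + lane];
  return acc;
}

// Current lowering: start goes into lane 0 before the loop, and the
// final reduction still needs the identity (0) in vs1[0].
int64_t lowerStartInLowLane(Lowering &L, int64_t start,
                            const std::vector<int64_t> &data) {
  Vec acc = vectorLoop(L.moveScalarToLane0(Vec{}, start), data);
  Vec vs1 = L.moveScalarToLane0(Vec{}, 0);
  return vredsum(acc, vs1);
}

// Deferred lowering: the start value is supplied as vs1[0] of the
// final reduction, so the pre-loop insert disappears.
int64_t lowerDeferred(Lowering &L, int64_t start,
                      const std::vector<int64_t> &data) {
  Vec acc = vectorLoop(Vec{}, data);
  Vec vs1 = L.moveScalarToLane0(Vec{}, start);
  return vredsum(acc, vs1);
}
```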

I glanced at AArch64 for comparison, and all of the unordered reductions appear to take a single vector operand. So there, the deferred-add strategy would cost us a longer live range and an extra scalar add. This doesn't appear profitable unless I'm missing something.
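A companion C++ model for a target whose across-lanes reduction takes a single vector operand and no scalar start, in the style of AArch64 `ADDV`. The `OpCount` tally and function names are hypothetical, and the model only counts operations outside the loop; it says nothing about their relative cost on real hardware. It shows that the deferred form trades the pre-loop lane insert for a scalar add after the reduction, with the start value kept live across the whole loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t VF = 4;
using Vec = std::array<int64_t, VF>;

// Tally of the operations each lowering emits outside the loop body.
struct OpCount {
  int laneInserts = 0;  // scalar written into lane 0 before the loop
  int scalarAdds = 0;   // scalar add applied after the reduction
};

// Across-lanes add with a single vector operand and no scalar start.
int64_t addAcrossLanes(const Vec &v) {
  int64_t sum = 0;
  for (int64_t x : v) sum += x;
  return sum;
}

// Lane-wise accumulation; assumes data.size() is a multiple of VF.
Vec vectorLoop(Vec acc, const std::vector<int64_t> &data) {
  assert(data.size() % VF == 0);
  for (std::size_t i = 0; i < data.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      acc[lane] += data[i + lane];
  return acc;
}

// Current lowering: the start rides along in lane 0, so the
// single-operand reduction already yields the final value.
int64_t lowerStartInLowLane(OpCount &c, int64_t start,
                            const std::vector<int64_t> &data) {
  Vec init{};
  init[0] = start;
  ++c.laneInserts;
  return addAcrossLanes(vectorLoop(init, data));
}

// Deferred lowering: no lane insert, but start must survive the loop
// and be folded in with an extra scalar add.
int64_t lowerDeferred(OpCount &c, int64_t start,
                      const std::vector<int64_t> &data) {
  int64_t r = addAcrossLanes(vectorLoop(Vec{}, data));
  ++c.scalarAdds;
  return r + start;
}
```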

What do our other targets prefer here? Is this a case where we should just add a target hook, or is there something smarter we can do heuristically?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D136714

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
  llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll
  llvm/test/Transforms/LoopVectorize/AMDGPU/packed-math.ll
  llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll
  llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-types.ll
  llvm/test/Transforms/LoopVectorize/ARM/mve-selectandorcost.ll
  llvm/test/Transforms/LoopVectorize/ARM/sphinx.ll
  llvm/test/Transforms/LoopVectorize/PowerPC/widened-massv-call.ll
  llvm/test/Transforms/LoopVectorize/PowerPC/widened-massv-vfabi-attr.ll
  llvm/test/Transforms/LoopVectorize/RISCV/defaults.ll
  llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
  llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll
  llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll
  llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
  llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll
  llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll
  llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
  llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll
  llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll
  llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll
  llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
  llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll
  llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
  llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll
  llvm/test/Transforms/LoopVectorize/if-pred-non-void.ll
  llvm/test/Transforms/LoopVectorize/if-pred-stores.ll
  llvm/test/Transforms/LoopVectorize/induction.ll
  llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
  llvm/test/Transforms/LoopVectorize/loop-form.ll
  llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll
  llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll
  llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll
  llvm/test/Transforms/LoopVectorize/reduction-inloop.ll
  llvm/test/Transforms/LoopVectorize/reduction-predselect.ll
  llvm/test/Transforms/LoopVectorize/reduction-small-size.ll
  llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll
  llvm/test/Transforms/LoopVectorize/trunc-reductions.ll
  llvm/test/Transforms/PhaseOrdering/AArch64/peel-multiple-unreachable-exits-for-vectorization.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D136714.470607.patch
Type: text/x-patch
Size: 442643 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20221025/c7ee6a21/attachment-0001.bin>