[PATCH] D130164: [LV] Support predicated div/rem operations via safe-divisor select idiom

Wed Jul 20 06:10:47 PDT 2022

reames created this revision.
reames added reviewers: david-arm, fhahn.
Herald added subscribers: frasercrmck, luismarques, apazos, sameer.abuasal, s.egerton, Jim, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, niosHD, sabuasal, bollu, simoncook, johnrusso, rbar, asb, hiraditya, mcrosier.
Herald added a project: All.
reames requested review of this revision.
Herald added subscribers: alextsao1999, pcwang-thead, vkmr, MaskRay.
Herald added a project: LLVM.

This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening.  The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors.

The basic idea is that we can always divide (take remainder) by 1 without executing UB.  As such, we can use the active lane mask to conditional select either the actual divisor for active lanes, or a constant one for inactive lanes.  We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select.

There are some potential further improvements code wise, but I don't think any of them are worth bothering with.  The difference between vectorizing and not vectorizing is huge, the remaining minor wins are well - minor.  Just for the record, here's the things I currently know of:

- For the block predication, we're generating a select after the div/rem.  This is arguably redundant with the select before the div/rem, and we could try to fold the later one out.
- For targets with masked vector div/rem operations, we could make sure to pattern match the select.  (Div/Rem by 1 is the same as using the undisturbed LHS.)

This is an area of the code I'm fairly new to, and I'm not quite clear on the desired design re: VPlan.  Given that, I'm going to take extra care to layout the major design decisions so that reviews can tell me I did this wrong and how to rework it.  :)

Strategy/Lowering wise, we've got a couple choices:

- This approach which uses the active lane mask with the select.
- Forming an independent safety mask by explicitly checking for UB triggering edge cases.  This works, but seems to generate strictly worse code for the generic case.  (This is actually where I started.)
- Lowering to VP intrinsic.  I'd prefer not to couple this to the VP work, and the select lowering is profitable on its own.  Even for targets which don't have VL or mask predication on the udiv itself.  Given that, we'd probably want a cost model based mechanism to expand the VP op back out anyways, and I'd prefer to think about that whole problem separately.

Code structure wise, I see three major options:

- Use a new recipe specific to div/rem.
- Extend the existing VPWidenRecipe to handle the safe-divide guard if a mask is provided.  DivRem is the only opcode which has this safety guard, so while doing this was straight forward, it seemed oddly coupled.
- Use a new recipe for arbitrary predicated neutral element guards.  This could be done for e.g. all binary ops, and is arguably a useful building block towards the VP patches.  This feels a bit speculative to me, and possibly falsely generic.

I really don't have any preference as to the recipe structure, and will defer to reviewers with more context on the overall VPlan design and direction.  Let me know what you want, and I'll adjust.

I enabled the new code only for scalable vectors.  We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable.  I'd welcome input on how this should be structured, but am also hoping to defer actually implementing it to a follow up patch.  :)

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130164

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
  llvm/lib/Transforms/Vectorize/VPlan.h
  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
  llvm/lib/Transforms/Vectorize/VPlanValue.h
  llvm/test/Transforms/LoopVectorize/AArch64/scalable-predicate-instruction.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
  llvm/test/Transforms/LoopVectorize/RISCV/scalable-divrem.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D130164.446129.patch
Type: text/x-patch
Size: 38531 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220720/1eabc362/attachment-0001.bin>