[PATCH] D75069: [LoopVectorizer] Inloop vector reductions

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 5 15:43:40 PDT 2020


Ayal added a comment.

The proposed `RecurrenceDescriptor::getReductionOpChain()` method identifies horizontal reductions, a subset of the vertical reductions identified by `RecurrenceDescriptor::isReductionPHI()`. Being two totally independent methods, it's unclear what the latter supports that the former does not. Would it be better to have isReductionPHI() also take care of recognizing horizontal reductions, and record them as in CastsToIgnore? See also TODO commented inline.



================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:439
     //       model could then apply the recurrence type to these instructions,
     //       without needing a white list of instructions to ignore.
     collectCastsToIgnore(TheLoop, ExitInstruction, RecurrenceType, CastInsts);
----------------
Perhaps the above "better way" would also help recognize and record horizontal reductions?


================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:848
+
+  if (!isCorrectOpcode(Cur))
+    return {};
----------------
`|| !Cur->hasNUses(ExpectedUses)` ?


nit: can alternatively let getNextInstruction check its result and return only valid ones, e.g.:

```
bool RedOpIsCmp = (RedOp == Instruction::ICmp || RedOp == Instruction::FCmp);
unsigned ExpectedUses = RedOpIsCmp ? 2 : 1;

auto getNextInstruction = [&](Instruction *Cur) {
  if (!Cur->hasNUses(ExpectedUses))
    return nullptr;
  auto *FirstUser = Cur->user_begin();
  if (!RedOpIsCmp)
    return FirstUser->getOpcode() == RedOp ? FirstUser : nullptr;
  // Handle cmp/select pair:
  auto *Sel = dyn_cast<SelectInst>(*FirstUser) ||
    dyn_cast<SelectInst>(*std::next(FirstUser));
  if (SelectPatternResult::isMinOrMax(matchSelectPattern(Sel, LHS, RHS).Flavor))
    return Sel;
  return nullptr;
}

for (auto *Cur = getNextInstruction(Phi); Cur && Cur != LoopExitInstr; Cur = getNextInstruction(Cur))
  ReductionOperations.push_back(Cur);
```


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1461
 
+  /// PHINodes of the reductions that should be expanded in-loop.
+  DenseMap<PHINode *, SmallVector<Instruction *, 4>> InloopReductionChains;
----------------
... along with their associated chains of reduction operations,
in program order from top (PHI) to bottom?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4209
       Type *VecTy =
           (VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);
       Value *EntryPart = PHINode::Create(
----------------
Check in-loop reduction along with VF == 1, instead of modifying VF, as in other places? E.g.,

```
      bool ScalarPHI = (VF == 1) || Cost->isInLoopReduction(cast<PHINode>(PN));

      Type *VecTy =
          ScalarPHI ? PN->getType() : VectorType::get(PN->getType(), VF);

```
?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6568
+      LLVM_DEBUG(dbgs() << "LV: Using out of loop reduction for phi: " << *Phi
+                        << "\n");
+  }
----------------
The "else" clause will be empty in non-debug mode.
Can hoist the LLVM_DEBUG's, e.g.,


```
bool InLoop = !ReductionOperations.empty();
LLVM_DEBUG(dbgs() << "LV: Using " << (InLoop ? "inloop" : "out of loop") << " reduction for phi: " << *Phi << "\n");
```


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7316
+            Kind == RecurrenceDescriptor::RK_FloatMinMax) {
+          RecipeBuilder.recordRecipeOf(cast<Instruction>(R->getOperand(0)));
+        }
----------------
Add a comment that this is the compare operand of the select, whose recipe will also be discarded.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7439
+  // reduction chain.
+  if (Range.Start > 1) {
+    for (auto &Reduction : Legal->getReductionVars()) {
----------------
Consider outlining this part, unless it can be shortened.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7470
+          VecOpId = 1 - ChainOpId;
+        }
+
----------------
`ChainOp `can be simply set:

```
VPValue *ChainOp = Plan->getVPValue(Chain);
```

leaving only VecOp to figure out which operand index to use. Can preset the options before the loop:

```
RecurrenceDescriptor::RecurrenceKind Kind = RdxDesc.getRecurrenceKind();
bool InMinMax = (Kind == RecurrenceDescriptor::RK_IntegerMinMax || Kind == RecurrenceDescriptor::RK_FloatMinMax);
FirstOpId = IsMinMax ? 1 : 0;
SecondOpId = FirstOpId + 1;

```
and then do

```
auto *FirstOp = R->getOperand(FirstOpId);
auto *SecondOp =  R->getOperand(SecondOpId);
VPValue *VecOp = Plan->getVPValue(FirstOp == Chain ? SecondOp : FirstOp);

```


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3769
     // MinMax reduction have the start value as their identify.
-    if (VF == 1) {
+    if (VF == 1 || UseInloopReductions) {
       VectorStart = Identity = ReductionStartValue;
----------------
dmgreen wrote:
> Ayal wrote:
> > This is dead code if cmp/select chains are not recognized yet, as noted above.
> I've added the code to handle minmax too (but not tested it a lot yet. I will try that now).
> 
> MVE has instructions for integer min/max reductions, but they can be slow enough to make them not worth using over a normal vmin/vmax. Adds are always not-slower-enough to warrant the inloop reduction (and have other advantages like handling higher type sizes and folding in more instructions.)
> 
> My point is that min/max, like some of the other fadd/mul/and/etc might not be used by MVE yet. If you think the code is more hassle than it deserves, then we could take them out for the time being. I'd like to leave them in for consistency though, even if it's not used straight away.
Would be good to make sure code is being exercised and tested. Could inloop min/max (and/or other reductions) help reduce code size, and be applied when vectorizing under optsize?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3847
+  if ((VF > 1 && !UseInloopReductions) &&
+      Phi->getType() != RdxDesc.getRecurrenceType()) {
     Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);
----------------
dmgreen wrote:
> Ayal wrote:
> > Checking if !UseInLoopReductions is redundant, as current in loop reductions have phi's of the same type as the recurrence; right?
> > May leave it as an assert, until in loop reductions handle truncation/extension.
> Right. We look from the phi down (now), so can't go through the 'and' we would need to detect the different bitwidth. That would leave 'and' reduction,  which I think would not got through that codepath to create type promoted phis?
> 
> I've put an explicit check in for it to be sure, and added some testing for those cases.
'and' reduction may or may not be performed in a smaller type, iinm.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7286
+    RecurrenceDescriptor &RdxDesc = Reduction.second;
+    if (CM.useInloopReductions(Reduction.first)) {
+      PHINode *Phi = Reduction.first;
----------------
dmgreen wrote:
> Ayal wrote:
> > Iterate over in loop reductions?
> Do you mean adding an iterator for iterating over reductions, stepping over the ones not inloop?
> 
> It would seem like it's similar to the existing code, but as a new iterator class. My gut says the current code is simpler and clearer what is going on?
Suggestion was to iterate over the PHIs/elements of InloopReductionChains, rather than over all reduction PHIs of Legal->getReductionVars().

(Better early-exit via "if (!CM.isInLoopReduction(Reduction.first)) continue;")


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7416
+      RecurrenceDescriptor &RdxDesc = Reduction.second;
+      if (!CM.useInloopReductions(Phi))
+        continue;
----------------
Ayal wrote:
> Iterate over in loop reductions?
> Comment that they are ordered top-down, starting from the phi.
Suggestion was to iterate over the PHIs/elements of InloopReductionChains, rather than over all reduction PHIs of Legal->getReductionVars().


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7442
   // and the live-out instruction of each reduction, at the end of the latch.
-  if (CM.foldTailByMasking()) {
+  if (CM.foldTailByMasking() && CM.hasOuterloopReductions()) {
     Builder.setInsertPoint(VPBB);
----------------
Ayal wrote:
> In loop reductions are currently disabled in fold tail. Checking also if hasOuterloopReductions() is an independent optimization that should be done separately of this patch.
Checking also if hasOuterloopReductions() is an independent optimization that should be done separately of this patch.


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop.ll:2
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s  -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -force-reduction-intrinsics -dce -instcombine -S | FileCheck %s
+; RUN: opt < %s  -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -force-inloop-reductions -force-reduction-intrinsics -dce -instcombine -S | FileCheck %s
 
----------------
dmgreen wrote:
> Ayal wrote:
> > It is indeed good to examine the diff in this review, but existing non -force-inloop-reductions RUN should be retained; e.g., by cloning the file for a separate -force-inloop-reductions RUN. 
> > 
> > Would be good to also test with -force-vector-interleave greater than 1.
> This testcase is new for this patch (although that wasn't obvious from how I presented it). I was just trying to show the diffs here and the option didn't exist before this patch, hence it wasn't already there.
> 
> Are you suggesting we should still have these same test for without -force-inloop-reductions? I think those should already be tested elsewhere, but I can add it if you think it's useful.
ok, understood; no need to retain non-existing tests.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75069/new/

https://reviews.llvm.org/D75069





More information about the llvm-commits mailing list