[PATCH] D75069: [LoopVectorizer] Inloop vector reductions

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 19 16:34:51 PDT 2020


Ayal added a comment.

In D75069#2041073 <https://reviews.llvm.org/D75069#2041073>, @dmgreen wrote:

> Ping. Anyone have any thought/comments about this one? Does it at least look like its going in the right direction, or does anyone have some better suggestions?


Yes, this direction does look right!
Would be good to clarify that the unrolled Parts are still reduced out of loop, and document the current limitations / future TODO extensions, including min/max, conditional bumps, truncate/extend.
Added various minor comments.



================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:812
+    if (LHS->getOpcode() == Opcode && L->contains(LHS->getParent()) &&
+        LHS->hasOneUse() &&
+        findPathToPhi(LHS, ReductionOperations, Opcode, Phi, L)) {
----------------
Looking for a chain of hasOneUse op's would be easier starting from the Phi and going downwards, until reaching LoopExitInstr?

Note that when extended to handle reductions with conditional bumps, some ops will have more than one use.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:269
+static cl::opt<bool>
+    ForceInloopReductions("force-inloop-reductions", cl::init(false),
+                          cl::Hidden,
----------------
"Force" >> "Prefer"? If a reduction cannot be handled in loop, it is vectorized out of loop, rather than bailing out.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1041
+  /// outside.
+  void categorizeReductions();
+
----------------
collectInLoopReductions()? Perhaps worth holding a map of in loop reduction phi's with their chains.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1311
+  /// Returns true if the Phi is part of an inloop reduction.
+  bool useInloopReductions(PHINode *Phi) const {
+    return InloopReductions.count(Phi);
----------------
isInLoopReduction(Phi)?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1316
+  /// Returns true if there are any inloop reductions.
+  bool hasInloopReductions() const { return InloopReductions.size() != 0; }
+
----------------
Simply call InLoopReductions.empty() directly?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1319
+  /// Returns true if there are any outside-loop reductions.
+  bool hasOuterloopReductions() const {
+    return InloopReductions.size() < Legal->getReductionVars().size();
----------------
hasOutOfLoopReductions()? Outerloop stands for a non innermost loop in a nest.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3751
   setDebugLocFromInst(Builder, ReductionStartValue);
+  bool UseInloopReductions = Cost->useInloopReductions(Phi);
 
----------------
isInLoopReductionPhi


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3769
     // MinMax reduction have the start value as their identify.
-    if (VF == 1) {
+    if (VF == 1 || UseInloopReductions) {
       VectorStart = Identity = ReductionStartValue;
----------------
This is dead code if cmp/select chains are not recognized yet, as noted above.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3847
+  if ((VF > 1 && !UseInloopReductions) &&
+      Phi->getType() != RdxDesc.getRecurrenceType()) {
     Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);
----------------
Checking if !UseInLoopReductions is redundant, as current in loop reductions have phi's of the same type as the recurrence; right?
May leave it as an assert, until in loop reductions handle truncation/extension.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3898
 
-  if (VF > 1) {
+  if (VF > 1 && !UseInloopReductions) {
     bool NoNaN = Legal->hasFunNoNaNAttr();
----------------
Worth adding a comment that in loop reductions create their target reductions inside the loop using a recipe, rather that here in the middle block.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6537
+    // want to record it as such.
+    if (!ForceInloopReductions)
+      continue;
----------------
Move this invariant check out as another early-exit?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7286
+    RecurrenceDescriptor &RdxDesc = Reduction.second;
+    if (CM.useInloopReductions(Reduction.first)) {
+      PHINode *Phi = Reduction.first;
----------------
Iterate over in loop reductions?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7416
+      RecurrenceDescriptor &RdxDesc = Reduction.second;
+      if (!CM.useInloopReductions(Phi))
+        continue;
----------------
Iterate over in loop reductions?
Comment that they are ordered top-down, starting from the phi.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7442
   // and the live-out instruction of each reduction, at the end of the latch.
-  if (CM.foldTailByMasking()) {
+  if (CM.foldTailByMasking() && CM.hasOuterloopReductions()) {
     Builder.setInsertPoint(VPBB);
----------------
In loop reductions are currently disabled in fold tail. Checking also if hasOuterloopReductions() is an independent optimization that should be done separately of this patch.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7447
+      if (CM.useInloopReductions(Reduction.first))
+        continue;
       VPValue *Phi = Plan->getVPValue(Reduction.first);
----------------
This is currently dead code, no in loop reductions under fold tail.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7610
+    Value *NewAcc = State.Builder.CreateBinOp(
+        (Instruction::BinaryOps)I->getOpcode(), NewRed, NewChain);
+    State.ValueMap.setVectorValue(I, Part, NewAcc);
----------------
NewChain, NewAcc >> PrevInChain, NextInChain?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:238
   VPTransformState(unsigned VF, unsigned UF, LoopInfo *LI, DominatorTree *DT,
-                   IRBuilder<> &Builder, VectorizerValueMap &ValueMap,
-                   InnerLoopVectorizer *ILV, VPCallback &Callback)
-      : VF(VF), UF(UF), Instance(), LI(LI), DT(DT), Builder(Builder),
+                   const TargetTransformInfo *TTI, IRBuilder<> &Builder,
+                   VectorizerValueMap &ValueMap, InnerLoopVectorizer *ILV,
----------------
Too bad this requires passing TTI through the State everywhere.
Perhaps storing TTI in the recipe would be somewhat better.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1022
 
+/// VPReductionRecipe
+class VPReductionRecipe : public VPRecipeBase {
----------------
Comment is redundant.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1024
+class VPReductionRecipe : public VPRecipeBase {
+private:
+  RecurrenceDescriptor *RdxDesc;
----------------
private is redundant.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanValue.h:47
   friend class VPSlotTracker;
+  friend class LoopVectorizationPlanner;
 
----------------
Why is this friendship needed?


================
Comment at: llvm/test/Transforms/LoopVectorize/icmp-uniforms.ll:43
-; CHECK-NEXT:      "WIDEN-SELECT%s = select %cond0, 10, 20\l" +
-; CHECK-NEXT:      "EMIT vp<%1> = icmp ule ir<%iv> vp<%0>\l"
 ; CHECK-NEXT:  ]
----------------
Optimizing this comparison away is independent of this patch.


================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop.ll:2
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s  -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -force-reduction-intrinsics -dce -instcombine -S | FileCheck %s
+; RUN: opt < %s  -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -force-inloop-reductions -force-reduction-intrinsics -dce -instcombine -S | FileCheck %s
 
----------------
It is indeed good to examine the diff in this review, but existing non -force-inloop-reductions RUN should be retained; e.g., by cloning the file for a separate -force-inloop-reductions RUN. 

Would be good to also test with -force-vector-interleave greater than 1.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75069/new/

https://reviews.llvm.org/D75069





More information about the llvm-commits mailing list