[PATCH] D133758: [VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.

Sat Nov 26 15:51:32 PST 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9569
   Value *Step = State.get(getStepValue(), VPIteration(0, 0));
-  auto CreateScalarIV = [&](Value *&Step) -> Value * {
-    Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
-    auto *CanonicalIV = State.get(getParent()->getPlan()->getCanonicalIV(), 0);
-    if (!isCanonical() || CanonicalIV->getType() != Ty) {
-      ScalarIV =
-          Ty->isIntegerTy()
-              ? State.Builder.CreateSExtOrTrunc(ScalarIV, Ty)
-              : State.Builder.CreateCast(Instruction::SIToFP, ScalarIV, Ty);
-      ScalarIV = emitTransformedIndex(State.Builder, ScalarIV,
-                                      getStartValue()->getLiveInIRValue(), Step,
-                                      IndDesc);
-      ScalarIV->setName("offset.idx");
-    }
-    if (TruncToTy) {
-      assert(Step->getType()->isIntegerTy() &&
-             "Truncation requires an integer step");
-      ScalarIV = State.Builder.CreateTrunc(ScalarIV, TruncToTy);
-      Step = State.Builder.CreateTrunc(Step, TruncToTy);
-    }
-    return ScalarIV;
-  };
+  if (Step->getType() != BaseIV->getType())
+    Step = State.Builder.CreateTrunc(Step, BaseIV->getType());
----------------
Ayal wrote:
> Another type reconciliation worth taking care of by the callee instead of having it assert it? buildScalarSteps() in this case.
Refactored in bf0bd85f9d82 and removed here, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9541
+  DerivedIV =
+      emitTransformedIndex(State.Builder, DerivedIV,
+                           getStartValue()->getLiveInIRValue(), Step, IndDesc);
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > nit: can also rename this original emitTransformedIndex() - emitDerivedIndex()
> > > 
> > > TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?
> > > nit: can also rename this original emitTransformedIndex() - emitDerivedIndex()
> > 
> > Done!
> > 
> > > TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?
> > 
> > It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?
> >> TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?
> 
> > It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?
> 
> Hmm, on the contrary - it appears all users cast the 2nd operand to match the type of the 4th operand before calling emitTransformedIndex(): VTC/"cast.vtc" casts VectorTripCount and AdditionalBypass.second to StepType, CMO/"cast.cmo" casts CountMinusOne to Step Type, PtrInd casts CanonicalIV to Step Type in order for Idx and GlobalIdx to be the desired type.
> 
> Here `Step` is obtained from an operand via State, `Ty` is recorded in the recipe, and we eventually assert that the type of Step matches Ty. Seems overly complicated? Would it be simpler, here and also at the other callers, to feed emitTransformedIndex() or emitDerivedIndex() directly with the original IV and Step, and let it do the necessary casting of the former to match the type of the latter?
Simplified in 12bb5535d270, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9521
+  Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
+  Value *TransformedIV =
+      Ty->isIntegerTy()
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > Some things seem a bit confusing here, looking at the existing code:
> > > 
> > > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > > 
> > > The original code has both `ScalarIV` and `CanonicalIV` - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> > > 
> > > Now `TransformedIV` is also "Scalar" (as in non-Vector) similar to `ScalarIV`.
> > > 
> > > Perhaps instead of
> > > `Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > > we should have
> > > `Value *CanonicalIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > > ?
> > > 
> > > Perhaps instead of `TransformedIV` have `NonCanonicalIV`, `AffineIV` or `DerivedIV` - considering that the canonical IV is aka a "BasicIV"?
> > > 
> > > Then rename `VPTransformedIVRecipe` accordingly?
> > > 
> > > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > 
> > Yes we could also use `VPlan->getCanonicalIV()`, but it might be easier to follow if modeled explicitly?
> > 
> > > The original code has both ScalarIV and CanonicalIV - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> > 
> > 
> > Yep, that should be cleaner in the new code.
> > 
> > > Perhaps instead of
> > 
> > Thanks, I updated the naming to use `CanonicalIV` and `DerivedIV`. I also renamed `VPTransformedIVRecipe` -> `VPDerivedIVRecipe`
> > 
> > > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > 
> > Good idea, I'll see about that separately.
> >> VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> 
> > Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?
> 
> Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)
> 
> >> Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> 
> > Good idea, I'll see about that separately.
> 
> Found a good place?
Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

I think `emitTransformedIndex` needs the canonical IV.

> Found a good place?

I put up D138748 to add it to the VPHeaderPHIRecipe documentation.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1972
+                    Type *TruncToTy)
+      : VPRecipeBase(VPDerivedIVSC, {CanonicalIV, Start, Step}),
+        VPValue(VPVDerivedIVSC, nullptr, this), Ty(Ty), TruncToTy(TruncToTy),
----------------
Ayal wrote:
> nit: Derived IV stands for producing Start + CanonicalIV * Step, so seems more natural to order its operands and dump them in this order? Possibly with + and * instead of separating commas, possibly along with type cast information.
Updated, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1996
+  /// Generate the transformed value of the induction at offset StartValue (2.
+  /// operand) + IV (1. operand) * StepValue (3, operand).
+  void execute(VPTransformState &State) override;
----------------
Ayal wrote:
> Reordering the operands will also simplify this documentation.
Updated, thanks! 

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2005
+
+  VPCanonicalIVPHIRecipe *getCanonicalIV() const;
+  VPValue *getStartValue() const { return getOperand(1); }
----------------
Ayal wrote:
> Suffice to have
> 
> ```
> VPValue *getBasicIVValue() const { return getOperand(0); }
> ```
> instead?
Updated, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1860
   /// Returns the scalar type of the induction.
-  const Type *getScalarType() const {
+  Type *getScalarType() const {
     return getOperand(0)->getLiveInIRValue()->getType();
----------------
Ayal wrote:
> Ayal wrote:
> > Check if this method can remain const if this recipe no longer needs to delegate its Step.
> Cannot return a const Type because ConstantInt::get() expects a non-const type, in VPCanonicalIVPHIRecipe::getStepValue()?
Unfortunately yes!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1865
+  /// Returns a VPValue for the step (constant 1) of the induction.
+  VPValue *getStepValue();
+
----------------
Ayal wrote:
> Ayal wrote:
> > Consider removing and feeding users directly with a constant 1 VPValue when needed upon their construction, to keep this recipe with a single value (Start) VPDef. Otherwise it can provide both Start and Step as parts of its multi-valued VPDef...
> > 
> > (nit: const?)
> Is VPCanonicalIVPHIRecipe::getStepValue() still needed?
It's not needed in the latest version, removed, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1961
   /// Scalar type to use for the generated values.
   Type *Ty;
   /// If not nullptr, truncate the generated values to TruncToTy.
----------------
Ayal wrote:
> Ayal wrote:
> > Worth clarifying the use of both Ty (type to cast to before conversion) and TruncToTy (type to cast to after conversion)?
> Worth removing Ty altogether from VPDerivedIVRecipe - Step provides the desired type.
Removed in the latest version, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1065
+
+  auto *StepC = dyn_cast_or_null<SCEVConstant>(ID.getStep());
+  return StepC && StepC->isOne();
----------------
Ayal wrote:
> nit: can check `ConstantInt *Step = ID.getConstIntStepValue()` as in  Loop::isCanonical().
> 
> nit: can assert ID.getInductionOpcode() == Instruction::Add.
Updated, thanks! I turns out the assertion triggered in some tests, added as an extra condition.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:396
+    Type *IVTy = WideIV->getPHINode()->getType();
+    Instruction *TruncI = WideIV->getTruncInst();
+    Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
----------------
Ayal wrote:
> nit: TruncI is used for its type only, TruncTy may better be called ResultTy, IVTy is hopefully unneeded if VPDerivedRecipe can be constructed w/o it; consider setting:
> 
> 
> ```
>     Type *IVTy = WideIV->getPHINode()->getType();
>     Type *ResultTy = IVTy;
>     Type *TruncInstTy = nullptr;
>     if (auto *TruncI = WideIV->getTruncInst()) {
>       TruncInstTy = TruncI->getType();
>       ResultTy = TruncInstTy;
>     }
> 
> ```
Simplified, thanks!

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:188
+; DBG-NEXT:     FIRST-ORDER-RECURRENCE-PHI ir<%for> = phi ir<0>, vp<[[SCALAR_STEPS:.+]]>
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>
----------------
Ayal wrote:
> The reason for having DERIVED-IV with canonical start ir<0> and step ir<1> is because of type expansion and/or truncation? Perhaps worth dumping the distinct types, as this operation is effectively a cast.
Updated, thanks!

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:189
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>
+; DBG-NEXT:     EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%for> vp<[[SCALAR_STEPS]]>
----------------
Ayal wrote:
> nit: check the second (ir<1>) operand of SCALAR-STEPS, for completeness?
> 
> It is in practice either +1 or -1 ...
Done, thanks!

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:73
 ; DBG-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
-; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<false>, ir<true>
-; DBG-NEXT:     vp<[[STEPS2:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<false>, ir<true>
+; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[DERIVED_IV]]>
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > While we're here, `ir<false>`, `ir<true>` seem odd (and even ;-)) 
> > I guess that's because they are boolean values and the IR get printed as boolean literals by the IR printer. Do you think this is something that should be changed?
> Oh well, the input IR is adding and subtracting true and false...
> 
> The reason for having DERIVED-IV with canonical start `ir<false>` ==0 and step `ir<true>` ==1 is because of type expansion and/or truncation?
Here the inputs are already truncated to `i1` before recipe construction so the recipe doesn't truncated itself.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133758/new/

https://reviews.llvm.org/D133758