[PATCH] D133758: [VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.

Tue Nov 15 14:03:40 PST 2022

Ayal added a comment.

pong :-)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9569
   Value *Step = State.get(getStepValue(), VPIteration(0, 0));
-  auto CreateScalarIV = [&](Value *&Step) -> Value * {
-    Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
-    auto *CanonicalIV = State.get(getParent()->getPlan()->getCanonicalIV(), 0);
-    if (!isCanonical() || CanonicalIV->getType() != Ty) {
-      ScalarIV =
-          Ty->isIntegerTy()
-              ? State.Builder.CreateSExtOrTrunc(ScalarIV, Ty)
-              : State.Builder.CreateCast(Instruction::SIToFP, ScalarIV, Ty);
-      ScalarIV = emitTransformedIndex(State.Builder, ScalarIV,
-                                      getStartValue()->getLiveInIRValue(), Step,
-                                      IndDesc);
-      ScalarIV->setName("offset.idx");
-    }
-    if (TruncToTy) {
-      assert(Step->getType()->isIntegerTy() &&
-             "Truncation requires an integer step");
-      ScalarIV = State.Builder.CreateTrunc(ScalarIV, TruncToTy);
-      Step = State.Builder.CreateTrunc(Step, TruncToTy);
-    }
-    return ScalarIV;
-  };
+  if (Step->getType() != BaseIV->getType())
+    Step = State.Builder.CreateTrunc(Step, BaseIV->getType());
----------------
Another type reconciliation worth taking care of by the callee instead of having it assert it? buildScalarSteps() in this case.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9541
+  DerivedIV =
+      emitTransformedIndex(State.Builder, DerivedIV,
+                           getStartValue()->getLiveInIRValue(), Step, IndDesc);
----------------
fhahn wrote:
> Ayal wrote:
> > nit: can also rename this original emitTransformedIndex() - emitDerivedIndex()
> > 
> > TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?
> > nit: can also rename this original emitTransformedIndex() - emitDerivedIndex()
> 
> Done!
> 
> > TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?
> 
> It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?
>> TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?

> It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?

Hmm, on the contrary - it appears all users cast the 2nd operand to match the type of the 4th operand before calling emitTransformedIndex(): VTC/"cast.vtc" casts VectorTripCount and AdditionalBypass.second to StepType, CMO/"cast.cmo" casts CountMinusOne to Step Type, PtrInd casts CanonicalIV to Step Type in order for Idx and GlobalIdx to be the desired type.

Here `Step` is obtained from an operand via State, `Ty` is recorded in the recipe, and we eventually assert that the type of Step matches Ty. Seems overly complicated? Would it be simpler, here and also at the other callers, to feed emitTransformedIndex() or emitDerivedIndex() directly with the original IV and Step, and let it do the necessary casting of the former to match the type of the latter?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9573
+  if (Step->getType() != BaseIV->getType())
+    Step = State.Builder.CreateTrunc(Step, BaseIV->getType());

----------------
fhahn wrote:
> Ayal wrote:
> > This truncation of Step is needed only if fed directly from CanonicalIV, because the Step produced by DeriveIV should have the desired (value and) type, including a truncation at the end if needed, right?
> > 
> > Alternatively, introduce a DeriveIV recipe also if only truncation of Step is needed, so as not to sink it into triangles? (Or have ScalarIV recipe take care of TruncToTy always, also for derived steps, relieving DeriveIV of doing so, though this would sink into triangles.)
> at the moment, a derived IV recipe is created if only a truncate is needed, but the derived IV will only be truncated at the end, so it uses the wide step whereas for the steps recipe we need the truncated step. 
> 
> I think to unify this we would have to compute the derived IV with truncated steps, but that would be a bigger, unrelated change,
Perhaps worth leaving a note behind about said bigger unrelated change.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9521
+  Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
+  Value *TransformedIV =
+      Ty->isIntegerTy()
----------------
fhahn wrote:
> Ayal wrote:
> > Some things seem a bit confusing here, looking at the existing code:
> > 
> > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > 
> > The original code has both `ScalarIV` and `CanonicalIV` - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> > 
> > Now `TransformedIV` is also "Scalar" (as in non-Vector) similar to `ScalarIV`.
> > 
> > Perhaps instead of
> > `Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > we should have
> > `Value *CanonicalIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > ?
> > 
> > Perhaps instead of `TransformedIV` have `NonCanonicalIV`, `AffineIV` or `DerivedIV` - considering that the canonical IV is aka a "BasicIV"?
> > 
> > Then rename `VPTransformedIVRecipe` accordingly?
> > 
> > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> 
> Yes we could also use `VPlan->getCanonicalIV()`, but it might be easier to follow if modeled explicitly?
> 
> > The original code has both ScalarIV and CanonicalIV - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> 
> 
> Yep, that should be cleaner in the new code.
> 
> > Perhaps instead of
> 
> Thanks, I updated the naming to use `CanonicalIV` and `DerivedIV`. I also renamed `VPTransformedIVRecipe` -> `VPDerivedIVRecipe`
> 
> > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> 
> Good idea, I'll see about that separately.
>> VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?

> Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?

Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

>> Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.

> Good idea, I'll see about that separately.

Found a good place?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9549

-  Value *Step = State.get(getStepValue(), VPIteration(0, 0));
-  auto CreateScalarIV = [&](Value *&Step) -> Value * {
-    Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
-    auto *CanonicalIV = State.get(getParent()->getPlan()->getCanonicalIV(), 0);
-    if (!isCanonical() || CanonicalIV->getType() != Ty) {
-      ScalarIV =
-          Ty->isIntegerTy()
-              ? State.Builder.CreateSExtOrTrunc(ScalarIV, Ty)
-              : State.Builder.CreateCast(Instruction::SIToFP, ScalarIV, Ty);
-      ScalarIV = emitTransformedIndex(State.Builder, ScalarIV,
-                                      getStartValue()->getLiveInIRValue(), Step,
-                                      IndDesc);
-      ScalarIV->setName("offset.idx");
-    }
-    if (TruncToTy) {
-      assert(Step->getType()->isIntegerTy() &&
-             "Truncation requires an integer step");
-      ScalarIV = State.Builder.CreateTrunc(ScalarIV, TruncToTy);
-      Step = State.Builder.CreateTrunc(Step, TruncToTy);
-    }
-    return ScalarIV;
-  };
+  Value *ScalarIV = State.get(getOperand(0), VPIteration(0, 0));
+  VPValue *StepVPV;
----------------
fhahn wrote:
> Ayal wrote:
> > ditto: everything here is scalar IV. Perhaps BaseIV, FirstLaneScalarIV, or Start - reviving getStartValue() to wrap getOperand(0)?
> Updated to use `BaseIV`. I kept `getOperand(0)` for now, as it seems like it may be confused with the different behavior of the existing `getStartValue()`.
> I kept getOperand(0) for now, as it seems like it may be confused with the different behavior of the existing getStartValue().

Not sure what the source of confusion is, but it may appear clearer to have either

```
  Value *BaseIV = State.get(getOperand(0), VPIteration(0, 0));
  Value *Step = State.get(getOperand(1), VPIteration(0, 0));

```
or

```
  Value *BaseIV = State.get(getBaseIV(), VPIteration(0, 0));
  Value *Step = State.get(getStepValue(), VPIteration(0, 0));
```

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1972
+                    Type *TruncToTy)
+      : VPRecipeBase(VPDerivedIVSC, {CanonicalIV, Start, Step}),
+        VPValue(VPVDerivedIVSC, nullptr, this), Ty(Ty), TruncToTy(TruncToTy),
----------------
nit: Derived IV stands for producing Start + CanonicalIV * Step, so seems more natural to order its operands and dump them in this order? Possibly with + and * instead of separating commas, possibly along with type cast information.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1996
+  /// Generate the transformed value of the induction at offset StartValue (2.
+  /// operand) + IV (1. operand) * StepValue (3, operand).
+  void execute(VPTransformState &State) override;
----------------
Reordering the operands will also simplify this documentation.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2005
+
+  VPCanonicalIVPHIRecipe *getCanonicalIV() const;
+  VPValue *getStartValue() const { return getOperand(1); }
----------------
Suffice to have

```
VPValue *getBasicIVValue() const { return getOperand(0); }
```
instead?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1860
   /// Returns the scalar type of the induction.
-  const Type *getScalarType() const {
+  Type *getScalarType() const {
     return getOperand(0)->getLiveInIRValue()->getType();
----------------
Ayal wrote:
> Check if this method can remain const if this recipe no longer needs to delegate its Step.
Cannot return a const Type because ConstantInt::get() expects a non-const type, in VPCanonicalIVPHIRecipe::getStepValue()?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1865
+  /// Returns a VPValue for the step (constant 1) of the induction.
+  VPValue *getStepValue();
+
----------------
Ayal wrote:
> Consider removing and feeding users directly with a constant 1 VPValue when needed upon their construction, to keep this recipe with a single value (Start) VPDef. Otherwise it can provide both Start and Step as parts of its multi-valued VPDef...
> 
> (nit: const?)
Is VPCanonicalIVPHIRecipe::getStepValue() still needed?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1875
+  /// Check if the induction described by \p ID is canonical, i.e. has a step of
+  /// 1 and  the same type and start as the canonical IV.
+  bool isCanonical(const InductionDescriptor &ID) const;
----------------
Ayal wrote:
> nit: extra space "and  the"
> 
> step is also the same (1), as type and start; or does the original canonical unit step become UF * VF?
I mean, would the following rephrasing be better:
`/// i.e., has the same start, step (of 1), and type as the canonical IV.`
?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1961
   /// Scalar type to use for the generated values.
   Type *Ty;
   /// If not nullptr, truncate the generated values to TruncToTy.
----------------
Ayal wrote:
> Worth clarifying the use of both Ty (type to cast to before conversion) and TruncToTy (type to cast to after conversion)?
Worth removing Ty altogether from VPDerivedIVRecipe - Step provides the desired type.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1065
+
+  auto *StepC = dyn_cast_or_null<SCEVConstant>(ID.getStep());
+  return StepC && StepC->isOne();
----------------
nit: can check `ConstantInt *Step = ID.getConstIntStepValue()` as in  Loop::isCanonical().

nit: can assert ID.getInductionOpcode() == Instruction::Add.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:396
+    Type *IVTy = WideIV->getPHINode()->getType();
+    Instruction *TruncI = WideIV->getTruncInst();
+    Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
----------------
nit: TruncI is used for its type only, TruncTy may better be called ResultTy, IVTy is hopefully unneeded if VPDerivedRecipe can be constructed w/o it; consider setting:

```
    Type *IVTy = WideIV->getPHINode()->getType();
    Type *ResultTy = IVTy;
    Type *TruncInstTy = nullptr;
    if (auto *TruncI = WideIV->getTruncInst()) {
      TruncInstTy = TruncI->getType();
      ResultTy = TruncInstTy;
    }

```

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:188
+; DBG-NEXT:     FIRST-ORDER-RECURRENCE-PHI ir<%for> = phi ir<0>, vp<[[SCALAR_STEPS:.+]]>
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>
----------------
The reason for having DERIVED-IV with canonical start ir<0> and step ir<1> is because of type expansion and/or truncation? Perhaps worth dumping the distinct types, as this operation is effectively a cast.

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:189
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>
+; DBG-NEXT:     EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%for> vp<[[SCALAR_STEPS]]>
----------------
nit: check the second (ir<1>) operand of SCALAR-STEPS, for completeness?

It is in practice either +1 or -1 ...

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:73
 ; DBG-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
-; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<false>, ir<true>
-; DBG-NEXT:     vp<[[STEPS2:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<false>, ir<true>
+; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[DERIVED_IV]]>
----------------
fhahn wrote:
> Ayal wrote:
> > While we're here, `ir<false>`, `ir<true>` seem odd (and even ;-)) 
> I guess that's because they are boolean values and the IR get printed as boolean literals by the IR printer. Do you think this is something that should be changed?
Oh well, the input IR is adding and subtracting true and false...

The reason for having DERIVED-IV with canonical start `ir<false>` ==0 and step `ir<true>` ==1 is because of type expansion and/or truncation?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133758/new/

https://reviews.llvm.org/D133758