[PATCH] D133758: [VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.

Sun Nov 27 14:03:51 PST 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9573
+  if (Step->getType() != BaseIV->getType())
+    Step = State.Builder.CreateTrunc(Step, BaseIV->getType());

----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > This truncation of Step is needed only if fed directly from CanonicalIV, because the Step produced by DeriveIV should have the desired (value and) type, including a truncation at the end if needed, right?
> > > 
> > > Alternatively, introduce a DeriveIV recipe also if only truncation of Step is needed, so as not to sink it into triangles? (Or have ScalarIV recipe take care of TruncToTy always, also for derived steps, relieving DeriveIV of doing so, though this would sink into triangles.)
> > at the moment, a derived IV recipe is created if only a truncate is needed, but the derived IV will only be truncated at the end, so it uses the wide step whereas for the steps recipe we need the truncated step. 
> > 
> > I think to unify this we would have to compute the derived IV with truncated steps, but that would be a bigger, unrelated change,
> Perhaps worth leaving a note behind about said bigger unrelated change.
Added to `buildScalarSteps`.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2348
   assert(ScalarIVTy == Step->getType() &&
          "Val and Step should have the same type");

----------------
Ayal wrote:
> Above assert is now redundant.
> 
> Can hoist the comment above and rephrase it, e.g., "Ensure scalar IV and step have the same integer type", or rather "Ensure step has the same type as that of scalar IV"?
Updated, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9550
+  DerivedIV->setName("offset.idx");
+  if (TruncToTy && TruncToTy != DerivedIV->getType()) {
+    assert(Step->getType()->isIntegerTy() &&
----------------
Ayal wrote:
> nit: suffice to ask if TruncToTy != DerivedIV->getType() as the latter is never null?
Adjusted, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9551
+  if (TruncToTy && TruncToTy != DerivedIV->getType()) {
+    assert(Step->getType()->isIntegerTy() &&
+           "Truncation requires an integer step");
----------------
Ayal wrote:
> assert(Step->getType()->isIntegerTy()) belongs earlier if still needed here at all?
I might be missing something, but I think before the patch we also only had this assert for the case we need to truncate, as this can only be done for integer types. The induction could also be a floating point IV in general here I think.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9521
+  Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
+  Value *TransformedIV =
+      Ty->isIntegerTy()
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > fhahn wrote:
> > > > Ayal wrote:
> > > > > Some things seem a bit confusing here, looking at the existing code:
> > > > > 
> > > > > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > > > > 
> > > > > The original code has both `ScalarIV` and `CanonicalIV` - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> > > > > 
> > > > > Now `TransformedIV` is also "Scalar" (as in non-Vector) similar to `ScalarIV`.
> > > > > 
> > > > > Perhaps instead of
> > > > > `Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > > > > we should have
> > > > > `Value *CanonicalIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > > > > ?
> > > > > 
> > > > > Perhaps instead of `TransformedIV` have `NonCanonicalIV`, `AffineIV` or `DerivedIV` - considering that the canonical IV is aka a "BasicIV"?
> > > > > 
> > > > > Then rename `VPTransformedIVRecipe` accordingly?
> > > > > 
> > > > > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > > > > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > > > 
> > > > Yes we could also use `VPlan->getCanonicalIV()`, but it might be easier to follow if modeled explicitly?
> > > > 
> > > > > The original code has both ScalarIV and CanonicalIV - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> > > > 
> > > > 
> > > > Yep, that should be cleaner in the new code.
> > > > 
> > > > > Perhaps instead of
> > > > 
> > > > Thanks, I updated the naming to use `CanonicalIV` and `DerivedIV`. I also renamed `VPTransformedIVRecipe` -> `VPDerivedIVRecipe`
> > > > 
> > > > > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > > > 
> > > > Good idea, I'll see about that separately.
> > > >> VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > > 
> > > > Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?
> > > 
> > > Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)
> > > 
> > > >> Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > > 
> > > > Good idea, I'll see about that separately.
> > > 
> > > Found a good place?
> > Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)
> > 
> > I think `emitTransformedIndex` needs the canonical IV.
> > 
> > > Found a good place?
> > 
> > I put up D138748 to add it to the VPHeaderPHIRecipe documentation.
> > Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)
> > 
> > I think `emitTransformedIndex` needs the canonical IV.
> 
> Hmm, `emitTransformedIndex` just needs an "Index", i.e., an "IV" or "BasicIV", regardless if it is "the canonical" IV.
> 
> 
Right, but VPDerivedIVRecipe will always use the canonical IV, at least to start with IIUC.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1968
+class VPDerivedIVRecipe : public VPRecipeBase, public VPValue {
+  /// If not nullptr, truncate the generated values to TruncToTy.
+  Type *TruncToTy;
----------------
Ayal wrote:
> nit: it seems TruncToTy is always non-null, can simplify later check?
Adjusted, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1976
+  VPDerivedIVRecipe(const InductionDescriptor &IndDesc, VPValue *Start,
+                    VPValue *CanonicalIV, VPValue *Step, Type *TruncToTy)
+      : VPRecipeBase(VPDerivedIVSC, {Start, CanonicalIV, Step}),
----------------
Ayal wrote:
> CanonicalIV >> BaseIV or IV? I.e., any Index that has a value per iteration rather than per-part/per-lane, regardless if it is "the canonical" IV?
> 
> TruncToTy >> ResultTy? I.e., always specifies the type of the result, never nullptr? 
> CanonicalIV >> BaseIV or IV? I.e., any Index that has a value per iteration rather than per-part/per-lane, regardless if it is "the canonical" IV?

I kept it as CanonicalIV for now, as this is what all current clients use (also updated the constructor to require `VPCanonicalIVPHIRecipe`. I can change it if you would prefer, but can also do that if we ever lift the restriction.

> TruncToTy >> ResultTy? I.e., always specifies the type of the result, never nullptr?

Updated, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1172
   /// Returns the incoming value from the loop backedge.
-  VPValue *getBackedgeValue() {
-    return getOperand(1);
-  }
+  VPValue *getBackedgeValue() { return getOperand(1); }

----------------
Ayal wrote:
> nit: inlining can be applied separately.
I think it was already inlined in current `main`, it just got re-formatted here. Undid the formatting change.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1875
+  /// Check if the induction described by \p ID is canonical, i.e. has a step of
+  /// 1 and  the same type and start as the canonical IV.
+  bool isCanonical(const InductionDescriptor &ID) const;
----------------
Ayal wrote:
> Ayal wrote:
> > Ayal wrote:
> > > nit: extra space "and  the"
> > > 
> > > step is also the same (1), as type and start; or does the original canonical unit step become UF * VF?
> > I mean, would the following rephrasing be better:
> > `/// i.e., has the same start, step (of 1), and type as the canonical IV.`
> > ?
> > 
> > 
> above nit worth addressing?
Missed those earlier, should be adjusted, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1056
+                                         Type *Ty) const {
+  if (Ty != getScalarType() || ID.getInductionOpcode() != Instruction::Add)
+    return false;
----------------
Ayal wrote:
> > turns out the assertion triggered in some tests, added as an extra condition.
> 
> Good! Worth a comment?
Moved the check to the return and move the part about incrementing it by one there.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1058
+    return false;
+  // The start value of the steps-recipe must match the start value of the
+  // canonical induction and it must step by 1.
----------------
Ayal wrote:
> steps-recipe?
Adjusted, thanks!

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll:22
 ; VPLANS-NEXT:     ACTIVE-LANE-MASK-PHI vp<%5> = phi vp<%3>, vp<%10>
-; VPLANS-NEXT:     vp<%6>    = SCALAR-STEPS vp<%4>, ir<0>, ir<1>
+; VPLANS-NEXT:     vp<%6>    = SCALAR-STEPS vp<%4>
 ; VPLANS-NEXT:     CLONE ir<%gep> = getelementptr ir<%ptr>, vp<%6>
----------------
Ayal wrote:
> Ayal wrote:
> > This does appear less descriptive than having SCALAR-STEPS depict the step it uses?
> Ah, the step it uses is explicit - it's ir<1>! What has been omitted is "start" (ir<0>), which is moved from ScalarIVSteps to DerivedIV, if needed.
Ah I missed this comment earlier, Yes, DerivedIV will now take care of adjusting the start value, ScalarIVSteps just generate steps.

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:73
 ; DBG-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
-; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<false>, ir<true>
-; DBG-NEXT:     vp<[[STEPS2:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<false>, ir<true>
+; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[DERIVED_IV]]>
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > fhahn wrote:
> > > > Ayal wrote:
> > > > > While we're here, `ir<false>`, `ir<true>` seem odd (and even ;-)) 
> > > > I guess that's because they are boolean values and the IR get printed as boolean literals by the IR printer. Do you think this is something that should be changed?
> > > Oh well, the input IR is adding and subtracting true and false...
> > > 
> > > The reason for having DERIVED-IV with canonical start `ir<false>` ==0 and step `ir<true>` ==1 is because of type expansion and/or truncation?
> > Here the inputs are already truncated to `i1` before recipe construction so the recipe doesn't truncated itself.
> Trying to clarify why DERIVED-IV is needed at all here (too), given that it starts at 0 (false) and bumps with a step of 1 (true)?
I think `DERIVED-IV` here is for `%d = phi i1 ...`, which has a different type than the canonical IV, but doesn't itself need truncating because the operands are already `i1`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133758/new/

https://reviews.llvm.org/D133758