[PATCH] D133758: [VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.

Sun Nov 27 15:58:51 PST 2022

Ayal accepted this revision.
Ayal added a comment.
This revision is now accepted and ready to land.

This looks good to me, ship it!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2343
+    // TODO: Also use VPDerivedIVRecipe when only the step needs truncating, to
+    // avoid separate truncate here.
     assert(Step->getType()->isIntegerTy() &&
----------------
Is this TODO an NFC to simplify the code, w/o affecting the generated code?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9551
+  if (TruncToTy && TruncToTy != DerivedIV->getType()) {
+    assert(Step->getType()->isIntegerTy() &&
+           "Truncation requires an integer step");
----------------
fhahn wrote:
> Ayal wrote:
> > assert(Step->getType()->isIntegerTy()) belongs earlier if still needed here at all?
> I might be missing something, but I think before the patch we also only had this assert for the case we need to truncate, as this can only be done for integer types. The induction could also be a floating point IV in general here I think.
Ahh, sorry, agreed. (Confused by same assert-guarding-trunc in buildScalarSteps() and thought one could check if ResultTy differs from Step->getType() instead of DerivedIV->getType() thereby asserting earlier before emitTransformedIndex(). But current code is fine.)
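
For illustration only (not LLVM code and not part of the patch), the guard discussed above can be sketched in standalone C++. The `Ty` enum, `isIntegerTy()` and `needsTrunc()` are hypothetical stand-ins, with `Ty::None` playing the role of a null `TruncToTy`. The integer assert sits inside the truncation branch, so a floating-point IV that needs no truncation never reaches it.

```cpp
#include <cassert>

// Hypothetical type tags standing in for llvm::Type; None models a null TruncToTy.
enum class Ty { None, I32, I64, Float };

constexpr bool isIntegerTy(Ty T) { return T == Ty::I32 || T == Ty::I64; }

// Returns true when a truncate would be emitted for the derived IV.
// The integer-step assert is only evaluated on the truncating path.
constexpr bool needsTrunc(Ty DerivedIVTy, Ty StepTy, Ty TruncToTy) {
  if (TruncToTy != Ty::None && TruncToTy != DerivedIVTy) {
    assert(isIntegerTy(StepTy) && "Truncation requires an integer step");
    return true;
  }
  return false;
}
```

With these stand-ins, `needsTrunc(Ty::Float, Ty::Float, Ty::None)` returns false without touching the assert, while `needsTrunc(Ty::I64, Ty::I64, Ty::I32)` takes the truncating path.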

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9521
+  Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
+  Value *TransformedIV =
+      Ty->isIntegerTy()
----------------
fhahn wrote:
> Ayal wrote:
> > fhahn wrote:
> > > Ayal wrote:
> > > > fhahn wrote:
> > > > > Ayal wrote:
> > > > > > Some things seem a bit confusing here, looking at the existing code:
> > > > > > 
> > > > > > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > > > > > 
> > > > > > The original code has both `ScalarIV` and `CanonicalIV` - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> > > > > > 
> > > > > > Now `TransformedIV` is also "Scalar" (as in non-Vector) similar to `ScalarIV`.
> > > > > > 
> > > > > > Perhaps instead of
> > > > > > `Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > > > > > we should have
> > > > > > `Value *CanonicalIV = State.get(getCanonicalIV(), VPIteration(0, 0));`
> > > > > > ?
> > > > > > 
> > > > > > Perhaps instead of `TransformedIV` have `NonCanonicalIV`, `AffineIV` or `DerivedIV` - considering that the canonical IV is aka a "BasicIV"?
> > > > > > 
> > > > > > Then rename `VPTransformedIVRecipe` accordingly?
> > > > > > 
> > > > > > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > > > > 
> > > > > > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > > > > 
> > > > > Yes we could also use `VPlan->getCanonicalIV()`, but it might be easier to follow if modeled explicitly?
> > > > > 
> > > > > > The original code has both ScalarIV and CanonicalIV - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?
> > > > > 
> > > > > 
> > > > > Yep, that should be cleaner in the new code.
> > > > > 
> > > > > > Perhaps instead of
> > > > > 
> > > > > Thanks, I updated the naming to use `CanonicalIV` and `DerivedIV`. I also renamed `VPTransformedIVRecipe` -> `VPDerivedIVRecipe`
> > > > > 
> > > > > > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > > > > 
> > > > > Good idea, I'll see about that separately.
> > > > > > VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?
> > > > 
> > > > > Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?
> > > > 
> > > > Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant whether it's canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)
> > > > 
> > > > > > Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.
> > > > 
> > > > > Good idea, I'll see about that separately.
> > > > 
> > > > Found a good place?
> > > > Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant whether it's canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)
> > > 
> > > I think `emitTransformedIndex` needs the canonical IV.
> > > 
> > > > Found a good place?
> > > 
> > > I put up D138748 to add it to the VPHeaderPHIRecipe documentation.
> > 
> > Hmm, `emitTransformedIndex` just needs an "Index", i.e., an "IV" or "BasicIV", regardless of whether it is "the canonical" IV.
> > 
> > 
> Right, but VPDerivedIVRecipe will always use the canonical IV, at least to start with IIUC.
Agreed, VPDerivedIVRecipe is (currently) always fed the canonical IV, but it can compute a derived IV given any BaseIV, i.e., does not rely on it being The Canonical IV.
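
To illustrate the point about not relying on The Canonical IV, here is a minimal standalone sketch (plain C++, not LLVM code and not part of the patch). It assumes the integer case computes Start + Index * Step with wrapping arithmetic; `deriveIV` and its parameter names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of the integer case: Derived = Start + BaseIV * Step.
// Unsigned arithmetic is used so overflow wraps instead of being undefined.
// Nothing here requires BaseIV to start at 0 or to step by 1.
constexpr std::uint64_t deriveIV(std::uint64_t BaseIV, std::uint64_t Start,
                                 std::uint64_t Step) {
  return Start + BaseIV * Step;
}
```

Feeding it a canonical IV value or the result of another `deriveIV` call works the same way, e.g. `deriveIV(deriveIV(2, 0, 2), 5, 1)` evaluates to 9.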

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:659
            "the canonical IV should only be used by its increments or "
            "ScalarIVSteps when "
            "resetting the start value");
----------------
fhahn wrote:
> Ayal wrote:
> > ... or DerivedIV ...
> added, thanks
Worth adding DerivedIV to the error message as well.

================
Comment at: llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll:73
 ; DBG-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
-; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<false>, ir<true>
-; DBG-NEXT:     vp<[[STEPS2:%.+]]>    = SCALAR-STEPS vp<[[CAN_IV]]>, ir<0>, ir<1>
+; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV vp<[[CAN_IV]]>, ir<false>, ir<true>
+; DBG-NEXT:     vp<[[STEPS1:%.+]]>    = SCALAR-STEPS vp<[[DERIVED_IV]]>
----------------
fhahn wrote:
> Ayal wrote:
> > fhahn wrote:
> > > Ayal wrote:
> > > > fhahn wrote:
> > > > > Ayal wrote:
> > > > > > While we're here, `ir<false>`, `ir<true>` seem odd (and even ;-)) 
> > > > > I guess that's because they are boolean values, which the IR printer prints as boolean literals. Do you think this is something that should be changed?
> > > > Oh well, the input IR is adding and subtracting true and false...
> > > > 
> > > > The reason for having DERIVED-IV with canonical start `ir<false>` ==0 and step `ir<true>` ==1 is because of type expansion and/or truncation?
> > > Here the inputs are already truncated to `i1` before recipe construction, so the recipe doesn't truncate them itself.
> > Trying to clarify why DERIVED-IV is needed at all here (too), given that it starts at 0 (false) and bumps with a step of 1 (true)?
> I think `DERIVED-IV` here is for `%d = phi i1 ...`, which has a different type than the canonical IV, but doesn't itself need truncating because the operands are already `i1`.
Ah, right; DERIVED-IV truncates CAN_IV to i1 before Mul & Add, which is not dumped-out like the truncation to ResultTy.
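
As a sanity check of that reading, here is a small standalone sketch (plain C++, not LLVM code; `truncToI1` and `derivedIVI1` are hypothetical names). It assumes truncation to i1 keeps the low bit and that i1 Mul and Add wrap modulo 2, so Mul behaves like AND and Add like XOR.

```cpp
#include <cstdint>

// Truncating to i1 keeps only the lowest bit of the wider value.
constexpr bool truncToI1(std::uint64_t V) { return (V & 1u) != 0; }

// Models Start + trunc(CanIV) * Step in 1-bit arithmetic:
// multiplication modulo 2 is AND, addition modulo 2 is XOR (written as !=).
constexpr bool derivedIVI1(std::uint64_t CanIV, bool Start, bool Step) {
  const bool Index = truncToI1(CanIV);
  return Start != (Index && Step);
}
```

With start `false` and step `true`, canonical IV values 0, 1, 2, 3 map to false, true, false, true, which is the alternating sequence an `i1` induction would produce.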

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133758/new/

https://reviews.llvm.org/D133758