[PATCH] D116123: [VPlan] Handle IV vector splat as VPInstruction.
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 27 08:01:01 PST 2021
fhahn added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2729
// Create the vector values from the scalar IV, in the absence of creating a
// vector IV.
auto CreateSplatIV = [&](Value *ScalarIV, Value *Step) {
----------------
Ayal wrote:
> CreateSplatIV() will now be called when unrolling only, i.e., when VF==1, so it creates and stores **scalar** values w/o actually **Splat**ting into vectors. Update comment and name of lambda?
I inlined the lambda, as it is a single use now. I also inlined the scalar path from `getStepVector`, which seems like it should have never been there in the first place.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2774
// create the phi node, we will splat the scalar induction variable in each
// loop iteration.
if (!shouldScalarizeInstruction(EntryVal)) {
----------------
Ayal wrote:
> 2nd "If we can't ..." sentence is obsolete?
removed
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2788
// vectorised IV. Except when we tail-fold, then the splat IV feeds the
// predicate used by the masked loads/stores.
Value *ScalarIV = CreateScalarIV(Step);
----------------
Ayal wrote:
> 2nd "Except ..." sentence is obsolete?
removed
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2790
Value *ScalarIV = CreateScalarIV(Step);
- if (!Cost->isScalarEpilogueAllowed())
- CreateSplatIV(ScalarIV, Step);
buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
}
----------------
Ayal wrote:
> This last case which only emits a scalar IV can now be folded with the end of the previous case which emits both?
Done!
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8419
VPValue *IV = nullptr;
- if (Legal->getPrimaryInduction())
- IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
- else {
+ if (!Legal->getPrimaryInduction()) {
VPBasicBlock *HeaderVPBB = Plan->getEntry()->getEntryBasicBlock();
----------------
Ayal wrote:
> Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
>
> At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).
> Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8432
VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();
bool TailFolded = !CM.isScalarEpilogueAllowed();
----------------
Ayal wrote:
> TailFolded should always be true? (While we're here, irrespective of this patch)
Yes, I noticed that as well. I'll push a commit, replacing it with an assert.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8442
+ IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
+ }
BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV});
----------------
Ayal wrote:
> No need for {}
>
> So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
>
> BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
> No need for {}
I think if the body spans multiple lines due to a comment, braces should not be omitted
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
> So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
Yes, let me try to split that off. `ActiveLaneMask` only needs the first scalar step of each `Part`.
> BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
done separately in 2e630eabd329
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8454
+ auto *R =
+ new VPInstruction(VPInstruction::StepVector, {CanIV, OneVPV});
+ Builder.getInsertBlock()->insert(R, NewInsertionPoint);
----------------
Ayal wrote:
> StepVector needs to hold One as its second operand, mainly to convey the desired type?
At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8849
+ return VF.isVector() && ShouldScalarizeInstruction(Instr, VF) &&
+ NeedsScalarInduction(VF);
+ },
----------------
Ayal wrote:
> NeedsScalarInduction(VF) will always early-return true given that ShouldScalarizeInstruction(Instr, VF) is true?
I think it is required for the case where `Instr` is a truncate.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9175
+ // Replace StepVector recipes of the canonical induction by the widened vector
+ // induction, if it exists.
+ if (auto *CanonicalStepVector =
----------------
Ayal wrote:
> As a redundancy elimination optimization?
> This buildVPlanWithVPRecipes() method deserves outlining anything from it.
Sounds good, moved to ` VPlanTransforms::removeRedundantStepVector`
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:741
+ Value *Step = State.get(getOperand(1), VPIteration(0, 0));
+ Value *Val = State.get(getOperand(0), VPIteration(0, 0));
+ Value *Broadcasted =
----------------
Ayal wrote:
> get operands in order?
> Store singleton scalar values per part 0, instead of per lane 0?
Changed to get `VPIteration(Part, 0)`. I kept accessing `Val` outside the loop, to keep a single broadcast. But there's now an assert.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:746
+ for (unsigned Part = 0; Part < State.UF; ++Part) {
+ assert(!State.VF.isScalable() && "scalable vectors not yet supported.");
+ Value *StartIdx;
----------------
Ayal wrote:
> assert at the outset? (While we're here, irrespective of this patch)
Move out of the loop and will push a fix separately.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:856
+ }
+
unsigned getOpcode() const { return Opcode; }
----------------
Ayal wrote:
> Where are these new classof's needed?
Unfortunately this is needed to `dyn_cast` directly from VPUser -> VPInstruction
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D116123/new/
https://reviews.llvm.org/D116123
More information about the llvm-commits
mailing list