[PATCH] D125301: [LoopVectorize] Add option to use active lane mask for loop control flow
Paul Walker via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 28 10:55:20 PDT 2022
paulwalker-arm added a comment.
Out of interest are the test output produced manually or via an update script. I ask because I cannot see any of the normal headers and am wondering why.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8635-8637
// Add a VPCanonicalIVPHIRecipe starting at 0 to the header, a
// CanonicalIVIncrement{NUW} VPInstruction to increment it by VF * UF and a
// BranchOnCount VPInstruction to the latch.
----------------
This comment is now out of date. Perhaps it's best to replace it with something more generic and have the implementation specific part nearer the code?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8673-8674
+ VPValue *TC = Plan.getOrCreateTripCount();
+ auto *EntryALM = new VPInstruction(VPInstruction::ActiveLaneMask,
+ {CanonicalIVIncrementParts, TC}, DL);
+ Preheader->appendRecipe(EntryALM);
----------------
Please ignore if too awkward but I think the output IR will be more readable if we have tighter control over the name used for `EntryALM` and `ALM`. Currently the output is just littered with `active.lane.mask###` making it harder to understand the data flow.
If the entry masks were called `active.lane.mask.entry###`, next iteration masks called `active.lane.mask.next###` and the PHIs keeping their current `active.lane.mask###` name, the loops would be easier to read.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8697-8698
+ // to the exit block.
+ auto *NotMask = new VPInstruction(VPInstruction::Not, ALM, DL);
+ EB->appendRecipe(NotMask);
+
----------------
This seems a bit wasteful given it's only the first lane we care about. perhaps `VPInstruction::BranchOnCount` could take an `invert` flag that switches the branch destinations?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:795-797
+ Value *Step = createStepForVF(Builder, IV->getType(), State.VF, Part);
+ Value *Next = Builder.CreateAdd(IV, Step, "index.part.next", IsNUW, false);
+ State.set(this, Next, Part);
----------------
I know it'll get folded away by later passes but would it be wrong to handling the `Part == 0` case here? so that we don't litter the loop vectorize tests with pointless adds of zero. I see `CanonicalIVIncrement` doing similar for the non zero parts so I'll take that as precedent for doing such common sense early optimisation.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll:7
; CHECK: vector.body:
+; CHECK: %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1> [ %{{.*}}, %vector.ph ], [ %[[ACTIVE_LANE_MASK2:.*]], %vector.body ]
; CHECK: %[[VEC_PHI:.*]] = phi <vscale x 4 x i32> [ zeroinitializer, %vector.ph ], [ %[[PREDPHI:.*]], %vector.body ]
----------------
Given the structure has changed, is it worth showing how the entry active lane mask is constructed?
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll:4
; CHECK-NOT: vector.body:
----------------
What's this protecting?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D125301/new/
https://reviews.llvm.org/D125301
More information about the llvm-commits
mailing list