[PATCH] D125301: [LoopVectorize] Add option to use active lane mask for loop control flow

Tue Jun 28 10:55:20 PDT 2022

paulwalker-arm added a comment.

Out of interest are the test output produced manually or via an update script.  I ask because I cannot see any of the normal headers and am wondering why.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8635-8637
 // Add a VPCanonicalIVPHIRecipe starting at 0 to the header, a
 // CanonicalIVIncrement{NUW} VPInstruction to increment it by VF * UF and a
 // BranchOnCount VPInstruction to the latch.
----------------
This comment is now out of date.  Perhaps it's best to replace it with something more generic and have the implementation specific part nearer the code?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8673-8674
+    VPValue *TC = Plan.getOrCreateTripCount();
+    auto *EntryALM = new VPInstruction(VPInstruction::ActiveLaneMask,
+                                       {CanonicalIVIncrementParts, TC}, DL);
+    Preheader->appendRecipe(EntryALM);
----------------
Please ignore if too awkward but I think the output IR will be more readable if we have tighter control over the name used for `EntryALM` and `ALM`.  Currently the output is just littered with `active.lane.mask###` making it harder to understand the data flow.

If the entry masks were called `active.lane.mask.entry###`, next iteration masks called `active.lane.mask.next###` and the PHIs keeping their current `active.lane.mask###` name, the loops would be easier to read.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8697-8698
+    // to the exit block.
+    auto *NotMask = new VPInstruction(VPInstruction::Not, ALM, DL);
+    EB->appendRecipe(NotMask);
+
----------------
This seems a bit wasteful given it's only the first lane we care about.  perhaps `VPInstruction::BranchOnCount` could take an `invert` flag that switches the branch destinations?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:795-797
+    Value *Step = createStepForVF(Builder, IV->getType(), State.VF, Part);
+    Value *Next = Builder.CreateAdd(IV, Step, "index.part.next", IsNUW, false);
+    State.set(this, Next, Part);
----------------
I know it'll get folded away by later passes but would it be wrong to handling the `Part == 0` case here? so that we don't litter the loop vectorize tests with pointless adds of zero. I see `CanonicalIVIncrement` doing similar for the non zero parts so I'll take that as precedent for doing such common sense early optimisation.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll:7
 ; CHECK: vector.body:
+; CHECK:      %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1> [ %{{.*}}, %vector.ph ], [ %[[ACTIVE_LANE_MASK2:.*]], %vector.body ]
 ; CHECK:      %[[VEC_PHI:.*]] = phi <vscale x 4 x i32> [ zeroinitializer, %vector.ph ], [ %[[PREDPHI:.*]], %vector.body ]
----------------
Given the structure has changed, is it worth showing how the entry active lane mask is constructed?

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll:4

 ; CHECK-NOT: vector.body:

----------------
What's this protecting?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125301/new/

https://reviews.llvm.org/D125301