[PATCH] D158779: [VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.

Sun Sep 24 03:51:17 PDT 2023

Ayal accepted this revision.
Ayal added a comment.
This revision is now accepted and ready to land.

This looks good to me, adding some final thoughts and comments, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:140

+  VPInstruction *createOverflowingOp(unsigned Opcode,
+                                     std::initializer_list<VPValue *> Operands,
----------------
fhahn wrote:
> Ayal wrote:
> > All callers of createOverflowingOp() below pass {false, false} as their WrapFlags, i.e., both HasNUW and HasNSW are set to false, which is the default. Suffice to call createNaryOp() instead?
> The current flag-handling code assumes that flags are only queried when the correct `OperationType` is set. Using `createNaryOp` would not set the operation type, so during recipe executing an assert would be triggered when `hasNUW` is called. The main motivation for strict checking is to ensure consistent handling. It effectively enforces that hasNUW is only used on recipes/opcodes that have associated flags, which we would have to weaken, either by having `hasNUW` return false on mismatch or adding extra checks for the operation type. The current handling is similar to how it is handled in LLVM IR.
Yes, OperationType needs to be set correctly for any VPRecipeWithIRFlags, and to OverflowingBinOp for any VPInstruction with CanonicalIVIncrementForPart as its opcode. This could presumably be handled by createNaryOp/createInstruction/VPInstruction-constructor, possibly with a protected setOperationType if needed.
OTOH, if non-default wrap flags are desired, such a designated creation method would be needed.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:862
+// the loop terminator with a branch-on-cond recipe with the negated
+// active-lane-mask as operand. Only the existing terminator is replaced, all
+// other existing recipes/users remain unchanged. Return the created
----------------
(note that replacing branch-on-count with branch-on-cond(!ALM) effectively turns a countable loop into an uncountable one, yet VectorTripCount remains intact.)

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:863
+// active-lane-mask as operand. Only the existing terminator is replaced, all
+// other existing recipes/users remain unchanged. Return the created
+// VPActiveLaneMaskPHIRecipe.
----------------
(plus some poison droppings, to be percise.)

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:866
+//
+// The function adds the following recipes
+//
----------------

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:873
+//  %StartV is the canonical induction start value.
+//
+// vector.ph:
----------------
// The function adds the following recipes:

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:874
+//
+// vector.ph:
+//   %EntryInc = canonical-iv-increment-for-part %StartV
----------------
//    %TripCount = calculate-trip-count-minus-VF (original TC) [if DataWithControlFlowWithoutRuntimeCheck]

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:896
+      cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
+  // TODO: Check if dropping the flags is needed in if
+  // !DataAndControlFlowWithoutRuntimeCheck.
----------------

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:941
+                   [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); });
+  auto *WideCanonicalIV = R ? cast<VPWidenCanonicalIVRecipe>(*R) : nullptr;
+  VPRecipeBase *LaneMask;
----------------
fhahn wrote:
> Ayal wrote:
> > Look for a user U that is a VPWidenCanonicalIVRecipe and resides in the header?
> > (There could also be multiple such U's....)
> > As we're considering multiple ICMP_ULE's, they could be using multiple R's, and both could spread out of the header.
> > 
> > Or simply create a WideCanonicalIV before the first non-phi of the header.
> Added an assert that it is in the header. All compares are users of the WideCanonicalIV, the created ALM should be placed to be valid for all of them. Might be good to ensure there's a single VPWidenCanonicalIVRecipe in the future as part of the verifier?
> Might be good to ensure there's a single VPWidenCanonicalIVRecipe in the future as part of the verifier?
+1

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:418
+; CHECK-ORDERED-TF-NEXT:    [[TMP23:%.*]] = icmp ugt i64 [[N]], [[TMP21]]
+; CHECK-ORDERED-TF-NEXT:    [[TMP24:%.*]] = select i1 [[TMP23]], i64 [[TMP22]], i64 0
+; CHECK-ORDERED-TF-NEXT:    [[TMP25:%.*]] = call i64 @llvm.vscale.i64()
----------------
nit (independent of this patch): suffice to generate this saturating subtraction of N - vscale*32 once instead of replicating it four times.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158779/new/

https://reviews.llvm.org/D158779