[PATCH] D158779: [VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.

Mon Sep 25 05:32:35 PDT 2023

fhahn marked 8 inline comments as done.
fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:862
+// the loop terminator with a branch-on-cond recipe with the negated
+// active-lane-mask as operand. Only the existing terminator is replaced, all
+// other existing recipes/users remain unchanged. Return the created
----------------
Ayal wrote:
> (note that replacing branch-on-count with branch-on-cond(!ALM) effectively turns a countable loop into an uncountable one, yet VectorTripCount remains intact.)
Added a note for now.
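To illustrate Ayal's point, here is a minimal C++ sketch, not LLVM code, of a tail-folded loop whose latch is branch-on-cond(!ALM). The helper names `activeLaneMask` and `countVectorIterations` are hypothetical. The sketch assumes the mask follows `llvm.get.active.lane.mask` semantics (lane I is active iff Base + I < N, compared without wrapping), and that the loop is only entered when N >= 1. The exit is decided by lane 0 of the next iteration's mask, not by comparing the IV against a vector trip count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of llvm.get.active.lane.mask(Base, N):
// lane I is active iff Base + I < N, evaluated without unsigned wrap.
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t N, unsigned VF) {
  assert(VF > 0 && "expected a non-zero vectorization factor");
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = Base < N && I < N - Base;
  return Mask;
}

// Hypothetical: counts vector-loop iterations when the latch is
// branch-on-cond(!ALM), i.e. the loop exits once lane 0 of the mask for
// the next iteration is inactive. Assumes N >= 1 (loop guarded on entry).
uint64_t countVectorIterations(uint64_t N, unsigned VF) {
  uint64_t IV = 0, Iterations = 0;
  std::vector<bool> Mask = activeLaneMask(0, N, VF); // entry mask (vector.ph)
  while (true) {
    ++Iterations;                     // body runs under the current Mask
    IV += VF;                         // canonical IV increment
    Mask = activeLaneMask(IV, N, VF); // mask for the next iteration
    if (!Mask[0])                     // branch-on-cond(!ALM): exit
      break;
  }
  return Iterations;
}
```

For N = 10 and VF = 4 this runs 3 iterations, with the last one executing under the mask {1, 1, 0, 0}; no vector trip count is consulted to leave the loop.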

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:863
+// active-lane-mask as operand. Only the existing terminator is replaced, all
+// other existing recipes/users remain unchanged. Return the created
+// VPActiveLaneMaskPHIRecipe.
----------------
Ayal wrote:
> (plus some poison droppings, to be precise.)
Added, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:866
+//
+// The function adds the following recipes
+//
----------------
Ayal wrote:
> 
Adjusted and moved the `add` part below.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:873
+//  %StartV is the canonical induction start value.
+//
+// vector.ph:
----------------
Ayal wrote:
> // The function adds the following recipes:
Added, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:874
+//
+// vector.ph:
+//   %EntryInc = canonical-iv-increment-for-part %StartV
----------------
Ayal wrote:
> //    %TripCount = calculate-trip-count-minus-VF (original TC) [if DataWithControlFlowWithoutRuntimeCheck]
Added, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:896
+      cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
+  // TODO: Check if dropping the flags is needed in if
+  // !DataAndControlFlowWithoutRuntimeCheck.
----------------
Ayal wrote:
> 
Fixed, thanks!

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:418
+; CHECK-ORDERED-TF-NEXT:    [[TMP23:%.*]] = icmp ugt i64 [[N]], [[TMP21]]
+; CHECK-ORDERED-TF-NEXT:    [[TMP24:%.*]] = select i1 [[TMP23]], i64 [[TMP22]], i64 0
+; CHECK-ORDERED-TF-NEXT:    [[TMP25:%.*]] = call i64 @llvm.vscale.i64()
----------------
Ayal wrote:
> nit (independent of this patch): suffice to generate this saturating subtraction of N - vscale*32 once instead of replicating it four times.
Yes, this should hopefully be cleaned up by explicit unrolling, but to start with I could check whether we can adjust this during `execute`, as is done in other places.
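The quoted CHECK lines (`icmp ugt` followed by `select ..., i64 0`) implement a saturating subtraction. Below is a minimal C++ sketch, not LLVM code, of that computation and of the hoisting Ayal suggests. It assumes that `Step` stands for the vscale*32 value mentioned in the comment, and that `[[TMP22]]`, which is not shown in the quoted lines, is the plain difference N - Step. The helper names `tripCountMinusStep` and `perPartBounds` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models: %cmp = icmp ugt i64 %N, %Step
//         %tc  = select i1 %cmp, i64 (%N - %Step), i64 0
// i.e. max(N - Step, 0) in unsigned arithmetic, without wrapping.
uint64_t tripCountMinusStep(uint64_t N, uint64_t Step) {
  return N > Step ? N - Step : 0;
}

// Hypothetical: the bound used by each of the UF per-part masks. The value
// is loop-invariant and identical for every part, so it can be computed
// once and reused instead of being emitted UF times.
std::vector<uint64_t> perPartBounds(uint64_t N, uint64_t Step, unsigned UF) {
  assert(UF > 0 && "expected at least one part");
  const uint64_t TC = tripCountMinusStep(N, Step); // computed once
  return std::vector<uint64_t>(UF, TC);            // shared by all parts
}
```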

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158779/new/

https://reviews.llvm.org/D158779