[PATCH] D99750: [LV, VP]VP intrinsics support for the Loop Vectorizer

Thu Sep 7 15:10:19 PDT 2023

fhahn added a comment.

Thanks for the latest update! A few more comments inline and it looks like there are a few older comments that haven't been addressed yet.

Overall it looks like we are converging on a good initial version. As for modeling EVL, IMO having the EVL VPInstruction set the active vector length for the remainder of the region would be OK to start with,  but it would be good to wait and hear @Ayal's thoughts as well.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1801
+  /// Control whether to generate VP intrinsics in vectorized code.
+  bool PreferVPIntrinsics = false;
+
----------------
This is only for using EVL for now, would be good to clarify name and comment.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8112
   // When not folding the tail, use nullptr to model all-true mask.
-  if (!CM.foldTailByMasking()) {
+  if (!CM.foldTailByMasking() || CM.useVPIVectorization()) {
     BlockMaskCache[Header] = nullptr;
----------------
Better to replace the mask together with introducing EVL to make sure EVL gets added when the mask gets removed?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:277
+    cl::Hidden,
+    cl::desc("When vectorizing with tail-folding, generate vector predication "
+             "intrinsics."),
----------------
Ayal wrote:
> Description talks about tail-folding, name of variable and command-line argument do not.
Looks like the description still needs updating?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1646
+  /// Returns true if VP intrinsics should be generated in the tail folded loop.
+  bool preferVPIntrinsics() const {
+    return foldTailByMasking() && PreferVPIntrinsics;
----------------
Ayal wrote:
> hiraditya wrote:
> > nit: IMO the function name doesn't imply the semantics precisely.
> > 
> > nit: reorder to aviod call if possible
> > ```cpp
> > PreferVPIntrinsics && foldTailByMasking();
> > ```
> +1
> 
> The acronym `VP` is overloaded here and mainly short for "VPlan", as in VPInstruction.
> Perhaps `VPI` would be clearer, short for VP Intrinsics.
> Seems to mean "foldTailByEVL()" or "foldTailByVPI()"?
Agreed, it should be possible to use VP intrinsics without EVL, so it would be good to have the name reflect that this is focused about introducing EVL? Same for the user-exposed options.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:832
+      continue;
+    auto *NewInst =
+        new VPInstruction(VPInstruction::ExplicitVectorLengthIVIncrement,
----------------
I think turning the step of the canonical induction non-loop-invariant technically turns the canonical IV into a phi that's not a canonical IV any more (which is guaranteed to step the same amount each iteration). Would it work to keep the increment unchanged and keep rounding up the trip count was with regular tail folding initially? Further down the line, the canonical IV issue may be resolved by also replacing the canonical IV node with a regular scalar phi when doing the replacement here.

================
Comment at: llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll:428
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
----------------
one more test that would probably be good to have is one with a conditional memory operation, which needs EVL and the block mask, right?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99750/new/

https://reviews.llvm.org/D99750