[PATCH] D99750: [LV, VP] RFC: VP intrinsics support for the Loop Vectorizer (Proof-of-Concept)

Mon Apr 19 07:56:49 PDT 2021

vkmr added inline comments.

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:1381
+  /// TripCount - Induction).
+  bool useCustomActiveVectorLengthIntrinsic() const;
+
----------------
frasercrmck wrote:
> Part of me wants to say this should be `usesCustomActiveVectorLengthIntrinsic` like "the target `hasActiveVectorLength`" and "the target `supportsScalableVectors`".
> 
> I'm not arguing very strongly though because we have `preferPredicatedReductionSelect`. I know, it's the epitome of bikeshedding. I don't know if it's any better if it's self-aware.
Hmmm...I am in double-mind here too. 
One way to look at it is that with `useCustomActiveVectorLengthIntrinsic` or `preferPredicatedReductionSelect` there is a notion of choice - even if it can use a custom vector length intrinsic it might choose not to; It is being told to pick one choice . Whereas `usesCustomActiveVectorLengthIntrinsic` or `hasActiveVectorLength` implies reporting a fact. Not sure if it makes much sense!

May be others can chime in too. I am not too particular about names as long as they are reasonably sensible.

> I know, it's the epitome of bikeshedding. I don't know if it's any better if it's self-aware.
"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:280
+                   "is not supported. This option is discouraged."),
+        clEnumValN(PreferVPIntrinsicsTy::ForceAVLSupport,
+                   "force-active-vector-length-support",
----------------
bmahjour wrote:
> don't see ForceAVLSupport being used anywhere in this patch.
Not explicitly but implicitly as the default choice if other options do not match. See the test cases for more clarity on how it works and what it results in. 

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:628
+  /// intrinsics.
+  Value *createEVL();
+
----------------
bmahjour wrote:
> Could this be moved to the `VPWidenEVLRecipe` class? 
This is called from the `execute` method of the recipe and is responsible for generating IR instructions to compute `EVL`. Keeping it in  `InnerLoopVectorizer` is consistent with how the widening methods for other recipes are a part of `InnerLoopVectorizer`.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2931
+  if (Reverse)
+    assert(!EVL &&
+           "Vector reverse not supported for predicated vectorization.");
----------------
bmahjour wrote:
> how do we guard against this situation? I think this should be part of the legality check...eg `isLegalMaskedLoad` check for `Legal->isConsecutivePtr(Ptr)`...if we had a function similar to isLegalMaskedLoad  (say isLegalEVLLoad), then it could check for -1 stride and return false...then we'd be guaranteed not to hit this assert.
Do you mean the legality checker should determine if it's a reverse operation then it is illegal to use EVL predicated Load and should default to Masked load (if that is legal)?

Eventually, there will be support for reverse with EVL predication and required legality checks will be added. For this PoC patch however, it is more explicit and straightforward to have an assert here. 

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4969

+void InnerLoopVectorizer::widenPredicatedInstruction(Instruction &I,
+                                                     VPValue *Def, VPUser &User,
----------------
simoll wrote:
> bmahjour wrote:
> > I guess the targets that don't need/have the concept of predicated binary operations, must provide lowering for all these calls to ultimately generate non-predicated vector code. I'd expect that to be a large effort with little gain. To allow the EVL exploitation while the lowering is being provided, would it make sense to have a path in this function where we just fall back to `InnerLoopVectorizer::widenInstruction` under an option?
> D78203 implements lowering from VP intrinsics to regular SIMD instructions for all targets.
> 
> If targets support VP, they should get it. If IR with VP intrinsics ends up getting compiled for a non-VP targets, the intrinsics will disappear before they hit lowering.
To add to @simoll's comment, eventually we will be using the `VectorBuilder` (earlier `VPBuilder`) to widen to VP intrinsics instead of manually creating each intrinsic here. `VectorBuilder` can also make some decisions on when and how to use the intrinsics.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9017
+  if (isa<LoadInst>(Instr) || isa<StoreInst>(Instr)) {
+    if (preferPredicatedWiden())
+      return toVPRecipeResult(tryToPredicatedWidenMemory(Instr, Range, Plan));
----------------
bmahjour wrote:
> not sure if this is already in your todo list...but apart from "preferring" to predicate, we need to check that the predication is legal for the target platform. This should probably be done in `isScalarWithPredication` with calls similar to `isLegalMaskedLoad`.
`preferPredicatedWiden` takes into account `FoldTailByMasking` and `TTI.hasActiveVectorLength()`. `TTI.hasActiveVectorLength()` checks whether the EVL based predication is legal for the target or not.

For this initial patch, legality checker and cost model are purposely avoided and the explicit command line option is used to 1) keep things simple and 2) demonstrate different approaches. 

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9712
+
+  Value *RuntimeVLExt = Builder.CreateZExt(RuntimeVL, Remaining->getType());
+  Value *EVL =
----------------
bmahjour wrote:
> can we try to initialize RuntimeVL with the right type, to avoid having to zero extend it here?
The core issue here is that the Vectorizer sort of implicitly uses i32 for everything related to `VF` and Vector Lenght. `%evl` parameter in VP intrinsics is also of type i32. 
`TC`, `BTC` and `IV` are however i64.  So whenever we mix these two we end up having    extends and truncates. 

That being said, in this particular case, it should be possible to initialize `MinVF` and thus `RuntimeVL` to i64 if we are going to use it with `TC` and `IV` and i32 otherwise. 

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9714
+  Value *EVL =
+      Builder.CreateBinaryIntrinsic(Intrinsic::umin, RuntimeVLExt, Remaining);
+  return Builder.CreateTrunc(EVL, Builder.getInt32Ty());
----------------
bmahjour wrote:
> you can avoid calling the intrinsic, by just doing a `cmp` and a `select`. (ie check that Remaining is less than RuntimeVL and if so pick Ramining, otherwise select RuntimeVL).
I may be wrong here but from what I understand, patches [[ https://reviews.llvm.org/D81829 | D81829 ]] and [[ https://reviews.llvm.org/D84125 | D84125 ]] recently introduces these intrinsics to avoid these IR patterns. 
I am not sure if their use is discouraged for some reason.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9715
+      Builder.CreateBinaryIntrinsic(Intrinsic::umin, RuntimeVLExt, Remaining);
+  return Builder.CreateTrunc(EVL, Builder.getInt32Ty());
+}
----------------
bmahjour wrote:
> Can we avoid truncating to 32-bit on 64-bit targets that take in 64-bit length? I think the type should be the same as the IV.
See previous related comment. %evl parameter in VP intrinsics is of type i32.

================
Comment at: llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:74
+  /// predicated recipe takes mandatory mask and EVL VPInstructions.
+  VPRecipeBase *tryToPredicatedWidenMemory(Instruction *I, VFRange &Range,
+                                           VPlanPtr &Plan);
----------------
bmahjour wrote:
> rename to `tryToPredicateWidenMemory` or `tryToWidenPredicatedMemory`
Not sure I understand your reasoning here.

My understanding is that we are trying to do predicated widening of memory instruction. 
`tryToPredicateWidenMemory` would imply that we are trying to predicate and/or widen memory instruction, whereas `tryToWidenPredicatedMemory` would imply we are trying to widen predicated memory instruction, which might not be the case (VP intrinsics can be used to widen non predicated instructions too).

Also, it is consistent with wording in other places (`tryToPredicatedWiden`, `VPPredicatedWidenRecipe`, `PREDICATED-WIDEN`, etc.). If we change here, it should be changed everywhere too.

If others also think that the current wording should be changed everywhere, I am open to changing it.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1144
+    raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+  O << Indent << "PREDICATED-WIDEN ";
+
----------------
bmahjour wrote:
> "PREDICATED-WIDEN-MEMORY" to distinguish from VPPredicatedWidenRecipe
"PREDICATED-WIDEN" is consistent with "WIDEN" (and not "WIDEN-MEMORY") used for `VPWidenMemoryInstructionRecipe`.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1152
+
+  printOperands(O, SlotTracker);
+}
----------------
bmahjour wrote:
> I suppose the EVL is not one of the operands, right? If so, I think printing the EVL here would be useful too.
It is appended as one of the operands and is printed correctly here. 
See the `vplan-vp-intrinsics.ll` test case for how the printed VPlan looks.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99750/new/

https://reviews.llvm.org/D99750