[PATCH] D99750: [LV, VP]VP intrinsics support for the Loop Vectorizer

Mon Oct 2 13:05:45 PDT 2023

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8112
   // When not folding the tail, use nullptr to model all-true mask.
-  if (!CM.foldTailByMasking()) {
+  if (!CM.foldTailByMasking() || CM.useVPIVectorization()) {
     BlockMaskCache[Header] = nullptr;
----------------
ABataev wrote:
> fhahn wrote:
> > ABataev wrote:
> > > fhahn wrote:
> > > > Better to replace the mask together with introducing EVL to make sure EVL gets added when the mask gets removed?
> > > Currently it will require some extra work. We'll need to handle both cases, with active-lane-mask intrinsics and direct comparison. Would it be possible to keep it for now and fix it once you land emission of the active-lane-mask intrinsic in a VPlan-to-VPlan transform?
> > With the latest version, can the `useVPWithVPEVLVectorization` part be dropped (if the transform is updated to remove the mask from load/stores)?
> Not quite, it will require an extra VPValue, something like VPAllTrueMask, which should replace IV <= BTC. Shall I add it?
Would a live-in `i1 true` work? I think that may work as is. As EVL is only used for lowering of loads/stores at the moment, the mask should only be removed there for now?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:832
+      continue;
+    auto *NewInst =
+        new VPInstruction(VPInstruction::ExplicitVectorLengthIVIncrement,
----------------
ABataev wrote:
> fhahn wrote:
> > ABataev wrote:
> > > fhahn wrote:
> > > > I think turning the step of the canonical induction non-loop-invariant technically turns the canonical IV into a phi that's not a canonical IV any more (which is guaranteed to step the same amount each iteration). Would it work to keep the increment unchanged and keep rounding up the trip count as with regular tail folding initially? Further down the line, the canonical IV issue may be resolved by also replacing the canonical IV node with a regular scalar phi when doing the replacement here.
> > > I'll try to improve this.
> > Did you get a chance to try this out yet? 
> > 
> > 97687b7aea17 landed, it would probably be good to also remove the header mask from load/store recipes here, to make clear that this optimizes the tail-folded loop?
> Already did. The loop is countable, adding a new phi won't give anything, just some extra work without any effect.
Oh right, I missed that, sorry!

Does the latest version actually have to update the canonical IV increment? 

I might be missing something, but shouldn't the exit condition now use the rounded-up version (a multiple of the VF) of the trip count for the compare, so if we increment by EVL then we might not reach the exit condition?
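
To illustrate the concern, here is a small standalone sketch (a hypothetical model, not LLVM code). It assumes the latch exits on an equality compare of the IV against the trip count rounded up to a multiple of VF, and that EVL is computed as `min(VF, remaining iterations)`; both are assumptions about the generated code, and `reachesExit` is a made-up helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the vector loop's latch, not actual LLVM code.
// Returns true if the exit compare (IV == RoundedTC) is hit within MaxIters
// iterations, false if the IV never lands on the rounded-up trip count.
bool reachesExit(uint64_t TC, uint64_t VF, bool IncrementByEVL,
                 unsigned MaxIters = 1000) {
  // Assumption: trip count rounded up to a multiple of VF, as with regular
  // tail folding.
  uint64_t RoundedTC = (TC + VF - 1) / VF * VF;
  uint64_t IV = 0;
  for (unsigned I = 0; I < MaxIters; ++I) {
    // Assumption: EVL is the number of remaining elements, capped at VF.
    uint64_t Remaining = TC > IV ? TC - IV : 0;
    uint64_t EVL = std::min(VF, Remaining);
    IV += IncrementByEVL ? EVL : VF;
    if (IV == RoundedTC)
      return true;
  }
  return false;
}
```

With `TC = 10` and `VF = 4`, the rounded-up trip count is 12. Stepping by VF gives 4, 8, 12 and exits, while stepping by EVL gives 4, 8, 10 and then stays at 10, so under these assumptions the compare against 12 is never satisfied.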

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99750/new/

https://reviews.llvm.org/D99750