[PATCH] D84451: [LV] Tail folded inloop reductions.

Wed Oct 7 02:25:12 PDT 2020

SjoerdMeijer added a comment.

Just some minor questions inline.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3996
   // a Select choosing between the vectorized LoopExitInst and vectorized Phi,
   // instead of the former.
+  if (Cost->foldTailByMasking() && !IsInLoopReductionPhi) {
----------------
nit: perhaps add a comment explaining why in-loop reductions are excluded here.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:157
+         (It->getVPRecipeID() == VPRecipeBase::VPWidenPHISC ||
+          It->getVPRecipeID() == VPRecipeBase::VPWidenIntOrFpInductionSC ||
+          It->getVPRecipeID() == VPRecipeBase::VPPredInstPHISC ||
----------------
Is that a Phi?

================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll:19
 ; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[INDEX]], 0
+; CHECK-NEXT:    [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw nsw i32 [[TMP0]], 1
----------------
Not important, but just out of curiosity, why is this moved up?

================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/reduction-inloop-pred.ll:438
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = or <4 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3>
-; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult <4 x i64> [[INDUCTION]], <i64 257, i64 257, i64 257, i64 257>
-; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*
-; CHECK-NEXT:    [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP2]], i32 4, <4 x i1> [[TMP1]], <4 x i32> undef)
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp ult <4 x i64> [[INDUCTION]], <i64 257, i64 257, i64 257, i64 257>
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
----------------
Maybe a bit off topic for this patch, but are we tail-predicating this loop for MVE? Could we do that? I am asking because this icmp compares the induction against the backedge-taken count (BTC); we could emit get.active.lane.mask for it instead, which should then give us tail-predication.
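
To illustrate the equivalence I have in mind, here is a minimal scalar model in C++. It is only a sketch, not LLVM code: the function names `activeLaneMask` and `icmpUleBTC` are hypothetical, the vector width is fixed at 4, and it assumes a trip count N of at least 1 with BTC = N - 1. The first function follows the LangRef description of llvm.get.active.lane.mask (lane i is active iff base + i < N, evaluated without wraparound); the second models the compare of the widened induction against the BTC.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of llvm.get.active.lane.mask for 4 lanes (hypothetical helper):
// lane i is active iff base + i < n. The sum is formed in 64 bits so it
// cannot wrap, matching the LangRef wording.
std::array<bool, 4> activeLaneMask(uint32_t base, uint32_t n) {
  std::array<bool, 4> mask{};
  for (uint32_t i = 0; i < 4; ++i)
    mask[i] = static_cast<uint64_t>(base) + i < n;
  return mask;
}

// Model of the existing mask (hypothetical helper): the widened induction
// <base, base+1, base+2, base+3> compared "unsigned <=" against the BTC.
std::array<bool, 4> icmpUleBTC(uint32_t base, uint32_t btc) {
  std::array<bool, 4> mask{};
  for (uint32_t i = 0; i < 4; ++i)
    mask[i] = static_cast<uint64_t>(base) + i <= btc;
  return mask;
}
```

With N = 257 (so BTC = 256), as in this test, both models agree for every vector iteration; in the final iteration (base = 256) only lane 0 is active.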

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84451/new/

https://reviews.llvm.org/D84451