[PATCH] D84451: [LV] Tail folded inloop reductions.

Thu Oct 8 03:27:37 PDT 2020

dmgreen added a comment.

Thanks for taking a look.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:157
+         (It->getVPRecipeID() == VPRecipeBase::VPWidenPHISC ||
+          It->getVPRecipeID() == VPRecipeBase::VPWidenIntOrFpInductionSC ||
+          It->getVPRecipeID() == VPRecipeBase::VPPredInstPHISC ||
----------------
SjoerdMeijer wrote:
> Is that a Phi?
A VPWidenIntOrFpInductionSC? It looks like an induction variable, which I think will be a PHI, yeah.

================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll:19
 ; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[INDEX]], 0
+; CHECK-NEXT:    [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw nsw i32 [[TMP0]], 1
----------------
SjoerdMeijer wrote:
> Not important, but just out of curiosity, why is this moved up?
These all moved up because we now create the block predicates after the phi recipes, to ensure they come before all the potentially predicated instructions that will need to use them.

================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/reduction-inloop-pred.ll:438
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = or <4 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3>
-; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult <4 x i64> [[INDUCTION]], <i64 257, i64 257, i64 257, i64 257>
-; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*
-; CHECK-NEXT:    [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP2]], i32 4, <4 x i1> [[TMP1]], <4 x i32> undef)
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp ult <4 x i64> [[INDUCTION]], <i64 257, i64 257, i64 257, i64 257>
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
----------------
SjoerdMeijer wrote:
> Maybe a bit off topic for this patch, but are we tail-predicating this loop for MVE? Could we do that? The reason I am asking is that I am looking at this icmp of the induction and the BTC, for which we could emit get.active.lane.mask, so we would get the tail-predication?
I think this test just needed a rebase.
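
For reference, a minimal C++ sketch of the equivalence the question relies on, assuming the lane mask semantics are "lane i is active iff base + i < n" (unsigned, computed without wrap-around). The helper names activeLaneMask and icmpUltInduction are hypothetical and only model the IR in the test above on a fixed width of 4 lanes; they are not LLVM APIs.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a 4-lane active lane mask:
// lane i is active iff base + i < n, evaluated without overflowing base + i.
std::array<bool, 4> activeLaneMask(uint64_t base, uint64_t n) {
  std::array<bool, 4> mask{};
  for (uint64_t lane = 0; lane < 4; ++lane) {
    // Rearranged as lane < n - base so the sum is never formed.
    mask[lane] = base < n && lane < n - base;
  }
  return mask;
}

// Hypothetical model of `icmp ult <4 x i64> [[INDUCTION]], splat(bound)`,
// where INDUCTION is splat(index) combined with <0, 1, 2, 3>.
// Assumes index + lane does not wrap, which holds for small trip counts.
std::array<bool, 4> icmpUltInduction(uint64_t index, uint64_t bound) {
  std::array<bool, 4> mask{};
  for (uint64_t lane = 0; lane < 4; ++lane) {
    mask[lane] = (index + lane) < bound;
  }
  return mask;
}
```

With the bound of 257 from the test, both helpers give an all-true mask for index 0 and leave only lane 0 active for the final iteration at index 256.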

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84451/new/

https://reviews.llvm.org/D84451