[PATCH] D117580: [LoopVectorize] Support conditional in-loop vector reductions

Fri Feb 4 06:38:52 PST 2022

david-arm added a comment.

Hi @kmclaughlin, this looks a lot better, thanks for taking the time to address my previous comments and add the exhaustive set of tests! I just had some minor comments, mostly about the tests.

================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:1064
     ExpectedUses = 2;
+  unsigned ExtraPhiUses = 0;

----------------
nit: If you agree, it might be worth moving this variable down to the place it's used? i.e. just above

  if (auto ExitPhi = dyn_cast<PHINode>(LoopExitInstr)) {

================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:1105
+    Instruction *Chain = nullptr;
+    if (Inc0 && Inc0 == Phi)
+      Chain = Inc1;
----------------
nit: I think you can probably simplify this a little with something like this:

  if (Inc0 == Phi)
    Chain = Inc1;
  else if (Inc1 == Phi)
    Chain = Inc0;
  else
    return {};

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll:26
+; CHECK-NEXT:    [[TMP8:%.*]] = bitcast float* [[TMP7]] to <vscale x 4 x float>*
+; CHECK-NEXT:    [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* [[TMP8]], i32 4, <vscale x 4 x i1> [[TMP6]], <vscale x 4 x float> zeroinitializer)
+; CHECK-NEXT:    [[TMP9]] = call fast float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI]], <vscale x 4 x float> [[WIDE_MASKED_LOAD]])
----------------
I think this code looks right because the inactive lanes will be zero, which matches the identity value for an fadd. However, it might be clearer if you remove the `-dce -instcombine` flags so we can see the select? I assume the select has been folded away.

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:234
+
+define i32 @multiple_cond_ands(i32* noalias %A, i32* noalias %B, i32 %cond, i64 noundef %N) #0 {
+; CHECK-LABEL: @multiple_cond_ands(
----------------
Hi @kmclaughlin, it doesn't look there are multiple conditional and instructions in the loop?

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:518
+;
+define i64 @cond_and(i64* noalias nocapture readonly %a, i64* noalias nocapture readonly %b, i64* noalias nocapture readonly %cond, i64 %N){
+; CHECK-LABEL: @cond_and(
----------------
nit: Maybe a better name for this is something like `@nested_cond_and` to distinguish from the other cond_and test?

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:526
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE14:%.*]] ]
----------------
Maybe for this negative test you don't need the CHECK lines here and perhaps something like this is sufficient?

  ; CHECK: vector.body
  ; CHECK-NOT: llvm.vector.reduce.and.v4i64
  ; CHECK: middle.block
  ; CHECK: llvm.vector.reduce.and.v4i64
  ; CHECK: scalar.ph

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:689
+; Chain of conditional & unconditional reductions. We do not vectorise as the Phi (%rdx1) has more than
+; the expected number of uses.
+;
----------------
Perhaps worth adding a bit more here, i.e.

  ; Chain of conditional & unconditional reductions. We currently only support conditional reductions
  ; if they are the last in the chain, i.e. the loop exit instruction is a PHI node. Therefore, we reject the
  ; PHI (%rdx1) as it has more than one use.

Do you think that makes it a bit clearer?

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:696
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    [[N_VEC:%.*]] = and i64 [[N]], -4
----------------
SImilar to the comment in `@cond_and` I don't think you need all the CHECK lines here.

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:826
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    [[N_VEC:%.*]] = and i64 [[N]], -4
----------------
Same comment as `@cond_and` about the CHECK lines. :)

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:1007
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE7:%.*]] ]
----------------
Same comment as `@cond_and` for the CHECK lines.

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll:1137
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE7:%.*]] ]
----------------
Same comment as `@cond_and` for the CHECK lines.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117580/new/

https://reviews.llvm.org/D117580