[PATCH] D117578: [LoopVectorize] Test in-loop reductions with tail folding for scalable vectors

Wed Jan 19 01:44:17 PST 2022

david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM!

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll:687
+; CHECK-NEXT:    [[TMP8:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT:    [[TMP9:%.*]] = icmp ule <vscale x 4 x i64> [[INDUCTION]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[COND:%.*]], i64 [[TMP8]]
----------------
nit: I think you'll need to rebase the patch before merging, because we now use the llvm.get.active.lane.mask intrinsic to generate the loop predicate.
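
For context, a rough Python model of the two predicate forms. This is only a sketch, with assumptions: a fixed lane count `vf` stands in for the scalable `<vscale x 4 x ...>` type, indices never overflow, the splat in the old CHECK lines is taken to hold the trip count minus one, and the helper names (`icmp_ule_mask`, `active_lane_mask`, `masks_agree`) are made up for illustration. The intrinsic sets lane `i` when `base + i < n` (unsigned).

```python
def icmp_ule_mask(index, vf, backedge_taken_count):
    # Old form (assumed): lane i is active when index + i <= trip_count - 1.
    return [index + i <= backedge_taken_count for i in range(vf)]


def active_lane_mask(base, n, vf):
    # Model of llvm.get.active.lane.mask(base, n): lane i is active
    # when base + i < n. Overflow is ignored in this sketch.
    return [base + i < n for i in range(vf)]


def masks_agree(trip_count, vf):
    # Step through the vector loop in chunks of vf and compare both predicates.
    return all(
        icmp_ule_mask(index, vf, trip_count - 1)
        == active_lane_mask(index, trip_count, vf)
        for index in range(0, trip_count, vf)
    )
```

Under those assumptions the two forms produce the same mask on every iteration, so the rebase should only change how the predicate is spelled in the CHECK lines, not which lanes are active.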

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll:709
+; CHECK:       middle.block:
+; CHECK-NEXT:    [[TMP25:%.*]] = call i32 @llvm.vector.reduce.xor.nxv4i32(<vscale x 4 x i32> [[TMP21]])
+; CHECK-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
----------------
I think the reason we don't use an in-loop reduction here is that it's a conditional reduction, right?
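
To make the distinction concrete, a small Python model of the two reduction shapes for a conditional xor with tail folding. It is a sketch only: `vf` is a fixed stand-in for the scalable vector length, and `out_of_loop_xor` / `in_loop_xor` are hypothetical names, not vectorizer code. It shows what each shape computes and does not settle why the vectorizer picks one here.

```python
from functools import reduce
from operator import xor


def out_of_loop_xor(a, cond, vf):
    # Vector accumulator kept across iterations and reduced once after
    # the loop, like the vector.reduce.xor call in middle.block.
    acc = [0] * vf
    n = len(a)
    for index in range(0, n, vf):
        for lane in range(vf):
            i = index + lane
            # Lanes past the trip count or with a false condition keep
            # their previous value.
            if i < n and cond[i]:
                acc[lane] ^= a[i]
    return reduce(xor, acc, 0)


def in_loop_xor(a, cond, vf):
    # Scalar accumulator: each vector iteration is reduced inside the loop,
    # with inactive lanes replaced by the xor identity (0).
    acc = 0
    n = len(a)
    for index in range(0, n, vf):
        lanes = [a[i] if i < n and cond[i] else 0
                 for i in range(index, index + vf)]
        acc ^= reduce(xor, lanes, 0)
    return acc
```

Both shapes return the same value; they differ only in where the horizontal reduction happens.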

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll:711
+; CHECK-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; CHECK:       scalar.ph:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
----------------
nit: I think you can just delete everything from this line onwards because we'll never reach the scalar tail anyway.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117578/new/

https://reviews.llvm.org/D117578