[PATCH] D95245: [SVE] Add support for scalable vectorization of loops with int/fast FP reductions

Mon Jan 25 01:37:05 PST 2021

david-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1103
                                            TTI::ReductionFlags Flags) const {
+  if (isa<ScalableVectorType>(Ty))
+    return true;
----------------
Just a thought - if we're excluding FMul from reductions is it worth having an assert here that the op is not fmul?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1525
+        LLVM_DEBUG(
+            dbgs() << "LV: Not vectorizing. Found invalid reduction type.\n");
+        return false;
----------------
It might be worth printing out the recurrence kind here. Do we also want to emit a remark here to help the user understand why it failed to vectorise?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4695
       // first lane. Otherwise, we generate all VF values.
       unsigned Lanes =
           Cost->isUniformAfterVectorization(P, VF) ? 1 : VF.getKnownMinValue();
----------------
CarolineConcatto wrote:
> So once we start to use Scalable vector and we start to use the VF.getKnownMinValue(), shouldn't;t this be multiplied by getMaxVScale()?
This is for vectorise of induction variables. I think we'll have to use a runtime VF that I introduced in D95139 here. I don't think Kerry has to fix this in her patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6177

-  assert(LoopCost && "Non-zero loop cost expected");
+  assert(LoopCost.getValue() && "Non-zero loop cost expected");

----------------
CarolineConcatto wrote:
> I believe we can use LoopCost.isValid(), here!
I think since we're changing LoopCost to be InstructionCost we can change the line above too from

LoopCost = *expectedCost(VF).first.getValue();

to

LoopCost = expectedCost(VF).first;

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9457
+                         "vectorization is not possible.\n");
+    VectorizeLoop = false;
+    return false;
----------------
Similar to an earlier comment, a remark here would be good I think.

================
Comment at: llvm/test/Transforms/LoopVectorize/scalable_reductions.ll:1
+; RUN: opt < %s -loop-vectorize -transform-warning -mtriple aarch64-unknown-linux-gnu -mattr=+sve -debug-only=loop-vectorize -S 2>&1 | FileCheck %s -check-prefix=CHECK
+; RUN: opt < %s -loop-vectorize -transform-warning -mtriple aarch64-unknown-linux-gnu -mattr=+sve -S 2>&1 | FileCheck %s -check-prefix=CHECK-WARN
----------------
Needs a "REQUIRES: asserts" here I think because you're relying upon debug output. Also, since you're explicitly adding "-mattr=+sve" here I think you'll either have to:

1. Make the test generic work for all targets (this test will fail on some builds due to lack of AArch64 support), or
2. Move the test for LoopVectorize/AArch64

================
Comment at: llvm/test/Transforms/LoopVectorize/scalable_reductions.ll:16
+; CHECK: LV: Interleave Count is 2
+; CHECK: Setting best plan to VF=vscale x 8, UF=2
+define dso_local i32 @add(i32* nocapture %a, i32* nocapture readonly %b, i32 %n) {
----------------
I wonder if it's worth adding CHECK lines for the resulting IR to show we've vectorised the loop using reductions and checking we have the right structure, i.e. vector.body, middle.block, etc?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95245/new/

https://reviews.llvm.org/D95245