[PATCH] D150336: [LV][AArch64] Disable maximising bandwidth for streaming compatible sve

Fri May 12 01:01:12 PDT 2023

david-arm added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -force-streaming-compatible-sve -mattr=+sve -scalable-vectorization=off -aarch64-sve-vector-bits-min=128 -S 2>&1 | FileCheck %s --check-prefix=SC_SVE
----------------
I don't think you should add this `NOTE` if you are manually deleting the CHECK lines for prefix NO_SC_SVE.

Also, in the RUN lines below you've added the flag `-debug-only=loop-vectorize`, but this requires an assert build. It looks like you only want the debug output to check the value of VF chosen. I think you have two choices here:

1. Add an extra `; REQUIRES: asserts` line here so the tests only run with debug builds, or
2. Remove the `-debug-only=loop-vectorize` and the `Selecting VF` CHECK lines. If you autogenerate the vectorised IR for both RUN lines then you are automatically testing the VF anyway because the output IR will either contain `<2 x i32>` or `<8 x i32>`.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -force-streaming-compatible-sve -mattr=+sve -scalable-vectorization=off -aarch64-sve-vector-bits-min=128 -S 2>&1 | FileCheck %s --check-prefix=SC_SVE
+; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -mattr=+sve -scalable-vectorization=off -aarch64-sve-vector-bits-min=128 -S -disable-output 2>&1 | FileCheck %s --check-prefix=NO_SC_SVE
----------------
Hi @dtemirbulatov, perhaps instead of specifying `-aarch64-sve-vector-bits-min=128` you can just add a `vscale_range(1,16)` attribute to the function instead? For example,

  define void @foo() vscale_range(1,16)  {

Or perhaps even better you can also remove the `-mattr=+sve` flag by doing something like

  ; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -force-streaming-compatible-sve -scalable-vectorization=off -S 2>&1 | FileCheck %s --check-prefix=SC_SVE
  ...

  define void @foo() #0  {

  ...

  attributes #0 = { "target-features"="+sve" vscale_range(1,16) }

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll:10
+
+define void @fxpAutoCorrelation() {
+; SC_SVE-LABEL: @fxpAutoCorrelation(
----------------
Can you rename this function to something else please? It looks like it came from an existing program. This is just a suggestion, but you could call it `reduc_max_bandwidth` or something like that?

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll:16
+; SC_SVE-NEXT:    br label [[VECTOR_BODY:%.*]]
+; SC_SVE:       vector.body:
+; SC_SVE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
----------------
The vectorised IR here doesn't match the scalar IR in the test. Can you decide what IR you actually need in the function in order to defend the change in this patch? For example, it looks like the load, sext and mul are all unnecessary for the test to work. I'm just a bit worried about the test being a bit fragile.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll:56
+  %0 = load i16, ptr null, align 2
+  %conv10 = sext i16 0 to i32
+  %mul = mul i32 %conv10, 0
----------------
Hi @dtemirbulatov, the IR here looks wrong to me. We're sign-extending the constant value `i16 0` here. Did you mean this instead?

  %0 = load i16, ptr null, align 2
  %conv10 = sext i16 %0 to i32

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll:57
+  %conv10 = sext i16 0 to i32
+  %mul = mul i32 %conv10, 0
+  %add12 = or i32 0, %Accumulator.032
----------------
We're multiplying by 0 and orring with 0 below too. Is this right?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150336/new/

https://reviews.llvm.org/D150336