[PATCH] D101836: [LoopVectorize] Enable strict reductions when allowReordering() returns false

Mon May 17 06:04:37 PDT 2021

kmclaughlin added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:7
 define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) {
-; CHECK-LABEL: @fadd_strict
-; CHECK: vector.body:
-; CHECK: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]
-; CHECK: %[[LOAD:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*
-; CHECK: %[[RDX]] = call float @llvm.vector.reduce.fadd.nxv8f32(float %[[VEC_PHI]], <vscale x 8 x float> %[[LOAD]])
-; CHECK: for.end
-; CHECK: %[[PHI:.*]] = phi float [ %[[SCALAR:.*]], %for.body ], [ %[[RDX]], %middle.block ]
-; CHECK: ret float %[[PHI]]
+; CHECK-VF8UF1-LABEL: @fadd_strict
+; CHECK-VF8UF1: vector.body:
----------------
sdesmalen wrote:
> Should all test functions have check lines for all of VF8UF1, VF8UF4, etc.? Conversely, is it sufficient to just pass the interleave-count hint (not the vector width) via metadata and have 1 RUN line for VF8UF1, VF8UF4, VF4UF1?
> 
> Which also makes me wonder, what is the additional value of having both VF8UF1 and VF4UF1?
I think it should be sufficient to pass the interleave count via metadata. I've changed VF4UF1 to VF8UF1, as there was no additional benefit in having both. Similarly, I've changed VF4UF1 in the strict-fadd.ll test to reduce the number of RUN lines.
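
For reference, a rough sketch of what passing the interleave count via loop metadata could look like. The function name, loop body and the count of 4 are illustrative assumptions, not taken from the patch; the typed-pointer syntax follows the test quoted above. Only the `!llvm.loop` attachment and the `llvm.loop.interleave.count` node matter here:

```llvm
; Hypothetical test function: an ordered (strict) fadd reduction over %a.
define float @fadd_strict_interleave4(float* noalias nocapture readonly %a, i64 %n) {
entry:
  br label %for.body

for.body:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
  %sum = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
  %arrayidx = getelementptr inbounds float, float* %a, i64 %iv
  %0 = load float, float* %arrayidx, align 4
  ; No fast-math flags, so the additions must stay in order.
  %add = fadd float %0, %sum
  %iv.next = add nuw nsw i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, %n
  ; The loop hint is attached to the latch branch.
  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

for.end:
  ret float %add
}

; Per-loop interleave-count hint; the vector width is left to the RUN line.
!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.interleave.count", i32 4}
```

With the hint carried per function like this, a single RUN line that fixes the vector width could in principle cover functions with different interleave counts.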

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101836/new/

https://reviews.llvm.org/D101836