[PATCH] D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic

Mon Nov 15 03:34:19 PST 2021

david-arm added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:397
+; CHECK-ORDERED: vector.body:
+; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[RDX2:%.*]], %vector.body ]
+; CHECK-ORDERED: [[WIDE_LOAD:%.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*
----------------
Hi @RosieSumpter, I think this CHECK line should be:

  ; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[RDX3:%.*]], %vector.body ]

and then lower down you should have

  ; CHECK-ORDERED: [[RDX3]] = call float @llvm.vector.reduce.fadd.nxv8f32(float [[RDX2]], <vscale x 8 x float> [[FMUL3]])

This ensures that the return value from the final intrinsic call ends up as the incoming value for the PHI. At the moment you have `[[RDX3:%.*]]` in both cases so in theory they could be different.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:477
+; CHECK-ORDERED: vector.body:
+; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[RDX2:%.*]], %vector.body ]
+; CHECK-ORDERED: [[WIDE_LOAD:%.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*
----------------
I think we should be checking for `[[RDX3:%.*]]` here.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:493
+; CHECK-ORDERED: [[RDX2:%.*]] = call nnan float @llvm.vector.reduce.fadd.nxv8f32(float [[RDX1]], <vscale x 8 x float> [[FMUL2]])
+; CHECK-ORDERED: [[RDX3:%.*]] = call nnan float @llvm.vector.reduce.fadd.nxv8f32(float [[RDX2]], <vscale x 8 x float> [[FMUL3]])
+; CHECK-ORDERED: for.end
----------------
Again, you should be able to use `[[RDX3]] = call ...` here

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll:954
+; CHECK-ORDERED: [[RDX2:%.*]] = call float @llvm.vector.reduce.fadd.v8f32(float [[RDX1]], <8 x float> [[FMUL2]])
+; CHECK-ORDERED: [[RDX3:%.*]] = call float @llvm.vector.reduce.fadd.v8f32(float [[RDX2]], <8 x float> [[FMUL3]])
+; CHECK-ORDERED: for.body:
----------------
Again, you should be able to use `[[RDX3]] = call ...` here

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll:1013
+; CHECK-ORDERED: vector.body:
+; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[RDX2:%.*]], %vector.body ]
+; CHECK-ORDERED: [[LOAD:%.*]] = load float, float*
----------------
`[[RDX3:%.*]]`

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll:1029
+; CHECK-ORDERED: [[FADD2:%.*]] = fadd float [[FADD1]], [[FMUL2]]
+; CHECK-ORDERED: [[FADD3:%.*]] = fadd float [[FADD2]], [[FMUL3]]
+; CHECK-ORDERED-NOT: @llvm.vector.reduce.fadd
----------------
`[[RDX3]] = fadd ...`

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll:1100
+; which isn't vectorized.
+define float @fmuladd_phi_is_mul_operand(float* %a, float* %b, i64 %n) {
+; CHECK-ORDERED-LABEL: @fmuladd_phi_is_mul_operand
----------------
Is it worth also having a negative test for the case when the PHI is both a mul operand and the add operand too?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111555/new/

https://reviews.llvm.org/D111555