[PATCH] D146033: [AArch64][TTI] Cost model FADD/FSUB

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 17 06:18:44 PDT 2023


dmgreen added a comment.

I think this makes a lot of sense. Especially given that we treat many shuffles as having a cost of 1, floating point operations should not be twice that cost. Looking at the software optimization guides, we could consider doing the same for fmul, but the fadd/fsub changes alone already have a high likelihood of causing some large changes. Adding fneg is worth it though, as that should be a simple operation.

This might lead to a fair number of changes. The main regression I found, for example, was a case where the loop vectorizer produced identical code, but the scalar and vector costs being closer led it to require a higher minimum trip count for entering the vector body (from D109368 <https://reviews.llvm.org/D109368>). The real problem there is aliasing preventing the LD4 loads from being recognized, though, so it's hard to see how this change is really to blame. There are also a number of improvements to make up for that regression.

@fhahn do you have any thoughts on the change from your side, or any benchmark results that could be helpful?



================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2402
   case ISD::FSUB:
+    return LT.first;
   case ISD::FMUL:
----------------
It is probably worth adding ISD::FNEG, and we might need to be more careful about the types. FP128 shouldn't be cheap, for example, and possibly fp16/bf16 deserve a higher cost too, when they are not natively available?
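
Roughly what I had in mind, as an untested sketch (assuming the Ty, LT and ST names from the surrounding getArithmeticInstrCost code; the extra cost factors are placeholders, not numbers taken from the optimization guides):

  case ISD::FADD:
  case ISD::FSUB:
  case ISD::FNEG:
    // fp128 ops are lowered to libcalls, so they should not look cheap.
    if (Ty->getScalarType()->isFP128Ty())
      return LT.first * 10; // placeholder factor
    // Half/bfloat are promoted when the target lacks native support, so
    // charge something extra for the conversions (placeholder factor).
    if ((Ty->getScalarType()->isHalfTy() && !ST->hasFullFP16()) ||
        (Ty->getScalarType()->isBFloatTy() && !ST->hasBF16()))
      return LT.first * 2;
    return LT.first;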


================
Comment at: llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll:7
 ; This test is reduced from the matrix multiplication benchmark in the test-suite:
 ; https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/Benchmarks/Misc/matmul_f64_4x4.c
 ; The operations here are expected to be vectorized to <2 x double>.
----------------
This source file, when compiled, is not being SLP vectorized already, so I don't think things change in practice. When I compile it, it uses a number of llvm.fmuladd intrinsics, which already seem to have a cost of 1.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146033/new/

https://reviews.llvm.org/D146033