[llvm] [AArch64][CostModel] Improve cost estimate of scalarizing a vector di… (PR #118055)

Tue Dec 17 09:36:40 PST 2024

================
@@ -3472,6 +3472,20 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
+        // If the information about individual scalars being vectorized is
+        // available, this yeilds better cost estimation.
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty); VTy && !Args.empty()) {
+          InstructionCost InsertExtractCost =
+              ST->getVectorInsertExtractBaseCost();
+          Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+          for (int i = 0, Sz = Args.size(); i < Sz; i += 2) {
+            Cost += getArithmeticInstrCost(
+                Opcode, VTy->getScalarType(), CostKind,
+                TTI::getOperandInfo(Args[i]), TTI::getOperandInfo(Args[i + 1]));
+          }
+          return Cost;
+        }
----------------
alexey-bataev wrote:

Then, you should not build the vector node at all. I would suggest instead that you check how the `getScalarsVectorizationState` function in SLP works, add a check for NEON division (maybe add a new entry in TTI to check if the vector operation is legal and won't be scalarized), and return that for NEON, it should build the `TreeEntry::NeedToGather` node.

https://github.com/llvm/llvm-project/pull/118055