[llvm] [AArch64][CostModel] Improve cost estimate of scalarizing a vector di… (PR #118055)

Thu Dec 19 21:56:24 PST 2024

================
@@ -3472,6 +3472,20 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
+        // If the information about individual scalars being vectorized is
+        // available, this yields a better cost estimate.
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty); VTy && !Args.empty()) {
+          InstructionCost InsertExtractCost =
+              ST->getVectorInsertExtractBaseCost();
+          Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+          for (int i = 0, Sz = Args.size(); i < Sz; i += 2) {
+            Cost += getArithmeticInstrCost(
+                Opcode, VTy->getScalarType(), CostKind,
+                TTI::getOperandInfo(Args[i]), TTI::getOperandInfo(Args[i + 1]));
+          }
+          return Cost;
+        }
----------------
sushgokh wrote:

> Otherwise, it complicates the whole vectorization process and makes it miss some possible vectorization opportunities. I assume you missed the cost of extracting the operands in your cost model changes?

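No. In fact, the patch accurately calculates the cost of 2 extracts, 1 div, and 1 insert for every lane. See:
```
   Cost = (3 * InsertExtractCost) * VTy->getNumElements();   // Insert and extract costs are the same
```
This line covers the three insert/extract operations per lane; the scalar divisions are added by the `getArithmeticInstrCost` calls in the loop over `Args`.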
So there is no way it is missing out on opportunities, and it is by no means complicating the process.
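To make the per-lane arithmetic concrete, here is a minimal standalone sketch. It is not LLVM code: `scalarizedDivCost` is a hypothetical helper, `InsertExtractCost` stands in for `ST->getVectorInsertExtractBaseCost()`, and `ScalarDivCost` is an assumed cost for one scalar division, charged once per lane as in the breakdown above (in the patch itself that part comes from the `getArithmeticInstrCost` calls in the loop over `Args`).

```cpp
// Hypothetical model of the scalarization cost of a vector division:
// for every lane, 2 extracts (one per operand) + 1 insert, all charged at
// the same InsertExtractCost, plus one scalar division (assumed cost).
constexpr unsigned scalarizedDivCost(unsigned NumElements,
                                     unsigned InsertExtractCost,
                                     unsigned ScalarDivCost) {
  return (3 * InsertExtractCost) * NumElements  // extracts + insert per lane
         + ScalarDivCost * NumElements;         // one scalar div per lane
}

// Example with illustrative numbers: a 4-lane division with
// InsertExtractCost = 2 and ScalarDivCost = 1 gives 4 * (3 * 2 + 1) = 28.
static_assert(scalarizedDivCost(4, 2, 1) == 28, "4 lanes * (3*2 + 1)");
```

Keeping the insert/extract term separate from the scalar-op term mirrors the structure of the patch, so the overhead of moving values in and out of vector registers stays visible on its own.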

https://github.com/llvm/llvm-project/pull/118055