[llvm] [AArch64][CostModel] Improve cost estimate of scalarizing a vector di… (PR #118055)

Mon Dec 16 21:37:02 PST 2024

================
@@ -3472,6 +3472,20 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
+        // If the information about individual scalars being vectorized is
+        // available, this yields better cost estimation.
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty); VTy && !Args.empty()) {
+          InstructionCost InsertExtractCost =
+              ST->getVectorInsertExtractBaseCost();
+          Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+          for (int i = 0, Sz = Args.size(); i < Sz; i += 2) {
+            Cost += getArithmeticInstrCost(
+                Opcode, VTy->getScalarType(), CostKind,
+                TTI::getOperandInfo(Args[i]), TTI::getOperandInfo(Args[i + 1]));
+          }
+          return Cost;
+        }
----------------
sushgokh wrote:

> I don't understand what you are trying to do here. Why are you adding the cost of the scalar ops here? They are already counted in the cost model

AArch64 Neon does not support vector integer division natively, so the vector division gets scalarized, as can be seen [here](https://godbolt.org/z/EaxKsGe3b).

Now, the cost modelling part: this code section calculates the cost of scalarizing the vector division. Thus,
```
vector cost = 
for each lane(cost of extract for both div operands) + 
for each lane(cost of scalar division) + 
for each lane(cost of insert into the result vector) 
```
Previously this vector cost was overestimated, which made the cost model prefer scalar code. This patch just tries to rectify that.
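
To make the breakdown above concrete, here is a minimal standalone sketch of the same arithmetic. It is not the patch itself: `LaneCosts` and `scalarizedDivCost` are hypothetical names, plain integers stand in for `InstructionCost`, and it assumes every lane has the same scalar division cost, whereas the patch queries the scalar cost per lane from the operand info.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the values the real code gets from the subtarget
// and from the scalar cost query; plain integers replace InstructionCost.
struct LaneCosts {
  int64_t InsertExtract; // assumed cost of one lane insert or extract
  int64_t ScalarDiv;     // assumed cost of one scalar division
};

// Cost of a vector division that is scalarized lane by lane.
int64_t scalarizedDivCost(unsigned NumLanes, LaneCosts C) {
  int64_t Cost = 0;
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane) {
    Cost += 2 * C.InsertExtract; // extract both div operands for this lane
    Cost += C.ScalarDiv;         // the scalar division itself
    Cost += C.InsertExtract;     // insert the result into the result vector
  }
  return Cost;
}
```

With made-up costs of 2 for an insert/extract and 1 for a scalar division, a 4-lane vector comes out as `4 * (3 * 2 + 1)`, which matches the `(3 * InsertExtractCost) * NumElements` term plus one scalar division per lane.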

https://github.com/llvm/llvm-project/pull/118055