[llvm] [AArch64][CostModel] Improve cost estimate of scalarizing a vector di… (PR #118055)

Wed Dec 18 02:00:33 PST 2024

================
@@ -3472,6 +3472,20 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
+        // If information about the individual scalars being vectorized is
+        // available, it yields a better cost estimate.
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty); VTy && !Args.empty()) {
+          InstructionCost InsertExtractCost =
+              ST->getVectorInsertExtractBaseCost();
+          Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+          for (int i = 0, Sz = Args.size(); i < Sz; i += 2) {
+            Cost += getArithmeticInstrCost(
+                Opcode, VTy->getScalarType(), CostKind,
+                TTI::getOperandInfo(Args[i]), TTI::getOperandInfo(Args[i + 1]));
+          }
+          return Cost;
+        }
----------------
sushgokh wrote:

> Then, you should not build the vector node at all. I would suggest instead that you check how the `getScalarsVectorizationState` function in SLP works, add a check for NEON division (maybe add a new entry in TTI to check if the vector operation is legal and won't be scalarized), and return that, for NEON, it should build the `TreeEntry::NeedToGather` node.

I can think of two issues with this approach:
1. The operands of the div operation won't get vectorized either.
2. Many checks adjust the div cost based on information about its operands (e.g. whether they are uniform, constant, or a power of 2). Hence, factoring out code just to check whether the div operation would be scalarized is difficult and would duplicate that logic.
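To illustrate why per-operand information matters, here is a minimal C++ sketch of the scalarization formula from the diff: `3 * InsertExtractCost * NumElements` plus the sum of per-lane scalar division costs. It does not use LLVM's TTI API; `OperandKind`, `ScalarDiv`, `scalarDivCost`, `scalarizedDivCost`, `pessimisticDivCost` and all cost values are hypothetical. It assumes one operand pair per lane and reads the factor 3 as two extracts (one per operand) plus one insert per lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the operand info that TTI::getOperandInfo provides.
enum class OperandKind { Variable, Constant, PowerOf2Constant };

// Operand descriptors for one scalar lane of the division.
struct ScalarDiv {
  OperandKind Dividend;
  OperandKind Divisor;
};

// Hypothetical per-lane scalar division cost. Only the divisor kind matters
// in this toy model; the numbers are illustrative, not AArch64 costs.
int64_t scalarDivCost(const ScalarDiv &D) {
  switch (D.Divisor) {
  case OperandKind::PowerOf2Constant:
    return 1; // Roughly: shifts plus a small fix-up.
  case OperandKind::Constant:
    return 2; // Roughly: multiply-by-magic-constant sequence.
  case OperandKind::Variable:
    break;
  }
  return 4; // Full hardware divide.
}

// Same shape as the diff: per-lane insert/extract overhead plus the sum of
// the scalar costs, each computed from that lane's operand info.
int64_t scalarizedDivCost(const std::vector<ScalarDiv> &Lanes,
                          int64_t InsertExtractCost) {
  const int64_t NumElements = static_cast<int64_t>(Lanes.size());
  int64_t Cost = 3 * InsertExtractCost * NumElements;
  for (const ScalarDiv &Lane : Lanes)
    Cost += scalarDivCost(Lane);
  return Cost;
}

// Estimate that ignores operand info and assumes every divisor is unknown.
int64_t pessimisticDivCost(std::size_t NumElements, int64_t InsertExtractCost) {
  std::vector<ScalarDiv> Lanes(
      NumElements, ScalarDiv{OperandKind::Variable, OperandKind::Variable});
  return scalarizedDivCost(Lanes, InsertExtractCost);
}
```

With an insert/extract cost of 2, four lanes that all divide by a power of 2 cost 28 in this model, versus 40 when every divisor is treated as unknown. That gap is the kind of operand-dependent refinement item 2 refers to, and it is what a separate "will this be scalarized?" check would have to duplicate.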

https://github.com/llvm/llvm-project/pull/118055