[llvm] [AArch64][CostModel] Add constraints on which partial reductions are (PR #163728)
    Sander de Smalen via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Mon Oct 20 01:14:42 PDT 2025
    
    
  
================
@@ -5721,6 +5721,38 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Cost;
   }
 
+  // FIXME:
+  // 1. Do cost modelling for USDOT.
+  // 2. Refactor the whole code here.
+  if (ST->isSVEorStreamingSVEAvailable() && !IsUSDot) {
+    if (AccumLT.second.getScalarType() == MVT::i32 &&
+        InputLT.second.getScalarType() == MVT::i16) {
+      // i16 -> i32 is supported in SVE 2.1
+      if (ST->hasSVE2p1())
+        return Cost;
+      // umlalt + umlalb. Same goes for signed types.
+      return Cost + 1;
+    }
+    if (AccumLT.second.getScalarType() == MVT::i64 &&
+        InputLT.second.getScalarType() == MVT::i32)
+      return Cost + 1;
+  }
+  if (AccumLT.second.isFixedLengthVector() && ST->isNeonAvailable() &&
+      ST->hasDotProd() && !IsUSDot) {
+    // umull + umull2 + (2 * uaddw) + (2 * uaddw2). Same goes for signed types.
+    if (AccumLT.second.getScalarType() == MVT::i64 &&
+        InputLT.second.getScalarType() == MVT::i16)
+      return Cost + 5;
----------------
sdesmalen-arm wrote:
This is not correct for targets with SVE2, which would use sdot. I'd also like us to avoid having to encode every combination to this level of detail. The function is set up to return a cheap cost for all legal cases and a higher cost for all illegal cases (those that would require extends).
For targets where SVE(2) is not available, perhaps you can just return a higher default at the bottom of this function. But as it stands, the suggestion I made below (to return `Cost + 2`) causes all the tests to pass, so test coverage is currently missing for the (NEON-only) case you're trying to support.
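The "cheap cost for legal cases, one higher default at the bottom" shape could look roughly like the standalone sketch below. This is not the actual `getPartialReductionCost` code: `SubtargetSketch`, `partialReductionCostSketch`, and the particular legality checks are hypothetical and chosen only for illustration, and the real function works on legalized types and handles more cases.

```cpp
// Hypothetical stand-in for the subtarget feature queries; not LLVM's API.
struct SubtargetSketch {
  bool HasSVE2;
  bool HasNeonDotProd;
};

// Returns BaseCost for the combinations this sketch treats as legal, and a
// single higher default for everything else, instead of enumerating the
// instruction sequence of every illegal combination.
unsigned partialReductionCostSketch(const SubtargetSketch &ST,
                                    unsigned AccumBits, unsigned InputBits,
                                    unsigned BaseCost) {
  // A dot product accumulates into elements four times as wide as its inputs.
  bool Is4xWidening = AccumBits == 4 * InputBits;

  // Assumption for illustration: i8 -> i32 is legal with either feature.
  if (Is4xWidening && InputBits == 8 && (ST.HasSVE2 || ST.HasNeonDotProd))
    return BaseCost;

  // Assumption for illustration: i16 -> i64 is legal only on the SVE2 side.
  if (Is4xWidening && InputBits == 16 && ST.HasSVE2)
    return BaseCost;

  // Everything else would need extends first: one higher default, mirroring
  // the `Cost + 2` suggestion.
  return BaseCost + 2;
}
```

With this layout, a NEON-only configuration that asks for i16 -> i64 simply falls through to the default at the bottom, so supporting it needs a test rather than a new per-combination branch.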
https://github.com/llvm/llvm-project/pull/163728
    
    