[llvm] [AArch64][CostModel] Consider i32 --> i64 partial reduce cost as Invalid for FixedLength vectors (PR #165226)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 29 02:02:47 PDT 2025


================
@@ -5757,8 +5757,15 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Cost;
   }
 
+  if (!ST->useSVEForFixedLengthVectors() &&
+      (AccumLT.second.isFixedLengthVector() && ST->isNeonAvailable() &&
+       ST->hasDotProd()) &&
+      (AccumLT.second.getScalarType() == MVT::i64 &&
+       InputLT.second.getScalarType() == MVT::i32))
+    return Invalid;
----------------
david-arm wrote:

I don't understand why you're returning 'Invalid' here when we can actually generate code for the intrinsic. For example:

```
define <2 x i64> @foo(<2 x i64> %accum, ptr %p) {
  %load = load <4 x i32>, ptr %p
  %sext = sext <4 x i32> %load to <4 x i64>
  %res = call <2 x i64> @llvm.vector.partial.reduce.add.v2i64.v4i64(<2 x i64> %accum, <4 x i64> %sext)
  ret <2 x i64> %res
}

declare <2 x i64> @llvm.vector.partial.reduce.add.v2i64.v4i64(<2 x i64>, <4 x i64>)
```

is lowered to:

```
foo:
	ldr	q1, [x0]
	saddw	v0.2d, v0.2d, v1.2s
	saddw2	v0.2d, v0.2d, v1.4s
	ret
```

I think this just needs a cost that reflects what the generated code looks like.
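
To make the point about the generated code concrete, here is a rough scalar model of the two-instruction sequence above, written as standalone C++. It is only a sketch: the type aliases and the function names (`saddw`, `saddw2`, `partialReduceAdd`, `laneSum`, `wrapAdd`) are made up for illustration and are not LLVM APIs. It assumes the usual lane semantics of the instructions, where `saddw` widens and adds the low two i32 lanes and `saddw2` does the same for the high two.

```cpp
#include <array>
#include <cstdint>

// Hypothetical aliases standing in for <2 x i64> and <4 x i32>.
using V2i64 = std::array<int64_t, 2>;
using V4i32 = std::array<int32_t, 4>;

// 64-bit add that wraps modulo 2^64 like the hardware lanes do,
// avoiding signed-overflow undefined behaviour in C++.
static int64_t wrapAdd(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) +
                              static_cast<uint64_t>(b));
}

// Models: saddw v0.2d, v0.2d, v1.2s
// Sign-extend the LOW two i32 lanes and add them to the accumulator.
V2i64 saddw(V2i64 acc, const V4i32 &in) {
  acc[0] = wrapAdd(acc[0], static_cast<int64_t>(in[0]));
  acc[1] = wrapAdd(acc[1], static_cast<int64_t>(in[1]));
  return acc;
}

// Models: saddw2 v0.2d, v0.2d, v1.4s
// Sign-extend the HIGH two i32 lanes and add them to the accumulator.
V2i64 saddw2(V2i64 acc, const V4i32 &in) {
  acc[0] = wrapAdd(acc[0], static_cast<int64_t>(in[2]));
  acc[1] = wrapAdd(acc[1], static_cast<int64_t>(in[3]));
  return acc;
}

// The whole partial reduction is just the two widening adds back to back.
V2i64 partialReduceAdd(V2i64 acc, const V4i32 &in) {
  return saddw2(saddw(acc, in), in);
}

// Sum of all accumulator lanes, used to check the reduction invariant.
int64_t laneSum(const V2i64 &v) { return wrapAdd(v[0], v[1]); }
```

With an accumulator of {10, 20} and an input of {1, -2, 3, 4}, this model yields {14, 22}: the lane total grows by exactly the sum of the inputs, which is all a partial reduction needs to guarantee. The sequence is two widening adds with no extra shuffles, so a small finite cost would describe it better than 'Invalid'.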

https://github.com/llvm/llvm-project/pull/165226

