[llvm] [TTI][AArch64] Detect OperandInfo from scalable splats. (PR #122469)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 13 01:18:41 PST 2025
================
@@ -260,22 +260,22 @@ define void @udiv_uniformconst() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V32i8 = udiv <32 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V64i8 = udiv <64 x i8> undef, splat (i8 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 7)
----------------
davemgreen wrote:
Hi - yeah, I agree it is a bit odd; the costs are not super accurate so far. It is hard to find a very accurate cost for something that depends on the input data, but they should be improved in future patches.
The cost of a udiv/sdiv should not be 1 (or 2), which I think some of these are falling back to, for either scalars or SVE vectors. The divides use an iterative algorithm that takes multiple cycles to complete and blocks other operations until it finishes. So whilst the codesize cost can be 1, the recip-throughput and latency should be higher.
I believe the expansion is still better than using udiv/sdiv instructions (from experiments). These costs look better in #122236, and the others should be brought into line in future patches.
https://github.com/llvm/llvm-project/pull/122469