[llvm] [Analysis] Extend llvm.experimental.cttz.elts to type-based-cost (PR #184578)

Fri Mar 6 03:04:48 PST 2026

================
@@ -131,3 +194,65 @@ define void @foo_vscale_range_2_16() vscale_range(2,16) {
 
   ret void
 }
+
+define void @foo_fixed_len_vectors() {
+; CHECK-LABEL: 'foo_fixed_len_vectors'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res.i32.v2i1.false = call i32 @llvm.experimental.cttz.elts.i32.v2i1(<2 x i1> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res.i32.v4i1.false = call i32 @llvm.experimental.cttz.elts.i32.v4i1(<4 x i1> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res.i32.v8i1.false = call i32 @llvm.experimental.cttz.elts.i32.v8i1(<8 x i1> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res.i32.v64i1.false = call i32 @llvm.experimental.cttz.elts.i32.v64i1(<64 x i1> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res.i32.v128i1.false = call i32 @llvm.experimental.cttz.elts.i32.v128i1(<128 x i1> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 663 for instruction: %res.i32.v1024i1.false = call i32 @llvm.experimental.cttz.elts.i32.v1024i1(<1024 x i1> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1319 for instruction: %res.i32.v2048i1.false = call i32 @llvm.experimental.cttz.elts.i32.v2048i1(<2048 x i1> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res.i32.v2i1.true = call i32 @llvm.experimental.cttz.elts.i32.v2i1(<2 x i1> undef, i1 true)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1319 for instruction: %res.i32.v2048i1.true = call i32 @llvm.experimental.cttz.elts.i32.v2048i1(<2048 x i1> undef, i1 true)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %res.i32.v2i32 = call i32 @llvm.experimental.cttz.elts.i32.v2i32(<2 x i32> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %res.i32.v4i32 = call i32 @llvm.experimental.cttz.elts.i32.v4i32(<4 x i32> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 171 for instruction: %res.i32.v32i32 = call i32 @llvm.experimental.cttz.elts.i32.v32i32(<32 x i32> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %res.i32.v2i33 = call i32 @llvm.experimental.cttz.elts.i32.v2i33(<2 x i33> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %res.i32.v4i33 = call i32 @llvm.experimental.cttz.elts.i32.v4i33(<4 x i33> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 170 for instruction: %res.i32.v32i33 = call i32 @llvm.experimental.cttz.elts.i32.v32i33(<32 x i33> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; TYPE-LABEL: 'foo_fixed_len_vectors'
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res.i32.v2i1.false = call i32 @llvm.experimental.cttz.elts.i32.v2i1(<2 x i1> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res.i32.v4i1.false = call i32 @llvm.experimental.cttz.elts.i32.v4i1(<4 x i1> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res.i32.v8i1.false = call i32 @llvm.experimental.cttz.elts.i32.v8i1(<8 x i1> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res.i32.v64i1.false = call i32 @llvm.experimental.cttz.elts.i32.v64i1(<64 x i1> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res.i32.v128i1.false = call i32 @llvm.experimental.cttz.elts.i32.v128i1(<128 x i1> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 663 for instruction: %res.i32.v1024i1.false = call i32 @llvm.experimental.cttz.elts.i32.v1024i1(<1024 x i1> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1319 for instruction: %res.i32.v2048i1.false = call i32 @llvm.experimental.cttz.elts.i32.v2048i1(<2048 x i1> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res.i32.v2i1.true = call i32 @llvm.experimental.cttz.elts.i32.v2i1(<2 x i1> undef, i1 true)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 1319 for instruction: %res.i32.v2048i1.true = call i32 @llvm.experimental.cttz.elts.i32.v2048i1(<2048 x i1> undef, i1 true)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %res.i32.v2i32 = call i32 @llvm.experimental.cttz.elts.i32.v2i32(<2 x i32> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %res.i32.v4i32 = call i32 @llvm.experimental.cttz.elts.i32.v4i32(<4 x i32> undef, i1 false)
+; TYPE-NEXT:  Cost Model: Found an estimated cost of 171 for instruction: %res.i32.v32i32 = call i32 @llvm.experimental.cttz.elts.i32.v32i32(<32 x i32> undef, i1 false)
----------------
prados-oc wrote:

> We can codegen fixed length llvm.experimental.cttz.elts intrinsics, I think we should probably fix that first rather than work around it

This patch is more generic: it models the general lowering sequence that is used whenever the intrinsic is expanded, as determined by TLI.shouldExpandCttzElements().

For example, before this patch the cost model returned an InvalidCost for `@llvm.experimental.cttz.elts.i32.nxv8i32(<vscale x 8 x i32> %a, i1 false)` even though the call is perfectly lowerable. Now it returns a cost of 20.
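
To make "general lowering sequence" concrete, here is a rough scalar model of how I understand the generic expansion. This is a sketch, not the code in this patch. It assumes the expansion builds `splat(N) - stepvector`, ANDs it with the sign-extended mask, takes an unsigned-max reduction, and subtracts the result from `N`. The function names `cttzEltsReference` and `cttzEltsExpanded` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics: the number of zero elements before the first
// non-zero element, counting from index 0. Returns the element count when
// every element is zero (the is_zero_poison = false behaviour).
uint64_t cttzEltsReference(const std::vector<bool> &Mask) {
  uint64_t I = 0;
  while (I < Mask.size() && !Mask[I])
    ++I;
  return I;
}

// Scalar model of the ASSUMED generic expansion (hypothetical helper, not
// an LLVM API): N - umax_reduce((splat(N) - stepvector) & sext(Mask)).
uint64_t cttzEltsExpanded(const std::vector<bool> &Mask) {
  const uint64_t N = Mask.size();
  uint64_t Max = 0;
  for (uint64_t I = 0; I < N; ++I) {
    uint64_t StepVL = N - I;            // lane I of splat(N) - stepvector
    uint64_t Ext = Mask[I] ? ~0ULL : 0; // lane I of sext(i1 -> iN)
    Max = std::max(Max, StepVL & Ext);  // running unsigned-max reduction
  }
  assert(Max <= N && "reduction result cannot exceed the element count");
  return N - Max;
}
```

The lowest set lane produces the largest `N - I`, so the max reduction picks out the first non-zero element, and an all-zero input leaves the max at 0 so the result is `N`. A type-based cost for the expanded form would then roughly be the sum of the costs of those vector operations, which is why it grows with the vector width in the checks above.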

I think it is reasonable to land this generic patch as the baseline and make architecture-specific tweaks in follow-up patches. Let me know what you think, @lukel97.

https://github.com/llvm/llvm-project/pull/184578