[clang] [llvm] [mlir] [AArch64][SME] Improve codegen for aarch64.sme.cnts* when not in streaming mode (PR #154761)

Fri Sep 5 04:49:54 PDT 2025

================
@@ -822,16 +822,18 @@ struct OuterProductWideningOpConversion
   }
 };
 
-/// Lower `arm_sme.streaming_vl` to SME CNTS intrinsics.
+/// Lower `arm_sme.streaming_vl` to SME CNTSD intrinsic.
 ///
 /// Example:
 ///
 ///   %0 = arm_sme.streaming_vl <half>
 ///
 /// is converted to:
 ///
-///   %cnt = "arm_sme.intr.cntsh"() : () -> i64
-///   %0 = arith.index_cast %cnt : i64 to index
+///   %cnt = "arm_sme.intr.cntsd"() : () -> i64
+///   %0 = arith.constant 4 : i64
+///   %1 = arith.muli %cnt, %0 : i64
+///   %2 = arith.index_cast %1 : i64 to index
----------------
MacDue wrote:

```suggestion
///   %scale = arith.constant 4 : index
///   %cntIndex = arith.index_cast %cnt : i64 to index
///   %0 = arith.muli %cntIndex, %scale : index
```
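The suggested IR computes the `<half>` streaming vector length as `cntsd * 4`. CNTSD counts the 64-bit elements in a streaming vector, and each 64-bit lane holds four 16-bit elements. The same scaling covers the other element widths: ×8 for 8-bit, ×2 for 32-bit, ×1 for 64-bit. Below is a minimal C++ sketch of that arithmetic. It assumes a hypothetical helper, `elementsPerStreamingVector`, which is not an MLIR or LLVM API and takes the CNTSD result as a plain integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of MLIR/LLVM): scale the CNTSD result
// (number of 64-bit elements per streaming vector) to the number of
// elements of the given bit width. Each 64-bit lane holds
// 64 / elementBits elements of that width.
uint64_t elementsPerStreamingVector(uint64_t cntsd, unsigned elementBits) {
  assert((elementBits == 8 || elementBits == 16 || elementBits == 32 ||
          elementBits == 64) &&
         "unsupported element width");
  return cntsd * (64 / elementBits);
}
```

For example, with a 512-bit streaming vector length CNTSD returns 8, so `<half>` gives 8 × 4 = 32 elements.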

https://github.com/llvm/llvm-project/pull/154761