[Mlir-commits] [clang] [llvm] [mlir] [AMDGPU] Use a general form of intrinsic for tensor load/store (PR #182334)

Fri Feb 20 17:27:59 PST 2026

================
@@ -4194,41 +4194,24 @@ def int_amdgcn_swmmac_f16_16x16x128_bf8_bf8 : AMDGPUSWmmacIntrinsicIdxReuse<llvm
 def int_amdgcn_swmmac_i32_16x16x128_iu8     : AMDGPUSWmmacIntrinsicABIdxClamp<llvm_anyint_ty, llvm_anyint_ty, llvm_anyint_ty, llvm_anyint_ty>;
 }
 
-
 class AMDGPUTensorLoadStore:
   Intrinsic<
     [],
     [llvm_v4i32_ty, // D# group 0
      llvm_v8i32_ty, // D# group 1
-     llvm_v4i32_ty, // D# group 2
-     llvm_v4i32_ty, // D# group 3
+     llvm_v4i32_ty, // D# group 2: group 2 and 3 should be zero-initialized for D# up to 2D.
----------------
changpeng wrote:

> Should this accept type mangling to change the vector width?

I don't think so, because every group has a "fixed width" on existing and near-future hardware.

https://github.com/llvm/llvm-project/pull/182334