[clang] [llvm] [mlir] [AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (PR #171069)

Tue Dec 16 06:07:35 PST 2025

================
@@ -856,7 +856,7 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x64_fp8_fp8, "V8hV8iV8iIsV8hIbIb",
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x64_fp8_bf8, "V8hV8iV8iIsV8hIbIb", "nc", "gfx1250-insts,wavefrontsize32")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x64_bf8_fp8, "V8hV8iV8iIsV8hIbIb", "nc", "gfx1250-insts,wavefrontsize32")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x64_bf8_bf8, "V8hV8iV8iIsV8hIbIb", "nc", "gfx1250-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x64_iu8, "V8iIbV8iIbV8iV8iIbIb", "nc", "gfx1250-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x64_iu8, "V8iIbV8iIbV8iV8iIbIb.", "nc", "gfx1250-insts,wavefrontsize32")
----------------
shiltian wrote:

Alternatively, you could add a separate builtin `__builtin_amdgcn_wmma_i32_16x16x64_iu8_clamp`, along with a corresponding intrinsic `llvm.amdgcn.wmma.i32.16x16x64.iu8.v8i32.v8i32.clamp`, dedicated to the case where clamp is enabled. I have seen that pattern elsewhere in the code.
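
A rough sketch of what the two-builtin alternative could look like on the builtin-table side. This is a standalone mock, not the actual clang change: `struct builtin_info`, `wmma_iu8_builtins`, and `find_builtin` are hypothetical names used only for illustration, and the type string of the `_clamp` entry is an assumption (it reuses the original string without the trailing `.`, on the premise that clamp is implied by the builtin name rather than passed as an operand).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the record clang builds from each .def entry. */
struct builtin_info {
  const char *name;
  const char *type;
  const char *attrs;
  const char *features;
};

/* X-macro expansion: each TARGET_BUILTIN line becomes one table row. */
#define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) {#ID, TYPE, ATTRS, FEATURE},

static const struct builtin_info wmma_iu8_builtins[] = {
    /* Existing entry, left unchanged (no variadic "." suffix). */
    TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x64_iu8, "V8iIbV8iIbV8iV8iIbIb", "nc", "gfx1250-insts,wavefrontsize32")
    /* Assumed dedicated entry for the clamp-enabled case. */
    TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x64_iu8_clamp, "V8iIbV8iIbV8iV8iIbIb", "nc", "gfx1250-insts,wavefrontsize32")
};

#undef TARGET_BUILTIN

/* Linear lookup by builtin name; returns NULL when the name is not listed. */
static const struct builtin_info *find_builtin(const char *name) {
  size_t n = sizeof(wmma_iu8_builtins) / sizeof(wmma_iu8_builtins[0]);
  for (size_t i = 0; i < n; ++i)
    if (strcmp(wmma_iu8_builtins[i].name, name) == 0)
      return &wmma_iu8_builtins[i];
  return NULL;
}
```

With this shape, existing callers of `__builtin_amdgcn_wmma_i32_16x16x64_iu8` keep a fixed-arity signature, and opting into clamp is a matter of calling the other builtin rather than passing an extra trailing argument.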

https://github.com/llvm/llvm-project/pull/171069