[llvm] [clang] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)

Piotr Sobczak via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 15 02:47:36 PST 2024


================
@@ -2496,6 +2496,26 @@ def int_amdgcn_flat_atomic_fmax_num   : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
 def int_amdgcn_global_atomic_fmin_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
 def int_amdgcn_global_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
 
+class AMDGPUGlobalLoadTr<LLVMType data_ty> :
+  Intrinsic<
+    [data_ty],
+    [global_ptr_ty],
+    [IntrReadMem, IntrWillReturn, IntrConvergent, NoCapture<ArgIndex<0>>, IntrNoCallback, IntrNoFree],
+    "",
+    [SDNPMemOperand]
+  >;
+
+// Wave32
+// <2 x i32>  @llvm.amdgcn.global.load.tr.v2i32(ptr addrspace(1)) -> global_load_tr_b64
+// <8 x i16>  @llvm.amdgcn.global.load.tr.v8i16(ptr addrspace(1)) -> global_load_tr_b128
+// <8 x half> @llvm.amdgcn.global.load.tr.v8f16(ptr addrspace(1)) -> global_load_tr_b128
----------------
piotrAMD wrote:

This comment only illustrates how the intrinsic could be used; the right patterns still need to be added. I added an extra one for bf16 in the most recent update.

https://github.com/llvm/llvm-project/pull/77772

