[clang] [llvm] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)
Changpeng Fang via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 12 11:48:45 PST 2024
================
@@ -2496,6 +2496,26 @@ def int_amdgcn_flat_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
def int_amdgcn_global_atomic_fmin_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
def int_amdgcn_global_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
+class AMDGPUGlobalLoadTr<LLVMType data_ty> :
+ Intrinsic<
+ [data_ty],
+ [global_ptr_ty],
+ [IntrReadMem, IntrWillReturn, IntrConvergent, NoCapture<ArgIndex<0>>, IntrNoCallback, IntrNoFree],
+ "",
+ [SDNPMemOperand]
+ >;
+
+// Wave32
+// <2 x i32> @llvm.amdgcn.global.load.tr.v2i32(ptr addrspace(1)) -> global_load_tr_b64
+// <8 x i16> @llvm.amdgcn.global.load.tr.v8i16(ptr addrspace(1)) -> global_load_tr_b128
+// <8 x half> @llvm.amdgcn.global.load.tr.v8f16(ptr addrspace(1)) -> global_load_tr_b128
----------------
changpeng wrote:
global_load_tr_b128 transposes to a vector of b16. Do we really need to enumerate every possible type (i16, f16)? If so, we may also need to consider bf16.
https://github.com/llvm/llvm-project/pull/77772