[llvm] [clang] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)
Piotr Sobczak via cfe-commits
cfe-commits at lists.llvm.org
Fri Jan 12 04:52:23 PST 2024
================
@@ -18178,6 +18178,51 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
}
+ case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
+ case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
+ case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8i16:
+ case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8f16:
+ case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
+ case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4f16: {
+
+ Intrinsic::ID IID;
+ llvm::Type *ArgTy;
+ switch (BuiltinID) {
+ case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
+ ArgTy = llvm::FixedVectorType::get(
+ llvm::Type::getInt32Ty(getLLVMContext()), 2);
+ IID = Intrinsic::amdgcn_global_load_tr_b64;
----------------
piotrAMD wrote:
Initially I thought it was better to keep _b64/_b128 explicit to avoid confusion, since the number of bits loaded also depends on the wave size. On second thought, I believe that having just one intrinsic would be cleaner; I will make an update.
https://github.com/llvm/llvm-project/pull/77772
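
To illustrate the point about a single intrinsic, here is a minimal standalone sketch (not code from the PR). It assumes the trailing type suffix of each builtin name (v2i32, i32, v8i16, ...) describes the per-lane result type, as the quoted hunk suggests for the v2i32 case. The names LaneTy and parseTypeSuffix are hypothetical and exist only for this example. The per-lane width can be recovered from the type alone, which is why one intrinsic overloaded on its result type could stand in for separate _b64/_b128 variants.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical model of a per-lane result type (not an LLVM class).
struct LaneTy {
  char Kind;         // 'i' for integer, 'f' for floating point
  unsigned ElemBits; // width of one element in bits
  unsigned NumElems; // 1 for a scalar, N for a vector of N elements
  unsigned totalBits() const { return ElemBits * NumElems; }
};

// Parse a type suffix such as "v2i32", "i32" or "v8f16".
// Assumption: the suffix follows the <vN><i|f><bits> pattern seen in the
// builtin names quoted above.
LaneTy parseTypeSuffix(const std::string &S) {
  std::size_t Pos = 0;
  unsigned NumElems = 1;
  if (Pos < S.size() && S[Pos] == 'v') {
    ++Pos;
    NumElems = 0;
    // Accumulate the decimal element count that follows the 'v'.
    while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
      NumElems = NumElems * 10 + static_cast<unsigned>(S[Pos++] - '0');
  }
  assert(Pos < S.size() && (S[Pos] == 'i' || S[Pos] == 'f'));
  char Kind = S[Pos++];
  // The remaining characters are the element width in bits.
  unsigned ElemBits = static_cast<unsigned>(std::stoul(S.substr(Pos)));
  return {Kind, ElemBits, NumElems};
}
```

For example, parseTypeSuffix("v2i32").totalBits() gives 64 while parseTypeSuffix("i32").totalBits() gives 32, so two builtins that share the _b64 infix differ in how many bits each lane receives.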