[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

Shilei Tian via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 12 21:26:17 PDT 2024


================
@@ -0,0 +1,95 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu tonga -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu gfx1100 -emit-llvm -o - %s | FileCheck %s
+
+// CHECK-LABEL: @test_amdgcn_make_buffer_rsrc_p0(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr [[P:%.*]], i16 [[STRIDE:%.*]], i32 [[NUM:%.*]], i32 [[FLAGS:%.*]])
+// CHECK-NEXT:    ret ptr addrspace(8) [[TMP0]]
+//
+__buffer_rsrc_t test_amdgcn_make_buffer_rsrc_p0(void *p, short stride, int num, int flags) {
+  return __builtin_amdgcn_make_buffer_rsrc(p, stride, num, flags);
----------------
shiltian wrote:

For example, we have the following code:

```
void test_amdgcn_buffer_rsrc_t_assignment(void *p, short stride, int num, int flags, char c) {
  __buffer_rsrc_t rsrc = __builtin_amdgcn_make_buffer_rsrc(p, stride, num, flags);
  bar();
  __builtin_amdgcn_raw_ptr_buffer_store_i8(c, rsrc, 0, 0, 0);
}
```

The generated IR would be:

```
define dso_local void @test_amdgcn_buffer_rsrc_t_assignment(ptr nocapture noundef writeonly %p, i16 noundef signext %stride, i32 noundef %num, i32 noundef %flags, i8 noundef signext %c) local_unnamed_addr {
entry:
  %0 = tail call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 %stride, i32 %num, i32 %flags)
  tail call void @bar()
  tail call void @llvm.amdgcn.raw.ptr.buffer.store.i8(i8 %c, ptr addrspace(8) %0, i32 0, i32 0, i32 0)
  ret void
}

declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr readnone, i16, i32, i32) #1

declare void @bar() local_unnamed_addr #2
```

However, I just checked the potential use case of this, such as https://github.com/ROCm/composable_kernel/blob/acda4c5a3c34c13b71475fdd963e61182bba8a76/include/ck_tile/core/arch/amd_buffer_addressing.hpp#L71, we will need this type to be able to be passed around, so a sizeless type doesn't work. To move forward, I think we still need to make it a 128-bit fat pointer. I'm not sure yet if we want to make it an `i128` or `4xi32`, or a struct type because we definitely need to prevent the case like `__buffer_rsrc_t rsrc = some_i128_val;` or `__buffer_rsrc_t rsrc = some_4xi32_val;`. At clang codegen level, it is still taken as AS8 pointer. WDYT? @yxsamliu @arsenm @krzysz00 

https://github.com/llvm/llvm-project/pull/95276


More information about the cfe-commits mailing list