[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Thu Aug 17 14:17:09 PDT 2023

saiislam added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17143-17145
+  llvm::LoadInst *LD;
+  Constant *Offset, *Offset1;
+  Value *DP, *DP1;
----------------
arsenm wrote:
> Move down to define and initialize
There are multiple uses of the same identifier. Defining them four times looks odd.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2550-2551
+    Error Err = retrieveAllMemoryPools();
+    if (Err)
+      return Plugin::error("Unable to retieve all memmory pools");
+
----------------
jhuber6 wrote:
> This and below isn't correct. You can't discard an `llvm::Error` value like this without either doing `consumeError(std::move(Err))` or `toString(std::move(Err))`. However, you don't need to consume these in the first place, they already contain the error message from the callee and should just be forwarded.
Removed the logic for preallocatedheap.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1752
+    // if (auto Err = preAllocateDeviceMemoryPool())
+    //   return Err;
+
----------------
jhuber6 wrote:
> saiislam wrote:
> > jhuber6 wrote:
> > > Leftoever?
> > No, it is not a left over.
> > One of the fields in cov5 implicitikernarg is heap_v1 ptr. It should point to a 128KB zero-initialized block of coarse-grained memory on each device before launching the kernel. This code was working a while ago, but right now it is failing most likely due to some latest change in devicertl memory handling mechanism.
> > I need to debug it with this patch, otherwise it will cause all target region code calling device-malloc to fail.
> > I will try to fix it before the next revision.
> Do we really need that? We only use a fraction of the existing implicit arguments. My understanding is that most of these are more for runtime handling for HIP and OpenCL while we would most likely want our own solution. I'm assuming that the 128KB is not required for anything we use?
I have removed the preallocatedheap work from this patch.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:36

-// The implicit arguments of AMDGPU kernels.
-struct AMDGPUImplicitArgsTy {
-  uint64_t OffsetX;
-  uint64_t OffsetY;
-  uint64_t OffsetZ;
-  uint64_t HostcallPtr;
-  uint64_t Unused0;
-  uint64_t Unused1;
-  uint64_t Unused2;
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,
----------------
jhuber6 wrote:
> saiislam wrote:
> > jhuber6 wrote:
> > > arsenm wrote:
> > > > This is getting duplicated a few places, should it move to a support header?
> > > > 
> > > > I don't love the existing APIs for this, I think a struct definition makes more sense
> > > The other user here is my custom loader, @JonChesterfield has talked about wanting a common HSA helper header for awhile now.
> > > 
> > > I agree that the struct definition is much better. Being able to simply allocate this size and then zero fill it is much cleaner.
> > Defining a struct for whole 256 byte of implicitargs in cov5 was becoming a little difficult due to different sizes of various fields (2, 4, 6, 8, 48, 72 bytes) along with multiple reserved fields in between. It made sense for cov4 because it only had 7 fields of 8 bytes each, where we needed only 4th field in OpenMP runtime (for hostcall_buffer).
> > 
> > Offset based lookups like the following allows handling/exposing only required fields across generations of ABI.
> If we don't use it, just put it as `unused`. It's really hard to read as-is and it makes it more difficult to just zero fill.
I have reduced the fields to bare minimum required for OpenMP.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730