[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Joseph Huber via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Aug 7 06:23:23 PDT 2023


jhuber6 added inline comments.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17138
+///            and use its value for COV_4 or COV_5 approach. It is used for
+///            compiling device libraries in ABI-agnostic way.
+///
----------------



================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17187-17188
+        Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+  } else {
+    if (Cov == clang::TargetOptions::COV_5) {
+      // Indexing the implicit kernarg segment.
----------------
saiislam wrote:
> jhuber6 wrote:
> > nit.
> There are a couple of common lines after the inner if-else, in the outer else section.
You should be able to factor out
```
    LD = CGF.Builder.CreateLoad(
        Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
```
from both by making each assign the `Result` to a value.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2550-2551
+    Error Err = retrieveAllMemoryPools();
+    if (Err)
+      return Plugin::error("Unable to retieve all memmory pools");
+
----------------
This and below isn't correct. You can't discard an `llvm::Error` value like this without either doing `consumeError(std::move(Err))` or `toString(std::move(Err))`. However, you don't need to consume these in the first place, they already contain the error message from the callee and should just be forwarded.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1752
+    // if (auto Err = preAllocateDeviceMemoryPool())
+    //   return Err;
+
----------------
saiislam wrote:
> jhuber6 wrote:
> > Leftoever?
> No, it is not a left over.
> One of the fields in cov5 implicitikernarg is heap_v1 ptr. It should point to a 128KB zero-initialized block of coarse-grained memory on each device before launching the kernel. This code was working a while ago, but right now it is failing most likely due to some latest change in devicertl memory handling mechanism.
> I need to debug it with this patch, otherwise it will cause all target region code calling device-malloc to fail.
> I will try to fix it before the next revision.
Do we really need that? We only use a fraction of the existing implicit arguments. My understanding is that most of these are more for runtime handling for HIP and OpenCL while we would most likely want our own solution. I'm assuming that the 128KB is not required for anything we use?


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:36
 
-// The implicit arguments of AMDGPU kernels.
-struct AMDGPUImplicitArgsTy {
-  uint64_t OffsetX;
-  uint64_t OffsetY;
-  uint64_t OffsetZ;
-  uint64_t HostcallPtr;
-  uint64_t Unused0;
-  uint64_t Unused1;
-  uint64_t Unused2;
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,
----------------
saiislam wrote:
> jhuber6 wrote:
> > arsenm wrote:
> > > This is getting duplicated a few places, should it move to a support header?
> > > 
> > > I don't love the existing APIs for this, I think a struct definition makes more sense
> > The other user here is my custom loader, @JonChesterfield has talked about wanting a common HSA helper header for awhile now.
> > 
> > I agree that the struct definition is much better. Being able to simply allocate this size and then zero fill it is much cleaner.
> Defining a struct for whole 256 byte of implicitargs in cov5 was becoming a little difficult due to different sizes of various fields (2, 4, 6, 8, 48, 72 bytes) along with multiple reserved fields in between. It made sense for cov4 because it only had 7 fields of 8 bytes each, where we needed only 4th field in OpenMP runtime (for hostcall_buffer).
> 
> Offset based lookups like the following allows handling/exposing only required fields across generations of ABI.
If we don't use it, just put it as `unused`. It's really hard to read as-is and it makes it more difficult to just zero fill.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730



More information about the cfe-commits mailing list