[Openmp-commits] [PATCH] D132660: [openmp][amdgpu] Implement target_alloc_host as fine grain HSA memory

Thu Aug 25 07:12:53 PDT 2022

JonChesterfield created this revision.
JonChesterfield added reviewers: jhuber6, jdoerfert, carlo.bertolli, ronlieb.
Herald added subscribers: kosarev, kerbowa, guansong, t-tye, tpr, dstuttard, yaxunl, jvesely, kzhuravl.
Herald added a project: All.
JonChesterfield requested review of this revision.
Herald added subscribers: openmp-commits, sstefan1, wdng.
Herald added a project: OpenMP.

The CUDA plugin maps TARGET_ALLOC_HOST onto cuMemAllocHost, which
returns page-locked host memory. Fine-grained HSA memory is not
necessarily page locked, but it has the same semantics: it can be
read and written from both the host and the device.

The CUDA plugin allocates this memory per GPU. This patch allocates
from the host memory pool instead, which makes the memory accessible
from any GPU. It can be locked down to match the CUDA behaviour if
preferred.
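
The dispatch described above can be sketched in isolation. This is a minimal illustration, not the plugin code: MockPool and MockDeviceInfo are hypothetical stand-ins for hsa_amd_memory_pool_t and the plugin's DeviceInfo(), the enum is redeclared locally and assumed to mirror the libomptarget values, and selectPool is a name invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mirrors the allocation kinds used by libomptarget.
enum TargetAllocTy : int32_t {
  TARGET_ALLOC_DEVICE = 0,
  TARGET_ALLOC_HOST,
  TARGET_ALLOC_SHARED,
  TARGET_ALLOC_DEFAULT
};

// Hypothetical stand-in for hsa_amd_memory_pool_t.
struct MockPool {
  uint64_t Handle;
};

// Hypothetical stand-in for the plugin's device bookkeeping.
struct MockDeviceInfo {
  int NumberOfDevices = 4;

  // One coarse-grained pool per GPU (handles 100, 101, ...).
  MockPool getDeviceMemoryPool(int DeviceId) const {
    return MockPool{static_cast<uint64_t>(100 + DeviceId)};
  }

  // A single fine-grained host pool shared by every GPU.
  MockPool getHostMemoryPool() const { return MockPool{1}; }
};

// Picks the pool for an allocation kind. Returns false for kinds that are
// not handled, matching the "not implemented yet" path in the patch.
bool selectPool(const MockDeviceInfo &Info, int DeviceId, int32_t Kind,
                MockPool &Out) {
  assert(DeviceId < Info.NumberOfDevices && "Device ID too large");
  switch (Kind) {
  case TARGET_ALLOC_DEFAULT:
    Out = Info.getDeviceMemoryPool(DeviceId);
    return true;
  case TARGET_ALLOC_HOST:
    Out = Info.getHostMemoryPool();
    return true;
  default:
    return false;
  }
}
```

In this sketch a TARGET_ALLOC_HOST request yields the same pool regardless of DeviceId, which corresponds to the "accessible from any GPU" behaviour; matching the per-GPU CUDA behaviour would need extra bookkeeping that is not shown here.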

Enabling tests requires an amdgpu equivalent of
// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda
which does not appear to be in use yet.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D132660

Files:
  openmp/libomptarget/plugins/amdgpu/src/rtl.cpp


Index: openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
===================================================================

--- openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
+++ openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
@@ -2620,13 +2620,20 @@
   void *Ptr = NULL;
   assert(DeviceId < DeviceInfo().NumberOfDevices && "Device ID too large");
 
-  if (Kind != TARGET_ALLOC_DEFAULT) {
+  hsa_amd_memory_pool_t MemoryPool;
+  switch (Kind) {
+  case TARGET_ALLOC_DEFAULT:
+    MemoryPool = DeviceInfo().getDeviceMemoryPool(DeviceId);
+    break;
+  case TARGET_ALLOC_HOST:
+    MemoryPool = DeviceInfo().getHostMemoryPool();
+    break;
+  default:
     REPORT("Invalid target data allocation kind or requested allocator not "
            "implemented yet\n");
     return NULL;
   }
 
-  hsa_amd_memory_pool_t MemoryPool = DeviceInfo().getDeviceMemoryPool(DeviceId);
   hsa_status_t Err = hsa_amd_memory_pool_allocate(MemoryPool, Size, 0, &Ptr);
   DP("Tgt alloc data %ld bytes, (tgt:%016llx).\n", Size,
      (long long unsigned)(Elf64_Addr)Ptr);
@@ -2677,6 +2684,10 @@
 
 int32_t __tgt_rtl_data_delete(int DeviceId, void *TgtPtr) {
   assert(DeviceId < DeviceInfo().NumberOfDevices && "Device ID too large");
+
+  // TgtPtr was allocated as either TARGET_ALLOC_DEFAULT or TARGET_ALLOC_HOST;
+  // core::Runtime::Memfree can handle both.
+
   hsa_status_t Err;
   DP("Tgt free data (tgt:%016llx).\n", (long long unsigned)(Elf64_Addr)TgtPtr);
   Err = core::Runtime::Memfree(TgtPtr);

