[Openmp-commits] [openmp] [amdgpu] Implement D2D memcpy as HSA call (PR #69955)

via Openmp-commits openmp-commits at lists.llvm.org
Mon Oct 23 11:06:06 PDT 2023


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-amdgpu

Author: Jon Chesterfield (JonChesterfield)

<details>
<summary>Changes</summary>

hsa_amd_memory_async_copy can handle device to device copies if passed the corresponding parameters.

No functional change - currently D2D copy goes through a fallback in libomptarget that stages through a host malloc, after this it goes directly through HSA.

Works under exactly the situations that HSA works. Verified locally on a performance benchmark. Hoping to attract further testing from internal developers after it lands.

---
Full diff: https://github.com/llvm/llvm-project/pull/69955.diff


1 Files Affected:

- (modified) openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp (+32-7) 


``````````diff
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index ab24856f9bc78e4..5d3d7837d918b41 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -2250,14 +2250,37 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
                                           PinnedMemoryManager);
   }
 
-  /// Exchange data between two devices within the plugin. This function is not
-  /// supported in this plugin.
+  /// Exchange data between two devices within the plugin.
   Error dataExchangeImpl(const void *SrcPtr, GenericDeviceTy &DstGenericDevice,
                          void *DstPtr, int64_t Size,
                          AsyncInfoWrapperTy &AsyncInfoWrapper) override {
-    // This function should never be called because the function
-    // AMDGPUPluginTy::isDataExchangable() returns false.
-    return Plugin::error("dataExchangeImpl not supported");
+    AMDGPUDeviceTy &DstDevice = static_cast<AMDGPUDeviceTy &>(DstGenericDevice);
+
+    hsa_agent_t SrcAgent = getAgent();
+    hsa_agent_t DstAgent = DstDevice.getAgent();
+
+    AMDGPUSignalTy Signal;
+    if (auto Err = Signal.init())
+      return Err;
+
+    // The agents need to have access to the corresponding memory
+    // This is presently only true if the pointers were originally
+    // allocated by this runtime or the caller made the appropriate
+    // access calls.
+    hsa_status_t Status = hsa_amd_memory_async_copy(
+        DstPtr, DstAgent, SrcPtr, SrcAgent, (Size > 0) ? (size_t)Size : 0, 0,
+        nullptr, Signal.get());
+    if (auto Err =
+            Plugin::check(Status, "Error in hsa_amd_memory_async_copy: %s"))
+      return Err;
+
+    if (auto Err = Signal.wait(getStreamBusyWaitMicroseconds()))
+      return Err;
+
+    if (auto Err = Signal.deinit())
+      return Err;
+
+    return Plugin::success();
   }
 
   /// Initialize the async info for interoperability purposes.
@@ -2899,7 +2922,7 @@ struct AMDGPUPluginTy final : public GenericPluginTy {
 
   /// This plugin does not support exchanging data between two devices.
   bool isDataExchangable(int32_t SrcDeviceId, int32_t DstDeviceId) override {
-    return false;
+    return true;
   }
 
   /// Get the host device instance.
@@ -3174,9 +3197,11 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
     return nullptr;
   }
 
-  if (Alloc && (Kind == TARGET_ALLOC_HOST || Kind == TARGET_ALLOC_SHARED)) {
+  if (Alloc) {
     auto &KernelAgents = Plugin::get<AMDGPUPluginTy>().getKernelAgents();
 
+    // Inherently necessary for host or shared allocations
+    // Also enabled for device memory to allow device to device memcpy
     // Enable all kernel agents to access the buffer.
     if (auto Err = MemoryPool->enableAccess(Alloc, Size, KernelAgents)) {
       REPORT("%s\n", toString(std::move(Err)).data());

``````````

</details>


https://github.com/llvm/llvm-project/pull/69955


More information about the Openmp-commits mailing list