[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Nov 23 11:10:24 PST 2022


jdoerfert added inline comments.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:292
+
+    *PtrStorage = MemoryManager->allocate(Size, nullptr);
+    if (*PtrStorage == nullptr)
----------------
jhuber6 wrote:
> I'm not deeply familiar with the `MemoryManager`, but I thought that we didn't use it for AMD because the HSA plugin already handled its own caching in the memory pool? Maybe @JonChesterfield and @tianshilei1992 could comment further.
I'm generally in favor of using it either way.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:334
+  AMDGPUDeviceImageTy(int32_t ImageId, const __tgt_device_image *TgtImage)
+      : DeviceImageTy(ImageId, TgtImage) /*, Module(nullptr)*/ {}
+
----------------



================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:415
+    // Publish the packet. Do not modify the package after this point.
+    __atomic_store_n(PacketPtr, Header | (Setup << 16), __ATOMIC_RELEASE);
+
----------------
jhuber6 wrote:
> Never looked into it much myself, but does `std::atomic` allow us to avoid using a compiler builtin for this? Not a big deal if not.
We'd need to bend over backwards as the definitions are plain pointers. Let's keep it this way.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:673
+  /// Array of stream slots/operations. The array works as a circular buffer.
+  llvm::SmallVector<StreamOperationTy> Ops;
+
----------------
Pick a size that is at least the default Capacity.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:829
+    if (Size == Capacity)
+      return Plugin::error("Stream is full");
+
----------------
Isn't this a problem? Shouldn't we wait instead? Same in other places. Let's mark it as TODO and address it in a follow up.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1251-1252
+
+/// Class wrapping a CUDA event reference. These are the objects handled by the
+/// Event Manager for the CUDA plugin.
+class AMDGPUEventRef final : public GenericDeviceResourceRef {
----------------
- CUDA + AMD


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1461-1462
+        OMPX_NumQueues("LIBOMPTARGET_AMDGPU_NUM_QUEUES", 8),
+        OMPX_QueueSize("LIBOMPTARGET_AMDGPU_QUEUE_SIZE", 1024),
+        OMPX_StreamSize("LIBOMPTARGET_AMDGPU_STREAM_SIZE", 512),
+        OMPX_MaxAsyncCopySize("LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_SIZE",
----------------
4 x 512 x 128 might be enough


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138389/new/

https://reviews.llvm.org/D138389



More information about the Openmp-commits mailing list