[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior
Johannes Doerfert via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Nov 23 11:10:24 PST 2022
jdoerfert added inline comments.
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:292
+
+ *PtrStorage = MemoryManager->allocate(Size, nullptr);
+ if (*PtrStorage == nullptr)
----------------
jhuber6 wrote:
> I'm not deeply familiar with the `MemoryManager`, but I thought that we didn't use it for AMD because the HSA plugin already handled its own caching in the memory pool? Maybe @JonChesterfield and @tianshilei1992 could comment further.
I'm generally in favor of using it either way.
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:334
+ AMDGPUDeviceImageTy(int32_t ImageId, const __tgt_device_image *TgtImage)
+ : DeviceImageTy(ImageId, TgtImage) /*, Module(nullptr)*/ {}
+
----------------
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:415
+ // Publish the packet. Do not modify the package after this point.
+ __atomic_store_n(PacketPtr, Header | (Setup << 16), __ATOMIC_RELEASE);
+
----------------
jhuber6 wrote:
> Never looked into it much myself, but does `std::atomic` allow us to avoid using a compiler builtin for this? Not a big deal if not.
We'd need to bend over backwards as the definitions are plain pointers. Let's keep it this way.
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:673
+ /// Array of stream slots/operations. The array works as a circular buffer.
+ llvm::SmallVector<StreamOperationTy> Ops;
+
----------------
Pick a size that is at least the default Capacity.
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:829
+ if (Size == Capacity)
+ return Plugin::error("Stream is full");
+
----------------
Isn't this a problem? Shouldn't we wait instead? Same in other places. Let's mark it as TODO and address it in a follow up.
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1251-1252
+
+/// Class wrapping a CUDA event reference. These are the objects handled by the
+/// Event Manager for the CUDA plugin.
+class AMDGPUEventRef final : public GenericDeviceResourceRef {
----------------
- CUDA + AMD
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1461-1462
+ OMPX_NumQueues("LIBOMPTARGET_AMDGPU_NUM_QUEUES", 8),
+ OMPX_QueueSize("LIBOMPTARGET_AMDGPU_QUEUE_SIZE", 1024),
+ OMPX_StreamSize("LIBOMPTARGET_AMDGPU_STREAM_SIZE", 512),
+ OMPX_MaxAsyncCopySize("LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_SIZE",
----------------
4 x 512 x 128 might be enough
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D138389/new/
https://reviews.llvm.org/D138389
More information about the Openmp-commits
mailing list