[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior

Joseph Huber via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Nov 21 11:09:08 PST 2022


jhuber6 added a subscriber: saiislam.
jhuber6 added a comment.

Looks good overall for me, mostly nits. I'm not the most qualifies to scrutinize the HSA usage, maybe @JonChesterfield or @saiislam could help on that front.



================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:43
+struct AMDGPUPluginTy;
+struct AMDGPUStreamTy;
+struct AMDGPUEventTy;
----------------
This is declared as a struct here then defined as a class later.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:64
+
+static_assert(sizeof(impl_implicit_args_t) == 56, "");
+
----------------
Empty argument isn't needed in C++17


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:148
+  if (Features.contains("sramecc+")) {
+    FeatureMap.insert(std::pair<std::string, bool>("sramecc", true));
+  } else if (Features.contains("sramecc-")) {
----------------
These can be `StringRef` since they're constants. Avoids the heap allocation.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:292
+
+    *PtrStorage = MemoryManager->allocate(Size, nullptr);
+    if (*PtrStorage == nullptr)
----------------
I'm not deeply familiar with the `MemoryManager`, but I thought that we didn't use it for AMD because the HSA plugin already handled its own caching in the memory pool? Maybe @JonChesterfield and @tianshilei1992 could comment further.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:415
+    // Publish the packet. Do not modify the package after this point.
+    __atomic_store_n(PacketPtr, Header | (Setup << 16), __ATOMIC_RELEASE);
+
----------------
Never looked into it much myself, but does `std::atomic` allow us to avoid using a compiler builtin for this? Not a big deal if not.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:536-537
+    // Kernel symbols have a ".kd" suffix.
+    std::string KernelName(getName());
+    KernelName += ".kd";
+
----------------
Small nit, can use LLVM's string, it's basically just a small vector of chars.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1009
+
+  /// Push an asynchronous memory copy host-to-device involving a unpinned
+  /// memory buffer. The operation consists of a two-step copy from the
----------------
typo


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1199
+
+  std::scoped_lock MultiLock(Mutex, RecordedStream.Mutex);
+
----------------
Probably best to specify the template arguments.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1572
+                       DeviceImageTy &Image) override {
+    AMDGPUDeviceImageTy &AMDImage = static_cast<AMDGPUDeviceImageTy &>(Image);
+
----------------
Unused


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1712
+      Status =
+          hsa_amd_memory_lock((void *)HstPtr, Size, nullptr, 0, &PinnedHstPtr);
+      if (auto Err =
----------------
Using `const_cast` isn't great but otherwise the compiler will complain with warnings on.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1733
+
+      Status = hsa_amd_memory_unlock((void *)HstPtr);
+      if (auto Err =
----------------
Same here.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1760
+      return Stream.pushPinnedMemoryCopyAsync(HstPtr, TgtPtr, Size);
+    } else if (Size >= OMPX_MaxAsyncCopySize) {
+      // For large transfers use synchronous behavior.
----------------
Nit, no else after return


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2048
+                                 const ELF64LE::Shdr &Section,
+                                 GlobalTy &ImageGlobal) {
+    // The global's address in AMDGPU is computed as the image begin + the ELF
----------------
Is this supposed to be marked override?


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2176
+    // Setup the memory pools of available for the host.
+    if (Err = HostDevice->init())
+      return std::move(Err);
----------------
Best make a new one here as this was checked above.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2297
+        return false;
+      } else if (EnvFeature->first() == ImgFeature.first() &&
+                 EnvFeature->second != ImgFeature.second) {
----------------
Nit. no else after return.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138389/new/

https://reviews.llvm.org/D138389



More information about the Openmp-commits mailing list