[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior

Sat Dec 3 18:49:00 PST 2022

jdoerfert added inline comments.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:181
+  /// Getter of the HSA memory pool.
+  hsa_amd_memory_pool_t get() const { return MemoryPool; }
+
----------------
JonChesterfield wrote:
> this is pointless, just make the member public if we have to leak it. but again, existing code...
It's not pointless, it gives a common name for abstracted resources. Often this would be `operator*` or similar but `get` is as good as anything for now. Let's not go down rabbit holes that are indeed pointless.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:499
+    while (hsa_signal_wait_scacquire(Signal, HSA_SIGNAL_CONDITION_EQ, 0,
+                                     UINT64_MAX, HSA_WAIT_STATE_ACTIVE) != 0)
+      ;
----------------
kevinsala wrote:
> JonChesterfield wrote:
> > comment for why active instead of blocked? wait in particular sounds like we expect the signal to be unavailable when the function is called so busy-spin in the host thread seems bad
> True, that's something I was experimenting with. I'll change it back to blocked wait.
I'm not sure which way to go. Add a TODO to try this out. I'm uncertain if the host thread can do something useful while it's not busy waiting.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:507
+
+  /// Signal decrementing by one.
+  void signal() {
----------------
kevinsala wrote:
> JonChesterfield wrote:
> > This looks racy. If a few threads are calling decrement and one calls reset, it's going to break. 
> Reset is expected to be called when the signal is not being used by any other thread nor the HSA runtime. It should be called just before being reused.
Add that to the function comment please.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:725
+    // Lock the queue during the packet publishing process.
+    std::lock_guard<AMDGPUQueueTy> Lock(Queue);
+
----------------
JonChesterfield wrote:
> Confused by this, why are we locking the queue?
I think that's related to the answer above. Add a TODO to look into this.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1931
+    bool Found = false;
+    HostAllocationsMutex.lock_shared();
+    if (!HostAllocations.empty()) {
----------------
kevinsala wrote:
> JonChesterfield wrote:
> > I can't work out what's going on here. The corresponding logic for erase looks up the pointer directly, should found not be the same? Also can't tell why we're recording the size of the allocation next to the pointer, as opposed to a DenseSet<void*>
> The `__tgt_rtl_data_delete` operation should pass the same pointer provided by the `__tgt_rtl_data_alloc`. As far as I know, it's not valid to make a partial deletion of an allocated buffer. But in the case of `__tgt_rtl_data_submit/retrieve`, the pointer can come with an applied offset. Thus, we should check whether the provided pointer is inside any host allocation, considering the sizes of the allocations.
We might want to make our lives easier here and not do a search.
Later we want two changes that will help:
1) Have a pre-allocated pinned buffer for arguments.
2) Use the hsa lookup function if it's not a pointer to pinned memory allocated by us.
@kevinsala You think we need the search for the current use cases?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138389/new/

https://reviews.llvm.org/D138389