[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Nov 29 15:54:12 PST 2022
JonChesterfield added a comment.
A few comments above. In the general case, I don't see the benefit of getter constructs like:
`hsa_signal_t get() const { return Signal; }`
Can we either not allow access to the private member, or just make it public and drop the getter?
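For illustration, the two shapes under discussion (a hypothetical sketch, not the patch's types):
```
#include <hsa/hsa.h>

// Option A: private member plus a getter that leaks it anyway.
class SignalWrapperA {
  hsa_signal_t Signal;

public:
  hsa_signal_t get() const { return Signal; }
};

// Option B: if callers need the raw handle, just expose the member.
struct SignalWrapperB {
  hsa_signal_t Signal;
};
```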
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:181
+ /// Getter of the HSA memory pool.
+ hsa_amd_memory_pool_t get() const { return MemoryPool; }
+
----------------
This is pointless; just make the member public if we have to leak it. But again, this is existing code...
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:220
+ const llvm::SmallVector<hsa_agent_t> &Agents) const {
+ // TODO: Ensure it is possible to enable the access. This can be retrieved
+ // through HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS. If it is not possible,
----------------
This is a scary comment, though it's presumably in existing code.
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:292
+
+ *PtrStorage = MemoryManager->allocate(Size, nullptr);
+ if (*PtrStorage == nullptr)
----------------
jdoerfert wrote:
> jhuber6 wrote:
> > I'm not deeply familiar with the `MemoryManager`, but I thought that we didn't use it for AMD because the HSA plugin already handled its own caching in the memory pool? Maybe @JonChesterfield and @tianshilei1992 could comment further.
> I'm generally in favor of using it either way.
IIUC the HSA interface with 'memory pool' in the name doesn't actually implement a memory pool, so caching here is probably faster. I expect it'll interact badly with the ASan development effort, but that was ROCm-internal last time I looked at it, so it can't influence decisions here.
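To make the distinction concrete, here is a rough sketch of the kind of size-bucketed caching a memory manager can layer on top, assuming the pool hands out fresh memory on every call (hypothetical code, not the plugin's actual MemoryManager):
```
#include <hsa/hsa_ext_amd.h>
#include <map>
#include <mutex>

// Hypothetical sketch: cache freed blocks by size instead of returning
// them to the HSA "pool", which allocates fresh memory on every call.
class CachingPoolAllocator {
  hsa_amd_memory_pool_t Pool;
  std::multimap<size_t, void *> FreeList; // size -> cached blocks
  std::mutex Mtx;

public:
  explicit CachingPoolAllocator(hsa_amd_memory_pool_t Pool) : Pool(Pool) {}

  void *allocate(size_t Size) {
    {
      std::lock_guard<std::mutex> Lock(Mtx);
      auto It = FreeList.find(Size);
      if (It != FreeList.end()) {
        void *Ptr = It->second;
        FreeList.erase(It);
        return Ptr; // Reuse a cached block; no call into HSA.
      }
    }
    void *Ptr = nullptr;
    if (hsa_amd_memory_pool_allocate(Pool, Size, /*flags=*/0, &Ptr) !=
        HSA_STATUS_SUCCESS)
      return nullptr;
    return Ptr;
  }

  void deallocate(void *Ptr, size_t Size) {
    // Cache instead of freeing; a real implementation would bound the
    // cache and eventually call hsa_amd_memory_pool_free(Ptr).
    std::lock_guard<std::mutex> Lock(Mtx);
    FreeList.emplace(Size, Ptr);
  }
};
```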
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:393
+ // the acquire memory order.
+ while (PacketId - hsa_queue_load_read_index_scacquire(Queue) >= Queue->size)
+ ;
----------------
Does this behave sensibly on wrap-around? The arithmetic used elsewhere is:
```
static uint64_t acquireAvailablePacketId(hsa_queue_t *Queue) {
uint64_t PacketId = hsa_queue_add_write_index_relaxed(Queue, 1);
bool Full = true;
while (Full) {
Full =
PacketId >= (Queue->size + hsa_queue_load_read_index_scacquire(Queue));
}
return PacketId;
}
```
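For illustration only, a standalone check of how the two comparisons behave when the indices straddle the uint64_t wrap-around point (hypothetical values, not plugin code):
```
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t Size = 64;
  const uint64_t ReadIndex = UINT64_MAX - 10; // consumer index
  const uint64_t PacketId = ReadIndex + 5;    // only 5 packets in flight

  // Subtraction form (the code under review): modular distance, stays
  // correct as long as fewer than 2^64 packets are in flight.
  assert(!(PacketId - ReadIndex >= Size)); // correctly "not full"

  // Addition form (quoted above): Size + ReadIndex wraps to a small
  // value, so the comparison spuriously reports the queue as full.
  assert(PacketId >= Size + ReadIndex);
  return 0;
}
```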
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:452
+
+ /// Lock and unlock the queue. Acquiring a packet, filling its fields and
+ /// publishing it should be performed under the same exclusive region without
----------------
This is strange. The whole point of the queue abstraction is that multiple threads can enqueue work onto it at the same time, so why would we be locking it?
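For context, the lock-free dispatch scheme HSA queues are designed for looks roughly like this (a sketch under the usual assumptions, not the patch's code):
```
#include <hsa/hsa.h>

// Multiple threads can enqueue concurrently without a queue lock: each
// thread reserves its own slot with an atomic add, and only the final
// store of the packet header needs release semantics to publish it.
void enqueue(hsa_queue_t *Queue, const hsa_kernel_dispatch_packet_t &Src) {
  // Reserve a slot; this is the only point of inter-thread contention.
  uint64_t PacketId = hsa_queue_add_write_index_relaxed(Queue, 1);
  while (PacketId - hsa_queue_load_read_index_scacquire(Queue) >= Queue->size)
    ; // Wait until the slot is free (see the wrap-around note above).

  hsa_kernel_dispatch_packet_t *Packet =
      (hsa_kernel_dispatch_packet_t *)Queue->base_address +
      (PacketId & (Queue->size - 1));

  // Fill every field while the packet header still reads as invalid.
  hsa_kernel_dispatch_packet_t Tmp = Src;
  Tmp.header = HSA_PACKET_TYPE_INVALID << HSA_PACKET_HEADER_TYPE;
  *Packet = Tmp;

  // Publish: release-store the real header, then ring the doorbell.
  __atomic_store_n(&Packet->header, Src.header, __ATOMIC_RELEASE);
  hsa_signal_store_screlease(Queue->doorbell_signal, PacketId);
}
```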
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:481
+ /// Create an empty signal.
+ AMDGPUSignalTy() : Signal({0}) {}
+
----------------
Is {0} a meaningful value for a signal? E.g., can we pass it to signal_destroy or similar without it falling over?
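For reference, hsa_signal_t is a struct wrapping a uint64_t handle, so {0} is a null handle rather than a created signal; a guarded-destroy sketch (hypothetical naming):
```
#include <hsa/hsa.h>

// Treat a zero handle as "no signal" and guard the destroy, rather
// than relying on hsa_signal_destroy tolerating a null handle.
struct SignalGuard {
  hsa_signal_t Signal{0};

  ~SignalGuard() {
    if (Signal.handle != 0)
      hsa_signal_destroy(Signal);
  }
};
```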
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:499
+ while (hsa_signal_wait_scacquire(Signal, HSA_SIGNAL_CONDITION_EQ, 0,
+ UINT64_MAX, HSA_WAIT_STATE_ACTIVE) != 0)
+ ;
----------------
Could we have a comment for why active instead of blocked? 'Wait' in particular sounds like we expect the signal to be unavailable when the function is called, so busy-spinning in the host thread seems bad.
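For comparison, the blocked variant that parks the host thread instead of spinning (same call, different wait-state hint; a sketch):
```
#include <cstdint>
#include <hsa/hsa.h>

// HSA_WAIT_STATE_BLOCKED lets the runtime put the host thread to sleep
// until the signal changes; HSA_WAIT_STATE_ACTIVE busy-spins, trading
// CPU time for lower wake-up latency.
void waitBlocked(hsa_signal_t Signal) {
  while (hsa_signal_wait_scacquire(Signal, HSA_SIGNAL_CONDITION_EQ, 0,
                                   UINT64_MAX, HSA_WAIT_STATE_BLOCKED) != 0)
    ;
}
```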
================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:507
+
+ /// Signal decrementing by one.
+ void signal() {
----------------
This looks racy. If a few threads are calling decrement and one calls reset, it's going to break.
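To make the hazard concrete, a hypothetical simplification with a plain atomic standing in for the HSA signal value:
```
#include <atomic>
#include <cstdint>

// If reset() races with in-flight signal() calls, decrements that land
// just before the store are lost, and decrements that land just after
// it are charged against the fresh value; either way, a later
// wait-for-zero can hang or return early.
struct RacySignal {
  std::atomic<int64_t> Value{0};

  void signal() { Value.fetch_sub(1, std::memory_order_release); }

  void reset(int64_t InitialValue) {
    Value.store(InitialValue, std::memory_order_relaxed);
  }
};
```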
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D138389/new/
https://reviews.llvm.org/D138389