[Openmp-commits] [PATCH] D132005: Add non-blocking support for target nowait regions
Guilherme Valarini via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Aug 17 18:57:04 PDT 2022
gValarini added a comment.
In D132005#3730450 <https://reviews.llvm.org/D132005#3730450>, @ye-luo wrote:
> Right now the synchronization is based on streams. Have you thought about synchronizing with a CUDA event and returning the stream to the pool early?
I have not thought about that yet, but it could be a nice optimization. Since the CUDA plugin currently maintains a resizable pool of streams for each device, with an initial size of 32, I thought this would be enough for a first implementation.
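To make the pool behaviour concrete, here is a minimal host-only sketch of a resizable stream pool. It is not the plugin's actual code: `StreamHandle`, `StreamPool`, and the doubling growth policy are assumptions made for illustration, and stream creation is replaced by handing out integer ids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stream handle; real code would store CUDA stream handles.
using StreamHandle = int;

class StreamPool {
  std::vector<StreamHandle> Streams; // slots [Next, size) hold free streams
  std::size_t Next = 0;              // number of streams currently handed out

  // Grow the pool to NewSize entries. Real code would create a CUDA stream
  // here; this sketch just assigns increasing integer ids.
  void resize(std::size_t NewSize) {
    while (Streams.size() < NewSize)
      Streams.push_back(static_cast<StreamHandle>(Streams.size()));
  }

public:
  explicit StreamPool(std::size_t InitialSize = 32) { resize(InitialSize); }

  // Hand out a free stream, doubling the pool when it is exhausted
  // (the doubling policy is an assumption of this sketch).
  StreamHandle acquire() {
    if (Next == Streams.size())
      resize(Streams.empty() ? 1 : Streams.size() * 2);
    return Streams[Next++];
  }

  // Return a stream so that a later acquire() can reuse it.
  void release(StreamHandle S) {
    assert(Next > 0 && "release without matching acquire");
    Streams[--Next] = S;
  }

  std::size_t capacity() const { return Streams.size(); }
  std::size_t inUse() const { return Next; }
};
```

With a pool like this, a stream stays checked out from `acquire()` until the matching `release()`, so many long-running `nowait` regions can force the pool to grow; returning streams earlier would keep it smaller.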
CUDA events offer the same kind of non-blocking query as streams, via cudaEventQuery <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT_1g2bf738909b4a059023537eaa29d8a5b7>, so we could store a single event (`completionEvent`) per `AsyncInfo` and use it when synchronizing with `SyncType::NON_BLOCKING`. I have one question though: does a successful query of a CUDA event guarantee that all operations enqueued on the stream before the event have completed, or must another host thread still synchronize the stream? If synchronizing on the event alone is enough, using events would be much simpler.
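A rough host-only model of the event-based idea follows. It does not call the CUDA runtime: `FakeEvent` is a hypothetical stand-in whose `query()` plays the role of cudaEventQuery, and `finishEnqueue`, `synchronize`, and the fields of `AsyncInfo` are illustrative names, not the plugin's API. The CUDA runtime documentation describes cudaEventQuery as returning cudaSuccess when all work captured by the event has completed and cudaErrorNotReady otherwise; the sketch assumes that is sufficient to treat the region as finished.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Host-only stand-in for a CUDA event (hypothetical). With the real runtime,
// query() would wrap cudaEventQuery and the device would do the signalling.
struct FakeEvent {
  std::atomic<bool> Done{false};
  bool query() const { return Done.load(std::memory_order_acquire); }
  void signal() { Done.store(true, std::memory_order_release); }
};

enum class SyncType { BLOCKING, NON_BLOCKING };

struct AsyncInfo {
  int Stream = -1;           // -1 means no stream is attached
  FakeEvent CompletionEvent; // marks the end of the queued work
};

// Called after the last operation is enqueued. Recording the event is
// implicit in this model; the stream goes back to the pool right away.
void finishEnqueue(AsyncInfo &Info, std::vector<int> &FreeStreams) {
  assert(Info.Stream >= 0 && "no stream attached");
  FreeStreams.push_back(Info.Stream);
  Info.Stream = -1;
}

// Returns true once the work guarded by the event has completed.
bool synchronize(AsyncInfo &Info, SyncType Type) {
  if (Type == SyncType::NON_BLOCKING)
    return Info.CompletionEvent.query(); // single query, never waits
  while (!Info.CompletionEvent.query())
    std::this_thread::yield();           // blocking path: wait for completion
  return true;
}
```

In this model the host thread only ever touches the event after `finishEnqueue`, which is what would allow the stream to be reused by another region while the first one is still running.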
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132005/new/
https://reviews.llvm.org/D132005