[Openmp-commits] [PATCH] D132005: Add non-blocking support for target nowait regions

Guilherme Valarini via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Aug 17 18:57:04 PDT 2022


gValarini added a comment.

In D132005#3730450 <https://reviews.llvm.org/D132005#3730450>, @ye-luo wrote:

> Right now the synchronization is based on the stream. Have you thought about synchronizing by a CUDA event and returning the stream to the pool early?

I have not thought about that yet, but it could be a nice optimization. Since the CUDA plugin currently maintains a resizable pool of streams for each device, with an initial size of 32, I thought this would be enough for a first implementation.

CUDA events offer the same kind of non-blocking query as streams, via cudaEventQuery <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT_1g2bf738909b4a059023537eaa29d8a5b7>, so we could store a single event (`completionEvent`) per `AsyncInfo` and use it when synchronizing with `SyncType::NON_BLOCKING`. I have one question though: does querying a CUDA event for completion guarantee that all operations enqueued on the stream before the event have completed? Or must another host thread still synchronize the stream? If synchronizing on the event alone is enough, using events would be much simpler.
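
To make the idea concrete, here is a rough host-only sketch of the flow, not the plugin's actual code. `MockEvent`, `MockStream`, `StreamPool`, `launchAndRecord` and `queryCompletion` are hypothetical names used only for illustration; the mock event stands in for a `cudaEvent_t`, and the comments mark where `cudaEventRecord` and `cudaEventQuery` would be called. The sketch assumes that a completed event implies all earlier work on the stream has completed, which is exactly the point asked about above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for cudaEvent_t; `done` is set once the "device"
// has finished all work captured by the event.
struct MockEvent {
  std::atomic<bool> done{false};
};

// Hypothetical stand-in for a cudaStream_t handle.
struct MockStream {
  int id;
};

// Simplified per-device stream pool that grows on demand.
class StreamPool {
  std::vector<MockStream> freeStreams;
  int nextId = 0;

public:
  explicit StreamPool(int initialSize) {
    for (int i = 0; i < initialSize; ++i)
      freeStreams.push_back(MockStream{nextId++});
  }
  MockStream acquire() {
    if (freeStreams.empty())
      return MockStream{nextId++}; // pool exhausted: create a new stream
    MockStream s = freeStreams.back();
    freeStreams.pop_back();
    return s;
  }
  void release(MockStream s) { freeStreams.push_back(s); }
  std::size_t available() const { return freeStreams.size(); }
};

// One completion event per AsyncInfo, as suggested in the comment.
struct AsyncInfo {
  std::shared_ptr<MockEvent> completionEvent;
};

// Enqueue work, record the event, and hand the stream back right away.
void launchAndRecord(StreamPool &pool, AsyncInfo &info) {
  MockStream s = pool.acquire();
  // Real code would enqueue copies/kernels on `s` here and then call
  // cudaEventRecord(event, stream) so the event captures that work.
  info.completionEvent = std::make_shared<MockEvent>();
  pool.release(s); // stream returns to the pool before the work finishes
}

// Non-blocking check (the SyncType::NON_BLOCKING path). Real code would
// call cudaEventQuery(event) and compare the result against cudaSuccess.
bool queryCompletion(const AsyncInfo &info) {
  return info.completionEvent->done.load(std::memory_order_acquire);
}
```

With this shape, the stream is only held while operations are being enqueued, and later polling touches the event alone.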


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132005/new/

https://reviews.llvm.org/D132005


