[PATCH] D70010: [OpenMP][Offloading] Replaced default stream with an actual per-device unblocking stream in NVPTX implementation
Shilei Tian via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 8 10:26:41 PST 2019
tianshilei1992 added a comment.
In D70010#1739061 <https://reviews.llvm.org/D70010#1739061>, @ABataev wrote:
> In D70010#1739049 <https://reviews.llvm.org/D70010#1739049>, @tianshilei1992 wrote:
>
> > In D70010#1738930 <https://reviews.llvm.org/D70010#1738930>, @ABataev wrote:
> >
> > > Also, the main question: how does it affect the existing execution model? What if we have a target region in a parallel region, will they be executed asynchronously? We need some tests for this if we don't have such tests.
> >
> >
> > According to https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf, using a non-default stream can improve performance. This is actually the first step toward using multiple streams, which I'm going to implement later.
>
>
> My question is different. Does it affect execution of the existing code anyhow?
AFAIK, no. Currently we still have only one stream per device; it's just no longer the default stream. Kernels in the same stream are executed in order, so asynchronous execution would require multiple streams. I'll check whether the existing test cases cover this, and will write one if not.
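For reference, a minimal sketch of the kind of test being discussed: a target region launched from inside a parallel region, with the host checking the results afterwards. This is not the test from the patch. The helper name `run_target_in_parallel` and its parameters are hypothetical, and it assumes each host thread maps a disjoint slice. Built without OpenMP support, the pragmas are ignored and the same check runs serially on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from D70010: every host thread offloads one
// target region that fills its own slice of a shared buffer, then the host
// verifies each element. With one in-order stream per device, the result
// must match what the default stream would have produced.
bool run_target_in_parallel(int threads, int n) {
  std::vector<int> data(static_cast<std::size_t>(threads) * n, -1);
  int *base = data.data();

#pragma omp parallel for num_threads(threads)
  for (int t = 0; t < threads; ++t) {
    int *slice = base + t * n; // disjoint slice owned by iteration t
#pragma omp target map(tofrom : slice[0:n])
    for (int i = 0; i < n; ++i)
      slice[i] = t * n + i;
  }

  // Element k was written as t * n + i with k == t * n + i.
  for (int k = 0; k < threads * n; ++k)
    if (data[k] != k)
      return false;
  return true;
}
```

Calling `run_target_in_parallel(4, 8)` should return true whether the regions run on a device, fall back to the host, or are compiled without OpenMP. It only checks correctness; it does not show whether the regions actually overlap in time.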
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D70010/new/
https://reviews.llvm.org/D70010