[PATCH] D70010: [OpenMP][Offloading] Replaced default stream with an actual per-device unblocking stream in NVPTX implementation

Fri Nov 8 10:26:41 PST 2019

tianshilei1992 added a comment.

In D70010#1739061 <https://reviews.llvm.org/D70010#1739061>, @ABataev wrote:

> In D70010#1739049 <https://reviews.llvm.org/D70010#1739049>, @tianshilei1992 wrote:
>
> > In D70010#1738930 <https://reviews.llvm.org/D70010#1738930>, @ABataev wrote:
> >
> > > Also, the main question: how does it affect the existing execution model? What if we have a target region inside a parallel region; will they be executed asynchronously? We need some tests for this if we don't already have them.
> >
> >
> > According to https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf, using a non-default stream can improve performance. This is actually the first step toward using multiple streams, which I'm going to implement later.
>
>
> My question is different. Does it affect the execution of existing code in any way?

AFAIK, no. Currently we still have only one stream per device; it's just not the default stream. Kernels in a single stream are executed in order, so asynchronous (concurrent) execution requires multiple streams. I'll check whether the existing test cases cover this, and will write one if they don't.
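The in-order vs. concurrent distinction above can be illustrated without a GPU. The sketch below is a CPU-side analogy only, not the libomptarget NVPTX plugin code: `ToyStream`, `run_on_one_stream`, and `two_streams_can_overlap` are hypothetical names. It models a stream as a FIFO queue drained by a single worker thread, so tasks on one stream run strictly in order, while tasks on two different streams may overlap. On the CUDA side the comparable primitive would be a stream created with the non-blocking flag (for example `cuStreamCreate` with `CU_STREAM_NON_BLOCKING` in the driver API); whether the patch uses exactly that call is an assumption here.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy model of a stream: a FIFO of tasks drained by exactly one
// worker thread, so tasks enqueued on the same stream run strictly in order.
class ToyStream {
 public:
  ToyStream() : worker_([this] { run(); }) {}
  ~ToyStream() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();  // queued tasks are drained before the worker exits
  }
  void enqueue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
      ++pending_;
    }
    cv_.notify_one();
  }
  // Blocks until every task enqueued so far has finished
  // (loosely analogous to synchronizing a stream).
  void synchronize() {
    std::unique_lock<std::mutex> lock(mu_);
    done_cv_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested, nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // runs without holding the lock
      {
        std::lock_guard<std::mutex> lock(mu_);
        --pending_;
      }
      done_cv_.notify_all();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_, done_cv_;
  std::queue<std::function<void()>> tasks_;
  std::size_t pending_ = 0;
  bool stop_ = false;
  std::thread worker_;  // declared last, so it starts after the members above
};

// One stream: tasks finish in enqueue order ("kernels in a stream are
// executed in order").
std::vector<int> run_on_one_stream(int n) {
  std::vector<int> order;  // only touched by the worker until synchronize()
  ToyStream stream;
  for (int k = 0; k < n; ++k)
    stream.enqueue([&order, k] { order.push_back(k); });
  stream.synchronize();
  return order;
}

// Two streams: the task on `a` waits for a signal set by the task on `b`,
// which only completes if the two streams make progress concurrently. Put
// both tasks on one in-order stream (waiter first) and it would deadlock.
bool two_streams_can_overlap() {
  std::promise<void> signal;
  std::future<void> ready = signal.get_future();
  bool observed = false;
  {
    ToyStream a, b;
    a.enqueue([&] { ready.wait(); observed = true; });
    b.enqueue([&] { signal.set_value(); });
    a.synchronize();
    b.synchronize();
  }
  return observed;
}
```

With only one such stream per device, as described in this patch, offloaded kernels still execute one after another; concurrency only becomes possible once several streams are in use, which is the follow-up work mentioned above.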

Repository:
  rL LLVM

https://reviews.llvm.org/D70010