[Openmp-commits] [PATCH] D74145: [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sun Feb 9 22:34:40 PST 2020

jdoerfert accepted this revision.
jdoerfert added a comment.
This revision is now accepted and ready to land.

In D74145#1866382 <https://reviews.llvm.org/D74145#1866382>, @ye-luo wrote:

> I tested the patch. The stream of H2D, D2H and compute behaves asynchronously as expected.

I accept this pending D74258 <https://reviews.llvm.org/D74258> and the C++14 RFC. If they go through, the version of this patch that uses C++14 is fine.

We can discuss and change the number of streams afterwards as necessary (assuming we don't reach a consensus now).
This patch is a strict improvement, so we should work from here.

Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:246
+    // By default let's create 32 streams per device
+    EnvNumStreams = 32;
+    envStr = getenv("LIBOMPTARGET_NUM_STREAMS");
ye-luo wrote:
> tianshilei1992 wrote:
> > jdoerfert wrote:
> > > The hardware will cap the number internally anyway so we should go higher here. Maybe 256?
> > Sure
> I don't like this choice. The hardware limit is 32 which is preferred. Users can play with environment variable if they need more.
> On the nvprof, it is impossible to digest 256 streams from OpenMP plus other application streams.
@ye-luo Do you experience a downside to 256 streams?

There should not be a performance problem, and a higher default should help us stay forward and backward compatible.
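
For reference, the "default plus environment override" behavior under discussion can be sketched roughly as below. This is an illustration only, not the code in rtl.cpp: `parseNumStreams` and `getNumStreamsFromEnv` are hypothetical names, and the validation rules (fall back to the default on missing, malformed, or non-positive input) are assumptions. Only the variable name `LIBOMPTARGET_NUM_STREAMS` and the candidate defaults (32, 256) come from the review.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not from the patch): parse a stream count from the
// raw text of an environment variable. Falls back to DefaultNumStreams when
// the text is missing, malformed, non-positive, or out of int range.
int parseNumStreams(const char *EnvStr, int DefaultNumStreams) {
  assert(DefaultNumStreams > 0 && "default stream count must be positive");
  if (!EnvStr || *EnvStr == '\0')
    return DefaultNumStreams;

  errno = 0;
  char *End = nullptr;
  long Value = std::strtol(EnvStr, &End, 10);

  // Reject input with no digits, trailing characters, or overflow.
  if (End == EnvStr || *End != '\0' || errno == ERANGE)
    return DefaultNumStreams;
  // Reject values that are not positive or do not fit in an int.
  if (Value <= 0 || Value > INT_MAX)
    return DefaultNumStreams;
  return static_cast<int>(Value);
}

// Hypothetical wrapper: read LIBOMPTARGET_NUM_STREAMS from the environment.
int getNumStreamsFromEnv(int DefaultNumStreams) {
  return parseNumStreams(std::getenv("LIBOMPTARGET_NUM_STREAMS"),
                         DefaultNumStreams);
}
```

With a helper like this, the choice between 32 and 256 only changes the argument passed to `getNumStreamsFromEnv`; users who want a different count would still set the environment variable.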


