[Openmp-commits] [PATCH] D74145: [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently

Ye Luo via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sun Feb 9 21:00:43 PST 2020

ye-luo added a comment.

I tested the patch. The H2D, D2H, and compute streams behave asynchronously, as expected.

Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:246
+    // By default let's create 32 streams per device
+    EnvNumStreams = 32;
+    envStr = getenv("LIBOMPTARGET_NUM_STREAMS");
tianshilei1992 wrote:
> jdoerfert wrote:
> > The hardware will cap the number internally anyway so we should go higher here. Maybe 256?
> Sure
I don't like this choice. The hardware limit is 32, and that is the preferred value. Users can set the environment variable if they need more.
In nvprof, it is impossible to digest 256 streams from OpenMP plus other application streams.
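
To illustrate the behavior I have in mind, here is a minimal sketch, not the code from the patch. It keeps 32 as the default and lets LIBOMPTARGET_NUM_STREAMS override it. The helper names parseNumStreams and numStreamsFromEnv are hypothetical, and falling back to the default on an invalid value is my assumption rather than something the patch specifies.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not from the patch): parse a stream count from text.
// Returns defaultStreams when text is missing, non-numeric, non-positive,
// or out of range for int.
int parseNumStreams(const char *text, int defaultStreams) {
  assert(defaultStreams > 0 && "default stream count must be positive");
  if (text == nullptr || *text == '\0')
    return defaultStreams;

  errno = 0;
  char *end = nullptr;
  long value = std::strtol(text, &end, 10);

  // Reject overflow, trailing garbage, and values that cannot be a count.
  if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX)
    return defaultStreams;
  return static_cast<int>(value);
}

// Hypothetical wrapper: default to 32 streams per device unless the user
// sets LIBOMPTARGET_NUM_STREAMS to a valid positive integer.
int numStreamsFromEnv() {
  return parseNumStreams(std::getenv("LIBOMPTARGET_NUM_STREAMS"), 32);
}
```

With this shape, a user who really wants 256 streams sets LIBOMPTARGET_NUM_STREAMS=256, and everyone else gets 32 streams per device.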


