[Openmp-commits] [PATCH] D74145: [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently

Ye Luo via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Feb 10 00:41:47 PST 2020

ye-luo added inline comments.

Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:246
+    // By default let's create 32 streams per device
+    EnvNumStreams = 32;
+    envStr = getenv("LIBOMPTARGET_NUM_STREAMS");
jdoerfert wrote:
> ye-luo wrote:
> > tianshilei1992 wrote:
> > > jdoerfert wrote:
> > > > The hardware will cap the number internally anyway so we should go higher here. Maybe 256?
> > > Sure
> > I don't like this choice. The hardware limit is 32 which is preferred. Users can play with environment variable if they need more.
> > In the nvprof output, it is impossible to digest 256 streams from OpenMP plus other application streams.
> @ye-luo Do you experience a downside to 256 streams?
> There should not be a performance problem but it should help us to be future and backwards compatible. 
I don't have strong evidence of a performance impact. I thought more streams would cost the driver a bit more to monitor and to schedule work onto the hardware.
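
To make the option concrete, here is a minimal sketch (not the actual rtl.cpp code) of keeping 32 as the default while still letting users raise it through LIBOMPTARGET_NUM_STREAMS. The helper names parseNumStreams and numStreamsFromEnv are hypothetical, and falling back to the default on missing or invalid input is an assumption.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Hypothetical helper, not the code under review: returns the stream count
// requested by envStr, or defaultValue when envStr is null, non-numeric,
// non-positive, or too large to fit in an int.
int parseNumStreams(const char *envStr, int defaultValue) {
  assert(defaultValue > 0 && "default stream count must be positive");
  if (!envStr)
    return defaultValue;
  char *end = nullptr;
  long value = std::strtol(envStr, &end, 10);
  // Reject empty input, trailing garbage, and out-of-range values.
  if (end == envStr || *end != '\0' || value <= 0 || value > INT_MAX)
    return defaultValue;
  return static_cast<int>(value);
}

// Assumed policy: default to 32 streams per device unless the user
// overrides it through the environment variable.
int numStreamsFromEnv() {
  return parseNumStreams(std::getenv("LIBOMPTARGET_NUM_STREAMS"), 32);
}
```

With this shape, a user who wants more streams would run with something like LIBOMPTARGET_NUM_STREAMS=256, and everyone else keeps the smaller default.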
