[Openmp-commits] [PATCH] D74145: [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently
Ye Luo via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Feb 10 00:41:47 PST 2020
ye-luo added inline comments.
================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:246
+ // By default let's create 32 streams per device
+ EnvNumStreams = 32;
+ envStr = getenv("LIBOMPTARGET_NUM_STREAMS");
----------------
jdoerfert wrote:
> ye-luo wrote:
> > tianshilei1992 wrote:
> > > jdoerfert wrote:
> > > > The hardware will cap the number internally anyway so we should go higher here. Maybe 256?
> > > Sure
> > I don't like this choice. The hardware limit is 32, which is the preferred value. Users can play with the environment variable if they need more.
> > In nvprof, it is impossible to digest 256 streams from OpenMP plus other application streams.
> @ye-luo Do you experience a downside to 256 streams?
>
> There should not be a performance problem, but it should help us to be forward and backward compatible.
I don't have strong evidence of a performance impact. I thought more streams would cost the driver a bit more to monitor and to schedule work onto the hardware.
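
For reference, a minimal sketch of the default-plus-override behavior being discussed, assuming the default stays at 32. The helper names `parseNumStreams` and `getNumStreams` are hypothetical and the validation rules are my assumption; this is not the code in the patch.

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical helper (not from the patch): turn the value of an
// environment variable into a stream count. Falls back to Default when
// the value is unset, not a plain decimal integer, not positive, or too
// large to fit in an int.
int parseNumStreams(const char *EnvStr, int Default) {
  if (!EnvStr)
    return Default;
  char *End = nullptr;
  long Value = std::strtol(EnvStr, &End, 10);
  // Reject empty input, trailing garbage, and out-of-range values.
  if (End == EnvStr || *End != '\0' || Value <= 0 || Value > INT_MAX)
    return Default;
  return static_cast<int>(Value);
}

// Hypothetical wrapper: 32 streams per device unless the user overrides
// it through LIBOMPTARGET_NUM_STREAMS.
int getNumStreams() {
  return parseNumStreams(std::getenv("LIBOMPTARGET_NUM_STREAMS"), 32);
}
```

With something like this, a user who wants more streams would set `LIBOMPTARGET_NUM_STREAMS=256` in the environment, and everyone else keeps the smaller default.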
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D74145/new/
https://reviews.llvm.org/D74145