[Openmp-commits] [PATCH] D74145: [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently
Johannes Doerfert via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Sun Feb 9 22:34:40 PST 2020
jdoerfert accepted this revision.
jdoerfert added a comment.
This revision is now accepted and ready to land.
In D74145#1866382 <https://reviews.llvm.org/D74145#1866382>, @ye-luo wrote:
> I tested the patch. The stream of H2D, D2H and compute behaves asynchronously as expected.
I accept this pending D74258 <https://reviews.llvm.org/D74258> and the C++14 RFC. If they go through, the version of this patch that uses C++14 is fine.
We can discuss and modify the stream number afterwards as necessary (assuming we don't find a consensus now).
This patch is strictly positive so we should work from here.
================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:246
+ // By default let's create 32 streams per device
+ EnvNumStreams = 32;
+ envStr = getenv("LIBOMPTARGET_NUM_STREAMS");
----------------
ye-luo wrote:
> tianshilei1992 wrote:
> > jdoerfert wrote:
> > > The hardware will cap the number internally anyway so we should go higher here. Maybe 256?
> > Sure
> I don't like this choice. The hardware limit is 32, which is the preferred value. Users can adjust the environment variable if they need more.
> In nvprof, it is impossible to digest 256 streams from OpenMP plus other application streams.
@ye-luo Do you experience a downside to 256 streams?
There should not be a performance problem, and a higher default should help us stay forward and backward compatible.
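
For reference, the default-plus-override behaviour under discussion can be sketched as below. This is only an illustration, not the code in the patch: the helper names `parseNumStreams` and `numStreamsFromEnv` are hypothetical, and the validation rules (reject non-numeric, non-positive, or out-of-range values) are assumptions. Only the `LIBOMPTARGET_NUM_STREAMS` variable and the default of 32 come from the hunk above.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not from the patch): parse a stream-count string.
// Falls back to defaultValue when the text is null, empty, not a plain
// decimal integer, not positive, or too large for an int.
int parseNumStreams(const char *text, int defaultValue) {
  assert(defaultValue > 0 && "default stream count must be positive");
  if (text == nullptr || *text == '\0')
    return defaultValue;

  errno = 0;
  char *end = nullptr;
  long value = std::strtol(text, &end, 10);

  // Reject overflow, strings with no digits, and trailing garbage.
  if (errno != 0 || end == text || *end != '\0')
    return defaultValue;
  if (value <= 0 || value > INT_MAX)
    return defaultValue;
  return static_cast<int>(value);
}

// Hypothetical wrapper: read the override from the environment variable
// named in the patch, using 32 as the default shown in the hunk.
int numStreamsFromEnv(int defaultValue = 32) {
  return parseNumStreams(std::getenv("LIBOMPTARGET_NUM_STREAMS"),
                         defaultValue);
}
```

With a scheme like this, changing the default (32 versus 256) only affects users who leave the variable unset; anyone who needs a different count can still set it explicitly.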
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D74145/new/
https://reviews.llvm.org/D74145