[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior

Kevin Sala Penadés via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Nov 23 16:16:02 PST 2022


kevinsala added a comment.

>> How was this tested?
>
> We need a configuration for our regression tests. I thought we had one (or a patch) already. @kevinsala @jhuber6

For functional tests I've used the LLVM OpenMP target tests (`make check`) and miniQMC. For performance experiments, the miniQMC directly. Is there any test suite you use for functional/performance testing?



================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:829
+    if (Size == Capacity)
+      return Plugin::error("Stream is full");
+
----------------
jdoerfert wrote:
> Isn't this a problem? Shouldn't we wait instead? Same in other places. Let's mark it as TODO and address it in a follow up.
Yes, there is no problem waiting on it. Even so, this check will go away with the new version of dynamically sized streams.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1463
+        OMPX_StreamSize("LIBOMPTARGET_AMDGPU_STREAM_SIZE", 512),
+        OMPX_MaxAsyncCopySize("LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_SIZE",
+                              1 * 1024 * 1024), // 1MB
----------------
jdoerfert wrote:
> ye-luo wrote:
> > LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES?
> We need documentation for those in openmp/docs, next to the other env vars.
I'll add the documentation for these envars. They control the following aspects:

1) `LIBOMPTARGET_AMDGPU_QUEUE_SIZE`: The number of HSA packets that can be pushed into each HSA queue without waiting the driver to process them.
2) `LIBOMPTARGET_AMDGPU_STREAM_SIZE`: Number of asynchronous operations (e.g., kernel launches, memory transfers) that can be pushed into each of our streams without waiting on their finalization. With the upcoming patch implementing dynamically sized streams, this envar will be renamed and become a hint for the initial stream size.
3) `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_SIZE`: Up to this size, the memory copies (__tgt_rtl_submit_data, __tgt_rtl_retrieve_data) will be asynchronous operations appended to the corresponding stream. For larger transfer, it will become synchronous transfers. This can be seen `dataSubmitImpl` (1695).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138389/new/

https://reviews.llvm.org/D138389



More information about the Openmp-commits mailing list