[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior

Ye Luo via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Nov 25 10:08:06 PST 2022


ye-luo added inline comments.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1463
+        OMPX_StreamSize("LIBOMPTARGET_AMDGPU_STREAM_SIZE", 512),
+        OMPX_MaxAsyncCopySize("LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_SIZE",
+                              1 * 1024 * 1024), // 1MB
----------------
kevinsala wrote:
> jdoerfert wrote:
> > ye-luo wrote:
> > > LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES?
> > We need documentation for those in openmp/docs, next to the other env vars.
> I'll add the documentation for these envars. They control the following aspects:
> 
> 1) `LIBOMPTARGET_AMDGPU_QUEUE_SIZE`: The number of HSA packets that can be pushed into each HSA queue without waiting the driver to process them.
> 2) `LIBOMPTARGET_AMDGPU_STREAM_SIZE`: Number of asynchronous operations (e.g., kernel launches, memory transfers) that can be pushed into each of our streams without waiting on their finalization. With the upcoming patch implementing dynamically sized streams, this envar will be renamed and become a hint for the initial stream size.
> 3) `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_SIZE`: Up to this size, the memory copies (__tgt_rtl_submit_data, __tgt_rtl_retrieve_data) will be asynchronous operations appended to the corresponding stream. For larger transfer, it will become synchronous transfers. This can be seen `dataSubmitImpl` (1695).
Thanks for adding docs.
1. What I found in hsa doc is "Number of packets the queue is expected to hold". This is more understandable than your line. I think your description is true but that is a second level interpretation. I think both lines can be useful in the doc. LIBOMPTARGET_AMDGPU_QUEUE_SIZE is an insufficient name. LIBOMPTARGET_AMDGPU_HSA_QUEUE_SIZE is better IMO.
2. A STREAM may have many contents with different sizes. STREAM_SIZE is to vague. Better to have something like ASYNC_OP_DEPTH_PER_STREAM.
3. Still prefer COPY_BYTES.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138389/new/

https://reviews.llvm.org/D138389



More information about the Openmp-commits mailing list