[PATCH] D82496: [amdgpu] Add codegen support for HIP dynamic shared memory.

Thu Jun 25 08:00:50 PDT 2020

hliao added a comment.

In D82496#2112763 <https://reviews.llvm.org/D82496#2112763>, @t-tye wrote:

> My understanding is this feature is equivalent to the OpenCL dynamic group segment allocation. The runtime would presumably implement it in a similar way.
>
> So the HIP runtime must take the static LDS size, round up to the alignment requirement of the dynamic allocation (OpenCL just uses the maximally aligned OpenCL data type), then add the size of the dynamic LDS. The AQL packet group segment field is set to the total LDS size.
>
> In OpenCL there can be multiple kernel arguments, and the LDS address is passed to each. But for HIP there is only one dynamic area denoted by this weird extern. How is the dynamic LDS storage accessed? Is the address passed as an implicit kernel argument, or does the compiler implicitly use the aligned static LDS size?

This's the point. To keep compatible with CUDA, multiple dynamically sized arrays in a single kernel must declare a single `extern` unsized array and uses the address to divide it into multiple arrays by developers themselves. It's in fact quite similar to the local memory parameter in OpenCL. Thus, all `extern` unsized `__shared__` arrays are mapped onto the same address. The developer has the responsibility to divide it into multiple ones. That's in fact is same across HCC and HIP-Clang except that we want to maximize the compatibility and avoid changing source code too much.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82496/new/

https://reviews.llvm.org/D82496