[PATCH] D73979: [HIP] Allow non-incomplete array type for extern shared var

Tue Feb 4 11:17:14 PST 2020

yaxunl added a comment.

In D73979#1857485 <https://reviews.llvm.org/D73979#1857485>, @tra wrote:

> A better description for the change would be helpful.
>
> For what it's worth, NVCC accepts 'all of the above' for extern __shared__. https://godbolt.org/z/8cBsXv Whether that makes sense or not is another question.
>
> IIRC, `extern __shared__` can effectively only be a pointer to an opaque type, because at compile time we know neither the address nor the size of the memory it will point to. I believe that was the reason why we've limited accepted types to size-less arrays only. Otherwise, we end up with situations where on the source level we declare an object (i.e. nobody expects a pointer to be involved), but end up failing with an invalid memory access because no memory was ever allocated for it. An incomplete array seems to be a reasonable trade-off.
>
> What's your proposed use case for this change? Does `extern __shared__` work in HIP the same way it works in CUDA?

Shared memory is divided into two parts: a static part and a dynamic part. The AMDGPU backend calculates the static usage of all shared vars for a kernel and puts that info in the code object. When the runtime launches a kernel, it checks the static shared memory required by the kernel, adds the dynamic shared memory requirement specified in the triple-chevron launch syntax, and allocates the sum of the static and dynamic requirements.

The AMDGPU backend assumes that static shared memory occupies the lower addresses and that dynamic shared memory starts at the boundary between the static part and the dynamic part.

The AMDGPU backend lowers the address of every external uninitialized shared var to the total size of all static shared vars, i.e. the starting address of the dynamic part of shared memory.
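
The layout arithmetic described above can be sketched on the host in plain C++. This is only a rough model under stated assumptions, not the backend's actual code: `SharedLayout`, `computeLayout`, and the byte sizes are hypothetical names and values, and alignment padding between static vars is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the shared memory layout.
// Alignment padding between static vars is ignored for simplicity.
struct SharedLayout {
  std::size_t staticSize;   // bytes used by all static shared vars
  std::size_t dynamicBase;  // offset assigned to every extern shared var
  std::size_t totalSize;    // bytes the runtime allocates at launch
};

SharedLayout computeLayout(const std::vector<std::size_t> &staticVarSizes,
                           std::size_t dynamicBytes) {
  // Static part: sum of all static shared vars, placed at the low addresses.
  std::size_t staticSize = 0;
  for (std::size_t s : staticVarSizes)
    staticSize += s;

  // Guard against overflow before adding the dynamic requirement.
  assert(dynamicBytes <= SIZE_MAX - staticSize);

  // The dynamic part starts where the static part ends; the allocation is
  // the sum of both requirements.
  return {staticSize, staticSize, staticSize + dynamicBytes};
}
```

For example, with two static vars of 256 and 64 bytes and a dynamic request of 1024 bytes, this model places every extern shared var at offset 320 and allocates 1344 bytes in total.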

Based on the CUDA usage of extern shared vars (https://devblogs.nvidia.com/using-shared-memory-cuda-cc/), it seems CUDA also assumes all extern shared vars have the same address, so HIP and CUDA behave similarly.
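
A consequence of every extern shared var having the same address is that a kernel needing several dynamic arrays has to carve them out of one region by offset, as the linked blog post describes. The following is a host-side C++ sketch of that idea only, not device code: `DynamicShared` and `at` are hypothetical names, and a byte vector stands in for the dynamic part of shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for the dynamic part of shared memory.
// On the device, every extern shared var would alias offset 0 of this region.
struct DynamicShared {
  std::vector<unsigned char> bytes;

  explicit DynamicShared(std::size_t n) : bytes(n) {}

  // Return a typed view of `count` elements starting `byteOffset` bytes
  // into the region. The caller is responsible for choosing offsets that
  // do not overlap unless aliasing is intended.
  template <typename T>
  T *at(std::size_t byteOffset, std::size_t count) {
    assert(byteOffset % alignof(T) == 0);
    assert(byteOffset + count * sizeof(T) <= bytes.size());
    return reinterpret_cast<T *>(bytes.data() + byteOffset);
  }
};
```

Two views requested at offset 0 alias each other, which mirrors two extern shared declarations in one kernel. A second array is obtained by asking for a view at an offset past the first one, e.g. `smem.at<float>(4 * sizeof(int), 4)` after four `int` elements.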

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73979/new/

https://reviews.llvm.org/D73979