[PATCH] D73979: [HIP] Allow non-incomplete array type for extern shared var

Tue Feb 4 12:12:55 PST 2020

tra added a comment.

In D73979#1857536 <https://reviews.llvm.org/D73979#1857536>, @yaxunl wrote:

> Based on CUDA usage of extern shared var (https://devblogs.nvidia.com/using-shared-memory-cuda-cc/), it seems CUDA also assumes all extern shared vars have the same address, therefore HIP and CUDA have similar behavior.

Yes. Because of the `CUDA also assumes all extern shared vars have the same address` I think that not allowing additional types for `extern __shared__` makes sense for HIP, too. I'd rather not give users more ways to do a wrong thing.

Can you elaborate on why you want to allow this feature? While it would be convenient for someone who uses exactly *one* such extern object, in practice the most common use case is that users declare a single `extern __shared__` array to serve as the memory pool and then manually allocate chunks within it and assign the addresses to appropriately typed pointers. I guess they could define an `extern __shared__ struct` with fields representing the objects, but that seems sort of pointless considering that the only reason to use `extern __shared__` is to allocate shared memory dynamically.

In general, the concept of `extern __shared__` with *all* such extern items occupying the same space is broken by design. It's not composable (every function using one needs to coordinate with every other function doig the same). It introduces failure modes not obvious from the source code (access an object, fail with invalid memory access). It does not fit the conventional meaning of what `extern something` means in C++ (different objects have different addresses). IMO, it should not have existed and the shared memory/pointer should've been exposed via explicit API. I.e. CUDA could've used the same mechanism which provides threads with threadIdx and blockIdx.

As things stand right now,  `extern __shared__` is something I want gone, not added more features to. AFAICT, the limitations clang places on it right now have not been an issue for the CUDA code we compile.

Is there a pressing need for this feature for HIP? Perhaps it would make more sense to introduce a more sensible API and port existing HIP code to use it.

WDYT?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73979/new/

https://reviews.llvm.org/D73979