[PATCH] D73979: [HIP] Allow non-incomplete array type for extern shared var

Tue Feb 4 12:51:24 PST 2020

yaxunl added a comment.

In D73979#1857664 <https://reviews.llvm.org/D73979#1857664>, @tra wrote:

> In D73979#1857536 <https://reviews.llvm.org/D73979#1857536>, @yaxunl wrote:
>
> > Based on CUDA usage of extern shared var (https://devblogs.nvidia.com/using-shared-memory-cuda-cc/), it seems CUDA also assumes all extern shared vars have the same address, therefore HIP and CUDA have similar behavior.
>
>
> Yes. Because of the `CUDA also assumes all extern shared vars have the same address` I think that not allowing additional types for `extern __shared__` makes sense for HIP, too. I'd rather not give users more ways to do a wrong thing.
>
> Can you elaborate on why you want to allow this feature? While it would be convenient for someone who uses exactly *one* such extern object, in practice the most common use case is that users declare a single `extern __shared__` array to serve as the memory pool and then manually allocate chunks within it and assign the addresses to appropriately typed pointers. I guess they could define an `extern __shared__ struct` with fields representing the objects, but that seems sort of pointless considering that the only reason to use `extern __shared__` is to allocate shared memory dynamically.
>
> In general, the concept of `extern __shared__` with *all* such extern items occupying the same space is broken by design. It's not composable (every function using one needs to coordinate with every other function doig the same). It introduces failure modes not obvious from the source code (access an object, fail with invalid memory access). It does not fit the conventional meaning of what `extern something` means in C++ (different objects have different addresses). IMO, it should not have existed and the shared memory/pointer should've been exposed via explicit API. I.e. CUDA could've used the same mechanism which provides threads with threadIdx and blockIdx.
>
> As things stand right now,  `extern __shared__` is something I want gone, not added more features to. AFAICT, the limitations clang places on it right now have not been an issue for the CUDA code we compile.
>
> Is there a pressing need for this feature for HIP? Perhaps it would make more sense to introduce a more sensible API and port existing HIP code to use it.
>
> WDYT?

All extern shared vars are sharing the same address, however, they may be used as different types in different functions.

For example,

  __device__ int foo() {
    extern __shared__ int a;
    for (...) a+=...;
    return a;
  }

  __device__ double bar(int x) {
    extern __shared__ double b[10];
    for(...) b[x]+=...;
    return b[0];
  }

  __global__ void k() {
    foo();
    //...
    bar();
  }

In one function foo, users need to use the shared memory as an int. In another function, users need to use the shared memory as a double array. Users just need to make sure they request sufficient dynamic shared memory in triple chevron to be greater than the max dynamic shared memory usage. Users do not need to pass values in extern shared var between functions. They just treat it as an uninitialized variable. Forbidding different types for extern shared variable does not add any benefit, just forcing users to work around the limitation and resulting in less readable code.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73979/new/

https://reviews.llvm.org/D73979