[PATCH] D89525: [amdgpu] Enhance AMDGPU AA.

Sat Oct 17 09:59:49 PDT 2020

t-tye added a comment.

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89525/new/

https://reviews.llvm.org/D89525