[PATCH] D89525: [amdgpu] Enhance AMDGPU AA.

Sat Oct 17 16:00:34 PDT 2020

t-tye added a comment.

In D89525#2337059 <https://reviews.llvm.org/D89525#2337059>, @rampitec wrote:

> In D89525#2337058 <https://reviews.llvm.org/D89525#2337058>, @t-tye wrote:
>
>>> In D89525#2336864 <https://reviews.llvm.org/D89525#2336864>, @t-tye wrote:
>>>
>>>> LDS and SCRATCH both behave more like TLS. The allocations come into existence when a thread (or group of threads) is created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.
>>>>
>>>> Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restrictions are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.
>>>>
>>>> So how is TLS handled? It seems a TLS address cannot be a compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.
>>>
>>> So you are saying that it is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!
>>
>> No, I am not quite saying that, as some languages are not clear. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.
>>
>> Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferencing when they reference the non-global address space are the same. There can be rules that allow a generic pointer to be known to only point to one address space, in which case it can be treated the same as if it were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.
>>
>> Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.
>
> To me deciding point here was that LDS is not actually allocated on host, but instead requested to be allocated at dispatch. If so then host cannot get an actual pointer to it and thus cannot convert it to a generic pointer and pass to a kernel.

The LDS/SCRATCH is actually allocated on individual wave/group-of-waves creation, which is an even smaller granularity than dispatch. The language defines that even if a thread has a valid LDS/SCRATCH address for some other wave, it is UB to access it. So the host (or some other wave) can get a pointer to LDS/SCRATCH, and can pass it to another wave, but only the wave that "owns" the allocation can access it. What the language may say is that passing a generic pointer into a kernel is not allowed if it points to LDS. I do not believe CUDA explicitly states this, but some compilers appear to implement this.

Also note that an LDS pointer passed into a kernel dispatch A cannot be a legal LDS address for that dispatch, since until the waves of that dispatch are created, they have no LDS. So if a generic pointer is passed in as a kernel argument then the best it can be is the LDS for some other already created wave of another dispatch B. The waves of dispatch A cannot access that LDS pointer as they do not "own" it. The best they can do is pass it to the "owning" waves of dispatch B. That cannot be done via kernel arguments as the waves are already executing. So even if the language allows kernel arguments to be generic pointers pointing to LDS, it is safe for single-thread-AA to assume they cannot alias this wave's LDS, since such pointers are UB to be accessed by this wave.
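
To make the rule above concrete, here is a minimal C++ sketch of the single-thread alias query being argued for. It is not the code in this patch and does not use LLVM's AA interfaces: `AddrSpace`, `PtrInfo`, `AliasAnswer` and `aliasQuery` are hypothetical names made up for illustration, and the sketch assumes the language permits the conclusion (which, as noted earlier in the thread, is language specific).

```cpp
// Hypothetical address-space tags for illustration only; these are not
// LLVM's AMDGPU address-space numbers.
enum class AddrSpace { Global, LDS, Scratch, Generic };

// Hypothetical summary of what is known about a pointer value.
struct PtrInfo {
  AddrSpace space;   // address space the pointer is typed with
  bool isKernelArg;  // true if the value arrived as a kernel argument
};

enum class AliasAnswer { NoAlias, MayAlias };

// LDS and SCRATCH are allocated when a wave (or group of waves) is created
// and may only be dereferenced by the wave(s) that own the allocation.
static bool isWaveOwned(AddrSpace s) {
  return s == AddrSpace::LDS || s == AddrSpace::Scratch;
}

// A generic kernel argument was produced before the waves of this dispatch
// existed, so at best it names another wave's LDS/SCRATCH, which this wave
// may not dereference.
static bool isGenericKernelArg(const PtrInfo &p) {
  return p.space == AddrSpace::Generic && p.isKernelArg;
}

// Single-thread view: answer NoAlias only for the one case argued above,
// and stay conservative (MayAlias) for everything else.
AliasAnswer aliasQuery(const PtrInfo &a, const PtrInfo &b) {
  if ((isGenericKernelArg(a) && isWaveOwned(b.space)) ||
      (isGenericKernelArg(b) && isWaveOwned(a.space)))
    return AliasAnswer::NoAlias;
  return AliasAnswer::MayAlias;
}
```

With this sketch, a generic kernel argument queried against an LDS or SCRATCH pointer yields `NoAlias`, while a generic pointer that is not a kernel argument (for example one loaded from global memory after the wave started) still yields `MayAlias`, because an owning wave could have stored its own LDS address there.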

> Theoretically one can forge a generic pointer which will point to a specific LDS location after it is allocated, but I believe taking a pointer to unallocated memory is UB by any language standard.

The language models state that addresses must reference allocated objects (or one past the end) that are created, and it is UB to access any pointer that does not reference an allocated object. Forging or type punning (except for char in C) is UB.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89525/new/

https://reviews.llvm.org/D89525