[PATCH] D89525: [amdgpu] Enhance AMDGPU AA.

Sat Oct 17 13:09:23 PDT 2020

rampitec added a comment.

In D89525#2336479 <https://reviews.llvm.org/D89525#2336479>, @hliao wrote:

> In D89525#2336008 <https://reviews.llvm.org/D89525#2336008>, @rampitec wrote:
>
>> In D89525#2336002 <https://reviews.llvm.org/D89525#2336002>, @yaxunl wrote:
>>
>>> In D89525#2335991 <https://reviews.llvm.org/D89525#2335991>, @rampitec wrote:
>>>
>>>> In D89525#2335989 <https://reviews.llvm.org/D89525#2335989>, @yaxunl wrote:
>>>>
>>>>> In D89525#2335972 <https://reviews.llvm.org/D89525#2335972>, @rampitec wrote:
>>>>>
>>>>>> In D89525#2335949 <https://reviews.llvm.org/D89525#2335949>, @yaxunl wrote:
>>>>>>
>>>>>>> I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.
>>>>>>>
>>>>>>> However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
>>>>>>>
>>>>>>>   __shared__ int a;
>>>>>>>   
>>>>>>>   __constant__ int *b = &a;
>>>>>>>
>>>>>>> Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee
>>>>>>>
>>>>>>> I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
>>>>>>
>>>>>> But you are not checking for a constant pointer here!
>>>>>
>>>>> In HIP `__constant__` is a variable attribute, not the address space of the pointee. `__constant__ int *` means a pointer itself in constant address space and pointing to generic/flat address space.
>>>>
>>>> Where do you check for this specifically in this block:
>>>>
>>>>   } else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
>>>>      const Function *F = Arg->getParent();
>>>>      switch (F->getCallingConv()) {
>>>>      case CallingConv::AMDGPU_KERNEL:
>>>>        // In the kernel function, kernel arguments won't alias to (local)
>>>>        // variables in shared or private address space.
>>>>        return NoAlias;
>>>
>>> I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.
>>
>> But not in OpenCL.
>
> For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

OK, that makes sense. Now we only need to make sure we never compile anything but HIP and OpenCL <= 2.0. For example that we will not support fortran maybe? And never extend any of the languages to allow different address space arguments.

My point here is that it is language specific, but there is nothing language specific on the AA implementation. A possible solution is to define some attributes or metadata saying that a certain situation may not ever happen and then check it in AA. That property has to ve set by the language though as AA may not know the source semantics.

In D89525#2336864 <https://reviews.llvm.org/D89525#2336864>, @t-tye wrote:

> LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.
>
> Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.
>
> So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer and a kernel argument which is pointer to LDS? OK, thanks!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89525/new/

https://reviews.llvm.org/D89525