[PATCH] D68578: [HIP] Fix device stub name

Wed Nov 6 14:56:45 PST 2019

tra added a comment.

Apologies for the delay with my response.

In D68578#1700819 <https://reviews.llvm.org/D68578#1700819>, @t-tye wrote:

> In D68578#1700652 <https://reviews.llvm.org/D68578#1700652>, @tra wrote:
>
> > In D68578#1698864 <https://reviews.llvm.org/D68578#1698864>, @t-tye wrote:
> >
> > > From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.
> >
> >
> > What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.
>
>
> We want to support a heterogeneous gdb debugger for a single source programming language. We would like to follow the same conventions used by compilers that implement other languages supported by gdb. The debugger can use symbols to find functions. It supports unmangling them and using the unmangled name to indicate the source language function it corresponds to. We would like this to remain true. The stub is not the kernel function, it is a helper function that will launch the kernel. In many ways it is acting like other trampolines. Therefore, it should be named as a internal helper function. The debugger can chose what it wants to do with it, but it does not want to be confused into thinking it actually IS the kernel function. If the user sets a breakpoint in the code of the kernel function then that breakpoint should be hit by every instance of the kernel that is created by the dispatch. It should not be hit by the code that is initiatig the dispatch. If that is what the user wanted they would set a breakpoint at the statement that performs the dispatch launch.

> Whether the kernel is present in the CPU or GPU code is s separate concept. If it is present in both, then both would have the same symbol as they are both implementing the kernel. The debugger would set a breakpoint in both as from a language execution model poit of view if either piece of code executes it corresponds to the same source language kernel.

Thank you for the details.

> 
> 
>> 
>> 
>>> It the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breeakpoints in all the inlined places so the user can tstill think of the program debugging in terms of the source language semantics.
>> 
>> OK. This sounds like `__host__`/`__device__` function overloads and what you're saying does make sense for that.
> 
> Right. Well its not really and overload, not a request to have instances of the kernel avaiable for either the CPU or GPU to execute. They are the same function, not different overloads.

In case of cuda they may be overloads -- there may be two functions with identical signatures (modulo `__host__/__device__` attributes) or multiple functions with the same names but different signatures with different `__host__/__device__` attributes. It does not change things in principle. I'm just pointing out that CUDA (and thus HIP) as implemented in clang uses target attributes as another dimension in the space of functions with the same name.

>>> The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.
>> 
>> Do I understand it correctly that giving the stub distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e. it's essentially a work around the fact that debugger may not have convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to prove this ability as it sounds quite useful. I.e I may want to set breakpoint on all inlined host/device functions, but only on device side. That would be handy.
> 
> It is not really a work around. It is making the symbols reflect the reality of the source language program. The debugger can then simply trust that information and use it as gdb does for other languages.

It still seems to boil down to "the stub should not get in the way of debugger accessing the function itself", but I see your point and agree that it would be useful if the stub could be an entity separate from the function itself.

Now we need to figure out what would be the best way to implement it.

Clang uses the real function in AST to generate the IR for the stub and, because of that, the stub ends up using function's name.
Actually, the situation is a bit worse than that. Clang implicitly relies on `__host__` and `__device__` entities not being codegen'ed at the same time, so we don't have to care about name conflicts.
Your description above indicates that the assumption is somewhat optimistic and that's what really causes the issue here.

I think @yaxunl's suggestion that we may need different FuncDecl's would be a good way forward.
I suspect we may already have places where clang deals with compiler-generated functions, so we should have existing examples of how it could be done.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68578/new/

https://reviews.llvm.org/D68578