[PATCH] D68578: [HIP] Fix device stub name

Tue Oct 8 16:36:59 PDT 2019

tra added a comment.

In D68578#1698864 <https://reviews.llvm.org/D68578#1698864>, @t-tye wrote:

> From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.

What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.

> It the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breeakpoints in all the inlined places so the user can tstill think of the program debugging in terms of the source language semantics.

OK. This sounds like `__host__`/`__device__` function overloads and what you're saying does make sense for that.

> In contrast, the stub is effectively part of the implementation of actually launching the device function. It should have a distinct name.

I'm not sure how the requirement of distinct name follows from the fact that the stub is the host-side part of the device-side kernel? To me it looks like an argument for them to have the same name so it's clear that they are both part of the same function as written in the source.

The don't have to be different. CUDA (and HIP) does not allow overloading of kernels, so the stub and the kernel can have identical names as in the example of `__host__` and `__device__` overloads you've described above, only now it's `__host__` stub + `__global__` kernel itself, instead of two user-implemented functions. Debugger, of course, will need to know about that to pick the stub or kernel as the breakpoint location, but that appears doable.

> The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e. it's essentially a work around the fact that debugger may not have convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to prove this ability as it sounds quite useful. I.e I may want to set breakpoint on all inlined host/device functions, but only on device side. That would be handy.

What happens if the stub and the kernel do have identical names?
My understanding, based on your comments above is that debugger does know about host and device 'spaces' and that it can find pointers to both host and device functions and set appropriate breakpoints for both. In this case it would normally attempt to set breakpoint on both the stub and the kernel as it would in case of `__host__`/`__device__` overloads you've described above. In case of stub/kernel, we would want the breakpoint set only on the kernel itself. Given that debugger does have ability to tell host and device functions/symbols apart, the difficulty is really in being able to tell a real host function from the stub, so we can skip it.

Is that indeed what we want/need? Is there something else?

Does debugger know that device-side function is a kernel? In case of CUDA, the kernels are distinct from regular device-side functions. I don't know whether that's the case for AMDGPU.
If debugger can tell that particular device function is a kernel, that can be used to infer that the matching host-side symbol is a stub and skip setting a breakpoint on it.
If that does not work, debugger presumably has access to the mangled symbols for the potential breakpoint locations. The stub currently has distinct `.stub` suffix. This can also be used to tell it apart from a regular `__host__` function.

I do not see how changing the source-level name for the stub is going to change things in principle. It's just yet another way to disambiguate a real `__host__` function from a `host` stub we generate for the kernel.
Is there anything else about the stubs that requires them to have a name different from the kernel?

> I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it.

Currently the mangled name has .stub suffix which is discarded during unmangling, so unmangled names for the stub and the kernel end up being identical. I'm trying to figure out why is it a problem to be fixed in the compiler.

> My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

This patch proposes changing the source-level name for the stub. Unfortunately the way it attempt to implement it is by doing the renaming during mangling phase itself. This appears to be the wrong place to change source-level name. 
Before figuring out what would be the right thing to do here, I want to understand why we're doing it. I appreciate your description of what drives this requirement. I think I have petter idea of it now, but I still have some questions. Please bear with me.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68578/new/

https://reviews.llvm.org/D68578