[PATCH] D86376: [HIP] Improve kernel launching latency

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Aug 24 13:40:22 PDT 2020


tra added a comment.

I'm OK with how the patch is implemented.
I'm still on the fence regarding whether it should be implemented.

In D86376#2234458 <https://reviews.llvm.org/D86376#2234458>, @yaxunl wrote:

> `__hipPushConfiguration/__hipPopConfiguration' and kernel stub can cause 40 ns overhead, whereas we have requests to squeeze any overhead in kernel launching latency.

That's about the same as 1 cache miss. I'm willing to bet that it will be lost in the noise. Are there any real world benchmarks where it makes a difference?
Are those requests driven by a specific use case? Not all requests (even well intentioned ones) are worth implementing. 
This patch appears to be somewhere in the gray area to me. My prior experience with CUDA suggests that it will make little to no difference. On the other hand, AMD GPUs may be different enough to prove me wrong. Without specific evidence, I still can't tell what's the case here.

>> One side effect of this patch is that there will be no convenient way to set host-side breakpoint on kernel launch.
>> Another will be that examining call stack will become somewhat confusing as the arguments passed to the kernel as written in the source code will not match those observed in the stack trace. I guess preserving the appearance of normal function calls was the reason for the split  config setup/kernel launch in CUDA.  I'd say it's still useful to have as CUDA-specific debugger is not always available and one must use regular gdb on CUDA apps now and then.
>
> Eliminating kernel stub does not affect debugability negatively. At least this is true for HIP debugger. Actually our debugger team intentionally requests to eliminate any debug information for the kernel stub so that it will not confuse the debugger with the real kernel. This is because the kernel stub is an artificial function for launching the kernel, not the real kernel which is in device binary. For HIP debugger (rocmgdb), when the user set break point on a kernel, it will break on the real kernel in device binary, and the call stack are displayed correctly. The arguments to the real kernel are not lost, since the real kernel is a real function in device binary.

You appear to assume debuggability with HIP-aware debugger. That part I'm not particularly concerned about as I assume that it will be tested on AMD's side.
I was mostly concerned about debuggability with the ordinary gdb. Imagine someone having to debug a TF app they've got somewhere. The end user may not even have HIP tools installed. It would be useful to be able to debug until the point where control is passed to the GPU. The patch will likely have a minor, but still negative impact on that.

I guess one should still be able to set a breakpoint using the `file:line number`. If you could verify that it still works with gdb, that would be a reasonable workaround. I think we still need to have some way to set a breakpoint on the kernel launch site (I think it should still work) and on the kernel entry.

So, we have a trade-off of minor performance gain vs a minor debuggability regression. I don't have strong opinions which is the best way to go. By default, with no demonstrated benefit, I'd err on the side of not changing things.

> Another motivation for eliminating kernel stub is to be able to emit a symbol with the same mangled name as a kernel as a global variable instead of a function. Since we need such symbols to be able to launch kernels with mangled name in a C++ program. If we use kernel stub as the symbol, we cannot use the original mangled kernel name since our debugger does not allow that.

Is eliminating the host-side stub the goal, or just a coincidental side-effect? I.e. if it's something you *need* to do, then the discussion about minor performance gain becomes rather irrelevant and we should weigh 'improvements in HIP debugging' vs 'regression in host-only debugging' instead.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86376/new/

https://reviews.llvm.org/D86376



More information about the cfe-commits mailing list