[llvm] [Offload] Move RPC server handling to a dedicated thread (PR #112988)

Joseph Huber via llvm-commits llvm-commits at lists.llvm.org
Sun Oct 20 12:21:29 PDT 2024


jhuber6 wrote:

> IIUC we check whether `cuStreamQuery` returns `CUDA_ERROR_NOT_READY`. If so, we can still handle the RPC-related logic. `CUDA_SUCCESS` means it's done, so the RPC-related machinery can shut down. Anything else indicates an error. I don't see anything wrong/unsound with this logic. The query is non-blocking. Also, since we basically use one stream for each target region or target task, alternating between checking the stream and running the RPC logic here is actually a nice solution.

The CUDA Driver API basically says that you cannot rely on a call to `cuLaunchKernel` returning before the kernel has finished; it may block. Hence, if the kernel launch blocks the thread that is supposed to check the RPC server, the program will deadlock.
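
Roughly, the pre-patch flow amounts to something like the sketch below (simplified and hypothetical, not the actual plugin code; `launch_and_poll` and `handle_rpc_server` are stand-in names):
```c
#include <cuda.h>
#include <stddef.h>

extern void handle_rpc_server(void); // Stand-in for the real RPC handling call.

// Simplified, hypothetical sketch of the single-threaded flow. If
// cuLaunchKernel itself blocks until the kernel completes (e.g. with
// CUDA_LAUNCH_BLOCKING=1), we never reach the polling loop, so the RPC
// server is never serviced and a kernel waiting on it deadlocks.
CUresult launch_and_poll(CUfunction func, CUstream stream) {
  CUresult res = cuLaunchKernel(func, /*gridDim=*/1, 1, 1, /*blockDim=*/1, 1, 1,
                                /*sharedMemBytes=*/0, stream,
                                /*kernelParams=*/NULL, /*extra=*/NULL);
  if (res != CUDA_SUCCESS)
    return res;

  // Alternate between checking the stream and servicing the RPC server.
  while ((res = cuStreamQuery(stream)) == CUDA_ERROR_NOT_READY)
    handle_rpc_server();

  return res; // CUDA_SUCCESS once the kernel has finished, otherwise an error.
}
```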

With the current logic, the following example will deadlock forever.
```c
#include <stdio.h>

int main() {
#pragma omp target
    puts("Hello");
}
```
```console
$ clang input.c -fopenmp --offload-arch=sm_89
$ env CUDA_LAUNCH_BLOCKING=1 ./a.out 
...
```
With this patch it works fine.
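
Conceptually, running the RPC server on its own thread decouples it from whatever the launching thread is doing. A minimal sketch of that idea (pthread-based, with hypothetical names; the actual patch is more involved):
```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

extern void handle_rpc_server(void); // Stand-in for the real RPC handling call.

static atomic_bool rpc_running;
static pthread_t rpc_thread;

// The dedicated thread keeps servicing RPC requests regardless of whether
// the launching thread is blocked inside cuLaunchKernel, so a kernel
// waiting on the RPC server can still make progress.
static void *rpc_server_loop(void *arg) {
  (void)arg;
  while (atomic_load(&rpc_running))
    handle_rpc_server();
  return NULL;
}

static void start_rpc_server(void) {
  atomic_store(&rpc_running, true);
  pthread_create(&rpc_thread, NULL, rpc_server_loop, NULL);
}

static void stop_rpc_server(void) {
  atomic_store(&rpc_running, false);
  pthread_join(rpc_thread, NULL);
}
```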

https://github.com/llvm/llvm-project/pull/112988

