[llvm] [Offload] Move RPC server handling to a dedicated thread (PR #112988)

Sun Oct 20 11:34:55 PDT 2024

shiltian wrote:

> > I thought we were just taking advantage of the fact that once we finish launching the kernel, the thread (whether it is a helper thread or a regular thread) is waiting there anyway?
> 
> AFAIK, doing this is always unsound.
> 
> ```
> // Spin until all work queued on Stream has completed.
> while (cuStreamQuery(Stream) == CUDA_ERROR_NOT_READY)
>   ;
> ```
> 
> With the current implementation, you can set `CUDA_LAUNCH_BLOCKING=1` and it will deadlock. This is also indicated by the fact that I needed to add sleeps in other, seemingly random places. If it's a truly asynchronous launch, there will be no helper thread, since presumably we want this API to be usable by non-OpenMP users someday.

IIUC, we check whether `cuStreamQuery` returns `CUDA_ERROR_NOT_READY`. If it does, the kernel is still running and we can keep handling RPC requests. `CUDA_SUCCESS` means the work on the stream is done, so RPC handling for it can shut down. Any other return value indicates an error. I don't see anything wrong or unsound with this logic, since the query is non-blocking. Also, since we basically use one stream per target region or target task, alternating between checking the stream and handling RPC here is actually a nice solution.

https://github.com/llvm/llvm-project/pull/112988