[llvm] [Offload] Move RPC server handling to a dedicated thread (PR #112988)

Sun Oct 20 19:42:00 PDT 2024

================
@@ -1294,10 +1275,26 @@ Error CUDAKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
                     reinterpret_cast<void *>(&LaunchParams.Size),
                     CU_LAUNCH_PARAM_END};
 
+  // If we are running an RPC server we want to wake up the server thread
+  // whenever there is a kernel running and let it sleep otherwise.
+  if (GenericDevice.getRPCServer())
+    GenericDevice.Plugin.getRPCServer().Thread->notify();
+
   CUresult Res = cuLaunchKernel(Func, NumBlocks, /*gridDimY=*/1,
                                 /*gridDimZ=*/1, NumThreads,
                                 /*blockDimY=*/1, /*blockDimZ=*/1,
                                 MaxDynCGroupMem, Stream, nullptr, Config);
+
+  // Register a callback to indicate when the kernel is complete.
+  if (GenericDevice.getRPCServer())
+    cuStreamAddCallback(
----------------
shiltian wrote:

I'm personally not a fan of this function (`cuStreamAddCallback`), because it is slated for deprecation. I generally don't think it is a good idea to add new code that relies on features that will soon be deprecated. I'm not sure what CUDA's replacement API for this function is.
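For reference, the CUDA Driver API documentation notes that `cuStreamAddCallback` is slated for eventual deprecation. It suggests `cuLaunchHostFunc(CUstream, CUhostFn, void *)` instead when the callback does not need to run after a device error. The sketch below is a hypothetical, CUDA-free model of the wake/sleep scheme described in the diff. Only `notify()` appears in the diff. `ServerThread`, `finish()`, `polls()`, `users()` and `onKernelComplete` are illustrative names, not the PR's implementation. The completion hook has the `void (*)(void *)` shape of `CUhostFn`, so in principle it could be enqueued with `cuLaunchHostFunc` rather than `cuStreamAddCallback`.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical model of the plugin's RPC server thread: it keeps polling
// while at least one kernel is in flight and blocks on a condition
// variable otherwise.
class ServerThread {
public:
  ServerThread() : Worker([this] { run(); }) {}

  ~ServerThread() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      Running = false;
    }
    CV.notify_all();
    Worker.join();
  }

  // Called right before a kernel launch: count one more in-flight kernel
  // and wake the server thread.
  void notify() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      ++NumUsers;
    }
    CV.notify_all();
  }

  // Called from the completion hook: one fewer in-flight kernel. Once the
  // count reaches zero the thread goes back to sleep on its next check.
  void finish() {
    std::lock_guard<std::mutex> Lock(Mutex);
    assert(NumUsers > 0 && "finish() without a matching notify()");
    --NumUsers;
  }

  uint64_t polls() const { return Polls.load(); }

  uint32_t users() {
    std::lock_guard<std::mutex> Lock(Mutex);
    return NumUsers;
  }

private:
  void run() {
    std::unique_lock<std::mutex> Lock(Mutex);
    while (true) {
      // Sleep until a kernel is running or shutdown is requested.
      CV.wait(Lock, [this] { return NumUsers > 0 || !Running; });
      if (!Running)
        return;
      Lock.unlock();
      Polls.fetch_add(1); // Placeholder for "handle pending RPC requests".
      std::this_thread::yield();
      Lock.lock();
    }
  }

  std::mutex Mutex;
  std::condition_variable CV;
  uint32_t NumUsers = 0;
  bool Running = true;
  std::atomic<uint64_t> Polls{0};
  std::thread Worker; // Declared last so all state above is initialized first.
};

// Same shape as CUhostFn (void (*)(void *userData)). In the plugin this
// could be passed as cuLaunchHostFunc(Stream, onKernelComplete, Thread).
static void onKernelComplete(void *UserData) {
  static_cast<ServerThread *>(UserData)->finish();
}
```

One trade-off to keep in mind: if the host function is skipped after a device error, `finish()` never runs and the counter stays raised, so the server thread would keep polling until shutdown.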

https://github.com/llvm/llvm-project/pull/112988