[Openmp-commits] [PATCH] D154312: [Libomptarget] Begin implementing support for RPC services

Mon Jul 3 12:40:12 PDT 2023

jdoerfert added inline comments.

================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:533
+      return Plugin::success();
+    }
+
----------------
jhuber6 wrote:
> jdoerfert wrote:
> > This I don't get. We wait for 8k units, then we run the server once, then we wait again if the kernel is not done. Why isn't this part of the loop below with a timeout that adjusts based on the RPC needs?
> I wasn't sure how to mesh it with the existing logic, since we will presumably have streams that didn't come from kernel launches and we have some active timeout  exception here where I guess we query at a higher rate compared to the blocking one? I just worked around it but I could try to integrate it. AMDGPU was much more difficult here than CUDA because of the indirection.
I'm not convinced the thread starting the thing should run the server, but there is merit to having this option. However, the timeout still seems wrong to me. The fact that we have two locations we wait on (finish and RPC) is also not great.

================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:558
+  DP("Running an RPC server on device %d\n", getDeviceId());
+  return Plugin::success();
+}
----------------
jhuber6 wrote:
> jdoerfert wrote:
> > Can you hoist  Plugin.getRPCServer(), to make it more readable.
> > 
> > I'm not a fan of RPCDevice as a name, we have too many "devices"
> Sure, I'll try to think of a better name.
RPC adapter? RCP service, ...

================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:258
   virtual Error launchImpl(GenericDeviceTy &GenericDevice, uint32_t NumThreads,
-                           uint64_t NumBlocks,
-                           KernelArgsTy &KernelArgs, void *Args,
+                           uint64_t NumBlocks, KernelArgsTy &KernelArgs,
+                           void *Args,
----------------
jhuber6 wrote:
> tianshilei1992 wrote:
> > looks like irrelevant changes
> People somehow commit changes without clang-formatting and then it ends up here because my editor will auto format. Somewhat annoying because I've clang formatted all of these files at least twice by now. I'll trim it.
Set your editor to only format the lines you changed. Or clang format and precommit.

================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp:69-104
+  // Register a custom opcode handler to perform plugin specific allocation.
+  // FIXME: We need to make sure this uses asynchronous allocations on CUDA.
+  auto MallocHandler = [](rpc_port_t Port, void *Data) {
+    rpc_recv_and_send(
+        Port,
+        [](rpc_buffer_t *Buffer, void *Data) {
+          plugin::GenericDeviceTy &Device =
----------------
jhuber6 wrote:
> jdoerfert wrote:
> > Can the GenericDevice register these handlers?
> > A vector of them somewhere, or just let it call a register callback.
> > I would not make these special and define them here.
> > Move all into the GenericDevice and it will call an Impl method for the subclasses to register more.
> So this will probably only be necessary for things like malloc / memcpy. Figured those are already provided by the generic device so we should be able to hide that complexity in here right next to the implementation. Didn't feel like it warranted abstraction.
But this takes away the extension point, and splits the callbacks. We might want to use different mallocs per kernel. And having all callbacks in "one place" seems like a good idea as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154312/new/

https://reviews.llvm.org/D154312