[libc-commits] [PATCH] D147054: [libc] Begin implementing a library for the RPC server

Tue Jun 13 12:33:53 PDT 2023

jhuber6 added inline comments.

================
Comment at: libc/utils/gpu/loader/nvptx/Loader.cpp:207
+      },
+      &memory_stream);
+  rpc_register_callback(
----------------
JonChesterfield wrote:
> Does this work? It looks like the same stream running the kernel is being used to provide malloc/free, and I'd expect that to deadlock
There's a test for this that's been running on https://lab.llvm.org/buildbot/#/builders/46 for a few weeks now and it hasn't deadlocked as far as I can tell. It's a completely separate stream called `memory_stream` that's just created here. The one running the kernel is just called `stream`. This requires CUDA 11.2 IIRC.

================
Comment at: libc/utils/gpu/server/Server.cpp:35
+      : num_devices(num_devices),
+        devices(std::unique_ptr<Device[]>(new Device[num_devices])),
+        reference_count(0u) {}
----------------
JonChesterfield wrote:
> Could we go with a vector of Device instead of the new+array construct?
It's a static size so a constant sized array should be more correct.

================
Comment at: libc/utils/gpu/server/Server.cpp:44
+
+static State *state;
+
----------------
JonChesterfield wrote:
> Why is this a heap allocated thing, as opposed to `static State state;` ?
It's just easier to check if it's been initialized because the pointer is nullable. We coiuld probably make it a static thing and have a flag instead if you'd like.

================
Comment at: libc/utils/gpu/server/Server.cpp:51
+
+  if (state->reference_count == std::numeric_limits<uint32_t>::max())
+    return RPC_STATUS_ERROR;
----------------
JonChesterfield wrote:
> Could make the counter 64 bit and delete the test against max as a counter >= address space size can't overflow
> 
> In general the DIY reference counting is a bit odd - is there a reason this isn't a shared_ptr?
Would a shared pointer give us the same semantics? We would be allocating it multiple times and not copying it.

================
Comment at: libc/utils/gpu/server/Server.cpp:107
+    case rpc::Opcode::WRITE_TO_STDOUT: {
+      uint64_t sizes[rpc::MAX_LANE_SIZE] = {0};
+      void *strs[rpc::MAX_LANE_SIZE] = {nullptr};
----------------
JonChesterfield wrote:
> I'm still hopeful that we'll come up with a better idea than rpc::MAX_LANE_SIZE
We could use a vector and push back into it instead, or preallocate according to the size above, but this was the easiest solution.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147054/new/

https://reviews.llvm.org/D147054