[libc-commits] [PATCH] D148943: [libc] Enable multiple threads to use RPC on the GPU

Fri Apr 21 11:09:34 PDT 2023

jhuber6 marked 2 inline comments as done.
jhuber6 added inline comments.

================
Comment at: libc/src/__support/RPC/rpc.h:55
+#elif defined(LIBC_TARGET_ARCH_IS_NVPTX)
+  Buffer slot[32];
+#else
----------------
jdoerfert wrote:
> We should have a single generic warp size macro
> 
True, we can put this in the GPU utils.
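
A rough sketch of what such a shared definition could look like. The name `EXAMPLE_LANE_SIZE` is hypothetical, `LIBC_TARGET_ARCH_IS_AMDGPU` is assumed to be the AMDGPU counterpart of the NVPTX macro in the hunk above, and the AMDGPU value of 64 is an assumption (wave32 targets would differ):

```cpp
#include <cstdint>

// Hypothetical single source of truth for the warp / wavefront width.
// The values are assumptions for illustration, not the patch's definitions.
#if defined(LIBC_TARGET_ARCH_IS_AMDGPU)
#define EXAMPLE_LANE_SIZE 64 // assumed wavefront size
#elif defined(LIBC_TARGET_ARCH_IS_NVPTX)
#define EXAMPLE_LANE_SIZE 32 // warp size, matching `Buffer slot[32]` above
#else
#define EXAMPLE_LANE_SIZE 1 // host: one "lane" per thread
#endif

// Stand-in for the real Buffer type, only here to make the sketch compile.
struct ExampleBuffer {
  uint64_t data[8];
};

// The per-target #if chain in the packet collapses to a single declaration.
struct ExamplePacket {
  ExampleBuffer slot[EXAMPLE_LANE_SIZE];
};
```

With this, the packet layout no longer needs its own per-target conditionals.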

================
Comment at: libc/src/__support/RPC/rpc.h:101
+  LIBC_INLINE void reset(uint32_t size, void *mtx, void *in, void *out,
+                         void *data) {
+    lane_size = size;
----------------
jdoerfert wrote:
> why is data not typed here?
It's just easier to manage: this needs to be copied from the host to the GPU, and it's simpler to model the kernel arguments as void pointers.
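
To illustrate the idea, here is a minimal sketch with hypothetical types (`ExampleBuffer`, `ExampleProcess`, and the member names are assumptions, not the real `rpc.h` definitions). The untyped pointers cross the host/device boundary and are cast back to their real types once, inside `reset`:

```cpp
#include <cstdint>

// Hypothetical stand-in for the shared packet buffer.
struct ExampleBuffer {
  uint64_t data[8];
};

// Hypothetical stand-in for the RPC process state.
struct ExampleProcess {
  uint32_t lane_size = 0;
  uint32_t *lock = nullptr;
  uint32_t *inbox = nullptr;
  uint32_t *outbox = nullptr;
  ExampleBuffer *buffer = nullptr;

  // Same shape as the signature in the hunk above: every pointer arrives
  // untyped, so the host can pass them as plain kernel arguments.
  void reset(uint32_t size, void *mtx, void *in, void *out, void *data) {
    lane_size = size;
    lock = static_cast<uint32_t *>(mtx);
    inbox = static_cast<uint32_t *>(in);
    outbox = static_cast<uint32_t *>(out);
    buffer = static_cast<ExampleBuffer *>(data);
  }
};
```

The trade-off is that the caller gets no type checking on the arguments, which is the concern raised in the question.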

================
Comment at: libc/src/__support/RPC/rpc.h:188
   // Apply the \p fill function to initialize the buffer and release the memory.
-  fill(&process.buffer[index]);
+  for (uint32_t idx = 0; idx < process.lane_size; idx += gpu::get_lane_size())
+    if (is_process_gpu() || process.buffer[index].mask & 1ul << idx)
----------------
jdoerfert wrote:
> shouldn't 0 be something like gpu::get_id_in_lane()? Also below.
This is basically just to let the CPU pretend that it has all the "threads" the GPU has. Given the GPU's lane size of 32, the GPU executes this loop once. The CPU, however, executes it 32 times because we set its lane size to 1. The `idx` is just an offset as you march in multiples of the lane size.
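
As a standalone illustration of the iteration counts, here is a small host-only sketch. The helper names are hypothetical and the real code calls `gpu::get_lane_size()` rather than taking the device lane size as a parameter:

```cpp
#include <cassert>
#include <cstdint>

// Counts how many times the loop from the hunk above would run for a given
// process lane size (32 in the discussion) and device lane size.
inline uint32_t count_iterations(uint32_t process_lane_size,
                                 uint32_t device_lane_size) {
  uint32_t iterations = 0;
  for (uint32_t idx = 0; idx < process_lane_size; idx += device_lane_size)
    ++iterations;
  return iterations;
}

// CPU-side view: the device lane size is 1, so the loop visits every slot
// and uses the mask to skip lanes that were inactive on the GPU.
// Writes idx + 1 into each active slot and returns how many were filled.
inline uint32_t fill_active_slots(uint64_t mask, uint32_t process_lane_size,
                                  uint32_t (&slots)[32]) {
  assert(process_lane_size <= 32);
  uint32_t filled = 0;
  for (uint32_t idx = 0; idx < process_lane_size; idx += 1) {
    if (mask & (uint64_t(1) << idx)) {
      slots[idx] = idx + 1;
      ++filled;
    }
  }
  return filled;
}
```

For example, `count_iterations(32, 32)` models the GPU case and `count_iterations(32, 1)` models the CPU case.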

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148943/new/

https://reviews.llvm.org/D148943