[libc-commits] [libc] [libc] Export the RPC interface from `libc` (PR #71432)

Joseph Huber via libc-commits libc-commits at lists.llvm.org
Mon Nov 6 15:08:56 PST 2023


================
@@ -510,8 +512,10 @@ LIBC_INLINE void Port<T, S>::recv_n(void **dst, uint64_t *size, A &&alloc) {
 /// only open a port if we find an index that is in a valid sending state. That
 /// is, there are send operations pending that haven't been serviced on this
 /// port. Each port instance uses an associated \p opcode to tell the server
-/// what to do.
-template <uint16_t opcode> LIBC_INLINE Client::Port Client::open() {
+/// what to do. It is very important that the \p opcode value is identical
+/// across each lane. Use the template version of this unless absolutely
----------------
jhuber6 wrote:

So this is some GPU-specific magic. Basically, GPUs use so-called "SIMT" execution, where we have execution units that are 32 or 64 lanes wide that all execute at once. These are called "warps" or "wavefronts" depending on the platform. This means that some "threads" are effectively held hostage by other threads if they are in the same warp. So the smallest unit of parallelism is actually a warp / wavefront here, which is why the RPC interface handles a whole warp / wavefront at a time. So, given a warp size of 32, some code like
```
int value = thread_id % 2 ? 0 : 1;
some_fn(value);
```
will result in a warp with alternating values, but the whole warp / wavefront is still "convergent", meaning that all "threads" are still active. If we apply this logic to opening a port, a single warp could end up trying to handle two separate opcodes, which breaks the assumption that each RPC call handles a single warp / wavefront. We can avoid this by forcing the opcode to be a compile-time constant via the template.
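
To make the difference concrete, here is a rough host-side sketch that simulates one 32-lane warp with a plain array. This is not the libc RPC code; `WARP_SIZE`, `distinct_opcodes`, `runtime_opcodes`, and `template_opcodes` are hypothetical names made up for illustration, and a real GPU evaluates the lanes in hardware rather than in a loop.
```cpp
#include <array>
#include <cstdint>
#include <set>

// Hypothetical warp width, matching the "size of 32" used in the example above.
constexpr int WARP_SIZE = 32;

using Lanes = std::array<uint16_t, WARP_SIZE>;

// Counts how many distinct opcode values the lanes of one simulated warp hold.
// The RPC interface assumes this is exactly 1 for any warp opening a port.
int distinct_opcodes(const Lanes &lanes) {
  return static_cast<int>(
      std::set<uint16_t>(lanes.begin(), lanes.end()).size());
}

// Runtime opcode: every lane evaluates its own expression, so the values can
// differ even though all lanes are still active (convergent).
Lanes runtime_opcodes() {
  Lanes lanes{};
  for (int thread_id = 0; thread_id < WARP_SIZE; ++thread_id)
    lanes[thread_id] = thread_id % 2 ? 0 : 1;
  return lanes;
}

// Template opcode: the value is a compile-time constant, so it cannot vary
// from lane to lane.
template <uint16_t opcode> Lanes template_opcodes() {
  Lanes lanes{};
  lanes.fill(opcode);
  return lanes;
}
```
In this sketch `distinct_opcodes(runtime_opcodes())` is 2, i.e. the one warp would be asking for two different operations, while `distinct_opcodes(template_opcodes<1>())` is always 1.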

https://github.com/llvm/llvm-project/pull/71432

