[libc-commits] [libc] [libc] Rework the RPC interface to accept runtime wave sizes (PR #80914)

Joseph Huber via libc-commits libc-commits at lists.llvm.org
Wed Feb 7 16:21:51 PST 2024


jhuber6 wrote:

> I think a fair amount of the complexity is from trying to treat the wavesize as a compile time constant in some places and as a variable in others. I don't think that's worth the hassle - if we go with it as a "runtime" value everywhere, it'll still constant propagate on the GPU, and it might do so on the host as well if function specialisation decides it's worth doing.
>

The main issue is that the Server needs to be instantiated with the given size. For example, with an Nvidia GPU and an AMD machine I end up with a `Server<32>` and a `Server<64>` respectively; the lane size is required to slice into the buffer array and get the buffer bytes. The GPU can just call `get_lane_size()` and get the value, but the Server needs it passed in externally, which is currently done by the template. We cannot forward a runtime function to the Server interface that would do the right thing across the multitude of targets. Right now the logic is more like: the Client gets to default to whatever it wants, and the Server is responsible for making them match.
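To make the constraint concrete, here is a minimal C++ sketch of the shape of the problem. The `Buffer` payload and `lane_buffer` helper are hypothetical, for illustration only; the point is that the host-side Server has no equivalent of `get_lane_size()`, so the stride used to slice the shared buffer has to come in from outside, currently as a template argument:

```cpp
#include <cstdint>

// Hypothetical per-lane payload type, for illustration only.
struct Buffer {
  char data[64];
};

// The lane size is a template parameter because the host, unlike the
// GPU, has no get_lane_size() it can call at runtime.
template <uint32_t lane_size> struct Server {
  Buffer *buffers; // lane_size buffers per port slot.

  // Slicing into the shared array needs lane_size as the stride.
  Buffer *lane_buffer(uint64_t slot, uint32_t lane_id) {
    return &buffers[slot * lane_size + lane_id];
  }
};

// The host must pick the instantiation per target:
//   Server<32> for an Nvidia GPU, Server<64> for a wave64 AMD GPU.
```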

> Or we could go with it being a compile-time constant everywhere and make the function specialization explicit. The host side would pick up a switch to convert the runtime value to a compile-time one. Messy. In general the rpc implementation in trunk tends to trust the compiler to constant propagate (unlike the prototype, which burned everything at compile time), so leaving calls to `gpu::wavesize()` scattered around is probably the better way to go.

This was my first inclination, but it didn't really sit well when I tried it out. I could give it another shot, but I think it just requires way too much branching in the interface.
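For reference, a rough sketch of what that alternative looks like (the names here are hypothetical): every host entry point would need a switch like this to turn the runtime wave size back into a template argument, and that dispatch is the branching in question.

```cpp
#include <cstdint>
#include <cstdio>

// Templated worker standing in for the real Server logic.
template <uint32_t lane_size> void run_server() {
  std::printf("serving with lane size %u\n", lane_size);
}

// Host-side dispatch: convert the runtime wave size back into a
// compile-time constant. Every entry point needs a copy of this
// switch, which is what makes the approach unappealing.
inline void run_server(uint32_t wave_size) {
  switch (wave_size) {
  case 32:
    return run_server<32>(); // Nvidia and wave32 AMDGPU.
  case 64:
    return run_server<64>(); // wave64 AMDGPU.
  default:
    break; // Unsupported wave size.
  }
}
```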

https://github.com/llvm/llvm-project/pull/80914
