[Mlir-commits] [mlir] [MLIR][OpenMP] Improve Generic-SPMD kernel detection (PR #137307)

Wed Apr 30 07:14:07 PDT 2025

skatrak wrote:

> IIUC, Generic-SPMD is when the openmp-opt pass converts a Generic kernel to an SPMD kernel. Because openmp-opt only sees the GPU kernel, it cannot modify the host-side kernel invocation, so you end up with a mix of both. One consequence is that at kernel invocation, it does not pass the number of threads because it is not known.

Yes, that's how it works for clang. It does not work for flang because we use different DeviceRTL functions for `distribute` than clang does, and that optimization looks for specific function calls. I did try at one point to add support for this, but I wasn't able to (apparently because the DeviceRTL functions we use in flang take a function pointer to the `distribute` body, instead of updating the loop bounds passed to them and keeping the `distribute` body inline).
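To illustrate the difference between the two calling styles, here is a minimal, single-threaded C++ sketch. `rtlDistributeInitBounds`, `rtlDistributeLoop`, `teamKernelA`, and `teamKernelB` are hypothetical stand-ins, not the real DeviceRTL entry points or their signatures. In the first style the runtime only rewrites the loop bounds, so the `distribute` body stays inline next to the runtime call. In the second, the body is outlined and reaches the runtime only as a function pointer, which is roughly the shape that the pattern matching described above does not handle.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Style A (clang-like, hypothetical): the runtime computes this team's slice
// [lb, ub] of the iteration space and writes it back through the pointers.
void rtlDistributeInitBounds(std::int32_t teamId, std::int32_t numTeams,
                             std::int32_t *lb, std::int32_t *ub) {
  assert(numTeams > 0 && teamId >= 0 && teamId < numTeams);
  std::int32_t tripCount = *ub - *lb + 1;
  std::int32_t chunk = (tripCount + numTeams - 1) / numTeams;
  std::int32_t newLb = *lb + teamId * chunk;
  std::int32_t newUb = newLb + chunk - 1;
  if (newUb > *ub) newUb = *ub; // clamp the last chunk
  *lb = newLb;
  *ub = newUb; // ub < lb means this team gets no iterations
}

// Style B (flang-like, hypothetical): the runtime receives the outlined body
// and calls it once per iteration of this team's slice.
using LoopBodyFn = void (*)(std::int32_t iv, void *args);

void rtlDistributeLoop(std::int32_t teamId, std::int32_t numTeams,
                       std::int32_t tripCount, LoopBodyFn body, void *args) {
  assert(numTeams > 0 && teamId >= 0 && teamId < numTeams);
  std::int32_t chunk = (tripCount + numTeams - 1) / numTeams;
  std::int32_t begin = teamId * chunk;
  std::int32_t end = begin + chunk < tripCount ? begin + chunk : tripCount;
  for (std::int32_t iv = begin; iv < end; ++iv) body(iv, args);
}

// Kernel using style A: the loop body is visible right after the runtime call.
void teamKernelA(std::int32_t teamId, std::int32_t numTeams,
                 std::vector<int> &out) {
  std::int32_t lb = 0;
  std::int32_t ub = static_cast<std::int32_t>(out.size()) - 1;
  rtlDistributeInitBounds(teamId, numTeams, &lb, &ub);
  for (std::int32_t i = lb; i <= ub; ++i) out[i] += 1; // inline body
}

// Outlined body for style B; the runtime only sees its address.
static void bodyB(std::int32_t iv, void *args) {
  auto &out = *static_cast<std::vector<int> *>(args);
  out[iv] += 1;
}

// Kernel using style B: the body is hidden behind the function pointer.
void teamKernelB(std::int32_t teamId, std::int32_t numTeams,
                 std::vector<int> &out) {
  rtlDistributeLoop(teamId, numTeams, static_cast<std::int32_t>(out.size()),
                    bodyB, &out);
}
```

Calling either kernel once per team id with the same `numTeams` touches every element exactly once; the two styles differ only in where the loop body lives relative to the runtime call.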

> With this background, I don't see why a frontend would ever use Generic-SPMD mode, since it has control over kernel code and host-side invocation.
> 
> Independent of that, Generic -- as the name implies -- is supposed to always work. If it does not, it is a bug.

That's the part I also struggle to understand. If Generic mode doesn't always produce correct results, performance considerations aside, there must be a bug in it. But it appears that these tests only work if tagged as Generic-SPMD, not as Generic or SPMD. Since the OpenMPOpt pass can't currently promote these Generic kernels to Generic-SPMD on its own, we are temporarily handling it in codegen. There's a TODO comment documenting this.
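For reference on what "tagged as" means here: to my understanding, LLVM's `OMPTgtExecModeFlags` (in `llvm/Frontend/OpenMP/OMPConstants.h`) encodes the execution mode as a bitmask in which Generic-SPMD is the union of the Generic and SPMD bits. The sketch below mirrors those values in a standalone enum (an assumption, so it compiles without LLVM headers). The helpers `hostLaunchIsGeneric`, `deviceCodeIsSPMD`, and `modeName` are hypothetical and only illustrate the "mix of both" reading from the quote above; they are not how the offload runtime actually checks the mode.

```cpp
#include <cstdint>

// Standalone mirror of the bit values assumed for OMPTgtExecModeFlags.
enum ExecModeFlags : std::uint8_t {
  EXEC_MODE_GENERIC = 1u << 0,
  EXEC_MODE_SPMD = 1u << 1,
  EXEC_MODE_GENERIC_SPMD = EXEC_MODE_GENERIC | EXEC_MODE_SPMD,
};

// Hypothetical reading: the Generic bit stands for "the host-side launch is
// set up as for a Generic kernel".
constexpr bool hostLaunchIsGeneric(std::uint8_t mode) {
  return (mode & EXEC_MODE_GENERIC) != 0;
}

// Hypothetical reading: the SPMD bit stands for "the device code runs in
// SPMD style".
constexpr bool deviceCodeIsSPMD(std::uint8_t mode) {
  return (mode & EXEC_MODE_SPMD) != 0;
}

// Human-readable name for a mode value; anything else is reported as invalid.
constexpr const char *modeName(std::uint8_t mode) {
  return mode == EXEC_MODE_GENERIC_SPMD ? "Generic-SPMD"
         : mode == EXEC_MODE_SPMD       ? "SPMD"
         : mode == EXEC_MODE_GENERIC    ? "Generic"
                                        : "invalid";
}
```

Under this reading, a Generic-SPMD kernel has both bits set: SPMD-style device code behind a launch that was prepared as if the kernel were Generic.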

https://github.com/llvm/llvm-project/pull/137307