[llvm] [mlir] [mlir][gpu] Change GPU modules to globals (PR #135478)

Wed Apr 23 10:14:40 PDT 2025

fabianmcg wrote:

Ok, I'll provide an overview of the current mechanism, then some rationale and how I thought it could be used:

Currently, `convert-to-llvm` only legalizes the args of `gpu.launch_func` (i.e., the args are updated, but the ops remain). Only during translation are the `gpu.binary` and `gpu.launch_func` ops fully expanded.

The translation process is handled by [`OffloadingLLVMTranslationAttrInterface`](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td#L100), which is an inherent attr of `gpu.binary`:
```tablegen
      // Excerpt (surrounding method declarations elided).
      // Hook called during translation for `gpu.binary`.
      "::llvm::LogicalResult", "embedBinary",
      (ins "::mlir::Operation*":$binaryOp,
           "::llvm::IRBuilderBase&":$hostBuilder,
           "::mlir::LLVM::ModuleTranslation&":$hostModuleTranslation)
      // ...
      // Hook called during translation for `gpu.launch_func`.
      "::llvm::LogicalResult", "launchKernel",
      (ins "::mlir::Operation*":$launchFunc, "::mlir::Operation*":$binaryOp,
           "::llvm::IRBuilderBase&":$hostBuilder,
           "::mlir::LLVM::ModuleTranslation&":$hostModuleTranslation)
```

The rationale was that users could customize the process either by adding new attributes that implement the interface, or by registering a different external interface model on an existing attribute. It would also avoid the pitfalls of modeling the process after a specific runtime like the CUDA runtime, CUDA driver or Vulkan...
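To make the customization idea concrete, here is a rough standalone sketch of the dispatch pattern. It is a toy model only: `HostModule`, `OffloadingModel`, `ModelRegistry`, `translate` and the emitted strings are all hypothetical stand-ins, not MLIR or LLVM APIs. The point is just that the translation driver stays runtime-agnostic while the model attached to an attribute decides what gets emitted.

```cpp
// Toy model of the customization point; none of these types are MLIR APIs.
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the host IR being built during translation.
struct HostModule {
  std::vector<std::string> instructions;
};

// Hypothetical analogue of the two interface hooks (embedBinary, launchKernel).
struct OffloadingModel {
  std::function<bool(const std::string &binary, HostModule &host)> embedBinary;
  std::function<bool(const std::string &kernel, const std::string &binary,
                     HostModule &host)>
      launchKernel;
};

// Maps an attribute name to the model that handles it. Attaching a model to
// a name that already has one replaces it, which mimics registering a
// different external model on an existing attribute.
class ModelRegistry {
public:
  void attach(const std::string &attrName, OffloadingModel model) {
    assert(model.embedBinary && model.launchKernel && "both hooks required");
    models[attrName] = std::move(model);
  }
  const OffloadingModel *lookup(const std::string &attrName) const {
    auto it = models.find(attrName);
    return it == models.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, OffloadingModel> models;
};

// The driver knows nothing about any particular runtime; it only dispatches.
bool translate(const ModelRegistry &registry, const std::string &attrName,
               const std::string &binary, const std::string &kernel,
               HostModule &host) {
  const OffloadingModel *model = registry.lookup(attrName);
  if (!model)
    return false; // No model attached: translation fails.
  return model->embedBinary(binary, host) &&
         model->launchKernel(kernel, binary, host);
}

// Model A (hypothetical): load the module lazily at the launch site.
OffloadingModel makeDriverStyleModel() {
  return {[](const std::string &binary, HostModule &host) {
            host.instructions.push_back("global @" + binary + "_bin");
            return true;
          },
          [](const std::string &kernel, const std::string &binary,
             HostModule &host) {
            host.instructions.push_back("call load_module(@" + binary + "_bin)");
            host.instructions.push_back("call launch_kernel(" + kernel + ")");
            return true;
          }};
}

// Model B (hypothetical): register the binary up front in a constructor.
OffloadingModel makeRuntimeStyleModel() {
  return {[](const std::string &binary, HostModule &host) {
            host.instructions.push_back("global @" + binary + "_bin");
            host.instructions.push_back("ctor register_binary(@" + binary + "_bin)");
            return true;
          },
          [](const std::string &kernel, const std::string &,
             HostModule &host) {
            host.instructions.push_back("call runtime_launch(" + kernel + ")");
            return true;
          }};
}
```

Swapping `makeDriverStyleModel()` for `makeRuntimeStyleModel()` under the same attribute name changes the emitted host code without touching `translate`, which is the kind of flexibility the interface was meant to give.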

Now, the reason for using translation instead of lowerings is that LLVM is getting the offload project. Also, at the time there was a lot of infra in LLVM to handle certain offloading bits, like registering kernels. So it was decided not to reinvent the wheel, and that it would be better to use those pieces when they became available. An example of this infra is the LLVM code used by clang for registering binaries with the CUDA runtime: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp#L305-L545

An example of how I envisioned customization is shown in https://github.com/llvm/llvm-project/pull/78117. It adds support to load/launch/register kernels and binaries using the CUDA and HIP runtimes, which avoids the issues the current PR is trying to solve.

https://github.com/llvm/llvm-project/pull/135478