[Mlir-commits] [mlir] [mlir] Let GPU ID bounds work on any FunctionOpInterfaces (PR #95166)
Krzysztof Drewniak
llvmlistbot at llvm.org
Tue Jun 11 15:05:06 PDT 2024
================
@@ -73,12 +85,16 @@ static std::optional<uint64_t> getKnownLaunchDim(Op op, LaunchDims type) {
return value.getZExtValue();
}
- if (auto func = op->template getParentOfType<GPUFuncOp>()) {
+ if (auto func = op->template getParentOfType<FunctionOpInterface>()) {
switch (type) {
case LaunchDims::Block:
- return llvm::transformOptional(func.getKnownBlockSize(dim), zext);
+ return llvm::transformOptional(
+ getKnownLaunchAttr(func, GPUFuncOp::getKnownBlockSizeAttrName(), dim),
+ zext);
case LaunchDims::Grid:
- return llvm::transformOptional(func.getKnownGridSize(dim), zext);
+ return llvm::transformOptional(
+ getKnownLaunchAttr(func, GPUFuncOp::getKnownGridSizeAttrName(), dim),
+ zext);
----------------
krzysz00 wrote:
One other note is that the GPU dialect already has two official, somewhat independent purposes:
1. Handling compilation of GPU device code (`gpu.module`, `gpu.func`, etc.)
2. Handling execution of GPU code (`gpu.launch_func` etc.)
and I'm pretty sure you can use just one half, replace the other half with your own custom arrangement for the same task, and have everything basically work (if slightly less seamlessly).
My claim is that operations like `gpu.thread_id` and `gpu.shuffle` are in a third class: abstractions over what are almost inevitably platform-specific GPU intrinsics, letting people write code generation schemes that target "a GPU" (though somewhere in their context they're likely to know which one).
This abstracting segment of the GPU dialect is like, say, `math.sin`: there isn't, and IMO shouldn't be, any special workflow needed to use it.
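To make that concrete, here's a rough sketch (not code from this PR, and the exact attribute name is an assumption on my part) of what that "no special workflow" usage could look like: a plain `func.func` carrying a known-block-size annotation directly, with no `gpu.module`/`gpu.func` wrapper required.
```mlir
// Hypothetical example; the gpu.known_block_size attribute name is assumed.
func.func @scale(%arg0: f32, %buf: memref<256xf32>)
    attributes {gpu.known_block_size = array<i32: 256, 1, 1>} {
  // gpu.thread_id used directly inside a plain function; the bound can be
  // picked up from the function attribute above.
  %tid = gpu.thread_id x
  %v = memref.load %buf[%tid] : memref<256xf32>
  %scaled = arith.mulf %v, %arg0 : f32
  memref.store %scaled, %buf[%tid] : memref<256xf32>
  return
}
```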
---
And re: how GPU compilation happens, consider the following hypothetical compiler's IR:
```mlir
something.multi_target_function [..., @f] targets [
builtin.module {something.target_info = <... cpu ... >} {
func.func @f(...) { ... }
},
builtin.module {something.target_info = <... gpu ...>} {
func.func @f(...) { ... }
}
]
```
If we forced all uses of basic GPU intrinsics into a `gpu.module`, then you'd have a hard time defining
```cpp
std::optional<something::TargetInfoAttr> getTargetInfo(mlir::FunctionOpInterface func);
```
in a way that doesn't know about the special case that is a GPU module. And attaching that data to the `gpu.module` instead means that anyone walking over `targets` has to watch out for `gpu.container_module` and recurse into the builtin module, as sketched below.
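For illustration (purely hypothetical IR, reusing the pseudo-syntax from above), the forced arrangement would look something like this, with the GPU target's `@f` buried one level deeper than the CPU one:
```mlir
something.multi_target_function [..., @f] targets [
  builtin.module {something.target_info = <... cpu ...>} {
    func.func @f(...) { ... }
  },
  // The GPU target now needs gpu.container_module plus a nested gpu.module,
  // so a generic walk over `targets` has to special-case this shape.
  builtin.module {gpu.container_module, something.target_info = <... gpu ...>} {
    gpu.module @device {
      gpu.func @f(...) kernel { ... }
    }
  }
]
```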
I don't think this extra level of nesting is justified just to get at the GPU dialect's wrapper operations. Downstream developers should be able to use their own per-target isolation/separation scheme without losing access to the useful lowerings that live in the GPU dialect.
https://github.com/llvm/llvm-project/pull/95166