[Mlir-commits] [mlir] [mlir] Let GPU ID bounds work on any FunctionOpInterfaces (PR #95166)

Krzysztof Drewniak llvmlistbot at llvm.org
Tue Jun 11 16:21:18 PDT 2024


================
@@ -73,12 +85,16 @@ static std::optional<uint64_t> getKnownLaunchDim(Op op, LaunchDims type) {
       return value.getZExtValue();
   }
 
-  if (auto func = op->template getParentOfType<GPUFuncOp>()) {
+  if (auto func = op->template getParentOfType<FunctionOpInterface>()) {
     switch (type) {
     case LaunchDims::Block:
-      return llvm::transformOptional(func.getKnownBlockSize(dim), zext);
+      return llvm::transformOptional(
+          getKnownLaunchAttr(func, GPUFuncOp::getKnownBlockSizeAttrName(), dim),
+          zext);
     case LaunchDims::Grid:
-      return llvm::transformOptional(func.getKnownGridSize(dim), zext);
+      return llvm::transformOptional(
+          getKnownLaunchAttr(func, GPUFuncOp::getKnownGridSizeAttrName(), dim),
+          zext);
----------------
krzysz00 wrote:

Re `getTargetInfo`, my claim is that a user of the GPU dialect should not be (and, by current practice, is not) required to use a `gpu.module` as the container for their GPU compilations. They should be able to use a `their_own_custom.module` that's annotated with "this is targeting a GPU". (This gets even more relevant when we start merging the target attribute RFCs - I should be able to use a `builtin.module attributes {target = #gpu.rocdl_target<...>]` just as much as a `gpu.module`.)
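
To make that concrete, here's a minimal sketch of the kind of IR this change would permit. The `#gpu.rocdl_target<...>` spelling is the hypothetical one from the RFCs mentioned above, and `gpu.known_block_size` as a discardable attribute on an arbitrary function is what this PR enables:

```mlir
// Hypothetical sketch: #gpu.rocdl_target<...> is schematic, pending the
// target attribute RFCs; the attribute name is assumed from this PR.
builtin.module attributes {target = #gpu.rocdl_target<...>} {
  // Any FunctionOpInterface op can carry the launch-bound hint; no
  // gpu.module or gpu.container_module wrapper is required.
  func.func @kernel() attributes {gpu.known_block_size = array<i32: 128, 1, 1>} {
    // The hint bounds this ID to [0, 128) for optimization/lowering.
    %tid = gpu.thread_id x
    return
  }
}
```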

The problem with forcing a `gpu.module`, especially since it requires its immediate parent to be marked `gpu.container_module`, is that it breaks a general multi-target compilation scheme: GPU modules would need an extra level of nesting when there isn't a corresponding, say, `x86.module`.

(Specifically, re the attributes we're discussing: I think that `gpu.known_block_size` is a general piece of infrastructure that can be used in flows other than the `gpu.module`/`gpu.func` one. The contract is that "the function that contains the block_id/thread_id/... operations should have these attributes set if they're to be used in optimization or lowering". That can be any kind of function. Perhaps `gpu.func` could have special inherent attributes to make it harder to lose the hint, but ... consider that you can lower `gpu.func` to `llvm.func` or `spirv.func` in a separate sequence of conversions from the lowering of `gpu.thread_id x` ... and just copy the `gpu.known_block_size` attribute over. Heck, the existing gpu-to-{nvvm,rocdl} passes do this and then delete the attribute at the end.)
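
A sketch of what that intermediate state could look like (attribute spelling assumed as above; the point is just that the discardable hint survives the function-op conversion):

```mlir
// After a separate gpu.func -> llvm.func conversion that copied the hint
// onto the new function op:
llvm.func @kernel() attributes {gpu.known_block_size = array<i32: 128, 1, 1>} {
  // The not-yet-lowered ID op can still be bounded/folded using the
  // attribute on the enclosing function, then lowered in a later pass.
  %tid = gpu.thread_id x
  llvm.return
}
```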

In summary, insisting on gpu.module for GPU functions breaks other possible abstractions for keeping the code for different devices separate.

https://github.com/llvm/llvm-project/pull/95166

