[Mlir-commits] [mlir] [mlir] Introduce UnrollScopeInterface and apply it to funcOp and gpu.launch Op. (PR #123904)

Mon Feb 3 18:22:26 PST 2025

linuxlonelyeagle wrote:

> Can you add a commit summary for the interface being introduced (with a couple of lines on the rationale)? You have it in the comment at [#123904 (comment)](https://github.com/llvm/llvm-project/pull/123904#issuecomment-2607676126), but the commit summary is empty.
> 
> I ran the example in the first comment - the output:
> 
> ```
> // bin/mlir-opt -affine-loop-unroll="unroll-full" -gpu-kernel-outlining gpu_launch.mlir
> #map = affine_map<(d0) -> (d0 + 1)>
> #map1 = affine_map<(d0) -> (d0 + 2)>
> #map2 = affine_map<(d0) -> (d0 + 3)>
> module attributes {gpu.container_module} {
>   func.func @gpu_launch_unroll() {
>     %c0 = arith.constant 0 : index
>     %c1 = arith.constant 1 : index
>     gpu.launch_func  @gpu_launch_unroll_kernel::@gpu_launch_unroll_kernel blocks in (%c1, %c1, %c1) threads in (%c1, %c1, %c1)  args(%c0 : index)
>     return
>   }
>   gpu.module @gpu_launch_unroll_kernel {
>     gpu.func @gpu_launch_unroll_kernel(%arg0: index) kernel attributes {known_block_size = array<i32: 1, 1, 1>, known_grid_size = array<i32: 1, 1, 1>} {
>       %block_id_x = gpu.block_id  x
>       %block_id_y = gpu.block_id  y
>       %block_id_z = gpu.block_id  z
>       %thread_id_x = gpu.thread_id  x
>       %thread_id_y = gpu.thread_id  y
>       %thread_id_z = gpu.thread_id  z
>       %grid_dim_x = gpu.grid_dim  x
>       %grid_dim_y = gpu.grid_dim  y
>       %grid_dim_z = gpu.grid_dim  z
>       %block_dim_x = gpu.block_dim  x
>       %block_dim_y = gpu.block_dim  y
>       %block_dim_z = gpu.block_dim  z
>       %cst = arith.constant dense<0.000000e+00> : vector<2x4x2x2xf16>
>       %0 = affine.for %arg1 = 0 to 2 iter_args(%arg2 = %cst) -> (vector<2x4x2x2xf16>) {
>         %cst_0 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
>         %1 = vector.insert %cst_0, %arg2 [%arg1, %arg0] : vector<2x2xf16> into vector<2x4x2x2xf16>
>         %2 = affine.apply #map(%arg0)
>         %cst_1 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
>         %3 = vector.insert %cst_1, %arg2 [%arg1, %2] : vector<2x2xf16> into vector<2x4x2x2xf16>
>         %4 = affine.apply #map1(%arg0)
>         %cst_2 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
>         %5 = vector.insert %cst_2, %arg2 [%arg1, %4] : vector<2x2xf16> into vector<2x4x2x2xf16>
>         %6 = affine.apply #map2(%arg0)
>         %cst_3 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
>         %7 = vector.insert %cst_3, %arg2 [%arg1, %6] : vector<2x2xf16> into vector<2x4x2x2xf16>
>         affine.yield %7 : vector<2x4x2x2xf16>
>       }
>       gpu.return
>     }
>   }
> }
> ```
> 
> Which redundant values are you referring to?
> 
> loop-unroll/full was actually meant to be a test pass - it doesn't have a concrete heuristic. (In fact, it was the first pass of MLIR!) One would expect the utilities it exposes, `loopUnrollByFactor/loopUnrollFull` etc., to be the main use cases for other passes/downstream passes. It might be too much to introduce an interface to expose a choice of IR placement for the pass. Also, "statically" adding such an interface to ops to control unrolling is too narrow. One would instead expect an API argument to control something like that if desired.
> 
> On a minor note, separately, it makes sense to make it a `FunctionOpInterface` pass.

In your example, `%c0` is the "redundant" value. It should be created inside `gpu.launch`, so that it does not have to be passed to the outlined kernel as an argument. `-gpu-launch-sink-index-computations` should fix this.
I added a summary to the commit (it might be a bit redundant). It has been so long that I no longer remember exactly what that commit did, and it is not really that important. Mainly, I want an interface that returns the region to which the generated SSA values will be restricted. (I don't speak English very well, so I might have misunderstood you. Sorry.)
```
  %c0 = arith.constant 0 : index
```
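To make the two positions concrete, here is a toy model in plain Python. It is not MLIR's API: `ToyOp`, `find_unroll_scope`, and `is_default_unroll_scope` are hypothetical names used only for illustration. In this model, unrolling a loop walks outward from the loop to the closest enclosing op that counts as a scope, and new values (such as `%c0`) are materialized there. A fixed predicate over op names plays the role of a static interface attached to `func.func` and `gpu.launch`, while passing a different predicate plays the role of the API argument suggested in the review.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyOp:
    """Toy stand-in for an IR operation: just a name and a parent link.

    This is not MLIR's Operation class; it only models op nesting.
    """
    name: str
    parent: Optional["ToyOp"] = None


# Hypothetical predicate type: returns True if an op should act as the
# "unroll scope", i.e. the op whose body receives values (such as index
# constants) generated while unrolling a nested loop.
ScopePredicate = Callable[[ToyOp], bool]


def is_default_unroll_scope(op: ToyOp) -> bool:
    # Illustrative default mirroring the ops named in the PR title.
    return op.name in ("func.func", "gpu.launch")


def find_unroll_scope(op: ToyOp, is_scope: ScopePredicate) -> Optional[ToyOp]:
    """Return the closest enclosing op accepted by `is_scope`, or None."""
    cur = op.parent
    while cur is not None:
        if is_scope(cur):
            return cur
        cur = cur.parent
    return None


module = ToyOp("builtin.module")
func = ToyOp("func.func", module)
launch = ToyOp("gpu.launch", func)
loop = ToyOp("affine.for", launch)

# Interface-style: gpu.launch is a scope, so a constant created while
# unrolling `loop` would be placed inside the launch body.
print(find_unroll_scope(loop, is_default_unroll_scope).name)  # gpu.launch

# Caller-controlled style: only func.func qualifies, which models the
# constant ending up at function scope.
print(find_unroll_scope(loop, lambda op: op.name == "func.func").name)  # func.func
```

With the default predicate the walk stops at `gpu.launch`, so the constant would stay inside the launch body and outlining would not need to pass it as a kernel argument. With the function-only predicate it stops at `func.func`, which matches where `%c0` ended up in the output quoted above.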

https://github.com/llvm/llvm-project/pull/123904