[Mlir-commits] [mlir] [mlir]introduce UnrollScopeInterface and apply it to funcOp and gpu.launch Op. (PR #123904)

Mon Feb 3 02:56:04 PST 2025

bondhugula wrote:

Can you add a commit summary for the interface being introduced (with a couple of lines on the rationale)? You have it in the comment at https://github.com/llvm/llvm-project/pull/123904#issuecomment-2607676126, but the commit summary is empty.

I ran the example from the first comment. Here is the output:
```
// bin/mlir-opt -affine-loop-unroll="unroll-full" -gpu-kernel-outlining gpu_launch.mlir
#map = affine_map<(d0) -> (d0 + 1)>
#map1 = affine_map<(d0) -> (d0 + 2)>
#map2 = affine_map<(d0) -> (d0 + 3)>
module attributes {gpu.container_module} {
  func.func @gpu_launch_unroll() {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    gpu.launch_func  @gpu_launch_unroll_kernel::@gpu_launch_unroll_kernel blocks in (%c1, %c1, %c1) threads in (%c1, %c1, %c1)  args(%c0 : index)
    return
  }
  gpu.module @gpu_launch_unroll_kernel {
    gpu.func @gpu_launch_unroll_kernel(%arg0: index) kernel attributes {known_block_size = array<i32: 1, 1, 1>, known_grid_size = array<i32: 1, 1, 1>} {
      %block_id_x = gpu.block_id  x
      %block_id_y = gpu.block_id  y
      %block_id_z = gpu.block_id  z
      %thread_id_x = gpu.thread_id  x
      %thread_id_y = gpu.thread_id  y
      %thread_id_z = gpu.thread_id  z
      %grid_dim_x = gpu.grid_dim  x
      %grid_dim_y = gpu.grid_dim  y
      %grid_dim_z = gpu.grid_dim  z
      %block_dim_x = gpu.block_dim  x
      %block_dim_y = gpu.block_dim  y
      %block_dim_z = gpu.block_dim  z
      %cst = arith.constant dense<0.000000e+00> : vector<2x4x2x2xf16>
      %0 = affine.for %arg1 = 0 to 2 iter_args(%arg2 = %cst) -> (vector<2x4x2x2xf16>) {
        %cst_0 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
        %1 = vector.insert %cst_0, %arg2 [%arg1, %arg0] : vector<2x2xf16> into vector<2x4x2x2xf16>
        %2 = affine.apply #map(%arg0)
        %cst_1 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
        %3 = vector.insert %cst_1, %arg2 [%arg1, %2] : vector<2x2xf16> into vector<2x4x2x2xf16>
        %4 = affine.apply #map1(%arg0)
        %cst_2 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
        %5 = vector.insert %cst_2, %arg2 [%arg1, %4] : vector<2x2xf16> into vector<2x4x2x2xf16>
        %6 = affine.apply #map2(%arg0)
        %cst_3 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
        %7 = vector.insert %cst_3, %arg2 [%arg1, %6] : vector<2x2xf16> into vector<2x4x2x2xf16>
        affine.yield %7 : vector<2x4x2x2xf16>
      }
      gpu.return
    }
  }
}
```
Which redundant values are you referring to?

The loop unroll pass (`-affine-loop-unroll`, including `unroll-full`) was actually meant to be a test pass; it doesn't have a concrete heuristic. (In fact, it was the first pass of MLIR!) One would expect the utilities it exposes (`loopUnrollByFactor`, `loopUnrollFull`, etc.) to be what other passes, including downstream ones, mainly use. Introducing an interface just to expose a choice of IR placement for this pass might be too much. Also, "statically" attaching such an interface to ops to control unrolling is too narrow. If such control is desired, one would instead expect an API argument to provide it.

On a separate, minor note, it makes sense to make the unroll pass a `FunctionOpInterface` pass.

https://github.com/llvm/llvm-project/pull/123904