[Mlir-commits] [mlir] [mlir]introduce UnrollScopeInterface and apply it to funcOp and gpu.launch Op. (PR #123904)

Fri Jan 24 12:45:51 PST 2025

krzysz00 wrote:

Ok, so, I cleaned up your example a bit to make it *almost* work: you had some typos and a GPU kernel with no side effects.

```mlir
// example
func.func @gpu_launch_unroll() {
  %buf = gpu.alloc() : memref<2x4x2x2xf16, #gpu.address_space<global>>
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) {
    %cst = arith.constant dense<0.000000e+00> : vector<2x4x2x2xf16>
    %0 = affine.for %arg12 = 0 to 2 iter_args(%arg13 = %cst) -> (vector<2x4x2x2xf16>) {
      %1 = affine.for %arg14 = 0 to 4 iter_args(%arg15 = %arg13) -> (vector<2x4x2x2xf16>) {
        %cst_0 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
        %2 = vector.insert %cst_0, %arg15 [%arg12, %arg14] : vector<2x2xf16> into vector<2x4x2x2xf16>
        affine.yield %2 : vector<2x4x2x2xf16>
      }
      affine.yield %1 : vector<2x4x2x2xf16>
    }
    // Writing the result to %buf gives the kernel an observable side effect.
    vector.transfer_write %0, %buf[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<2x4x2x2xf16>, memref<2x4x2x2xf16, #gpu.address_space<global>>
    gpu.terminator
  }
  gpu.dealloc %buf : memref<2x4x2x2xf16, #gpu.address_space<global>>
  return
}
```

which, when run through `mlir-opt -affine-loop-unroll="unroll-full" -affine-loop-unroll="unroll-full" -canonicalize -cse -gpu-launch-sink-index-computations -gpu-kernel-outlining`, gives you fully unrolled loops.

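(Each `-affine-loop-unroll` run only unrolls the innermost loops, so the first run unrolls the inner `affine.for` and the second unrolls the outer one, which is innermost by then. If you only want to unroll the inner loop, just drop the second `-affine-loop-unroll`.)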
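For concreteness, here is a hand-written sketch of what the loop nest amounts to once both loops are fully unrolled: the 2 x 4 = 8 iterations become straight-line `vector.insert` ops with constant positions, in the same order the loops would execute them. This is not actual `mlir-opt` output (`-canonicalize` will likely fold it further), and the function name `@unrolled_illustration` is hypothetical.

```mlir
// Hypothetical illustration, not pass output: the body of the two affine.for
// loops with %arg12 in [0, 2) and %arg14 in [0, 4) replaced by constants.
func.func @unrolled_illustration() -> vector<2x4x2x2xf16> {
  %cst = arith.constant dense<0.000000e+00> : vector<2x4x2x2xf16>
  %cst_0 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
  // Outer iteration %arg12 = 0.
  %v0 = vector.insert %cst_0, %cst [0, 0] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v1 = vector.insert %cst_0, %v0 [0, 1] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v2 = vector.insert %cst_0, %v1 [0, 2] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v3 = vector.insert %cst_0, %v2 [0, 3] : vector<2x2xf16> into vector<2x4x2x2xf16>
  // Outer iteration %arg12 = 1.
  %v4 = vector.insert %cst_0, %v3 [1, 0] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v5 = vector.insert %cst_0, %v4 [1, 1] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v6 = vector.insert %cst_0, %v5 [1, 2] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v7 = vector.insert %cst_0, %v6 [1, 3] : vector<2x2xf16> into vector<2x4x2x2xf16>
  return %v7 : vector<2x4x2x2xf16>
}
```

Keeping the inserts as a chain threaded through the accumulator mirrors how the `iter_args` carried the vector from one iteration to the next.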

https://github.com/llvm/llvm-project/pull/123904