[Mlir-commits] [mlir] [mlir]introduce UnrollScopeInterface and apply it to funcOp and gpu.launch Op. (PR #123904)

Fri Jan 24 12:59:32 PST 2025

linuxlonelyeagle wrote:

> Ok, so, I cleaned up your example a bit to make it work - you had some typos and a GPU kernel with no side effects.
> 
> ```mlir
> // example
> func.func @gpu_launch_unroll() {
>   %buf = gpu.alloc() : memref<2x4x2x2xf16, #gpu.address_space<global>>
>   %c0 = arith.constant 0 : index
>   %c1 = arith.constant 1 : index
>   gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) {
>     %cst = arith.constant dense<0.000000e+00> : vector<2x4x2x2xf16>
>     %0 = affine.for %arg12 = 0 to 2 iter_args(%arg13 = %cst) -> (vector<2x4x2x2xf16>) {
>       %1 = affine.for %arg14 = 0 to 4 iter_args(%arg15 = %arg13) -> (vector<2x4x2x2xf16>) {
>         %cst_0 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
>         %2 = vector.insert %cst_0, %arg15 [%arg12, %arg14] : vector<2x2xf16> into vector<2x4x2x2xf16>
>         affine.yield %2 : vector<2x4x2x2xf16>
>       }
>       affine.yield %1 : vector<2x4x2x2xf16>
>     }
>     vector.transfer_write %0, %buf[%c0, %c0, %c0, %c0] {inbounds = [true, true, true, true]} : vector<2x4x2x2xf16>, memref<2x4x2x2xf16, #gpu.address_space<global>>
>     gpu.terminator
>   }
>   gpu.dealloc %buf : memref<2x4x2x2xf16, #gpu.address_space<global>>
>   return
> }
> ```
> 
> which, when run through `mlir-opt -affine-loop-unroll="unroll-full" -affine-loop-unroll="unroll-full" -canonicalize -cse -gpu-launch-sink-index-computations -gpu-kernel-outlining` gives you fully-unrolled loops
> 
> (If you only want to unroll the inner loop, just get rid of the second `-affine-loop-unroll`)

There's something wrong with the IR you pasted; `mlir-opt` won't parse it. The test added in this PR is truncated from code I generated myself, which is why it has two levels of loops. Since unrolling both levels would make the test generate too much code, I only unrolled the innermost one.
Maybe I should rework the example in this PR so that it makes a little more sense.

https://github.com/llvm/llvm-project/pull/123904