[Mlir-commits] [mlir] [mlir]introduce UnrollScopeInterface and apply it to funcOp and gpu.launch Op. (PR #123904)
Krzysztof Drewniak
llvmlistbot at llvm.org
Fri Jan 24 12:45:51 PST 2025
krzysz00 wrote:
OK, so I cleaned up your example a bit to make it *almost* work: you had some typos and a GPU kernel with no side effects.
```mlir
// example
func.func @gpu_launch_unroll() {
  %buf = gpu.alloc() : memref<2x4x2x2xf16, #gpu.address_space<global>>
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1)
             threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) {
    %cst = arith.constant dense<0.000000e+00> : vector<2x4x2x2xf16>
    // Outer loop walks dim 0, inner loop walks dim 1; each iteration inserts one 2x2 tile.
    %0 = affine.for %arg12 = 0 to 2 iter_args(%arg13 = %cst) -> (vector<2x4x2x2xf16>) {
      %1 = affine.for %arg14 = 0 to 4 iter_args(%arg15 = %arg13) -> (vector<2x4x2x2xf16>) {
        %cst_0 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
        %2 = vector.insert %cst_0, %arg15 [%arg12, %arg14] : vector<2x2xf16> into vector<2x4x2x2xf16>
        affine.yield %2 : vector<2x4x2x2xf16>
      }
      affine.yield %1 : vector<2x4x2x2xf16>
    }
    // Writing the result to %buf gives the kernel an observable side effect.
    vector.transfer_write %0, %buf[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]}
      : vector<2x4x2x2xf16>, memref<2x4x2x2xf16, #gpu.address_space<global>>
    gpu.terminator
  }
  gpu.dealloc %buf : memref<2x4x2x2xf16, #gpu.address_space<global>>
  return
}
```
which, when run through `mlir-opt -affine-loop-unroll="unroll-full" -affine-loop-unroll="unroll-full" -canonicalize -cse -gpu-launch-sink-index-computations -gpu-kernel-outlining`, gives you fully unrolled loops.
(Each `-affine-loop-unroll` run only unrolls the innermost loops it finds, so if you only want to unroll the inner loop, just drop the second `-affine-loop-unroll`.)
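To make "fully unrolled" concrete, here is a hand-written sketch of the value the 2x4 nest computes once both loops are unrolled. Every `(%arg12, %arg14)` pair becomes a constant insert position, and the loop-carried `iter_args` turn into a plain chain of SSA values. This is a hypothetical illustration and not actual `mlir-opt` output: the function name `@unrolled_nest_sketch` is made up, and the real pipeline will likely fold these zero-into-zero inserts even further during `-canonicalize`.

```mlir
// Hypothetical hand-written sketch, not mlir-opt output: the 2x4 affine.for nest
// with both loops unrolled, so each (outer, inner) iteration is a constant position.
func.func @unrolled_nest_sketch() -> vector<2x4x2x2xf16> {
  %cst = arith.constant dense<0.000000e+00> : vector<2x4x2x2xf16>
  %cst_0 = arith.constant dense<0.000000e+00> : vector<2x2xf16>
  // Outer iteration 0, inner iterations 0..3.
  %v0 = vector.insert %cst_0, %cst [0, 0] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v1 = vector.insert %cst_0, %v0 [0, 1] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v2 = vector.insert %cst_0, %v1 [0, 2] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v3 = vector.insert %cst_0, %v2 [0, 3] : vector<2x2xf16> into vector<2x4x2x2xf16>
  // Outer iteration 1, inner iterations 0..3.
  %v4 = vector.insert %cst_0, %v3 [1, 0] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v5 = vector.insert %cst_0, %v4 [1, 1] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v6 = vector.insert %cst_0, %v5 [1, 2] : vector<2x2xf16> into vector<2x4x2x2xf16>
  %v7 = vector.insert %cst_0, %v6 [1, 3] : vector<2x2xf16> into vector<2x4x2x2xf16>
  return %v7 : vector<2x4x2x2xf16>
}
```

With only one `-affine-loop-unroll` run, you would instead expect the outer `affine.for` to survive, with the four inner inserts replicated inside its body.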
https://github.com/llvm/llvm-project/pull/123904