[Mlir-commits] [mlir] [OpenMP Dialect] Add omp.canonical_loop operation. (PR #65380)

Wed Sep 6 15:02:40 PDT 2023

Meinersbur wrote:

Unfortunately, the problem with nesting instead of using the %cli object, like this:
```
omp.unroll loop(1) {
  omp.tile loop (0) { tile_sizes=[4] } {
    omp.canonical_loop for (%c0) to (%c64) step (%c1) {
      ..
    }
  }
}
```
is that it does not work with the `apply` clause. For instance:
```
#pragma omp tile sizes(4) apply(intratile:unroll full)
for (int i = 0; i < n; ++i) {
   ...
}
```
where after tiling, the *inner* loop is unrolled. The MLIR representation would be like this:
```
%cli = omp.canonical_loop %iv = [0, %tc) {
      ..
}
%grid, %intratile = omp.tile sizes(4) (%cli)
omp.unroll_full (%intratile)
```
I don't see a way to do this with nesting. A more useful example than the above (which is just equivalent to partial unrolling by 4) would be a 2D tiling followed by `simd`-ization of one (or both) of the inner loops. Since they are constant-sized, they are an ideal target for vectorization.
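To make the "equivalent to partial unrolling by 4" remark concrete, here is a hand-written C sketch of what tiling by 4 with a fully unrolled intratile loop could lower to. The function names `sum_plain` and `sum_tiled_unrolled` and the summation body are hypothetical, and handling the partial tile with a separate remainder loop is an assumption of this sketch, not something a particular compiler is claimed to do.

```c
#include <assert.h>

/* Reference: the original canonical loop (hypothetical body: a sum). */
long sum_plain(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Sketch of "tile sizes(4) apply(intratile: unroll full)" written by hand.
 * The grid loop steps over complete tiles of 4 iterations; the intratile
 * loop is replaced by its 4 unrolled bodies. */
long sum_tiled_unrolled(const int *a, int n) {
    assert(n >= 0);
    long s = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {   /* grid loop over complete tiles */
        s += a[i];                 /* intratile iteration 0 */
        s += a[i + 1];             /* intratile iteration 1 */
        s += a[i + 2];             /* intratile iteration 2 */
        s += a[i + 3];             /* intratile iteration 3 */
    }
    for (; i < n; ++i)             /* assumed remainder loop for a partial tile */
        s += a[i];
    return s;
}
```

The result has the same shape as the original loop partially unrolled by a factor of 4, which is why this particular combination adds little on its own.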

In the OpenMP spec, my original proposal was to allow the user to give those generated loops names (an idea stolen from the xlc compiler) to avoid extensive nesting with chains of transformations. E.g.:
```
#pragma omp unroll on(mytiledloop)
#pragma omp tile on(myoriginalloop) sizes(4) generates(intratile:mytiledloop)

#pragma omp loopid(myoriginalloop)
for (int i = 0; i < n; ++i) {
   ...
}
```
After big discussions on what the namespace of the loopids would be, we settled on the `apply` clause. They are equally powerful, but I think the loopids would be easier to use, e.g. to apply different transformations for each target architecture.

https://github.com/llvm/llvm-project/pull/65380