[Mlir-commits] [mlir] [MLIR][OpenMP] Add omp.simd operation (PR #79843)

Fri Feb 9 05:12:41 PST 2024

kiranchandramohan wrote:

> > While I don't currently get the full picture, the dependencies you mention between distribute and parallel/wsloop seem to stem from the fact that `distribute parallel do` is a composite construct. So maybe the solution is not to share lowering state between these but rather to recognize this is another construct separate from `parallel do` and to handle it independently, possibly sharing a good amount of code with that other combined construct.
> 
> Below is one example that hopefully illustrates the various issues.
> 
> ```
> #pragma omp target teams distribute parallel for
> #pragma omp tile sizes(2)
> for (int i = 0; i < N; ++i) {
>   code(i);
> }
> ```
> 
> is equivalent to:
> 
> ```
> #pragma omp target teams distribute parallel for
> for (int ii = 0; ii < N; ii+=2) {
>   for (int i = ii; i < ii + 2 && i < N; ++i) {
>     code(i);
>   }
> }
> ```
> 
> If the lowering of the `#pragma omp tile sizes(2)` happens in the OMPIRBuilder, that means we would have to determine at lowering time that the resulting outer loop should be distribute parallel for, while the inner loop is sequential. If we are also bound by the current lowering mechanism, meaning a single pass over the MLIR module, then I think there is no way to create combined constructs ahead of time: the ops have to be individual wrapper ops, and there has to be some information sharing between them during lowering. Also consider the case where collapse(2) is present, which would affect the codegen for the inner loop differently.

We already have some lowerings (OpenMP + LLVM dialect -> LLVM IR) where a loop transformation is not applied directly to the immediately enclosed loop. The best example is collapse followed by worksharing (see links below), and there will be a similar case for simd.
https://github.com/llvm/llvm-project/blob/b5a273a1cfe6f509f8d2541e04d9186438f33348/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp#L949
https://github.com/llvm/llvm-project/blob/b5a273a1cfe6f509f8d2541e04d9186438f33348/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp#L960

Basically, there will be some canonical loop infos that are processed by various loop transformations (collapse, tile, unroll, etc.). The CanonicalLoopInfo-based transformations in the OpenMPIRBuilder are already set up for this purpose, and I believe they already support tile, unroll, and collapse. So for these transformations, going ahead with the OpenMPIRBuilder approach might be the easiest option for you. I would anticipate that the codegen for a wrapper operation (OpenMP + LLVM dialect -> LLVM IR) will generate canonical loop infos and push them onto a stack if it has a parent operation that will further modify them.

https://github.com/llvm/llvm-project/pull/79843