[Mlir-commits] [mlir] [OpenMP][MLIR] Add omp.distribute op to the OMP dialect (PR #67720)

Mon Oct 2 15:14:58 PDT 2023

jsjodin wrote:

> > Since the canonical loop proposal might take some time before it's finished, in my opinion a good approach could be to mimic `omp.wsloop` and produce new loop index variables as entry block arguments to this op's region. I guess in that case we'd have to think about what the `omp.distribute` loop bounds and steps would be for a given nested Do loop, and how they might interoperate with the corresponding `omp.wsloop` operation.
> 
> > Are you suggesting a different approach here? To have `omp.distribute` the same as `omp.wsloop`, i.e. a loop-like operation? Currently `omp.wsloop` subsumes the nested `Do loop`.

I am proposing a different approach. As I see it, it makes sense for `omp.distribute` to be a wrapper op, because it does not affect the code of the loop, only the "schedule" of the loop: either every team runs all iterations, or each team runs a portion of the iterations.

> 
> In the lowering in #67798, an `omp.wsloop` is created for `!$omp distribute`. Is this always correct to do as per the standard? Is it based on existing Clang lowering?

I think an `omp.wsloop` is created from `parallel do`. AFAIK, 'distribute' is associated with teams, so that each team takes a chunk of the iterations instead of every team executing (duplicating) all iterations.
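
To make the difference concrete, here is a minimal sketch in plain Python (not OpenMP or MLIR) of the two behaviours. The function names `replicate` and `distribute_static` are hypothetical, and the even contiguous split is only an assumption for illustration; the actual assignment of iterations to teams depends on the implementation and on any `dist_schedule` clause.

```python
def replicate(num_iters, num_teams):
    """Without 'distribute': every team executes the full iteration space."""
    return [range(num_iters) for _ in range(num_teams)]


def distribute_static(num_iters, num_teams):
    """With 'distribute' (assumed even static split): one contiguous chunk per team."""
    base, extra = divmod(num_iters, num_teams)
    chunks = []
    start = 0
    for team in range(num_teams):
        # The first 'extra' teams take one additional iteration each.
        size = base + (1 if team < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    for team, chunk in enumerate(distribute_static(10, 4)):
        print(f"team {team}: iterations {list(chunk)}")
```

In this model the loop body is identical in both cases; only the set of iterations each team sees changes, which is the sense in which the op acts as a wrapper around the loop rather than as a loop itself.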

> 
> > > Could you share your plan before we proceed? The proposal of this patch of a distribute operation that (possibly) nests a loop seems to be different from both the existing worksharing-loop design (that includes the loop) and the canonical-loop proposal under design where the distribute will accept a CLI value that represents a loop.
> > 
> > 
> > The plan is to have some representation of 'distribute' that does not rely on meta-values (CLI) for now, since there are still a lot of unanswered questions. The version of the omp.distribute op in this patch basically works as a wrapper op that modifies how the contained loop(s) execute. It does rely on there being a contained loop (or several, if we want to consider collapse).
> 
> Will the contained loop be an `omp.wsloop` always?

No, it could be some other kind of loop.

> 
> Regarding the new proposal, we might still have to propagate CLIs if at least one of the CLIs is generated, as in the following example, where the inner loop is partially unrolled. That unrolling generates a new CLI that has to be propagated up somehow, right?
> 
> ```
> !$omp tile (4,5)
> do i=1,n
>   !$omp unroll partial
>   do j=1,m
>   end do
> end do
> ```

Sort of. It would not be propagated up in the regular code, just encoded by the meta-ops. In the example above we would get something like:
```
%cli1 = omp.cli()
%cli2 = omp.cli()
canonical_loop(%i, 1, %n, %cli1) {
   canonical_loop(%j, 1, %m, %cli2) {
   }
}
%cli3 = omp.cli_nest(%cli1, %cli2)
%cli4 = omp.unroll_partial(%cli3#1)
%cli5 = omp.tile(%cli4#0)
```
Alternatively, if the order of the loop transforms doesn't matter (no dependence between them), there is no real need to encode the nesting, so I think it could be simplified to:
```
%cli1 = omp.cli()
%cli2 = omp.cli()
canonical_loop(%i, 1, %n, %cli1) {
   canonical_loop(%j, 1, %m, %cli2) {
   }
}
%cli3 = omp.unroll_partial(%cli2)
%cli4 = omp.tile(%cli1)
```
The CLIs get associated with a specific canonical loop, or they are created by a top-level loop transformation op.

https://github.com/llvm/llvm-project/pull/67720