[Mlir-commits] [mlir] [OpenMP][MLIR] Add omp.distribute op to the OMP dialect (PR #67720)

Tue Oct 3 07:04:13 PDT 2023

jsjodin wrote:

> > > In the lowering in #67798, an `omp.wsloop` is created for `!$omp distribute`. Is this always correct to do as per the standard? Is it based on existing Clang lowering?
> > 
> > 
> > I think an omp.wsloop is created from parallel do. AFAIK 'distribute' is associated with teams, so that each team takes a chunk of the iterations instead of all teams taking all (duplicating) iterations.
>
> The example lowering created an `omp.wsloop` operation for an `omp.distribute` even though there was no `parallel do`. Hence the question. https://github.com/llvm/llvm-project/pull/67798/files#diff-0652f88238afa05fb262dcebab875780ab553b3914ba7239512c45986198240d
> 
Not sure if that is correct or not. 
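To make the quoted point about teams concrete, here is a rough Python sketch of the difference between every team running all iterations and each team taking a chunk. The even, contiguous split and the helper names `static_chunks` and `run_teams` are assumptions made for illustration only; the actual chunking is up to the implementation and any `dist_schedule` clause.

```python
def static_chunks(num_iterations, num_teams):
    """Split range(num_iterations) into contiguous, near-equal chunks, one per team.

    Assumption: an even static split; a real runtime may choose differently.
    """
    base, extra = divmod(num_iterations, num_teams)
    chunks = []
    start = 0
    for team in range(num_teams):
        # The first `extra` teams take one additional iteration.
        size = base + (1 if team < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def run_teams(num_iterations, num_teams, distribute):
    """Return, per team, the list of iterations that team executes."""
    if distribute:
        return [list(chunk) for chunk in static_chunks(num_iterations, num_teams)]
    # Without distribute, every team redundantly executes the whole loop.
    return [list(range(num_iterations)) for _ in range(num_teams)]


duplicated = run_teams(8, 4, distribute=False)
distributed = run_teams(8, 4, distribute=True)
```

In this toy model `duplicated` holds four copies of all eight iterations, while `distributed` holds four disjoint chunks that together cover each iteration exactly once.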

> > > > > Could you share your plan before we proceed? The proposal of this patch of a distribute operation that (possibly) nests a loop seems to be different from both the existing worksharing-loop design (that includes the loop) and the canonical-loop proposal under design where the distribute will accept a CLI value that represents a loop.
> > > > 
> > > > 
> > > > The plan is to have some representation of 'distribute' that does not rely on meta-values (CLI) for now, since there are still a lot of unanswered questions. The version of the omp.distribute op in this patch basically works as a wrapper op that modifies how the contained loop(s) execute. It does rely on there being a contained loop (or several, if we want to consider collapse).
> > > 
> > > 
> > > Will the contained loop be an `omp.wsloop` always?
> > 
> > 
> > No, it could be some other kind of loop.
> 
> `fir.do_loop`, `scf.for` are converted to control-flow by the time they are in the OpenMP + LLVM dialect stage. So it has to be something encoded with OpenMP, like `omp.canonical_loop`.
> 
For now we would only support omp.wsloop, until we decide what to do with omp.canonical_loop and CLIs, provided we agree that this is an okay solution for distribute.
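As a sketch of what "wrapper op that relies on a contained loop" could mean in practice, the following Python toy model checks that an `omp.distribute` op contains at least one supported loop op. The `Op` class, `SUPPORTED_LOOP_OPS`, `nested_loops`, and `verify_distribute` are all hypothetical names for this illustration; none of them is the MLIR API or the verifier in the patch.

```python
from dataclasses import dataclass, field

# Assumption: only omp.wsloop is accepted for now, as stated in the reply above.
SUPPORTED_LOOP_OPS = {"omp.wsloop"}


@dataclass
class Op:
    """Toy stand-in for an operation: a name plus the ops nested in its body."""
    name: str
    body: list = field(default_factory=list)


def nested_loops(op):
    """Collect the outermost supported loop ops nested anywhere inside `op`."""
    found = []
    for child in op.body:
        if child.name in SUPPORTED_LOOP_OPS:
            found.append(child)  # stop descending once a loop is found
        else:
            found.extend(nested_loops(child))
    return found


def verify_distribute(op):
    """Return True if `op` is a distribute wrapper with at least one contained loop."""
    if op.name != "omp.distribute":
        raise ValueError(f"expected omp.distribute, got {op.name}")
    return len(nested_loops(op)) > 0


ok = Op("omp.distribute", [Op("omp.wsloop")])
bad = Op("omp.distribute", [Op("llvm.br")])
```

Supporting `omp.canonical_loop` later would, in this model, only mean adding it to the accepted set.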

> > The CLIs get associated with a specific canonical loop, or they are created by a top-level loop transformation op.
> 
> In the other approach, the canonical loop was always created by an omp.canonical_loop declaration or by a loop transformation op. This way it was always easy to find the canonical loop given a CLI. Now we will have to reach the `omp.cli` operation and look at its use.

I don't think that there is much of a difference other than at the top level, since in the original proposal it would still be necessary to look at uses to find loops through the omp.yield ops. Understanding the structure of the code would also require more analysis, but that structure could be encoded in the ops using this approach if we see the need for it.

> 
> Associating CLIs at the top-level will not always be possible, particularly if there are other operations, like a parallel operation.

I'm not sure what effect other operations would have. I understand that they might affect how the loop transformation is done, but not how they would affect the location of the transformation op.

> For the following example,
> 
> ```
> !$omp unroll
> do i=1,l
>   !$omp parallel
>   do j=1,m
>     !$omp unroll
>     do k=1,n
>     end do
>   end do
> end do
> ```
> 
> Should generate something like the following.
> 
> ```
> %cli1 = omp.cli()
> canonical_loop(%i, 1, %l, %cli1) {
>    omp.parallel {
>      fir.do_loop (%j,1,%m) {
>        %cli2 = omp.cli()
>        canonical_loop(%j, 1, %m, %cli2) {
>        }
>        %cli4 = omp.unroll(%cli2)
>      }
>    }
> }
> ```
> 
> %cli4 = omp.unroll(%cli1)

The code below should be equivalent (assuming the duplicated %cli4 was a typo). It does not really matter where the omp.unroll ops occur in the code, except that the uses of CLI values must be dominated by the definitions. A loop transformation happens at the location of the loop associated with a CLI, not at the location of the loop transformation op. If the order of the transforms is important, it would have to be encoded using CLI dependencies.
```
%cli1 = omp.cli()
%cli2 = omp.cli()
canonical_loop(%i, 1, %l, %cli1) {
   omp.parallel {
     fir.do_loop (%j,1,%m) {
       canonical_loop(%j, 1, %m, %cli2) {
       }
     }
   }
}
%cli3 = omp.unroll(%cli2)
%cli4 = omp.unroll(%cli1)
```
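A small Python toy model may help show why the position of the omp.unroll ops should not matter. It is only a sketch: the `Module` class and its methods are hypothetical and just record which CLI is attached to which loop and which transformation names which CLI; dominance checking and CLI-to-CLI dependencies are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy model: records CLI-to-loop associations and transformation ops."""
    cli_to_loop: dict = field(default_factory=dict)  # CLI name -> loop name
    transforms: list = field(default_factory=list)   # (kind, operand CLI), textual order

    def canonical_loop(self, loop, cli):
        self.cli_to_loop[cli] = loop

    def unroll(self, operand_cli):
        self.transforms.append(("unroll", operand_cli))

    def applied(self):
        """Map each loop to the transformations that target it via its CLI."""
        result = {}
        for kind, operand in self.transforms:
            loop = self.cli_to_loop[operand]
            result.setdefault(loop, []).append(kind)
        return result


# Variant 1: the inner unroll is written next to the inner loop.
nested = Module()
nested.canonical_loop("outer", "%cli1")
nested.canonical_loop("inner", "%cli2")
nested.unroll("%cli2")  # textually inside the outer loop body
nested.unroll("%cli1")  # textually after the outer loop

# Variant 2: both unroll ops are written at the top level, after the loops.
top_level = Module()
top_level.canonical_loop("outer", "%cli1")
top_level.canonical_loop("inner", "%cli2")
top_level.unroll("%cli1")
top_level.unroll("%cli2")

same = nested.applied() == top_level.applied()
```

Both variants resolve to the same loop-to-transformation mapping, because the target loop is found through the CLI rather than through the textual position of the transformation op.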

https://github.com/llvm/llvm-project/pull/67720