[Mlir-commits] [mlir] [OpenMP][MLIR] Add omp.distribute op to the OMP dialect (PR #67720)

Thu Nov 16 17:30:40 PST 2023

jsjodin wrote:

> > I was originally thinking about other regular operations; however, considering that there might be transformation ops, there could be ops other than just an omp.canonical_loop inside an omp.distribute region.
> > ```
> > omp.distribute {
> >    %cli = omp.cli
> >    canonical_loop ... %cli
> >    omp.tile(%cli, ... )
> > }
> > ```
> >
> > This would also be possible with the original proposal for omp.canonical_loop.
> 
> Yes, but it will still apply to the first canonical loop or generated loop.
> 
> The possible issues that I can think of if we lose the association with the loop are:
> - Handling of lastprivate. It would be difficult to do this in a delayed fashion (at OpenMPTranslation time).
> - Handling of collapse. How do you propose to handle this?
> 
The loops that distribute refers to don't exist until after the loop transformations have been performed. If the loop transformations happen at OpenMPTranslation time and that is an issue for handling lastprivate, it will be a problem no matter which solution we pick, unless the loop transformations happen earlier in the compiler.

I think perhaps a single version of distribute is not enough to represent what we need. The distribute construct may have to be split into a CLI-based op or a wrapper op, plus a lower-level representation as attributes on the loop ops. I believe it should always be possible at the loop level to determine whether a given iteration should be executed on a specific thread, so adding an attribute (plus the clauses that were part of the distribute) should provide enough information.
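To make the claim about loop-level decisions a bit more concrete, here is a rough C sketch of the kind of per-iteration ownership test that a distribute attribute on the loop op would let codegen compute locally. It assumes a simple static, unchunked schedule both for distribute (across teams) and for the worksharing loop (across threads within a team); the partitioning used by the actual OpenMP runtime may differ, and `static_block`, `owns_iteration`, and `owner_count` are hypothetical helpers, not LLVM, MLIR, or libomp APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: split [0, n) into `parts` contiguous blocks of
   ceil(n / parts) iterations and return block `p` as [*lo, *hi).
   Trailing blocks may be empty. */
static void static_block(long n, long parts, long p, long *lo, long *hi) {
  assert(n >= 0 && parts > 0 && p >= 0 && p < parts);
  long chunk = (n + parts - 1) / parts;
  long start = p * chunk;
  long end = start + chunk;
  *lo = start < n ? start : n;
  *hi = end < n ? end : n;
}

/* Hypothetical ownership test: does iteration `i` of an n-iteration loop run
   on thread `thread` of team `team`? With `distribute`, each team first gets
   its own block of the iteration space; without it, every team sees all of
   [0, n). In both cases the threads of a team split the team's range. */
static bool owns_iteration(long n, long num_teams, long team,
                           long num_threads, long thread, long i,
                           bool distribute) {
  long lo = 0, hi = n;
  if (distribute)
    static_block(n, num_teams, team, &lo, &hi);
  long tlo, thi;
  static_block(hi - lo, num_threads, thread, &tlo, &thi);
  return i >= lo + tlo && i < lo + thi;
}

/* Count how many (team, thread) pairs execute iteration `i`. */
static int owner_count(long n, long num_teams, long num_threads, long i,
                       bool distribute) {
  int count = 0;
  for (long t = 0; t < num_teams; ++t)
    for (long th = 0; th < num_threads; ++th)
      if (owns_iteration(n, num_teams, t, num_threads, th, i, distribute))
        ++count;
  return count;
}
```

With `distribute` set, each iteration is owned by exactly one (team, thread) pair; without it, every team executes the whole iteration space. The only extra input the loop needs is that one flag (plus the distribute clauses), which is why the loop codegen has to know whether distribute is present.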

The problem I see is that it will not be possible (or will be very convoluted) to directly generate code with omp.distribute higher up in the chain of CLIs, since at the time of generating code for the loops (e.g. omp.wsloop) we need to know whether distribute is present. As you mention, the association is broken at that point. Unless that information is propagated down, or found by scanning upward in the chain of CLIs (or the region hierarchy, in the case of this PR), neither solution can generate code without non-local information. I think the only alternative would be to do the loop transformations in MLIR and convert the omp.distribute op into attributes on the loop operations.

> Also, is it guaranteed that the OpenMP runtime call generation for `distribute` does not need to know the bounds and loop control of the loop?

For target device codegen, distribute is generated as part of the loop, so it is just a different runtime function call from the non-distribute version. There is no way to generate code for 'distribute' in isolation.

Would it make sense to implement a more minimal solution for now, handling simple distribute uses by adding an attribute to omp.wsloop and having the frontend handle distribute?

https://github.com/llvm/llvm-project/pull/67720