[llvm-branch-commits] [flang] [flang] Introduce custom loop nest generation for loops in workshare construct (PR #101445)
Sergio Afonso via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Fri Aug 23 04:41:12 PDT 2024
skatrak wrote:
> > Maybe support for this operation could just be based on changes to how the MLIR representation is built in the first place. What do you think?
>
> This is partly what this implementation aims to do. In fact, after the pass that lowers the omp.workshare operation, we are left with IR very close to the one you showed in your example (please take a look at some of the tests in #101446).
>
> The approach taken here is similar to the omp.workdistribute implementation, in that the purpose of the omp.workshare and omp.workshare.loop_wrapper ops is to preserve the high-level optimizations available when using HLFIR; after we are done with the LowerWorkshare pass, both omp.workshare and omp.workshare.loop_wrapper disappear.
>
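Just to make sure I follow the intended flow, this is roughly how I picture it. This is only a sketch of my understanding, not IR taken from the #101446 tests, and the exact syntax of the new ops may differ (%c1 and %n stand for index values defined elsewhere):

```mlir
// Before the LowerWorkshare pass (HLFIR level): the construct and the loops
// to be workshared are only marked, so HLFIR optimizations still see the
// high-level IR.
omp.parallel {
  omp.workshare {
    omp.workshare.loop_wrapper {
      omp.loop_nest (%i) : index = (%c1) to (%n) step (%c1) {
        // work coming from an array operation
        omp.yield
      }
      omp.terminator
    }
    omp.terminator
  }
  omp.terminator
}

// After LowerWorkshare (run once HLFIR optimizations and HLFIR->FIR lowering
// are done): the marker ops are gone and only "lower-level" OpenMP ops
// remain; I imagine something along these lines (synchronization details
// such as barriers/nowait omitted):
omp.parallel {
  omp.single {
    // code that must execute as if by a single thread
    omp.terminator
  }
  omp.wsloop {
    omp.loop_nest (%i) : index = (%c1) to (%n) step (%c1) {
      omp.yield
    }
    omp.terminator
  }
  omp.terminator
}
```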
I see that this approach significantly reduces the amount of OpenMP-specific handling that needs to be done when creating Fortran loops, so I don't have an issue with adding the `omp.workshare` and `omp.workdistribute` ops and creating a late-running transformation pass, rather than directly generating the "lower-level" set of OpenMP operations that represent the semantics of these constructs.
> The sole purpose of the omp.workshare.loop_wrapper op is to be able to more explicitly mark loops that need to be "parallelized" by the workshare construct and to preserve that information through the pipeline. Its lifetime is from the frontend (Fortran->{HLFIR,FIR}) up to the LowerWorkshare pass, which runs after we are done with HLFIR optimizations (after HLFIR->FIR lowering); the same goes for omp.workshare.
>
> The problem with trying to convert fir.do_loop's to wsloop is that it is harder to keep track of where they came from: did a given loop come from an array intrinsic that needs to be parallelized, or is it just a do loop that the programmer wrote inside the workshare construct, which must not be parallelized?
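If I understand the intent, inside the construct that distinction would look roughly like this. Again, this is my own sketch rather than IR from the PR, and the wrapper op's exact syntax may differ (%c1 and %n are assumed index values defined earlier):

```mlir
omp.parallel {
  omp.workshare {
    // Loop generated from an array intrinsic/assignment: marked for
    // worksharing with the new wrapper op, so LowerWorkshare knows to
    // parallelize it.
    omp.workshare.loop_wrapper {
      omp.loop_nest (%i) : index = (%c1) to (%n) step (%c1) {
        // elemental work for iteration %i
        omp.yield
      }
      omp.terminator
    }
    // DO loop written by the user inside !$omp workshare: it must execute
    // as if by a single thread, so it stays a plain fir.do_loop.
    fir.do_loop %j = %c1 to %n step %c1 {
      // sequential body
    }
    omp.terminator
  }
  omp.terminator
}
```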
I guess what I still don't understand is the need for the `omp.work{share, distribute}.loop_wrapper` operations. To tell apart a sequential loop from a parallel loop inside of a workshare or workdistribute construct, we already have `fir.do_loop` and `omp.wsloop` + `omp.loop_nest`. If I'm not wrong, this PR prepares the `genLoopNest` function to later specify whether to produce a parallel or a sequential loop inside of a workshare construct, so that you can create either `omp.workshare.loop_wrapper` or `fir.do_loop`. What I'm saying is that we could just use `omp.wsloop` in place of the former, because it already has the meaning of "share iterations of the following loop across threads in the current team of threads". And that team of threads is defined by the parent `omp.parallel`, rather than by `omp.workshare`. I can't think of a semantic difference between encountering `omp.wsloop` directly nested inside `omp.parallel` and encountering it nested inside an `omp.workshare` that is itself nested inside `omp.parallel`. What changes is everything else, which in the second case is functionally equivalent to being inside an `omp.single`. Does that make sense, or am I still missing something?
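In other words, I would expect these two forms to mean the same thing as far as the loop itself is concerned. This is again just a sketch, ignoring clauses and exact syntax, with %c1 and %n standing for index values defined elsewhere:

```mlir
// omp.wsloop directly nested in omp.parallel: iterations of the loop are
// shared across the threads of the enclosing team.
omp.parallel {
  omp.wsloop {
    omp.loop_nest (%i) : index = (%c1) to (%n) step (%c1) {
      omp.yield
    }
    omp.terminator
  }
  omp.terminator
}

// omp.wsloop nested in omp.workshare inside omp.parallel: the sharing of
// iterations across the same team looks identical to me; the surrounding
// non-loop code is what behaves like omp.single.
omp.parallel {
  omp.workshare {
    omp.wsloop {
      omp.loop_nest (%i) : index = (%c1) to (%n) step (%c1) {
        omp.yield
      }
      omp.terminator
    }
    omp.terminator
  }
  omp.terminator
}
```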
https://github.com/llvm/llvm-project/pull/101445