[PATCH] D86071: [MLIR][OpenMP] Add omp.do operation

Mon Sep 14 21:18:38 PDT 2020

jdoerfert added a comment.

I forgot to submit the first part of this response a few days ago, apologies.

In D86071#2268569 <https://reviews.llvm.org/D86071#2268569>, @kiranchandramohan wrote:

> OK. I had almost typed in discourse. I primarily wanted to ask the following question before posing a general question in llvm-dev about the dialect or the OpenMP IRBuilder.
>
> Consider that we want to parallelize an SCF loop with OpenMP. We can add the omp.do operation around the loop. This would look like.
>
>   func @some_op_inside_loop(%arg0: index, %arg1: index, %arg2: index) {
>   omp.do {
>     scf.for %i = %arg0 to %arg1 step %arg2 {
>       "some.op"(%i) : (index) -> ()
>     }
>   }
>     return
>   }

This is not parallelization but worksharing, but I think I get what you're saying.

> One way to pass this to the OpenMP IR Builder would be as follows. The loop exists as control flow. (Question: Can we have index, start, end and step variables here as operands or attributes?)
>
>     llvm.func @some_op_inside_loop(%arg0: !llvm.i64, %arg1: !llvm.i64, %arg2: !llvm.i64) {
>   omp.do {
>       llvm.br ^bb1(%arg0 : !llvm.i64)
>     ^bb1(%0: !llvm.i64):  // 2 preds: ^bb0, ^bb2
>       %1 = llvm.icmp "slt" %0, %arg1 : !llvm.i64
>       llvm.cond_br %1, ^bb2, ^bb3
>     ^bb2:  // pred: ^bb1
>       "some.op"(%0) : (!llvm.i64) -> ()
>       %2 = llvm.add %0, %arg2 : !llvm.i64
>       llvm.br ^bb1(%2 : !llvm.i64)
>     ^bb3:  // pred: ^bb1
>   }
>       llvm.return
>     }
>
> Another way would be as follows where the loop exists without the control flow of the loop.
>
>   llvm.func @some_op_inside_loop(%arg0: !llvm.i64, %arg1: !llvm.i64, %arg2: !llvm.i64) {
>   omp.do index(%i: !llvm.i64) start(%arg0: !llvm.i64) stop(%arg1: !llvm.i64) step(%arg2: !llvm.i64) {
>       "some.op"(%i) : (!llvm.i64) -> ()
>   }
>     return
>   }
>
> The question I have for @jdoerfert is which one of these will be preferable for the OpenMP IRBuilder? And the question I have for @ftynse is what is the issue that is there in these schemes?

First, the loop belongs to the `omp.do`. If you want to lower an `omp.do` (w/ or w/o the OpenMPIRBuilder) you need the loop information (=bounds + step). The loop body is irrelevant at this point.
The interface will somewhat look like this:

  InsertPos CreateWorksharingLoop(..., LowerBound, UpperBound, Step, ..., Body)

In D86071#2270542 <https://reviews.llvm.org/D86071#2270542>, @bondhugula wrote:

> In D86071#2265876 <https://reviews.llvm.org/D86071#2265876>, @ftynse wrote:
>
>>> I guess for the general workshare loop design issues we can have an RFC in discourse. But this patch can go ahead.
>>
>> I haven't seen answers to the questions about lowering to LLVM IR + OpenMP runtime, and it sounds suboptimal to push the patch before discussing and agreeing on the actual design.
>>
>> In particular, it is not clear to me how this construct will connect to loops and what the lowering flow is. Does it expect an `scf.for`/`scf.parallel` as the only nested op? Is there a plan for a separate `omp.for`? How long do loops persist when we go to LLVM, given that OpenMPIRBuilder does not handle loop constructs and we really want to avoid converting loops to CFG during MLIR->LLVM IR translation.
>
> +1 on having clarity on the lowering part to LLVM dialect.  I think the op itself doesn't place any restrictions on what's inside. There could be `scf.for` ops nested inside or surrounding an `omp.do` given that they both use the same type subsystem (standard types). 
> One would expect an `scf.parallel` to be converted to an omp.do?

You can lower `scf.parallel` into `omp parallel for/do` *if* you prove the body does not contain (certain) OpenMP runtime calls and directives. So without analysis you cannot.

> The remaining scf.for's would be lowered to LLVM dialect the usual way. It looks like we need some sort of a pre-pass on the LLVM dialect to lower/handle the `omp.do` and some of the implementation therein would have to duplicate/reuse the logic of `scf.for` lowering.

I don't see why this would be the case. As mentioned above, the loop, whatever "op" it might be, belongs to the `omp do`. There is no `omp do` without loop, there is no "loop" once `omp do` has been lowered (to runtime calls). The `omp do` lowering is also not duplicating `scf.for` code if you use the OpenMPIRBuilder.

> @DavidTruby @kiranchandramohan - do you have an example sketch of how the LLVM dialect would like for a simple / minimal omp.do loop right before it's translated out to LLVM IR proper? Would it require duplicating LLVM's OpenMPIRBuilder functionality here on the LLVM dialect?

Please, do not duplicate OpenMPIRBuilder functionality.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86071/new/

https://reviews.llvm.org/D86071