[Mlir-commits] [mlir] [MLIR][OpenMP] Add omp.simd operation (PR #79843)

Sergio Afonso llvmlistbot at llvm.org
Wed Feb 7 08:39:04 PST 2024


skatrak wrote:

Thanks, Jan, for adding to the discussion.

> Using wrapper ops seems to be a better option imo since it should be easier to extend and we avoid the combinatorial explosion.

Here I think the main factor would be how many new composite constructs might be added, because combined constructs are already represented as wrapper ops and should remain that way, and they make up most of the table above. If there's a chance that many new composite constructs are added (or even one or two new single constructs that can form composites with all the existing ones), then the option of having composite ops really makes no sense. As it stands right now, both alternatives still make sense.

> I have one question about whether there are cases where writing something on two separate omp lines vs a single line would change the semantics, e.g. in one case it is a combined construct but in the other it isn't?

Yes, a combined construct can be split into two constructs where the second is nested directly inside the first. The first is, as far as I can tell, always a leaf construct (i.e. not combined or composite). So, for example, `parallel do` is a combined construct that means the same as a `parallel` construct with a single `do` nested as its only child. Composite constructs are the ones where this is not the case, like `distribute parallel do`, which has its own semantics (iterations of the associated collapsed loop nest are run in parallel by threads of multiple teams). In that case, it wouldn't even be syntactically correct to create a `distribute` construct with a nested `parallel do` construct, because `distribute` must be directly associated with the loop nest.
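To put that distinction in source terms, here is a minimal Fortran sketch (the subroutine, array and loop body are placeholders of my own, not taken from the PR): the combined `parallel do` can always be split into a `parallel` construct whose only content is a `do` construct, while the composite `distribute parallel do` has no such legal splitting.

```fortran
subroutine combined_vs_composite(a, n)
  integer :: n, i
  real :: a(n)

  ! Combined: "parallel do" means the same as a parallel region whose only
  ! content is a "do" worksharing-loop construct...
  !$omp parallel do
  do i = 1, n
    a(i) = a(i) + 1.0
  end do

  ! ...so it can be split into the two leaf constructs:
  !$omp parallel
  !$omp do
  do i = 1, n
    a(i) = a(i) + 1.0
  end do
  !$omp end parallel

  ! Composite: "distribute parallel do" has its own semantics. It cannot be
  ! split into a "distribute" construct containing a nested "parallel do",
  ! because "distribute" must be followed directly by its associated loop nest.
  !$omp teams
  !$omp distribute parallel do
  do i = 1, n
    a(i) = a(i) + 1.0
  end do
  !$omp end teams
end subroutine combined_vs_composite
```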

> I believe another issue with not using wrapper ops is that it may not be possible to create the combined ops directly if there are loop transformation ops present, which means the wrapper ops will have to be used anyway. To clarify, if the loop transformations are handled by the OpenMPIRBuilder, then there is no point in the compilation where combined ops could be created. I'm not convinced this is the right approach, though; it would make sense to me to do the loop transformations first and then do the lowering.

I'm not as familiar with the discussions on representing loop transformations in the OpenMP dialect, but I think that, whatever it ends up looking like, it should in some way be independent from the parallelism-generating and work-distributing operations. By that I mean that it shouldn't matter whether a loop has been transformed or not by the time we want to run it. We should try to split the what from the how (what loop to run vs how to execute the loop).

> There is another problem which @DominikAdamski was mentioning and that I've been trying to get some clarity on, which is how to communicate information between op lowerings, since there are dependencies between e.g. distribute and parallel/wsloop; this will have to be solved. In the case of loops there is a proposal to use CLIs, but this is only one kind of information that needs to be communicated; another is reduction information, and there are other cases as well (collapse?). I think it makes sense that the OpenMPIRBuilder keeps track of these things. @kiranchandramohan suggested using block arguments for CLIs, which probably fits better with this approach, and the solution would be more uniform compared to special handling for CLIs. Another option would be to have other kinds of lowering information represented in MLIR as values, like CLIs, but if CLIs get expanded to hold other information this might degenerate into having a single value linking all the omp ops, which would just represent the state in the OpenMPIRBuilder. This is why I'm leaning towards wrapper ops with a simple recursive traversal and the OpenMPIRBuilder keeping the information that needs to be passed between the op lowerings.

While I don't currently get the full picture, the dependencies you mention between distribute and parallel/wsloop seem to stem from the fact that `distribute parallel do` is a composite construct. So maybe the solution is not to share lowering state between these ops, but rather to recognize that this is a construct separate from `parallel do` and to handle it independently, possibly sharing a good amount of code with that other combined construct.

https://github.com/llvm/llvm-project/pull/79843

