[Mlir-commits] [mlir] [MLIR][OpenMP] Add omp.simd operation (PR #79843)

Tue Jan 30 09:27:32 PST 2024

skatrak wrote:

Thanks Kiran for the feedback, I'll try to answer your concerns below.

> 1. Please point to the relevant section in the standard.

The section describing this in the 5.0 spec is 2.9.3.2, as @DominikAdamski pointed out. I'll add references to it in the relevant places.

> 2. Would simd as an attribute be better?
> 3. Or could worksharing-loop simd be a separate operation? Worksharing-loop simd is a composite construct so this might make sense here.

I think these are some of the options that we were considering, so it's a matter of choosing what seems more reasonable to everyone. The representation proposed here is something like the following:
```mlir
omp.wsloop for (%i) : index = (%lb) to (%ub) step (%step) <workshare-loop-specific clauses> {
  omp.simd <simd-specific clauses> {
    ...
    omp.yield
  }
  omp.yield
}
```
The idea behind that is to follow the same approach as the `omp.distribute` operation: it works as a wrapper that is associated with some loop. In this case, it is nested inside the loop, holding the loop body, rather than wrapped around the loop, matching the order in which the various subdivisions of work happen.

One other thing that could be done is to extend the `omp.wsloop` operation to accept the various attributes to represent the clauses that would be applied to the SIMD construct as well. I think this is what your second point refers to. We could have it looking something like this:
```mlir
omp.wsloop for simd (%i) : index = (%lb) to (%ub) step (%step) <workshare-loop + simd clauses> {
  ...
  omp.yield
}
```
The addition of "simd" after "for" is a way to convey the presence of an MLIR `UnitAttr`, a boolean or something similar. SIMD-specific arguments and attributes would then be rejected by the verifier if they appear on a non-simd variant of the operation. The obvious problem with this is that it pollutes `omp.wsloop` and, in a way, makes it represent different things. We moved away from this early on in the `omp.distribute` discussions, and I think the same concerns apply here.

I'm not against using ops to represent composite constructs, since they have their own unique behavior. So, in this case, my understanding is that it should look similar to this:
```mlir
omp.wsloop_simd for (%i) : index = (%lb) to (%ub) step (%step) <workshare-loop-simd-specific clauses> {
  ...
  omp.yield
}
```
I guess the main reason not to go this route is that it is not done for any other composite construct. For this proposal I went with the alternative for which there is some precedent in the OpenMP dialect. It may still make sense to do this for all composite constructs, assuming the number of them isn't so large that it makes the dialect unnecessarily complicated/redundant.

> 4. How did you arrive at the set of clauses supported?

These are just the clauses that were supported by `omp.simdloop` that weren't related to the loop range and step.

Thinking about it a bit more, it would be possible to make some sort of 2-level representation system for composite constructs. The first level would be to just represent each allowed composite construct as its own MLIR operation, which is what the frontend would produce. Then, there would be an OpenMP dialect MLIR pass to split them up according to their semantics. In this case, from the `omp.wsloop_simd` example above, we would produce something like this (assuming the worksharing loop schedule doesn't prevent doing it like this):
```mlir
omp.wsloop (%ii) : index = (%lb) to (%ub) step (%block_size) <worksharing-loop-specific clauses> {
  <calculate bounds of SIMD loop>
  omp.simdloop (%i): index = (%block_lb) to (%block_ub) step (%block_step) <simd-specific clauses> {
    ...
    omp.yield
  }
  omp.yield
}
```
The good thing about something like this is that, when reaching the MLIR to LLVM IR translation, we would get the same input MLIR for equivalent source code, regardless of whether app developers used composite/combined constructs or wrote each construct separately as nested loops. The problem would then be to recognize these patterns again to be able to target the OpenMP runtime, where certain construct combinations can already be targeted independently.
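
As a rough illustration only, the `<calculate bounds of SIMD loop>` placeholder in the split form above could be expressed with plain `arith` operations. This is a sketch and not part of the proposal: it assumes a positive `%step`, an exclusive `%ub`, and a `%block_size` that is a multiple of `%step`, and the name `%next_block` is made up for the example.
```mlir
// Hypothetical bounds computation for the block starting at %ii.
// Assumptions: positive step, exclusive upper bound, and %block_size
// being a multiple of the original %step.
%next_block = arith.addi %ii, %block_size : index
// Clamp so the last (possibly partial) block does not run past %ub.
%block_ub = arith.minsi %next_block, %ub : index
// The inner loop would then run from %ii (as %block_lb) to %block_ub,
// using the original %step as %block_step.
```
Negative steps or inclusive bounds would need a different clamp, so this only covers the simplest case.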

https://github.com/llvm/llvm-project/pull/79843