[Mlir-commits] [mlir] [MLIR][OpenMP] Add omp.simd operation (PR #79843)

Wed Feb 7 07:58:11 PST 2024

skatrak wrote:

> @skatrak Thanks for the great work summarizing the OpenMP constructs.
> 
> Let me express my thoughts:
> 
>     1. Lowering of composite operations will be harder because we will need to combine several lowering steps into one operation. For example, `omp.distparwsloop` will require generating two runtime calls for the device: one to enable parallel execution (`kmpc_parallel_51`) and one for the worksharing loop. Wrapper operations are aligned with the current code generation schemes.
> 
>     2. Some composite operations can denote the same thing. For example, `omp.wssimdloop` and `omp.wsloop` are exactly the same if the SIMD length is 1. That said, composite operations may offer fewer MLIR optimization opportunities.
> 
>     3. Maybe we have to split the OpenMP dialect into two sub-dialects. The high-level dialect would contain composite operations, and the lower-level one would reflect the LLVM IR code structure. An MLIR lowering pass between them could simplify the OpenMPIRBuilder logic.
> 
>     4. I don't know how reductions will play with composite operations.
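
To make points 2 and 3 more concrete, here is a minimal sketch in plain C++ (no MLIR dependency) of what a "split composite ops into single-construct ops" step might look like. Everything in it is hypothetical: `Construct`, `CompositeOp`, `constructName` and `splitComposite` are illustrative names, not part of the OpenMP dialect, and the result is only the list of op names, outermost first. Following point 2, a `simd` with a SIMD length of 1 is dropped; any other clauses it might carry are ignored for simplicity.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the constructs a composite op can combine.
// This is NOT the MLIR OpenMP dialect API, just an illustration.
enum class Construct { Distribute, Parallel, Wsloop, Simd };

// A composite op (e.g. the hypothetical `omp.distparwsloop`) modeled as the
// ordered list of constructs it combines, plus an optional SIMD length.
struct CompositeOp {
  std::vector<Construct> constructs;
  std::optional<unsigned> simdlen; // only meaningful if Simd is present
};

// Name of the single-construct op each construct would lower to.
std::string constructName(Construct c) {
  switch (c) {
  case Construct::Distribute: return "omp.distribute";
  case Construct::Parallel: return "omp.parallel";
  case Construct::Wsloop: return "omp.wsloop";
  case Construct::Simd: return "omp.simd";
  }
  return "";
}

// Split a composite op into a nest of single-construct ops, outermost first.
// A `simd` with simdlen == 1 performs no vectorization, so it is dropped and
// e.g. wsloop+simd degenerates into a plain wsloop.
std::vector<std::string> splitComposite(const CompositeOp &op) {
  assert(op.constructs.size() >= 2 && "a composite op combines constructs");
  std::vector<std::string> nest;
  for (Construct c : op.constructs) {
    if (c == Construct::Simd && op.simdlen && *op.simdlen == 1)
      continue;
    nest.push_back(constructName(c));
  }
  return nest;
}
```

For instance, a wsloop+simd composite with a SIMD length of 1 splits into just `omp.wsloop`, while a distribute+parallel+wsloop composite becomes the nest `omp.distribute`, `omp.parallel`, `omp.wsloop`.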

Thanks Dominik for sharing your thoughts on this, and excuse the delay in getting back to you. I'll try to share what I think about these.

1. I think it should be possible to address this issue with a minor refactoring. In the OpenMP to LLVM IR translation stage, we currently have a `convertOmp<Op-Name>` function that we call for each of the defined MLIR operations. We could create `convertOmp<Composite-Name>` functions that, rather than reimplementing all of that, call outlined subsets of the corresponding `convertOmp<Op-Name>` functions together with any other special codegen that may be needed. These outlined functions could perhaps take as arguments new MLIR interfaces representing each single construct that can be part of a composite one.
2. In the case of SIMD, where it is legal to "ignore" the construct and generate code for width=1, it should be fine to just call the non-SIMD lowering function in the cases where we don't currently support or want vectorization. I'm not sure how many MLIR optimization opportunities we would miss by creating composite operations.
3. I agree that this is a possibility as well, and it was mentioned before. The two options are to do this in MLIR with a higher-level (composite ops) and a lower-level (single ops) dialect, as you say, or to deal with the splitting when lowering to LLVM IR. It would be a matter of agreeing on a path forward, but both options should be possible.
4. I'm not sure about this either.
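
A rough sketch of the refactoring in points 1 and 2, written as standalone C++ so it compiles without MLIR. It assumes hypothetical helpers (`emitWorksharingSetup`, `emitSimdAnnotations`, and so on) standing in for pieces outlined from the existing converters. The `convertOmp*` functions below follow the naming pattern described above but are simplified stand-ins: the real ones work on MLIR operations and build LLVM IR through the OpenMPIRBuilder, which is replaced here by a trace of step names. The composite converter reuses the outlined pieces and falls back to the plain worksharing-loop lowering when vectorization is not wanted, modeled here as `simdlen <= 1`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the IR being built: we only record the sequence
// of codegen steps so the composition is easy to inspect.
using CodegenTrace = std::vector<std::string>;

// Hypothetical op descriptions (not the real MLIR op classes).
struct WsloopInfo { std::string schedule; };
struct SimdInfo { unsigned simdlen; };

// Outlined, construct-specific pieces of the single-construct converters.
void emitWorksharingSetup(const WsloopInfo &ws, CodegenTrace &out) {
  out.push_back("wsloop.init(schedule=" + ws.schedule + ")");
}
void emitWorksharingTeardown(CodegenTrace &out) { out.push_back("wsloop.fini"); }
void emitLoopBody(CodegenTrace &out) { out.push_back("loop.body"); }
void emitSimdAnnotations(const SimdInfo &simd, CodegenTrace &out) {
  out.push_back("simd.vectorize(width=" + std::to_string(simd.simdlen) + ")");
}

// Single-construct converters, rebuilt from the outlined pieces.
void convertOmpWsloop(const WsloopInfo &ws, CodegenTrace &out) {
  emitWorksharingSetup(ws, out);
  emitLoopBody(out);
  emitWorksharingTeardown(out);
}
void convertOmpSimd(const SimdInfo &simd, CodegenTrace &out) {
  emitSimdAnnotations(simd, out);
  emitLoopBody(out);
}

// Hypothetical composite converter: combines the same pieces instead of
// reimplementing them. Without vectorization it is just a worksharing loop,
// so it delegates to the non-SIMD lowering.
void convertOmpWsloopSimd(const WsloopInfo &ws, const SimdInfo &simd,
                          CodegenTrace &out) {
  if (simd.simdlen <= 1) {
    convertOmpWsloop(ws, out);
    return;
  }
  emitWorksharingSetup(ws, out);
  emitSimdAnnotations(simd, out);
  emitLoopBody(out);
  emitWorksharingTeardown(out);
}
```

Keeping each construct's codegen in its own helper means a new composite converter mostly decides the order in which to call them, which is what makes this cheaper than reimplementing every combination.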

https://github.com/llvm/llvm-project/pull/79843