[llvm-branch-commits] [flang] [Flang][OMP] Replace SUM intrinsic call with SUM operations (PR #113082)

Wed Oct 23 04:48:08 PDT 2024

https://github.com/tblah commented:

I don't have a strong opinion about whether this should be done in flang codegen or in a runtime library. Some drive-by thoughts:

- I imagine that using a C `omp for` inside of a Fortran `omp parallel` should work **so long as they use the same OpenMP library**. Of course, care will need to be taken with privatisation etc., but that can probably be handled in the interface of the runtime function.
- The above requirement basically prevents users from picking their own OpenMP library when compiling their application. I'm not sure how widely used that option is.
- This is already hard to read/review and SUM is a very simple case. I think there is a danger in re-inventing the wheel here. Something like a high-quality parallelised MATMUL that is useful on both CPU and GPU sounds hard. There is a lot of work in that direction in "upstream" MLIR dialects. I know flang (for historical reasons) doesn't use linalg, affine, etc., but we might have to eventually.
- If we decide not to re-use upstream MLIR work, I think more complex intrinsics could get quite hard to review in this style. A C++ runtime library implementation would be easier to read and maintain.
- Having it in a runtime library would make it easier for vendors to swap in their own implementations, e.g. maybe somebody already has a great MATMUL for AMD GPUs and wants flang to use that, but the implementation is useless on a CPU.
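
To make the first point a bit more concrete, here is a rough sketch of what a runtime-library SUM with an orphaned `omp for` might look like. The function name `hypothetical_sum_f64` and its signature are made up for illustration and are not an existing flang runtime entry point. The sketch assumes that every thread of the enclosing `omp parallel` calls the function, that `result` points to storage shared by the team, and that the caller zeroes `result` beforehand and synchronises before reading it.

```cpp
#include <cstddef>

// Hypothetical runtime entry point (not a real flang runtime symbol).
// Assumption: called by every thread of the enclosing parallel region,
// with `result` shared by the team and set to 0.0 before the call.
extern "C" void hypothetical_sum_f64(const double *data, std::size_t n,
                                     double *result) {
  // Automatic variable, so each calling thread gets its own copy.
  double partial = 0.0;

  // Orphaned worksharing loop: it binds to the innermost enclosing
  // parallel region, which may have been opened by the Fortran caller.
  // `nowait` skips the implicit barrier at the end of the loop.
  #pragma omp for nowait
  for (std::size_t i = 0; i < n; ++i)
    partial += data[i];

  // Combine the per-thread partial sums into the shared result.
  #pragma omp atomic
  *result += partial;
}
```

The privatisation concern shows up in the interface: the accumulator is passed by pointer so that it can be shared, and the caller needs a barrier (for example the end of the parallel region) before reading it. Called outside any parallel region, or built without OpenMP enabled, the function just computes a serial sum.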

Overall, I don't know what the right approach is. Maybe we should do both, as Ivan said. I think if we do decide to implement several intrinsics this way, it should go in its own pass because the code could quickly become very long.

https://github.com/llvm/llvm-project/pull/113082