[flang-commits] [flang] Parallel runtime library design doc (PRIF) (PR #76088)

Thu Oct 17 16:25:53 PDT 2024

klausler wrote:

> A purely "single-node" system, where all images happen to use an intra-node transport is just a special case of this more general situation.

In the community compiler's default compilation mode, I would expect that optimizations peculiar to a single transport would not apply.  But I think we agree that there are target interconnects, or families of interconnects, for which specialized optimization could be beneficial, and for which the necessary support in your API could be designed now.  Having the hooks that I mentioned above in your API (remote address calculation, split asynchronous transactions) would make it more attractive as a common solution.

One memory-mapped interconnect of interest to me is not "single-node", namely NVLink with NVSwitch as a multi-node GPU fabric (https://www.nvidia.com/en-us/data-center/nvlink/).  If the compiler and runtime can support optimized compilation for this fabric, then Fortran's coarrays may become a viable parallel & accelerated programming model for such systems.

https://github.com/llvm/llvm-project/pull/76088