[flang-commits] [flang] Parallel runtime library design doc (PRIF) (PR #76088)

Peter Klausler via flang-commits flang-commits at lists.llvm.org
Wed Oct 2 13:09:58 PDT 2024


klausler wrote:

I am not a reviewer for this PR, and I usually don't intrude on flang community activity, but I have read the document and have a couple of suggestions to make, mostly from an optimization perspective.

In brief: target hardware for coarray Fortran includes two important subsets: targets whose interconnects admit direct load/store access to remote data, and targets whose data transfers are driven by programming the memory-mapped registers (MMRs) of an RDMA NIC. A user who wants to optimize for a specific target interconnect, or class of interconnects, should be able to say so with a command-line option, and might then get better performance than from a compilation that must support any possible target interconnect.

When the target interconnect is known at compilation time to be one that supports load/store access to remote memory, it would be useful to have a runtime library interface that performs the address calculation needed to compute a remote base address for a given coarray on a particular image, provided that image is not known to have failed. This would allow an optimizer to amortize the cost of that calculation when multiple references to the same coarray/image will follow, and would enable loop transformations that prefetch data and hide load latency.
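To make the load/store case concrete, here is a minimal C sketch under loudly stated assumptions: `remote_base_address`, `coarray_handle`, and `remote_sum` are hypothetical names, not PRIF procedures, and remote memory is simulated by a local array standing in for a symmetric heap that is mapped into every image's address space. The only point it illustrates is that the base-address query happens once, outside the loop, so the loop body becomes plain loads that an optimizer is free to prefetch and pipeline.

```c
#include <stddef.h>
#include <stdbool.h>

/* ASSUMPTION: simulated load/store-capable interconnect. Each image's
   coarray segment lives in one shared mapping (a "symmetric heap"). */
#define NUM_IMAGES 4
#define SEGMENT_DOUBLES 1024

static double symmetric_heap[NUM_IMAGES][SEGMENT_DOUBLES];
static bool image_failed[NUM_IMAGES];

/* Hypothetical coarray handle: element offset of the coarray within
   every image's segment. */
typedef struct {
    size_t offset;
} coarray_handle;

/* Hypothetical runtime entry: return a directly dereferenceable base
   address for `coarray` on 1-based `image`, or NULL if the image is out
   of range or known to have failed. */
double *remote_base_address(coarray_handle coarray, int image) {
    if (image < 1 || image > NUM_IMAGES || image_failed[image - 1])
        return NULL;
    return &symmetric_heap[image - 1][coarray.offset];
}

/* Possible lowering of  s = sum(x(1:n)[image])  with the address
   computation hoisted out of the loop. Returns -1 if the image failed. */
int remote_sum(coarray_handle x, int image, size_t n, double *result) {
    const double *base = remote_base_address(x, image); /* once per loop */
    if (base == NULL)
        return -1;
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Plain loads; an optimizer could prefetch base[i + d] here. */
        s += base[i];
    }
    *result = s;
    return 0;
}
```

A NULL base address leaves the caller to take whatever failed-image path the general-purpose runtime interface would have taken.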

On the other hand, when the target interconnect is known at compilation time to be one that supports asynchronous transactions, it would similarly be useful to have runtime library interfaces to initiate asynchronous reads and await their completions, again for hiding load latency.  (Other optimizations that might apply for these targets, such as software caching, can be left to the target's runtime library implementation, I think.)
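A similar sketch for the asynchronous case, again with hypothetical names (`async_get_start`, `async_get_wait`, `pipelined_remote_sum`) that are not PRIF procedures. The NIC is simulated: the copy happens inside `async_get_wait`, whereas real RDMA hardware would move the data in the background after the request is posted. What the sketch shows is the double-buffering pattern a compiler could emit: post the read for chunk k+1, then wait for chunk k and compute on it, so that transfer latency overlaps with computation.

```c
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: simulated remote memory and a simulated request queue. */
#define NUM_IMAGES 2
#define SEGMENT_DOUBLES 4096
#define MAX_PENDING 8
#define CHUNK 64

static double remote_segment[NUM_IMAGES][SEGMENT_DOUBLES];

/* Hypothetical descriptor for an in-flight get, standing in for an RDMA
   work request posted to a NIC. */
typedef struct {
    int in_use;
    double *dest;
    const double *src;
    size_t count;
} get_request;

static get_request pending[MAX_PENDING];
typedef int get_handle;

/* Hypothetical: post a read of `count` doubles at element `offset` on
   1-based `image` into `dest`. Returns a handle, or -1 on bad arguments
   or a full queue. */
get_handle async_get_start(int image, size_t offset, size_t count, double *dest) {
    if (image < 1 || image > NUM_IMAGES || offset + count > SEGMENT_DOUBLES)
        return -1;
    for (int h = 0; h < MAX_PENDING; ++h) {
        if (!pending[h].in_use) {
            pending[h] = (get_request){1, dest, &remote_segment[image - 1][offset], count};
            return h;
        }
    }
    return -1;
}

/* Hypothetical: block until the transfer completes. In this simulation
   the data moves here; real hardware would already be moving it. */
void async_get_wait(get_handle h) {
    get_request *r = &pending[h];
    memcpy(r->dest, r->src, r->count * sizeof *r->dest);
    r->in_use = 0;
}

/* Double-buffered sum of elements 0..n-1 on `image`. Returns -1 on error. */
int pipelined_remote_sum(int image, size_t n, double *result) {
    double buf[2][CHUNK];
    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    double s = 0.0;
    if (nchunks == 0) { *result = 0.0; return 0; }
    get_handle h = async_get_start(image, 0, n < CHUNK ? n : CHUNK, buf[0]);
    if (h < 0) return -1;
    for (size_t k = 0; k < nchunks; ++k) {
        size_t len = (n - k * CHUNK < CHUNK) ? n - k * CHUNK : CHUNK;
        get_handle next = -1;
        if (k + 1 < nchunks) {             /* post chunk k+1 before waiting on k */
            size_t start = (k + 1) * CHUNK;
            size_t next_len = (n - start < CHUNK) ? n - start : CHUNK;
            next = async_get_start(image, start, next_len, buf[(k + 1) % 2]);
            if (next < 0) { async_get_wait(h); return -1; }
        }
        async_get_wait(h);                 /* chunk k is now in buf[k % 2] */
        for (size_t i = 0; i < len; ++i) s += buf[k % 2][i];
        h = next;
    }
    *result = s;
    return 0;
}
```

With two buffers, at most two reads are in flight at once; a deeper pipeline would trade buffer space for more latency hiding.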

https://github.com/llvm/llvm-project/pull/76088

