[flang-commits] [flang] [RFC][flang] Add support for assumed-shape dummy arrays repacking. (PR #127147)

Tue Feb 18 12:16:10 PST 2025

vzakhari wrote:

> Thanks @vzakhari for the detailed document about repacking arrays.

Thank you for reading, Kiran!

> We implemented a loop versioning pass (https://discourse.llvm.org/t/rfc-loop-versioning-for-unit-stride/68605, https://flang.llvm.org/docs/FlangCommandLineReference.html#cmdoption-flang-fversion-loops-for-stride) that versions the loop for better cache locality, prevent scatter/gathers etc. We get around 36% improvement in the spec2017 roms benchmark due to this pass. Since this pass serves a similar usecase as repack arrays, the pass can be switched OFF when 'repack-arrays' is ON.

I think we may get better results with both array repacking and loop versioning enabled. I added a section about this to the document.

> BTW, Is the performance data that you collected with the loop versioning pass ON? Or is it purely artificial?

I collected the performance numbers using `gfortran`, so they are not artificial :) Flang does not have array repacking right now, but the manual `REPACKING` version shows the same speed-up as `gfortran`'s array repacking. The times are the same with and without loop versioning. I believe that in the `capacita` case none of the compilers vectorizes the hot loop because of unknown data dependencies, so all of the speed-up from array repacking comes from better data cache utilization.
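
For readers unfamiliar with the term, the following is a minimal Python sketch of what manual repacking amounts to: copy a strided view into a contiguous temporary, run the hot loop on the temporary, then copy the results back. It is not the benchmark code or Flang's implementation. `StridedView`, `repack`, `unpack`, and `scale_in_place` are hypothetical names chosen for illustration, and the sketch shows only the data movement, not the cache effect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StridedView:
    """Hypothetical stand-in for a non-contiguous assumed-shape dummy array."""
    base: List[float]  # underlying storage
    offset: int        # index of the first element in base
    stride: int        # distance between consecutive elements in base
    size: int          # number of elements in the view

    def is_contiguous(self) -> bool:
        return self.stride == 1 or self.size <= 1


def repack(view: StridedView) -> List[float]:
    """Copy-in: gather the view's elements into a new contiguous buffer."""
    return [view.base[view.offset + i * view.stride] for i in range(view.size)]


def unpack(view: StridedView, packed: List[float]) -> None:
    """Copy-out: scatter the buffer back into the original storage."""
    for i, value in enumerate(packed):
        view.base[view.offset + i * view.stride] = value


def scale_in_place(view: StridedView, factor: float) -> None:
    """Hot-loop stand-in that works on a contiguous copy when needed."""
    if view.is_contiguous():
        # Already contiguous: no temporary is needed.
        for i in range(view.size):
            view.base[view.offset + i] *= factor
        return
    packed = repack(view)          # copy-in
    for i in range(len(packed)):   # the loop now touches adjacent elements
        packed[i] *= factor
    unpack(view, packed)           # copy-out


# Assumed example data: a view over every other element of an 8-element array.
storage = [float(i) for i in range(8)]
every_other = StridedView(base=storage, offset=0, stride=2, size=4)
scale_in_place(every_other, 10.0)
```

The copy-out step is needed only when the callee may modify the array. The two copies are also the overhead that the speed-up from better locality has to outweigh.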

> There were some comments about issues with versioning/repacking with the asynchronous attribute and the contiguous attribute. It might be good to add these points to the document. https://discourse.llvm.org/t/transformations-to-aid-optimizer-for-subroutines-functions-with-assumed-shape-arguments/66447

Thanks for the link! The document covers the `ASYNCHRONOUS` and `VOLATILE` cases, and it also mentions that array repacking should not be applied to dummy arguments that are `CONTIGUOUS`, since they are already guaranteed to be contiguous.

> I have added AMD engineers working on offloading to have a look for the offloading questions.

Thanks!

https://github.com/llvm/llvm-project/pull/127147