[PATCH] D159480: [Clang][AArch64] Fine-grained ldp and stp policies.
Manos Anagnostakis via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Sep 8 00:54:41 PDT 2023
manosanag added a comment.
Hello Dave,
thanks for replying.
Yes, this is an optimization.
On some AArch64 cores, including Ampere's ampere1, which this patch targets, load/store pair instructions are faster than simple loads/stores only when the alignment of the pair is at least twice that of the individual element being loaded. Based on the performance of various benchmarks, emitting ldp/stp instructions was disabled in GCC at some point (see the discussion at https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615672.html). This patch improves on that by offering control over when the instructions are used.
A similar patch with the same flags was recently submitted for review on the GCC mailing list (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628590.html).
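To make the alignment condition concrete, here is a minimal sketch in C of the kind of decision such a policy could make. It is not code from the diff: the enum, its values, and the function name are hypothetical, chosen only for illustration, and the real flag spellings are those in the review itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy values, for illustration only. */
enum pair_policy {
    PAIR_ALWAYS,   /* always allow forming a load/store pair */
    PAIR_NEVER,    /* never form a pair */
    PAIR_ALIGNED   /* form a pair only when it is sufficiently aligned */
};

/* Alignments are expected to be non-zero powers of two. */
static bool is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Decide whether two adjacent accesses may be merged into one ldp/stp.
 * elem_align: alignment in bytes of a single element.
 * pair_align: known alignment in bytes of the address of the would-be pair.
 * Under PAIR_ALIGNED, pairing is allowed only if the pair's alignment is at
 * least twice the element's alignment, as described in the comment above.
 */
static bool should_emit_pair(enum pair_policy policy,
                             size_t elem_align, size_t pair_align)
{
    assert(is_pow2(elem_align) && is_pow2(pair_align));

    switch (policy) {
    case PAIR_ALWAYS:
        return true;
    case PAIR_NEVER:
        return false;
    case PAIR_ALIGNED:
        return pair_align >= 2 * elem_align;
    }
    return false; /* unknown policy value: be conservative */
}
```

For example, with 8-byte elements, should_emit_pair(PAIR_ALIGNED, 8, 16) allows the pair, while should_emit_pair(PAIR_ALIGNED, 8, 8) keeps the two simple accesses.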
I have a fix ready for the Fortran regressions shown by autotesting. I can include some of this information in the commit message of the diff.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D159480/new/
https://reviews.llvm.org/D159480