[PATCH] D159480: [Clang][AArch64] Fine-grained ldp and stp policies.

Fri Sep 8 00:54:41 PDT 2023

manosanag added a comment.

Hello Dave,

thanks for replying.

Yes, this is an optimization.

On some AArch64 cores, including Ampere's ampere1 architecture that this patch targets, load/store pair instructions are faster than individual loads/stores only when the alignment of the pair is at least twice that of the individual element being loaded or stored. Based on the performance of various benchmarks, emitting ldp/stp instructions was disabled in GCC at some point (see the discussion at https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615672.html). This patch improves on that by offering control over when the instructions are used.
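
To make the alignment condition concrete, here is a minimal sketch of the kind of check involved. It is not the code from the diff: the enum, its policy names, and the function shouldFormPair are hypothetical and chosen only for illustration, and the element's alignment is assumed to equal its size in bytes.

```cpp
#include <cstdint>

// Hypothetical policy names, for illustration only; they are not taken
// from the patch or from the GCC submission.
enum class PairPolicy { Always, Never, Aligned };

// Decide whether two adjacent accesses of elemBytes each should be merged
// into one ldp/stp, given the known alignment (in bytes) of the first access.
// Assumption: an element's alignment equals its size.
constexpr bool shouldFormPair(PairPolicy policy, std::uint64_t alignBytes,
                              std::uint64_t elemBytes) {
  return policy == PairPolicy::Always  ? true
         : policy == PairPolicy::Never ? false
         // Aligned: the pair must be aligned to at least twice the element.
         : alignBytes >= 2 * elemBytes;
}
```

Under these assumptions, two 8-byte loads known to be 16-byte aligned would be paired under the "aligned" policy, while the same loads with only 8-byte alignment would stay as separate instructions.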

A similar patch with the same flags was recently submitted for review on the GCC mailing list (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628590.html).

I have a fix ready for the Fortran regressions shown by autotesting. I can also include some of this information in the commit message of the diff.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159480/new/

https://reviews.llvm.org/D159480