[all-commits] [llvm/llvm-project] d96ea4: [AArch64LoadStoreOptimizer] Generate more STPs by ...

Wed Jun 9 04:33:19 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: d96ea46629803641038ebe46d8cd512f8cf7e20f
      https://github.com/llvm/llvm-project/commit/d96ea46629803641038ebe46d8cd512f8cf7e20f
  Author: Meera Nakrani <meera.nakrani at arm.com>
  Date:   2021-06-09 (Wed, 09 Jun 2021)

  Changed paths:
    M llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
    M llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll
    M llvm/test/CodeGen/AArch64/consthoist-gep.ll
    M llvm/test/CodeGen/AArch64/ldst-opt.ll
    M llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir

  Log Message:
  -----------
  [AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier

Our initial motivating case was memcpy calls with alignments > 16. The
loads/stores to which small memcpy calls expand are kept together in
several places, so that we get a sequence like this for a 64-bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate an LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case, however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), the reuse inhibits
LDP/STP creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into an STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy calls, this is a general
approach and is thus beneficial for other cases too, as can be seen
in some test changes.
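
The renaming idea described above can be illustrated with a small,
self-contained sketch. This is a toy model only, not the code in
AArch64LoadStoreOptimizer.cpp: the Inst struct and the helpers
canPairStores, renameForPairing and demo are hypothetical names invented
for this example. It assumes 32-bit (w) registers, a single common base
address, no aliasing between the loads and the stores, and that the
renamed register is dead after the second store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: "LD w<reg>, [base, #offset]" or "ST w<reg>, [base, #offset]".
// Hypothetical model; the real pass works on MachineInstr.
struct Inst {
  enum Kind { Load, Store } kind;
  int reg;    // register number, i.e. w<reg>
  int offset; // byte offset from a common base
};

// Returns true if the stores at indices i < j could be merged into one STP
// by sinking the first store down to the second: they must write adjacent
// 4-byte slots, and nothing in between may redefine the first store's register.
bool canPairStores(const std::vector<Inst> &seq, std::size_t i, std::size_t j) {
  assert(i < j && j < seq.size());
  const Inst &first = seq[i];
  const Inst &second = seq[j];
  if (first.kind != Inst::Store || second.kind != Inst::Store)
    return false;
  if (second.offset - first.offset != 4)
    return false;
  for (std::size_t k = i + 1; k < j; ++k)
    if (seq[k].kind == Inst::Load && seq[k].reg == first.reg)
      return false; // value of the first store would be clobbered
  return true;
}

// If a load between the two stores redefines the first store's register,
// rename that definition and its uses up to and including the second store
// to a register that is unused in the whole sequence.
// Assumption of this sketch: the old register is not live after index j.
bool renameForPairing(std::vector<Inst> &seq, std::size_t i, std::size_t j,
                      int numRegs) {
  assert(i < j && j < seq.size());
  const int oldReg = seq[i].reg;
  std::size_t def = j;
  for (std::size_t k = i + 1; k < j; ++k)
    if (seq[k].kind == Inst::Load && seq[k].reg == oldReg) {
      def = k; // first redefinition of the register after the first store
      break;
    }
  if (def == j)
    return false; // nothing blocks pairing, so nothing to rename
  for (int r = 0; r < numRegs; ++r) {
    const bool used = std::any_of(seq.begin(), seq.end(),
                                  [r](const Inst &x) { return x.reg == r; });
    if (used)
      continue;
    for (std::size_t k = def; k <= j; ++k)
      if (seq[k].reg == oldReg)
        seq[k].reg = r;
    return true;
  }
  return false; // no free register available
}

// Replays the example from the log message: LD w0, ST w0, LD w0, ST w0.
bool demo() {
  std::vector<Inst> seq = {{Inst::Load, 0, 0}, {Inst::Store, 0, 0},
                           {Inst::Load, 0, 4}, {Inst::Store, 0, 4}};
  const bool before = canPairStores(seq, 1, 3);         // blocked by LD w0
  const bool renamed = renameForPairing(seq, 1, 3, 4);  // LD/ST w0 -> w1
  const bool after = canPairStores(seq, 1, 3);          // now pairable
  return !before && renamed && after;
}
```

Calling demo() from a main function replays the sequence from the log
message: the two stores cannot be paired while both use w0, and can be
paired once the second load/store has been renamed to w1.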

Differential Revision: https://reviews.llvm.org/D103597