[all-commits] [llvm/llvm-project] d96ea4: [AArch64LoadStoreOptimizer] Generate more STPs by ...
Meera Nakrani via All-commits
all-commits at lists.llvm.org
Wed Jun 9 04:33:19 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: d96ea46629803641038ebe46d8cd512f8cf7e20f
https://github.com/llvm/llvm-project/commit/d96ea46629803641038ebe46d8cd512f8cf7e20f
Author: Meera Nakrani <meera.nakrani at arm.com>
Date: 2021-06-09 (Wed, 09 Jun 2021)
Changed paths:
M llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
M llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll
M llvm/test/CodeGen/AArch64/consthoist-gep.ll
M llvm/test/CodeGen/AArch64/ldst-opt.ll
M llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
Log Message:
-----------
[AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier
Our initial motivating case was memcpys with alignments > 16. The
loads/stores to which small memcpys expand are kept together in
several places, so that we get a sequence like this for a 64-bit copy:
  LD w0
  LD w1
  ST w0
  ST w1
The load/store optimiser can generate an LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case, however, the
sequence is optimised during ISel, resulting in:
  LD w0
  ST w0
  LD w0
  ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), LDP/STP creation is
inhibited. The approach here is to perform renaming:
  LD w0
  ST w0
  LD w1
  ST w1
to enable the folding of the stores into an STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpys, this is a general
approach and is thus beneficial for other cases too, as can be seen
in some of the test changes.
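
The effect of the renaming can be illustrated with a small toy model. This
is only a sketch and not the code in AArch64LoadStoreOptimizer.cpp: the
tuple encoding of instructions and the helper names rename_reused and
pair_stores are hypothetical, loads and stores are assumed not to alias,
and the real pass works on MachineInstrs with proper liveness checks.

```python
# Toy instruction: (opcode, register, byte offset). This is an illustrative
# encoding only, not LLVM's MachineInstr.

def rename_reused(instrs, spare):
    """Give every redefinition of a register by a load a fresh name.

    `spare` lists registers assumed to be free (hypothetical input).
    """
    spare = list(spare)
    seen = set()
    current = {}  # original register -> name currently standing in for it
    out = []
    for op, reg, off in instrs:
        if op == "LD":
            # A second load into the same register gets a spare register.
            current[reg] = spare.pop(0) if reg in seen else reg
            seen.add(reg)
        out.append((op, current.get(reg, reg), off))
    return out


def pair_stores(instrs, width=4):
    """Sink a store down to the store of the next slot and merge into STP.

    Gives up on a candidate if its register is overwritten in between.
    """
    out = list(instrs)
    for i, (op1, r1, o1) in enumerate(out):
        if op1 != "ST":
            continue
        for j in range(i + 1, len(out)):
            op2, r2, o2 = out[j]
            if op2 == "LD" and r2 == r1:
                break  # r1 is clobbered before the next store: cannot sink
            if op2 == "ST" and o2 == o1 + width:
                out[j] = ("STP", f"{r1}, {r2}", o1)
                del out[i]
                return out
    return out


# The sequence produced by ISel in the motivating case.
seq = [("LD", "w0", 0), ("ST", "w0", 0), ("LD", "w0", 4), ("ST", "w0", 4)]
unpaired = pair_stores(seq)           # register reuse blocks the merge
renamed = rename_reused(seq, ["w1"])  # second load/store pair now uses w1
paired = pair_stores(renamed)         # the two stores fold into one STP
```

In this model the unrenamed sequence is returned unchanged, whereas the
renamed sequence ends in a single STP w0, w1.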
Differential Revision: https://reviews.llvm.org/D103597