[llvm] 17554b8 - [AArch64] Teach Load/Store optimizier to rename store operands for pairing.
Rumeet Dhindsa via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 11 09:09:34 PST 2019
Hi Florian,
This commit seems to cause an unused-variable error for the variable MI in
non-assert builds, in the file AArch64LoadStoreOptimizer.cpp
<https://github.com/llvm/llvm-project/commit/17554b89617e084848784dfd9ac58e2718d8f8f7#diff-fa3932354e0ceda4288dbea02847e8e0>.
Could you please take a look?
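
For reference, the loop in mergePairedInsns that walks the instructions
between the paired stores appears to reference MI only inside assert().
With NDEBUG defined, assert() expands to nothing, so MI is left unused and
builds that treat warnings as errors fail. Below is a minimal sketch of the
pattern and one common workaround (casting the variable to void). The
function name and the integer data are hypothetical stand-ins, not the LLVM
code itself, and the actual fix may well look different:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the assert-only loop; not the LLVM code itself.
// Returns the number of elements visited so the behaviour can be checked.
inline int checkAllNonNegative(const std::vector<int> &Values) {
  int Visited = 0;
  for (const int &V : Values) {
    // Without this cast, V is referenced only inside assert(), which expands
    // to nothing under NDEBUG and leaves V unused in non-assert builds.
    (void)V;
    assert(V >= 0 && "expected a non-negative value");
    ++Visited;
  }
  return Visited;
}
```

The cast has no runtime effect. It only marks the variable as used in both
build modes.
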
Thanks,
Rumeet
On Wed, Dec 11, 2019 at 5:54 AM Florian Hahn via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
>
> Author: Florian Hahn
> Date: 2019-12-11T13:50:11Z
> New Revision: 17554b89617e084848784dfd9ac58e2718d8f8f7
>
> URL:
> https://github.com/llvm/llvm-project/commit/17554b89617e084848784dfd9ac58e2718d8f8f7
> DIFF:
> https://github.com/llvm/llvm-project/commit/17554b89617e084848784dfd9ac58e2718d8f8f7.diff
>
> LOG: [AArch64] Teach Load/Store optimizier to rename store operands for
> pairing.
>
> In some cases, we can rename a store operand in order to enable pairing
> of stores. For store pairs that cannot be merged because the first
> stored register is redefined between the two stores, we try to find
> a suitable rename register.
>
> First, we check if we can rename the given register:
>
> 1. The first store register must be killed at the store, which means we
> do not have to rename instructions after the first store.
> 2. We scan backwards from the first store to find the definition of the
> stored register and check that all uses in between are renamable. Along
> the way, we collect the minimal register classes of the uses for
> overlapping (sub/super)registers.
>
> Second, we try to find an available register from the minimal physical
> register class of the original register. A suitable register must not be
>
> 1. defined before FirstMI,
> 2. used between the previous definition of the register to rename and
> the paired store, or
> 3. a callee-saved register.
>
> We use KILL flags to clear defined registers while scanning from the
> beginning to the end of the block.
>
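
To check my understanding of the selection rules above, here is a toy sketch.
It is only an illustration under stated assumptions: plain integers stand in
for registers, pickRenameReg and NoReg are hypothetical names, and it ignores
sub/super registers and register classes, which the patch does handle:

```cpp
#include <set>
#include <vector>

// Toy model: registers are plain integers, not LLVM MCPhysReg values.
using Reg = int;

// Hypothetical sentinel meaning "no suitable register found".
const Reg NoReg = -1;

// Returns the first candidate that is not defined in the block so far, not
// used between the previous definition and the second store, and not
// callee-saved. Returns NoReg when no candidate qualifies.
inline Reg pickRenameReg(const std::vector<Reg> &Candidates,
                         const std::set<Reg> &DefinedInBB,
                         const std::set<Reg> &UsedInBetween,
                         const std::set<Reg> &CalleeSaved) {
  for (Reg R : Candidates) {
    if (DefinedInBB.count(R) || UsedInBetween.count(R) || CalleeSaved.count(R))
      continue;
    return R;
  }
  return NoReg;
}
```

With candidates 8, 9, 10 and 19, where 8 is already defined, 9 is used in
between and 19 is callee-saved, the sketch picks 10. If every candidate is
excluded, it reports NoReg and no renaming would happen.
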
> This triggers quite often, here are the top changes for MultiSource,
> SPEC2000, SPEC2006 compiled with -O3 for iOS:
>
> Metric: aarch64-ldst-opt.NumPairCreated
>
> Program base patch diff
> test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0%
> test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9%
> test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1%
> test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5%
> test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0%
> test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3%
> test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0%
> test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0%
> test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0%
> test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0%
> test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0%
> test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5%
> test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9%
> test-suite...marks/Olden/health/health.test 16.00 18.00 12.5%
> test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2%
> test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9%
> test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1%
> test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7%
> test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1%
> test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0%
> test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9%
> test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8%
> test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7%
> test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4%
> test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2%
> test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9%
> test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9%
> test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6%
> test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6%
> test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2%
> test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1%
> test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1%
> test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0%
> test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0%
> test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8%
> test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8%
> test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8%
> test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5%
> test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3%
> test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2%
> test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8%
> test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8%
> test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7%
> test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7%
>
> Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro
>
> Reviewed By: paquette
>
> Differential Revision: https://reviews.llvm.org/D70450
>
> Added:
> llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
>
> Modified:
> llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
> llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
> llvm/test/CodeGen/AArch64/arm64-abi_align.ll
> llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
> llvm/test/CodeGen/AArch64/machine-outliner-remarks.ll
> llvm/test/CodeGen/AArch64/machine-outliner.ll
>
> Removed:
>
>
>
>
> ################################################################################
> diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
> b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
> index a0c4a25bb5b9..d60dc43d19b4 100644
> --- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
> +++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
> @@ -32,10 +32,12 @@
> #include "llvm/Pass.h"
> #include "llvm/Support/CommandLine.h"
> #include "llvm/Support/Debug.h"
> +#include "llvm/Support/DebugCounter.h"
> #include "llvm/Support/ErrorHandling.h"
> #include "llvm/Support/raw_ostream.h"
> #include <cassert>
> #include <cstdint>
> +#include <functional>
> #include <iterator>
> #include <limits>
>
> @@ -51,6 +53,9 @@ STATISTIC(NumUnscaledPairCreated,
> STATISTIC(NumZeroStoresPromoted, "Number of narrow zero stores promoted");
> STATISTIC(NumLoadsFromStoresPromoted, "Number of loads from stores
> promoted");
>
> +DEBUG_COUNTER(RegRenamingCounter, DEBUG_TYPE "-reg-renaming",
> + "Controls which pairs are considered for renaming");
> +
> // The LdStLimit limits how far we search for load/store pairs.
> static cl::opt<unsigned> LdStLimit("aarch64-load-store-scan-limit",
> cl::init(20), cl::Hidden);
> @@ -76,6 +81,11 @@ using LdStPairFlags = struct LdStPairFlags {
> // to be extended, 0 means I, and 1 means the returned iterator.
> int SExtIdx = -1;
>
> + // If not none, RenameReg can be used to rename the result register of
> the
> + // first store in a pair. Currently this only works when merging stores
> + // forward.
> + Optional<MCPhysReg> RenameReg = None;
> +
> LdStPairFlags() = default;
>
> void setMergeForward(bool V = true) { MergeForward = V; }
> @@ -83,6 +93,10 @@ using LdStPairFlags = struct LdStPairFlags {
>
> void setSExtIdx(int V) { SExtIdx = V; }
> int getSExtIdx() const { return SExtIdx; }
> +
> + void setRenameReg(MCPhysReg R) { RenameReg = R; }
> + void clearRenameReg() { RenameReg = None; }
> + Optional<MCPhysReg> getRenameReg() const { return RenameReg; }
> };
>
> struct AArch64LoadStoreOpt : public MachineFunctionPass {
> @@ -99,6 +113,7 @@ struct AArch64LoadStoreOpt : public MachineFunctionPass
> {
>
> // Track which register units have been modified and used.
> LiveRegUnits ModifiedRegUnits, UsedRegUnits;
> + LiveRegUnits DefinedInBB;
>
> void getAnalysisUsage(AnalysisUsage &AU) const override {
> AU.addRequired<AAResultsWrapperPass>();
> @@ -599,8 +614,8 @@ static void getPrePostIndexedMemOpInfo(const
> MachineInstr &MI, int &Scale,
> }
> }
>
> -static const MachineOperand &getLdStRegOp(const MachineInstr &MI,
> - unsigned PairedRegOp = 0) {
> +static MachineOperand &getLdStRegOp(MachineInstr &MI,
> + unsigned PairedRegOp = 0) {
> assert(PairedRegOp < 2 && "Unexpected register operand idx.");
> unsigned Idx = isPairedLdSt(MI) ? PairedRegOp : 0;
> return MI.getOperand(Idx);
> @@ -783,6 +798,44 @@
> AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
> return NextI;
> }
>
> +// Apply Fn to all instructions between MI and the beginning of the
> block, until
> +// a def for DefReg is reached. Returns true, iff Fn returns true for all
> +// visited instructions. Stop after visiting Limit iterations.
> +static bool forAllMIsUntilDef(MachineInstr &MI, MCPhysReg DefReg,
> + const TargetRegisterInfo *TRI, unsigned
> Limit,
> + std::function<bool(MachineInstr &, bool)>
> &Fn) {
> + auto MBB = MI.getParent();
> + for (MachineBasicBlock::reverse_iterator I = MI.getReverseIterator(),
> + E = MBB->rend();
> + I != E; I++) {
> + if (!Limit)
> + return false;
> + --Limit;
> +
> + bool isDef = any_of(I->operands(), [DefReg, TRI](MachineOperand &MOP)
> {
> + return MOP.isReg() && MOP.isDef() &&
> + TRI->regsOverlap(MOP.getReg(), DefReg);
> + });
> + if (!Fn(*I, isDef))
> + return false;
> + if (isDef)
> + break;
> + }
> + return true;
> +}
> +
> +static void updateDefinedRegisters(MachineInstr &MI, LiveRegUnits &Units,
> + const TargetRegisterInfo *TRI) {
> +
> + for (const MachineOperand &MOP : phys_regs_and_masks(MI))
> + if (MOP.isReg() && MOP.isKill())
> + Units.removeReg(MOP.getReg());
> +
> + for (const MachineOperand &MOP : phys_regs_and_masks(MI))
> + if (MOP.isReg() && !MOP.isKill())
> + Units.addReg(MOP.getReg());
> +}
> +
> MachineBasicBlock::iterator
> AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
> MachineBasicBlock::iterator Paired,
> @@ -803,6 +856,70 @@
> AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
> int OffsetStride = IsUnscaled ? getMemScale(*I) : 1;
>
> bool MergeForward = Flags.getMergeForward();
> +
> + Optional<MCPhysReg> RenameReg = Flags.getRenameReg();
> + if (MergeForward && RenameReg) {
> + MCRegister RegToRename = getLdStRegOp(*I).getReg();
> + DefinedInBB.addReg(*RenameReg);
> +
> + // Return the sub/super register for RenameReg, matching the size of
> + // OriginalReg.
> + auto GetMatchingSubReg = [this,
> + RenameReg](MCPhysReg OriginalReg) ->
> MCPhysReg {
> + for (MCPhysReg SubOrSuper :
> TRI->sub_and_superregs_inclusive(*RenameReg))
> + if (TRI->getMinimalPhysRegClass(OriginalReg) ==
> + TRI->getMinimalPhysRegClass(SubOrSuper))
> + return SubOrSuper;
> + llvm_unreachable("Should have found matching sub or super
> register!");
> + };
> +
> + std::function<bool(MachineInstr &, bool)> UpdateMIs =
> + [this, RegToRename, GetMatchingSubReg](MachineInstr &MI, bool
> IsDef) {
> + if (IsDef) {
> + bool SeenDef = false;
> + for (auto &MOP : MI.operands()) {
> + // Rename the first explicit definition and all implicit
> + // definitions matching RegToRename.
> + if (MOP.isReg() &&
> + (!SeenDef || (MOP.isDef() && MOP.isImplicit())) &&
> + TRI->regsOverlap(MOP.getReg(), RegToRename)) {
> + assert((MOP.isImplicit() ||
> + (MOP.isRenamable() && !MOP.isEarlyClobber())) &&
> + "Need renamable operands");
> + MOP.setReg(GetMatchingSubReg(MOP.getReg()));
> + SeenDef = true;
> + }
> + }
> + } else {
> + for (auto &MOP : MI.operands()) {
> + if (MOP.isReg() && TRI->regsOverlap(MOP.getReg(),
> RegToRename)) {
> + assert(MOP.isImplicit() ||
> + (MOP.isRenamable() && !MOP.isEarlyClobber()) &&
> + "Need renamable operands");
> + MOP.setReg(GetMatchingSubReg(MOP.getReg()));
> + }
> + }
> + }
> + LLVM_DEBUG(dbgs() << "Renamed " << MI << "\n");
> + return true;
> + };
> + forAllMIsUntilDef(*I, RegToRename, TRI, LdStLimit, UpdateMIs);
> +
> + // Make sure the register used for renaming is not used between the
> paired
> + // instructions. That would trash the content before the new paired
> + // instruction.
> + for (auto &MI :
> + iterator_range<MachineInstrBundleIterator<llvm::MachineInstr>>(
> + std::next(I), std::next(Paired)))
> + assert(all_of(MI.operands(),
> + [this, &RenameReg](const MachineOperand &MOP) {
> + return !MOP.isReg() ||
> + !TRI->regsOverlap(MOP.getReg(), *RenameReg);
> + }) &&
> + "Rename register used between paired instruction, trashing
> the "
> + "content");
> + }
> +
> // Insert our new paired instruction after whichever of the paired
> // instructions MergeForward indicates.
> MachineBasicBlock::iterator InsertionPoint = MergeForward ? Paired : I;
> @@ -931,6 +1048,11 @@
> AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
> }
> LLVM_DEBUG(dbgs() << "\n");
>
> + if (MergeForward)
> + for (const MachineOperand &MOP : phys_regs_and_masks(*I))
> + if (MOP.isReg() && MOP.isKill())
> + DefinedInBB.addReg(MOP.getReg());
> +
> // Erase the old instructions.
> I->eraseFromParent();
> Paired->eraseFromParent();
> @@ -1207,6 +1329,144 @@ static bool
> areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
> // FIXME: Can we also match a mixed sext/zext unscaled/scaled pair?
> }
>
> +static bool
> +canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
> + SmallPtrSetImpl<const TargetRegisterClass *>
> &RequiredClasses,
> + const TargetRegisterInfo *TRI) {
> + if (!FirstMI.mayStore())
> + return false;
> +
> + // Check if we can find an unused register which we can use to rename
> + // the register used by the first load/store.
> + auto *RegClass =
> TRI->getMinimalPhysRegClass(getLdStRegOp(FirstMI).getReg());
> + MachineFunction &MF = *FirstMI.getParent()->getParent();
> + if (!RegClass || !MF.getRegInfo().tracksLiveness())
> + return false;
> +
> + auto RegToRename = getLdStRegOp(FirstMI).getReg();
> + // For now, we only rename if the store operand gets killed at the
> store.
> + if (!getLdStRegOp(FirstMI).isKill() &&
> + !any_of(FirstMI.operands(),
> + [TRI, RegToRename](const MachineOperand &MOP) {
> + return MOP.isReg() && MOP.isImplicit() && MOP.isKill() &&
> + TRI->regsOverlap(RegToRename, MOP.getReg());
> + })) {
> + LLVM_DEBUG(dbgs() << " Operand not killed at " << FirstMI << "\n");
> + return false;
> + }
> + auto canRenameMOP = [](const MachineOperand &MOP) {
> + return MOP.isImplicit() ||
> + (MOP.isRenamable() && !MOP.isEarlyClobber() && !MOP.isTied());
> + };
> +
> + bool FoundDef = false;
> +
> + // For each instruction between FirstMI and the previous def for
> RegToRename,
> + // we
> + // * check if we can rename RegToRename in this instruction
> + // * collect the registers used and required register classes for
> RegToRename.
> + std::function<bool(MachineInstr &, bool)> CheckMIs = [&](MachineInstr
> &MI,
> + bool IsDef) {
> + LLVM_DEBUG(dbgs() << "Checking " << MI << "\n");
> + // Currently we do not try to rename across frame-setup instructions.
> + if (MI.getFlag(MachineInstr::FrameSetup)) {
> + LLVM_DEBUG(dbgs() << " Cannot rename framesetup instructions
> currently ("
> + << MI << ")\n");
> + return false;
> + }
> +
> + UsedInBetween.accumulate(MI);
> +
> + // For a definition, check that we can rename the definition and exit
> the
> + // loop.
> + FoundDef = IsDef;
> +
> + // For defs, check if we can rename the first def of RegToRename.
> + if (FoundDef) {
> + for (auto &MOP : MI.operands()) {
> + if (!MOP.isReg() || !MOP.isDef() ||
> + !TRI->regsOverlap(MOP.getReg(), RegToRename))
> + continue;
> + if (!canRenameMOP(MOP)) {
> + LLVM_DEBUG(dbgs()
> + << " Cannot rename " << MOP << " in " << MI <<
> "\n");
> + return false;
> + }
> + RequiredClasses.insert(TRI->getMinimalPhysRegClass(MOP.getReg()));
> + }
> + return true;
> + } else {
> + for (auto &MOP : MI.operands()) {
> + if (!MOP.isReg() || !TRI->regsOverlap(MOP.getReg(), RegToRename))
> + continue;
> +
> + if (!canRenameMOP(MOP)) {
> + LLVM_DEBUG(dbgs()
> + << " Cannot rename " << MOP << " in " << MI <<
> "\n");
> + return false;
> + }
> + RequiredClasses.insert(TRI->getMinimalPhysRegClass(MOP.getReg()));
> + }
> + }
> + return true;
> + };
> +
> + if (!forAllMIsUntilDef(FirstMI, RegToRename, TRI, LdStLimit, CheckMIs))
> + return false;
> +
> + if (!FoundDef) {
> + LLVM_DEBUG(dbgs() << " Did not find definition for register in
> BB\n");
> + return false;
> + }
> + return true;
> +}
> +
> +// Check if we can find a physical register for renaming. This register
> must:
> +// * not be defined up to FirstMI (checking DefinedInBB)
> +// * not used between the MI and the defining instruction of the register
> to
> +// rename (checked using UsedInBetween).
> +// * is available in all used register classes (checked using
> RequiredClasses).
> +static Optional<MCPhysReg> tryToFindRegisterToRename(
> + MachineInstr &FirstMI, MachineInstr &MI, LiveRegUnits &DefinedInBB,
> + LiveRegUnits &UsedInBetween,
> + SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
> + const TargetRegisterInfo *TRI) {
> + auto &MF = *FirstMI.getParent()->getParent();
> +
> + // Checks if any sub- or super-register of PR is callee saved.
> + auto AnySubOrSuperRegCalleePreserved = [&MF, TRI](MCPhysReg PR) {
> + return any_of(TRI->sub_and_superregs_inclusive(PR),
> + [&MF, TRI](MCPhysReg SubOrSuper) {
> + return TRI->isCalleeSavedPhysReg(SubOrSuper, MF);
> + });
> + };
> +
> + // Check if PR or one of its sub- or super-registers can be used for all
> + // required register classes.
> + auto CanBeUsedForAllClasses = [&RequiredClasses, TRI](MCPhysReg PR) {
> + return all_of(RequiredClasses, [PR, TRI](const TargetRegisterClass
> *C) {
> + return any_of(TRI->sub_and_superregs_inclusive(PR),
> + [C, TRI](MCPhysReg SubOrSuper) {
> + return C == TRI->getMinimalPhysRegClass(SubOrSuper);
> + });
> + });
> + };
> +
> + auto *RegClass =
> TRI->getMinimalPhysRegClass(getLdStRegOp(FirstMI).getReg());
> + for (const MCPhysReg &PR : *RegClass) {
> + if (DefinedInBB.available(PR) && UsedInBetween.available(PR) &&
> + !AnySubOrSuperRegCalleePreserved(PR) &&
> CanBeUsedForAllClasses(PR)) {
> + DefinedInBB.addReg(PR);
> + LLVM_DEBUG(dbgs() << "Found rename register " << printReg(PR, TRI)
> + << "\n");
> + return {PR};
> + }
> + }
> + LLVM_DEBUG(dbgs() << "No rename register found from "
> + << TRI->getRegClassName(RegClass) << "\n");
> + return None;
> +}
> +
> /// Scan the instructions looking for a load/store that can be combined
> with the
> /// current instruction into a wider equivalent or a load/store pair.
> MachineBasicBlock::iterator
> @@ -1215,6 +1475,7 @@
> AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
> bool FindNarrowMerge) {
> MachineBasicBlock::iterator E = I->getParent()->end();
> MachineBasicBlock::iterator MBBI = I;
> + MachineBasicBlock::iterator MBBIWithRenameReg;
> MachineInstr &FirstMI = *I;
> ++MBBI;
>
> @@ -1226,6 +1487,13 @@
> AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
> int OffsetStride = IsUnscaled ? getMemScale(FirstMI) : 1;
> bool IsPromotableZeroStore = isPromotableZeroStoreInst(FirstMI);
>
> + Optional<bool> MaybeCanRename = None;
> + SmallPtrSet<const TargetRegisterClass *, 5> RequiredClasses;
> + LiveRegUnits UsedInBetween;
> + UsedInBetween.init(*TRI);
> +
> + Flags.clearRenameReg();
> +
> // Track which register units have been modified and used between the
> first
> // insn (inclusive) and the second insn.
> ModifiedRegUnits.clear();
> @@ -1237,6 +1505,8 @@
> AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
> for (unsigned Count = 0; MBBI != E && Count < Limit; ++MBBI) {
> MachineInstr &MI = *MBBI;
>
> + UsedInBetween.accumulate(MI);
> +
> // Don't count transient instructions towards the search limit since there
> // may be different numbers of them if e.g. debug information is present.
> if (!MI.isTransient())
> @@ -1329,7 +1599,9 @@
> AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
> !(MI.mayLoad() &&
> !UsedRegUnits.available(getLdStRegOp(MI).getReg())) &&
> !mayAlias(MI, MemInsns, AA)) {
> +
> Flags.setMergeForward(false);
> + Flags.clearRenameReg();
> return MBBI;
> }
>
> @@ -1337,18 +1609,41 @@
> AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
> // between the two instructions and none of the instructions
> between the
> // first and the second alias with the first, we can combine the
> first
> // into the second.
> - if (ModifiedRegUnits.available(getLdStRegOp(FirstMI).getReg()) &&
> - !(MayLoad &&
> + if (!(MayLoad &&
> !UsedRegUnits.available(getLdStRegOp(FirstMI).getReg())) &&
> !mayAlias(FirstMI, MemInsns, AA)) {
> - Flags.setMergeForward(true);
> - return MBBI;
> +
> + if (ModifiedRegUnits.available(getLdStRegOp(FirstMI).getReg()))
> {
> + Flags.setMergeForward(true);
> + Flags.clearRenameReg();
> + return MBBI;
> + }
> +
> + if (DebugCounter::shouldExecute(RegRenamingCounter)) {
> + if (!MaybeCanRename)
> + MaybeCanRename = {canRenameUpToDef(FirstMI, UsedInBetween,
> + RequiredClasses, TRI)};
> +
> + if (*MaybeCanRename) {
> + Optional<MCPhysReg> MaybeRenameReg =
> tryToFindRegisterToRename(
> + FirstMI, MI, DefinedInBB, UsedInBetween,
> RequiredClasses,
> + TRI);
> + if (MaybeRenameReg) {
> + Flags.setRenameReg(*MaybeRenameReg);
> + Flags.setMergeForward(true);
> + MBBIWithRenameReg = MBBI;
> + }
> + }
> + }
> }
> // Unable to combine these instructions due to interference in
> between.
> // Keep looking.
> }
> }
>
> + if (Flags.getRenameReg())
> + return MBBIWithRenameReg;
> +
> // If the instruction wasn't a matching load or store. Stop
> searching if we
> // encounter a call instruction that might modify memory.
> if (MI.isCall())
> @@ -1680,7 +1975,13 @@ bool
> AArch64LoadStoreOpt::tryToPairLdStInst(MachineBasicBlock::iterator &MBBI) {
> ++NumUnscaledPairCreated;
> // Keeping the iterator straight is a pain, so we let the merge
> routine tell
> // us what the next instruction is after it's done mucking about.
> + auto Prev = std::prev(MBBI);
> MBBI = mergePairedInsns(MBBI, Paired, Flags);
> + // Collect liveness info for instructions between Prev and the new
> position
> + // MBBI.
> + for (auto I = std::next(Prev); I != MBBI; I++)
> + updateDefinedRegisters(*I, DefinedInBB, TRI);
> +
> return true;
> }
> return false;
> @@ -1742,6 +2043,7 @@ bool AArch64LoadStoreOpt::tryToMergeLdStUpdate
>
> bool AArch64LoadStoreOpt::optimizeBlock(MachineBasicBlock &MBB,
> bool EnableNarrowZeroStOpt) {
> +
> bool Modified = false;
> // Four tranformations to do here:
> // 1) Find loads that directly read from stores and promote them by
> @@ -1786,8 +2088,17 @@ bool
> AArch64LoadStoreOpt::optimizeBlock(MachineBasicBlock &MBB,
> // ldr x1, [x2, #8]
> // ; becomes
> // ldp x0, x1, [x2]
> +
> + if (MBB.getParent()->getRegInfo().tracksLiveness()) {
> + DefinedInBB.clear();
> + DefinedInBB.addLiveIns(MBB);
> + }
> +
> for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
> MBBI != E;) {
> + // Track currently live registers up to this point, to help with
> + // searching for a rename register on demand.
> + updateDefinedRegisters(*MBBI, DefinedInBB, TRI);
> if (TII->isPairableLdStInst(*MBBI) && tryToPairLdStInst(MBBI))
> Modified = true;
> else
> @@ -1825,11 +2136,14 @@ bool
> AArch64LoadStoreOpt::runOnMachineFunction(MachineFunction &Fn) {
> // or store.
> ModifiedRegUnits.init(*TRI);
> UsedRegUnits.init(*TRI);
> + DefinedInBB.init(*TRI);
>
> bool Modified = false;
> bool enableNarrowZeroStOpt = !Subtarget->requiresStrictAlign();
> - for (auto &MBB : Fn)
> - Modified |= optimizeBlock(MBB, enableNarrowZeroStOpt);
> + for (auto &MBB : Fn) {
> + auto M = optimizeBlock(MBB, enableNarrowZeroStOpt);
> + Modified |= M;
> + }
>
> return Modified;
> }
>
> diff --git a/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
> b/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
> index ec3b51bd37a8..7caa4c06d698 100644
> --- a/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
> +++ b/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
> @@ -14,10 +14,9 @@ define void @fn9(i32* %a1, i32 %a2, i32 %a3, i32 %a4,
> i32 %a5, i32 %a6, i32 %a7,
> ; CHECK-NEXT: stp w6, w5, [sp, #36]
> ; CHECK-NEXT: str w7, [sp, #32]
> ; CHECK-NEXT: str w8, [x0]
> -; CHECK-NEXT: ldr w8, [sp, #72]
> -; CHECK-NEXT: str w8, [sp, #20]
> +; CHECK-NEXT: ldr w9, [sp, #72]
> ; CHECK-NEXT: ldr w8, [sp, #80]
> -; CHECK-NEXT: str w8, [sp, #16]
> +; CHECK-NEXT: stp w8, w9, [sp, #16]
> ; CHECK-NEXT: add x8, sp, #72 ; =72
> ; CHECK-NEXT: add x8, x8, #24 ; =24
> ; CHECK-NEXT: str x8, [sp, #24]
> @@ -65,22 +64,18 @@ define i32 @main() nounwind ssp {
> ; CHECK: ; %bb.0:
> ; CHECK-NEXT: sub sp, sp, #96 ; =96
> ; CHECK-NEXT: stp x29, x30, [sp, #80] ; 16-byte Folded Spill
> -; CHECK-NEXT: mov w8, #1
> -; CHECK-NEXT: str w8, [sp, #76]
> +; CHECK-NEXT: mov w9, #1
> ; CHECK-NEXT: mov w8, #2
> -; CHECK-NEXT: str w8, [sp, #72]
> -; CHECK-NEXT: mov w8, #3
> -; CHECK-NEXT: str w8, [sp, #68]
> +; CHECK-NEXT: stp w8, w9, [sp, #72]
> +; CHECK-NEXT: mov w9, #3
> ; CHECK-NEXT: mov w8, #4
> -; CHECK-NEXT: str w8, [sp, #64]
> -; CHECK-NEXT: mov w8, #5
> -; CHECK-NEXT: str w8, [sp, #60]
> +; CHECK-NEXT: stp w8, w9, [sp, #64]
> +; CHECK-NEXT: mov w9, #5
> ; CHECK-NEXT: mov w8, #6
> -; CHECK-NEXT: str w8, [sp, #56]
> -; CHECK-NEXT: mov w8, #7
> -; CHECK-NEXT: str w8, [sp, #52]
> +; CHECK-NEXT: stp w8, w9, [sp, #56]
> +; CHECK-NEXT: mov w9, #7
> ; CHECK-NEXT: mov w8, #8
> -; CHECK-NEXT: str w8, [sp, #48]
> +; CHECK-NEXT: stp w8, w9, [sp, #48]
> ; CHECK-NEXT: mov w8, #9
> ; CHECK-NEXT: mov w9, #10
> ; CHECK-NEXT: stp w9, w8, [sp, #40]
>
> diff --git a/llvm/test/CodeGen/AArch64/arm64-abi_align.ll
> b/llvm/test/CodeGen/AArch64/arm64-abi_align.ll
> index 7db3ea76de05..189546a4553b 100644
> --- a/llvm/test/CodeGen/AArch64/arm64-abi_align.ll
> +++ b/llvm/test/CodeGen/AArch64/arm64-abi_align.ll
> @@ -392,10 +392,8 @@ entry:
> define i32 @caller43() #3 {
> entry:
> ; CHECK-LABEL: caller43
> -; CHECK-DAG: str {{q[0-9]+}}, [sp, #48]
> -; CHECK-DAG: str {{q[0-9]+}}, [sp, #32]
> -; CHECK-DAG: str {{q[0-9]+}}, [sp, #16]
> -; CHECK-DAG: str {{q[0-9]+}}, [sp]
> +; CHECK-DAG: stp q1, q0, [sp, #32]
> +; CHECK-DAG: stp q1, q0, [sp]
> ; CHECK: add x1, sp, #32
> ; CHECK: mov x2, sp
> ; Space for s1 is allocated at sp+32
> @@ -434,10 +432,8 @@ entry:
> ; CHECK-LABEL: caller43_stack
> ; CHECK: sub sp, sp, #112
> ; CHECK: add x29, sp, #96
> -; CHECK-DAG: stur {{q[0-9]+}}, [x29, #-16]
> -; CHECK-DAG: stur {{q[0-9]+}}, [x29, #-32]
> -; CHECK-DAG: str {{q[0-9]+}}, [sp, #48]
> -; CHECK-DAG: str {{q[0-9]+}}, [sp, #32]
> +; CHECK-DAG: stp q1, q0, [x29, #-32]
> +; CHECK-DAG: stp q1, q0, [sp, #32]
> ; Space for s1 is allocated at x29-32 = sp+64
> ; Space for s2 is allocated at sp+32
> ; CHECK: add x[[B:[0-9]+]], sp, #32
>
> diff --git a/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
> b/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
> index db87d7fae803..5f4c17d0cfba 100644
> --- a/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
> +++ b/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
> @@ -26,11 +26,11 @@ define void @test_simple(i32 %n, ...) {
>
> ; CHECK: add [[GR_TOPTMP:x[0-9]+]], sp, #[[GR_BASE]]
> ; CHECK: add [[GR_TOP:x[0-9]+]], [[GR_TOPTMP]], #56
> -; CHECK: str [[GR_TOP]], [x[[VA_LIST]], #8]
> +
>
> ; CHECK: mov [[VR_TOPTMP:x[0-9]+]], sp
> ; CHECK: add [[VR_TOP:x[0-9]+]], [[VR_TOPTMP]], #128
> -; CHECK: str [[VR_TOP]], [x[[VA_LIST]], #16]
> +; CHECK: stp [[GR_TOP]], [[VR_TOP]], [x[[VA_LIST]], #8]
>
> ; CHECK: mov [[GRVR:x[0-9]+]], #-56
> ; CHECK: movk [[GRVR]], #65408, lsl #32
> @@ -62,11 +62,10 @@ define void @test_fewargs(i32 %n, i32 %n1, i32 %n2,
> float %m, ...) {
>
> ; CHECK: add [[GR_TOPTMP:x[0-9]+]], sp, #[[GR_BASE]]
> ; CHECK: add [[GR_TOP:x[0-9]+]], [[GR_TOPTMP]], #40
> -; CHECK: str [[GR_TOP]], [x[[VA_LIST]], #8]
>
> ; CHECK: mov [[VR_TOPTMP:x[0-9]+]], sp
> ; CHECK: add [[VR_TOP:x[0-9]+]], [[VR_TOPTMP]], #112
> -; CHECK: str [[VR_TOP]], [x[[VA_LIST]], #16]
> +; CHECK: stp [[GR_TOP]], [[VR_TOP]], [x[[VA_LIST]], #8]
>
> ; CHECK: mov [[GRVR_OFFS:x[0-9]+]], #-40
> ; CHECK: movk [[GRVR_OFFS]], #65424, lsl #32
>
> diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-remarks.ll
> b/llvm/test/CodeGen/AArch64/machine-outliner-remarks.ll
> index 19351262b82b..f188301579e3 100644
> --- a/llvm/test/CodeGen/AArch64/machine-outliner-remarks.ll
> +++ b/llvm/test/CodeGen/AArch64/machine-outliner-remarks.ll
> @@ -4,7 +4,7 @@
> ; CHECK-SAME: Bytes from outlining all occurrences (16) >=
> ; CHECK-SAME: Unoutlined instruction bytes (16)
> ; CHECK-SAME: (Also found at: <UNKNOWN LOCATION>)
> -; CHECK: remark: <unknown>:0:0: Saved 48 bytes by outlining 14
> instructions
> +; CHECK: remark: <unknown>:0:0: Saved 36 bytes by outlining 11
> instructions
> ; CHECK-SAME: from 2 locations. (Found at: <UNKNOWN LOCATION>,
> ; CHECK-SAME: <UNKNOWN LOCATION>)
> ; RUN: llc %s -enable-machine-outliner -mtriple=aarch64-unknown-unknown
> -o /dev/null -pass-remarks-missed=machine-outliner
> -pass-remarks-output=%t.yaml
> @@ -38,10 +38,10 @@
> ; YAML-NEXT: Function: OUTLINED_FUNCTION_0
> ; YAML-NEXT: Args:
> ; YAML-NEXT: - String: 'Saved '
> -; YAML-NEXT: - OutliningBenefit: '48'
> +; YAML-NEXT: - OutliningBenefit: '36'
> ; YAML-NEXT: - String: ' bytes by '
> ; YAML-NEXT: - String: 'outlining '
> -; YAML-NEXT: - Length: '14'
> +; YAML-NEXT: - Length: '11'
> ; YAML-NEXT: - String: ' instructions '
> ; YAML-NEXT: - String: 'from '
> ; YAML-NEXT: - NumOccurrences: '2'
>
> diff --git a/llvm/test/CodeGen/AArch64/machine-outliner.ll
> b/llvm/test/CodeGen/AArch64/machine-outliner.ll
> index 15afdd43d116..13a12f76695d 100644
> --- a/llvm/test/CodeGen/AArch64/machine-outliner.ll
> +++ b/llvm/test/CodeGen/AArch64/machine-outliner.ll
> @@ -91,19 +91,16 @@ define void @dog() #0 {
> ; ODR: [[OUTLINED]]:
> ; CHECK: .p2align 2
> ; CHECK-NEXT: [[OUTLINED]]:
> -; CHECK: mov w8, #1
> -; CHECK-NEXT: str w8, [sp, #28]
> -; CHECK-NEXT: mov w8, #2
> -; CHECK-NEXT: str w8, [sp, #24]
> -; CHECK-NEXT: mov w8, #3
> -; CHECK-NEXT: str w8, [sp, #20]
> -; CHECK-NEXT: mov w8, #4
> -; CHECK-NEXT: str w8, [sp, #16]
> -; CHECK-NEXT: mov w8, #5
> -; CHECK-NEXT: str w8, [sp, #12]
> -; CHECK-NEXT: mov w8, #6
> -; CHECK-NEXT: str w8, [sp, #8]
> -; CHECK-NEXT: add sp, sp, #32
> -; CHECK-NEXT: ret
> +; CHECK: mov w9, #1
> +; CHECK-DAG: mov w8, #2
> +; CHECK-DAG: stp w8, w9, [sp, #24]
> +; CHECK-DAG: mov w9, #3
> +; CHECK-DAG: mov w8, #4
> +; CHECK-DAG: stp w8, w9, [sp, #16]
> +; CHECK-DAG: mov w9, #5
> +; CHECK-DAG: mov w8, #6
> +; CHECK-DAG: stp w8, w9, [sp, #8]
> +; CHECK-DAG: add sp, sp, #32
> +; CHECK-DAG: ret
>
> attributes #0 = { noredzone "target-cpu"="cyclone"
> "target-features"="+sse" }
>
> diff --git a/llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
> b/llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
> new file mode 100644
> index 000000000000..018827772da5
> --- /dev/null
> +++ b/llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
> @@ -0,0 +1,471 @@
> +# RUN: llc -run-pass=aarch64-ldst-opt -mtriple=arm64-apple-iphoneos
> -verify-machineinstrs -o - %s | FileCheck %s
> +
> +---
> +# CHECK-LABEL: name: test1
> +# CHECK: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1
> +# CHECK: $x10, renamable $x8 = LDPXi renamable $x0, 0 :: (load 8)
> +# CHECK-NEXT: renamable $x9 = LDRXui renamable $x0, 1 :: (load 8)
> +# CHECK-NEXT: STRXui renamable $x9, renamable $x0, 100 :: (store 8,
> align 4)
> +# CHECK-NEXT: renamable $x8 = ADDXrr $x8, $x8
> +# CHECK-NEXT: STPXi renamable $x8, killed $x10, renamable $x0, 10 ::
> (store 8, align 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test1
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1
> + renamable $x9, renamable $x8 = LDPXi renamable $x0, 0 :: (load 8)
> + STRXui renamable killed $x9, renamable $x0, 11 :: (store 8, align 4)
> + renamable $x9 = LDRXui renamable $x0, 1 :: (load 8)
> + STRXui renamable $x9, renamable $x0, 100 :: (store 8, align 4)
> + renamable $x8 = ADDXrr $x8, $x8
> + STRXui renamable $x8, renamable $x0, 10 :: (store 8, align 4)
> + RET undef $lr
> +
> +...
> +---
> +# CHECK-LABEL: name: test2
> +# CHECK-LABEL: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x9, $x1
> +
> +# CHECK: $x10, renamable $x8 = LDPXi renamable $x9, 0 :: (load 8)
> +# CHECK-NEXT: renamable $x9 = LDRXui renamable $x0, 2 :: (load 8)
> +# CHECK-NEXT: STRXui renamable $x9, renamable $x0, 100 :: (store 8, align 4)
> +# CHECK-NEXT: renamable $x8 = ADDXrr $x8, $x8
> +# CHECK-NEXT: STPXi renamable $x8, killed $x10, renamable $x0, 10 :: (store 8, align 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test2
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x9' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x9, $x1
> + renamable $x9, renamable $x8 = LDPXi renamable $x9, 0 :: (load 8)
> + STRXui renamable killed $x9, renamable $x0, 11 :: (store 8, align 4)
> + renamable $x9 = LDRXui renamable $x0, 2 :: (load 8)
> + STRXui renamable $x9, renamable $x0, 100 :: (store 8, align 4)
> + renamable $x8 = ADDXrr $x8, $x8
> + STRXui renamable $x8, renamable $x0, 10 :: (store 8, align 4)
> + RET undef $lr
> +
> +...
> +---
> +# MOVK has a tied operand and we currently do not rename across tied defs.
> +# CHECK-LABEL: bb.0:
> +# CHECK-NEXT: liveins: $x0
> +#
> +# CHECK: renamable $x8 = MRS 58880
> +# CHECK-NEXT: renamable $x8 = MOVZXi 15309, 0
> +# CHECK-NEXT: renamable $x8 = MOVKXi renamable $x8, 26239, 16
> +# CHECK-NEXT: STRXui renamable $x8, renamable $x0, 0, implicit killed $x8 :: (store 8)
> +# CHECK-NEXT: renamable $x8 = MRS 55840
> +# CHECK-NEXT: STRXui killed renamable $x8, killed renamable $x0, 1, implicit killed $x8 :: (store 8)
> +# CHECK-NEXT: RET undef $lr
> +#
> +name: test3
> +alignment: 2
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> +frameInfo:
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0
> +
> + renamable $x8 = MRS 58880
> + renamable $x8 = MOVZXi 15309, 0
> + renamable $x8 = MOVKXi renamable $x8, 26239, 16
> + STRXui renamable $x8, renamable $x0, 0, implicit killed $x8 :: (store 8)
> + renamable $x8 = MRS 55840
> + STRXui killed renamable $x8, renamable killed $x0, 1, implicit killed $x8 :: (store 8)
> + RET undef $lr
> +
> +...
> +---
> +# CHECK-LABEL: name: test4
> +# CHECK-LABEL: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1
> +
> +# CHECK: $x9 = MRS 58880
> +# CHECK-NEXT: renamable $x8 = MRS 55840
> +# CHECK-NEXT: STPXi $x9, killed renamable $x8, killed renamable $x0, 0 :: (store 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test4
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1
> +
> + renamable $x8 = MRS 58880
> + STRXui renamable $x8, renamable $x0, 0, implicit killed $x8 :: (store 4)
> + renamable $x8 = MRS 55840
> + STRXui killed renamable $x8, renamable killed $x0, 1, implicit killed $x8 :: (store 4)
> + RET undef $lr
> +
> +...
> +---
> +# CHECK-LABEL: name: test5
> +# CHECK-LABEL: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1
> +
> +# CHECK: $x9 = MRS 58880
> +# CHECK-NEXT: renamable $x8 = MRS 55840
> +# CHECK-NEXT: STPWi $w9, killed renamable $w8, killed renamable $x0, 0 :: (store 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test5
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1
> +
> + renamable $x8 = MRS 58880
> + STRWui renamable $w8, renamable $x0, 0, implicit killed $x8 :: (store 4)
> + renamable $x8 = MRS 55840
> + STRWui killed renamable $w8, renamable killed $x0, 1, implicit killed $x8 :: (store 4)
> + RET undef $lr
> +
> +...
> +---
> +# CHECK-LABEL: name: test6
> +# CHECK-LABEL bb.0:
> +# CHECK: liveins: $x0, $x1, $q3
> +
> +# CHECK: renamable $q9 = LDRQui $x0, 0 :: (load 16)
> +# CHECK-NEXT: renamable $q9 = XTNv8i16 renamable $q9, killed renamable $q3
> +# CHECK-NEXT: STRQui renamable $q9, renamable $x0, 11 :: (store 16, align 4)
> +# CHECK-NEXT: renamable $q9 = FADDv2f64 renamable $q9, renamable $q9
> +# CHECK-NEXT: STRQui renamable $q9, renamable $x0, 10 :: (store 16, align 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +# XTN has a tied use-def.
> +name: test6
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> + - { reg: '$q3' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1, $q3
> + renamable $q9 = LDRQui $x0, 0 :: (load 16)
> + renamable $q9 = XTNv8i16 renamable $q9, killed renamable $q3
> + STRQui renamable $q9, renamable $x0, 11 :: (store 16, align 4)
> + renamable $q9 = FADDv2f64 renamable $q9, renamable $q9
> + STRQui renamable $q9, renamable $x0, 10 :: (store 16, align 4)
> + RET undef $lr
> +
> +...
> +---
> +# Currently we do not rename across frame-setup instructions.
> +# CHECK-LABEL: name: test7
> +# CHECK-LABEL: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1
> +
> +# CHECK: $sp = frame-setup SUBXri $sp, 64, 0
> +# CHECK-NEXT: renamable $x9 = frame-setup LDRXui renamable $x0, 0 :: (load 8)
> +# CHECK-NEXT: STRXui renamable $x9, $x0, 10 :: (store 8, align 4)
> +# CHECK-NEXT: renamable $x9 = LDRXui renamable $x0, 1 :: (load 8)
> +# CHECK-NEXT: STRXui renamable $x9, $x0, 11 :: (store 8, align 4)
> +# CHECK-NEXT: RET undef $lr
> +#
> +name: test7
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + stackSize: 64
> + maxAlignment: 16
> + adjustsStack: true
> + hasCalls: true
> + maxCallFrameSize: 0
> +stack:
> + - { id: 0, type: spill-slot, offset: -48, size: 16, alignment: 16 }
> + - { id: 1, type: spill-slot, offset: -64, size: 16, alignment: 16 }
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1
> + $sp = frame-setup SUBXri $sp, 64, 0
> + renamable $x9 = frame-setup LDRXui renamable $x0, 0 :: (load 8)
> + STRXui renamable $x9, $x0, 10 :: (store 8, align 4)
> + renamable $x9 = LDRXui renamable $x0, 1 :: (load 8)
> + STRXui renamable $x9, $x0, 11 :: (store 8, align 4)
> + RET undef $lr
> +...
> +---
> +# CHECK-LABEL: name: test8
> +# CHECK-LABEL: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1
> +
> +# CHECK: renamable $x8 = MRS 58880
> +# CHECK-NEXT: $w9 = ORRWrs $wzr, killed renamable $w8, 0, implicit-def $x9
> +# CHECK-NEXT: renamable $x8 = MRS 55840
> +# CHECK-NEXT: STPWi $w9, killed renamable $w8, killed renamable $x0, 0 :: (store 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test8
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1
> +
> + renamable $x8 = MRS 58880
> + renamable $w8 = ORRWrs $wzr, killed renamable $w8, 0, implicit-def $x8
> + STRWui renamable $w8, renamable $x0, 0, implicit killed $x8 :: (store 4)
> + renamable $x8 = MRS 55840
> + STRWui killed renamable $w8, renamable killed $x0, 1, implicit killed $x8 :: (store 4)
> + RET undef $lr
> +
> +...
> +---
> +# The reg class returned for $q9 contains only the first 16 Q registers.
> +# TODO: Can we check that all instructions that require renaming also support
> +# the second 16 Q registers?
> +# CHECK-LABEL: name: test9
> +# CHECK-LABEL bb.0:
> +# CHECK: liveins: $x0, $x1, $q0, $q1, $q2, $q3, $q4, $q5, $q6, $q7
> +
> +# CHECK: renamable $q9 = LDRQui $x0, 0 :: (load 16)
> +# CHECK-NEXT: STRQui killed renamable $q9, renamable $x0, 10 :: (store 16, align 4)
> +# CHECK: renamable $q9 = LDRQui $x0, 1 :: (load 16)
> +# CHECK-NEXT: STRQui renamable $q9, renamable $x0, 11 :: (store 16, align 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test9
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> + - { reg: '$q3' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1, $q0, $q1, $q2, $q3, $q4, $q5, $q6, $q7
> + renamable $q9 = LDRQui $x0, 0 :: (load 16)
> + STRQui renamable killed $q9, renamable $x0, 10 :: (store 16, align 4)
> + renamable $q9 = LDRQui $x0, 1 :: (load 16)
> + STRQui renamable $q9, renamable $x0, 11 :: (store 16, align 4)
> + RET undef $lr
> +
> +...
> +---
> +# The livein $q7 is killed early, so we can re-use it for renaming.
> +# CHECK-LABEL: name: test10
> +# CHECK-LABEL bb.0:
> +# CHECK: liveins: $x0, $x1, $q0, $q1, $q2, $q3, $q4, $q5, $q6, $q7
> +
> +# CHECK: renamable $q7 = FADDv2f64 renamable $q7, renamable $q7
> +# CHECK-NEXT: STRQui killed renamable $q7, renamable $x0, 100 :: (store 16, align 4)
> +# CHECK-NEXT: $q7 = LDRQui $x0, 0 :: (load 16)
> +# CHECK-NEXT: renamable $q9 = LDRQui $x0, 1 :: (load 16)
> +# CHECK-NEXT: STPQi killed renamable $q9, killed $q7, renamable $x0, 10 :: (store 16, align 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test10
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> + - { reg: '$q3' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1, $q0, $q1, $q2, $q3, $q4, $q5, $q6, $q7
> + renamable $q7 = FADDv2f64 renamable $q7, renamable $q7
> + STRQui renamable killed $q7, renamable $x0, 100 :: (store 16, align 4)
> + renamable $q9 = LDRQui $x0, 0 :: (load 16)
> + STRQui renamable killed $q9, renamable $x0, 11 :: (store 16, align 4)
> + renamable $q9 = LDRQui $x0, 1 :: (load 16)
> + STRQui renamable killed $q9, renamable $x0, 10 :: (store 16, align 4)
> + RET undef $lr
> +
> +...
> +---
> +# Make sure we do not use any registers that are defined between paired candidates
> +# ($x14 in this example)
> +# CHECK-LABEL: name: test11
> +# CHECK: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1, $x11, $x12, $x13
> +
> +# CHECK: renamable $w10 = LDRWui renamable $x0, 0 :: (load 8)
> +# CHECK-NEXT: $x15, renamable $x8 = LDPXi renamable $x0, 1 :: (load 8)
> +# CHECK-NEXT: renamable $x9 = LDRXui renamable $x0, 3 :: (load 8)
> +# CHECK-NEXT: renamable $x14 = LDRXui renamable $x0, 5 :: (load 8)
> +# CHECK-NEXT: STPXi renamable $x9, killed $x15, renamable $x0, 10 :: (store 8, align 4)
> +# CHECK-NEXT: STRXui killed renamable $x14, renamable $x0, 200 :: (store 8, align 4)
> +# CHECK-NEXT: renamable $w8 = ADDWrr $w10, $w10
> +# CHECK-NEXT: STRWui renamable $w8, renamable $x0, 100 :: (store 8, align 4)
> +# CHECK-NEXT: RET undef $lr
> +#
> +name: test11
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1, $x11, $x12, $x13
> + renamable $w10 = LDRWui renamable $x0, 0 :: (load 8)
> + renamable $x9, renamable $x8 = LDPXi renamable $x0, 1 :: (load 8)
> + STRXui renamable killed $x9, renamable $x0, 11 :: (store 8, align 4)
> + renamable $x9 = LDRXui renamable $x0, 3 :: (load 8)
> + renamable $x14 = LDRXui renamable $x0, 5 :: (load 8)
> + STRXui renamable $x9, renamable $x0, 10 :: (store 8, align 4)
> + STRXui renamable killed $x14, renamable $x0, 200 :: (store 8, align 4)
> + renamable $w8 = ADDWrr $w10, $w10
> + STRWui renamable $w8, renamable $x0, 100 :: (store 8, align 4)
> + RET undef $lr
> +
> +...
> +---
> +# Check that we correctly deal with killed registers in stores that get merged forward,
> +# which extends the live range of the first store operand.
> +# CHECK-LABEL: name: test12
> +# CHECK: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1
> +#
> +# CHECK: renamable $x10 = LDRXui renamable $x0, 0 :: (load 8)
> +# CHECK-NEXT: $x11, renamable $x8 = LDPXi renamable $x0, 3 :: (load 8)
> +# CHECK-NEXT: renamable $x9 = LDRXui renamable $x0, 2 :: (load 8)
> +# CHECK-NEXT: renamable $x8 = ADDXrr $x8, $x8
> +# CHECK-NEXT: STPXi renamable $x8, killed $x11, renamable $x0, 10 :: (store 8, align 4)
> +# CHECK-NEXT: STPXi killed renamable $x10, renamable $x9, renamable $x0, 20 :: (store 8, align 4)
> +# CHECK-NEXT: RET undef $lr
> +
> +name: test12
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1
> + renamable $x10 = LDRXui renamable $x0, 0 :: (load 8)
> + STRXui renamable killed $x10, renamable $x0, 20 :: (store 8, align 4)
> + renamable $x9, renamable $x8 = LDPXi renamable $x0, 3 :: (load 8)
> + STRXui renamable killed $x9, renamable $x0, 11 :: (store 8, align 4)
> + renamable $x9 = LDRXui renamable $x0, 2 :: (load 8)
> + renamable $x8 = ADDXrr $x8, $x8
> + STRXui renamable $x8, renamable $x0, 10 :: (store 8, align 4)
> + STRXui renamable $x9, renamable $x0, 21 :: (store 8, align 4)
> + RET undef $lr
> +
> +...
> +---
> +# Make sure we do not use any registers that are defined between def to rename and the first
> +# paired store. ($x14 in this example)
> +# CHECK-LABEL: name: test13
> +# CHECK: bb.0:
> +# CHECK-NEXT: liveins: $x0, $x1, $x10, $x11, $x12, $x13
> +# CHECK: $x15, renamable $x8 = LDPXi renamable $x0, 0 :: (load 8)
> +# CHECK-NEXT: renamable $x14 = LDRXui renamable $x0, 4 :: (load 8)
> +# CHECK-NEXT: STRXui killed renamable $x14, renamable $x0, 100 :: (store 8, align 4)
> +# CHECK-NEXT: renamable $x9 = LDRXui renamable $x0, 2 :: (load 8)
> +# CHECK-NEXT: STPXi renamable $x9, killed $x15, renamable $x0, 10 :: (store 8, align 4)
> +# CHECK-NEXT: RET undef $lr
> +#
> +name: test13
> +alignment: 4
> +tracksRegLiveness: true
> +liveins:
> + - { reg: '$x0' }
> + - { reg: '$x1' }
> + - { reg: '$x8' }
> +frameInfo:
> + maxAlignment: 1
> + maxCallFrameSize: 0
> +machineFunctionInfo: {}
> +body: |
> + bb.0:
> + liveins: $x0, $x1, $x10, $x11, $x12, $x13
> + renamable $x9, renamable $x8 = LDPXi renamable $x0, 0 :: (load 8)
> + renamable $x14 = LDRXui renamable $x0, 4 :: (load 8)
> + STRXui renamable killed $x14, renamable $x0, 100 :: (store 8, align 4)
> + STRXui renamable killed $x9, renamable $x0, 11 :: (store 8, align 4)
> + renamable $x9 = LDRXui renamable $x0, 2 :: (load 8)
> + STRXui renamable $x9, renamable $x0, 10 :: (store 8)
> + RET undef $lr
> +
> +...
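
The constraint that test11 and test13 exercise (do not pick a rename register that is live-in or defined between the def to rename and the paired store) can be modelled at the register level. The following is only a rough Python sketch under stated assumptions, not the logic in AArch64LoadStoreOptimizer.cpp: `Instr`, `pick_rename_register`, and the x8..x15 candidate order are hypothetical names and choices made for illustration, and the model ignores kill flags (so it would not reproduce the $q7 reuse in test10), callee-saved registers, and sub/super-register aliasing.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Instr(NamedTuple):
    """Hypothetical, simplified instruction: registers written and read."""
    defs: Set[str]
    uses: Set[str]


def pick_rename_register(
    window: List[Instr],
    live_in: Set[str],
    candidates: Iterable[str],
) -> Optional[str]:
    """Return the first candidate that is neither live-in nor touched in `window`."""
    touched: Set[str] = set()
    for instr in window:
        touched |= instr.defs | instr.uses
    for reg in candidates:
        if reg not in live_in and reg not in touched:
            return reg
    return None


# Register-level model of the test13 input, from the LDPXi that defines $x9
# through the second store of the pair.
test13_window = [
    Instr(defs={"x9", "x8"}, uses={"x0"}),   # $x9, $x8 = LDPXi $x0, 0
    Instr(defs={"x14"}, uses={"x0"}),        # $x14 = LDRXui $x0, 4
    Instr(defs=set(), uses={"x14", "x0"}),   # STRXui killed $x14, $x0, 100
    Instr(defs=set(), uses={"x9", "x0"}),    # STRXui killed $x9, $x0, 11
    Instr(defs={"x9"}, uses={"x0"}),         # $x9 = LDRXui $x0, 2
    Instr(defs=set(), uses={"x9", "x0"}),    # STRXui $x9, $x0, 10
]
test13_live_in = {"x0", "x1", "x10", "x11", "x12", "x13"}
candidates = [f"x{n}" for n in range(8, 16)]  # assumed scan order, x8..x15

chosen = pick_rename_register(test13_window, test13_live_in, candidates)
print(chosen)  # x15
```

Under these assumptions the model rejects x8, x9 and x14 (touched in the window) and x10 to x13 (live-in), leaving x15, which is the register that appears in the test13 CHECK lines.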