[llvm] r283867 - [Thumb] Save/restore high registers in Thumb1 pro/epilogues
Reid Kleckner via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 11 14:05:14 PDT 2016
Reverted in 283938
On Tue, Oct 11, 2016 at 1:39 PM, Reid Kleckner <rnk at google.com> wrote:
> This change causes the test suite to time out on 32-bit Windows:
> http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/1040
>
> I'm trying to reproduce locally, and will revert or fix depending on the
> complexity.
>
> On Tue, Oct 11, 2016 at 3:12 AM, Oliver Stannard via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Author: olista01
>> Date: Tue Oct 11 05:12:25 2016
>> New Revision: 283867
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=283867&view=rev
>> Log:
>> [Thumb] Save/restore high registers in Thumb1 pro/epilogues
>>
>> The high registers are not allocatable in Thumb1 functions, but they
>> could still be used by inline assembly, so we need to save and restore
>> the callee-saved high registers (r8-r11) in the prologue and epilogue.
>>
>> This is complicated by the fact that the Thumb1 push and pop
>> instructions cannot access these registers. Therefore, we have to move
>> them down into low registers before pushing, and move them back after
>> popping into low registers.
>>
>> In most functions, we will have low registers that are also being
>> pushed/popped, which we can use as the temporary registers for
>> saving/restoring the high registers. However, this is not guaranteed, so
>> we may need to push some extra low registers to ensure that the high
>> registers can be saved/restored. For correctness, it would be sufficient
>> to use just one low register, but if we have enough low registers
>> available then we only need one push/pop instruction, rather than one
>> per high register.
>>
>> We can also use the argument/return registers when they are not live,
>> and the link register when saving (but not restoring), reducing the
>> number of extra registers we need to push.
>>
>> There are still a few extreme edge cases where we need two push/pop
>> instructions, because not enough low registers can be made live in the
>> prologue or epilogue.
>>
>> In addition to the regression tests included here, I've also tested this
>> using a script to generate functions which clobber different
>> combinations of registers, have different numbers of argument and return
>> registers (including variadic arguments), allocate different fixed sized
>> objects on the stack, and do or don't use variable sized allocas and the
>> __builtin_return_address intrinsic (all of which affect the available
>> registers in the prologue and epilogue). I ran these functions in a test
>> harness which verifies that all of the callee-saved registers are
>> correctly preserved.
>>
>> Differential Revision: https://reviews.llvm.org/D24228
>>
>>
>> Added:
>> llvm/trunk/test/CodeGen/Thumb/callee_save.ll
>> Modified:
>> llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp
>> llvm/trunk/lib/Target/ARM/ARMSubtarget.h
>> llvm/trunk/lib/Target/ARM/Thumb1FrameLowering.cpp
>> llvm/trunk/test/CodeGen/Thumb/large-stack.ll
>> llvm/trunk/test/CodeGen/Thumb2/frame-pointer.ll
>>
>> Modified: llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AR
>> M/ARMFrameLowering.cpp?rev=283867&r1=283866&r2=283867&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp Tue Oct 11 05:12:25
>> 2016
>> @@ -30,6 +30,8 @@
>> #include "llvm/Support/CommandLine.h"
>> #include "llvm/Target/TargetOptions.h"
>>
>> +#define DEBUG_TYPE "arm-frame-lowering"
>> +
>> using namespace llvm;
>>
>> static cl::opt<bool>
>> @@ -1485,6 +1487,7 @@ void ARMFrameLowering::determineCalleeSa
>> ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
>> MachineFrameInfo &MFI = MF.getFrameInfo();
>> MachineRegisterInfo &MRI = MF.getRegInfo();
>> + const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
>> unsigned FramePtr = RegInfo->getFrameRegister(MF);
>>
>> // Spill R4 if Thumb2 function requires stack realignment - it will be
>> used as
>> @@ -1640,6 +1643,9 @@ void ARMFrameLowering::determineCalleeSa
>> SavedRegs.set(ARM::LR);
>> LRSpilled = true;
>> NumGPRSpills++;
>> + auto LRPos = find(UnspilledCS1GPRs, ARM::LR);
>> + if (LRPos != UnspilledCS1GPRs.end())
>> + UnspilledCS1GPRs.erase(LRPos);
>> }
>> auto FPPos = find(UnspilledCS1GPRs, FramePtr);
>> if (FPPos != UnspilledCS1GPRs.end())
>> @@ -1649,6 +1655,116 @@ void ARMFrameLowering::determineCalleeSa
>> CS1Spilled = true;
>> }
>>
>> + if (AFI->isThumb1OnlyFunction()) {
>> + // For Thumb1-only targets, we need some low registers when we
>> save and
>> + // restore the high registers (which aren't allocatable, but could
>> be
>> + // used by inline assembly) because the push/pop instructions can
>> not
>> + // access high registers. If necessary, we might need to push more
>> low
>> + // registers to ensure that there is at least one free that can be
>> used
>> + // for the saving & restoring, and preferably we should ensure
>> that as
>> + // many as are needed are available so that fewer push/pop
>> instructions
>> + // are required.
>> +
>> + // Low registers which are not currently pushed, but could be
>> (r4-r7).
>> + SmallVector<unsigned, 4> AvailableRegs;
>> +
>> + // Unused argument registers (r0-r3) can be clobbered in the
>> prologue for
>> + // free.
>> + int EntryRegDeficit = 0;
>> + for (unsigned Reg : {ARM::R0, ARM::R1, ARM::R2, ARM::R3}) {
>> + if (!MF.getRegInfo().isLiveIn(Reg)) {
>> + --EntryRegDeficit;
>> + DEBUG(dbgs() << PrintReg(Reg, TRI)
>> + << " is unused argument register, EntryRegDeficit
>> = "
>> + << EntryRegDeficit << "\n");
>> + }
>> + }
>> +
>> + // Unused return registers can be clobbered in the epilogue for
>> free.
>> + int ExitRegDeficit = AFI->getReturnRegsCount() - 4;
>> + DEBUG(dbgs() << AFI->getReturnRegsCount()
>> + << " return regs used, ExitRegDeficit = " <<
>> ExitRegDeficit
>> + << "\n");
>> +
>> + int RegDeficit = std::max(EntryRegDeficit, ExitRegDeficit);
>> + DEBUG(dbgs() << "RegDeficit = " << RegDeficit << "\n");
>> +
>> + // r4-r6 can be used in the prologue if they are pushed by the
>> first push
>> + // instruction.
>> + for (unsigned Reg : {ARM::R4, ARM::R5, ARM::R6}) {
>> + if (SavedRegs.test(Reg)) {
>> + --RegDeficit;
>> + DEBUG(dbgs() << PrintReg(Reg, TRI)
>> + << " is saved low register, RegDeficit = " <<
>> RegDeficit
>> + << "\n");
>> + } else {
>> + AvailableRegs.push_back(Reg);
>> + DEBUG(dbgs()
>> + << PrintReg(Reg, TRI)
>> + << " is non-saved low register, adding to
>> AvailableRegs\n");
>> + }
>> + }
>> +
>> + // r7 can be used if it is not being used as the frame pointer.
>> + if (!hasFP(MF)) {
>> + if (SavedRegs.test(ARM::R7)) {
>> + --RegDeficit;
>> + DEBUG(dbgs() << "%R7 is saved low register, RegDeficit = "
>> + << RegDeficit << "\n");
>> + } else {
>> + AvailableRegs.push_back(ARM::R7);
>> + DEBUG(dbgs()
>> + << "%R7 is non-saved low register, adding to
>> AvailableRegs\n");
>> + }
>> + }
>> +
>> + // Each of r8-r11 needs to be copied to a low register, then
>> pushed.
>> + for (unsigned Reg : {ARM::R8, ARM::R9, ARM::R10, ARM::R11}) {
>> + if (SavedRegs.test(Reg)) {
>> + ++RegDeficit;
>> + DEBUG(dbgs() << PrintReg(Reg, TRI)
>> + << " is saved high register, RegDeficit = " <<
>> RegDeficit
>> + << "\n");
>> + }
>> + }
>> +
>> + // LR can only be used by PUSH, not POP, and can't be used at all
>> if the
>> + // llvm.returnaddress intrinsic is used. This is only worth doing
>> if we
>> + // are more limited at function entry than exit.
>> + if ((EntryRegDeficit > ExitRegDeficit) &&
>> + !(MF.getRegInfo().isLiveIn(ARM::LR) &&
>> + MF.getFrameInfo().isReturnAddressTaken())) {
>> + if (SavedRegs.test(ARM::LR)) {
>> + --RegDeficit;
>> + DEBUG(dbgs() << "%LR is saved register, RegDeficit = " <<
>> RegDeficit
>> + << "\n");
>> + } else {
>> + AvailableRegs.push_back(ARM::LR);
>> + DEBUG(dbgs() << "%LR is not saved, adding to AvailableRegs\n");
>> + }
>> + }
>> +
>> + // If there are more high registers that need pushing than low
>> registers
>> + // available, push some more low registers so that we can use
>> fewer push
>> + // instructions. This might not reduce RegDeficit all the way to
>> zero,
>> + // because we can only guarantee that r4-r6 are available, but
>> r8-r11 may
>> + // need saving.
>> + DEBUG(dbgs() << "Final RegDeficit = " << RegDeficit << "\n");
>> + for (; RegDeficit > 0 && !AvailableRegs.empty(); --RegDeficit) {
>> + unsigned Reg = AvailableRegs.pop_back_val();
>> + DEBUG(dbgs() << "Spilling " << PrintReg(Reg, TRI)
>> + << " to make up reg deficit\n");
>> + SavedRegs.set(Reg);
>> + NumGPRSpills++;
>> + CS1Spilled = true;
>> + ExtraCSSpill = true;
>> + UnspilledCS1GPRs.erase(find(UnspilledCS1GPRs, Reg));
>> + if (Reg == ARM::LR)
>> + LRSpilled = true;
>> + }
>> + DEBUG(dbgs() << "After adding spills, RegDeficit = " << RegDeficit
>> << "\n");
>> + }
>> +
>> // If LR is not spilled, but at least one of R4, R5, R6, and R7 is
>> spilled.
>> // Spill LR as well so we can fold BX_RET to the registers restore
>> (LDM).
>> if (!LRSpilled && CS1Spilled) {
>> @@ -1666,6 +1782,7 @@ void ARMFrameLowering::determineCalleeSa
>> // If stack and double are 8-byte aligned and we are spilling an odd
>> number
>> // of GPRs, spill one extra callee save GPR so we won't have to pad
>> between
>> // the integer and double callee save areas.
>> + DEBUG(dbgs() << "NumGPRSpills = " << NumGPRSpills << "\n");
>> unsigned TargetAlign = getStackAlignment();
>> if (TargetAlign >= 8 && (NumGPRSpills & 1)) {
>> if (CS1Spilled && !UnspilledCS1GPRs.empty()) {
>> @@ -1677,6 +1794,8 @@ void ARMFrameLowering::determineCalleeSa
>> (STI.isTargetWindows() && Reg == ARM::R11) ||
>> isARMLowRegister(Reg) || Reg == ARM::LR) {
>> SavedRegs.set(Reg);
>> + DEBUG(dbgs() << "Spilling " << PrintReg(Reg, TRI)
>> + << " to make up alignment\n");
>> if (!MRI.isReserved(Reg))
>> ExtraCSSpill = true;
>> break;
>> @@ -1685,6 +1804,8 @@ void ARMFrameLowering::determineCalleeSa
>> } else if (!UnspilledCS2GPRs.empty() &&
>> !AFI->isThumb1OnlyFunction()) {
>> unsigned Reg = UnspilledCS2GPRs.front();
>> SavedRegs.set(Reg);
>> + DEBUG(dbgs() << "Spilling " << PrintReg(Reg, TRI)
>> + << " to make up alignment\n");
>> if (!MRI.isReserved(Reg))
>> ExtraCSSpill = true;
>> }
>>
>> Modified: llvm/trunk/lib/Target/ARM/ARMSubtarget.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AR
>> M/ARMSubtarget.h?rev=283867&r1=283866&r2=283867&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/lib/Target/ARM/ARMSubtarget.h (original)
>> +++ llvm/trunk/lib/Target/ARM/ARMSubtarget.h Tue Oct 11 05:12:25 2016
>> @@ -568,10 +568,12 @@ public:
>> }
>> /// Returns true if the frame setup is split into two separate pushes
>> (first
>> /// r0-r7,lr then r8-r11), principally so that the frame pointer is
>> adjacent
>> - /// to lr.
>> + /// to lr. This is always required on Thumb1-only targets, as the push
>> and
>> + /// pop instructions can't access the high registers.
>> bool splitFramePushPop(const MachineFunction &MF) const {
>> - return useR7AsFramePointer() &&
>> - MF.getTarget().Options.DisableFramePointerElim(MF);
>> + return (useR7AsFramePointer() &&
>> + MF.getTarget().Options.DisableFramePointerElim(MF)) ||
>> + isThumb1Only();
>> }
>>
>> bool useStride4VFPs(const MachineFunction &MF) const;
>>
>> Modified: llvm/trunk/lib/Target/ARM/Thumb1FrameLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AR
>> M/Thumb1FrameLowering.cpp?rev=283867&r1=283866&r2=283867&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/lib/Target/ARM/Thumb1FrameLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/ARM/Thumb1FrameLowering.cpp Tue Oct 11
>> 05:12:25 2016
>> @@ -188,7 +188,8 @@ void Thumb1FrameLowering::emitPrologue(M
>>
>> int FramePtrOffsetInBlock = 0;
>> unsigned adjustedGPRCS1Size = GPRCS1Size;
>> - if (tryFoldSPUpdateIntoPushPop(STI, MF, &*std::prev(MBBI), NumBytes))
>> {
>> + if (GPRCS1Size > 0 && GPRCS2Size == 0 &&
>> + tryFoldSPUpdateIntoPushPop(STI, MF, &*std::prev(MBBI), NumBytes))
>> {
>> FramePtrOffsetInBlock = NumBytes;
>> adjustedGPRCS1Size += NumBytes;
>> NumBytes = 0;
>> @@ -261,6 +262,48 @@ void Thumb1FrameLowering::emitPrologue(M
>> AFI->setShouldRestoreSPFromFP(true);
>> }
>>
>> + // Skip past the spilling of r8-r11, which could consist of multiple
>> tPUSH
>> + // and tMOVr instructions. We don't need to add any call frame
>> information
>> + // in-between these instructions, because they do not modify the high
>> + // registers.
>> + while (true) {
>> + MachineBasicBlock::iterator OldMBBI = MBBI;
>> + // Skip a run of tMOVr instructions
>> + while (MBBI != MBB.end() && MBBI->getOpcode() == ARM::tMOVr)
>> + MBBI++;
>> + if (MBBI != MBB.end() && MBBI->getOpcode() == ARM::tPUSH) {
>> + MBBI++;
>> + } else {
>> + // We have reached an instruction which is not a push, so the
>> previous
>> + // run of tMOVr instructions (which may have been empty) was not
>> part of
>> + // the prologue. Reset MBBI back to the last PUSH of the prologue.
>> + MBBI = OldMBBI;
>> + break;
>> + }
>> + }
>> +
>> + // Emit call frame information for the callee-saved high registers.
>> + for (auto &I : CSI) {
>> + unsigned Reg = I.getReg();
>> + int FI = I.getFrameIdx();
>> + switch (Reg) {
>> + case ARM::R8:
>> + case ARM::R9:
>> + case ARM::R10:
>> + case ARM::R11:
>> + case ARM::R12: {
>> + unsigned CFIIndex = MMI.addFrameInst(MCCFIInstruct
>> ion::createOffset(
>> + nullptr, MRI->getDwarfRegNum(Reg, true),
>> MFI.getObjectOffset(FI)));
>> + BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
>> + .addCFIIndex(CFIIndex)
>> + .setMIFlags(MachineInstr::FrameSetup);
>> + break;
>> + }
>> + default:
>> + break;
>> + }
>> + }
>> +
>> if (NumBytes) {
>> // Insert it after all the callee-save spills.
>> emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, -NumBytes,
>> @@ -308,12 +351,12 @@ static bool isCSRestore(MachineInstr &MI
>> isCalleeSavedRegister(MI.getOperand(0).getReg(), CSRegs))
>> return true;
>> else if (MI.getOpcode() == ARM::tPOP) {
>> - // The first two operands are predicates. The last two are
>> - // imp-def and imp-use of SP. Check everything in between.
>> - for (int i = 2, e = MI.getNumOperands() - 2; i != e; ++i)
>> - if (!isCalleeSavedRegister(MI.getOperand(i).getReg(), CSRegs))
>> - return false;
>> return true;
>> + } else if (MI.getOpcode() == ARM::tMOVr) {
>> + unsigned Dst = MI.getOperand(0).getReg();
>> + unsigned Src = MI.getOperand(1).getReg();
>> + return ((ARM::tGPRRegClass.contains(Src) || Src == ARM::LR) &&
>> + ARM::hGPRRegClass.contains(Dst));
>> }
>> return false;
>> }
>> @@ -568,6 +611,19 @@ bool Thumb1FrameLowering::emitPopSpecial
>> return true;
>> }
>>
>> +// Return the first iteraror after CurrentReg which is present in
>> EnabledRegs,
>> +// or OrderEnd if no further registers are in that set. This does not
>> advance
>> +// the iterator fiorst, so returns CurrentReg if it is in EnabledRegs.
>> +template <unsigned SetSize>
>> +static ArrayRef<unsigned>::const_iterator
>> +findNextOrderedReg(ArrayRef<unsigned>::const_iterator CurrentReg,
>> + SmallSet<unsigned, SetSize> &EnabledRegs,
>> + ArrayRef<unsigned>::const_iterator OrderEnd) {
>> + while (CurrentReg != OrderEnd && !EnabledRegs.count(*CurrentReg))
>> + ++CurrentReg;
>> + return CurrentReg;
>> +}
>> +
>> bool Thumb1FrameLowering::
>> spillCalleeSavedRegisters(MachineBasicBlock &MBB,
>> MachineBasicBlock::iterator MI,
>> @@ -578,29 +634,111 @@ spillCalleeSavedRegisters(MachineBasicBl
>>
>> DebugLoc DL;
>> const TargetInstrInfo &TII = *STI.getInstrInfo();
>> + MachineFunction &MF = *MBB.getParent();
>> + const ARMBaseRegisterInfo *RegInfo = static_cast<const
>> ARMBaseRegisterInfo *>(
>> + MF.getSubtarget().getRegisterInfo());
>> +
>> + SmallSet<unsigned, 9> LoRegsToSave; // r0-r7, lr
>> + SmallSet<unsigned, 4> HiRegsToSave; // r8-r11
>> + SmallSet<unsigned, 9> CopyRegs; // Registers which can be used after
>> pushing
>> + // LoRegs for saving HiRegs.
>>
>> - MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(ARM::tPUSH));
>> - AddDefaultPred(MIB);
>> for (unsigned i = CSI.size(); i != 0; --i) {
>> unsigned Reg = CSI[i-1].getReg();
>> - bool isKill = true;
>>
>> - // Add the callee-saved register as live-in unless it's LR and
>> - // @llvm.returnaddress is called. If LR is returned for
>> @llvm.returnaddress
>> - // then it's already added to the function and entry block live-in
>> sets.
>> - if (Reg == ARM::LR) {
>> - MachineFunction &MF = *MBB.getParent();
>> - if (MF.getFrameInfo().isReturnAddressTaken() &&
>> - MF.getRegInfo().isLiveIn(Reg))
>> - isKill = false;
>> + if (ARM::tGPRRegClass.contains(Reg) || Reg == ARM::LR) {
>> + LoRegsToSave.insert(Reg);
>> + } else if (ARM::hGPRRegClass.contains(Reg) && Reg != ARM::LR) {
>> + HiRegsToSave.insert(Reg);
>> + } else {
>> + llvm_unreachable("callee-saved register of unexpected class");
>> }
>>
>> - if (isKill)
>> - MBB.addLiveIn(Reg);
>> + if ((ARM::tGPRRegClass.contains(Reg) || Reg == ARM::LR) &&
>> + !MF.getRegInfo().isLiveIn(Reg) &&
>> + !(hasFP(MF) && Reg == RegInfo->getFrameRegister(MF)))
>> + CopyRegs.insert(Reg);
>> + }
>> +
>> + // Unused argument registers can be used for the high register saving.
>> + for (unsigned ArgReg : {ARM::R0, ARM::R1, ARM::R2, ARM::R3})
>> + if (!MF.getRegInfo().isLiveIn(ArgReg))
>> + CopyRegs.insert(ArgReg);
>> +
>> + // Push the low registers and lr
>> + if (!LoRegsToSave.empty()) {
>> + MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(ARM::tPUSH));
>> + AddDefaultPred(MIB);
>> + for (unsigned Reg : {ARM::R4, ARM::R5, ARM::R6, ARM::R7, ARM::LR}) {
>> + if (LoRegsToSave.count(Reg)) {
>> + bool isKill = !MF.getRegInfo().isLiveIn(Reg);
>> + if (isKill)
>> + MBB.addLiveIn(Reg);
>>
>> - MIB.addReg(Reg, getKillRegState(isKill));
>> + MIB.addReg(Reg, getKillRegState(isKill));
>> + }
>> + }
>> + MIB.setMIFlags(MachineInstr::FrameSetup);
>> }
>> - MIB.setMIFlags(MachineInstr::FrameSetup);
>> +
>> + // Push the high registers. There are no store instructions that can
>> access
>> + // these registers directly, so we have to move them to low registers,
>> and
>> + // push them. This might take multiple pushes, as it is possible for
>> there to
>> + // be fewer low registers available than high registers which need
>> saving.
>> +
>> + // These are in reverse order so that in the case where we need to use
>> + // multiple PUSH instructions, the order of the registers on the stack
>> still
>> + // matches the unwind info. They need to be swicthed back to ascending
>> order
>> + // before adding to the PUSH instruction.
>> + ArrayRef<unsigned> AllCopyRegs({ARM::LR,
>> + ARM::R7, ARM::R6, ARM::R5, ARM::R4,
>> + ARM::R3, ARM::R2, ARM::R1, ARM::R0});
>> + ArrayRef<unsigned> AllHighRegs({ARM::R11, ARM::R10, ARM::R9, ARM::R8});
>> +
>> + // Find the first register to save.
>> + auto HiRegToSave =
>> + findNextOrderedReg(AllHighRegs.begin(), HiRegsToSave,
>> AllHighRegs.end());
>> +
>> + while (HiRegToSave != AllHighRegs.end()) {
>> + // Find the first low register to use.
>> + auto CopyReg =
>> + findNextOrderedReg(AllCopyRegs.begin(), CopyRegs,
>> AllCopyRegs.end());
>> +
>> + // Create the PUSH, but don't insert it yet (the MOVs need to come
>> first).
>> + MachineInstrBuilder PushMIB = BuildMI(MF, DL, TII.get(ARM::tPUSH));
>> + AddDefaultPred(PushMIB);
>> +
>> + SmallVector<unsigned, 4> RegsToPush;
>> + while (HiRegToSave != AllHighRegs.end() && CopyReg !=
>> AllCopyRegs.end()) {
>> + if (HiRegsToSave.count(*HiRegToSave)) {
>> + bool isKill = !MF.getRegInfo().isLiveIn(*HiRegToSave);
>> + if (isKill)
>> + MBB.addLiveIn(*HiRegToSave);
>> +
>> + // Emit a MOV from the high reg to the low reg.
>> + MachineInstrBuilder MIB =
>> + BuildMI(MBB, MI, DL, TII.get(ARM::tMOVr));
>> + MIB.addReg(*CopyReg, RegState::Define);
>> + MIB.addReg(*HiRegToSave, getKillRegState(isKill));
>> + AddDefaultPred(MIB);
>> +
>> + // Record the register that must be added to the PUSH.
>> + RegsToPush.push_back(*CopyReg);
>> +
>> + CopyReg = findNextOrderedReg(++CopyReg, CopyRegs,
>> AllCopyRegs.end());
>> + HiRegToSave =
>> + findNextOrderedReg(++HiRegToSave, HiRegsToSave,
>> AllHighRegs.end());
>> + }
>> + }
>> +
>> + // Add the low registers to the PUSH, in ascending order.
>> + for (unsigned Reg : reverse(RegsToPush))
>> + PushMIB.addReg(Reg, RegState::Kill);
>> +
>> + // Insert the PUSH instruction after the MOVs.
>> + MBB.insert(MI, PushMIB);
>> + }
>> +
>> return true;
>> }
>>
>> @@ -615,15 +753,98 @@ restoreCalleeSavedRegisters(MachineBasic
>> MachineFunction &MF = *MBB.getParent();
>> ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
>> const TargetInstrInfo &TII = *STI.getInstrInfo();
>> + const ARMBaseRegisterInfo *RegInfo = static_cast<const
>> ARMBaseRegisterInfo *>(
>> + MF.getSubtarget().getRegisterInfo());
>>
>> bool isVarArg = AFI->getArgRegsSaveSize() > 0;
>> DebugLoc DL = MI != MBB.end() ? MI->getDebugLoc() : DebugLoc();
>> +
>> + SmallSet<unsigned, 9> LoRegsToRestore;
>> + SmallSet<unsigned, 4> HiRegsToRestore;
>> + // Low registers (r0-r7) which can be used to restore the high
>> registers.
>> + SmallSet<unsigned, 9> CopyRegs;
>> +
>> + for (CalleeSavedInfo I : CSI) {
>> + unsigned Reg = I.getReg();
>> +
>> + if (ARM::tGPRRegClass.contains(Reg) || Reg == ARM::LR) {
>> + LoRegsToRestore.insert(Reg);
>> + } else if (ARM::hGPRRegClass.contains(Reg) && Reg != ARM::LR) {
>> + HiRegsToRestore.insert(Reg);
>> + } else {
>> + llvm_unreachable("callee-saved register of unexpected class");
>> + }
>> +
>> + // If this is a low register not used as the frame pointer, we may
>> want to
>> + // use it for restoring the high registers.
>> + if ((ARM::tGPRRegClass.contains(Reg)) &&
>> + !(hasFP(MF) && Reg == RegInfo->getFrameRegister(MF)))
>> + CopyRegs.insert(Reg);
>> + }
>> +
>> + // If this is a return block, we may be able to use some unused return
>> value
>> + // registers for restoring the high regs.
>> + auto Terminator = MBB.getFirstTerminator();
>> + if (Terminator != MBB.end() && Terminator->getOpcode() ==
>> ARM::tBX_RET) {
>> + CopyRegs.insert(ARM::R0);
>> + CopyRegs.insert(ARM::R1);
>> + CopyRegs.insert(ARM::R2);
>> + CopyRegs.insert(ARM::R3);
>> + for (auto Op : Terminator->implicit_operands()) {
>> + if (Op.isReg())
>> + CopyRegs.erase(Op.getReg());
>> + }
>> + }
>> +
>> + ArrayRef<unsigned> AllCopyRegs({ARM::R0, ARM::R1, ARM::R2, ARM::R3,
>> + ARM::R4, ARM::R5, ARM::R6, ARM::R7});
>> + ArrayRef<unsigned> AllHighRegs({ARM::R8, ARM::R9, ARM::R10, ARM::R11});
>> +
>> + // Find the first register to restore.
>> + auto HiRegToRestore = findNextOrderedReg(AllHighRegs.begin(),
>> HiRegsToRestore,
>> + AllHighRegs.end());
>> +
>> + while (HiRegToRestore != AllHighRegs.end()) {
>> + assert(!CopyRegs.empty());
>> + // Find the first low register to use.
>> + auto CopyReg =
>> + findNextOrderedReg(AllCopyRegs.begin(), CopyRegs,
>> AllCopyRegs.end());
>> +
>> + // Create the POP instruction.
>> + MachineInstrBuilder PopMIB = BuildMI(MBB, MI, DL,
>> TII.get(ARM::tPOP));
>> + AddDefaultPred(PopMIB);
>> +
>> + while (HiRegToRestore != AllHighRegs.end() && CopyReg !=
>> AllCopyRegs.end()) {
>> + // Add the low register to the POP.
>> + PopMIB.addReg(*CopyReg, RegState::Define);
>> +
>> + // Create the MOV from low to high register.
>> + MachineInstrBuilder MIB =
>> + BuildMI(MBB, MI, DL, TII.get(ARM::tMOVr));
>> + MIB.addReg(*HiRegToRestore, RegState::Define);
>> + MIB.addReg(*CopyReg, RegState::Kill);
>> + AddDefaultPred(MIB);
>> +
>> + CopyReg = findNextOrderedReg(++CopyReg, CopyRegs,
>> AllCopyRegs.end());
>> + HiRegToRestore = findNextOrderedReg(++HiRegToRestore,
>> HiRegsToRestore,
>> + AllHighRegs.end());
>> + }
>> + }
>> +
>> +
>> +
>> +
>> MachineInstrBuilder MIB = BuildMI(MF, DL, TII.get(ARM::tPOP));
>> AddDefaultPred(MIB);
>>
>> bool NeedsPop = false;
>> for (unsigned i = CSI.size(); i != 0; --i) {
>> unsigned Reg = CSI[i-1].getReg();
>> +
>> + // High registers (excluding lr) have already been dealt with
>> + if (!(ARM::tGPRRegClass.contains(Reg) || Reg == ARM::LR))
>> + continue;
>> +
>> if (Reg == ARM::LR) {
>> if (MBB.succ_empty()) {
>> // Special epilogue for vararg functions. See emitEpilogue
>>
>> Added: llvm/trunk/test/CodeGen/Thumb/callee_save.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> Thumb/callee_save.ll?rev=283867&view=auto
>> ============================================================
>> ==================
>> --- llvm/trunk/test/CodeGen/Thumb/callee_save.ll (added)
>> +++ llvm/trunk/test/CodeGen/Thumb/callee_save.ll Tue Oct 11 05:12:25 2016
>> @@ -0,0 +1,236 @@
>> +; RUN: llc -mtriple=thumbv6m-none-eabi < %s | FileCheck %s
>> +
>> +declare i8* @llvm.returnaddress(i32)
>> +
>> +; We don't allocate high registers, so any function not using inline asm
>> will
>> +; only need to save the low registers.
>> +define void @low_regs_only() {
>> +; CHECK-LABEL: low_regs_only:
>> +entry:
>> +; CHECK: push {r4, r5, r6, r7, lr}
>> + tail call void asm sideeffect "", "~{r4},~{r5},~{r6},~{r7}"()
>> +; CHECK: pop {r4, r5, r6, r7, pc}
>> + ret void
>> +}
>> +
>> +; One high reg clobbered, but no low regs, args or returns. We can use an
>> +; argument/return register to help save/restore it.
>> +define void @one_high() {
>> +; CHECK-LABEL: one_high:
>> +entry:
>> +; CHECK: mov [[SAVEREG:r[0-3]]], r8
>> +; CHECK: push {[[SAVEREG]]}
>> + tail call void asm sideeffect "", "~{r8}"()
>> +; CHECK: pop {[[RESTOREREG:r[0-3]]]}
>> +; CHECK: mov r8, [[RESTOREREG]]
>> + ret void
>> +}
>> +
>> +; 4 high regs clobbered, but still no low regs, args or returns. We can
>> use all
>> +; 4 arg/return regs for the save/restore.
>> +define void @four_high() {
>> +; CHECK-LABEL: four_high:
>> +entry:
>> +; CHECK: mov r3, r11
>> +; CHECK: mov r2, r10
>> +; CHECK: mov r1, r9
>> +; CHECK: mov r0, r8
>> +; CHECK: push {r0, r1, r2, r3}
>> + tail call void asm sideeffect "", "~{r8},~{r9},~{r10},~{r11}"()
>> +; CHECK: pop {r0, r1, r2, r3}
>> +; CHECK: mov r8, r0
>> +; CHECK: mov r9, r1
>> +; CHECK: mov r10, r2
>> +; CHECK: mov r11, r3
>> + ret void
>> +}
>> +
>> +; One high and one low register clobbered. lr also gets pushed to
>> simplify the
>> +; return, and r7 to keep the stack aligned. Here, we could use r0-r3,
>> r4, r7 or
>> +; lr to save/restore r8.
>> +define void @one_high_one_low() {
>> +; CHECK-LABEL: one_high_one_low:
>> +entry:
>> +; CHECK: push {r4, r7, lr}
>> +; CHECK: mov [[SAVEREG:r0|r1|r2|r3|r4|r7|lr]], r8
>> +; CHECK: push {[[SAVEREG]]}
>> + tail call void asm sideeffect "", "~{r4},~{r8}"()
>> +; CHECK: pop {[[RESTOREREG:r0|r1|r2|r3|r4|r7]]}
>> +; CHECK: mov r8, [[RESTOREREG]]
>> +; CHECK: pop {r4, r7, pc}
>> + ret void
>> +}
>> +
>> +; All callee-saved registers clobbered, r4-r7 and lr are not live after
>> the
>> +; first push so can be used for pushing the high registers.
>> +define void @four_high_four_low() {
>> +; CHECK-LABEL: four_high_four_low:
>> +entry:
>> +; CHECK: push {r4, r5, r6, r7, lr}
>> +; CHECK: mov lr, r11
>> +; CHECK: mov r7, r10
>> +; CHECK: mov r6, r9
>> +; CHECK: mov r5, r8
>> +; CHECK: push {r5, r6, r7, lr}
>> + tail call void asm sideeffect "", "~{r4},~{r5},~{r6},~{r7},~{r8}
>> ,~{r9},~{r10},~{r11}"()
>> +; CHECK: pop {r0, r1, r2, r3}
>> +; CHECK: mov r8, r0
>> +; CHECK: mov r9, r1
>> +; CHECK: mov r10, r2
>> +; CHECK: mov r11, r3
>> +; CHECK: pop {r4, r5, r6, r7, pc}
>> + ret void
>> +}
>> +
>> +
>> +; All callee-saved registers clobbered, and frame pointer is requested.
>> r7 now
>> +; cannot be used while saving/restoring the high regs.
>> +define void @four_high_four_low_frame_ptr()
>> "no-frame-pointer-elim"="true" {
>> +; CHECK-LABEL: four_high_four_low_frame_ptr:
>> +entry:
>> +; CHECK: push {r4, r5, r6, r7, lr}
>> +; CHECK: add r7, sp, #12
>> +; CHECK: mov lr, r11
>> +; CHECK: mov r6, r10
>> +; CHECK: mov r5, r9
>> +; CHECK: mov r4, r8
>> +; CHECK: push {r4, r5, r6, lr}
>> + tail call void asm sideeffect "", "~{r4},~{r5},~{r6},~{r7},~{r8}
>> ,~{r9},~{r10},~{r11}"()
>> +; CHECK: pop {r0, r1, r2, r3}
>> +; CHECK: mov r8, r0
>> +; CHECK: mov r9, r1
>> +; CHECK: mov r10, r2
>> +; CHECK: mov r11, r3
>> +; CHECK: pop {r4, r5, r6, r7, pc}
>> + ret void
>> +}
>> +
>> +; All callee-saved registers clobbered, frame pointer is requested and
>> +; llvm.returnaddress used. r7 and lr now cannot be used while
>> saving/restoring
>> +; the high regs.
>> +define void @four_high_four_low_frame_ptr_ret_addr()
>> "no-frame-pointer-elim"="true" {
>> +; CHECK-LABEL: four_high_four_low_frame_ptr_ret_addr:
>> +entry:
>> +; CHECK: push {r4, r5, r6, r7, lr}
>> +; CHECK: mov r6, r11
>> +; CHECK: mov r5, r10
>> +; CHECK: mov r4, r9
>> +; CHECK: mov r3, r8
>> +; CHECK: push {r3, r4, r5, r6}
>> + %a = tail call i8* @llvm.returnaddress(i32 0)
>> + tail call void asm sideeffect "", "r,~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"(i8*
>> %a)
>> +; CHECK: pop {r0, r1, r2, r3}
>> +; CHECK: mov r8, r0
>> +; CHECK: mov r9, r1
>> +; CHECK: mov r10, r2
>> +; CHECK: mov r11, r3
>> +; CHECK: pop {r4, r5, r6, r7, pc}
>> + ret void
>> +}
>> +
>> +; 4 high regs clobbered, all 4 argument registers used. We push an extra
>> 4 low
>> +; registers, so that we can use them for saving the high regs.
>> +define void @four_high_four_arg(i32 %a, i32 %b, i32 %c, i32 %d) {
>> +; CHECK-LABEL: four_high_four_arg:
>> +entry:
>> +; CHECK: push {r5, r6, r7, lr}
>> +; CHECK: mov lr, r11
>> +; CHECK: mov r7, r10
>> +; CHECK: mov r6, r9
>> +; CHECK: mov r5, r8
>> +; CHECK: push {r5, r6, r7, lr}
>> + tail call void asm sideeffect "", "r,r,r,r,~{r8},~{r9},~{r10},~{r11}"(i32
>> %a, i32 %b, i32 %c, i32 %d)
>> +; CHECK: pop {r0, r1, r2, r3}
>> +; CHECK: mov r8, r0
>> +; CHECK: mov r9, r1
>> +; CHECK: mov r10, r2
>> +; CHECK: mov r11, r3
>> +; CHECK: pop {r5, r6, r7, pc}
>> + ret void
>> +}
>> +
>> +; 4 high regs clobbered, all 4 return registers used. We push an extra 4
>> low
>> +; registers, so that we can use them for restoring the high regs.
>> +define <4 x i32> @four_high_four_return() {
>> +; CHECK-LABEL: four_high_four_return:
>> +entry:
>> +; CHECK: push {r4, r5, r6, r7, lr}
>> +; CHECK: mov lr, r11
>> +; CHECK: mov r7, r10
>> +; CHECK: mov r6, r9
>> +; CHECK: mov r5, r8
>> +; CHECK: push {r5, r6, r7, lr}
>> + tail call void asm sideeffect "", "~{r8},~{r9},~{r10},~{r11}"()
>> + %vecinit = insertelement <4 x i32> undef, i32 1, i32 0
>> + %vecinit11 = insertelement <4 x i32> %vecinit, i32 2, i32 1
>> + %vecinit12 = insertelement <4 x i32> %vecinit11, i32 3, i32 2
>> + %vecinit13 = insertelement <4 x i32> %vecinit12, i32 4, i32 3
>> +; CHECK: pop {r4, r5, r6, r7}
>> +; CHECK: mov r8, r4
>> +; CHECK: mov r9, r5
>> +; CHECK: mov r10, r6
>> +; CHECK: mov r11, r7
>> +; CHECK: pop {r4, r5, r6, r7, pc}
>> + ret <4 x i32> %vecinit13
>> +}
>> +
>> +; 4 high regs clobbered, all args & returns used, frame pointer
>> requested and
>> +; llvm.returnaddress called. This leaves us with 3 low registers
>> available (r4,
>> +; r5, r6), with which to save 4 high registers, so we have to use two
>> pushes
>> +; and pops.
>> +define <4 x i32> @all_of_the_above(i32 %a, i32 %b, i32 %c, i32 %d)
>> "no-frame-pointer-elim"="true" {
>> +; CHECK-LABEL: all_of_the_above
>> +entry:
>> +; CHECK: push {r4, r5, r6, r7, lr}
>> +; CHECK: add r7, sp, #12
>> +; CHECK: mov r6, r11
>> +; CHECK: mov r5, r10
>> +; CHECK: mov r4, r9
>> +; CHECK: push {r4, r5, r6}
>> +; CHECK: mov r6, r8
>> +; CHECK: push {r6}
>> + tail call void asm sideeffect "", "r,r,r,r,~{r4},~{r5},~{r6},~{r
>> 7},~{r8},~{r9},~{r10},~{r11}"(i32 %a, i32 %b, i32 %c, i32 %d)
>> + %e = tail call i8* @llvm.returnaddress(i32 0)
>> + %f = ptrtoint i8* %e to i32
>> + %vecinit = insertelement <4 x i32> undef, i32 %f, i32 0
>> + %vecinit11 = insertelement <4 x i32> %vecinit, i32 2, i32 1
>> + %vecinit12 = insertelement <4 x i32> %vecinit11, i32 3, i32 2
>> + %vecinit13 = insertelement <4 x i32> %vecinit12, i32 4, i32 3
>> +; CHECK: pop {r4, r5, r6}
>> +; CHECK: mov r8, r4
>> +; CHECK: mov r9, r5
>> +; CHECK: mov r10, r6
>> +; CHECK: pop {r4}
>> +; CHECK: mov r11, r4
>> +; CHECK: pop {r4, r5, r6, r7, pc}
>> + ret <4 x i32> %vecinit13
>> +}
>> +
>> +; When a base pointer is being used, we can safely use it for
>> saving/restoring
>> +; the high regs because it is set after the last push, and not used at
>> all in the
>> +; epliogue. We can also use r4 for restoring the registers despite it
>> also being
>> +; used when restoring sp from fp, as that happens before the first pop.
>> +define <4 x i32> @base_pointer(i32 %a) {
>> +; CHECK-LABEL: base_pointer:
>> +entry:
>> +; CHECK: push {r4, r6, r7, lr}
>> +; CHECK: add r7, sp, #8
>> +; CHECK: mov lr, r9
>> +; CHECK: mov r6, r8
>> +; CHECK: push {r6, lr}
>> +; CHECK: mov r6, sp
>> + %b = alloca i32, i32 %a
>> + call void asm sideeffect "", "r,~{r8},~{r9}"(i32* %b)
>> + %vecinit = insertelement <4 x i32> undef, i32 1, i32 0
>> + %vecinit11 = insertelement <4 x i32> %vecinit, i32 2, i32 1
>> + %vecinit12 = insertelement <4 x i32> %vecinit11, i32 3, i32 2
>> + %vecinit13 = insertelement <4 x i32> %vecinit12, i32 4, i32 3
>> +; CHECK: subs r4, r7, #7
>> +; CHECK: subs r4, #9
>> +; CHECK: mov sp, r4
>> +; CHECK: pop {r4, r6}
>> +; CHECK: mov r8, r4
>> +; CHECK: mov r9, r6
>> +; CHECK: pop {r4, r6, r7, pc}
>> + ret <4 x i32> %vecinit13
>> +}
>>
>> Modified: llvm/trunk/test/CodeGen/Thumb/large-stack.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> Thumb/large-stack.ll?rev=283867&r1=283866&r2=283867&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/CodeGen/Thumb/large-stack.ll (original)
>> +++ llvm/trunk/test/CodeGen/Thumb/large-stack.ll Tue Oct 11 05:12:25 2016
>> @@ -1,9 +1,9 @@
>> -; RUN: llc < %s -mtriple=thumb-apple-ios | FileCheck %s
>> -; RUN: llc < %s -mtriple=thumb-none-eabi | FileCheck %s
>> +; RUN: llc < %s -mtriple=thumb-apple-ios | FileCheck %s
>> --check-prefix=CHECK --check-prefix=ALIGN4
>> +; RUN: llc < %s -mtriple=thumb-none-eabi | FileCheck %s
>> --check-prefix=CHECK --check-prefix=ALIGN8
>> ; RUN: llc < %s -o %t -filetype=obj -mtriple=thumbv6-apple-ios
>> -; RUN: llvm-objdump -triple=thumbv6-apple-ios -d %t | FileCheck %s
>> +; RUN: llvm-objdump -triple=thumbv6-apple-ios -d %t | FileCheck %s
>> --check-prefix=CHECK --check-prefix=ALIGN4
>> ; RUN: llc < %s -o %t -filetype=obj -mtriple=thumbv6-none-eabi
>> -; RUN: llvm-objdump -triple=thumbv6-none-eabi -d %t | FileCheck %s
>> +; RUN: llvm-objdump -triple=thumbv6-none-eabi -d %t | FileCheck %s
>> --check-prefix=CHECK --check-prefix=ALIGN8
>>
>> ; Largest stack for which a single tADDspi/tSUBspi is enough
>> define void @test1() {
>> @@ -33,7 +33,9 @@ define void @test100_nofpelim() "no-fram
>> ; CHECK: sub sp, #508
>> ; CHECK: sub sp, #508
>> ; CHECK: sub sp, #508
>> -; CHECK: subs r4, r7, #4
>> +; ALIGN4: subs r4, r7, #4
>> +; ALIGN8: subs r4, r7, #7
>> +; ALIGN8: subs r4, #1
>> ; CHECK: mov sp, r4
>> %tmp = alloca [ 1524 x i8 ] , align 4
>> ret void
>> @@ -55,7 +57,9 @@ define void @test2_nofpelim() "no-frame-
>> ; CHECK-LABEL: test2_nofpelim:
>> ; CHECK: ldr [[TEMP:r[0-7]]],
>> ; CHECK: add sp, [[TEMP]]
>> -; CHECK: subs r4, r7, #4
>> +; ALIGN4: subs r4, r7, #4
>> +; ALIGN8: subs r4, r7, #7
>> +; ALIGN8: subs r4, #1
>> ; CHECK: mov sp, r4
>> %tmp = alloca [ 1528 x i8 ] , align 4
>> ret void
>>
>> Modified: llvm/trunk/test/CodeGen/Thumb2/frame-pointer.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> Thumb2/frame-pointer.ll?rev=283867&r1=283866&r2=283867&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/CodeGen/Thumb2/frame-pointer.ll (original)
>> +++ llvm/trunk/test/CodeGen/Thumb2/frame-pointer.ll Tue Oct 11 05:12:25
>> 2016
>> @@ -27,9 +27,9 @@ define void @leaf_nofpelim() "no-frame-p
>> ; need to use a frame pointer.
>> define void @leaf_lowreg_nofpelim() "no-frame-pointer-elim"="true" {
>> ; CHECK-LABEL: leaf_lowreg_nofpelim:
>> -; CHECK: push {r4, r7, lr}
>> -; CHECK: add r7, sp, #4
>> -; CHECK: pop {r4, r7, pc}
>> +; CHECK: push {r4, r6, r7, lr}
>> +; CHECK: add r7, sp, #8
>> +; CHECK: pop {r4, r6, r7, pc}
>> call void asm sideeffect "", "~{r4}" ()
>> ret void
>> }
>> @@ -40,11 +40,11 @@ define void @leaf_lowreg_nofpelim() "no-
>> ; the stack.
>> define void @leaf_highreg_nofpelim() "no-frame-pointer-elim"="true" {
>> ; CHECK-LABEL: leaf_highreg_nofpelim:
>> -; CHECK: push {r7, lr}
>> -; CHECK: mov r7, sp
>> +; CHECK: push {r6, r7, lr}
>> +; CHECK: add r7, sp, #4
>> ; CHECK: str r8, [sp, #-4]!
>> ; CHECK: ldr r8, [sp], #4
>> -; CHECK: pop {r7, pc}
>> +; CHECK: pop {r6, r7, pc}
>> call void asm sideeffect "", "~{r8}" ()
>> ret void
>> }
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161011/b0ae9309/attachment.html>
More information about the llvm-commits
mailing list