[llvm] r305516 - RegScavenging: Add scavengeRegisterBackwards()

Eric Christopher via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 16 19:13:36 PDT 2017


Great, thanks!

On Fri, Jun 16, 2017 at 7:10 PM Matthias Braun <matze at braunis.de> wrote:

> Thanks for the reproducer! Fixed in r305625.
>
> On Jun 16, 2017, at 5:58 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> Sent offline due to size :\
>
> -eric
>
> On Fri, Jun 16, 2017 at 10:59 AM Eric Christopher <echristo at gmail.com>
> wrote:
>
>> Yep. Will do :)
>>
>> -eric
>>
>> On Fri, Jun 16, 2017 at 10:50 AM Matthias Braun <matze at braunis.de> wrote:
>>
>>> Sure, I reverted in r305566 but will definitely need a reproducer to
>>> investigate what exactly went wrong.
>>>
>>> - Matthias
>>>
>>> On Jun 16, 2017, at 10:14 AM, Eric Christopher <echristo at gmail.com>
>>> wrote:
>>>
>>> Hi Matthias,
>>>
>>> Working on getting you something, but I did find this:
>>>
>>> fatal error: error in backend: Error while trying to spill X0 from class
>>> G8RC: Cannot scavenge register without an emergency spill slot!
>>>
>>> on a testcase. Guess we can try to revert for now and I'll work on a
>>> testcase?
>>>
>>> -eric
>>>
>>> On Thu, Jun 15, 2017 at 3:15 PM Matthias Braun via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> Author: matze
>>>> Date: Thu Jun 15 17:14:55 2017
>>>> New Revision: 305516
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=305516&view=rev
>>>> Log:
>>>> RegScavenging: Add scavengeRegisterBackwards()
>>>>
>>>> Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64
>>>> problems reported in the stage2 build last time, which I cannot
>>>> reproduce right now.
>>>>
>>>> This is a variant of scavengeRegister() that works for
>>>> enterBasicBlockEnd()/backward(). The benefit of the backward mode is
>>>> that it is not affected by incomplete kill flags.
>>>>
>>>> This patch also changes
>>>> PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
>>>> scavenger in backwards mode.
>>>>
>>>> Differential Revision: http://reviews.llvm.org/D21885
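>>>>
>>>> As a minimal sketch (an illustration, not part of this patch) of how the
>>>> backward mode is meant to be driven for a MachineBasicBlock &MBB, where
>>>> needsScratchReg() and GPRRC are hypothetical stand-ins for
>>>> target-specific logic and a TargetRegisterClass:
>>>>
>>>>   RegScavenger RS;
>>>>   RS.enterBasicBlockEnd(MBB);
>>>>   for (MachineBasicBlock::iterator I = MBB.end(); I != MBB.begin();) {
>>>>     --I;
>>>>     // Move the scavenger to the position between *I and *std::next(I);
>>>>     // liveness is recomputed from def/use operands, not from kill flags.
>>>>     RS.backward(I);
>>>>     if (needsScratchReg(*I)) {
>>>>       // Search from the current position back to I for a free register,
>>>>       // inserting an emergency spill/reload only if nothing survives.
>>>>       unsigned Scratch = RS.scavengeRegisterBackwards(
>>>>           GPRRC, I, /*RestoreAfter=*/false, /*SPAdj=*/0);
>>>>       (void)Scratch; // e.g. hand to MRI.replaceRegWith(VReg, Scratch)
>>>>     }
>>>>   }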
>>>>
>>>> Modified:
>>>>     llvm/trunk/include/llvm/CodeGen/RegisterScavenging.h
>>>>     llvm/trunk/lib/CodeGen/RegisterScavenging.cpp
>>>>     llvm/trunk/test/CodeGen/AArch64/reg-scavenge-frame.mir
>>>>     llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll
>>>>
>>>> llvm/trunk/test/CodeGen/AMDGPU/code-object-metadata-kernel-debug-props.ll
>>>>     llvm/trunk/test/CodeGen/AMDGPU/frame-index-elimination.ll
>>>>     llvm/trunk/test/CodeGen/ARM/alloca-align.ll
>>>>     llvm/trunk/test/CodeGen/ARM/execute-only-big-stack-frame.ll
>>>>     llvm/trunk/test/CodeGen/ARM/fpoffset_overflow.mir
>>>>     llvm/trunk/test/CodeGen/Mips/emergency-spill-slot-near-fp.ll
>>>>     llvm/trunk/test/CodeGen/PowerPC/dyn-alloca-aligned.ll
>>>>     llvm/trunk/test/CodeGen/PowerPC/scavenging.mir
>>>>     llvm/trunk/test/CodeGen/Thumb/large-stack.ll
>>>>     llvm/trunk/test/CodeGen/X86/scavenger.mir
>>>>
>>>> Modified: llvm/trunk/include/llvm/CodeGen/RegisterScavenging.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/RegisterScavenging.h?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/include/llvm/CodeGen/RegisterScavenging.h (original)
>>>> +++ llvm/trunk/include/llvm/CodeGen/RegisterScavenging.h Thu Jun 15
>>>> 17:14:55 2017
>>>> @@ -156,12 +156,24 @@ public:
>>>>    /// available and do the appropriate bookkeeping. SPAdj is the stack
>>>>    /// adjustment due to call frame, it's passed along to eliminateFrameIndex().
>>>>    /// Returns the scavenged register.
>>>> +  /// This is deprecated as it depends on the quality of the kill flags being
>>>> +  /// present; use scavengeRegisterBackwards() instead!
>>>>    unsigned scavengeRegister(const TargetRegisterClass *RegClass,
>>>>                              MachineBasicBlock::iterator I, int SPAdj);
>>>>    unsigned scavengeRegister(const TargetRegisterClass *RegClass, int SPAdj) {
>>>>      return scavengeRegister(RegClass, MBBI, SPAdj);
>>>>    }
>>>>
>>>> +  /// Make a register of the specific register class available from the current
>>>> +  /// position backwards to the place before \p To. If \p RestoreAfter is true
>>>> +  /// this includes the instruction following the current position.
>>>> +  /// SPAdj is the stack adjustment due to call frame, it's passed along to
>>>> +  /// eliminateFrameIndex().
>>>> +  /// Returns the scavenged register.
>>>> +  unsigned scavengeRegisterBackwards(const TargetRegisterClass &RC,
>>>> +                                     MachineBasicBlock::iterator To,
>>>> +                                     bool RestoreAfter, int SPAdj);
>>>> +
>>>>    /// Tell the scavenger a register is used.
>>>>    void setRegUsed(unsigned Reg, LaneBitmask LaneMask = LaneBitmask::getAll());
>>>>
>>>> @@ -202,6 +214,12 @@ private:
>>>>
>>>>    /// Mark live-in registers of basic block as used.
>>>>    void setLiveInsUsed(const MachineBasicBlock &MBB);
>>>> +
>>>> +  /// Spill a register after position \p After and reload it before position
>>>> +  /// \p UseMI.
>>>> +  ScavengedInfo &spill(unsigned Reg, const TargetRegisterClass &RC, int SPAdj,
>>>> +                       MachineBasicBlock::iterator After,
>>>> +                       MachineBasicBlock::iterator &UseMI);
>>>>  };
>>>>
>>>>  /// Replaces all frame index virtual registers with physical registers. Uses the
>>>>
>>>> Modified: llvm/trunk/lib/CodeGen/RegisterScavenging.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/RegisterScavenging.cpp?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/CodeGen/RegisterScavenging.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/RegisterScavenging.cpp Thu Jun 15 17:14:55
>>>> 2017
>>>> @@ -35,6 +35,7 @@
>>>>  #include "llvm/Target/TargetInstrInfo.h"
>>>>  #include "llvm/Target/TargetRegisterInfo.h"
>>>>  #include "llvm/Target/TargetSubtargetInfo.h"
>>>> +#include <algorithm>
>>>>  #include <cassert>
>>>>  #include <iterator>
>>>>  #include <limits>
>>>> @@ -260,6 +261,14 @@ void RegScavenger::backward() {
>>>>    const MachineInstr &MI = *MBBI;
>>>>    LiveUnits.stepBackward(MI);
>>>>
>>>> +  // Expire scavenge spill frameindex uses.
>>>> +  for (ScavengedInfo &I : Scavenged) {
>>>> +    if (I.Restore == &MI) {
>>>> +      I.Reg = 0;
>>>> +      I.Restore = nullptr;
>>>> +    }
>>>> +  }
>>>> +
>>>>    if (MBBI == MBB->begin()) {
>>>>      MBBI = MachineBasicBlock::iterator(nullptr);
>>>>      Tracking = false;
>>>> @@ -356,6 +365,76 @@ unsigned RegScavenger::findSurvivorReg(M
>>>>    return Survivor;
>>>>  }
>>>>
>>>> +/// Given the bitvector \p Available of free register units at position
>>>> +/// \p From, search backwards to find a register that is part of
>>>> +/// \p Candidates and not used/clobbered until the point \p To. If there are
>>>> +/// multiple candidates, continue searching and pick the one that is not
>>>> +/// used/clobbered for the longest time.
>>>> +/// Returns the register and the earliest position we know it to be free, or
>>>> +/// the position MBB.end() if no register is available.
>>>> +static std::pair<unsigned, MachineBasicBlock::iterator>
>>>> +findSurvivorBackwards(const TargetRegisterInfo &TRI,
>>>> +    MachineBasicBlock::iterator From, MachineBasicBlock::iterator To,
>>>> +    BitVector &Available, BitVector &Candidates) {
>>>> +  bool FoundTo = false;
>>>> +  unsigned Survivor = 0;
>>>> +  MachineBasicBlock::iterator Pos;
>>>> +  MachineBasicBlock &MBB = *From->getParent();
>>>> +  unsigned InstrLimit = 25;
>>>> +  unsigned InstrCountDown = InstrLimit;
>>>> +  for (MachineBasicBlock::iterator I = From;; --I) {
>>>> +    const MachineInstr &MI = *I;
>>>> +
>>>> +    // Remove any candidates touched by instruction.
>>>> +    bool FoundVReg = false;
>>>> +    for (const MachineOperand &MO : MI.operands()) {
>>>> +      if (MO.isRegMask()) {
>>>> +        Candidates.clearBitsNotInMask(MO.getRegMask());
>>>> +        continue;
>>>> +      }
>>>> +      if (!MO.isReg() || MO.isUndef() || MO.isDebug())
>>>> +        continue;
>>>> +      unsigned Reg = MO.getReg();
>>>> +      if (TargetRegisterInfo::isVirtualRegister(Reg)) {
>>>> +        FoundVReg = true;
>>>> +      } else if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
>>>> +        for (MCRegAliasIterator AI(Reg, &TRI, true); AI.isValid(); ++AI)
>>>> +          Candidates.reset(*AI);
>>>> +      }
>>>> +    }
>>>> +
>>>> +    if (I == To) {
>>>> +      // If one of the available registers survived this long, take it.
>>>> +      Available &= Candidates;
>>>> +      int Reg = Available.find_first();
>>>> +      if (Reg != -1)
>>>> +        return std::make_pair(Reg, MBB.end());
>>>> +      // Otherwise we will continue up to InstrLimit instructions to find
>>>> +      // the register which is not defined/used for the longest time.
>>>> +      FoundTo = true;
>>>> +      Pos = To;
>>>> +    }
>>>> +    if (FoundTo) {
>>>> +      if (Survivor == 0 || !Candidates.test(Survivor)) {
>>>> +        int Reg = Candidates.find_first();
>>>> +        if (Reg == -1)
>>>> +          break;
>>>> +        Survivor = Reg;
>>>> +      }
>>>> +      if (--InstrCountDown == 0 || I == MBB.begin())
>>>> +        break;
>>>> +      if (FoundVReg) {
>>>> +        // Keep searching when we find a vreg since the spilled register
>>>> +        // will be useful for this other vreg as well later.
>>>> +        InstrCountDown = InstrLimit;
>>>> +        Pos = I;
>>>> +      }
>>>> +    }
>>>> +  }
>>>> +
>>>> +  return std::make_pair(Survivor, Pos);
>>>> +}
>>>> +
>>>>  static unsigned getFrameIndexOperandNum(MachineInstr &MI) {
>>>>    unsigned i = 0;
>>>>    while (!MI.getOperand(i).isFI()) {
>>>> @@ -365,44 +444,16 @@ static unsigned getFrameIndexOperandNum(
>>>>    return i;
>>>>  }
>>>>
>>>> -unsigned RegScavenger::scavengeRegister(const TargetRegisterClass *RC,
>>>> -                                        MachineBasicBlock::iterator I,
>>>> -                                        int SPAdj) {
>>>> -  MachineInstr &MI = *I;
>>>> -  const MachineFunction &MF = *MI.getParent()->getParent();
>>>> -  // Consider all allocatable registers in the register class initially
>>>> -  BitVector Candidates = TRI->getAllocatableSet(MF, RC);
>>>> -
>>>> -  // Exclude all the registers being used by the instruction.
>>>> -  for (const MachineOperand &MO : MI.operands()) {
>>>> -    if (MO.isReg() && MO.getReg() != 0 && !(MO.isUse() && MO.isUndef()) &&
>>>> -        !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>>>> -      for (MCRegAliasIterator AI(MO.getReg(), TRI, true); AI.isValid(); ++AI)
>>>> -        Candidates.reset(*AI);
>>>> -  }
>>>> -
>>>> -  // Try to find a register that's unused if there is one, as then we won't
>>>> -  // have to spill.
>>>> -  BitVector Available = getRegsAvailable(RC);
>>>> -  Available &= Candidates;
>>>> -  if (Available.any())
>>>> -    Candidates = Available;
>>>> -
>>>> -  // Find the register whose use is furthest away.
>>>> -  MachineBasicBlock::iterator UseMI;
>>>> -  unsigned SReg = findSurvivorReg(I, Candidates, 25, UseMI);
>>>> -
>>>> -  // If we found an unused register there is no reason to spill it.
>>>> -  if (!isRegUsed(SReg)) {
>>>> -    DEBUG(dbgs() << "Scavenged register: " << TRI->getName(SReg) << "\n");
>>>> -    return SReg;
>>>> -  }
>>>> -
>>>> +RegScavenger::ScavengedInfo &
>>>> +RegScavenger::spill(unsigned Reg, const TargetRegisterClass &RC, int SPAdj,
>>>> +                    MachineBasicBlock::iterator Before,
>>>> +                    MachineBasicBlock::iterator &UseMI) {
>>>>    // Find an available scavenging slot with size and alignment matching
>>>>    // the requirements of the class RC.
>>>> +  const MachineFunction &MF = *Before->getParent()->getParent();
>>>>    const MachineFrameInfo &MFI = MF.getFrameInfo();
>>>> -  unsigned NeedSize = TRI->getSpillSize(*RC);
>>>> -  unsigned NeedAlign = TRI->getSpillAlignment(*RC);
>>>> +  unsigned NeedSize = TRI->getSpillSize(RC);
>>>> +  unsigned NeedAlign = TRI->getSpillAlignment(RC);
>>>>
>>>>    unsigned SI = Scavenged.size(), Diff = std::numeric_limits<unsigned>::max();
>>>>    int FIB = MFI.getObjectIndexBegin(), FIE = MFI.getObjectIndexEnd();
>>>> @@ -437,39 +488,72 @@ unsigned RegScavenger::scavengeRegister(
>>>>    }
>>>>
>>>>    // Avoid infinite regress
>>>> -  Scavenged[SI].Reg = SReg;
>>>> +  Scavenged[SI].Reg = Reg;
>>>>
>>>>    // If the target knows how to save/restore the register, let it do so;
>>>>    // otherwise, use the emergency stack spill slot.
>>>> -  if (!TRI->saveScavengerRegister(*MBB, I, UseMI, RC, SReg)) {
>>>> -    // Spill the scavenged register before I.
>>>> +  if (!TRI->saveScavengerRegister(*MBB, Before, UseMI, &RC, Reg)) {
>>>> +    // Spill the scavenged register before \p Before.
>>>>      int FI = Scavenged[SI].FrameIndex;
>>>>      if (FI < FIB || FI >= FIE) {
>>>>        std::string Msg = std::string("Error while trying to spill ") +
>>>> -          TRI->getName(SReg) + " from class " + TRI->getRegClassName(RC) +
>>>> +          TRI->getName(Reg) + " from class " + TRI->getRegClassName(&RC) +
>>>>            ": Cannot scavenge register without an emergency spill slot!";
>>>>        report_fatal_error(Msg.c_str());
>>>>      }
>>>> -    TII->storeRegToStackSlot(*MBB, I, SReg, true, Scavenged[SI].FrameIndex,
>>>> -                             RC, TRI);
>>>> -    MachineBasicBlock::iterator II = std::prev(I);
>>>> +    TII->storeRegToStackSlot(*MBB, Before, Reg, true, Scavenged[SI].FrameIndex,
>>>> +                             &RC, TRI);
>>>> +    MachineBasicBlock::iterator II = std::prev(Before);
>>>>
>>>>      unsigned FIOperandNum = getFrameIndexOperandNum(*II);
>>>>      TRI->eliminateFrameIndex(II, SPAdj, FIOperandNum, this);
>>>>
>>>>      // Restore the scavenged register before its use (or first terminator).
>>>> -    TII->loadRegFromStackSlot(*MBB, UseMI, SReg, Scavenged[SI].FrameIndex,
>>>> -                              RC, TRI);
>>>> +    TII->loadRegFromStackSlot(*MBB, UseMI, Reg, Scavenged[SI].FrameIndex,
>>>> +                              &RC, TRI);
>>>>      II = std::prev(UseMI);
>>>>
>>>>      FIOperandNum = getFrameIndexOperandNum(*II);
>>>>      TRI->eliminateFrameIndex(II, SPAdj, FIOperandNum, this);
>>>>    }
>>>> +  return Scavenged[SI];
>>>> +}
>>>> +
>>>> +unsigned RegScavenger::scavengeRegister(const TargetRegisterClass *RC,
>>>> +                                        MachineBasicBlock::iterator I,
>>>> +                                        int SPAdj) {
>>>> +  MachineInstr &MI = *I;
>>>> +  const MachineFunction &MF = *MI.getParent()->getParent();
>>>> +  // Consider all allocatable registers in the register class initially
>>>> +  BitVector Candidates = TRI->getAllocatableSet(MF, RC);
>>>> +
>>>> +  // Exclude all the registers being used by the instruction.
>>>> +  for (const MachineOperand &MO : MI.operands()) {
>>>> +    if (MO.isReg() && MO.getReg() != 0 && !(MO.isUse() && MO.isUndef()) &&
>>>> +        !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>>>> +      for (MCRegAliasIterator AI(MO.getReg(), TRI, true); AI.isValid(); ++AI)
>>>> +        Candidates.reset(*AI);
>>>> +  }
>>>> +
>>>> +  // Try to find a register that's unused if there is one, as then we won't
>>>> +  // have to spill.
>>>> +  BitVector Available = getRegsAvailable(RC);
>>>> +  Available &= Candidates;
>>>> +  if (Available.any())
>>>> +    Candidates = Available;
>>>> +
>>>> +  // Find the register whose use is furthest away.
>>>> +  MachineBasicBlock::iterator UseMI;
>>>> +  unsigned SReg = findSurvivorReg(I, Candidates, 25, UseMI);
>>>>
>>>> -  Scavenged[SI].Restore = &*std::prev(UseMI);
>>>> +  // If we found an unused register there is no reason to spill it.
>>>> +  if (!isRegUsed(SReg)) {
>>>> +    DEBUG(dbgs() << "Scavenged register: " << TRI->getName(SReg) << "\n");
>>>> +    return SReg;
>>>> +  }
>>>>
>>>> -  // Doing this here leads to infinite regress.
>>>> -  // Scavenged[SI].Reg = SReg;
>>>> +  ScavengedInfo &Scavenged = spill(SReg, *RC, SPAdj, I, UseMI);
>>>> +  Scavenged.Restore = &*std::prev(UseMI);
>>>>
>>>>    DEBUG(dbgs() << "Scavenged register (with spill): " << TRI->getName(SReg) <<
>>>>          "\n");
>>>> @@ -477,85 +561,200 @@ unsigned RegScavenger::scavengeRegister(
>>>>    return SReg;
>>>>  }
>>>>
>>>> -void llvm::scavengeFrameVirtualRegs(MachineFunction &MF, RegScavenger &RS) {
>>>> -  // FIXME: Iterating over the instruction stream is unnecessary. We can simply
>>>> -  // iterate over the vreg use list, which at this point only contains machine
>>>> -  // operands for which eliminateFrameIndex need a new scratch reg.
>>>> +unsigned RegScavenger::scavengeRegisterBackwards(const TargetRegisterClass &RC,
>>>> +                                                 MachineBasicBlock::iterator To,
>>>> +                                                 bool RestoreAfter, int SPAdj) {
>>>> +  const MachineBasicBlock &MBB = *To->getParent();
>>>> +  const MachineFunction &MF = *MBB.getParent();
>>>> +  // Consider all allocatable registers in the register class initially
>>>> +  BitVector Candidates = TRI->getAllocatableSet(MF, &RC);
>>>>
>>>> -  // Run through the instructions and find any virtual registers.
>>>> -  MachineRegisterInfo &MRI = MF.getRegInfo();
>>>> -  for (MachineBasicBlock &MBB : MF) {
>>>> -    RS.enterBasicBlock(MBB);
>>>> +  // Try to find a register that's unused if there is one, as then we won't
>>>> +  // have to spill.
>>>> +  BitVector Available = getRegsAvailable(&RC);
>>>>
>>>> -    int SPAdj = 0;
>>>> +  // Find the register whose use is furthest away.
>>>> +  MachineBasicBlock::iterator UseMI;
>>>> +  std::pair<unsigned, MachineBasicBlock::iterator> P =
>>>> +      findSurvivorBackwards(*TRI, MBBI, To, Available, Candidates);
>>>> +  unsigned Reg = P.first;
>>>> +  MachineBasicBlock::iterator SpillBefore = P.second;
>>>> +  assert(Reg != 0 && "No register left to scavenge!");
>>>> +  // Found an available register?
>>>> +  if (SpillBefore != MBB.end()) {
>>>> +    MachineBasicBlock::iterator ReloadAfter =
>>>> +      RestoreAfter ? std::next(MBBI) : MBBI;
>>>> +    MachineBasicBlock::iterator ReloadBefore = std::next(ReloadAfter);
>>>> +    DEBUG(dbgs() << "Reload before: " << *ReloadBefore << '\n');
>>>> +    ScavengedInfo &Scavenged = spill(Reg, RC, SPAdj, SpillBefore, ReloadBefore);
>>>> +    Scavenged.Restore = &*std::prev(SpillBefore);
>>>> +    LiveUnits.removeReg(Reg);
>>>> +    DEBUG(dbgs() << "Scavenged register with spill: " << PrintReg(Reg,
>>>> TRI)
>>>> +          << " until " << *SpillBefore);
>>>> +  } else {
>>>> +    DEBUG(dbgs() << "Scavenged free register: " << PrintReg(Reg, TRI)
>>>> << '\n');
>>>> +  }
>>>> +  return Reg;
>>>> +}
>>>>
>>>> -    // The instruction stream may change in the loop, so check MBB.end()
>>>> -    // directly.
>>>> -    for (MachineBasicBlock::iterator I = MBB.begin(); I != MBB.end(); ) {
>>>> -      // We might end up here again with a NULL iterator if we scavenged a
>>>> -      // register for which we inserted spill code for definition by what was
>>>> -      // originally the first instruction in MBB.
>>>> -      if (I == MachineBasicBlock::iterator(nullptr))
>>>> -        I = MBB.begin();
>>>> -
>>>> -      const MachineInstr &MI = *I;
>>>> -      MachineBasicBlock::iterator J = std::next(I);
>>>> -      MachineBasicBlock::iterator P =
>>>> -                         I == MBB.begin() ? MachineBasicBlock::iterator(nullptr)
>>>> -                                          : std::prev(I);
>>>> -
>>>> -      // RS should process this instruction before we might scavenge at this
>>>> -      // location. This is because we might be replacing a virtual register
>>>> -      // defined by this instruction, and if so, registers killed by this
>>>> -      // instruction are available, and defined registers are not.
>>>> -      RS.forward(I);
>>>> +/// Allocate a register for the virtual register \p VReg. The last use of
>>>> +/// \p VReg is around the current position of the register scavenger \p RS.
>>>> +/// \p ReserveAfter controls whether the scavenged register needs to be
>>>> +/// reserved after the current instruction, otherwise it will only be
>>>> +/// reserved before the current instruction.
>>>> +static unsigned scavengeVReg(MachineRegisterInfo &MRI, RegScavenger &RS,
>>>> +                             unsigned VReg, bool ReserveAfter) {
>>>> +  const TargetRegisterInfo &TRI = *MRI.getTargetRegisterInfo();
>>>> +#ifndef NDEBUG
>>>> +  // Verify that all definitions and uses are in the same basic block.
>>>> +  const MachineBasicBlock *CommonMBB = nullptr;
>>>> +  // Real definition for the reg, re-definitions are not considered.
>>>> +  const MachineInstr *RealDef = nullptr;
>>>> +  for (MachineOperand &MO : MRI.reg_nodbg_operands(VReg)) {
>>>> +    MachineBasicBlock *MBB = MO.getParent()->getParent();
>>>> +    if (CommonMBB == nullptr)
>>>> +      CommonMBB = MBB;
>>>> +    assert(MBB == CommonMBB && "All defs+uses must be in the same basic block");
>>>> +    if (MO.isDef()) {
>>>> +      const MachineInstr &MI = *MO.getParent();
>>>> +      if (!MI.readsRegister(VReg, &TRI)) {
>>>> +        assert(!RealDef || RealDef == &MI &&
>>>> +               "Can have at most one definition which is not a
>>>> redefinition");
>>>> +        RealDef = &MI;
>>>> +      }
>>>> +    }
>>>> +  }
>>>> +  assert(RealDef != nullptr && "Must have at least 1 Def");
>>>> +#endif
>>>> +
>>>> +  // We should only have one definition of the register. However, to
>>>> +  // accommodate the requirements of two-address code we also allow
>>>> +  // definitions in subsequent instructions provided they also read the
>>>> +  // register. That way we get a single contiguous lifetime.
>>>> +  //
>>>> +  // Definitions in MRI.def_begin() are unordered, search for the first.
>>>> +  MachineRegisterInfo::def_iterator FirstDef =
>>>> +    std::find_if(MRI.def_begin(VReg), MRI.def_end(),
>>>> +                 [VReg, &TRI](const MachineOperand &MO) {
>>>> +      return !MO.getParent()->readsRegister(VReg, &TRI);
>>>> +    });
>>>> +  assert(FirstDef != MRI.def_end() &&
>>>> +         "Must have one definition that does not redefine vreg");
>>>> +  MachineInstr &DefMI = *FirstDef->getParent();
>>>> +
>>>> +  // The register scavenger will report a free register, inserting an emergency
>>>> +  // spill/reload if necessary.
>>>> +  int SPAdj = 0;
>>>> +  const TargetRegisterClass &RC = *MRI.getRegClass(VReg);
>>>> +  unsigned SReg = RS.scavengeRegisterBackwards(RC, DefMI.getIterator(),
>>>> +                                               ReserveAfter, SPAdj);
>>>> +  MRI.replaceRegWith(VReg, SReg);
>>>> +  ++NumScavengedRegs;
>>>> +  return SReg;
>>>> +}
>>>>
>>>> -      for (const MachineOperand &MO : MI.operands()) {
>>>> +/// Allocate (scavenge) vregs inside a single basic block.
>>>> +/// Returns true if the target spill callback created new vregs and a 2nd pass
>>>> +/// is necessary.
>>>> +static bool scavengeFrameVirtualRegsInBlock(MachineRegisterInfo &MRI,
>>>> +                                            RegScavenger &RS,
>>>> +                                            MachineBasicBlock &MBB) {
>>>> +  const TargetRegisterInfo &TRI = *MRI.getTargetRegisterInfo();
>>>> +  RS.enterBasicBlockEnd(MBB);
>>>> +
>>>> +  unsigned InitialNumVirtRegs = MRI.getNumVirtRegs();
>>>> +  bool NextInstructionReadsVReg = false;
>>>> +  for (MachineBasicBlock::iterator I = MBB.end(); I != MBB.begin(); ) {
>>>> +    --I;
>>>> +    // Move RegScavenger to the position between *I and *std::next(I).
>>>> +    RS.backward(I);
>>>> +
>>>> +    // Look for unassigned vregs in the uses of *std::next(I).
>>>> +    if (NextInstructionReadsVReg) {
>>>> +      MachineBasicBlock::iterator N = std::next(I);
>>>> +      const MachineInstr &NMI = *N;
>>>> +      for (const MachineOperand &MO : NMI.operands()) {
>>>>          if (!MO.isReg())
>>>>            continue;
>>>>          unsigned Reg = MO.getReg();
>>>> -        if (!TargetRegisterInfo::isVirtualRegister(Reg))
>>>> +        // We only care about virtual registers and ignore virtual registers
>>>> +        // created by the target callbacks in the process (those will be
>>>> +        // handled in a scavenging round).
>>>> +        if (!TargetRegisterInfo::isVirtualRegister(Reg) ||
>>>> +            TargetRegisterInfo::virtReg2Index(Reg) >= InitialNumVirtRegs)
>>>>            continue;
>>>> +        if (!MO.readsReg())
>>>> +          continue;
>>>> +
>>>> +        unsigned SReg = scavengeVReg(MRI, RS, Reg, true);
>>>> +        N->addRegisterKilled(SReg, &TRI, false);
>>>> +        RS.setRegUsed(SReg);
>>>> +      }
>>>> +    }
>>>> +
>>>> +    // Look for unassigned vregs in the defs of *I.
>>>> +    NextInstructionReadsVReg = false;
>>>> +    const MachineInstr &MI = *I;
>>>> +    for (const MachineOperand &MO : MI.operands()) {
>>>> +      if (!MO.isReg())
>>>> +        continue;
>>>> +      unsigned Reg = MO.getReg();
>>>> +      // Only vregs, no newly created vregs (see above).
>>>> +      if (!TargetRegisterInfo::isVirtualRegister(Reg) ||
>>>> +          TargetRegisterInfo::virtReg2Index(Reg) >= InitialNumVirtRegs)
>>>> +        continue;
>>>> +      // We have to look at all operands anyway so we can precalculate here
>>>> +      // whether there is a reading operand. This allows us to skip the use
>>>> +      // step in the next iteration if there was none.
>>>> +      assert(!MO.isInternalRead() && "Cannot assign inside bundles");
>>>> +      assert((!MO.isUndef() || MO.isDef()) && "Cannot handle undef uses");
>>>> +      if (MO.readsReg()) {
>>>> +        NextInstructionReadsVReg = true;
>>>> +      }
>>>> +      if (MO.isDef()) {
>>>> +        unsigned SReg = scavengeVReg(MRI, RS, Reg, false);
>>>> +        I->addRegisterDead(SReg, &TRI, false);
>>>> +      }
>>>> +    }
>>>> +  }
>>>> +#ifndef NDEBUG
>>>> +  for (const MachineOperand &MO : MBB.front().operands()) {
>>>> +    if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>>>> +      continue;
>>>> +    assert(!MO.isInternalRead() && "Cannot assign inside bundles");
>>>> +    assert((!MO.isUndef() || MO.isDef()) && "Cannot handle undef uses");
>>>> +    assert(!MO.readsReg() && "Vreg use in first instruction not allowed");
>>>> +  }
>>>> +#endif
>>>> +
>>>> +  return MRI.getNumVirtRegs() != InitialNumVirtRegs;
>>>> +}
>>>> +
>>>> +void llvm::scavengeFrameVirtualRegs(MachineFunction &MF, RegScavenger &RS) {
>>>> +  // FIXME: Iterating over the instruction stream is unnecessary. We can simply
>>>> +  // iterate over the vreg use list, which at this point only contains machine
>>>> +  // operands for which eliminateFrameIndex need a new scratch reg.
>>>> +  MachineRegisterInfo &MRI = MF.getRegInfo();
>>>> +  // Shortcut.
>>>> +  if (MRI.getNumVirtRegs() == 0) {
>>>> +    MF.getProperties().set(MachineFunctionProperties::Property::NoVRegs);
>>>> +    return;
>>>> +  }
>>>> +
>>>> +  // Run through the instructions and find any virtual registers.
>>>> +  for (MachineBasicBlock &MBB : MF) {
>>>> +    if (MBB.empty())
>>>> +      continue;
>>>>
>>>> -        // When we first encounter a new virtual register, it
>>>> -        // must be a definition.
>>>> -        assert(MO.isDef() && "frame index virtual missing def!");
>>>> -        // Scavenge a new scratch register
>>>> -        const TargetRegisterClass *RC = MRI.getRegClass(Reg);
>>>> -        unsigned ScratchReg = RS.scavengeRegister(RC, J, SPAdj);
>>>> -
>>>> -        ++NumScavengedRegs;
>>>> -
>>>> -        // Replace this reference to the virtual register with the
>>>> -        // scratch register.
>>>> -        assert(ScratchReg && "Missing scratch register!");
>>>> -        MRI.replaceRegWith(Reg, ScratchReg);
>>>> -
>>>> -        // Because this instruction was processed by the RS before this
>>>> -        // register was allocated, make sure that the RS now records the
>>>> -        // register as being used.
>>>> -        RS.setRegUsed(ScratchReg);
>>>> -      }
>>>> -
>>>> -      // If the scavenger needed to use one of its spill slots, the
>>>> -      // spill code will have been inserted in between I and J. This is a
>>>> -      // problem because we need the spill code before I: Move I to just
>>>> -      // prior to J.
>>>> -      if (I != std::prev(J)) {
>>>> -        MBB.splice(J, &MBB, I);
>>>> -
>>>> -        // Before we move I, we need to prepare the RS to visit I again.
>>>> -        // Specifically, RS will assert if it sees uses of registers that
>>>> -        // it believes are undefined. Because we have already processed
>>>> -        // register kills in I, when it visits I again, it will believe that
>>>> -        // those registers are undefined. To avoid this situation, unprocess
>>>> -        // the instruction I.
>>>> -        assert(RS.getCurrentPosition() == I &&
>>>> -          "The register scavenger has an unexpected position");
>>>> -        I = P;
>>>> -        RS.unprocess(P);
>>>> -      } else
>>>> -        ++I;
>>>> +    bool Again = scavengeFrameVirtualRegsInBlock(MRI, RS, MBB);
>>>> +    if (Again) {
>>>> +      DEBUG(dbgs() << "Warning: Required two scavenging passes for
>>>> block "
>>>> +            << MBB.getName() << '\n');
>>>> +      Again = scavengeFrameVirtualRegsInBlock(MRI, RS, MBB);
>>>> +      // The target required a 2nd run (because it created new vregs
>>>> while
>>>> +      // spilling). Refuse to do another pass to keep compiletime in
>>>> check.
>>>> +      if (Again)
>>>> +        report_fatal_error("Incomplete scavenging after 2nd pass");
>>>>      }
>>>>    }
>>>>
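>>>> To see why the backward walk is not affected by incomplete kill flags,
>>>> consider a hypothetical MIR snippet (an illustration, not from this
>>>> patch):
>>>>
>>>>   %r1 = LI 42
>>>>   NOP
>>>>   NOP implicit %r1   ; last use, but no 'killed' flag present
>>>>   ; A forward scan cannot tell that %r1 is dead below this point without
>>>>   ; the flag. A backward scan from the block end has seen no def or use
>>>>   ; of %r1 below this point, so it can prove %r1 is free here.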
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/AArch64/reg-scavenge-frame.mir
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/reg-scavenge-frame.mir?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/AArch64/reg-scavenge-frame.mir (original)
>>>> +++ llvm/trunk/test/CodeGen/AArch64/reg-scavenge-frame.mir Thu Jun 15
>>>> 17:14:55 2017
>>>> @@ -45,8 +45,42 @@ body:             |
>>>>      %fp = COPY %xzr
>>>>      %lr = COPY %xzr
>>>>      ST1Fourv1d killed %d16_d17_d18_d19, %stack.0 :: (store 32 into
>>>> %stack.0, align 8)
>>>> -# CHECK:  STRXui killed %[[SCAVREG:x[0-9]+|fp|lr]], %sp, [[SPOFFSET:[0-9]+]] :: (store 8 into %stack.1)
>>>> -# CHECK-NEXT:  %[[SCAVREG]] = ADDXri %sp, {{[0-9]+}}, 0
>>>> -# CHECK-NEXT:  ST1Fourv1d killed %d16_d17_d18_d19, killed %[[SCAVREG]] :: (store 32 into %stack.0, align 8)
>>>> -# CHECK-NEXT:  %[[SCAVREG]] = LDRXui %sp, [[SPOFFSET]] :: (load 8 from %stack.1)
>>>> +    ; CHECK:  STRXui killed %[[SCAVREG:x[0-9]+|fp|lr]], %sp, [[SPOFFSET:[0-9]+]] :: (store 8 into %stack.1)
>>>> +    ; CHECK-NEXT:  %[[SCAVREG]] = ADDXri %sp, {{[0-9]+}}, 0
>>>> +    ; CHECK-NEXT:  ST1Fourv1d killed %d16_d17_d18_d19, killed %[[SCAVREG]] :: (store 32 into %stack.0, align 8)
>>>> +    ; CHECK-NEXT:  %[[SCAVREG]] = LDRXui %sp, [[SPOFFSET]] :: (load 8 from %stack.1)
>>>> +
>>>> +    HINT 0, implicit %x0
>>>> +    HINT 0, implicit %x1
>>>> +    HINT 0, implicit %x2
>>>> +    HINT 0, implicit %x3
>>>> +    HINT 0, implicit %x4
>>>> +    HINT 0, implicit %x5
>>>> +    HINT 0, implicit %x6
>>>> +    HINT 0, implicit %x7
>>>> +    HINT 0, implicit %x8
>>>> +    HINT 0, implicit %x9
>>>> +    HINT 0, implicit %x10
>>>> +    HINT 0, implicit %x11
>>>> +    HINT 0, implicit %x12
>>>> +    HINT 0, implicit %x13
>>>> +    HINT 0, implicit %x14
>>>> +    HINT 0, implicit %x15
>>>> +    HINT 0, implicit %x16
>>>> +    HINT 0, implicit %x17
>>>> +    HINT 0, implicit %x18
>>>> +    HINT 0, implicit %x19
>>>> +    HINT 0, implicit %x20
>>>> +    HINT 0, implicit %x21
>>>> +    HINT 0, implicit %x22
>>>> +    HINT 0, implicit %x23
>>>> +    HINT 0, implicit %x24
>>>> +    HINT 0, implicit %x25
>>>> +    HINT 0, implicit %x26
>>>> +    HINT 0, implicit %x27
>>>> +    HINT 0, implicit %x28
>>>> +    HINT 0, implicit %fp
>>>> +    HINT 0, implicit %lr
>>>> +
>>>> +    RET_ReallyLR
>>>>  ...
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll Thu Jun 15
>>>> 17:14:55 2017
>>>> @@ -39,44 +39,49 @@ define amdgpu_kernel void @max_9_sgprs(i
>>>>  ; features when the number of registers is frozen), this ends up using
>>>>  ; more than expected.
>>>>
>>>> -; ALL-LABEL: {{^}}max_12_sgprs_14_input_sgprs:
>>>> -; TOSGPR: SGPRBlocks: 1
>>>> -; TOSGPR: NumSGPRsForWavesPerEU: 16
>>>> -
>>>> -; TOSMEM: s_mov_b64 s[10:11], s[2:3]
>>>> -; TOSMEM: s_mov_b64 s[8:9], s[0:1]
>>>> -; TOSMEM: s_mov_b32 s7, s13
>>>> -
>>>> -; TOSMEM: SGPRBlocks: 1
>>>> -; TOSMEM: NumSGPRsForWavesPerEU: 16
>>>> -define amdgpu_kernel void @max_12_sgprs_14_input_sgprs(i32 addrspace(1)* %out1,
>>>> -                                        i32 addrspace(1)* %out2,
>>>> -                                        i32 addrspace(1)* %out3,
>>>> -                                        i32 addrspace(1)* %out4,
>>>> -                                        i32 %one, i32 %two, i32 %three, i32 %four) #2 {
>>>> -  %x.0 = call i32 @llvm.amdgcn.workgroup.id.x()
>>>> -  %x.1 = call i32 @llvm.amdgcn.workgroup.id.y()
>>>> -  %x.2 = call i32 @llvm.amdgcn.workgroup.id.z()
>>>> -  %x.3 = call i64 @llvm.amdgcn.dispatch.id()
>>>> -  %x.4 = call i8 addrspace(2)* @llvm.amdgcn.dispatch.ptr()
>>>> -  %x.5 = call i8 addrspace(2)* @llvm.amdgcn.queue.ptr()
>>>> -  store volatile i32 0, i32* undef
>>>> -  br label %stores
>>>> -
>>>> -stores:
>>>> -  store volatile i32 %x.0, i32 addrspace(1)* undef
>>>> -  store volatile i32 %x.0, i32 addrspace(1)* undef
>>>> -  store volatile i32 %x.0, i32 addrspace(1)* undef
>>>> -  store volatile i64 %x.3, i64 addrspace(1)* undef
>>>> -  store volatile i8 addrspace(2)* %x.4, i8 addrspace(2)* addrspace(1)* undef
>>>> -  store volatile i8 addrspace(2)* %x.5, i8 addrspace(2)* addrspace(1)* undef
>>>> -
>>>> -  store i32 %one, i32 addrspace(1)* %out1
>>>> -  store i32 %two, i32 addrspace(1)* %out2
>>>> -  store i32 %three, i32 addrspace(1)* %out3
>>>> -  store i32 %four, i32 addrspace(1)* %out4
>>>> -  ret void
>>>> -}
>>>> +; XALL-LABEL: {{^}}max_12_sgprs_14_input_sgprs:
>>>> +; XTOSGPR: SGPRBlocks: 1
>>>> +; XTOSGPR: NumSGPRsForWavesPerEU: 16
>>>> +
>>>> +; XTOSMEM: s_mov_b64 s[10:11], s[2:3]
>>>> +; XTOSMEM: s_mov_b64 s[8:9], s[0:1]
>>>> +; XTOSMEM: s_mov_b32 s7, s13
>>>> +
>>>> +; XTOSMEM: SGPRBlocks: 1
>>>> +; XTOSMEM: NumSGPRsForWavesPerEU: 16
>>>> +;
>>>> +; This test case is disabled: When calculating the spillslot addresses AMDGPU
>>>> +; creates an extra vreg to save/restore m0 which at a point of maximum register
>>>> +; pressure would trigger an endless loop; the compiler aborts earlier with
>>>> +; "Incomplete scavenging after 2nd pass" in practice.
>>>> +;define amdgpu_kernel void @max_12_sgprs_14_input_sgprs(i32 addrspace(1)* %out1,
>>>> +;                                        i32 addrspace(1)* %out2,
>>>> +;                                        i32 addrspace(1)* %out3,
>>>> +;                                        i32 addrspace(1)* %out4,
>>>> +;                                        i32 %one, i32 %two, i32 %three, i32 %four) #2 {
>>>> +;  %x.0 = call i32 @llvm.amdgcn.workgroup.id.x()
>>>> +;  %x.1 = call i32 @llvm.amdgcn.workgroup.id.y()
>>>> +;  %x.2 = call i32 @llvm.amdgcn.workgroup.id.z()
>>>> +;  %x.3 = call i64 @llvm.amdgcn.dispatch.id()
>>>> +;  %x.4 = call i8 addrspace(2)* @llvm.amdgcn.dispatch.ptr()
>>>> +;  %x.5 = call i8 addrspace(2)* @llvm.amdgcn.queue.ptr()
>>>> +;  store volatile i32 0, i32* undef
>>>> +;  br label %stores
>>>> +;
>>>> +;stores:
>>>> +;  store volatile i32 %x.0, i32 addrspace(1)* undef
>>>> +;  store volatile i32 %x.0, i32 addrspace(1)* undef
>>>> +;  store volatile i32 %x.0, i32 addrspace(1)* undef
>>>> +;  store volatile i64 %x.3, i64 addrspace(1)* undef
>>>> +;  store volatile i8 addrspace(2)* %x.4, i8 addrspace(2)* addrspace(1)* undef
>>>> +;  store volatile i8 addrspace(2)* %x.5, i8 addrspace(2)* addrspace(1)* undef
>>>> +;
>>>> +;  store i32 %one, i32 addrspace(1)* %out1
>>>> +;  store i32 %two, i32 addrspace(1)* %out2
>>>> +;  store i32 %three, i32 addrspace(1)* %out3
>>>> +;  store i32 %four, i32 addrspace(1)* %out4
>>>> +;  ret void
>>>> +;}
>>>>
>>>>  ; The following test is commented out for now; http://llvm.org/PR31230
>>>>  ; XALL-LABEL: max_12_sgprs_12_input_sgprs{{$}}
>>>>
>>>> Modified:
>>>> llvm/trunk/test/CodeGen/AMDGPU/code-object-metadata-kernel-debug-props.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/code-object-metadata-kernel-debug-props.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> ---
>>>> llvm/trunk/test/CodeGen/AMDGPU/code-object-metadata-kernel-debug-props.ll
>>>> (original)
>>>> +++
>>>> llvm/trunk/test/CodeGen/AMDGPU/code-object-metadata-kernel-debug-props.ll
>>>> Thu Jun 15 17:14:55 2017
>>>> @@ -12,8 +12,8 @@ declare void @llvm.dbg.declare(metadata,
>>>>  ; CHECK:      DebugProps:
>>>>  ; CHECK:        DebuggerABIVersion:                [ 1, 0 ]
>>>>  ; CHECK:        ReservedNumVGPRs:                  4
>>>> -; GFX700:       ReservedFirstVGPR:                 11
>>>> -; GFX800:       ReservedFirstVGPR:                 11
>>>> +; GFX700:       ReservedFirstVGPR:                 8
>>>> +; GFX800:       ReservedFirstVGPR:                 8
>>>>  ; GFX9:         ReservedFirstVGPR:                 14
>>>>  ; CHECK:        PrivateSegmentBufferSGPR:          0
>>>>  ; CHECK:        WavefrontPrivateSegmentOffsetSGPR: 11
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/frame-index-elimination.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/frame-index-elimination.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/AMDGPU/frame-index-elimination.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/frame-index-elimination.ll Thu Jun
>>>> 15 17:14:55 2017
>>>> @@ -22,9 +22,9 @@ define void @func_mov_fi_i32() #0 {
>>>>
>>>>  ; GCN-LABEL: {{^}}func_add_constant_to_fi_i32:
>>>>  ; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
>>>> -; GCN: s_sub_u32 s6, s5, s4
>>>> -; GCN-NEXT: s_lshr_b32 s6, s6, 6
>>>> -; GCN-NEXT: v_add_i32_e64 v0, s{{\[[0-9]+:[0-9]+\]}}, s6, 4
>>>> +; GCN: s_sub_u32 vcc_hi, s5, s4
>>>> +; GCN-NEXT: s_lshr_b32 vcc_hi, vcc_hi, 6
>>>> +; GCN-NEXT: v_add_i32_e64 v0, {{s\[[0-9]+:[0-9]+\]|vcc}}, vcc_hi, 4
>>>>  ; GCN-NEXT: v_add_i32_e32 v0, vcc, 4, v0
>>>>  ; GCN-NOT: v_mov
>>>>  ; GCN: ds_write_b32 v0, v0
>>>> @@ -71,8 +71,8 @@ define void @func_load_private_arg_i32_p
>>>>
>>>>  ; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr:
>>>>  ; GCN: s_waitcnt
>>>> -; GCN-NEXT: s_sub_u32 s6, s5, s4
>>>> -; GCN-NEXT: v_lshr_b32_e64 v0, s6, 6
>>>> +; GCN-NEXT: s_sub_u32 vcc_hi, s5, s4
>>>> +; GCN-NEXT: v_lshr_b32_e64 v0, vcc_hi, 6
>>>>  ; GCN-NEXT: v_add_i32_e32 v0, vcc, 4, v0
>>>>  ; GCN-NOT: v_mov
>>>>  ; GCN: ds_write_b32 v0, v0
>>>> @@ -99,8 +99,8 @@ define void @void_func_byval_struct_i8_i
>>>>  }
>>>>
>>>>  ; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_nonentry_block:
>>>> -; GCN: s_sub_u32 s8, s5, s4
>>>> -; GCN: v_lshr_b32_e64 v1, s8, 6
>>>> +; GCN: s_sub_u32 vcc_hi, s5, s4
>>>> +; GCN: v_lshr_b32_e64 v1, vcc_hi, 6
>>>>  ; GCN: s_and_saveexec_b64
>>>>
>>>>  ; GCN: v_add_i32_e32 v0, vcc, 4, v1
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/ARM/alloca-align.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/alloca-align.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/ARM/alloca-align.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/ARM/alloca-align.ll Thu Jun 15 17:14:55 2017
>>>> @@ -12,7 +12,7 @@ declare void @bar(i32*, [20000 x i8]* by
>>>>  ; And a base pointer getting used.
>>>>  ; CHECK: mov r6, sp
>>>>  ; Which is passed to the call
>>>> -; CHECK: add [[REG:r[0-9]+]], r6, #19456
>>>> +; CHECK: add [[REG:r[0-9]+|lr]], r6, #19456
>>>>  ; CHECK: add r0, [[REG]], #536
>>>>  ; CHECK: bl bar
>>>>  define void @foo([20000 x i8]* %addr) {
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/ARM/execute-only-big-stack-frame.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/execute-only-big-stack-frame.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/ARM/execute-only-big-stack-frame.ll
>>>> (original)
>>>> +++ llvm/trunk/test/CodeGen/ARM/execute-only-big-stack-frame.ll Thu Jun
>>>> 15 17:14:55 2017
>>>> @@ -10,10 +10,10 @@ define i8 @test_big_stack_frame() {
>>>>  ; CHECK-SUBW-ADDW-NOT:   ldr {{r[0-9]+}}, .{{.*}}
>>>>  ; CHECK-SUBW-ADDW:       sub.w sp, sp, #65536
>>>>  ; CHECK-SUBW-ADDW-NOT:   ldr {{r[0-9]+}}, .{{.*}}
>>>> -; CHECK-SUBW-ADDW:       add.w [[REG1:r[0-9]+]], sp, #255
>>>> +; CHECK-SUBW-ADDW:       add.w [[REG1:r[0-9]+|lr]], sp, #255
>>>>  ; CHECK-SUBW-ADDW:       add.w {{r[0-9]+}}, [[REG1]], #65280
>>>>  ; CHECK-SUBW-ADDW-NOT:   ldr {{r[0-9]+}}, .{{.*}}
>>>> -; CHECK-SUBW-ADDW:       add.w lr, sp, #61440
>>>> +; CHECK-SUBW-ADDW:       add.w [[REGX:r[0-9]+|lr]], sp, #61440
>>>>  ; CHECK-SUBW-ADDW-NOT:   ldr {{r[0-9]+}}, .{{.*}}
>>>>  ; CHECK-SUBW-ADDW:       add.w sp, sp, #65536
>>>>
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/ARM/fpoffset_overflow.mir
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/fpoffset_overflow.mir?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/ARM/fpoffset_overflow.mir (original)
>>>> +++ llvm/trunk/test/CodeGen/ARM/fpoffset_overflow.mir Thu Jun 15
>>>> 17:14:55 2017
>>>> @@ -3,10 +3,10 @@
>>>>  # This should trigger an emergency spill in the register scavenger
>>>> because the
>>>>  # frame offset into the large argument is too large.
>>>>  # CHECK-LABEL: name: func0
>>>> -# CHECK: t2STRi12 killed %r7, %sp, 0, 14, _ :: (store 4 into %stack.0)
>>>> -# CHECK: %r7 = t2ADDri killed %sp, 4096, 14, _, _
>>>> -# CHECK: %r11 = t2LDRi12 killed %r7, 36, 14, _ :: (load 4)
>>>> -# CHECK: %r7 = t2LDRi12 %sp, 0, 14, _ :: (load 4 from %stack.0)
>>>> +# CHECK: t2STRi12 killed [[SPILLED:%r[0-9]+]], %sp, 0, 14, _ :: (store 4 into %stack.0)
>>>> +# CHECK: [[SPILLED]] = t2ADDri killed %sp, 4096, 14, _, _
>>>> +# CHECK: %sp = t2LDRi12 killed [[SPILLED]], 40, 14, _ :: (load 4)
>>>> +# CHECK: [[SPILLED]] = t2LDRi12 %sp, 0, 14, _ :: (load 4 from %stack.0)
>>>>  name: func0
>>>>  tracksRegLiveness: true
>>>>  fixedStack:
>>>> @@ -23,6 +23,7 @@ body: |
>>>>      %r4 = IMPLICIT_DEF
>>>>      %r5 = IMPLICIT_DEF
>>>>      %r6 = IMPLICIT_DEF
>>>> +    %r7 = IMPLICIT_DEF
>>>>      %r8 = IMPLICIT_DEF
>>>>      %r9 = IMPLICIT_DEF
>>>>      %r10 = IMPLICIT_DEF
>>>> @@ -30,7 +31,7 @@ body: |
>>>>      %r12 = IMPLICIT_DEF
>>>>      %lr = IMPLICIT_DEF
>>>>
>>>> -    %r11 = t2LDRi12 %fixed-stack.0, 0, 14, _ :: (load 4)
>>>> +    %sp = t2LDRi12 %fixed-stack.0, 0, 14, _ :: (load 4)
>>>>
>>>>      KILL %r0
>>>>      KILL %r1
>>>> @@ -39,6 +40,7 @@ body: |
>>>>      KILL %r4
>>>>      KILL %r5
>>>>      KILL %r6
>>>> +    KILL %r7
>>>>      KILL %r8
>>>>      KILL %r9
>>>>      KILL %r10
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/Mips/emergency-spill-slot-near-fp.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Mips/emergency-spill-slot-near-fp.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/Mips/emergency-spill-slot-near-fp.ll
>>>> (original)
>>>> +++ llvm/trunk/test/CodeGen/Mips/emergency-spill-slot-near-fp.ll Thu
>>>> Jun 15 17:14:55 2017
>>>> @@ -1,34 +1,62 @@
>>>> -; Check that register scavenging spill slot is close to $fp.
>>>>  ; RUN: llc -march=mipsel -O0 -relocation-model=pic < %s | FileCheck %s
>>>> +; Check that register scavenging spill slot is close to $fp.
>>>> +target triple="mipsel--"
>>>>
>>>> -; CHECK: sw ${{.*}}, 8($sp)
>>>> -; CHECK: lw ${{.*}}, 8($sp)
>>>> +@var = external global i32
>>>> +@ptrvar = external global i8*
>>>>
>>>> -define i32 @main(i32 signext %argc, i8** %argv) #0 {
>>>> -entry:
>>>> -  %retval = alloca i32, align 4
>>>> -  %argc.addr = alloca i32, align 4
>>>> -  %argv.addr = alloca i8**, align 4
>>>> -  %v0 = alloca <16 x i8>, align 16
>>>> -  %.compoundliteral = alloca <16 x i8>, align 16
>>>> -  %v1 = alloca <16 x i8>, align 16
>>>> -  %.compoundliteral1 = alloca <16 x i8>, align 16
>>>> -  %unused_variable = alloca [16384 x i32], align 4
>>>> -  %result = alloca <16 x i8>, align 16
>>>> -  store i32 0, i32* %retval
>>>> -  store i32 %argc, i32* %argc.addr, align 4
>>>> -  store i8** %argv, i8*** %argv.addr, align 4
>>>> -  store <16 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16>, <16 x i8>* %.compoundliteral
>>>> -  %0 = load <16 x i8>, <16 x i8>* %.compoundliteral
>>>> -  store <16 x i8> %0, <16 x i8>* %v0, align 16
>>>> -  store <16 x i8> zeroinitializer, <16 x i8>* %.compoundliteral1
>>>> -  %1 = load <16 x i8>, <16 x i8>* %.compoundliteral1
>>>> -  store <16 x i8> %1, <16 x i8>* %v1, align 16
>>>> -  %2 = load <16 x i8>, <16 x i8>* %v0, align 16
>>>> -  %3 = load <16 x i8>, <16 x i8>* %v1, align 16
>>>> -  %mul = mul <16 x i8> %2, %3
>>>> -  store <16 x i8> %mul, <16 x i8>* %result, align 16
>>>> -  ret i32 0
>>>> -}
>>>> +; CHECK-LABEL: func:
>>>> +define void @func() {
>>>> +  %space = alloca i32, align 4
>>>> +  %stackspace = alloca [16384 x i32], align 4
>>>> +
>>>> +  ; ensure stackspace is not optimized out
>>>> +  %stackspace_casted = bitcast [16384 x i32]* %stackspace to i8*
>>>> +  store volatile i8* %stackspace_casted, i8** @ptrvar
>>>>
>>>> -attributes #0 = { noinline "no-frame-pointer-elim"="true" }
>>>> +  ; Load values to increase register pressure.
>>>> +  %v0 = load volatile i32, i32* @var
>>>> +  %v1 = load volatile i32, i32* @var
>>>> +  %v2 = load volatile i32, i32* @var
>>>> +  %v3 = load volatile i32, i32* @var
>>>> +  %v4 = load volatile i32, i32* @var
>>>> +  %v5 = load volatile i32, i32* @var
>>>> +  %v6 = load volatile i32, i32* @var
>>>> +  %v7 = load volatile i32, i32* @var
>>>> +  %v8 = load volatile i32, i32* @var
>>>> +  %v9 = load volatile i32, i32* @var
>>>> +  %v10 = load volatile i32, i32* @var
>>>> +  %v11 = load volatile i32, i32* @var
>>>> +  %v12 = load volatile i32, i32* @var
>>>> +  %v13 = load volatile i32, i32* @var
>>>> +  %v14 = load volatile i32, i32* @var
>>>> +  %v15 = load volatile i32, i32* @var
>>>> +  %v16 = load volatile i32, i32* @var
>>>> +
>>>> +  ; Computing a stack-relative value needs an additional register.
>>>> +  ; We should get an emergency spill/reload for this.
>>>> +  ; CHECK: sw ${{.*}}, 0($sp)
>>>> +  ; CHECK: lw ${{.*}}, 0($sp)
>>>> +  store volatile i32 %v0, i32* %space
>>>> +
>>>> +  ; store values so they are used.
>>>> +  store volatile i32 %v0, i32* @var
>>>> +  store volatile i32 %v1, i32* @var
>>>> +  store volatile i32 %v2, i32* @var
>>>> +  store volatile i32 %v3, i32* @var
>>>> +  store volatile i32 %v4, i32* @var
>>>> +  store volatile i32 %v5, i32* @var
>>>> +  store volatile i32 %v6, i32* @var
>>>> +  store volatile i32 %v7, i32* @var
>>>> +  store volatile i32 %v8, i32* @var
>>>> +  store volatile i32 %v9, i32* @var
>>>> +  store volatile i32 %v10, i32* @var
>>>> +  store volatile i32 %v11, i32* @var
>>>> +  store volatile i32 %v12, i32* @var
>>>> +  store volatile i32 %v13, i32* @var
>>>> +  store volatile i32 %v14, i32* @var
>>>> +  store volatile i32 %v15, i32* @var
>>>> +  store volatile i32 %v16, i32* @var
>>>> +
>>>> +  ret void
>>>> +}
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/PowerPC/dyn-alloca-aligned.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/dyn-alloca-aligned.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/PowerPC/dyn-alloca-aligned.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/PowerPC/dyn-alloca-aligned.ll Thu Jun 15
>>>> 17:14:55 2017
>>>> @@ -25,8 +25,8 @@ entry:
>>>>
>>>>  ; CHECK-DAG: li [[REG1:[0-9]+]], -128
>>>>  ; CHECK-DAG: neg [[REG2:[0-9]+]],
>>>> -; CHECK: and [[REG1]], [[REG2]], [[REG1]]
>>>> -; CHECK: stdux {{[0-9]+}}, 1, [[REG1]]
>>>> +; CHECK: and [[REG3:[0-9]+]], [[REG2]], [[REG1]]
>>>> +; CHECK: stdux {{[0-9]+}}, 1, [[REG3]]
>>>>
>>>>  ; CHECK: blr
>>>>
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/PowerPC/scavenging.mir
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/scavenging.mir?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/PowerPC/scavenging.mir (original)
>>>> +++ llvm/trunk/test/CodeGen/PowerPC/scavenging.mir Thu Jun 15 17:14:55
>>>> 2017
>>>> @@ -6,7 +6,7 @@ tracksRegLiveness: true
>>>>  body: |
>>>>    bb.0:
>>>>      ; CHECK: [[REG0:%r[0-9]+]] = LI 42
>>>> -    ; CHECK-NEXT: NOP implicit [[REG0]]
>>>> +    ; CHECK-NEXT: NOP implicit killed [[REG0]]
>>>>      %0 : gprc = LI 42
>>>>      NOP implicit %0
>>>>
>>>> @@ -14,7 +14,7 @@ body: |
>>>>      ; CHECK-NEXT: NOP
>>>>      ; CHECK-NEXT: NOP implicit [[REG1]]
>>>>      ; CHECK-NEXT: NOP
>>>> -    ; CHECK-NEXT: NOP implicit [[REG1]]
>>>> +    ; CHECK-NEXT: NOP implicit killed [[REG1]]
>>>>      %1 : gprc = LI 42
>>>>      NOP
>>>>      NOP implicit %1
>>>> @@ -48,8 +48,8 @@ body: |
>>>>      ; CHECK-NOT: %x30 = LI 42
>>>>      ; CHECK: [[REG3:%r[0-9]+]] = LI 42
>>>>      ; CHECK-NEXT: %x5 = IMPLICIT_DEF
>>>> -    ; CHECK-NEXT: NOP implicit [[REG2]]
>>>> -    ; CHECK-NEXT: NOP implicit [[REG3]]
>>>> +    ; CHECK-NEXT: NOP implicit killed [[REG2]]
>>>> +    ; CHECK-NEXT: NOP implicit killed [[REG3]]
>>>>      %3 : gprc = LI 42
>>>>      %x5 = IMPLICIT_DEF
>>>>      NOP implicit %2
>>>> @@ -110,7 +110,7 @@ body: |
>>>>
>>>>      ; CHECK: STD killed [[SPILLEDREG:%x[0-9]+]]
>>>>      ; CHECK: [[SPILLEDREG]] = LI8 42
>>>> -    ; CHECK: NOP implicit [[SPILLEDREG]]
>>>> +    ; CHECK: NOP implicit killed [[SPILLEDREG]]
>>>>      ; CHECK: [[SPILLEDREG]] = LD
>>>>      %0 : g8rc = LI8 42
>>>>      NOP implicit %0
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/Thumb/large-stack.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Thumb/large-stack.ll?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/Thumb/large-stack.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/Thumb/large-stack.ll Thu Jun 15 17:14:55
>>>> 2017
>>>> @@ -69,10 +69,10 @@ define i32 @test3() {
>>>>  ; CHECK-LABEL: test3:
>>>>  ; CHECK: ldr [[TEMP:r[0-7]]],
>>>>  ; CHECK: add sp, [[TEMP]]
>>>> -; CHECK: ldr [[TEMP]],
>>>> -; CHECK: add [[TEMP]], sp
>>>> -; CHECK: ldr [[TEMP:r[0-7]]],
>>>> -; CHECK: add sp, [[TEMP]]
>>>> +; CHECK: ldr [[TEMP2:r[0-7]]],
>>>> +; CHECK: add [[TEMP2]], sp
>>>> +; CHECK: ldr [[TEMP3:r[0-7]]],
>>>> +; CHECK: add sp, [[TEMP3]]
>>>>      %retval = alloca i32, align 4
>>>>      %tmp = alloca i32, align 4
>>>>      %a = alloca [805306369 x i8], align 16
>>>> @@ -85,8 +85,8 @@ define i32 @test3_nofpelim() "no-frame-p
>>>>  ; CHECK-LABEL: test3_nofpelim:
>>>>  ; CHECK: ldr [[TEMP:r[0-7]]],
>>>>  ; CHECK: add sp, [[TEMP]]
>>>> -; CHECK: ldr [[TEMP]],
>>>> -; CHECK: add [[TEMP]], sp
>>>> +; CHECK: ldr [[TEMP2:r[0-7]]],
>>>> +; CHECK: add [[TEMP2]], sp
>>>>  ; CHECK: subs r4, r7,
>>>>  ; CHECK: mov sp, r4
>>>>      %retval = alloca i32, align 4
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/scavenger.mir
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/scavenger.mir?rev=305516&r1=305515&r2=305516&view=diff
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/scavenger.mir (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/scavenger.mir Thu Jun 15 17:14:55 2017
>>>> @@ -5,6 +5,8 @@ name: func0
>>>>  tracksRegLiveness: true
>>>>  body: |
>>>>    bb.0:
>>>> +    ; CHECK: [[REG0:%e[a-z]+]] = MOV32ri 42
>>>> +    ; CHECK: %ebp = COPY killed [[REG0]]
>>>>      %0 : gr32 = MOV32ri 42
>>>>      %ebp = COPY %0
>>>>  ...
>>>> @@ -16,7 +18,7 @@ body: |
>>>>    bb.0:
>>>>      ; CHECK-NOT: %eax = MOV32ri 42
>>>>      ; CHECK: [[REG0:%e[a-z]+]] = MOV32ri 42
>>>> -    ; CHECK: %ebp = COPY [[REG0]]
>>>> +    ; CHECK: %ebp = COPY killed [[REG0]]
>>>>      %eax = MOV32ri 13
>>>>      %0 : gr32 = MOV32ri 42
>>>>      %ebp = COPY %0
>>>> @@ -30,25 +32,18 @@ body: |
>>>>
>>>>      NOOP implicit %ebp
>>>>
>>>> -    ; CHECK: NOOP implicit [[REG2]]
>>>> -    ; CHECK: NOOP implicit [[REG1]]
>>>> +    ; CHECK: NOOP implicit killed [[REG2]]
>>>> +    ; CHECK: NOOP implicit killed [[REG1]]
>>>>      NOOP implicit %2
>>>>      NOOP implicit %1
>>>>      RETQ %eax
>>>>  ...
>>>>  ---
>>>> -# Defs without uses are currently broken
>>>> -#name: func3
>>>> -#tracksRegLiveness: true
>>>> -#body: |
>>>> -#  bb.0:
>>>> -#    dead %0 : gr32 = MOV32ri 42
>>>> -...
>>>> ----
>>>> -# Uses without defs are currently broken (and honestly not that useful).
>>>> -#name: func3
>>>> -#tracksRegLiveness: true
>>>> -#body: |
>>>> -#  bb.0:
>>>> -#    NOOP undef implicit %0 : gr32
>>>> +# CHECK-LABEL: name: func3
>>>> +name: func3
>>>> +tracksRegLiveness: true
>>>> +body: |
>>>> +  bb.0:
>>>> +    ; CHECK: dead {{%e[a-z]+}} = MOV32ri 42
>>>> +    dead %0 : gr32 = MOV32ri 42
>>>>  ...
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>