[PATCH] Hexagon Register Cleanup

Tom Stellard tom at stellard.net
Mon May 13 10:18:46 PDT 2013


On Mon, May 13, 2013 at 11:07:21AM -0500, Krzysztof Parzyszek wrote:
> Hi,
> This is the Hexagon pass that was meant to address the complications
> related to implicit uses and defs of super- and sub-registers.
> 
> To clarify the situation for everybody:
> Hexagon has 32 registers R0..R31, each 32 bits wide.  Certain
> instructions can do 64-bit calculations, and their operands are
> 64-bit register pairs (even-odd).  These pairs are usually written
> as D0..D15, but they are in fact the pairs R1:0, R3:2, R5:4, etc.
> (Hexagon is little-endian, hence the "reversed" notation).  It is
> not unusual to have the individual registers in a register pair be
> defined separately, and then used as a pair in another instruction,
> for example:
>   R0 = ...
>   R1 = ...
>   ... = D0
> 
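
To make the pair numbering concrete, here is a minimal sketch of the
Dn <-> R(2n+1):R(2n) mapping described above.  The helper names and the
use of plain integers for registers are assumptions for illustration;
none of this is taken from the patch or from LLVM.

```cpp
#include <cassert>
#include <utility>

// Hypothetical model: Rn is the integer n, Dn is the integer n.

// Dn covers R(2n+1):R(2n); returns {low, high} sub-register numbers.
inline std::pair<unsigned, unsigned> subRegsOfPair(unsigned D) {
  assert(D < 16 && "Hexagon has pairs D0..D15");
  return {2 * D, 2 * D + 1};
}

// The pair that contains the 32-bit register Rn.
inline unsigned pairContaining(unsigned R) {
  assert(R < 32 && "Hexagon has registers R0..R31");
  return R / 2;
}

// True if Lo and Hi form a legal even-odd pair (R0/R1 yes, R1/R2 no).
inline bool formsPair(unsigned Lo, unsigned Hi) {
  return Lo % 2 == 0 && Hi == Lo + 1;
}
```

With this numbering, D2 is R5:4, and R1 with R2 is not a pair even though
the two registers are adjacent.
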
> This introduces certain complications with the current register
> allocation.  The problem is that the register rewriter will add
> implicit uses and implicit defs of super-registers when a
> sub-register is used or defined.  For example:
>   %vreg1:subreg_loreg = COPY %vreg2:subreg_loreg
>   %vreg1:subreg_hireg = COPY %vreg2:subreg_hireg
> assuming that vreg1 becomes D0, and vreg2 becomes D1, would become
>   %R0<def> = COPY %R2<use>, %D0<imp-def>, %D1<imp-use>
>   %R1<def> = COPY %R3<use>, %D0<imp-def>, %D1<imp-use>
> 
> Hexagon is a VLIW machine, i.e. instructions are grouped into
> packets, and then the packets are executed as a unit (i.e. all
> instructions within a packet are executed in parallel, subject to
> certain limitations).  For performance it is much better to pack as
> many instructions in a packet as possible (architecture limit is 4),
> instead of having more packets with fewer instructions.
> 
> One restriction is that there cannot be any dependencies between
> instructions in a packet, so for the example above, the packetizer
> would be unable to put the two COPY instructions in the same packet,
> even though, from the architecture point of view, there are no
> dependencies and they can execute in parallel.  The reason for that
> would be that D0 appears to be defined in both instructions (hence
> they cannot be parallelized).
> 
> 
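
The false dependence can be reproduced with a toy model.  The sketch
below assumes a deliberately simplified packet rule (no register defined
by one instruction may be defined or used by the other) and represents an
instruction as two sets of register names.  The struct and function names
are made up for illustration and are not the packetizer's real interface.

```cpp
#include <set>
#include <string>

// Toy model: an instruction is the set of registers it defines and the
// set it uses, implicit operands included.
struct Instr {
  std::set<std::string> Defs;
  std::set<std::string> Uses;
};

static bool intersects(const std::set<std::string> &A,
                       const std::set<std::string> &B) {
  for (const std::string &R : A)
    if (B.count(R))
      return true;
  return false;
}

// Simplified rule: two instructions may share a packet only if neither
// defines a register that the other defines or uses.
bool canPacketize(const Instr &A, const Instr &B) {
  return !intersects(A.Defs, B.Defs) && !intersects(A.Defs, B.Uses) &&
         !intersects(B.Defs, A.Uses);
}
```

With D0 listed as a def in both COPYs the check fails; with only R0 and R1
as defs it succeeds, which matches what the hardware allows.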

Hi Krzysztof,

We have the exact same problem for the VLIW4/5 subtargets in R600.
Instructions that write to different sub registers of the same super
register cannot be scheduled together in the same packet.  Vincent has
written a patch to fix this in LLVM core. The latest version that was sent
to the list is here:
http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/141708

But Vincent may be able to point you to a more up-to-date version.

-Tom

> This pass tries to solve this problem (and related issues) by
> shifting the liveness tracking from super-registers to
> sub-registers.  It does so by marking all explicit uses and defs of
> register pairs as "undef", and adds implicit uses and defs of the
> 32-bit components.  In addition to that, it removes the "extra"
> implicit uses and defs of super-registers (i.e. register pairs) that
> were added by the rewriter.  So, the above example would become
>   %R0<def> = COPY %R2<use>
>   %R1<def> = COPY %R3<use>
> If we had an instruction that actually uses register pairs, such as
>   %D0<def> = ADD64_rr %D1<use>, %D2<use>
> it would be processed to look like this:
>   %D0<def,undef> = ADD64_rr %D1<use,undef>, %D2<use,undef>,
>                             %R0<imp-def>, %R1<imp-def>  // D0 = ...
>                             %R2<imp-use>, %R3<imp-use>  // ... = D1
>                             %R4<imp-use>, %R5<imp-use>  // ... = D2
> 
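
As a rough sketch of the operand rewrite in the ADD64_rr example above:
explicit pair operands get the "undef" flag, and implicit operands for
the two 32-bit halves are appended.  The Operand struct and its field
names are assumptions for illustration only; they are not the
MachineOperand API and this is not the code from the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy operand model (hypothetical field names).
struct Operand {
  std::string Reg;  // "R0".."R31" or "D0".."D15"
  bool IsDef;
  bool IsImplicit;
  bool IsUndef;
};

static bool isPair(const std::string &Reg) {
  return !Reg.empty() && Reg[0] == 'D';
}

// Mark every explicit pair operand "undef" and append implicit operands
// for its halves, so that liveness is tracked on R registers only.
std::vector<Operand> shiftToSubRegs(std::vector<Operand> Ops) {
  const std::size_t N = Ops.size();  // do not revisit appended operands
  for (std::size_t I = 0; I != N; ++I) {
    if (Ops[I].IsImplicit || !isPair(Ops[I].Reg))
      continue;
    Ops[I].IsUndef = true;
    // Copy what we need before push_back may reallocate the vector.
    const unsigned D = std::stoul(Ops[I].Reg.substr(1));
    const bool IsDef = Ops[I].IsDef;
    Ops.push_back({"R" + std::to_string(2 * D), IsDef, true, false});
    Ops.push_back({"R" + std::to_string(2 * D + 1), IsDef, true, false});
  }
  return Ops;
}
```

Feeding it the three explicit operands D0 (def), D1 and D2 (uses) yields
nine operands in the same order as the listing above.
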
> The intent here is to mark the pairs as "undef" and thus remove them
> from dependence analysis.  The catch is that dependence analysis
> still considers "undef" registers, so when this transformation is
> enabled, it also forces dependence analysis to ignore them.  This is
> done using debug flags so that other targets are unaffected.
> 
> Since after this transformation, a former anti-dependence on a
> single register (register pair) now becomes an anti-dependence on
> two 32-bit registers, the existing anti-dependence breaking
> algorithm will no longer work in such cases.  The problem is that
> both sub-registers would need to be renamed together so that they
> remain in a "pair" relationship, e.g. R1:0 could become R5:4, but
> not just any two 32-bit registers.  To address this problem,
> there is an "anti-dependence" part in the HRC pass.
> 
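
The pair constraint on renaming can be illustrated with a small search.
This is only a sketch under assumptions: "Busy" is a hypothetical summary
of the 32-bit registers that are live or reserved over the range being
renamed, and findFreePair is an invented name.  The pass in the patch
works on live and dead ranges and is considerably more involved.

```cpp
#include <bitset>

// Returns the index n of a pair Dn whose halves R(2n) and R(2n+1) are
// both free, or -1 if no such pair exists.
int findFreePair(const std::bitset<32> &Busy) {
  for (int D = 0; D < 16; ++D)
    if (!Busy[2 * D] && !Busy[2 * D + 1])
      return D;
  return -1;
}
```

Note that two free registers are not enough: with R0, R3 and R4 busy,
R1 and R2 are both free but do not form a pair, so the first candidate
is D3 (R7:6).
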
> The whole transformation is divided into 3 stages:
> 1. "Finalize RA", where corrective actions are taken to address some
> undesirable outputs from the rewriter (see below).
> 2. "Anti-dep HRC", where the bulk of the work happens, i.e. putting
> the "undef" flag, and rewriting anti-dependencies on register pairs.
> 3. "Finalize", where the hijacking of "undef" ends, and the explicit
> register pairs become "legitimate def/use" again.
> 
> 
> The issue with the rewriter mentioned above is that it will spill an
> entire 64-bit register, even when only a part of it was explicitly
> defined.  Normally, the whole 64-bit register would be "implicitly
> defined", as per the usual rewriter treatment, but since we are
> trying to track the sub-registers, we may end up with a store of
> R1:0, where only R0 was actually defined.  To address this, we
> simply add a definition of R1 to "complete" the definition of R1:0,
> so that it can be spilled as a whole.  This injects a bit of
> inefficiency, since we actually add an extra instruction, but overall
> this is still profitable for us.
> 
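
A minimal sketch of the partial-spill check, assuming a hypothetical
bitset "Defined" that records which 32-bit registers have a definition
reaching the spill.  The function name is invented for illustration and
does not correspond to a function in the patch.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// For a spill of pair Dn, list the 32-bit halves that have no definition
// yet and would need a "completing" def inserted before the store.
std::vector<unsigned> missingHalves(unsigned D,
                                    const std::bitset<32> &Defined) {
  assert(D < 16 && "Hexagon has pairs D0..D15");
  std::vector<unsigned> Missing;
  for (unsigned R = 2 * D; R <= 2 * D + 1; ++R)
    if (!Defined[R])
      Missing.push_back(R);
  return Missing;
}
```

For a spill of R1:0 where only R0 was defined, the result is just R1,
which is the register that gets the extra definition.
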
> This pass is written to be transparent to any other targets.  The
> only globally-visible change would be printing of the "undef" flag
> on MachineInstr operands.  The ignoring of the "undef" registers in
> dependence analysis should only happen on Hexagon, and only when HRC
> is enabled.
> 
> 
> Please let me know if you have any comments.
> 
> Thanks,
> -Krzysztof
> 
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation

> From aacdbe517e00254602d985d55069a7c64a04bc2b Mon Sep 17 00:00:00 2001
> From: Krzysztof Parzyszek <kparzysz at codeaurora.org>
> Date: Mon, 6 May 2013 18:05:49 -0500
> Subject: [PATCH] Hexagon Register Cleanup
> 
> ---
>  lib/CodeGen/MachineInstr.cpp                  |   11 +-
>  lib/CodeGen/ScheduleDAGInstrs.cpp             |   18 +-
>  lib/Target/Hexagon/Hexagon.h                  |    7 +
>  lib/Target/Hexagon/HexagonRegisterCleanup.cpp | 2133 +++++++++++++++++++++++++
>  lib/Target/Hexagon/HexagonTargetMachine.cpp   |   24 +-
>  test/CodeGen/Hexagon/hrc-basic.ll             |   27 +
>  6 files changed, 2207 insertions(+), 13 deletions(-)
>  create mode 100644 lib/Target/Hexagon/HexagonRegisterCleanup.cpp
>  create mode 100644 test/CodeGen/Hexagon/hrc-basic.ll
> 
> diff --git a/lib/CodeGen/MachineInstr.cpp b/lib/CodeGen/MachineInstr.cpp
> index 32d0668..ffafd44 100644
> --- a/lib/CodeGen/MachineInstr.cpp
> +++ b/lib/CodeGen/MachineInstr.cpp
> @@ -278,10 +278,13 @@ void MachineOperand::print(raw_ostream &OS, const TargetMachine *TM) const {
>            OS << "imp-";
>          OS << "def";
>          NeedComma = true;
> -        // <def,read-undef> only makes sense when getSubReg() is set.
> -        // Don't clutter the output otherwise.
> -        if (isUndef() && getSubReg())
> -          OS << ",read-undef";
> +        if (isUndef()) {
> +          // <def,read-undef> only makes sense when getSubReg() is set.
> +          if (getSubReg())
> +            OS << ",read-undef";
> +          else
> +            OS << ",undef";
> +        }
>        } else if (isImplicit()) {
>            OS << "imp-use";
>            NeedComma = true;
> diff --git a/lib/CodeGen/ScheduleDAGInstrs.cpp b/lib/CodeGen/ScheduleDAGInstrs.cpp
> index e4da6a4..1999ae3 100644
> --- a/lib/CodeGen/ScheduleDAGInstrs.cpp
> +++ b/lib/CodeGen/ScheduleDAGInstrs.cpp
> @@ -42,6 +42,10 @@ static cl::opt<bool> EnableAASchedMI("enable-aa-sched-mi", cl::Hidden,
>      cl::ZeroOrMore, cl::init(false),
>      cl::desc("Enable use of AA during MI GAD construction"));
>  
> +cl::opt<bool> IgnoreUndef("dep-ignore-undef", cl::Hidden, cl::ZeroOrMore,
> +    cl::init(false), cl::desc("Ignore undef uses and defs in dependence "
> +                              "analysis"));
> +
>  ScheduleDAGInstrs::ScheduleDAGInstrs(MachineFunction &mf,
>                                       const MachineLoopInfo &mli,
>                                       const MachineDominatorTree &mdt,
> @@ -213,9 +217,10 @@ void ScheduleDAGInstrs::addSchedBarrierDeps() {
>        unsigned Reg = MO.getReg();
>        if (Reg == 0) continue;
>  
> -      if (TRI->isPhysicalRegister(Reg))
> -        Uses.insert(PhysRegSUOper(&ExitSU, -1, Reg));
> -      else {
> +      if (TRI->isPhysicalRegister(Reg)) {
> +        if (!IgnoreUndef || !MO.isUndef())
> +          Uses.insert(PhysRegSUOper(&ExitSU, -1, Reg));
> +      } else {
>          assert(!IsPostRA && "Virtual register encountered after regalloc.");
>          if (MO.readsReg()) // ignore undef operands
>            addVRegUseDeps(&ExitSU, i);
> @@ -764,9 +769,10 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
>        unsigned Reg = MO.getReg();
>        if (Reg == 0) continue;
>  
> -      if (TRI->isPhysicalRegister(Reg))
> -        addPhysRegDeps(SU, j);
> -      else {
> +      if (TRI->isPhysicalRegister(Reg)) {
> +        if (!IgnoreUndef || !MO.isUndef())
> +          addPhysRegDeps(SU, j);
> +      } else {
>          assert(!IsPostRA && "Virtual register encountered!");
>          if (MO.isDef()) {
>            HasVRegDef = true;
> diff --git a/lib/Target/Hexagon/Hexagon.h b/lib/Target/Hexagon/Hexagon.h
> index 8e19c61..5afd612 100644
> --- a/lib/Target/Hexagon/Hexagon.h
> +++ b/lib/Target/Hexagon/Hexagon.h
> @@ -47,6 +47,13 @@ namespace llvm {
>    FunctionPass *createHexagonPacketizer();
>    FunctionPass *createHexagonNewValueJump();
>  
> +  FunctionPass *createHexagonRegisterCleanup_PostRewrite(
> +                            const HexagonTargetMachine &TM);
> +  FunctionPass *createHexagonRegisterCleanup_PreSchedule(
> +                            const HexagonTargetMachine &TM);
> +  FunctionPass *createHexagonRegisterCleanup_Finalize(
> +                            const HexagonTargetMachine &TM);
> +
>  /* TODO: object output.
>    MCCodeEmitter *createHexagonMCCodeEmitter(const Target &,
>                                              const TargetMachine &TM,
> diff --git a/lib/Target/Hexagon/HexagonRegisterCleanup.cpp b/lib/Target/Hexagon/HexagonRegisterCleanup.cpp
> new file mode 100644
> index 0000000..836715f
> --- /dev/null
> +++ b/lib/Target/Hexagon/HexagonRegisterCleanup.cpp
> @@ -0,0 +1,2133 @@
> +//===-- HexagonRegisterCleanup.cpp - Postprocess register flags ----------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// Eliminate pessimistic implicit-use and implicit-def flags for
> +// super-registers associated with uses and defs of sub-registers.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#define DEBUG_TYPE "hrc"
> +
> +#include "llvm/ADT/BitVector.h"
> +#include "llvm/CodeGen/MachineFunction.h"
> +#include "llvm/CodeGen/MachineFunctionPass.h"
> +#include "llvm/CodeGen/MachineInstrBuilder.h"
> +#include "llvm/CodeGen/MachineRegisterInfo.h"
> +#include "llvm/CodeGen/Passes.h"
> +#include "llvm/IR/Function.h"
> +#include "llvm/IR/Module.h"
> +#include "llvm/PassSupport.h"
> +#include "llvm/Support/CommandLine.h"
> +#include "llvm/Support/Debug.h"
> +#include "llvm/Support/raw_ostream.h"
> +#include "llvm/Target/TargetInstrInfo.h"
> +#include "llvm/Target/TargetMachine.h"
> +#include "llvm/Target/TargetRegisterInfo.h"
> +
> +#include <algorithm>
> +#include <set>
> +#include "Hexagon.h"
> +#include "HexagonTargetMachine.h"
> +
> +#include <stdio.h>
> +
> +using namespace llvm;
> +
> +extern cl::opt<bool> IgnoreUndef;
> +
> +cl::opt<bool> RunHRC("run-hrc", cl::Hidden, cl::ZeroOrMore, cl::init(false),
> +      cl::desc("Specify whether to run the HRC pass or not.  Default: false"));
> +
> +static cl::opt<unsigned> RenameLimit("maxrr", cl::Hidden, cl::init(UINT_MAX),
> +      cl::desc("The limit of attempted register renames.  UINT_MAX means "
> +               "no limit."));
> +
> +// Registers in the code below are considered to be either "elementary"
> +// registers (i.e. without subregisters), or "sequences", i.e. registers
> +// composed of a sequence of non-overlapping subregisters.
> +// For Hexagon, sequences would include D0, D1, etc. (and really mean
> +// "register pairs"), while non-sequences would be R0, R1, etc.
> +
> +namespace llvm {
> +  void initializeHexagonRegisterCleanup_PostRewritePass(PassRegistry &Registry);
> +  void initializeHexagonRegisterCleanup_PreSchedulePass(PassRegistry &Registry);
> +  void initializeHexagonRegisterCleanup_FinalizePass(PassRegistry &Registry);
> +}
> +
> +namespace {
> +  template <typename T> class IndexTypeTmpl;
> +  template <typename T>
> +    raw_ostream &operator<< (raw_ostream &os, const IndexTypeTmpl<T> &Idx);
> +
> +  // This is to represent an "index", which is an abstraction of a position
> +  // of an instruction within a basic block.
> +  template<typename T> class IndexTypeTmpl {
> +  public:
> +    static const T None  = 0;
> +    static const T Entry = 1;
> +    static const T Exit  = 2;
> +    static const T First = 11;  // 10 + 1st
> +    static bool isInstr(const IndexTypeTmpl<T> &X) { return X.Index >= First; }
> +
> +    typedef T BaseType;
> +    IndexTypeTmpl() : Index(None) {}
> +    IndexTypeTmpl(T Idx) : Index(Idx) {}
> +    operator T() const {
> +      assert(Index >= First);
> +      return Index;
> +    }
> +    bool operator== (const T &x) const {
> +      return Index == x;
> +    }
> +    bool operator== (const IndexTypeTmpl<T> &Idx) const {
> +      return Index == Idx.Index;
> +    }
> +    bool operator!= (const T &x) const {
> +      return Index != x;
> +    }
> +    bool operator!= (const IndexTypeTmpl<T> &Idx) const {
> +      return Index != Idx.Index;
> +    }
> +    IndexTypeTmpl<T> operator++ () {
> +      assert(Index != None);
> +      assert(Index != Exit);
> +      if (Index == Entry)
> +        Index = First;
> +      else
> +        ++Index;
> +      return *this;
> +    }
> +    bool operator< (const T &Idx) const {
> +      return operator< (IndexTypeTmpl<T>(Idx));
> +    }
> +    bool operator< (const IndexTypeTmpl<T> &Idx) const {
> +      // !(x < x).
> +      if (Index == Idx.Index)
> +        return false;
> +      // !(None < x) for all x.
> +      // !(x < None) for all x.
> +      if (Index == None || Idx.Index == None)
> +        return false;
> +      // !(Exit < x) for all x.
> +      // !(x < Entry) for all x.
> +      if (Index == Exit || Idx.Index == Entry)
> +        return false;
> +      // Entry < x for all x != Entry.
> +      // x < Exit for all x != Exit.
> +      if (Index == Entry || Idx.Index == Exit)
> +        return true;
> +
> +      return Index < Idx.Index;
> +    }
> +    bool operator<= (const IndexTypeTmpl<T> &Idx) const {
> +      return operator==(Idx) || operator<(Idx);
> +    }
> +    friend raw_ostream &operator<<<T> (raw_ostream &os,
> +                                       const IndexTypeTmpl<T> &Idx);
> +
> +  private:
> +    bool operator>  (const IndexTypeTmpl<T> &Idx) const;
> +    bool operator>= (const IndexTypeTmpl<T> &Idx) const;
> +
> +    T Index;
> +  };
> +
> +  // Distance between two instruction indices.
> +  template <typename T>
> +  T dist(const IndexTypeTmpl<T> &A, const IndexTypeTmpl<T> &B,
> +         const IndexTypeTmpl<T> &Last) {
> +    typedef IndexTypeTmpl<T> U;
> +    assert(A != U::None && B != U::None && Last != U::None);
> +    // Compute B-A.
> +    if (B == U::Entry || A == U::Exit)
> +      return 0;
> +    T y = (B == U::Exit)  ? T(Last)       : T(B);
> +    T x = (A == U::Entry) ? T(U::Entry)-1 : T(A);
> +    return y-x;
> +  }
> +
> +
> +  template class IndexTypeTmpl<unsigned>;
> +  template<> const unsigned IndexTypeTmpl<unsigned>::None;
> +  template<> const unsigned IndexTypeTmpl<unsigned>::Entry;
> +  template<> const unsigned IndexTypeTmpl<unsigned>::Exit;
> +  template<> const unsigned IndexTypeTmpl<unsigned>::First;
> +
> +  typedef IndexTypeTmpl<unsigned> IndexType;
> +
> +
> +  // A range of indices, essentially a representation of a live range.
> +  // This is also used to represent "dead ranges", i.e. ranges where a
> +  // register is dead.
> +  class IndexRange : public std::pair<IndexType,IndexType> {
> +  public:
> +    IndexRange() : Sequenced(false), Fixed(false), TiedEnd(false) {}
> +    IndexRange(IndexType Start, IndexType End, bool S = false,
> +        bool F = false, bool T = false)
> +      : std::pair<IndexType,IndexType>(Start, End),
> +        Sequenced(S), Fixed(F), TiedEnd(T) {}
> +    IndexType start() const { return first; }
> +    IndexType end() const   { return second; }
> +
> +    bool operator< (const IndexRange &A) const {
> +      return start() < A.start();
> +    }
> +    bool overlaps(const IndexRange &A) const {
> +      // If A contains start(), or "this" contains A.start(), then overlap.
> +      IndexType S = start(), E = end(), AS = A.start(), AE = A.end();
> +      if (AS == S)
> +        return true;
> +      bool SbAE = (S < AE) || (S == AE && A.TiedEnd);  // S-before-AE.
> +      bool ASbE = (AS < E) || (AS == E && TiedEnd);    // AS-before-E.
> +      if ((AS < S && SbAE) || (S < AS && ASbE))
> +        return true;
> +      // Otherwise no overlap.
> +      return false;
> +    }
> +    bool contains(const IndexRange &A) const {
> +      if (start() <= A.start()) {
> +        // Treat "None" in the range end as equal to the range start.
> +        IndexType E = (end() != IndexType::None) ? end() : start();
> +        IndexType AE = (A.end() != IndexType::None) ? A.end() : A.start();
> +        if (AE <= E)
> +          return true;
> +      }
> +      return false;
> +    }
> +    void merge(const IndexRange &A) {
> +      // Allow merging adjacent ranges.
> +      assert(end() == A.start() || overlaps(A));
> +      IndexType AS = A.start(), AE = A.end();
> +      if (AS < start() || start() == IndexType::None)
> +        setStart(AS);
> +      if (end() < AE || end() == IndexType::None) {
> +        setEnd(AE);
> +        TiedEnd = A.TiedEnd;
> +      } else {
> +        if (end() == AE)
> +          TiedEnd |= A.TiedEnd;
> +      }
> +      if (A.Sequenced)
> +        Sequenced = true;
> +      if (A.Fixed)
> +        Fixed = true;
> +    }
> +
> +    bool Sequenced;  // Corresponds to a range of a sequence register
> +                     // (e.g. D0, as opposed to R1).
> +    bool Fixed;      // Can be renamed?  "Fixed" means "no".
> +    bool TiedEnd;    // The end is not a use, but a dead def tied to a use.
> +
> +  private:
> +    void setStart(const IndexType &S) { first = S; }
> +    void setEnd(const IndexType &E)   { second = E; }
> +  };
> +
> +
> +  // A list of index ranges.  This represents liveness of a register
> +  // in a basic block.
> +  class RangeList : public std::vector<IndexRange> {
> +  public:
> +    void add(IndexType Start, IndexType End, bool Sequenced, bool Fixed,
> +          bool TiedEnd = false) {
> +      push_back(IndexRange(Start, End, Sequenced, Fixed, TiedEnd));
> +    }
> +    void add(const IndexRange &Range) {
> +      push_back(Range);
> +    }
> +    void include(const RangeList &RL);
> +    void unionize(bool MergeAdjacent = false);
> +    void subtract(const IndexRange &Range);
> +
> +  private:
> +    void addsub(const IndexRange &A, const IndexRange &B);
> +  };
> +
> +  typedef std::map<unsigned,RangeList> RegToRangeMap;
> +
> +
> +  // A mapping to translate between instructions and their indices.
> +  class InstrIndexMap {
> +  public:
> +    InstrIndexMap(MachineBasicBlock &B);
> +    MachineInstr *getInstr(IndexType Idx) const;
> +    IndexType getIndex(MachineInstr *MI) const;
> +    IndexType getPrevIndex(IndexType Idx) const;
> +    IndexType getNextIndex(IndexType Idx) const;
> +    friend raw_ostream &operator<< (raw_ostream &os, const InstrIndexMap &Map);
> +
> +    IndexType First, Last;
> +
> +  private:
> +    MachineBasicBlock &Block;
> +    typedef std::map<IndexType,MachineInstr*> MapType;
> +    MapType Map;
> +  };
> +
> +  static const char* PassNames[] = {
> +    "Hexagon Post-rewrite HRC",
> +    "Hexagon Anti-dep HRC",
> +    "Hexagon Final HRC"
> +  };
> +
> +  class HexagonRegisterCleanup : public MachineFunctionPass {
> +  public:
> +    enum ExecStage { PostRewrite, PreSchedule, Finalize };
> +
> +    HexagonRegisterCleanup(const HexagonTargetMachine &tm, ExecStage stage,
> +                           char &ID) :
> +        MachineFunctionPass(ID), TM(tm), TII(*tm.getInstrInfo()),
> +        TRI(*tm.getRegisterInfo()), Stage(stage), NumRegs(TRI.getNumRegs()),
> +        MRI(0)
> +    {}
> +
> +    bool runOnMachineFunction(MachineFunction &MF);
> +
> +    const char *getPassName() const {
> +      switch (Stage) {
> +        case PostRewrite:
> +          return PassNames[0];
> +        case PreSchedule:
> +          return PassNames[1];
> +        case Finalize:
> +          return PassNames[2];
> +      }
> +      return "Error";
> +    }
> +
> +  private:
> +    // OptDepDist is an "optimal dependence distance", i.e. a distance
> +    // between dependent instructions which is large enough not to be
> +    // of concern.  The number 4 happens to be a max packet size, but
> +    // there is no in-depth research validating this choice as superior
> +    // to others.
> +    static const unsigned OptDepDist = 4;
> +    static unsigned Debug_RRC;
> +
> +    const HexagonTargetMachine  &TM;
> +    const HexagonInstrInfo      &TII;
> +    const HexagonRegisterInfo   &TRI;
> +
> +    ExecStage Stage;     // Execution stage of HRC.
> +    unsigned NumRegs;    // Number of physical registers.
> +    BitVector Reserved;  // Reserved registers.
> +    BitVector Return;    // Value return registers for a given function.
> +    MachineRegisterInfo *MRI;
> +
> +    const TargetRegisterClass *getPhysRegClass(unsigned Reg);
> +    bool isReturn(const MachineInstr *MI);
> +    bool isRegSeq(unsigned Reg);
> +    void getSubRegs(unsigned Reg, BitVector &SRs);
> +    void subRegClosure(BitVector &Regs);
> +    void getHwImplicits(const MCInstrDesc &D, BitVector &Uses,
> +                        BitVector &Defs, bool Reset = true);
> +    bool expandLiveIns(MachineBasicBlock &B);
> +    bool addLiveInSeq(MachineBasicBlock &B);
> +    bool removeExtraImplicitRefs(MachineInstr *MI);
> +    bool addImpRefs(MachineInstr *MI);
> +    bool correctMissingReturnValues(MachineBasicBlock &B);
> +
> +    void computeLiveRangesNonSeq(MachineBasicBlock &B, InstrIndexMap &IndexMap,
> +                                 RegToRangeMap &LiveMap);
> +    void completeLiveRanges(InstrIndexMap &IndexMap, RegToRangeMap &LiveMap);
> +    bool markDefsDead(unsigned Reg, MachineInstr *MI);
> +    bool markUseKill(unsigned Reg, MachineInstr *MI);
> +    bool markDeadKill(MachineBasicBlock &B, InstrIndexMap &IndexMap,
> +                      RegToRangeMap &LiveMap);
> +
> +    void computeDeadRanges(RegToRangeMap &DeadMap, RegToRangeMap &LiveMap,
> +                           InstrIndexMap &IndexMap);
> +    bool findSuperRange(const RangeList &RL, const IndexRange &Sub,
> +                        unsigned MinDist, IndexType MaxIndex, IndexRange &Sup);
> +    void moveRange(unsigned Reg, const IndexRange &Range, RegToRangeMap &From,
> +                   RegToRangeMap &To, bool MergeAdjacent);
> +    void makeRangeDead(unsigned Reg, const IndexRange &Range,
> +                       RegToRangeMap &LiveMap, RegToRangeMap &DeadMap);
> +    void makeRangeAlive(unsigned Reg, const IndexRange &Range,
> +                        RegToRangeMap &DeadMap, RegToRangeMap &LiveMap);
> +    void renameRegInRange(unsigned OldReg, unsigned NewReg,
> +                          const IndexRange &Range, InstrIndexMap &IndexMap);
> +    bool renameRange(unsigned Reg, IndexRange &Range, MachineBasicBlock &B,
> +                     unsigned MinDist, InstrIndexMap &IndexMap,
> +                     RegToRangeMap &LiveMap, RegToRangeMap &DeadMap);
> +    bool breakAntiDepForReg(unsigned Reg, unsigned Dist, RangeList &RL,
> +                     MachineBasicBlock &B, InstrIndexMap &IndexMap,
> +                     RegToRangeMap &LiveMap, RegToRangeMap &DeadMap);
> +    bool breakAntiDep(MachineBasicBlock &B, InstrIndexMap &IndexMap,
> +                      RegToRangeMap &LiveMap);
> +
> +    bool processImplicitsEarly(MachineBasicBlock &B);
> +    bool processBlockLate(MachineBasicBlock &B);
> +
> +    bool isSpillStore(MachineInstr *MI);
> +    unsigned getSpilledRegister(MachineInstr *MI);
> +    bool correctPartialSpills(MachineBasicBlock &B);
> +    bool correctUndefinedSubregisterReads(MachineBasicBlock &B);
> +
> +    bool cleanupRewrite(MachineFunction &MF);
> +    bool processDependencies(MachineFunction &MF);
> +    bool finalCleanup(MachineFunction &MF);
> +  };
> +
> +  // We need extra subclasses, so that we can register HRC three times, as
> +  // three different passes.  We can't use 3 different IDs in the same pass
> +  // because we would have to call the MachineFunctionPass's constructor
> +  // three times, each time with a different ID.  While the HRC's constructor
> +  // would be called three times as well, the constructor would have to
> +  // use a different ID to pass to the base class constructor (which we
> +  // cannot do).
> +  class HexagonRegisterCleanup_PostRewrite : public HexagonRegisterCleanup {
> +  public:
> +    static char ID;
> +    HexagonRegisterCleanup_PostRewrite(const HexagonTargetMachine &TM)
> +      : HexagonRegisterCleanup(TM, HexagonRegisterCleanup::PostRewrite, ID) {
> +      PassRegistry &Registry = *PassRegistry::getPassRegistry();
> +      initializeHexagonRegisterCleanup_PostRewritePass(Registry);
> +    }
> +  };
> +
> +  class HexagonRegisterCleanup_PreSchedule : public HexagonRegisterCleanup {
> +  public:
> +    static char ID;
> +    HexagonRegisterCleanup_PreSchedule(const HexagonTargetMachine &TM)
> +      : HexagonRegisterCleanup(TM, HexagonRegisterCleanup::PreSchedule, ID) {
> +      PassRegistry &Registry = *PassRegistry::getPassRegistry();
> +      initializeHexagonRegisterCleanup_PreSchedulePass(Registry);
> +    }
> +  };
> +
> +  class HexagonRegisterCleanup_Finalize : public HexagonRegisterCleanup {
> +  public:
> +    static char ID;
> +    HexagonRegisterCleanup_Finalize(const HexagonTargetMachine &TM)
> +      : HexagonRegisterCleanup(TM, HexagonRegisterCleanup::Finalize, ID) {
> +      PassRegistry &Registry = *PassRegistry::getPassRegistry();
> +      initializeHexagonRegisterCleanup_FinalizePass(Registry);
> +    }
> +  };
> +
> +  char HexagonRegisterCleanup_PostRewrite::ID = 0;
> +  char HexagonRegisterCleanup_PreSchedule::ID = 0;
> +  char HexagonRegisterCleanup_Finalize::ID = 0;
> +  unsigned HexagonRegisterCleanup::Debug_RRC = 0;
> +
> +
> +  template <typename T>
> +  raw_ostream &operator<< (raw_ostream &os, const IndexTypeTmpl<T> &Idx) {
> +    switch (Idx.Index) {
> +      case IndexType::None:
> +        return os << '-';
> +      case IndexType::Entry:
> +        return os << 'n';
> +      case IndexType::Exit:
> +        return os << 'x';
> +      default:
> +        return os << Idx.Index-IndexType::First+1;
> +    }
> +  }
> +
> +
> +  raw_ostream &operator<< (raw_ostream &os, const IndexRange &IR) {
> +    os << '[' << IR.start() << ':' << IR.end() << (IR.TiedEnd ? '}' : ']');
> +    if (IR.Sequenced)
> +      os << 's';
> +    if (IR.Fixed)
> +      os << '!';
> +    return os;
> +  }
> +
> +
> +  raw_ostream &operator<< (raw_ostream &os, const RangeList &RL) {
> +    for (RangeList::const_iterator I = RL.begin(), E = RL.end(); I != E; ++I)
> +      os << *I << " ";
> +    return os;
> +  }
> +
> +
> +  raw_ostream &operator<< (raw_ostream &os, const InstrIndexMap &M) {
> +    typedef MachineBasicBlock::instr_iterator instr_iterator;
> +    for (instr_iterator I = M.Block.instr_begin(), E = M.Block.instr_end();
> +         I != E; ++I) {
> +      MachineInstr *MI = &*I;
> +      IndexType Idx = M.getIndex(MI);
> +      os << Idx << (Idx == M.Last ? ". " : "  ") << *MI;
> +    }
> +    return os;
> +  }
> +
> +
> +  void dump_map(raw_ostream &os, const RegToRangeMap &M,
> +        const HexagonRegisterInfo &TRI) {
> +    for (RegToRangeMap::const_iterator I = M.begin(), E = M.end();
> +         I != E; ++I) {
> +      const RangeList &RL = I->second;
> +      if (RL.empty())
> +        continue;
> +      os << PrintReg(I->first, &TRI) << " -> " << RL << "\n";
> +    }
> +  }
> +
> +}
> +
> +
> +typedef HexagonRegisterCleanup HRC;
> +
> +
> +// Return true if A is a subset of B.
> +static inline bool subset(const BitVector &A, const BitVector &B) {
> +  int B_size = B.size();
> +  for (int x = A.find_first(); x >= 0; x = A.find_next(x))
> +    if (x >= B_size || !B[x])
> +      return false;
> +  return true;
> +}
> +
> +
> +static void getLiveIns(MachineBasicBlock &B, BitVector &Regs) {
> +  typedef MachineBasicBlock::livein_iterator livein_iterator;
> +  for (livein_iterator I = B.livein_begin(), E = B.livein_end(); I != E; ++I)
> +    Regs[*I] = true;
> +}
> +
> +
> +static void dump_regs(const BitVector &RS, const TargetRegisterInfo &TRI) {
> +  bool First = true;
> +  for (int x = RS.find_first(); x >= 0; x = RS.find_next(x)) {
> +    unsigned R = x;
> +    if (!First)
> +      dbgs() << ", ";
> +    dbgs() << PrintReg(R, &TRI);
> +    First = false;
> +  }
> +}
> +
> +
> +void RangeList::include(const RangeList &RL) {
> +  for (RangeList::const_iterator I = RL.begin(), E = RL.end(); I != E; ++I) {
> +    const IndexRange &R = *I;
> +    push_back(R);
> +  }
> +}
> +
> +
> +// Merge all overlapping ranges in the list, so that all that remains
> +// is a list of disjoint ranges.
> +void RangeList::unionize(bool MergeAdjacent) {
> +  if (empty())
> +    return;
> +
> +  std::sort(begin(), end());
> +  iterator Iter = begin();
> +
> +  while (Iter != end()-1) {
> +    iterator Next = Iter+1;
> +    // If MergeAdjacent is true, merge ranges A and B, where A.end == B.start.
> +    // This allows merging dead ranges, but is not valid for live ranges.
> +    bool Merge = MergeAdjacent && (Iter->end() == Next->start());
> +    if (Merge || Iter->overlaps(*Next)) {
> +      Iter->merge(*Next);
> +      erase(Next);
> +      continue;
> +    }
> +    ++Iter;
> +  }
> +}
> +
> +
> +// Compute a range A-B and add it to the list.
> +void RangeList::addsub(const IndexRange &A, const IndexRange &B) {
> +  // Exclusion of non-overlapping ranges makes some checks simpler
> +  // later in this function.
> +  if (!A.overlaps(B)) {
> +    // A - B = A.
> +    add(A);
> +    return;
> +  }
> +
> +  IndexType AS = A.start(), AE = A.end();
> +  IndexType BS = B.start(), BE = B.end();
> +
> +  // If AE is None, then A is included in B, since A and B overlap.
> +  // The result of subtraction is empty, so just return.
> +  if (AE == IndexType::None)
> +    return;
> +
> +  if (AS < BS) {
> +    // A starts before B.
> +    // AE cannot be None since A and B overlap.
> +    assert (AE != IndexType::None);
> +    // Add the part of A that extends on the "less" side of B.
> +    add(AS, BS, A.Sequenced, A.Fixed);
> +  }
> +
> +  if (BE < AE) {
> +    // BE cannot be Exit here.
> +    if (BE == IndexType::None)
> +      add(BS, AE, A.Sequenced, A.Fixed);
> +    else
> +      add(BE, AE, A.Sequenced, A.Fixed);
> +  }
> +}
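
The subtraction above can be sketched on plain integer ranges as follows. This is a simplification under stated assumptions: `Range` is a hypothetical half-open interval [first, second), and the None/Entry/Exit special cases and the Sequenced/Fixed flags from the patch are left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical half-open interval [first, second), with first < second.
typedef std::pair<int, int> Range;

// Append to out the pieces of a that are not covered by b.
void addSub(const Range &a, const Range &b, std::vector<Range> &out) {
  assert(a.first < a.second && b.first < b.second);
  // Disjoint ranges: a - b = a.
  if (a.second <= b.first || b.second <= a.first) {
    out.push_back(a);
    return;
  }
  // The part of a on the "less" side of b.
  if (a.first < b.first)
    out.push_back(Range(a.first, b.first));
  // The part of a on the "greater" side of b.
  if (b.second < a.second)
    out.push_back(Range(b.second, a.second));
}
```

When b covers a completely, neither branch fires and nothing is appended, which matches the "result is empty" case in the patch.
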
> +
> +
> +// Subtract a given range from each element in the list.
> +void RangeList::subtract(const IndexRange &Range) {
> +  // Cannot assume that the list is unionized (i.e. contains only non-
> +  // overlapping ranges).
> +  RangeList T;
> +  for (iterator Next, I = begin(); I != end(); I = Next) {
> +    IndexRange &Rg = *I;
> +    if (Rg.overlaps(Range)) {
> +      T.addsub(Rg, Range);
> +      Next = this->erase(I);
> +    } else {
> +      Next = I+1;
> +    }
> +  }
> +  include(T);
> +}
> +
> +
> +InstrIndexMap::InstrIndexMap(MachineBasicBlock &B) : Block(B) {
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +  IndexType Idx = IndexType::First;
> +  First = Idx;
> +  for (instr_iterator I = B.instr_begin(), E = B.instr_end(); I != E; ++I) {
> +    MachineInstr *MI = &*I;
> +    assert (getIndex(MI) == IndexType::None && "Instruction already in map");
> +    Map.insert(std::make_pair(Idx, MI));
> +    ++Idx;
> +  }
> +  Last = B.empty() ? IndexType::None
> +                   : IndexType::BaseType(Idx)-1;
> +}
> +
> +
> +MachineInstr *InstrIndexMap::getInstr(IndexType Idx) const {
> +  MapType::const_iterator F = Map.find(Idx);
> +  return (F != Map.end()) ? F->second : 0;
> +}
> +
> +
> +IndexType InstrIndexMap::getIndex(MachineInstr *MI) const {
> +  for (MapType::const_iterator I = Map.begin(), E = Map.end(); I != E; ++I)
> +    if (I->second == MI)
> +      return I->first;
> +  return IndexType::None;
> +}
> +
> +
> +IndexType InstrIndexMap::getPrevIndex(IndexType Idx) const {
> +  assert (Idx != IndexType::None);
> +  if (Idx == IndexType::Entry)
> +    return IndexType::None;
> +  if (Idx == IndexType::Exit)
> +    return Last;
> +  return IndexType::BaseType(Idx)-1;
> +}
> +
> +
> +IndexType InstrIndexMap::getNextIndex(IndexType Idx) const {
> +  assert (Idx != IndexType::None);
> +  if (Idx == IndexType::Entry)
> +    return IndexType::First;
> +  if (Idx == IndexType::Exit || Idx == Last)
> +    return IndexType::None;
> +  return IndexType::BaseType(Idx)+1;
> +}
> +
> +
> +const TargetRegisterClass *HRC::getPhysRegClass(unsigned Reg) {
> +  typedef TargetRegisterInfo::regclass_iterator regclass_iterator;
> +  for (regclass_iterator I = TRI.regclass_begin(), E = TRI.regclass_end();
> +       I != E; ++I) {
> +    const TargetRegisterClass *RC = *I;
> +    if (RC->contains(Reg))
> +      return RC;
> +  }
> +  return 0;
> +}
> +
> +
> +// Test if a register is a sequence.
> +inline bool HRC::isRegSeq(unsigned Reg) {
> +  const TargetRegisterClass *RC = getPhysRegClass(Reg);
> +  return RC == &Hexagon::DoubleRegsRegClass;
> +}
> +
> +
> +inline bool HRC::isReturn(const MachineInstr *MI) {
> +  if (!MI->isReturn())
> +    return false;
> +
> +  unsigned Opc = MI->getOpcode();
> +  switch (Opc) {
> +    // These are calls, not really returns.  We need to identify returns
> +    // to keep track of registers that will contain the return values.
> +    case Hexagon::TCRETURNR:
> +    case Hexagon::TCRETURNtext:
> +    case Hexagon::TCRETURNtg:
> +      return false;
> +  }
> +
> +  for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +    const MachineOperand &MO = MI->getOperand(i);
> +    if (!MO.isReg())
> +      continue;
> +    unsigned R = MO.getReg();
> +    if (R == Hexagon::PC || R == Hexagon::R31)
> +      return true;
> +  }
> +
> +  return false;
> +}
> +
> +
> +void HRC::getSubRegs(unsigned Reg, BitVector &SRs) {
> +  for (MCSubRegIterator I(Reg, &TRI); I.isValid(); ++I)
> +    SRs[*I] = true;
> +}
> +
> +
> +// For each R in Regs, add all subregisters of R to Regs.
> +void HRC::subRegClosure(BitVector &Regs) {
> +  BitVector Copy = Regs;
> +  for (int x = Copy.find_first(); x >= 0; x = Copy.find_next(x)) {
> +    unsigned R = x;
> +    if (!isRegSeq(R))
> +      continue;
> +    getSubRegs(R, Regs);
> +  }
> +}
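
A small sketch of the closure step, for illustration only. The register numbering here is made up and is not LLVM's: 0..31 stand for R0..R31, and 32..47 stand for D0..D15, with Dn covering R(2n) and R(2n+1) as described in the patch summary. `std::vector<bool>` is used in place of BitVector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical numbering, not LLVM's: 0..31 are R0..R31 and 32..47 are
// the pairs D0..D15, where Dn covers R(2n) and R(2n+1).
const unsigned NumSingles = 32, NumPairs = 16;
const unsigned NumRegs = NumSingles + NumPairs;

bool isPair(unsigned reg) { return reg >= NumSingles && reg < NumRegs; }

// For every pair register set in regs, also set its two halves.
void subRegClosure(std::vector<bool> &regs) {
  assert(regs.size() == NumRegs);
  const std::vector<bool> copy = regs;  // iterate over a snapshot
  for (unsigned r = 0; r < NumRegs; ++r) {
    if (!copy[r] || !isPair(r))
      continue;
    unsigned lo = 2 * (r - NumSingles);
    regs[lo] = true;
    regs[lo + 1] = true;
  }
}
```
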
> +
> +
> +// Get the sets of implicit uses and defs, determined by the hardware
> +// characteristics of the instruction.  For example, allocframe will
> +// define R29 (stack pointer) without having it as a parameter.
> +void HRC::getHwImplicits(const MCInstrDesc &D, BitVector &Uses,
> +      BitVector &Defs, bool Reset) {
> +  if (Reset) {
> +    Uses.reset();
> +    Defs.reset();
> +  }
> +
> +  if (const uint16_t *R = D.ImplicitUses)
> +    while (*R)
> +      Uses[*R++] = true;
> +  if (const uint16_t *R = D.ImplicitDefs)
> +    while (*R)
> +      Defs[*R++] = true;
> +
> +  subRegClosure(Uses);
> +  subRegClosure(Defs);
> +}
> +
> +
> +// Replace all sequence registers in the live-in sets with the sets of
> +// corresponding non-sequence registers.
> +bool HRC::expandLiveIns(MachineBasicBlock &B) {
> +  bool Changed = false;
> +  BitVector NewRegs(NumRegs), OldRegs(NumRegs);
> +
> +  typedef MachineBasicBlock::livein_iterator livein_iterator;
> +  for (livein_iterator I = B.livein_begin(), E = B.livein_end(); I != E; ++I) {
> +    unsigned R = *I;
> +    // Save existing registers to speed up the detection if we are actually
> +    // changing anything.
> +    OldRegs[R] = true;
> +    if (isRegSeq(R))
> +      getSubRegs(R, NewRegs);
> +  }
> +  for (int x = NewRegs.find_first(); x >= 0; x = NewRegs.find_next(x)) {
> +    if (OldRegs[x])
> +      continue;
> +    B.addLiveIn(x);
> +    OldRegs[x] = true;
> +    Changed = true;
> +  }
> +
> +  // Finally, for all register sequences (only including double registers
> +  // for Hexagon at the moment), remove the super-register from the live-ins.
> +  const TargetRegisterClass *DRC = &Hexagon::DoubleRegsRegClass;
> +  for (TargetRegisterClass::iterator I = DRC->begin(), E = DRC->end();
> +       I != E; ++I) {
> +    unsigned R = *I;
> +    if (OldRegs[R])   // Quicker test for B.isLiveIn(R).
> +      B.removeLiveIn(R);
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +// Add to the live-in set every sequence register whose sub-registers
> +// are all already live-in.
> +bool HRC::addLiveInSeq(MachineBasicBlock &B) {
> +  bool Changed = false;
> +  BitVector SubRegs(NumRegs), LiveIns(NumRegs);
> +
> +  typedef MachineBasicBlock::livein_iterator livein_iterator;
> +  for (livein_iterator I = B.livein_begin(), E = B.livein_end(); I != E; ++I) {
> +    unsigned R = *I;
> +    LiveIns[R] = true;
> +  }
> +
> +  const TargetRegisterClass *DRC = &Hexagon::DoubleRegsRegClass;
> +  for (TargetRegisterClass::iterator I = DRC->begin(), E = DRC->end();
> +       I != E; ++I) {
> +    unsigned R = *I;
> +    if (LiveIns[R])
> +      continue;
> +    SubRegs.reset();
> +    getSubRegs(R, SubRegs);
> +    if (subset(SubRegs, LiveIns)) {
> +      B.addLiveIn(R);
> +      Changed = true;
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +// Remove all imp-use/imp-def operands from the instruction that don't
> +// correspond to a hardware-implied implicit reference, or other form
> +// of implicit register reference that must be preserved.
> +bool HRC::removeExtraImplicitRefs(MachineInstr *MI) {
> +  // Don't touch calls for now.  There are no implicit uses for (at least
> +  // some) call instructions in the .td files, but they do use registers
> +  // in which parameters are passed.  Could check if the registers marked
> +  // as uses match the declaration of the function, but that would be too
> +  // much work.
> +//  if (MI->isCall() || MI->isInlineAsm() || MI->isKill() || isReturn(MI))
> +  if (MI->isCall() || MI->isInlineAsm() || isReturn(MI))
> +    return false;
> +
> +  BitVector HwDefs(NumRegs);
> +  BitVector HwUses(NumRegs);
> +
> +  const MCInstrDesc &D = MI->getDesc();
> +  getHwImplicits(D, HwUses, HwDefs, true);
> +
> +  typedef std::vector<unsigned> UIntVect;
> +  UIntVect BadOps;
> +
> +  for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +    MachineOperand &MO = MI->getOperand(i);
> +    if (!MO.isReg() || !MO.isImplicit())
> +      continue;
> +    unsigned R = MO.getReg();
> +    if (Reserved[R])
> +      continue;
> +
> +    if ((MO.isUse() && !HwUses[R]) || (MO.isDef() && !HwDefs[R]))
> +      BadOps.push_back(i);
> +  }
> +
> +  bool Changed = false;
> +
> +  // Remove "bad operands".  Sort the indices and remove going from the
> +  // largest to the smallest.
> +  std::sort(BadOps.begin(), BadOps.end());
> +  for (UIntVect::reverse_iterator I = BadOps.rbegin(), E = BadOps.rend();
> +       I != E; ++I) {
> +    unsigned X = *I;
> +    MI->RemoveOperand(X);
> +    Changed = true;
> +  }
> +
> +  return Changed;
> +}
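
The "remove from the largest index to the smallest" step is what keeps the recorded operand indices valid while operands are being removed. A generic sketch on a plain vector, assuming the indices are unique (`eraseAt` is a hypothetical helper, not something in the patch):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Erase the elements of items at the given positions.  Erasing in
// descending index order means each erase only shifts elements whose
// indices have already been handled.  The indices must be unique.
template <typename T>
void eraseAt(std::vector<T> &items, std::vector<unsigned> bad) {
  std::sort(bad.begin(), bad.end(), std::greater<unsigned>());
  for (unsigned idx : bad) {
    assert(idx < items.size());
    items.erase(items.begin() + idx);
  }
}
```
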
> +
> +
> +// For each register sequence referenced in the instruction, mark the reference
> +// as "undef", and add implicit corresponding references of the sub-registers.
> +bool HRC::addImpRefs(MachineInstr *MI) {
> +  if (MI->isKill())
> +    return false;
> +
> +  bool Changed = false;
> +  BitVector ImpDefs(NumRegs), ImpUses(NumRegs);
> +  BitVector NewDefs(NumRegs), NewUses(NumRegs);
> +
> +  for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +    MachineOperand &MO = MI->getOperand(i);
> +    // It's possible for an operand to already be "undef".  For example,
> +    // after dead argument elimination, there can be an "undef" use of
> +    // a register that passes the value of the dead argument.
> +    if (!MO.isReg())
> +      continue;
> +    unsigned R = MO.getReg();
> +    if (Reserved[R])
> +      continue;
> +    if (isRegSeq(R)) {
> +      if (MO.isUndef())
> +        continue;
> +      if (MO.isUse())
> +        getSubRegs(R, NewUses);
> +      else
> +        getSubRegs(R, NewDefs);
> +      MO.setIsUndef(true);
> +      Changed = true;
> +    } else {
> +      // Virtual register rewriter can set "undef" on subregister definitions.
> +      // This is the opposite of what we want, so clear the flag.
> +      if (MO.isDef() && MO.isUndef()) {
> +        Changed = true;
> +        MO.setIsUndef(false);
> +      }
> +      if (MO.isImplicit()) {
> +        if (MO.isUse())
> +          ImpUses[R] = true;
> +        else
> +          ImpDefs[R] = true;
> +      }
> +    }
> +  }
> +
> +  if (isReturn(MI))
> +    NewUses |= Return;
> +
> +  for (int x = NewDefs.find_first(); x >= 0; x = NewDefs.find_next(x)) {
> +    if (ImpDefs[x])
> +      continue;
> +    MachineOperand Op = MachineOperand::CreateReg(x, true/*def*/, true/*imp*/);
> +    MI->addOperand(Op);
> +    Changed = true;
> +  }
> +
> +  for (int x = NewUses.find_first(); x >= 0; x = NewUses.find_next(x)) {
> +    if (ImpUses[x])
> +      continue;
> +    MachineOperand Op = MachineOperand::CreateReg(x, false, true);
> +    MI->addOperand(Op);
> +    Changed = true;
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +// Correct a situation, where a return instruction has a use of a register
> +// that hasn't been defined.  Simply remove the (implicit) reference from
> +// the instruction.
> +bool HRC::correctMissingReturnValues(MachineBasicBlock &B) {
> +  BitVector Defined(NumRegs);
> +
> +  bool Changed = false;
> +  getLiveIns(B, Defined);
> +
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +  for (instr_iterator I = B.instr_begin(), E = B.instr_end(); I != E; ++I) {
> +    MachineInstr *MI = &*I;
> +    // Allow having multiple returns in a block, the same way as there can
> +    // be multiple branches (i.e. BrCond+Br).
> +    if (isReturn(MI)) {
> +      for (unsigned i = MI->getNumOperands(); i > 0; --i) {
> +        MachineOperand &MO = MI->getOperand(i-1);
> +        if (!MO.isReg() || !MO.isUse() || MO.isUndef())
> +          continue;
> +        unsigned R = MO.getReg();
> +        if (Reserved[R])
> +          continue;
> +
> +        assert(Return[R]);    // This better be use of a "return" register...
> +        assert(!isRegSeq(R)); // We should have processed that.
> +        if (!Defined[R]) {
> +          DEBUG(dbgs() << "Warning: removing " << MO << " from return "
> +                          "instruction in BB#" << B.getNumber() << "\n");
> +          MI->RemoveOperand(i-1);
> +          Changed = true;
> +        }
> +      }
> +      continue;
> +    }
> +
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg() || !MO.isDef() || MO.isUndef())
> +        continue;
> +      unsigned R = MO.getReg();
> +      // Some instructions will have the sequences unprocessed (calls, etc).
> +      // Make sure we account for those.
> +      Defined[R] = true;
> +      if (isRegSeq(R))
> +        getSubRegs(R, Defined);
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +// Compute live ranges for non-sequence registers.  The live ranges for
> +// sequence registers will be added later, during completion of the live
> +// ranges.
> +void HRC::computeLiveRangesNonSeq(MachineBasicBlock &B,
> +      InstrIndexMap &IndexMap, RegToRangeMap &LiveMap) {
> +  typedef std::map<unsigned,IndexType> RegToIndexMap;
> +  RegToIndexMap LastUse;
> +
> +  BitVector LiveOnEntry(NumRegs);
> +  BitVector LiveOnExit(NumRegs);
> +
> +  getLiveIns(B, LiveOnEntry);
> +
> +  // Collect live-on-exit.
> +  typedef MachineBasicBlock::succ_iterator succ_iterator;
> +  for (succ_iterator I = B.succ_begin(), E = B.succ_end(); I != E; ++I) {
> +    MachineBasicBlock *S = *I;
> +    getLiveIns(*S, LiveOnExit);
> +  }
> +
> +  for (int x = LiveOnExit.find_first(); x >= 0; x = LiveOnExit.find_next(x)) {
> +    unsigned R = x;
> +    if (Reserved[R])
> +      continue;
> +    LastUse[R] = IndexType::Exit;
> +  }
> +
> +
> +  // Cumulative tracking.
> +  BitVector Fixed(NumRegs);
> +  BitVector TiedEnds(NumRegs);
> +  BitVector Seq(NumRegs);
> +
> +  // Information per instruction.
> +  BitVector HwUses(NumRegs), HwDefs(NumRegs);
> +  BitVector Uses(NumRegs), Defs(NumRegs);
> +  // "Fixed" and "Sequenced" need to be separated within an instruction
> +  // between defs and uses, since defs may (usually will) belong to a
> +  // different range than uses.  Defs will always count toward the current
> +  // range, so they can be stored directly in the cumulative bit vectors,
> +  // but uses need to be kept separate until after defs are processed.
> +  BitVector FixedUses(NumRegs), SeqUses(NumRegs);
> +
> +  // Walk all instructions backwards and determine live ranges created
> +  // by uses and defs in instructions and the live-on-exit information.
> +  typedef MachineBasicBlock::reverse_instr_iterator reverse_instr_iterator;
> +  for (reverse_instr_iterator I = B.instr_rbegin(), E = B.instr_rend();
> +       I != E; ++I) {
> +    MachineInstr *MI = &*I;
> +    if (MI->isDebugValue())
> +      continue;
> +
> +    IndexType Index = IndexMap.getIndex(MI);
> +    getHwImplicits(MI->getDesc(), HwUses, HwDefs, true);
> +    Uses = HwUses;
> +    Defs = HwDefs;
> +    FixedUses.reset();
> +    SeqUses.reset();
> +
> +    typedef SmallVector<unsigned,2> ShortVect;
> +    ShortVect TiedRegs;
> +    // Uses or defs on certain instructions cannot be renamed, e.g. on calls
> +    // or on instructions considered to be "returns" from functions.  Strictly
> +    // speaking, for returns, only uses of registers that are actually
> +    // used to return a value should be fixed, but this way makes it
> +    // a bit simpler.
> +    bool FixUses = MI->isCall() || isReturn(MI) || MI->isInlineAsm() ||
> +                   MI->isKill();
> +    bool FixDefs = MI->isCall() || MI->isInlineAsm() || MI->isKill();
> +
> +    // Scan operands for uses and defs.
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg())
> +        continue;
> +      unsigned R = MO.getReg();
> +      if (Reserved[R])
> +        continue;
> +
> +      // Tied operands get the same physical register.  If an instruction
> +      // ties a def and a use, we do not want to break the live range for
> +      // this register.  The problem is that tied sequences have implicit
> +      // subreg refs that are not tied.  To avoid complication, just record
> +      // the tie and handle it later.
> +      if (MO.isTied())
> +        TiedRegs.push_back(R);
> +
> +      if (isRegSeq(R)) {
> +        // Regarding the asserts: the "undef" is needed in a sense that
> +        // there must be implied uses/defs for subregisters, since the live
> +        // ranges for sequences will be determined from those.
> +        if (MO.isUse()) {
> +          assert(FixUses || MO.isUndef());
> +          getSubRegs(R, SeqUses);
> +        } else {
> +          assert(FixDefs || MO.isUndef());
> +          getSubRegs(R, Seq);
> +        }
> +        continue;
> +      }
> +
> +      if (MO.isDef() && (FixDefs || HwDefs[R]))
> +        Fixed[R] = true;
> +      if (MO.isUse() && (FixUses || HwUses[R]))
> +        FixedUses[R] = true;
> +
> +      if (MO.isDef())
> +        Defs[R] = true;
> +      else
> +        Uses[R] = true;
> +    }
> +
> +    // Expand all tied registers to include subregisters.
> +    BitVector Tied(NumRegs);
> +    for (unsigned i = 0, n = TiedRegs.size(); i < n; ++i) {
> +      unsigned T = TiedRegs[i];
> +      Tied[T] = true;
> +      if (isRegSeq(T))
> +        getSubRegs(T, Tied);
> +    }
> +
> +    // We're going backwards in the data flow, so first check defs.
> +    for (int x = Defs.find_first(); x >= 0; x = Defs.find_next(x)) {
> +      unsigned R = x;
> +      if (Tied[R]) {
> +        // If R already has a live range going, don't break it for a tied def.
> +        // If R does not have a LastUse, record the TiedEnd for it here.
> +        if (LastUse[R] == IndexType::None)
> +          TiedEnds[R] = true;
> +        continue;
> +      }
> +      // This is the beginning of a live range that is ended by LastUse[R].
> +      bool LoX = (LastUse[R] == IndexType::Exit);
> +      bool FxR = Fixed[R];
> +      LiveMap[R].add(Index, LastUse[R], Seq[R], LoX | FxR, TiedEnds[R]);
> +      // Reset the range start info.
> +      LastUse[R] = IndexType::None;
> +      Fixed[R] = TiedEnds[R] = Seq[R] = false;
> +    }
> +
> +    // Now check uses.
> +    for (int x = Uses.find_first(); x >= 0; x = Uses.find_next(x)) {
> +      unsigned R = x;
> +      // Record the properties of uses from this instruction.
> +      Fixed[R] = (FixedUses[R] || Fixed[R]);
> +      Seq[R] = (SeqUses[R] || Seq[R]);
> +      if (LastUse[R] == IndexType::None)
> +        LastUse[R] = Index;
> +    }
> +  }
> +
> +  // Finally, process the live-on-entry information.  The live registers
> +  // as determined by the analysis should not contradict the live-in info
> +  // from the basic block.
> +  for (RegToIndexMap::iterator I = LastUse.begin(), E = LastUse.end();
> +       I != E; ++I) {
> +    if (I->second == IndexType::None)
> +      continue;
> +    unsigned R = I->first;
> +    DEBUG({
> +      if (!B.isLiveIn(R)) {
> +        dbgs() << "BB#" << B.getNumber() << ": missing live-in: "
> +               << PrintReg(R, &TRI) << "\n";
> +        if (MachineInstr *MI = IndexMap.getInstr(I->second))
> +          dbgs() << "--use: " << *MI;
> +      }
> +    });
> +    LiveMap[R].add(IndexType::Entry, I->second, false, true);
> +  }
> +
> +
> +  // Finally, sort the live ranges.
> +  for (RegToRangeMap::iterator I = LiveMap.begin(), E = LiveMap.end();
> +       I != E; ++I) {
> +    RangeList &RL = I->second;
> +    std::sort(RL.begin(), RL.end());
> +  }
> +}
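
To make the backward walk easier to follow, here is a stripped-down sketch of the core idea only. Everything in it is hypothetical: `Instr` stands in for MachineInstr, the sentinels are plain integers, and the Fixed, Sequenced and tied-operand handling from the patch is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Instr {                       // hypothetical stand-in for MachineInstr
  std::vector<unsigned> defs, uses;
};

const int None = -1, Entry = -2, Exit = 1 << 30;  // hypothetical sentinels
typedef std::pair<int, int> Range;                // (start, end)
typedef std::map<unsigned, std::vector<Range> > RangeMap;

RangeMap computeLiveRanges(const std::vector<Instr> &block,
                           const std::vector<unsigned> &liveOut) {
  RangeMap live;
  std::map<unsigned, int> lastUse;   // pending end of the current range
  for (unsigned r : liveOut)
    lastUse[r] = Exit;

  for (int i = int(block.size()) - 1; i >= 0; --i) {
    // Walking backwards, so defs first: a def starts the range that the
    // pending last use (if any) ends.
    for (unsigned r : block[i].defs) {
      std::map<unsigned, int>::iterator f = lastUse.find(r);
      int end = (f == lastUse.end()) ? None : f->second;
      live[r].push_back(Range(i, end));
      if (f != lastUse.end())
        lastUse.erase(f);
    }
    // The first use seen going backwards is the last use of the range.
    for (unsigned r : block[i].uses)
      if (!lastUse.count(r))
        lastUse[r] = i;
  }

  // Anything still pending has no def in the block: live on entry.
  for (const auto &p : lastUse)
    live[p.first].push_back(Range(Entry, p.second));
  for (auto &p : live)
    std::sort(p.second.begin(), p.second.end());
  return live;
}
```

A def with no later use ends up as (index, None), which is the case markDeadKill later turns into a "dead" flag.
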
> +
> +
> +// Finish the calculation of live ranges by computing live ranges of
> +// sequence registers, given the live ranges of non-sequence registers.
> +void HRC::completeLiveRanges(InstrIndexMap &IndexMap, RegToRangeMap &LiveMap) {
> +  BitVector RegSeqs(NumRegs);
> +  const TargetRegisterClass *DRC = &Hexagon::DoubleRegsRegClass;
> +  for (TargetRegisterClass::iterator I = DRC->begin(), E = DRC->end();
> +       I != E; ++I) {
> +    unsigned R = *I;
> +    if (!Reserved[R])
> +      RegSeqs[R] = true;
> +  }
> +
> +  for (int x = RegSeqs.find_first(); x >= 0; x = RegSeqs.find_next(x)) {
> +    unsigned R = x;
> +    RangeList &RL = LiveMap[R];
> +    BitVector SubRegs(NumRegs);
> +    getSubRegs(R, SubRegs);
> +    for (int y = SubRegs.find_first(); y >= 0; y = SubRegs.find_next(y)) {
> +      unsigned S = y;
> +      RL.include(LiveMap[S]);
> +    }
> +    RL.unionize();
> +  }
> +
> +}
> +
> +
> +bool HRC::markDefsDead(unsigned Reg, MachineInstr *MI) {
> +  // Mark all def operands as "dead".  Do not process subregisters,
> +  // they should be marked independently.
> +  bool Changed = false;
> +
> +  for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +    MachineOperand &MO = MI->getOperand(i);
> +    if (!MO.isReg() || !MO.isDef() || MO.getReg() != Reg)
> +      continue;
> +    if (!MO.isDead()) {
> +      MO.setIsDead(true);
> +      Changed = true;
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +bool HRC::markUseKill(unsigned Reg, MachineInstr *MI) {
> +  // Mark the last use operand as "kill".  Do not process subregisters,
> +  // they should be marked independently.
> +  for (unsigned i = MI->getNumOperands(); i > 0; --i) {
> +    MachineOperand &MO = MI->getOperand(i-1);
> +    if (MO.isReg() && MO.isUse() && MO.getReg() == Reg) {
> +      MO.setIsKill(true);
> +      return true;
> +    }
> +  }
> +  return false;
> +}
> +
> +
> +// Place dead/kill flags on appropriate operands, according to the liveness
> +// information.
> +bool HRC::markDeadKill(MachineBasicBlock &B, InstrIndexMap &IndexMap,
> +      RegToRangeMap &LiveMap) {
> +  bool Changed = false;
> +
> +  // First, clear all dead/kill flags on non-reserved registers.
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +  for (instr_iterator I = B.instr_begin(), E = B.instr_end(); I != E; ++I) {
> +    MachineInstr *MI = &*I;
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg())
> +        continue;
> +      unsigned R = MO.getReg();
> +      if (Reserved[R])
> +        continue;
> +      if (MO.isUse()) {
> +        if (MO.isKill()) {
> +          MO.setIsKill(false);
> +          Changed = true;
> +        }
> +      } else {
> +        if (MO.isDead()) {
> +          MO.setIsDead(false);
> +          Changed = true;
> +        }
> +      }
> +    } // for i
> +  } // for I
> +
> +  // Walk over all the live ranges and mark register operands as dead/kill.
> +  for (RegToRangeMap::iterator I = LiveMap.begin(), E = LiveMap.end();
> +       I != E; ++I) {
> +    unsigned R = I->first;
> +    if (Reserved[R])
> +      continue;
> +    const RangeList &RL = I->second;
> +    for (RangeList::const_iterator J = RL.begin(), F = RL.end(); J != F; ++J) {
> +      const IndexRange &IR = *J;
> +      IndexType IS = IR.start(), IE = IR.end();
> +      if (IE == IndexType::None) {
> +        if (IndexType::isInstr(IS)) {
> +          MachineInstr *First = IndexMap.getInstr(IS);
> +          Changed |= markDefsDead(R, First);
> +        }
> +      } else if (IndexType::isInstr(IE)) {
> +        MachineInstr *Last = IndexMap.getInstr(IE);
> +        Changed |= markUseKill(R, Last);
> +      }
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +// Cleanup pre-pass for the implicit references:
> +// - remove unnecessary implicit refs,
> +// - mark all sequence registers as "undef" and add corresponding implicit
> +//   references for the sub-registers,
> +// - finally, clean up return instructions.
> +bool HRC::processImplicitsEarly(MachineBasicBlock &B) {
> +  bool Changed = false;
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +
> +  // First, delete all unnecessary implicit uses and defs.
> +  for (instr_iterator I = B.instr_begin(), E = B.instr_end(); I != E; ++I)
> +    Changed |= removeExtraImplicitRefs(&*I);
> +
> +  // Mark each reference to a double register as "undef" and add implicit
> +  // uses and defs of the corresponding int registers.
> +  // Do this for both live-ins and registers in instructions.
> +
> +  for (instr_iterator I = B.instr_begin(), E = B.instr_end(); I != E; ++I) {
> +    MachineInstr *MI = &*I;
> +    Changed |= addImpRefs(MI);
> +  }
> +
> +  Changed |= correctMissingReturnValues(B);
> +
> +  return Changed;
> +}
> +
> +
> +// Compute dead ranges for all registers, i.e. all ranges where a register
> +// is dead.  This is essentially a negation of the live range information.
> +void HRC::computeDeadRanges(RegToRangeMap &DeadMap, RegToRangeMap &LiveMap,
> +      InstrIndexMap &IndexMap) {
> +  for (RegToRangeMap::iterator I = LiveMap.begin(), E = LiveMap.end();
> +       I != E; ++I) {
> +    unsigned R = I->first;
> +    bool Seq = isRegSeq(R);
> +    RangeList &RL = I->second;
> +    if (RL.empty()) {
> +      DeadMap[R].add(IndexType::Entry, IndexType::Exit, Seq, false);
> +      continue;
> +    }
> +
> +    RangeList::iterator A = RL.begin(), Z = RL.end()-1;
> +
> +    // Try to create the initial range.
> +    if (A->start() != IndexType::Entry) {
> +      IndexType DE = IndexMap.getPrevIndex(A->start());
> +      DeadMap[R].add(IndexType::Entry, DE, Seq, false);
> +    }
> +
> +    while (A != Z) {
> +      // Creating a dead range that follows A.  Pay attention to empty
> +      // ranges (i.e. those ending with "None").
> +      IndexType AE = (A->end() == IndexType::None) ? A->start() : A->end();
> +      IndexType DS = IndexMap.getNextIndex(AE);
> +      ++A;
> +      IndexType DE = IndexMap.getPrevIndex(A->start());
> +      if (DS < DE)
> +        DeadMap[R].add(DS, DE, Seq, false);
> +    }
> +
> +    // Try to create the final range.
> +    if (Z->end() != IndexType::Exit) {
> +      IndexType ZE = (Z->end() == IndexType::None) ? Z->start() : Z->end();
> +      IndexType DS = IndexMap.getNextIndex(ZE);
> +      if (DS < IndexType::Exit)
> +        DeadMap[R].add(DS, IndexType::Exit, Seq, false);
> +    }
> +  }
> +}
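
The "negation" of the live ranges can be sketched on integer indices like this. The assumptions are mine: `Range` is a hypothetical closed interval, the live list is already sorted and disjoint, and Entry/Exit are replaced by plain `first`/`last` bounds. Note that the patch only records a gap between two live ranges when DS < DE, whereas this sketch also keeps one-instruction gaps.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical closed interval [first, second].
typedef std::pair<int, int> Range;

// Return the gaps within [first, last] that are not covered by live.
// live must be sorted and disjoint.
std::vector<Range> deadRanges(const std::vector<Range> &live,
                              int first, int last) {
  std::vector<Range> dead;
  int pos = first;                   // first index not yet accounted for
  for (const Range &r : live) {
    assert(r.first <= r.second);
    if (pos < r.first)
      dead.push_back(Range(pos, r.first - 1));
    pos = std::max(pos, r.second + 1);
  }
  if (pos <= last)
    dead.push_back(Range(pos, last));
  return dead;
}
```
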
> +
> +
> +// Find an index range, which starts at least MinDist before the range Sub
> +// begins, and ends at least MinDist after Sub ends.  MaxIndex is the maximum
> +// possible index of an instruction.  This is needed when computing a distance
> +// between some index and a block-exit. (Note: this has changed---the block
> +// entry and exit are not currently subjected to the distance calculation.)
> +bool HRC::findSuperRange(const RangeList &RL, const IndexRange &Sub,
> +      unsigned MinDist, IndexType MaxIndex, IndexRange &Sup) {
> +  // Very simple for now: find the first range that contains the given one.
> +  IndexType SE = (Sub.end() == IndexType::None) ? Sub.start() : Sub.end();
> +  for (RangeList::const_iterator I = RL.begin(), E = RL.end(); I != E; ++I) {
> +    const IndexRange &C = *I;
> +    if (C.Fixed)
> +      continue;
> +    if (!C.Sequenced ^ !Sub.Sequenced)
> +      continue;
> +    if (C.contains(Sub)) {
> +      // Do not calculate distance if block entry or exit is involved.
> +      // (Assume optimistically that the super range extends farther beyond
> +      // the current block, instead of pessimistically assuming that it
> +      // doesn't.)
> +      if (C.start() != IndexType::Entry) {
> +        unsigned DS = dist(C.start(), Sub.start(), MaxIndex);
> +        if (DS < MinDist)
> +          continue;
> +      }
> +      if (C.end() != IndexType::Exit) {
> +        IndexType CE = (C.end() == IndexType::None) ? C.start() : C.end();
> +        unsigned DE = dist(SE, CE, MaxIndex);
> +        if (DE < MinDist)
> +          continue;
> +      }
> +      Sup = C;
> +      return true;
> +    }
> +  }
> +  return false;
> +}
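
As a reading aid, the containment-plus-distance test can be sketched as below. It is a simplification: `Range` is a hypothetical closed interval on integer indices, distance is a plain difference rather than the patch's dist(), and the Fixed/Sequenced filtering and the Entry/Exit exemptions are not modelled.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical closed interval [first, second].
typedef std::pair<int, int> Range;

// Find the first candidate that contains sub and extends at least minDist
// beyond it on both sides.  On success, store it in sup and return true.
bool findSuperRange(const std::vector<Range> &cands, const Range &sub,
                    int minDist, Range &sup) {
  assert(minDist >= 0);
  for (const Range &c : cands) {
    bool contains = c.first <= sub.first && sub.second <= c.second;
    if (!contains)
      continue;
    if (sub.first - c.first < minDist)    // not enough room before sub
      continue;
    if (c.second - sub.second < minDist)  // not enough room after sub
      continue;
    sup = c;
    return true;
  }
  return false;
}
```
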
> +
> +
> +// Move an index range from one map to another.  Used when a live range
> +// becomes dead, or a dead range becomes live.
> +void HRC::moveRange(unsigned Reg, const IndexRange &Range, RegToRangeMap &From,
> +      RegToRangeMap &To, bool MergeAdjacent) {
> +  BitVector Regs(NumRegs);
> +  if (isRegSeq(Reg))
> +    getSubRegs(Reg, Regs);
> +  Regs[Reg] = true;
> +
> +  for (int x = Regs.find_first(); x >= 0; x = Regs.find_next(x)) {
> +    unsigned R = x;
> +    To[R].add(Range);
> +    To[R].unionize(MergeAdjacent);
> +    From[R].subtract(Range);
> +  }
> +}
> +
> +
> +void HRC::makeRangeDead(unsigned Reg, const IndexRange &Range,
> +      RegToRangeMap &LiveMap, RegToRangeMap &DeadMap) {
> +  DEBUG({
> +    BitVector Regs(NumRegs);
> +    if (isRegSeq(Reg))
> +      getSubRegs(Reg, Regs);
> +    Regs[Reg] = true;
> +    dbgs() << "Moving range " << Range << " for " << PrintReg(Reg, &TRI)
> +           << "\nbefore:\n";
> +    for (int x = Regs.find_first(); x >= 0; x = Regs.find_next(x)) {
> +      unsigned R = x;
> +      dbgs() << "Live[" << PrintReg(R, &TRI) << "]: " << LiveMap[R] << '\n';
> +      dbgs() << "Dead[" << PrintReg(R, &TRI) << "]: " << DeadMap[R] << '\n';
> +    }
> +  });
> +
> +  moveRange(Reg, Range, LiveMap, DeadMap, true);
> +
> +  DEBUG({
> +    BitVector Regs(NumRegs);
> +    if (isRegSeq(Reg))
> +      getSubRegs(Reg, Regs);
> +    Regs[Reg] = true;
> +    dbgs() << "after:\n";
> +    for (int x = Regs.find_first(); x >= 0; x = Regs.find_next(x)) {
> +      unsigned R = x;
> +      dbgs() << "Live[" << PrintReg(R, &TRI) << "]: " << LiveMap[R] << '\n';
> +      dbgs() << "Dead[" << PrintReg(R, &TRI) << "]: " << DeadMap[R] << '\n';
> +    }
> +  });
> +}
> +
> +
> +void HRC::makeRangeAlive(unsigned Reg, const IndexRange &Range,
> +      RegToRangeMap &DeadMap, RegToRangeMap &LiveMap) {
> +  DEBUG({
> +    BitVector Regs(NumRegs);
> +    if (isRegSeq(Reg))
> +      getSubRegs(Reg, Regs);
> +    Regs[Reg] = true;
> +    dbgs() << "Moving range " << Range << " for " << PrintReg(Reg, &TRI)
> +           << "\nbefore:\n";
> +    for (int x = Regs.find_first(); x >= 0; x = Regs.find_next(x)) {
> +      unsigned R = x;
> +      dbgs() << "Dead[" << PrintReg(R, &TRI) << "]: " << DeadMap[R] << '\n';
> +      dbgs() << "Live[" << PrintReg(R, &TRI) << "]: " << LiveMap[R] << '\n';
> +    }
> +  });
> +
> +  moveRange(Reg, Range, DeadMap, LiveMap, false);
> +
> +  DEBUG({
> +    BitVector Regs(NumRegs);
> +    if (isRegSeq(Reg))
> +      getSubRegs(Reg, Regs);
> +    Regs[Reg] = true;
> +    dbgs() << "after:\n";
> +    for (int x = Regs.find_first(); x >= 0; x = Regs.find_next(x)) {
> +      unsigned R = x;
> +      dbgs() << "Dead[" << PrintReg(R, &TRI) << "]: " << DeadMap[R] << '\n';
> +      dbgs() << "Live[" << PrintReg(R, &TRI) << "]: " << LiveMap[R] << '\n';
> +    }
> +  });
> +}
> +
> +
> +void HRC::renameRegInRange(unsigned OldReg, unsigned NewReg,
> +      const IndexRange &Range, InstrIndexMap &IndexMap) {
> +  DEBUG(dbgs() << "--renaming register " << PrintReg(OldReg, &TRI) << " to "
> +               << PrintReg(NewReg, &TRI) << " in " << Range << "\n");
> +  if (OldReg == NewReg)  // That would be strange.
> +    return;
> +
> +  // For sequences, we will be renaming all sub-registers at once, hence
> +  // we need a map of which sub-register of OldReg maps to which sub-
> +  // register of NewReg.
> +  typedef std::map<unsigned,unsigned> RenameMap;
> +  RenameMap Map;
> +
> +  bool Seq = isRegSeq(OldReg);
> +  assert(!(Seq ^ isRegSeq(NewReg)) && "Register type mismatch");
> +
> +  Map.insert(std::make_pair(OldReg, NewReg));
> +  if (Seq) {
> +    MCSubRegIterator OI(OldReg, &TRI), NI(NewReg, &TRI);
> +    while (OI.isValid() && NI.isValid()) {
> +      Map.insert(std::make_pair(*OI, *NI));
> +      ++OI;
> +      ++NI;
> +    }
> +    assert(!OI.isValid() && !NI.isValid());
> +  }
> +
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +  instr_iterator I = IndexMap.getInstr(Range.start());
> +  // Rename defs in the first instruction in range.  These are the only
> +  // references (usually one) that should be renamed in this instruction.
> +  for (unsigned i = 0, n = I->getNumOperands(); i < n; ++i) {
> +    MachineOperand &MO = I->getOperand(i);
> +    if (!MO.isReg() || !MO.isDef())
> +      continue;
> +    unsigned R = MO.getReg();
> +    RenameMap::iterator F = Map.find(R);
> +    if (F != Map.end())
> +      MO.setReg(F->second);
> +  }
> +  // If the end is "None", there are no uses (i.e. there is no explicit end
> +  // of the range), so leave early.
> +  if (Range.end() == IndexType::None)
> +    return;
> +
> +  instr_iterator Last = IndexMap.getInstr(Range.end());
> +  instr_iterator End = llvm::next(Last);
> +
> +  while (++I != End) {
> +    // Rename uses and defs.  Defs have to be renamed here as well:
> +    // when renaming D0 -> D1, the def of R1 has to be renamed too:
> +    //   R0 = ...
> +    //   R1 = ...
> +    //   ... = D0
> +    MachineInstr *MI = &*I;
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg())
> +        continue;
> +      // Don't rename the def in the last instruction, except for TiedEnd.
> +      if (I == Last && !Range.TiedEnd)
> +        if (MO.isDef())
> +          continue;
> +      unsigned R = MO.getReg();
> +      RenameMap::iterator F = Map.find(R);
> +      if (F == Map.end())
> +        continue;
> +
> +      MO.setReg(F->second);
> +    }
> +  }
> +}
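
To illustrate the lock-step sub-register pairing in renameRegInRange, here
is a minimal standalone sketch. It is not the patch's code: plain vectors
stand in for MCSubRegIterator, the register numbers are arbitrary, and
buildRenameMap is a hypothetical name used only for this example.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in: a register is just a number, and a "sequence"
// register comes with an ordered list of its sub-registers.
typedef std::map<unsigned, unsigned> RenameMap;

// Pair OldReg with NewReg, then pair their sub-registers position by
// position.  An empty map is returned when the two sub-register lists
// differ in length, i.e. when the registers are not of the same kind.
RenameMap buildRenameMap(unsigned OldReg, const std::vector<unsigned> &OldSubs,
                         unsigned NewReg, const std::vector<unsigned> &NewSubs) {
  RenameMap Map;
  if (OldSubs.size() != NewSubs.size())
    return Map;
  Map.insert(std::make_pair(OldReg, NewReg));
  for (std::size_t i = 0; i < OldSubs.size(); ++i)
    Map.insert(std::make_pair(OldSubs[i], NewSubs[i]));
  return Map;
}
```

With such a map, renaming an operand is a single lookup, and operands that
refer to unrelated registers are left alone because they are not in the map.
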
> +
> +
> +// Given a register and a range, find a replacement register whose closest
> +// live ranges are at least MinDist away.  On failure to find a replacement
> +// return "false", otherwise "true".
> +bool HRC::renameRange(unsigned Reg, IndexRange &Range, MachineBasicBlock &B,
> +      unsigned MinDist, InstrIndexMap &IndexMap, RegToRangeMap &LiveMap,
> +      RegToRangeMap &DeadMap) {
> +  DEBUG(dbgs() << "Try to rename range: " << Range << " for register "
> +               << PrintReg(Reg, &TRI) << " with MinDist=" << MinDist << "\n");
> +
> +  bool Found = false;
> +  unsigned NewReg;
> +  IndexRange NewRange;
> +
> +  // Have a vector of candidate replacement registers.  Having them in
> +  // an ordered container allows giving some registers priority over others.
> +  typedef std::vector<unsigned> UIntVect;
> +  UIntVect CandRegs;
> +
> +  const TargetRegisterClass *RC = getPhysRegClass(Reg);
> +  typedef TargetRegisterClass::iterator iterator;
> +  for (iterator I = RC->begin(), E = RC->end(); I != E; ++I) {
> +    unsigned R = *I;
> +    if (!MRI->isPhysRegUsed(R))
> +      continue;
> +    // Put the used registers in front.  This could prevent save/restore
> +    // of a callee-saved-register, if there is an already-used register
> +    // available.
> +    CandRegs.push_back(R);
> +  }
> +  for (iterator I = RC->begin(), E = RC->end(); I != E; ++I) {
> +    unsigned R = *I;
> +    if (MRI->isPhysRegUsed(R))
> +      continue;
> +    CandRegs.push_back(R);
> +  }
> +
> +
> +  for (UIntVect::iterator I = CandRegs.begin(), E = CandRegs.end();
> +       I != E; ++I) {
> +    unsigned R = *I;
> +    if (R == Reg || Reserved[R])
> +      continue;
> +    RangeList &RL = DeadMap[R];
> +    // Find a range that is not closer to another range than the one
> +    // we're trying to replace.
> +    for (unsigned Dist = OptDepDist; Dist > MinDist; --Dist) {
> +      Found = findSuperRange(RL, Range, Dist, IndexMap.Last, NewRange);
> +      if (Found)
> +        break;
> +    }
> +    if (Found) {
> +      NewReg = R;
> +      break;
> +    }
> +  }
> +
> +  if (!Found) {
> +    DEBUG(dbgs() << "Replacement register not found\n");
> +    return false;
> +  }
> +
> +  DEBUG(dbgs() << "Found replacement register: " << PrintReg(NewReg, &TRI)
> +               << " range: " << NewRange << "\n");
> +
> +  IndexRange Copy = Range;
> +  renameRegInRange(Reg, NewReg, Copy, IndexMap);
> +
> +  makeRangeDead(Reg,    Copy, LiveMap, DeadMap);
> +  makeRangeAlive(NewReg, Copy, DeadMap, LiveMap);
> +
> +  MRI->setPhysRegUsed(NewReg);
> +
> +  return true;
> +}
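
The candidate ordering in renameRange (already-used registers first, then
the rest) can be shown in isolation. The sketch below is only an
illustration under simplifying assumptions: orderCandidates is a
hypothetical helper, and a vector<bool> indexed by register number stands
in for MRI->isPhysRegUsed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: return the registers of a class with the
// already-used ones in front.  The relative order inside each group is
// the order in which the class lists them.  Used must be indexable by
// every register number that appears in Class.
std::vector<unsigned> orderCandidates(const std::vector<unsigned> &Class,
                                      const std::vector<bool> &Used) {
  std::vector<unsigned> Out;
  // Pass 0 collects used registers, pass 1 collects unused ones.
  for (int Pass = 0; Pass < 2; ++Pass) {
    bool WantUsed = (Pass == 0);
    for (std::size_t i = 0; i < Class.size(); ++i) {
      unsigned R = Class[i];
      if (Used[R] == WantUsed)
        Out.push_back(R);
    }
  }
  return Out;
}
```

Trying used registers first means a rename only touches a fresh register
when no already-used one fits, which is the callee-saved-register concern
described in the comment in the patch.
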
> +
> +
> +// Break anti-dependencies whose distance is exactly Dist.  This prevents
> +// repeated discovering of anti-dependencies with smaller distances, which
> +// have already failed to be renamed.
> +bool HRC::breakAntiDepForReg(unsigned Reg, unsigned Dist, RangeList &RL,
> +      MachineBasicBlock &B, InstrIndexMap &IndexMap,
> +      RegToRangeMap &LiveMap, RegToRangeMap &DeadMap) {
> +
> +  if (RenameLimit != UINT_MAX) {
> +    if (Debug_RRC >= RenameLimit)
> +      return false;
> +    Debug_RRC++;
> +  }
> +
> +  if (RL.empty())
> +    return false;
> +
> +  RangeList::iterator P = RL.begin();
> +  bool Changed = false;
> +  // If Reg is a sequence, only handle sequenced live ranges.
> +  bool Seq = isRegSeq(Reg);
> +
> +  bool Again = true;
> +  while (Again) {
> +    Again = false;
> +
> +    // Find the list of pairs of consecutive ranges where the distance is
> +    // the smallest.
> +    typedef std::vector<RangeList::iterator> RangeIterVect;
> +    RangeIterVect AntiDeps;
> +
> +    for (RangeList::iterator P = RL.begin(), Z = RL.end()-1; P != Z; ++P) {
> +      RangeList::iterator Next = P+1;
> +      // Compute distance between P and P+1.
> +      if (P->Fixed || Next->Fixed)  // Neither can be fixed.
> +        continue;
> +
> +      if (Seq) {
> +        // For sequences, only rename seq-seq dependencies.  Seq-nonseq will
> +        // be handled by renaming of non-sequences.
> +        if (!P->Sequenced || !Next->Sequenced)
> +          continue;
> +      } else {
> +        // The register is not a sequence.  Allow a single sequenced range,
> +        // but not both.  The register renaming will be attempted only in
> +        // the non-sequenced ranges.
> +        if (P->Sequenced && Next->Sequenced)
> +          continue;
> +      }
> +
> +      unsigned ThisEnd = (P->end() == IndexType::None) ? P->start() : P->end();
> +      unsigned NextStart = Next->start();
> +      assert(ThisEnd <= NextStart);
> +      unsigned D = NextStart-ThisEnd;
> +      if (D == Dist)
> +        AntiDeps.push_back(P);
> +    }
> +
> +    for (RangeIterVect::iterator I = AntiDeps.begin(), E = AntiDeps.end();
> +         I != E; ++I) {
> +      // If a candidate pair of ranges was found, rename one of them.
> +      RangeList::iterator F = *I;
> +      RangeList::iterator G = F+1;
> +      DEBUG(dbgs() << "Anti-dep on " << PrintReg(Reg, &TRI) << "  dist: "
> +                   << Dist << "  " << *F << ".." << *G << "\n");
> +      // Try to rename the second range, and if that fails, try the first.
> +      // Remember to check for matching "sequencedness".
> +      // If Seq is true, they both have to be sequenced, or else they
> +      // wouldn't have been picked as candidates.  If Seq is false,
> +      // try to rename the range that is not sequenced (possibly both).
> +      bool TryThis = (Seq == F->Sequenced);
> +      bool TryNext = (Seq == G->Sequenced);
> +      if (TryNext)
> +        Again = renameRange(Reg, *G, B, Dist+1, IndexMap, LiveMap, DeadMap);
> +      if (!Again && TryThis)
> +        Again = renameRange(Reg, *F, B, Dist+1, IndexMap, LiveMap, DeadMap);
> +
> +      Changed |= Again;
> +      // If we renamed something, break this loop, since the next candidate
> +      // on this list may have been renamed.
> +      if (Again)
> +        break;
> +    }
> +  }
> +
> +  return Changed;
> +}
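
The distance test at the core of breakAntiDepForReg can be sketched on its
own. This is a simplified model, not the patch's data structures: Range and
findGaps are hypothetical names, HasEnd == false plays the role of
IndexType::None, and the Fixed/Sequenced filtering is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified live range in instruction indices.
// HasEnd == false models a def without a use (no explicit end).
struct Range {
  unsigned Start;
  unsigned End;
  bool HasEnd;
};

// Return every index i for which the gap between range i and range i+1
// is exactly Dist.  Ranges are assumed to be sorted and non-overlapping.
std::vector<std::size_t> findGaps(const std::vector<Range> &RL,
                                  unsigned Dist) {
  std::vector<std::size_t> Hits;
  for (std::size_t i = 0; i + 1 < RL.size(); ++i) {
    // A range without an end is treated as ending where it starts.
    unsigned ThisEnd = RL[i].HasEnd ? RL[i].End : RL[i].Start;
    unsigned NextStart = RL[i + 1].Start;
    assert(ThisEnd <= NextStart && "ranges must be sorted");
    if (NextStart - ThisEnd == Dist)
      Hits.push_back(i);
  }
  return Hits;
}
```

The loop bound "i + 1 < RL.size()" copes with an empty list without a
separate check, which may be worth considering in the patch as well, since
the list can shrink while ranges are being renamed.
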
> +
> +
> +bool HRC::breakAntiDep(MachineBasicBlock &B, InstrIndexMap &IndexMap,
> +      RegToRangeMap &LiveMap) {
> +  bool Changed = false;
> +
> +  RegToRangeMap DeadMap;
> +  computeDeadRanges(DeadMap, LiveMap, IndexMap);
> +  DEBUG(dbgs() << "Dead Map:\n"; dump_map(dbgs(), DeadMap, TRI));
> +
> +  // Start from the distance of 0, and go up to OptDepDist.
> +
> +  // First, process all the register sequences.
> +  for (unsigned Dist = 0; Dist < OptDepDist; ++Dist) {
> +    const TargetRegisterClass *DRC = &Hexagon::DoubleRegsRegClass;
> +    for (TargetRegisterClass::iterator I = DRC->begin(), E = DRC->end();
> +         I != E; ++I) {
> +      RegToRangeMap::iterator F = LiveMap.find(*I);
> +      unsigned R = *I;
> +      if (!Reserved[R] && F != LiveMap.end())
> +        Changed |= breakAntiDepForReg(F->first, Dist, F->second, B,
> +                                      IndexMap, LiveMap, DeadMap);
> +    }
> +  }
> +
> +  // Then process the non-sequence registers.
> +  for (unsigned Dist = 0; Dist < OptDepDist; ++Dist) {
> +    for (RegToRangeMap::iterator I = LiveMap.begin(), E = LiveMap.end();
> +         I != E; ++I) {
> +      unsigned R = I->first;
> +      if (isRegSeq(R) || Reserved[R])
> +        continue;
> +      Changed |= breakAntiDepForReg(I->first, Dist, I->second, B,
> +                                    IndexMap, LiveMap, DeadMap);
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +// Undo the temporary changes made in the earlier stages:
> +// - clear the "undef" flag from sequence registers,
> +// - remove previously added implicit refs for sub-registers,
> +// - add register sequences to the live-in sets.
> +bool HRC::processBlockLate(MachineBasicBlock &B) {
> +  bool Changed = false;
> +
> +  BitVector HwUses(NumRegs), HwDefs(NumRegs);
> +
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +  for (instr_iterator I = B.instr_begin(), E = B.instr_end(); I != E; ++I) {
> +    MachineInstr *MI = &*I;
> +    if (MI->isCall() || MI->isInlineAsm())
> +      continue;
> +
> +    getHwImplicits(MI->getDesc(), HwUses, HwDefs);
> +
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg())
> +        continue;
> +      unsigned R = MO.getReg();
> +      if (HwUses[R] || HwDefs[R])
> +        continue;
> +      MO.setIsUndef(false);
> +      Changed = true;
> +    }
> +  }
> +
> +  for (instr_iterator I = B.instr_begin(), E = B.instr_end(); I != E; ++I)
> +    Changed |= removeExtraImplicitRefs(&*I);
> +
> +  Changed |= addLiveInSeq(B);
> +
> +  return Changed;
> +}
> +
> +
> +bool HRC::isSpillStore(MachineInstr *MI) {
> +  unsigned Opc = MI->getOpcode();
> +  unsigned Fx;
> +
> +  switch (Opc) {
> +    case Hexagon::STriw:
> +    case Hexagon::STrid:
> +    case Hexagon::STriw_indexed:
> +    case Hexagon::STrid_indexed:
> +      Fx = 0;
> +      break;
> +    default:
> +      return false;
> +  }
> +
> +  MachineOperand &MO = MI->getOperand(Fx);
> +  if (!MO.isFI() || MO.getIndex() < 0)
> +    return false;
> +
> +  // Not 100% accurate, but a false positive is harmless.
> +  return true;
> +}
> +
> +
> +unsigned HRC::getSpilledRegister(MachineInstr *MI) {
> +  unsigned Opc = MI->getOpcode();
> +
> +  switch (Opc) {
> +    case Hexagon::STriw:
> +    case Hexagon::STrid:
> +      return MI->getOperand(2).getReg();
> +    case Hexagon::STriw_indexed:
> +    case Hexagon::STrid_indexed:
> +      return MI->getOperand(2).getReg();
> +  }
> +
> +  llvm_unreachable("Cannot identify spilled register");
> +  return UINT_MAX;
> +}
> +
> +
> +// Add definitions (initialization with 0) of the undefined sub-registers
> +// of any spill instruction in a given block.  See comment for "cleanupRewrite"
> +// for more information.
> +bool HRC::correctPartialSpills(MachineBasicBlock &B) {
> +  BitVector Defined(NumRegs);
> +
> +  bool Changed = false;
> +  getLiveIns(B, Defined);
> +  subRegClosure(Defined);
> +
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +  for (instr_iterator I = B.instr_begin(), Next; I != B.instr_end(); I = Next) {
> +    MachineInstr *MI = &*I;
> +    Next = llvm::next(I);
> +
> +    if (isSpillStore(MI)) {
> +      // If this is a spill store, check if the spilled register has been
> +      // fully defined.  It's possible that a 64-bit register may be spilled
> +      // after only one sub-register has been defined.
> +      unsigned R = getSpilledRegister(MI);
> +      if (Defined[R])
> +        continue;
> +      BitVector SR(NumRegs);
> +      getSubRegs(R, SR);
> +      for (int x = SR.find_first(); x >= 0; x = SR.find_next(x)) {
> +        unsigned S = x;
> +        if (Defined[S])
> +          continue;
> +        // Sub-register S has not been defined.  Add a phony definition.
> +        DEBUG(dbgs() << "Warning: in BB#" << B.getNumber() << " spill "
> +                        "instruction uses an undefined subregister "
> +                     << PrintReg(S, &TRI) << ":\n" << *MI
> +                     << "Adding initialization.\n");
> +        DebugLoc DL = MI->getDebugLoc();
> +        BuildMI(B, I, DL, TII.get(Hexagon::TFRI), S)
> +          .addImm(0);
> +        Changed = true;
> +      }
> +    }
> +
> +    // Record which registers have been defined.
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg() || !MO.isDef())
> +        continue;
> +      unsigned R = MO.getReg();
> +      Defined[R] = true;
> +      if (isRegSeq(R))
> +        getSubRegs(R, Defined);
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +bool HRC::correctUndefinedSubregisterReads(MachineBasicBlock &B) {
> +  BitVector Defined(NumRegs);
> +
> +  bool Changed = false;
> +  getLiveIns(B, Defined);
> +  subRegClosure(Defined);
> +
> +  typedef MachineBasicBlock::instr_iterator instr_iterator;
> +  for (instr_iterator I = B.instr_begin(), Next; I != B.instr_end(); I = Next) {
> +    MachineInstr *MI = &*I;
> +    Next = llvm::next(I);
> +    BitVector Uses(NumRegs);
> +
> +    // Collect all subregisters used in this instruction.
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg() || !MO.isUse() || MO.isUndef())
> +        continue;
> +      unsigned R = MO.getReg();
> +      if (Reserved[R])
> +        continue;
> +      if (isRegSeq(R))
> +        getSubRegs(R, Uses);
> +    }
> +    // Insert definitions, if necessary.
> +    for (int x = Uses.find_first(); x >= 0; x = Uses.find_next(x)) {
> +      unsigned R = x;
> +      if (Defined[R])
> +        continue;
> +      DEBUG(dbgs() << "Warning: use of undefined subregister "
> +                   << PrintReg(R, &TRI) << " in BB#" << B.getNumber()
> +                   << ":\n  " << *MI);
> +      DebugLoc DL = MI->getDebugLoc();
> +      BuildMI(B, I, DL, TII.get(Hexagon::TFRI), R)
> +        .addImm(0);
> +      Defined[R] = true;
> +      Changed = true;
> +    }
> +
> +    for (unsigned i = 0, n = MI->getNumOperands(); i < n; ++i) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg() || !MO.isDef() || MO.isUndef())
> +        continue;
> +      unsigned R = MO.getReg();
> +      Defined[R] = true;
> +      if (isRegSeq(R))
> +        getSubRegs(R, Defined);
> +    }
> +  }
> +
> +  return Changed;
> +}
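
The bookkeeping in correctUndefinedSubregisterReads boils down to a single
forward walk with a "defined so far" set. A minimal sketch of that idea
follows. It is a toy model under stated assumptions: Instr and
findUndefinedReads are hypothetical, every use is checked (the patch only
looks at sub-registers of sequence registers), and instead of inserting a
"TFRI 0" the sketch just reports where one would be needed.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy instruction: the registers it reads and writes.
struct Instr {
  std::vector<unsigned> Uses;
  std::vector<unsigned> Defs;
};

// Walk the block in order.  Defined starts out as the live-in set.
// For each use of a register that is not yet defined, record the pair
// (instruction index, register) and mark the register as defined, which
// mirrors inserting an initialization right before that instruction.
std::vector<std::pair<std::size_t, unsigned> >
findUndefinedReads(const std::vector<Instr> &Block,
                   std::set<unsigned> Defined) {
  std::vector<std::pair<std::size_t, unsigned> > Missing;
  for (std::size_t i = 0; i < Block.size(); ++i) {
    const Instr &MI = Block[i];
    for (std::size_t u = 0; u < MI.Uses.size(); ++u) {
      unsigned R = MI.Uses[u];
      // insert() reports whether R was absent, i.e. undefined so far.
      if (Defined.insert(R).second)
        Missing.push_back(std::make_pair(i, R));
    }
    for (std::size_t d = 0; d < MI.Defs.size(); ++d)
      Defined.insert(MI.Defs[d]);
  }
  return Missing;
}
```

Marking the register as defined right after reporting it is what keeps a
second read of the same register from triggering another initialization.
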
> +
> +
> +bool HRC::cleanupRewrite(MachineFunction &MF) {
> +  // This pass exists only to address the following situation:
> +  //
> +  // BB#5: derived from LLVM BB %for.body.lr.ph
> +  //     Live Ins: %R16 %R22 %R0 %R28
> +  //     Predecessors according to CFG: BB#4
> +  //         %R2<def> = LDriw <fi#9>, 0; mem:LD4[FixedStack9]
> +  //         STriw_indexed <fi#22>, 0, %R2; mem:ST4[FixedStack22]
> +  //         %R4<def> = MPYI_riu %R16, 44
> +  //         %R5<def> = TFR_FI <fi#5>, 0
> +  //         %R1<def> = LDriw <fi#10>, 0; mem:LD4[FixedStack10]
> +  //         %R3<def> = TFRI_V4 <ga:@foo>
> +  //         %R1<def> = ADDr_MPYir_V4 %R3, 88, %R1
> +  //         %R3<def> = COPY %R4
> +  //         %R20<def> = TFRI -2
> +  //         %R21<def> = TFRI 32767
> +  //   -->   %R7<def> = TFRI 0, %D3<imp-def>
> +  //   -->   STrid <fi#23>, 0, %D3; mem:ST8[FixedStack23]
> +  //         ...
> +  // In case this is not clear: D3 is only "implicitly" defined: R6 is not
> +  // live, but the entire D3 is spilled.
> +
> +  bool Changed = false;
> +  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I) {
> +    MachineBasicBlock &B = *I;
> +    typedef MachineBasicBlock::instr_iterator instr_iterator;
> +    for (instr_iterator J = B.instr_begin(), F = B.instr_end(); J != F; ++J)
> +      Changed |= removeExtraImplicitRefs(&*J);
> +  }
> +
> +  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I)
> +    Changed |= correctUndefinedSubregisterReads(*I);
> +
> +// This is replaced by a more general "correctUndefinedSubregisterReads".
> +//  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I)
> +//    Changed |= correctPartialSpills(*I);
> +
> +  return Changed;
> +}
> +
> +
> +// Compute live ranges and perform register renaming to eliminate anti-
> +// dependencies between instructions that are close to each other in a
> +// basic block.
> +bool HRC::processDependencies(MachineFunction &MF) {
> +  bool Changed = false;
> +
> +  // First off, replace all register sequences in the live-in sets.
> +  // This has to be done on all blocks first, because the live-on-exit
> +  // set for a given block will depend on the live-ins of its successors.
> +  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I) {
> +    MachineBasicBlock &B = *I;
> +    Changed |= expandLiveIns(B);
> +  }
> +
> +  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I) {
> +    MachineBasicBlock &B = *I;
> +    Changed |= processImplicitsEarly(B);
> +
> +    InstrIndexMap IndexMap(B);
> +    RegToRangeMap LiveMap;
> +
> +    computeLiveRangesNonSeq(B, IndexMap, LiveMap);
> +    completeLiveRanges(IndexMap, LiveMap);
> +
> +    DEBUG({
> +      dbgs() << "Index Map for BB#" << B.getNumber() << "\n" << IndexMap;
> +      dbgs() << "Live Map:\n";
> +      dump_map(dbgs(), LiveMap, TRI);
> +    });
> +
> +    Changed |= breakAntiDep(B, IndexMap, LiveMap);
> +    Changed |= addLiveInSeq(B);
> +    Changed |= markDeadKill(B, IndexMap, LiveMap);
> +  }
> +
> +  return Changed;
> +}
> +
> +
> +bool HRC::finalCleanup(MachineFunction &MF) {
> +  bool Changed = false;
> +  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I)
> +    Changed |= processBlockLate(*I);
> +  return Changed;
> +}
> +
> +
> +bool HRC::runOnMachineFunction(MachineFunction &MF) {
> +  DEBUG({if (!RunHRC) dbgs() << "HRC not enabled!\n";});
> +  DEBUG({if (!IgnoreUndef) dbgs() << "HRC needs IgnoreUndef to run!\n";});
> +
> +  if (!RunHRC || !IgnoreUndef)
> +    return false;
> +
> +  const Function *F = MF.getFunction();
> +  DEBUG(MF.print(dbgs() << "Before " << getPassName() << "\n", 0));
> +
> +  MRI = &MF.getRegInfo();
> +  Reserved = TRI.getReservedRegs(MF);
> +  // The registers SAn and LCn are not tracked accurately in the live-in
> +  // sets.  Because of that, we can mark them as "dead" when in fact they
> +  // are not.  This hurts the packetizer, which refuses to packetize any
> +  // instruction with a dead def (in this case LOOP0_i, for example).
> +  // Because of that, consider SAn and LCn to be reserved.
> +  Reserved[Hexagon::SA0] = Reserved[Hexagon::SA1] = true;
> +  Reserved[Hexagon::LC0] = Reserved[Hexagon::LC1] = true;
> +
> +
> +  // Set the registers for the return value from this function.
> +  Return.resize(NumRegs);
> +  Return.reset();
> +  Type *RTy = F->getReturnType();
> +  unsigned Bits = 0;
> +  if (RTy->isPrimitiveType()) {
> +    Bits = RTy->getPrimitiveSizeInBits();
> +  } else {
> +    if (IntegerType *IntTy = dyn_cast<IntegerType>(RTy))
> +      Bits = IntTy->getBitWidth();
> +    else if (F->getParent()->getPointerSize() == Module::Pointer32)
> +      Bits = 32;  // If not int, then assume 32- or 64-bit pointer.
> +    else
> +      Bits = 64;
> +  }
> +  if (Bits > 0) {
> +    Return[Hexagon::R0] = true;
> +    if (Bits > 32)
> +      Return[Hexagon::R1] = true;
> +  }
> +
> +
> +  bool Changed = false;
> +
> +  switch (Stage) {
> +    case HRC::PostRewrite:
> +      Changed = cleanupRewrite(MF);
> +      break;
> +    case HRC::PreSchedule:
> +      Changed = processDependencies(MF);
> +      break;
> +    case HRC::Finalize:
> +      Changed = finalCleanup(MF);
> +      break;
> +  }
> +
> +  DEBUG(MF.print(dbgs() << "After " << getPassName() << "\n", 0));
> +  return Changed;
> +}
> +
> +
> +// ------------------------------------------------------------------
> +// Pass registration/initialization/creation.
> +
> +static void initializeHRC_PostRewritePassOnce(PassRegistry &Registry) {
> +  const char &ID = HexagonRegisterCleanup_PostRewrite::ID;
> +  const char *Name = PassNames[0];
> +  PassInfo *PI = new PassInfo(Name, "hexagon-rc1", &ID, 0, false, false);
> +  Registry.registerPass(*PI, true);
> +}
> +
> +static void initializeHRC_PreSchedulePassOnce(PassRegistry &Registry) {
> +  const char &ID = HexagonRegisterCleanup_PreSchedule::ID;
> +  const char *Name = PassNames[1];
> +  PassInfo *PI = new PassInfo(Name, "hexagon-rc2", &ID, 0, false, false);
> +  Registry.registerPass(*PI, true);
> +}
> +
> +static void initializeHRC_FinalizePassOnce(PassRegistry &Registry) {
> +  const char &ID = HexagonRegisterCleanup_Finalize::ID;
> +  const char *Name = PassNames[2];
> +  PassInfo *PI = new PassInfo(Name, "hexagon-rc3", &ID, 0, false, false);
> +  Registry.registerPass(*PI, true);
> +}
> +
> +
> +namespace llvm {
> +  void initializeHexagonRegisterCleanup_PostRewritePass(
> +        PassRegistry &Registry) {
> +    CALL_ONCE_INITIALIZATION(initializeHRC_PostRewritePassOnce)
> +  }
> +
> +  void initializeHexagonRegisterCleanup_PreSchedulePass(
> +        PassRegistry &Registry) {
> +    CALL_ONCE_INITIALIZATION(initializeHRC_PreSchedulePassOnce)
> +  }
> +
> +  void initializeHexagonRegisterCleanup_FinalizePass(
> +        PassRegistry &Registry) {
> +    CALL_ONCE_INITIALIZATION(initializeHRC_FinalizePassOnce)
> +  }
> +
> +
> +  FunctionPass *createHexagonRegisterCleanup_PostRewrite(
> +        const HexagonTargetMachine &TM) {
> +    return new HexagonRegisterCleanup_PostRewrite(TM);
> +  }
> +
> +  FunctionPass *createHexagonRegisterCleanup_PreSchedule(
> +        const HexagonTargetMachine &TM) {
> +    return new HexagonRegisterCleanup_PreSchedule(TM);
> +  }
> +
> +  FunctionPass *createHexagonRegisterCleanup_Finalize(
> +        const HexagonTargetMachine &TM) {
> +    return new HexagonRegisterCleanup_Finalize(TM);
> +  }
> +}
> diff --git a/lib/Target/Hexagon/HexagonTargetMachine.cpp b/lib/Target/Hexagon/HexagonTargetMachine.cpp
> index 2d5529b..e8fadd9 100644
> --- a/lib/Target/Hexagon/HexagonTargetMachine.cpp
> +++ b/lib/Target/Hexagon/HexagonTargetMachine.cpp
> @@ -26,6 +26,9 @@
>  
>  using namespace llvm;
>  
> +extern cl::opt<bool> IgnoreUndef;
> +extern cl::opt<bool> RunHRC;
> +
>  static cl:: opt<bool> DisableHardwareLoops("disable-hexagon-hwloops",
>        cl::Hidden, cl::desc("Disable Hardware Loops for Hexagon target"));
>  
> @@ -78,8 +81,14 @@ HexagonTargetMachine::HexagonTargetMachine(const Target &T, StringRef TT,
>      TSInfo(*this),
>      FrameLowering(Subtarget),
>      InstrItins(&Subtarget.getInstrItineraryData()) {
> -    setMCUseCFI(false);
> -    initAsmInfo();
> +  setMCUseCFI(false);
> +  initAsmInfo();
> +
> +  // Hexagon Register Cleanup needs IgnoreUndef to be set.  Make sure that
> +  // IgnoreUndef is set whenever we run HRC, and (more importantly) that it's
> +  // off when HRC is not going to be executed.
> +  if (!IgnoreUndef.getPosition())
> +    IgnoreUndef.setValue(RunHRC, false);
>  }
>  
>  // addPassesForOptimizations - Allow the backend (target) to add Target
> @@ -107,6 +116,12 @@ public:
>        enablePass(&MachineSchedulerID);
>        MachineSchedRegistry::setDefault(createVLIWMachineSched);
>      }
> +    if (getOptLevel() != CodeGenOpt::None) {
> +      // Add insertion information to the pass config _before_ actual passes
> +      // are added to it.
> +      Pass *HRC_PostRewrite = createHexagonRegisterCleanup_PostRewrite(*TM);
> +      insertPass(&VirtRegRewriterID, IdentifyingPassPtr(HRC_PostRewrite));
> +    }
>    }
>  
>    HexagonTargetMachine &getHexagonTargetMachine() const {
> @@ -151,9 +166,11 @@ bool HexagonPassConfig::addPreRegAlloc() {
>  
>  bool HexagonPassConfig::addPostRegAlloc() {
>    const HexagonTargetMachine &TM = getHexagonTargetMachine();
> -  if (getOptLevel() != CodeGenOpt::None)
> +  if (getOptLevel() != CodeGenOpt::None) {
> +    addPass(createHexagonRegisterCleanup_PreSchedule(TM));
>      if (!DisableHexagonCFGOpt)
>        addPass(createHexagonCFGOptimizer(TM));
> +  }
>    return false;
>  }
>  
> @@ -191,6 +208,7 @@ bool HexagonPassConfig::addPreEmitPass() {
>    if (!NoOpt) {
>      if (!DisableHardwareLoops)
>        addPass(createHexagonFixupHwLoops());
> +    addPass(createHexagonRegisterCleanup_Finalize(TM));
>      addPass(createHexagonPacketizer());
>    }
>  
> diff --git a/test/CodeGen/Hexagon/hrc-basic.ll b/test/CodeGen/Hexagon/hrc-basic.ll
> new file mode 100644
> index 0000000..6a4f3d8
> --- /dev/null
> +++ b/test/CodeGen/Hexagon/hrc-basic.ll
> @@ -0,0 +1,27 @@
> +; RUN: llc -march=hexagon -mcpu=hexagonv4 -O2 -run-hrc < %s | FileCheck %s
> +target triple = "hexagon"
> +
> +define i32 @foo(i64 %a, i32 %x, i32 %y) #0 {
> +entry:
> +; CHECK: {
> +; CHECK: += mpyi
> +; CHECK-NOT: {
> +; CHECK-NOT: }
> +; CHECK: += mpyi
> +  %u.sroa.0.0.extract.trunc = trunc i64 %a to i32
> +  %u.sroa.1.4.extract.shift = lshr i64 %a, 32
> +  %u.sroa.1.4.extract.trunc = trunc i64 %u.sroa.1.4.extract.shift to i32
> +  %mul = mul nsw i32 %u.sroa.0.0.extract.trunc, %x
> +  %add = add nsw i32 %mul, %u.sroa.0.0.extract.trunc
> +  %mul6 = mul nsw i32 %u.sroa.1.4.extract.trunc, %y
> +  %add9 = add nsw i32 %mul6, %u.sroa.1.4.extract.trunc
> +  %c = icmp eq i32 %add, %add9
> +  %res = select i1 %c, i32 1, i32 -1
> +  ret i32 %res
> +}
> +
> +attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
> +
> +!0 = metadata !{metadata !"int", metadata !1}
> +!1 = metadata !{metadata !"omnipotent char", metadata !2}
> +!2 = metadata !{metadata !"Simple C/C++ TBAA"}
> -- 
> 1.7.6.4
> 

> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits



