[llvm] r180193 - MI Sched: eliminate local vreg copies.

Hal Finkel hfinkel at anl.gov
Wed Apr 24 11:04:59 PDT 2013


----- Original Message -----
> From: "Andrew Trick" <atrick at apple.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Wednesday, April 24, 2013 10:54:43 AM
> Subject: [llvm] r180193 - MI Sched: eliminate local vreg copies.
> 
> Author: atrick
> Date: Wed Apr 24 10:54:43 2013
> New Revision: 180193
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=180193&view=rev
> Log:
> MI Sched: eliminate local vreg copies.
> 
> For now, we just reschedule instructions that use the copied vregs and
> let regalloc eliminate the copy. I would really like to eliminate the
> copies on-the-fly during scheduling, but we need a complete
> implementation of repairIntervalsInRange() first.
> 
> The general strategy is for the register coalescer to eliminate as
> many global copies as possible and shrink live ranges to be
> extended-basic-block local. The coalescer should not have to worry
> about resolving local copies (e.g. it shouldn't attempt to reorder
> instructions). The scheduler is a much better place to deal with local
> interference. The coalescer side of this equation needs work.
> 
> Added:
>     llvm/trunk/test/CodeGen/ARM/misched-copy-arm.ll
> Modified:
>     llvm/trunk/include/llvm/CodeGen/LiveInterval.h
>     llvm/trunk/include/llvm/CodeGen/MachineScheduler.h
>     llvm/trunk/include/llvm/CodeGen/ScheduleDAG.h
>     llvm/trunk/include/llvm/CodeGen/ScheduleDAGInstrs.h
>     llvm/trunk/lib/CodeGen/MachineScheduler.cpp
> 
> Modified: llvm/trunk/include/llvm/CodeGen/LiveInterval.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/LiveInterval.h?rev=180193&r1=180192&r2=180193&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/CodeGen/LiveInterval.h (original)
> +++ llvm/trunk/include/llvm/CodeGen/LiveInterval.h Wed Apr 24 10:54:43 2013
> @@ -399,6 +399,15 @@ namespace llvm {
>        return r != end() && r->containsRange(Start, End);
>      }
>  
> +    /// True iff this live range is a single segment that lies between the
> +    /// specified boundaries, exclusively. Vregs live across a backedge are not
> +    /// considered local. The boundaries are expected to lie within an extended
> +    /// basic block, so vregs that are not live out should contain no holes.
> +    bool isLocal(SlotIndex Start, SlotIndex End) const {
> +      return beginIndex() > Start.getBaseIndex() &&
> +        endIndex() < End.getBoundaryIndex();
> +    }
> +
>      /// removeRange - Remove the specified range from this interval. Note that
>      /// the range must be a single LiveRange in its entirety.
>      void removeRange(SlotIndex Start, SlotIndex End,
> 
> Modified: llvm/trunk/include/llvm/CodeGen/MachineScheduler.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/MachineScheduler.h?rev=180193&r1=180192&r2=180193&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/CodeGen/MachineScheduler.h (original)
> +++ llvm/trunk/include/llvm/CodeGen/MachineScheduler.h Wed Apr 24 10:54:43 2013
> @@ -274,6 +274,10 @@ public:
>      Mutations.push_back(Mutation);
>    }
>  
> +  /// \brief True if an edge can be added from PredSU to SuccSU without creating
> +  /// a cycle.
> +  bool canAddEdge(SUnit *SuccSU, SUnit *PredSU);
> +
>    /// \brief Add a DAG edge to the given SU with the given predecessor
>    /// dependence data.
>    ///
> 
> Modified: llvm/trunk/include/llvm/CodeGen/ScheduleDAG.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/ScheduleDAG.h?rev=180193&r1=180192&r2=180193&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/CodeGen/ScheduleDAG.h (original)
> +++ llvm/trunk/include/llvm/CodeGen/ScheduleDAG.h Wed Apr 24 10:54:43 2013
> @@ -727,9 +727,8 @@ namespace llvm {
>      /// IsReachable - Checks if SU is reachable from TargetSU.
>      bool IsReachable(const SUnit *SU, const SUnit *TargetSU);
>  
> -    /// WillCreateCycle - Returns true if adding an edge from SU to TargetSU
> -    /// will create a cycle.
> -    bool WillCreateCycle(SUnit *SU, SUnit *TargetSU);
> +    /// WillCreateCycle - Return true if addPred(TargetSU, SU) creates a cycle.
> +    bool WillCreateCycle(SUnit *TargetSU, SUnit *SU);
>  
>      /// AddPred - Updates the topological ordering to accommodate an edge
>      /// to be added from SUnit X to SUnit Y.
> 
> Modified: llvm/trunk/include/llvm/CodeGen/ScheduleDAGInstrs.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/ScheduleDAGInstrs.h?rev=180193&r1=180192&r2=180193&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/CodeGen/ScheduleDAGInstrs.h (original)
> +++ llvm/trunk/include/llvm/CodeGen/ScheduleDAGInstrs.h Wed Apr 24 10:54:43 2013
> @@ -150,6 +150,9 @@ namespace llvm {
>  
>      virtual ~ScheduleDAGInstrs() {}
>  
> +    /// \brief Expose LiveIntervals for use in DAG mutators and such.
> +    LiveIntervals *getLIS() const { return LIS; }
> +
>      /// \brief Get the machine model for instruction scheduling.
>      const TargetSchedModel *getSchedModel() const { return &SchedModel; }
>  
> 
> Modified: llvm/trunk/lib/CodeGen/MachineScheduler.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineScheduler.cpp?rev=180193&r1=180192&r2=180193&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/MachineScheduler.cpp (original)
> +++ llvm/trunk/lib/CodeGen/MachineScheduler.cpp Wed Apr 24 10:54:43 2013
> @@ -51,7 +51,11 @@ static cl::opt<unsigned> MISchedCutoff("
>  static bool ViewMISchedDAGs = false;
>  #endif // NDEBUG
>  
> -// Experimental heuristics
> +// FIXME: remove this flag after initial testing. It should always be a good
> +// thing.
> +static cl::opt<bool> EnableCopyConstrain("misched-vcopy", cl::Hidden,
> +    cl::desc("Constrain vreg copies."), cl::init(true));
> +
>  static cl::opt<bool> EnableLoadCluster("misched-cluster", cl::Hidden,
>    cl::desc("Enable load clustering."), cl::init(true));
>  
> @@ -323,6 +327,10 @@ ScheduleDAGMI::~ScheduleDAGMI() {
>    delete SchedImpl;
>  }
>  
> +bool ScheduleDAGMI::canAddEdge(SUnit *SuccSU, SUnit *PredSU) {
> +  return SuccSU == &ExitSU || !Topo.IsReachable(PredSU, SuccSU);
> +}
> +
>  bool ScheduleDAGMI::addEdge(SUnit *SuccSU, const SDep &PredDep) {
>    if (SuccSU != &ExitSU) {
>      // Do not use WillCreateCycle, it assumes SD scheduling.
> @@ -915,6 +923,180 @@ void MacroFusion::apply(ScheduleDAGMI *D
>  }
>  
>  //===----------------------------------------------------------------------===//
> +// CopyConstrain - DAG post-processing to encourage copy elimination.
> +//===----------------------------------------------------------------------===//
> +
> +namespace {
> +/// \brief Post-process the DAG to create weak edges from all uses of a copy to
> +/// the one use that defines the copy's source vreg, most likely an induction
> +/// variable increment.
> +class CopyConstrain : public ScheduleDAGMutation {
> +  // Transient state.
> +  SlotIndex RegionBeginIdx;
> +  SlotIndex RegionEndIdx;
> +public:
> +  CopyConstrain(const TargetInstrInfo *, const TargetRegisterInfo *) {}
> +
> +  virtual void apply(ScheduleDAGMI *DAG);
> +
> +protected:
> +  void constrainLocalCopy(SUnit *CopySU, ScheduleDAGMI *DAG);
> +};
> +} // anonymous
> +
> +/// constrainLocalCopy handles two possibilities:
> +/// 1) Local src:
> +/// I0:     = dst
> +/// I1: src = ...
> +/// I2:     = dst
> +/// I3: dst = src (copy)
> +/// (create pred->succ edges I0->I1, I2->I1)
> +///
> +/// 2) Local copy:
> +/// I0: dst = src (copy)
> +/// I1:     = dst
> +/// I2: src = ...
> +/// I3:     = dst
> +/// (create pred->succ edges I1->I2, I3->I2)
> +///
> +/// Although the MachineScheduler is currently constrained to single blocks,
> +/// this algorithm should handle extended blocks. An EBB is a set of
> +/// contiguously numbered blocks such that the previous block in the EBB is
> +/// always the single predecessor.
> +void CopyConstrain::constrainLocalCopy(SUnit *CopySU, ScheduleDAGMI *DAG) {
> +  LiveIntervals *LIS = DAG->getLIS();
> +  MachineInstr *Copy = CopySU->getInstr();
> +
> +  // Check for pure vreg copies.
> +  unsigned SrcReg = Copy->getOperand(1).getReg();
> +  if (!TargetRegisterInfo::isVirtualRegister(SrcReg))
> +    return;
> +
> +  unsigned DstReg = Copy->getOperand(0).getReg();
> +  if (!TargetRegisterInfo::isVirtualRegister(DstReg))
> +    return;
> +
> +  // Check if either the dest or source is local. If it's live across a back
> +  // edge, it's not local. Note that if both vregs are live across the back
> +  // edge, we cannot successfully constrain the copy without cyclic scheduling.
> +  unsigned LocalReg = DstReg;
> +  unsigned GlobalReg = SrcReg;
> +  LiveInterval *LocalLI = &LIS->getInterval(LocalReg);
> +  if (!LocalLI->isLocal(RegionBeginIdx, RegionEndIdx)) {
> +    LocalReg = SrcReg;
> +    GlobalReg = DstReg;
> +    LocalLI = &LIS->getInterval(LocalReg);
> +    if (!LocalLI->isLocal(RegionBeginIdx, RegionEndIdx))
> +      return;
> +  }
> +  LiveInterval *GlobalLI = &LIS->getInterval(GlobalReg);
> +
> +  // Find the global segment after the start of the local LI.
> +  LiveInterval::iterator GlobalSegment = GlobalLI->find(LocalLI->beginIndex());
> +  // If GlobalLI does not overlap LocalLI->start, then a copy directly feeds a
> +  // local live range. We could create edges from other global uses to the local
> +  // start, but the coalescer should have already eliminated these cases, so
> +  // don't bother dealing with it.
> +  if (GlobalSegment == GlobalLI->end())
> +    return;
> +
> +  // If GlobalSegment is killed at the LocalLI->start, the call to find()
> +  // returned the next global segment. But if GlobalSegment overlaps with
> +  // LocalLI->start, then advance to the next segment. If a hole in GlobalLI
> +  // exists in LocalLI's vicinity, GlobalSegment will be the end of the hole.
> +  if (GlobalSegment->contains(LocalLI->beginIndex()))
> +    ++GlobalSegment;
> +
> +  if (GlobalSegment == GlobalLI->end())
> +    return;
> +
> +  // Check if GlobalLI contains a hole in the vicinity of LocalLI.
> +  if (GlobalSegment != GlobalLI->begin()) {
> +    // Two address defs have no hole.
> +    if (SlotIndex::isSameInstr(llvm::prior(GlobalSegment)->end,
> +                               GlobalSegment->start)) {
> +      return;
> +    }
> +    // If GlobalLI has a prior segment, it must be live into the EBB. Otherwise
> +    // it would be a disconnected component in the live range.
> +    assert(llvm::prior(GlobalSegment)->start < LocalLI->beginIndex() &&
> +           "Disconnected LRG within the scheduling region.");
> +  }
> +  MachineInstr *GlobalDef = LIS->getInstructionFromIndex(GlobalSegment->start);
> +  if (!GlobalDef)
> +    return;
> +
> +  SUnit *GlobalSU = DAG->getSUnit(GlobalDef);
> +  if (!GlobalSU)
> +    return;
> +
> +  // GlobalDef is the bottom of the GlobalLI hole. Open the hole by
> +  // constraining the uses of the last local def to precede GlobalDef.
> +  SmallVector<SUnit*,8> LocalUses;
> +  const VNInfo *LastLocalVN = LocalLI->getVNInfoBefore(LocalLI->endIndex());
> +  MachineInstr *LastLocalDef = LIS->getInstructionFromIndex(LastLocalVN->def);
> +  SUnit *LastLocalSU = DAG->getSUnit(LastLocalDef);
> +  for (SUnit::const_succ_iterator
> +         I = LastLocalSU->Succs.begin(), E = LastLocalSU->Succs.end();
> +       I != E; ++I) {
> +    if (I->getKind() != SDep::Data || I->getReg() != LocalReg)
> +      continue;
> +    if (I->getSUnit() == GlobalSU)
> +      continue;
> +    if (!DAG->canAddEdge(GlobalSU, I->getSUnit()))
> +      return;
> +    LocalUses.push_back(I->getSUnit());
> +  }
> +  // Open the top of the GlobalLI hole by constraining any earlier global uses
> +  // to precede the start of LocalLI.
> +  SmallVector<SUnit*,8> GlobalUses;
> +  MachineInstr *FirstLocalDef =
> +    LIS->getInstructionFromIndex(LocalLI->beginIndex());
> +  SUnit *FirstLocalSU = DAG->getSUnit(FirstLocalDef);
> +  for (SUnit::const_pred_iterator
> +         I = GlobalSU->Preds.begin(), E = GlobalSU->Preds.end(); I != E; ++I) {
> +    if (I->getKind() != SDep::Anti || I->getReg() != GlobalReg)
> +      continue;
> +    if (I->getSUnit() == FirstLocalSU)
> +      continue;
> +    if (!DAG->canAddEdge(FirstLocalSU, I->getSUnit()))
> +      return;
> +    GlobalUses.push_back(I->getSUnit());
> +  }
> +  DEBUG(dbgs() << "Constraining copy SU(" << CopySU->NodeNum <<
> ")\n");
> +  // Add the weak edges.
> +  for (SmallVectorImpl<SUnit*>::const_iterator
> +         I = LocalUses.begin(), E = LocalUses.end(); I != E; ++I) {
> +    DEBUG(dbgs() << "  Local use SU(" << (*I)->NodeNum << ") -> SU("
> +          << GlobalSU->NodeNum << ")\n");
> +    DAG->addEdge(GlobalSU, SDep(*I, SDep::Weak));
> +  }
> +  for (SmallVectorImpl<SUnit*>::const_iterator
> +         I = GlobalUses.begin(), E = GlobalUses.end(); I != E; ++I) {
> +    DEBUG(dbgs() << "  Global use SU(" << (*I)->NodeNum << ") -> SU("
> +          << FirstLocalSU->NodeNum << ")\n");
> +    DAG->addEdge(FirstLocalSU, SDep(*I, SDep::Weak));
> +  }
> +}
> +
> +/// \brief Callback from DAG postProcessing to create weak edges to encourage
> +/// copy elimination.
> +void CopyConstrain::apply(ScheduleDAGMI *DAG) {
> +  RegionBeginIdx = DAG->getLIS()->getInstructionIndex(
> +    &*nextIfDebug(DAG->begin(), DAG->end()));
> +  RegionEndIdx = DAG->getLIS()->getInstructionIndex(
> +    &*priorNonDebug(DAG->end(), DAG->begin()));
> +
> +  for (unsigned Idx = 0, End = DAG->SUnits.size(); Idx != End; ++Idx) {
> +    SUnit *SU = &DAG->SUnits[Idx];
> +    if (!SU->getInstr()->isCopy())
> +      continue;
> +
> +    constrainLocalCopy(SU, DAG);
> +  }
> +}
> +
> +//===----------------------------------------------------------------------===//
>  // ConvergingScheduler - Implementation of the standard MachineSchedStrategy.
>  //===----------------------------------------------------------------------===//
>  
> @@ -926,7 +1108,7 @@ public:
>    /// Represent the type of SchedCandidate found within a single queue.
>    /// pickNodeBidirectional depends on these listed by decreasing priority.
>    enum CandReason {
> -    NoCand, PhysRegCopy, SingleExcess, SingleCritical, Cluster,
> +    NoCand, PhysRegCopy, SingleExcess, SingleCritical, Cluster, Weak,
>      ResourceReduce, ResourceDemand, BotHeightReduce, BotPathReduce,
>      TopDepthReduce, TopPathReduce, SingleMax, MultiPressure, NextDefUse,
>      NodeOrder};
> @@ -1802,13 +1984,11 @@ void ConvergingScheduler::tryCandidate(S
>    if (tryGreater(TryCand.SU == NextClusterSU, Cand.SU == NextClusterSU,
>                   TryCand, Cand, Cluster))
>      return;
> -  // Currently, weak edges are for clustering, so we hard-code that reason.
> -  // However, deferring the current TryCand will not change Cand's reason.
> -  CandReason OrigReason = Cand.Reason;
> +
> +  // Weak edges are for clustering and other constraints.
>    if (tryLess(getWeakLeft(TryCand.SU, Zone.isTop()),
>                getWeakLeft(Cand.SU, Zone.isTop()),
> -              TryCand, Cand, Cluster)) {
> -    Cand.Reason = OrigReason;
> +              TryCand, Cand, Weak)) {
>      return;
>    }
>    // Avoid critical resource consumption and balance the schedule.
> @@ -1908,6 +2088,7 @@ const char *ConvergingScheduler::getReas
>    case SingleExcess:   return "REG-EXCESS";
>    case SingleCritical: return "REG-CRIT  ";
>    case Cluster:        return "CLUSTER   ";
> +  case Weak:           return "WEAK      ";
>    case SingleMax:      return "REG-MAX   ";
>    case MultiPressure:  return "REG-MULTI ";
>    case ResourceReduce: return "RES-REDUCE";
> @@ -2177,6 +2358,12 @@ static ScheduleDAGInstrs *createConvergi
>           "-misched-topdown incompatible with -misched-bottomup");
>    ScheduleDAGMI *DAG = new ScheduleDAGMI(C, new ConvergingScheduler());
>    // Register DAG post-processors.
> +  //
> +  // FIXME: extend the mutation API to allow earlier mutations to instantiate
> +  // data and pass it to later mutations. Have a single mutation that gathers
> +  // the interesting nodes in one pass.
> +  if (EnableCopyConstrain)
> +    DAG->addMutation(new CopyConstrain(DAG->TII, DAG->TRI));
>    if (EnableLoadCluster)
>      DAG->addMutation(new LoadClusterMutation(DAG->TII, DAG->TRI));
>    if (EnableMacroFusion)
> 
> Added: llvm/trunk/test/CodeGen/ARM/misched-copy-arm.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/misched-copy-arm.ll?rev=180193&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/misched-copy-arm.ll (added)
> +++ llvm/trunk/test/CodeGen/ARM/misched-copy-arm.ll Wed Apr 24 10:54:43 2013
> @@ -0,0 +1,30 @@
> +; REQUIRES: asserts
> +; RUN: llc < %s -march=thumb -mcpu=swift -pre-RA-sched=source -enable-misched -verify-misched -debug-only=misched -o - 2>&1 > /dev/null | FileCheck %s
> +;
> +; Loop counter copies should be eliminated.
> +; There is also a MUL here, but we don't care where it is scheduled.
> +; CHECK: postinc
> +; CHECK: *** Final schedule for BB#2 ***
> +; CHECK: t2LDRs
> +; CHECK: t2ADDrr
> +; CHECK: t2CMPrr
> +; CHECK: COPY

Why can't you disable post-ra scheduling and check the actual output assembly?

 -Hal

> +define i32 @postinc(i32 %a, i32* nocapture %d, i32 %s) nounwind {
> +entry:
> +  %cmp4 = icmp eq i32 %a, 0
> +  br i1 %cmp4, label %for.end, label %for.body
> +
> +for.body:                                         ; preds = %entry, %for.body
> +  %indvars.iv = phi i32 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
> +  %s.05 = phi i32 [ %mul, %for.body ], [ 0, %entry ]
> +  %indvars.iv.next = add i32 %indvars.iv, %s
> +  %arrayidx = getelementptr inbounds i32* %d, i32 %indvars.iv
> +  %0 = load i32* %arrayidx, align 4
> +  %mul = mul nsw i32 %0, %s.05
> +  %exitcond = icmp eq i32 %indvars.iv.next, %a
> +  br i1 %exitcond, label %for.end, label %for.body
> +
> +for.end:                                          ; preds = %for.body, %entry
> +  %s.0.lcssa = phi i32 [ 0, %entry ], [ %mul, %for.body ]
> +  ret i32 %s.0.lcssa
> +}
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 


