[llvm] r176498 - R600: initial scheduler code

Andrew Trick atrick at apple.com
Tue Mar 5 15:59:31 PST 2013


On Mar 5, 2013, at 3:12 PM, Vincent Lejeune <vljn at ovi.com> wrote:

> Hi Andy,
> 
> Thanks for your review.
> I'd like to add a few features to this scheduler before writing unit tests.
> Currently ScheduleDAGInstrs enforces dependencies between instructions writing/reading a reg even when they do not access the same subreg.
> Our arch has no constraints on accessing different subregs of a 128-bit reg, so the schedule pass has to remove such deps.
> In a previous patch this was done in the initialize function, but I think it would be better if this were directly supported by ScheduleDAGInstrs via a target hook.

Yes. I was deferring this partially because of the problems you're now seeing in handleMove. I was waiting until LiveIntervals understood subregisters, which would result in a cleaner, more robust implementation.
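To make the rule under discussion concrete, here is a toy model of the relaxed dependence test (illustrative only, not the LLVM API; `Access`, `needsDep`, and the lane-mask encoding are invented for this sketch): two accesses to the same virtual register conflict only if their subregister lanes overlap.

```cpp
#include <cassert>

// Toy model: an access touches some lanes (think sub0..sub3 of a
// 128-bit vreg) of a virtual register. Names are illustrative.
struct Access {
  unsigned VReg;     // virtual register id
  unsigned LaneMask; // bit i set => subreg channel i is touched
};

// The DAG builder today adds a dependence whenever the VRegs match;
// the relaxed rule also requires the touched lanes to overlap.
static bool needsDep(const Access &A, const Access &B) {
  return A.VReg == B.VReg && (A.LaneMask & B.LaneMask) != 0;
}
```

Under this rule a write to sub0 and a read of sub1 of the same 128-bit register carry no edge, which is exactly the extra parallelism the R600 strategy wants to expose.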

> I'm working on this atm; currently I'm experimenting in R600MachineRegister.cpp, following the ScheduleDAGInstrs algorithm:
> http://cgit.freedesktop.org/~vlj/llvm/commit/?h=codesize4&id=62a4109bd126efc1f267843402b624b7729c79f7
> This is WIP; I'm evaluating different ideas (using a VReg2SUnit structure that would store several SUnit*, one for each accessed subreg, or
> having a SubVReg2SUnit structure that is similar to the current VReg2SUnit but also stores the subreg).
> 
> Changing ScheduleDAG also means changing the schedule pass behavior, and I don't know how to properly catch this in a test case; I have a few tests, but
> they check the expected result of the complete schedule strategy (i.e. without deps between subregs).

You should do this in postprocessDAG. You can use ScheduleDAGMI::addMutation() without even defining your own ScheduleDAG subclass. Note that findRootsAndBiasEdges is supposed to be called before SchedImpl->initialize, so that edges are biased before potentially running some DAG analysis for some targets.

This approach is a good one for now. It will only affect your target. It's just more expensive in terms of compile time than doing it during initial DAG building. Hopefully that isn't a serious problem. Eventually the shared DAG builder will support this in a clean way.
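As a rough illustration of the pattern being recommended (a toy stand-in, not ScheduleDAGMI itself; `ToyDAG`, `ToyScheduler`, and the edge representation are invented): the scheduler keeps a list of mutations that run over the fully built DAG, so a target can drop edges it knows are false without subclassing the DAG builder.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy DAG: nodes are implicit; edges are (src, dst) pairs.
struct ToyDAG {
  std::vector<std::pair<int, int>> Edges;
};

// Mimics the addMutation() pattern: registered mutations are applied
// after the DAG is built, in registration order (postprocessDAG).
class ToyScheduler {
  std::vector<std::function<void(ToyDAG &)>> Mutations;
public:
  void addMutation(std::function<void(ToyDAG &)> M) {
    Mutations.push_back(std::move(M));
  }
  void postprocessDAG(ToyDAG &DAG) {
    for (auto &M : Mutations)
      M(DAG);
  }
};
```

A target would register a mutation that erases the subreg false deps; only that target pays the extra compile-time cost of the post-pass.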

-Andy

> ----- Mail original -----
>> De : Andrew Trick <atrick at apple.com>
>> À : Vincent Lejeune <vljn at ovi.com>
>> Cc : llvm-commits at cs.uiuc.edu
>> Envoyé le : Mardi 5 mars 2013 20h29
>> Objet : Re: [llvm] r176498 - R600: initial scheduler code
>> 
>> 
>> On Mar 5, 2013, at 10:41 AM, Vincent Lejeune <vljn at ovi.com> wrote:
>> 
>>> Author: vljn
>>> Date: Tue Mar  5 12:41:32 2013
>>> New Revision: 176498
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=176498&view=rev
>>> Log:
>>> R600: initial scheduler code
>>> 
>>> This is a skeleton for a pre-RA MachineInstr scheduler strategy. Currently
>>> it only tries to expose more parallelism for ALU instructions (this also
>>> makes the distribution of GPR channels more uniform and increases the
>>> chances of ALU instructions to be packed together in a single VLIW group).
>>> Also it tries to reduce clause switching by grouping instruction of the
>>> same kind (ALU/FETCH/CF) together.
>>> 
>>> Vincent Lejeune:
>>> - Support for VLIW4 Slot assignement
>>> - Recomputation of ScheduleDAG to get more parallelism opportunities
>>> 
>>> Tom Stellard:
>>> - Fix assertion failure when trying to determine an instruction's slot
>>>    based on its destination register's class
>>> - Fix some compiler warnings
>>> 
>>> Vincent Lejeune: [v2]
>>> - Remove recomputation of ScheduleDAG (will be provided in a later patch)
>>> - Improve estimation of an ALU clause size so that heuristic does not emit cf instructions at the wrong position.
>>> - Make schedule heuristic smarter using SUnit Depth
>>> - Take constant read limitations into account
>>> 
>>> Vincent Lejeune: [v3]
>>> - Fix some uninitialized values in ConstPair
>>> - Add asserts to ensure an ALU slot is always populated
>>> 
>>> Added:
>>>     llvm/trunk/lib/Target/R600/R600MachineScheduler.cpp
>>>     llvm/trunk/lib/Target/R600/R600MachineScheduler.h
>>> Modified:
>>>     llvm/trunk/lib/Target/R600/AMDGPUTargetMachine.cpp
>>> 
>>> Modified: llvm/trunk/lib/Target/R600/AMDGPUTargetMachine.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/R600/AMDGPUTargetMachine.cpp?rev=176498&r1=176497&r2=176498&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/R600/AMDGPUTargetMachine.cpp (original)
>>> +++ llvm/trunk/lib/Target/R600/AMDGPUTargetMachine.cpp Tue Mar  5 12:41:32 2013
>>> @@ -17,6 +17,7 @@
>>> #include "AMDGPU.h"
>>> #include "R600ISelLowering.h"
>>> #include "R600InstrInfo.h"
>>> +#include "R600MachineScheduler.h"
>>> #include "SIISelLowering.h"
>>> #include "SIInstrInfo.h"
>>> #include "llvm/Analysis/Passes.h"
>>> @@ -39,6 +40,14 @@ extern "C" void LLVMInitializeR600Target
>>>    RegisterTargetMachine<AMDGPUTargetMachine> X(TheAMDGPUTarget);
>>> }
>>> 
>>> +static ScheduleDAGInstrs *createR600MachineScheduler(MachineSchedContext *C) {
>>> +  return new ScheduleDAGMI(C, new R600SchedStrategy());
>>> +}
>>> +
>>> +static MachineSchedRegistry
>>> +SchedCustomRegistry("r600", "Run R600's custom scheduler",
>>> +                    createR600MachineScheduler);
>>> +
>>> AMDGPUTargetMachine::AMDGPUTargetMachine(const Target &T, StringRef TT,
>>>      StringRef CPU, StringRef FS,
>>>    TargetOptions Options,
>>> @@ -70,7 +79,13 @@ namespace {
>>> class AMDGPUPassConfig : public TargetPassConfig {
>>> public:
>>>    AMDGPUPassConfig(AMDGPUTargetMachine *TM, PassManagerBase &PM)
>>> -    : TargetPassConfig(TM, PM) {}
>>> +    : TargetPassConfig(TM, PM) {
>>> +    const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
>>> +    if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
>>> +      enablePass(&MachineSchedulerID);
>>> +      MachineSchedRegistry::setDefault(createR600MachineScheduler);
>>> +    }
>>> +  }
>>> 
>>>    AMDGPUTargetMachine &getAMDGPUTargetMachine() const {
>>>      return getTM<AMDGPUTargetMachine>();
>>> 
>>> Added: llvm/trunk/lib/Target/R600/R600MachineScheduler.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/R600/R600MachineScheduler.cpp?rev=176498&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/R600/R600MachineScheduler.cpp (added)
>>> +++ llvm/trunk/lib/Target/R600/R600MachineScheduler.cpp Tue Mar  5 12:41:32 2013
>>> @@ -0,0 +1,487 @@
>>> +//===-- R600MachineScheduler.cpp - R600 Scheduler Interface -*- C++ -*-----===//
>>> +//
>>> +//                     The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +//
>>> +/// \file
>>> +/// \brief R600 Machine Scheduler interface
>>> +// TODO: Scheduling is optimised for VLIW4 arch, modify it to support TRANS slot
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#define DEBUG_TYPE "misched"
>>> +
>>> +#include "R600MachineScheduler.h"
>>> +#include "llvm/CodeGen/MachineRegisterInfo.h"
>>> +#include "llvm/CodeGen/LiveIntervalAnalysis.h"
>>> +#include "llvm/Pass.h"
>>> +#include "llvm/PassManager.h"
>>> +#include <set>
>>> +#include <iostream>
>>> +using namespace llvm;
>>> +
>>> +void R600SchedStrategy::initialize(ScheduleDAGMI *dag) {
>>> +
>>> +  DAG = dag;
>>> +  TII = static_cast<const R600InstrInfo*>(DAG->TII);
>>> +  TRI = static_cast<const R600RegisterInfo*>(DAG->TRI);
>>> +  MRI = &DAG->MRI;
>>> +  Available[IDAlu]->clear();
>>> +  Available[IDFetch]->clear();
>>> +  Available[IDOther]->clear();
>>> +  CurInstKind = IDOther;
>>> +  CurEmitted = 0;
>>> +  OccupedSlotsMask = 15;
>>> +  memset(InstructionsGroupCandidate, 0, sizeof(InstructionsGroupCandidate));
>>> +  InstKindLimit[IDAlu] = 120; // 120 minus 8 for security
>>> +
>>> +
>>> +  const AMDGPUSubtarget &ST = DAG->TM.getSubtarget<AMDGPUSubtarget>();
>>> +  if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD5XXX) {
>>> +    InstKindLimit[IDFetch] = 7; // 8 minus 1 for security
>>> +  } else {
>>> +    InstKindLimit[IDFetch] = 15; // 16 minus 1 for security
>>> +  }
>>> +}
>>> +
>>> +void R600SchedStrategy::MoveUnits(ReadyQueue *QSrc, ReadyQueue *QDst)
>>> +{
>>> +  if (QSrc->empty())
>>> +    return;
>>> +  for (ReadyQueue::iterator I = QSrc->begin(),
>>> +      E = QSrc->end(); I != E; ++I) {
>>> +    (*I)->NodeQueueId &= ~QSrc->getID();
>>> +    QDst->push(*I);
>>> +  }
>>> +  QSrc->clear();
>>> +}
>>> +
>>> +SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) {
>>> +  SUnit *SU = 0;
>>> +  IsTopNode = true;
>>> +  NextInstKind = IDOther;
>>> +
>>> +  // check if we might want to switch current clause type
>>> +  bool AllowSwitchToAlu = (CurInstKind == IDOther) ||
>>> +      (CurEmitted > InstKindLimit[CurInstKind]) ||
>>> +      (Available[CurInstKind]->empty());
>>> +  bool AllowSwitchFromAlu = (CurEmitted > InstKindLimit[CurInstKind]) &&
>>> +      (!Available[IDFetch]->empty() || !Available[IDOther]->empty());
>>> +
>>> +  if ((AllowSwitchToAlu && CurInstKind != IDAlu) ||
>>> +      (!AllowSwitchFromAlu && CurInstKind == IDAlu)) {
>>> +    // try to pick ALU
>>> +    SU = pickAlu();
>>> +    if (SU) {
>>> +      if (CurEmitted >  InstKindLimit[IDAlu])
>>> +        CurEmitted = 0;
>>> +      NextInstKind = IDAlu;
>>> +    }
>>> +  }
>>> +
>>> +  if (!SU) {
>>> +    // try to pick FETCH
>>> +    SU = pickOther(IDFetch);
>>> +    if (SU)
>>> +      NextInstKind = IDFetch;
>>> +  }
>>> +
>>> +  // try to pick other
>>> +  if (!SU) {
>>> +    SU = pickOther(IDOther);
>>> +    if (SU)
>>> +      NextInstKind = IDOther;
>>> +  }
>>> +
>>> +  DEBUG(
>>> +      if (SU) {
>>> +        dbgs() << "picked node: ";
>>> +        SU->dump(DAG);
>>> +      } else {
>>> +        dbgs() << "NO NODE ";
>>> +        for (int i = 0; i < IDLast; ++i) {
>>> +          Available[i]->dump();
>>> +          Pending[i]->dump();
>>> +        }
>>> +        for (unsigned i = 0; i < DAG->SUnits.size(); i++) {
>>> +          const SUnit &S = DAG->SUnits[i];
>>> +          if (!S.isScheduled)
>>> +            S.dump(DAG);
>>> +        }
>>> +      }
>>> +  );
>>> +
>>> +  return SU;
>>> +}
>>> +
>>> +void R600SchedStrategy::schedNode(SUnit *SU, bool IsTopNode) {
>>> +
>>> +  DEBUG(dbgs() << "scheduled: ");
>>> +  DEBUG(SU->dump(DAG));
>>> +
>>> +  if (NextInstKind != CurInstKind) {
>>> +    DEBUG(dbgs() << "Instruction Type Switch\n");
>>> +    if (NextInstKind != IDAlu)
>>> +      OccupedSlotsMask = 15;
>>> +    CurEmitted = 0;
>>> +    CurInstKind = NextInstKind;
>>> +  }
>>> +
>>> +  if (CurInstKind == IDAlu) {
>>> +    switch (getAluKind(SU)) {
>>> +    case AluT_XYZW:
>>> +      CurEmitted += 4;
>>> +      break;
>>> +    case AluDiscarded:
>>> +      break;
>>> +    default: {
>>> +      ++CurEmitted;
>>> +      for (MachineInstr::mop_iterator It = SU->getInstr()->operands_begin(),
>>> +          E = SU->getInstr()->operands_end(); It != E; ++It) {
>>> +        MachineOperand &MO = *It;
>>> +        if (MO.isReg() && MO.getReg() == AMDGPU::ALU_LITERAL_X)
>>> +          ++CurEmitted;
>>> +      }
>>> +    }
>>> +    }
>>> +  } else {
>>> +    ++CurEmitted;
>>> +  }
>>> +
>>> +
>>> +  DEBUG(dbgs() << CurEmitted << " Instructions Emitted in this clause\n");
>>> +
>>> +  if (CurInstKind != IDFetch) {
>>> +    MoveUnits(Pending[IDFetch], Available[IDFetch]);
>>> +  }
>>> +  MoveUnits(Pending[IDOther], Available[IDOther]);
>>> +}
>>> +
>>> +void R600SchedStrategy::releaseTopNode(SUnit *SU) {
>>> +  int IK = getInstKind(SU);
>>> +
>>> +  DEBUG(dbgs() << IK << " <= ");
>>> +  DEBUG(SU->dump(DAG));
>>> +
>>> +  Pending[IK]->push(SU);
>>> +}
>>> +
>>> +void R600SchedStrategy::releaseBottomNode(SUnit *SU) {
>>> +}
>>> +
>>> +bool R600SchedStrategy::regBelongsToClass(unsigned Reg,
>>> +                                          const TargetRegisterClass *RC) const {
>>> +  if (!TargetRegisterInfo::isVirtualRegister(Reg)) {
>>> +    return RC->contains(Reg);
>>> +  } else {
>>> +    return MRI->getRegClass(Reg) == RC;
>>> +  }
>>> +}
>>> +
>>> +R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {
>>> +  MachineInstr *MI = SU->getInstr();
>>> +
>>> +    switch (MI->getOpcode()) {
>>> +    case AMDGPU::INTERP_PAIR_XY:
>>> +    case AMDGPU::INTERP_PAIR_ZW:
>>> +    case AMDGPU::INTERP_VEC_LOAD:
>>> +      return AluT_XYZW;
>>> +    case AMDGPU::COPY:
>>> +      if (TargetRegisterInfo::isPhysicalRegister(MI->getOperand(1).getReg())) {
>>> +        // %vregX = COPY Tn_X is likely to be discarded in favor of an
>>> +        // assignement of Tn_X to %vregX, don't considers it in scheduling
>>> +        return AluDiscarded;
>>> +      }
>>> +      else if (MI->getOperand(1).isUndef()) {
>>> +        // MI will become a KILL, don't considers it in scheduling
>>> +        return AluDiscarded;
>>> +      }
>>> +    default:
>>> +      break;
>>> +    }
>>> +
>>> +    // Does the instruction take a whole IG ?
>>> +    if(TII->isVector(*MI) ||
>>> +        TII->isCubeOp(MI->getOpcode()) ||
>>> +        TII->isReductionOp(MI->getOpcode()))
>>> +      return AluT_XYZW;
>>> +
>>> +    // Is the result already assigned to a channel ?
>>> +    unsigned DestSubReg = MI->getOperand(0).getSubReg();
>>> +    switch (DestSubReg) {
>>> +    case AMDGPU::sub0:
>>> +      return AluT_X;
>>> +    case AMDGPU::sub1:
>>> +      return AluT_Y;
>>> +    case AMDGPU::sub2:
>>> +      return AluT_Z;
>>> +    case AMDGPU::sub3:
>>> +      return AluT_W;
>>> +    default:
>>> +      break;
>>> +    }
>>> +
>>> +    // Is the result already member of a X/Y/Z/W class ?
>>> +    unsigned DestReg = MI->getOperand(0).getReg();
>>> +    if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_XRegClass) ||
>>> +        regBelongsToClass(DestReg, &AMDGPU::R600_AddrRegClass))
>>> +      return AluT_X;
>>> +    if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_YRegClass))
>>> +      return AluT_Y;
>>> +    if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_ZRegClass))
>>> +      return AluT_Z;
>>> +    if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_WRegClass))
>>> +      return AluT_W;
>>> +    if (regBelongsToClass(DestReg, &AMDGPU::R600_Reg128RegClass))
>>> +      return AluT_XYZW;
>>> +
>>> +    return AluAny;
>>> +
>>> +}
>>> +
>>> +int R600SchedStrategy::getInstKind(SUnit* SU) {
>>> +  int Opcode = SU->getInstr()->getOpcode();
>>> +
>>> +  if (TII->isALUInstr(Opcode)) {
>>> +    return IDAlu;
>>> +  }
>>> +
>>> +  switch (Opcode) {
>>> +  case AMDGPU::COPY:
>>> +  case AMDGPU::CONST_COPY:
>>> +  case AMDGPU::INTERP_PAIR_XY:
>>> +  case AMDGPU::INTERP_PAIR_ZW:
>>> +  case AMDGPU::INTERP_VEC_LOAD:
>>> +  case AMDGPU::DOT4_eg_pseudo:
>>> +  case AMDGPU::DOT4_r600_pseudo:
>>> +    return IDAlu;
>>> +  case AMDGPU::TEX_VTX_CONSTBUF:
>>> +  case AMDGPU::TEX_VTX_TEXBUF:
>>> +  case AMDGPU::TEX_LD:
>>> +  case AMDGPU::TEX_GET_TEXTURE_RESINFO:
>>> +  case AMDGPU::TEX_GET_GRADIENTS_H:
>>> +  case AMDGPU::TEX_GET_GRADIENTS_V:
>>> +  case AMDGPU::TEX_SET_GRADIENTS_H:
>>> +  case AMDGPU::TEX_SET_GRADIENTS_V:
>>> +  case AMDGPU::TEX_SAMPLE:
>>> +  case AMDGPU::TEX_SAMPLE_C:
>>> +  case AMDGPU::TEX_SAMPLE_L:
>>> +  case AMDGPU::TEX_SAMPLE_C_L:
>>> +  case AMDGPU::TEX_SAMPLE_LB:
>>> +  case AMDGPU::TEX_SAMPLE_C_LB:
>>> +  case AMDGPU::TEX_SAMPLE_G:
>>> +  case AMDGPU::TEX_SAMPLE_C_G:
>>> +  case AMDGPU::TXD:
>>> +  case AMDGPU::TXD_SHADOW:
>>> +    return IDFetch;
>>> +  default:
>>> +    DEBUG(
>>> +        dbgs() << "other inst: ";
>>> +        SU->dump(DAG);
>>> +    );
>>> +    return IDOther;
>>> +  }
>>> +}
>>> +
>>> +class ConstPairs {
>>> +private:
>>> +  unsigned XYPair;
>>> +  unsigned ZWPair;
>>> +public:
>>> +  ConstPairs(unsigned ReadConst[3]) : XYPair(0), ZWPair(0) {
>>> +    for (unsigned i = 0; i < 3; i++) {
>>> +      unsigned ReadConstChan = ReadConst[i] & 3;
>>> +      unsigned ReadConstIndex = ReadConst[i] & (~3);
>>> +      if (ReadConstChan < 2) {
>>> +        if (!XYPair) {
>>> +          XYPair = ReadConstIndex;
>>> +        }
>>> +      } else {
>>> +        if (!ZWPair) {
>>> +          ZWPair = ReadConstIndex;
>>> +        }
>>> +      }
>>> +    }
>>> +  }
>>> +
>>> +  bool isCompatibleWith(const ConstPairs& CP) const {
>>> +    return (!XYPair || !CP.XYPair || CP.XYPair == XYPair) &&
>>> +        (!ZWPair || !CP.ZWPair || CP.ZWPair == ZWPair);
>>> +  }
>>> +};
>>> +
>>> +static
>>> +const ConstPairs getPairs(const R600InstrInfo *TII, const MachineInstr& MI) {
>>> +  unsigned ReadConsts[3] = {0, 0, 0};
>>> +  R600Operands::Ops OpTable[3][2] = {
>>> +    {R600Operands::SRC0, R600Operands::SRC0_SEL},
>>> +    {R600Operands::SRC1, R600Operands::SRC1_SEL},
>>> +    {R600Operands::SRC2, R600Operands::SRC2_SEL},
>>> +  };
>>> +
>>> +  if (!TII->isALUInstr(MI.getOpcode()))
>>> +    return ConstPairs(ReadConsts);
>>> +
>>> +  for (unsigned i = 0; i < 3; i++) {
>>> +    int SrcIdx = TII->getOperandIdx(MI.getOpcode(), OpTable[i][0]);
>>> +    if (SrcIdx < 0)
>>> +      break;
>>> +    if (MI.getOperand(SrcIdx).getReg() == AMDGPU::ALU_CONST)
>>> +      ReadConsts[i] =MI.getOperand(
>>> +          TII->getOperandIdx(MI.getOpcode(), OpTable[i][1])).getImm();
>>> +  }
>>> +  return ConstPairs(ReadConsts);
>>> +}
>>> +
>>> +bool
>>> +R600SchedStrategy::isBundleable(const MachineInstr& MI) {
>>> +  const ConstPairs &MIPair = getPairs(TII, MI);
>>> +  for (unsigned i = 0; i < 4; i++) {
>>> +    if (!InstructionsGroupCandidate[i])
>>> +      continue;
>>> +    const ConstPairs &IGPair = getPairs(TII,
>>> +        *InstructionsGroupCandidate[i]->getInstr());
>>> +    if (!IGPair.isCompatibleWith(MIPair))
>>> +      return false;
>>> +  }
>>> +  return true;
>>> +}
>>> +
>>> +SUnit *R600SchedStrategy::PopInst(std::multiset<SUnit *, CompareSUnit> &Q) {
>>> +  if (Q.empty())
>>> +    return NULL;
>>> +  for (std::set<SUnit *, CompareSUnit>::iterator It = Q.begin(), E = Q.end();
>>> +      It != E; ++It) {
>>> +    SUnit *SU = *It;
>>> +    if (isBundleable(*SU->getInstr())) {
>>> +      Q.erase(It);
>>> +      return SU;
>>> +    }
>>> +  }
>>> +  return NULL;
>>> +}
>>> +
>>> +void R600SchedStrategy::LoadAlu() {
>>> +  ReadyQueue *QSrc = Pending[IDAlu];
>>> +  for (ReadyQueue::iterator I = QSrc->begin(),
>>> +        E = QSrc->end(); I != E; ++I) {
>>> +      (*I)->NodeQueueId &= ~QSrc->getID();
>>> +      AluKind AK = getAluKind(*I);
>>> +      AvailableAlus[AK].insert(*I);
>>> +    }
>>> +    QSrc->clear();
>>> +}
>>> +
>>> +void R600SchedStrategy::PrepareNextSlot() {
>>> +  DEBUG(dbgs() << "New Slot\n");
>>> +  assert (OccupedSlotsMask && "Slot wasn't filled");
>>> +  OccupedSlotsMask = 0;
>>> +  memset(InstructionsGroupCandidate, 0, sizeof(InstructionsGroupCandidate));
>>> +  LoadAlu();
>>> +}
>>> +
>>> +void R600SchedStrategy::AssignSlot(MachineInstr* MI, unsigned Slot) {
>>> +  unsigned DestReg = MI->getOperand(0).getReg();
>>> +  // PressureRegister crashes if an operand is def and used in the same inst
>>> +  // and we try to constraint its regclass
>>> +  for (MachineInstr::mop_iterator It = MI->operands_begin(),
>>> +      E = MI->operands_end(); It != E; ++It) {
>>> +    MachineOperand &MO = *It;
>>> +    if (MO.isReg() && !MO.isDef() &&
>>> +        MO.getReg() == MI->getOperand(0).getReg())
>>> +      return;
>>> +  }
>>> +  // Constrains the regclass of DestReg to assign it to Slot
>>> +  switch (Slot) {
>>> +  case 0:
>>> +    MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_XRegClass);
>>> +    break;
>>> +  case 1:
>>> +    MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_YRegClass);
>>> +    break;
>>> +  case 2:
>>> +    MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_ZRegClass);
>>> +    break;
>>> +  case 3:
>>> +    MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_WRegClass);
>>> +    break;
>>> +    break;
>>> +  }
>>> +}
>>> +
>>> +SUnit *R600SchedStrategy::AttemptFillSlot(unsigned Slot) {
>>> +  static const AluKind IndexToID[] = {AluT_X, AluT_Y, AluT_Z, AluT_W};
>>> +  SUnit *SlotedSU = PopInst(AvailableAlus[IndexToID[Slot]]);
>>> +  SUnit *UnslotedSU = PopInst(AvailableAlus[AluAny]);
>>> +  if (!UnslotedSU) {
>>> +    return SlotedSU;
>>> +  } else if (!SlotedSU) {
>>> +    AssignSlot(UnslotedSU->getInstr(), Slot);
>>> +    return UnslotedSU;
>>> +  } else {
>>> +    //Determine which one to pick (the lesser one)
>>> +    if (CompareSUnit()(SlotedSU, UnslotedSU)) {
>>> +      AvailableAlus[AluAny].insert(UnslotedSU);
>>> +      return SlotedSU;
>>> +    } else {
>>> +      AvailableAlus[IndexToID[Slot]].insert(SlotedSU);
>>> +      AssignSlot(UnslotedSU->getInstr(), Slot);
>>> +      return UnslotedSU;
>>> +    }
>>> +  }
>>> +}
>>> +
>>> +bool R600SchedStrategy::isAvailablesAluEmpty() const {
>>> +  return Pending[IDAlu]->empty() && AvailableAlus[AluAny].empty() &&
>>> +      AvailableAlus[AluT_XYZW].empty() && AvailableAlus[AluT_X].empty() &&
>>> +      AvailableAlus[AluT_Y].empty() && AvailableAlus[AluT_Z].empty() &&
>>> +      AvailableAlus[AluT_W].empty() && AvailableAlus[AluDiscarded].empty();
>>> +}
>>> +
>>> +SUnit* R600SchedStrategy::pickAlu() {
>>> +  while (!isAvailablesAluEmpty()) {
>>> +    if (!OccupedSlotsMask) {
>>> +      // Flush physical reg copies (RA will discard them)
>>> +      if (!AvailableAlus[AluDiscarded].empty()) {
>>> +        OccupedSlotsMask = 15;
>>> +        return PopInst(AvailableAlus[AluDiscarded]);
>>> +      }
>>> +      // If there is a T_XYZW alu available, use it
>>> +      if (!AvailableAlus[AluT_XYZW].empty()) {
>>> +        OccupedSlotsMask = 15;
>>> +        return PopInst(AvailableAlus[AluT_XYZW]);
>>> +      }
>>> +    }
>>> +    for (unsigned Chan = 0; Chan < 4; ++Chan) {
>>> +      bool isOccupied = OccupedSlotsMask & (1 << Chan);
>>> +      if (!isOccupied) {
>>> +        SUnit *SU = AttemptFillSlot(Chan);
>>> +        if (SU) {
>>> +          OccupedSlotsMask |= (1 << Chan);
>>> +          InstructionsGroupCandidate[Chan] = SU;
>>> +          return SU;
>>> +        }
>>> +      }
>>> +    }
>>> +    PrepareNextSlot();
>>> +  }
>>> +  return NULL;
>>> +}
>>> +
>>> +SUnit* R600SchedStrategy::pickOther(int QID) {
>>> +  SUnit *SU = 0;
>>> +  ReadyQueue *AQ = Available[QID];
>>> +
>>> +  if (AQ->empty()) {
>>> +    MoveUnits(Pending[QID], AQ);
>>> +  }
>>> +  if (!AQ->empty()) {
>>> +    SU = *AQ->begin();
>>> +    AQ->remove(AQ->begin());
>>> +  }
>>> +  return SU;
>>> +}
>>> +
>>> 
>>> Added: llvm/trunk/lib/Target/R600/R600MachineScheduler.h
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/R600/R600MachineScheduler.h?rev=176498&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/R600/R600MachineScheduler.h (added)
>>> +++ llvm/trunk/lib/Target/R600/R600MachineScheduler.h Tue Mar  5 12:41:32 2013
>>> @@ -0,0 +1,121 @@
>>> +//===-- R600MachineScheduler.h - R600 Scheduler Interface -*- C++ -*-------===//
>>> +//
>>> +//                     The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +//
>>> +/// \file
>>> +/// \brief R600 Machine Scheduler interface
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#ifndef R600MACHINESCHEDULER_H_
>>> +#define R600MACHINESCHEDULER_H_
>>> +
>>> +#include "R600InstrInfo.h"
>>> +#include "llvm/CodeGen/MachineScheduler.h"
>>> +#include "llvm/Support/Debug.h"
>>> +#include "llvm/ADT/PriorityQueue.h"
>>> +
>>> +using namespace llvm;
>>> +
>>> +namespace llvm {
>>> +
>>> +class CompareSUnit {
>>> +public:
>>> +  bool operator()(const SUnit *S1, const SUnit *S2) {
>>> +    return S1->getDepth() > S2->getDepth();
>>> +  }
>>> +};
>>> +
>>> +class R600SchedStrategy : public MachineSchedStrategy {
>>> +
>>> +  const ScheduleDAGMI *DAG;
>>> +  const R600InstrInfo *TII;
>>> +  const R600RegisterInfo *TRI;
>>> +  MachineRegisterInfo *MRI;
>>> +
>>> +  enum InstQueue {
>>> +    QAlu = 1,
>>> +    QFetch = 2,
>>> +    QOther = 4
>>> +  };
>>> +
>>> +  enum InstKind {
>>> +    IDAlu,
>>> +    IDFetch,
>>> +    IDOther,
>>> +    IDLast
>>> +  };
>>> +
>>> +  enum AluKind {
>>> +    AluAny,
>>> +    AluT_X,
>>> +    AluT_Y,
>>> +    AluT_Z,
>>> +    AluT_W,
>>> +    AluT_XYZW,
>>> +    AluDiscarded, // LLVM Instructions that are going to be eliminated
>>> +    AluLast
>>> +  };
>>> +
>>> +  ReadyQueue *Available[IDLast], *Pending[IDLast];
>>> +  std::multiset<SUnit *, CompareSUnit> AvailableAlus[AluLast];
>>> +
>>> +  InstKind CurInstKind;
>>> +  int CurEmitted;
>>> +  InstKind NextInstKind;
>>> +
>>> +  int InstKindLimit[IDLast];
>>> +
>>> +  int OccupedSlotsMask;
>>> +
>>> +public:
>>> +  R600SchedStrategy() :
>>> +    DAG(0), TII(0), TRI(0), MRI(0) {
>>> +    Available[IDAlu] = new ReadyQueue(QAlu, "AAlu");
>>> +    Available[IDFetch] = new ReadyQueue(QFetch, "AFetch");
>>> +    Available[IDOther] = new ReadyQueue(QOther, "AOther");
>>> +    Pending[IDAlu] = new ReadyQueue(QAlu<<4, "PAlu");
>>> +    Pending[IDFetch] = new ReadyQueue(QFetch<<4, "PFetch");
>>> +    Pending[IDOther] = new ReadyQueue(QOther<<4, "POther");
>>> +  }
>>> +
>>> +  virtual ~R600SchedStrategy() {
>>> +    for (unsigned I = 0; I < IDLast; ++I) {
>>> +      delete Available[I];
>>> +      delete Pending[I];
>>> +    }
>>> +  }
>>> +
>>> +  virtual void initialize(ScheduleDAGMI *dag);
>>> +  virtual SUnit *pickNode(bool &IsTopNode);
>>> +  virtual void schedNode(SUnit *SU, bool IsTopNode);
>>> +  virtual void releaseTopNode(SUnit *SU);
>>> +  virtual void releaseBottomNode(SUnit *SU);
>>> +
>>> +private:
>>> +  SUnit *InstructionsGroupCandidate[4];
>>> +
>>> +  int getInstKind(SUnit *SU);
>>> +  bool regBelongsToClass(unsigned Reg, const TargetRegisterClass *RC) const;
>>> +  AluKind getAluKind(SUnit *SU) const;
>>> +  void LoadAlu();
>>> +  bool isAvailablesAluEmpty() const;
>>> +  SUnit *AttemptFillSlot (unsigned Slot);
>>> +  void PrepareNextSlot();
>>> +  SUnit *PopInst(std::multiset<SUnit *, CompareSUnit> &Q);
>>> +
>>> +  void AssignSlot(MachineInstr *MI, unsigned Slot);
>>> +  SUnit* pickAlu();
>>> +  SUnit* pickOther(int QID);
>>> +  bool isBundleable(const MachineInstr& MI);
>>> +  void MoveUnits(ReadyQueue *QSrc, ReadyQueue *QDst);
>>> +};
>>> +
>>> +} // namespace llvm
>>> +
>>> +#endif /* R600MACHINESCHEDULER_H_ */
>> 
>> Hi Vincent,
>> 
>> I only did a quick review, but it looks great.
>> 
>> Please keep in mind that the MachineScheduler is still under development. Unit 
>> tests will prevent future changes from breaking your target too. I know it's 
>> extremely hard to write scheduler unit tests but you may be able to come up with 
>> a few.
>> 
>> -Andy
>> 




