PATCH: R600/SI: Replacement for SIFixSGPRLiveRanges pass

Mon Sep 8 14:39:21 PDT 2014

On 09/08/2014 04:38 PM, Tom Stellard wrote:
> Hi,
>
> The attached patches replace the SIFixSGPRLiveRanges pass with something that
> works and enables SALU selection inside of branches.
>
> -Tom
>
> 0001-R600-SI-Fix-SIRegisterInfo-getPhysRegSubReg.patch
>
>
>  From 314112996102603f5d239151d14fddc08fab6584 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Wed, 3 Sep 2014 19:14:06 -0400
> Subject: [PATCH 1/5] R600/SI: Fix SIRegisterInfo::getPhysRegSubReg()
>
> Correctly handle special registers: EXEC, EXEC_LO, EXEC_HI, VCC_LO,
> VCC_HI, and M0.  The previous implementation would assertion fail
> when passed these registers.
> ---
>   lib/Target/R600/SIRegisterInfo.cpp | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
>
> diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
> index cbb7a81..a7e2dad 100644
> --- a/lib/Target/R600/SIRegisterInfo.cpp
> +++ b/lib/Target/R600/SIRegisterInfo.cpp
> @@ -245,7 +245,23 @@ unsigned SIRegisterInfo::getPhysRegSubReg(unsigned Reg,
>           case 1: return AMDGPU::VCC_HI;
>           default: llvm_unreachable("Invalid SubIdx for VCC");
>         }
> +    case AMDGPU::EXEC:
> +      switch(Channel) {
> +        case 0: return AMDGPU::EXEC_LO;
> +        case 1: return AMDGPU::EXEC_HI;
> +        default: llvm_unreachable("Invalid SubIdx for EXEC");
>         break;
> +    }
> +  }
> +
> +  const TargetRegisterClass *RC = getPhysRegClass(Reg);
> +  // 32-bit registers don't have sub-registers, so we can just return the
> +  // Reg.  We need to have this check here, because yhe calculation below
Typo: yhe
> +  // using getHWRegIndex() will fail with special 32-bit registers like
> +  // VCC_LO, VCC_HI, EXEC_LO, EXEC_HI and M0.
> +  if (RC->getSize() == 4) {
> +    assert(Channel == 0);
> +    return Reg;
>     }
>   
>     unsigned Index = getHWRegIndex(Reg);
> -- 1.8.5.5
>
> 0002-R600-SI-Mark-EXEC_LO-and-EXEC_HI-as-reserved.patch
>
>
>  From 5b1ba52312e466266711c2262aedb5d3f466c61d Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Wed, 3 Sep 2014 19:20:44 -0400
> Subject: [PATCH 2/5] R600/SI: Mark EXEC_LO and EXEC_HI as reserved
>
> These registers can be allocated and used like other 32-bit registers,
> but it seems like a likely source for bugs.
> ---
>   lib/Target/R600/SIRegisterInfo.cpp | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
> index a7e2dad..2c08ca8 100644
> --- a/lib/Target/R600/SIRegisterInfo.cpp
> +++ b/lib/Target/R600/SIRegisterInfo.cpp
> @@ -32,6 +32,12 @@ SIRegisterInfo::SIRegisterInfo(const AMDGPUSubtarget &st)
>   BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
>     BitVector Reserved(getNumRegs());
>     Reserved.set(AMDGPU::EXEC);
> +
> +  // EXEC_LO and EXEC_HI could be allocated and used as regular register,
> +  // but this seems likely to result in bugs, so I'm marking them as reserved.
> +  Reserved.set(AMDGPU::EXEC_LO);
> +  Reserved.set(AMDGPU::EXEC_HI);
> +
I think it would be helpful in some cases to write comparison results 
directly into exec, but that might be better handled with a peephole for 
those cases anyway

>     Reserved.set(AMDGPU::INDIRECT_BASE_ADDR);
>     return Reserved;
>   }
> -- 1.8.5.5
>
> 0003-R600-SI-Fix-the-FixSGPRLiveRanges-pass.patch
>
>
>  From 16791a3cf1765c6384df502a50b3b2887bea170e Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Thu, 31 Jul 2014 15:36:13 -0400
> Subject: [PATCH 3/5] R600/SI: Fix the FixSGPRLiveRanges pass
>
> The previous implementation was extending the live range of SGPRs
> by modifying the live intervals directly.  This was causing a lot
> of machine verification errors when the machine scheduler was enabled.
>
> The new implementation adds pseudo instructions with implicit uses to
> extend the live ranges of SGPRs, which works much better.
> ---
>   lib/Target/R600/AMDGPUTargetMachine.cpp |   5 +-
>   lib/Target/R600/SIFixSGPRLiveRanges.cpp | 147 +++++++++++++++++++++++++-------
>   lib/Target/R600/SIInstrInfo.cpp         |   4 +
>   lib/Target/R600/SIInstructions.td       |   4 +
>   4 files changed, 127 insertions(+), 33 deletions(-)
>
> diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
> index 1203cce..a831d72 100644
> --- a/lib/Target/R600/AMDGPUTargetMachine.cpp
> +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
> @@ -149,8 +149,9 @@ bool AMDGPUPassConfig::addPreRegAlloc() {
>       // so we need to run MachineCSE afterwards.
>       addPass(&MachineCSEID);
>       addPass(createSIShrinkInstructionsPass());
> -    initializeSIFixSGPRLiveRangesPass(*PassRegistry::getPassRegistry());
> -    insertPass(&RegisterCoalescerID, &SIFixSGPRLiveRangesID);
> +//    initializeSIFixSGPRLiveRangesPass(*PassRegistry::getPassRegistry());
> +//    insertPass(&RegisterCoalescerID, &SIFixSGPRLiveRangesID);
Should delete these instead of commenting them out

> +    addPass(createSIFixSGPRLiveRangesPass());
>     }
>     return false;
>   }
> diff --git a/lib/Target/R600/SIFixSGPRLiveRanges.cpp b/lib/Target/R600/SIFixSGPRLiveRanges.cpp
> index 36e0e48..a59befe 100644
> --- a/lib/Target/R600/SIFixSGPRLiveRanges.cpp
> +++ b/lib/Target/R600/SIFixSGPRLiveRanges.cpp
> @@ -9,18 +9,49 @@
>   //
>   /// \file
>   /// SALU instructions ignore control flow, so we need to modify the live ranges
> -/// of the registers they define.
> +/// of the registers they define in some cases.
>   ///
> -/// The strategy is to view the entire program as if it were a single basic
> -/// block and calculate the intervals accordingly.  We implement this
> -/// by walking this list of segments for each LiveRange and setting the
> -/// end of each segment equal to the start of the segment that immediately
> -/// follows it.
> +/// The main case we need to handle is when a def is used in one side of a
> +/// branch and not another.  For example:
> +///
> +/// %def
> +/// IF
> +///   ...
> +///   ...
> +/// ELSE
> +///   %use
> +///   ...
> +/// ENDIF
> +///
> +/// Here we need the register allocator to avoid assigning any of the defs
> +/// inside of the IF to the same register as %def.  In traditional live
> +/// interval analysis %def is not live inside the IF branch, however, since
> +/// SALU instructions inside of IF will be executed even if the branch is not
> +/// taken, there is the chance that one of the instructions will overwrite the
> +/// value of %def, so the use in ELSE will see the wrong value.
> +///
> +/// The strategy we use for solving this is to add an extra use after the ENDIF:
> +///
> +/// %def
> +/// IF
> +///   ...
> +///   ...
> +/// ELSE
> +///   %use
> +///   ...
> +/// ENDIF
> +/// %use
> +///
> +/// Adding this use will make the def live thoughout the IF branch, which is
> +/// what we want.
>   
>   #include "AMDGPU.h"
> +#include "SIInstrInfo.h"
>   #include "SIRegisterInfo.h"
>   #include "llvm/CodeGen/LiveIntervalAnalysis.h"
> +#include "llvm/CodeGen/MachinePostDominators.h"
MachinePostDominators isn't in alphabetical order
>   #include "llvm/CodeGen/MachineFunctionPass.h"
> +#include "llvm/CodeGen/MachineInstrBuilder.h"
>   #include "llvm/CodeGen/MachineRegisterInfo.h"
>   #include "llvm/Support/Debug.h"
>   #include "llvm/Target/TargetMachine.h"
> @@ -48,8 +79,7 @@ public:
>   
>     void getAnalysisUsage(AnalysisUsage &AU) const override {
>       AU.addRequired<LiveIntervals>();
> -    AU.addPreserved<LiveIntervals>();
> -    AU.addPreserved<SlotIndexes>();
> +    AU.addRequired<MachinePostDominatorTree>();
>       AU.setPreservesCFG();
>       MachineFunctionPass::getAnalysisUsage(AU);
>     }
> @@ -60,6 +90,7 @@ public:
>   INITIALIZE_PASS_BEGIN(SIFixSGPRLiveRanges, DEBUG_TYPE,
>                         "SI Fix SGPR Live Ranges", false, false)
>   INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
> +INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
>   INITIALIZE_PASS_END(SIFixSGPRLiveRanges, DEBUG_TYPE,
>                       "SI Fix SGPR Live Ranges", false, false)
>   
> @@ -73,36 +104,90 @@ FunctionPass *llvm::createSIFixSGPRLiveRangesPass() {
>   
>   bool SIFixSGPRLiveRanges::runOnMachineFunction(MachineFunction &MF) {
>     MachineRegisterInfo &MRI = MF.getRegInfo();
> -  const SIRegisterInfo *TRI =
> -      static_cast<const SIRegisterInfo *>(MF.getSubtarget().getRegisterInfo());
> +  const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
> +  const SIRegisterInfo *TRI = static_cast<const SIRegisterInfo *>(
> +      MF.getSubtarget().getRegisterInfo());
>     LiveIntervals *LIS = &getAnalysis<LiveIntervals>();
> +  MachinePostDominatorTree *PDT = &getAnalysis<MachinePostDominatorTree>();
I think PDT is missing const
> +  std::vector<std::pair<unsigned, LiveRange *> > SGPRLiveRanges;
Don't need space between > > anymore
> +
> +  // First pass, collect all live intervals for SGPRs
> +  for (MachineFunction::const_iterator BI = MF.begin(), BE = MF.end();
> +                                                        BI != BE; ++BI) {
> +    const MachineBasicBlock &MBB = *BI;
> +    for (MachineBasicBlock::const_iterator I = MBB.begin(), E = MBB.end();
> +                                                            I != E; ++I) {
> +      const MachineInstr &MI = *I;
> +      for (const MachineOperand &MO : MI.defs(666666666)) {
Can use range for loops. What is 66666666?
> +        if (MO.isImplicit())
> +          continue;
> +        unsigned Def = MO.getReg();
> +        if (TargetRegisterInfo::isVirtualRegister(Def)) {
> +          if (TRI->isSGPRClass(MRI.getRegClass(Def)))
> +            SGPRLiveRanges.push_back(
> +                std::pair<unsigned, LiveRange*>(Def, &LIS->getInterval(Def)));
std::make_pair?
> +        } else if (TRI->isSGPRClass(TRI->getPhysRegClass(Def))) {
> +            SGPRLiveRanges.push_back(
> +                std::pair<unsigned, LiveRange*>(Def, &LIS->getRegUnit(Def)));
> +        }
> +      }
> +    }
> +  }
>   
> +  // Second pass fix the intervals
>     for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
>                                                     BI != BE; ++BI) {
> -
>       MachineBasicBlock &MBB = *BI;
> -    for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
> -                                                      I != E; ++I) {
> -      MachineInstr &MI = *I;
> -      MachineOperand *ExecUse = MI.findRegisterUseOperand(AMDGPU::EXEC);
> -      if (ExecUse)
> +    if (MBB.succ_size() < 2)
> +      continue;
> +
> +    // We have structured control flow, so number of succesors should be two.
> +    assert(MBB.succ_size() == 2);
> +    MachineBasicBlock *SuccA = *MBB.succ_begin();
> +    MachineBasicBlock *SuccB = *(++MBB.succ_begin());
> +    MachineBasicBlock *NCD = PDT->findNearestCommonDominator(SuccA, SuccB);
> +
> +    if (!NCD)
> +      continue;
> +
> +    MachineBasicBlock::iterator NCDTerm = NCD->getFirstTerminator();
> +
> +    if (NCDTerm != NCD->end() && NCDTerm->getOpcode() == AMDGPU::SI_ELSE) {
> +      assert(NCD->succ_size() == 2);
> +      // We want to make sure we insert the Use after the ENDIF, not after
> +      // the ELSE.
> +      NCD = PDT->findNearestCommonDominator(*NCD->succ_begin(),
> +                                            *(++NCD->succ_begin()));
> +    }
> +    assert(SuccA && SuccB);
> +    for (std::pair<unsigned, LiveRange*> RegLR : SGPRLiveRanges) {
> +      unsigned Reg = RegLR.first;
> +      LiveRange *LR = RegLR.second;
> +
> +      // FIXME: We could be smarter here.  If the register is Live-In to
> +      // one block, but the other doesn't have any SGPR defs, then there
> +      // won't be a conflict.  Also, if the branch decision is based on
> +      // a value in an SGPR, then there will be no conflict.
> +      bool LiveInToA = LIS->isLiveInToMBB(*LR, SuccA);
> +      bool LiveInToB = LIS->isLiveInToMBB(*LR, SuccB);
> +
> +      if ((!LiveInToA && !LiveInToB) ||
> +          (LiveInToA && LiveInToB))
>           continue;
>   
> -      for (const MachineOperand &Def : MI.operands()) {
> -        if (!Def.isReg() || !Def.isDef() ||!TargetRegisterInfo::isVirtualRegister(Def.getReg()))
> -          continue;
> -
> -        const TargetRegisterClass *RC = MRI.getRegClass(Def.getReg());
> -
> -        if (!TRI->isSGPRClass(RC))
> -          continue;
> -        LiveInterval &LI = LIS->getInterval(Def.getReg());
> -        for (unsigned i = 0, e = LI.size() - 1; i != e; ++i) {
> -          LiveRange::Segment &Seg = LI.segments[i];
> -          LiveRange::Segment &Next = LI.segments[i + 1];
> -          Seg.end = Next.start;
> -        }
> -      }
> +      // This interval is live in to one successor, but not the other, so
> +      // we need to update its range so it is live in to both.
> +      DEBUG(dbgs() << "Possible SGPR conflict detected " <<  " in " << *LR <<
> +                      " BB#" << SuccA->getNumber() << ", BB#" <<
> +                      SuccB->getNumber() <<
> +                      " with NCD = " << NCD->getNumber() << '\n');
> +
> +      // FIXME: Need to figure out how to update LiveRange here so this pass
> +      // will be able to preserve LiveInterval analysis.
> +      BuildMI(*NCD, NCD->getFirstNonPHI(), DebugLoc(),
> +              TII->get(AMDGPU::SGPR_USE))
> +              .addReg(Reg, RegState::Implicit);
> +      DEBUG(NCD->getFirstNonPHI()->dump());
>       }
>     }
>   
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index 93cf7ac..294aa70 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -506,6 +506,10 @@ bool SIInstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI) const {
>       MI->eraseFromParent();
>       break;
>     }
> +  case AMDGPU::SGPR_USE:
> +    // This is just a placeholder for register allocation.
> +    MI->eraseFromParent();
> +    break;
>     }
>     return true;
>   }
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index 8bd5133..212c230 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1580,6 +1580,10 @@ def V_XOR_I1 : InstSI <
>     [(set i1:$dst, (xor i1:$src0, i1:$src1))]
>   >;
>   
> +let hasSideEffects = 1 in {
> +def SGPR_USE : InstSI <(outs),(ins), "", []>;
> +}
> +
>   // SI pseudo instructions. These are used by the CFG structurizer pass
>   // and should be lowered to ISA instructions prior to codegen.
>   
> -- 1.8.5.5
>
> 0004-R600-SI-Move-PHIs-that-define-SGPRs-to-the-VALU-in-m.patch
>
>
>  From 955c1b069583de2affec155b9bfa6b4637aad0b3 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Mon, 8 Sep 2014 15:17:48 -0400
> Subject: [PATCH 4/5] R600/SI: Move PHIs that define SGPRs to the VALU in most
>   cases
>
> This fixes a bug that is uncovered by a future commit and will
> be tested by the test/CodeGen/R600/sgpr-control-flow.ll test case.
> ---
>   lib/Target/R600/SIFixSGPRCopies.cpp | 50 +++++++++++++++++++++++++++++++++++++
>   1 file changed, 50 insertions(+)
>
> diff --git a/lib/Target/R600/SIFixSGPRCopies.cpp b/lib/Target/R600/SIFixSGPRCopies.cpp
> index c108571..c2f9444 100644
> --- a/lib/Target/R600/SIFixSGPRCopies.cpp
> +++ b/lib/Target/R600/SIFixSGPRCopies.cpp
> @@ -238,14 +238,64 @@ bool SIFixSGPRCopies::runOnMachineFunction(MachineFunction &MF) {
>   
>           // If a PHI node defines an SGPR and any of its operands are VGPRs,
>           // then we need to move it to the VALU.
> +        //
> +        // Also, if a PHI node defines an SGPR and has all SGPR operands
> +        // we must move it to the VALU, because the SGPR operands will
> +        // all end up being assigned the same register, which means
> +        // there is a potential for a conflict if different threads.
"for a conflict if different threads." doesn't sound right
> +        //
> +        // For Example:
> +        //
> +        // sgpr0 = def;
> +        // ...
> +        // sgpr1 = def;
> +        // ...
> +        // sgpr2 = PHI sgpr0, sgpr1
> +        // use sgpr2;
> +        //
> +        // Will Become:
> +        //
> +        // sgpr2 = def;
> +        // ...
> +        // sgpr2 = def;
> +        // ...
> +        // use sgpr2
> +        //
> +        // FIXME: This is OK if the branching decision is made based on an
> +        // SGPR value.
> +        bool SGPRBranch = false;
> +
> +        // There is one exception to this rule is when one of the operands

  The one exception to this rule is when one of the operands


> +        // is defined by a SI_BREAK, SI_IF_BREAK, or SI_ELSE_BREAK
> +        // instruction.  In this case, there we know the program will
> +        // never enter the second block (the loop) without entering
> +        // the first block (where the condition is computed), so there
> +        // is no chance for values to be over-written.
> +
> +        bool HasBreakDef = false;
>           for (unsigned i = 1; i < MI.getNumOperands(); i+=2) {
>             unsigned Reg = MI.getOperand(i).getReg();
>             if (TRI->hasVGPRs(MRI.getRegClass(Reg))) {
>               TII->moveToVALU(MI);
>               break;
>             }
> +          MachineInstr *DefInstr = MRI.getUniqueVRegDef(Reg);
> +          assert(DefInstr);
> +          switch(DefInstr->getOpcode()) {
Missing space after switch
> +          case AMDGPU::SI_BREAK:
> +          case AMDGPU::SI_IF_BREAK:
> +          case AMDGPU::SI_ELSE_BREAK:
> +          // If we see a PHI instruction that defines an SGPR, then that PHI
> +          // instruction has already been considered and should have
> +          // a *_BREAK as an operand.
> +          case AMDGPU::PHI:
> +            HasBreakDef = true;
> +            break;
> +          }
>           }
>   
> +        if (!SGPRBranch && !HasBreakDef)
> +          TII->moveToVALU(MI);
>           break;
>         }
>         case AMDGPU::REG_SEQUENCE: {
> -- 1.8.5.5
>
> 0005-R600-SI-Enable-selecting-SALU-inside-branches.patch
>
>
>  From 05ee026c38c23dceec0ceac7c9dd351a7c6be308 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Thu, 31 Jul 2014 18:59:36 -0400
> Subject: [PATCH 5/5] R600/SI: Enable selecting SALU inside branches
>
> We can do this now that the FixSGPRLiveRanges pass is working.
> ---
>   lib/Target/R600/AMDGPUISelDAGToDAG.cpp | 18 ---------------
>   lib/Target/R600/SIInstructions.td      |  9 --------
>   test/CodeGen/R600/add.ll               |  4 ++--
>   test/CodeGen/R600/ctpop.ll             | 22 +++++++++---------
>   test/CodeGen/R600/ctpop64.ll           | 27 +++++++++++-----------
>   test/CodeGen/R600/mul.ll               |  4 ++--
>   test/CodeGen/R600/sgpr-control-flow.ll | 41 ++++++++++++++++++++++++++++++++--
>   test/CodeGen/R600/xor.ll               |  3 +--
>   8 files changed, 68 insertions(+), 60 deletions(-)
>
> diff --git a/lib/Target/R600/AMDGPUISelDAGToDAG.cpp b/lib/Target/R600/AMDGPUISelDAGToDAG.cpp
> index 3c68e45..a75a2ac 100644
> --- a/lib/Target/R600/AMDGPUISelDAGToDAG.cpp
> +++ b/lib/Target/R600/AMDGPUISelDAGToDAG.cpp
> @@ -77,11 +77,6 @@ private:
>     bool isLocalLoad(const LoadSDNode *N) const;
>     bool isRegionLoad(const LoadSDNode *N) const;
>   
> -  /// \returns True if the current basic block being selected is at control
> -  ///          flow depth 0.  Meaning that the current block dominates the
> -  //           exit block.
> -  bool isCFDepth0() const;
> -
>     const TargetRegisterClass *getOperandRegClass(SDNode *N, unsigned OpNo) const;
>     bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
>     bool SelectGlobalValueVariableOffset(SDValue Addr, SDValue &BaseReg,
> @@ -591,14 +586,6 @@ bool AMDGPUDAGToDAGISel::isPrivateLoad(const LoadSDNode *N) const {
>     return false;
>   }
>   
> -bool AMDGPUDAGToDAGISel::isCFDepth0() const {
> -  // FIXME: Figure out a way to use DominatorTree analysis here.
> -  const BasicBlock *CurBlock = FuncInfo->MBB->getBasicBlock();
> -  const Function *Fn = FuncInfo->Fn;
> -  return &Fn->front() == CurBlock || &Fn->back() == CurBlock;
> -}
> -
> -
>   const char *AMDGPUDAGToDAGISel::getPassName() const {
>     return "AMDGPU DAG->DAG Pattern Instruction Selection";
>   }
> @@ -704,11 +691,6 @@ SDNode *AMDGPUDAGToDAGISel::SelectADD_SUB_I64(SDNode *N) {
>     unsigned Opc = IsAdd ? AMDGPU::S_ADD_U32 : AMDGPU::S_SUB_U32;
>     unsigned CarryOpc = IsAdd ? AMDGPU::S_ADDC_U32 : AMDGPU::S_SUBB_U32;
>   
> -  if (!isCFDepth0()) {
> -    Opc = IsAdd ? AMDGPU::V_ADD_I32_e32 : AMDGPU::V_SUB_I32_e32;
> -    CarryOpc = IsAdd ? AMDGPU::V_ADDC_U32_e32 : AMDGPU::V_SUBB_U32_e32;
> -  }
> -
>     SDNode *AddLo = CurDAG->getMachineNode( Opc, DL, VTList, AddLoArgs);
>     SDValue Carry(AddLo, 1);
>     SDNode *AddHi
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index 212c230..9418622 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -32,12 +32,9 @@ def isSI : Predicate<"Subtarget.getGeneration() "
>   def isCI : Predicate<"Subtarget.getGeneration() "
>                         ">= AMDGPUSubtarget::SEA_ISLANDS">;
>   
> -def isCFDepth0 : Predicate<"isCFDepth0()">;
> -
>   def WAIT_FLAG : InstFlag<"printWaitFlag">;
>   
>   let SubtargetPredicate = isSI in {
> -let OtherPredicates  = [isCFDepth0] in {
>   
>   //===----------------------------------------------------------------------===//
>   // SMRD Instructions
> @@ -364,8 +361,6 @@ def S_GETREG_REGRD_B32 : SOPK_32 <0x00000014, "S_GETREG_REGRD_B32", []>;
>   //def S_SETREG_IMM32_B32 : SOPK_32 <0x00000015, "S_SETREG_IMM32_B32", []>;
>   //def EXP : EXP_ <0x00000000, "EXP", []>;
>   
> -} // End let OtherPredicates = [isCFDepth0]
> -
>   //===----------------------------------------------------------------------===//
>   // SOPP Instructions
>   //===----------------------------------------------------------------------===//
> @@ -1851,8 +1846,6 @@ def : Pat <
>   // SOP1 Patterns
>   //===----------------------------------------------------------------------===//
>   
> -let Predicates = [isSI, isCFDepth0] in {
> -
>   def : Pat <
>     (i64 (ctpop i64:$src)),
>     (INSERT_SUBREG (INSERT_SUBREG (i64 (IMPLICIT_DEF)),
> @@ -1871,8 +1864,6 @@ def : Pat <
>     (S_ADD_U32 $src0, $src1)
>   >;
>   
> -} // Predicates = [isSI, isCFDepth0]
> -
>   let  Predicates = [isSI] in {
>   
>   //===----------------------------------------------------------------------===//
> diff --git a/test/CodeGen/R600/add.ll b/test/CodeGen/R600/add.ll
> index f62c9d6..fbe6437 100644
> --- a/test/CodeGen/R600/add.ll
> +++ b/test/CodeGen/R600/add.ll
> @@ -145,8 +145,8 @@ entry:
>   ; branches.
>   ; FIXME: We are being conservative here.  We could allow this in some cases.

This FIXME can be removed now
>   ; FUNC-LABEL: @add64_in_branch
> -; SI-CHECK-NOT: S_ADD_I32
> -; SI-CHECK-NOT: S_ADDC_U32
> +; SI-CHECK: S_ADD_U32
> +; SI-CHECK: S_ADDC_U32
>   define void @add64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
>   entry:
>     %0 = icmp eq i64 %a, 0
> diff --git a/test/CodeGen/R600/ctpop.ll b/test/CodeGen/R600/ctpop.ll
> index c7c406a..fe46900 100644
> --- a/test/CodeGen/R600/ctpop.ll
> +++ b/test/CodeGen/R600/ctpop.ll
> @@ -257,28 +257,28 @@ define void @v_ctpop_i32_add_vvar_inv(i32 addrspace(1)* noalias %out, i32 addrsp
>   ; but there are some cases when the should be allowed.
>   
>   ; FUNC-LABEL: @ctpop_i32_in_br
> -; SI: BUFFER_LOAD_DWORD [[VAL:v[0-9]+]],
> -; SI: V_BCNT_U32_B32_e64 [[RESULT:v[0-9]+]], [[VAL]], 0
> +; SI: S_LOAD_DWORD [[VAL:s[0-9]+]], s[{{[0-9]+:[0-9]+}}], 0xd
> +; SI: S_BCNT1_I32_B32  [[SRESULT:s[0-9]+]], [[VAL]]
> +; SI: V_MOV_B32_e32 [[RESULT]], [[SRESULT]]
>   ; SI: BUFFER_STORE_DWORD [[RESULT]],
>   ; SI: S_ENDPGM
>   ; EG: BCNT_INT
> -define void @ctpop_i32_in_br(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %cond) {
> +define void @ctpop_i32_in_br(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %ctpop_arg, i32 %cond) {
>   entry:
> -  %0 = icmp eq i32 %cond, 0
> -  br i1 %0, label %if, label %else
> +  %tmp0 = icmp eq i32 %cond, 0
> +  br i1 %tmp0, label %if, label %else
>   
>   if:
> -  %1 = load i32 addrspace(1)* %in
> -  %2 = call i32 @llvm.ctpop.i32(i32 %1)
> +  %tmp2 = call i32 @llvm.ctpop.i32(i32 %ctpop_arg)
>     br label %endif
>   
>   else:
> -  %3 = getelementptr i32 addrspace(1)* %in, i32 1
> -  %4 = load i32 addrspace(1)* %3
> +  %tmp3 = getelementptr i32 addrspace(1)* %in, i32 1
> +  %tmp4 = load i32 addrspace(1)* %tmp3
>     br label %endif
>   
>   endif:
> -  %5 = phi i32 [%2, %if], [%4, %else]
> -  store i32 %5, i32 addrspace(1)* %out
> +  %tmp5 = phi i32 [%tmp2, %if], [%tmp4, %else]
> +  store i32 %tmp5, i32 addrspace(1)* %out
>     ret void
>   }
> diff --git a/test/CodeGen/R600/ctpop64.ll b/test/CodeGen/R600/ctpop64.ll
> index 37a174f..76091c5 100644
> --- a/test/CodeGen/R600/ctpop64.ll
> +++ b/test/CodeGen/R600/ctpop64.ll
> @@ -94,29 +94,28 @@ define void @v_ctpop_v4i64(<4 x i32> addrspace(1)* noalias %out, <4 x i64> addrs
>   ; but there are some cases when the should be allowed.
>   
>   ; FUNC-LABEL: @ctpop_i64_in_br
> -; SI: V_BCNT_U32_B32_e64 [[BCNT_LO:v[0-9]+]], v{{[0-9]+}}, 0
> -; SI: V_BCNT_U32_B32_e32 v[[BCNT:[0-9]+]], v{{[0-9]+}}, [[BCNT_LO]]
> -; SI: V_MOV_B32_e32 v[[ZERO:[0-9]+]], 0
> -; SI: BUFFER_STORE_DWORDX2 v[
> -; SI: [[BCNT]]:[[ZERO]]]
> +; SI: S_LOAD_DWORDX2 s{{\[}}[[LOVAL:[0-9]+]]:[[HIVAL:[0-9]+]]{{\]}}, s[{{[0-9]+:[0-9]+}}], 0xd
> +; SI: S_BCNT1_I32_B64 [[RESULT:s[0-9]+]], {{s\[}}[[LOVAL]]:[[HIVAL]]{{\]}}
> +; SI: V_MOV_B32_e32 v[[VLO:[0-9]+]], [[RESULT]]
> +; SI: V_MOV_B32_e32 v[[VHI:[0-9]+]], s[[HIVAL]]
> +; SI: BUFFER_STORE_DWORDX2 {{v\[}}[[VLO]]:[[VHI]]{{\]}}
>   ; SI: S_ENDPGM
> -define void @ctpop_i64_in_br(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i32 %cond) {
> +define void @ctpop_i64_in_br(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %ctpop_arg, i32 %cond) {
>   entry:
> -  %0 = icmp eq i32 %cond, 0
> -  br i1 %0, label %if, label %else
> +  %tmp0 = icmp eq i32 %cond, 0
> +  br i1 %tmp0, label %if, label %else
>   
>   if:
> -  %1 = load i64 addrspace(1)* %in
> -  %2 = call i64 @llvm.ctpop.i64(i64 %1)
> +  %tmp2 = call i64 @llvm.ctpop.i64(i64 %ctpop_arg)
>     br label %endif
>   
>   else:
> -  %3 = getelementptr i64 addrspace(1)* %in, i32 1
> -  %4 = load i64 addrspace(1)* %3
> +  %tmp3 = getelementptr i64 addrspace(1)* %in, i32 1
> +  %tmp4 = load i64 addrspace(1)* %tmp3
>     br label %endif
>   
>   endif:
> -  %5 = phi i64 [%2, %if], [%4, %else]
> -  store i64 %5, i64 addrspace(1)* %out
> +  %tmp5 = phi i64 [%tmp2, %if], [%tmp4, %else]
> +  store i64 %tmp5, i64 addrspace(1)* %out
>     ret void
>   }
> diff --git a/test/CodeGen/R600/mul.ll b/test/CodeGen/R600/mul.ll
> index fe9c1b9..11de3e3 100644
> --- a/test/CodeGen/R600/mul.ll
> +++ b/test/CodeGen/R600/mul.ll
> @@ -155,7 +155,7 @@ define void @v_mul_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %aptr, i64 addr
>   }
>   
>   ; FUNC-LABEL: @mul32_in_branch
> -; SI: V_MUL_LO_I32
> +; SI: S_MUL_I32
>   define void @mul32_in_branch(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %a, i32 %b, i32 %c) {
>   entry:
>     %0 = icmp eq i32 %a, 0
> @@ -176,7 +176,7 @@ endif:
>   }
>   
>   ; FUNC-LABEL: @mul64_in_branch
> -; SI-DAG: V_MUL_LO_I32
> +; SI-DAG: S_MUL_I32
>   ; SI-DAG: V_MUL_HI_U32
>   ; SI: S_ENDPGM
>   define void @mul64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
> diff --git a/test/CodeGen/R600/sgpr-control-flow.ll b/test/CodeGen/R600/sgpr-control-flow.ll
> index 06ad24d..326b37a 100644
> --- a/test/CodeGen/R600/sgpr-control-flow.ll
> +++ b/test/CodeGen/R600/sgpr-control-flow.ll
> @@ -4,9 +4,14 @@
>   ; Most SALU instructions ignore control flow, so we need to make sure
>   ; they don't overwrite values from other blocks.
>   
> -; SI-NOT: S_ADD
> +; If the branch decision is made based on a value in an SGPR then all
> +; threads will execute the same code paths, so we don't need to worry
> +; about instructions in different blocks overwriting each other.
> +; SI-LABEL: @sgpr_if_else_salu_br
> +; SI: S_ADD
> +; SI: S_ADD
>   
> -define void @sgpr_if_else(i32 addrspace(1)* %out, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
> +define void @sgpr_if_else_salu_br(i32 addrspace(1)* %out, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
>   entry:
>     %0 = icmp eq i32 %a, 0
>     br i1 %0, label %if, label %else
> @@ -25,3 +30,35 @@ endif:
>     store i32 %4, i32 addrspace(1)* %out
>     ret void
>   }
> +
> +; The two S_ADD instructions should write to different registers, since
> +; different threads will take different control flow paths.
> +
> +; SI-LABEL: @sgpr_if_else_valu_br
> +; SI: S_ADD_I32 [[SGPR:s[0-9]+]]
> +; SI-NOT: S_ADD_I32 [[SGPR]]
> +
> +define void @sgpr_if_else_valu_br(i32 addrspace(1)* %out, float %a, i32 %b, i32 %c, i32 %d, i32 %e) {
> +entry:
> +  %tid = call i32 @llvm.r600.read.tidig.x() #0
> +  %tid_f = uitofp i32 %tid to float
> +  %tmp1 = fcmp ueq float %tid_f, 0.0
> +  br i1 %tmp1, label %if, label %else
> +
> +if:
> +  %tmp2 = add i32 %b, %c
> +  br label %endif
> +
> +else:
> +  %tmp3 = add i32 %d, %e
> +  br label %endif
> +
> +endif:
> +  %tmp4 = phi i32 [%tmp2, %if], [%tmp3, %else]
> +  store i32 %tmp4, i32 addrspace(1)* %out
> +  ret void
> +}
> +
> +declare i32 @llvm.r600.read.tidig.x() #0
> +
> +attributes #0 = { readnone }
> diff --git a/test/CodeGen/R600/xor.ll b/test/CodeGen/R600/xor.ll
> index e14bd71..8c2c80e 100644
> --- a/test/CodeGen/R600/xor.ll
> +++ b/test/CodeGen/R600/xor.ll
> @@ -136,8 +136,7 @@ define void @vector_not_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in0, i64
>   ; use an SALU instruction for this.
>   
>   ; SI-CHECK-LABEL: @xor_cf
> -; SI-CHECK: V_XOR
> -; SI-CHECK: V_XOR
> +; SI-CHECK: S_XOR_B64
>   define void @xor_cf(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b) {
>   entry:
>     %0 = icmp eq i64 %a, 0
> -- 1.8.5.5
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140908/5b83bc2e/attachment.html>