PATCH: R600/SI: Replacement for SIFixSGPRLiveRanges pass
Matt Arsenault
Matthew.Arsenault at amd.com
Mon Sep 8 14:39:21 PDT 2014
On 09/08/2014 04:38 PM, Tom Stellard wrote:
> Hi,
>
> The attached patches replace the SIFixSGPRLiveRanges pass with something that
> works and enables SALU selection inside of branches.
>
> -Tom
>
> 0001-R600-SI-Fix-SIRegisterInfo-getPhysRegSubReg.patch
>
>
> From 314112996102603f5d239151d14fddc08fab6584 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Wed, 3 Sep 2014 19:14:06 -0400
> Subject: [PATCH 1/5] R600/SI: Fix SIRegisterInfo::getPhysRegSubReg()
>
> Correctly handle special registers: EXEC, EXEC_LO, EXEC_HI, VCC_LO,
> VCC_HI, and M0. The previous implementation would assertion fail
> when passed these registers.
> ---
> lib/Target/R600/SIRegisterInfo.cpp | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
> index cbb7a81..a7e2dad 100644
> --- a/lib/Target/R600/SIRegisterInfo.cpp
> +++ b/lib/Target/R600/SIRegisterInfo.cpp
> @@ -245,7 +245,23 @@ unsigned SIRegisterInfo::getPhysRegSubReg(unsigned Reg,
> case 1: return AMDGPU::VCC_HI;
> default: llvm_unreachable("Invalid SubIdx for VCC");
> }
> + case AMDGPU::EXEC:
> + switch(Channel) {
> + case 0: return AMDGPU::EXEC_LO;
> + case 1: return AMDGPU::EXEC_HI;
> + default: llvm_unreachable("Invalid SubIdx for EXEC");
> break;
> + }
> + }
> +
> + const TargetRegisterClass *RC = getPhysRegClass(Reg);
> + // 32-bit registers don't have sub-registers, so we can just return the
> + // Reg. We need to have this check here, because yhe calculation below
Typo: yhe
> + // using getHWRegIndex() will fail with special 32-bit registers like
> + // VCC_LO, VCC_HI, EXEC_LO, EXEC_HI and M0.
> + if (RC->getSize() == 4) {
> + assert(Channel == 0);
> + return Reg;
> }
>
> unsigned Index = getHWRegIndex(Reg);
> -- 1.8.5.5
>
> 0002-R600-SI-Mark-EXEC_LO-and-EXEC_HI-as-reserved.patch
>
>
> From 5b1ba52312e466266711c2262aedb5d3f466c61d Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Wed, 3 Sep 2014 19:20:44 -0400
> Subject: [PATCH 2/5] R600/SI: Mark EXEC_LO and EXEC_HI as reserved
>
> These registers can be allocated and used like other 32-bit registers,
> but it seems like a likely source for bugs.
> ---
> lib/Target/R600/SIRegisterInfo.cpp | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
> index a7e2dad..2c08ca8 100644
> --- a/lib/Target/R600/SIRegisterInfo.cpp
> +++ b/lib/Target/R600/SIRegisterInfo.cpp
> @@ -32,6 +32,12 @@ SIRegisterInfo::SIRegisterInfo(const AMDGPUSubtarget &st)
> BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
> BitVector Reserved(getNumRegs());
> Reserved.set(AMDGPU::EXEC);
> +
> + // EXEC_LO and EXEC_HI could be allocated and used as regular register,
> + // but this seems likely to result in bugs, so I'm marking them as reserved.
> + Reserved.set(AMDGPU::EXEC_LO);
> + Reserved.set(AMDGPU::EXEC_HI);
> +
I think it would be helpful in some cases to write comparison results
directly into exec, but that might be better handled with a peephole for
those cases anyway
> Reserved.set(AMDGPU::INDIRECT_BASE_ADDR);
> return Reserved;
> }
> -- 1.8.5.5
>
> 0003-R600-SI-Fix-the-FixSGPRLiveRanges-pass.patch
>
>
> From 16791a3cf1765c6384df502a50b3b2887bea170e Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Thu, 31 Jul 2014 15:36:13 -0400
> Subject: [PATCH 3/5] R600/SI: Fix the FixSGPRLiveRanges pass
>
> The previous implementation was extending the live range of SGPRs
> by modifying the live intervals directly. This was causing a lot
> of machine verification errors when the machine scheduler was enabled.
>
> The new implementation adds pseudo instructions with implicit uses to
> extend the live ranges of SGPRs, which works much better.
> ---
> lib/Target/R600/AMDGPUTargetMachine.cpp | 5 +-
> lib/Target/R600/SIFixSGPRLiveRanges.cpp | 147 +++++++++++++++++++++++++-------
> lib/Target/R600/SIInstrInfo.cpp | 4 +
> lib/Target/R600/SIInstructions.td | 4 +
> 4 files changed, 127 insertions(+), 33 deletions(-)
>
> diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
> index 1203cce..a831d72 100644
> --- a/lib/Target/R600/AMDGPUTargetMachine.cpp
> +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
> @@ -149,8 +149,9 @@ bool AMDGPUPassConfig::addPreRegAlloc() {
> // so we need to run MachineCSE afterwards.
> addPass(&MachineCSEID);
> addPass(createSIShrinkInstructionsPass());
> - initializeSIFixSGPRLiveRangesPass(*PassRegistry::getPassRegistry());
> - insertPass(&RegisterCoalescerID, &SIFixSGPRLiveRangesID);
> +// initializeSIFixSGPRLiveRangesPass(*PassRegistry::getPassRegistry());
> +// insertPass(&RegisterCoalescerID, &SIFixSGPRLiveRangesID);
Should delete these instead of commenting them out
> + addPass(createSIFixSGPRLiveRangesPass());
> }
> return false;
> }
> diff --git a/lib/Target/R600/SIFixSGPRLiveRanges.cpp b/lib/Target/R600/SIFixSGPRLiveRanges.cpp
> index 36e0e48..a59befe 100644
> --- a/lib/Target/R600/SIFixSGPRLiveRanges.cpp
> +++ b/lib/Target/R600/SIFixSGPRLiveRanges.cpp
> @@ -9,18 +9,49 @@
> //
> /// \file
> /// SALU instructions ignore control flow, so we need to modify the live ranges
> -/// of the registers they define.
> +/// of the registers they define in some cases.
> ///
> -/// The strategy is to view the entire program as if it were a single basic
> -/// block and calculate the intervals accordingly. We implement this
> -/// by walking this list of segments for each LiveRange and setting the
> -/// end of each segment equal to the start of the segment that immediately
> -/// follows it.
> +/// The main case we need to handle is when a def is used in one side of a
> +/// branch and not another. For example:
> +///
> +/// %def
> +/// IF
> +/// ...
> +/// ...
> +/// ELSE
> +/// %use
> +/// ...
> +/// ENDIF
> +///
> +/// Here we need the register allocator to avoid assigning any of the defs
> +/// inside of the IF to the same register as %def. In traditional live
> +/// interval analysis %def is not live inside the IF branch, however, since
> +/// SALU instructions inside of IF will be executed even if the branch is not
> +/// taken, there is the chance that one of the instructions will overwrite the
> +/// value of %def, so the use in ELSE will see the wrong value.
> +///
> +/// The strategy we use for solving this is to add an extra use after the ENDIF:
> +///
> +/// %def
> +/// IF
> +/// ...
> +/// ...
> +/// ELSE
> +/// %use
> +/// ...
> +/// ENDIF
> +/// %use
> +///
> +/// Adding this use will make the def live thoughout the IF branch, which is
> +/// what we want.
>
> #include "AMDGPU.h"
> +#include "SIInstrInfo.h"
> #include "SIRegisterInfo.h"
> #include "llvm/CodeGen/LiveIntervalAnalysis.h"
> +#include "llvm/CodeGen/MachinePostDominators.h"
MachinePostDominators isn't in alphabetical order
> #include "llvm/CodeGen/MachineFunctionPass.h"
> +#include "llvm/CodeGen/MachineInstrBuilder.h"
> #include "llvm/CodeGen/MachineRegisterInfo.h"
> #include "llvm/Support/Debug.h"
> #include "llvm/Target/TargetMachine.h"
> @@ -48,8 +79,7 @@ public:
>
> void getAnalysisUsage(AnalysisUsage &AU) const override {
> AU.addRequired<LiveIntervals>();
> - AU.addPreserved<LiveIntervals>();
> - AU.addPreserved<SlotIndexes>();
> + AU.addRequired<MachinePostDominatorTree>();
> AU.setPreservesCFG();
> MachineFunctionPass::getAnalysisUsage(AU);
> }
> @@ -60,6 +90,7 @@ public:
> INITIALIZE_PASS_BEGIN(SIFixSGPRLiveRanges, DEBUG_TYPE,
> "SI Fix SGPR Live Ranges", false, false)
> INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
> +INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
> INITIALIZE_PASS_END(SIFixSGPRLiveRanges, DEBUG_TYPE,
> "SI Fix SGPR Live Ranges", false, false)
>
> @@ -73,36 +104,90 @@ FunctionPass *llvm::createSIFixSGPRLiveRangesPass() {
>
> bool SIFixSGPRLiveRanges::runOnMachineFunction(MachineFunction &MF) {
> MachineRegisterInfo &MRI = MF.getRegInfo();
> - const SIRegisterInfo *TRI =
> - static_cast<const SIRegisterInfo *>(MF.getSubtarget().getRegisterInfo());
> + const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
> + const SIRegisterInfo *TRI = static_cast<const SIRegisterInfo *>(
> + MF.getSubtarget().getRegisterInfo());
> LiveIntervals *LIS = &getAnalysis<LiveIntervals>();
> + MachinePostDominatorTree *PDT = &getAnalysis<MachinePostDominatorTree>();
I think PDT is missing const
> + std::vector<std::pair<unsigned, LiveRange *> > SGPRLiveRanges;
Don't need space between > > anymore
> +
> + // First pass, collect all live intervals for SGPRs
> + for (MachineFunction::const_iterator BI = MF.begin(), BE = MF.end();
> + BI != BE; ++BI) {
> + const MachineBasicBlock &MBB = *BI;
> + for (MachineBasicBlock::const_iterator I = MBB.begin(), E = MBB.end();
> + I != E; ++I) {
> + const MachineInstr &MI = *I;
> + for (const MachineOperand &MO : MI.defs(666666666)) {
Can use range for loops. What is 66666666?
> + if (MO.isImplicit())
> + continue;
> + unsigned Def = MO.getReg();
> + if (TargetRegisterInfo::isVirtualRegister(Def)) {
> + if (TRI->isSGPRClass(MRI.getRegClass(Def)))
> + SGPRLiveRanges.push_back(
> + std::pair<unsigned, LiveRange*>(Def, &LIS->getInterval(Def)));
std::make_pair?
> + } else if (TRI->isSGPRClass(TRI->getPhysRegClass(Def))) {
> + SGPRLiveRanges.push_back(
> + std::pair<unsigned, LiveRange*>(Def, &LIS->getRegUnit(Def)));
> + }
> + }
> + }
> + }
>
> + // Second pass fix the intervals
> for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
> BI != BE; ++BI) {
> -
> MachineBasicBlock &MBB = *BI;
> - for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
> - I != E; ++I) {
> - MachineInstr &MI = *I;
> - MachineOperand *ExecUse = MI.findRegisterUseOperand(AMDGPU::EXEC);
> - if (ExecUse)
> + if (MBB.succ_size() < 2)
> + continue;
> +
> + // We have structured control flow, so number of succesors should be two.
> + assert(MBB.succ_size() == 2);
> + MachineBasicBlock *SuccA = *MBB.succ_begin();
> + MachineBasicBlock *SuccB = *(++MBB.succ_begin());
> + MachineBasicBlock *NCD = PDT->findNearestCommonDominator(SuccA, SuccB);
> +
> + if (!NCD)
> + continue;
> +
> + MachineBasicBlock::iterator NCDTerm = NCD->getFirstTerminator();
> +
> + if (NCDTerm != NCD->end() && NCDTerm->getOpcode() == AMDGPU::SI_ELSE) {
> + assert(NCD->succ_size() == 2);
> + // We want to make sure we insert the Use after the ENDIF, not after
> + // the ELSE.
> + NCD = PDT->findNearestCommonDominator(*NCD->succ_begin(),
> + *(++NCD->succ_begin()));
> + }
> + assert(SuccA && SuccB);
> + for (std::pair<unsigned, LiveRange*> RegLR : SGPRLiveRanges) {
> + unsigned Reg = RegLR.first;
> + LiveRange *LR = RegLR.second;
> +
> + // FIXME: We could be smarter here. If the register is Live-In to
> + // one block, but the other doesn't have any SGPR defs, then there
> + // won't be a conflict. Also, if the branch decision is based on
> + // a value in an SGPR, then there will be no conflict.
> + bool LiveInToA = LIS->isLiveInToMBB(*LR, SuccA);
> + bool LiveInToB = LIS->isLiveInToMBB(*LR, SuccB);
> +
> + if ((!LiveInToA && !LiveInToB) ||
> + (LiveInToA && LiveInToB))
> continue;
>
> - for (const MachineOperand &Def : MI.operands()) {
> - if (!Def.isReg() || !Def.isDef() ||!TargetRegisterInfo::isVirtualRegister(Def.getReg()))
> - continue;
> -
> - const TargetRegisterClass *RC = MRI.getRegClass(Def.getReg());
> -
> - if (!TRI->isSGPRClass(RC))
> - continue;
> - LiveInterval &LI = LIS->getInterval(Def.getReg());
> - for (unsigned i = 0, e = LI.size() - 1; i != e; ++i) {
> - LiveRange::Segment &Seg = LI.segments[i];
> - LiveRange::Segment &Next = LI.segments[i + 1];
> - Seg.end = Next.start;
> - }
> - }
> + // This interval is live in to one successor, but not the other, so
> + // we need to update its range so it is live in to both.
> + DEBUG(dbgs() << "Possible SGPR conflict detected " << " in " << *LR <<
> + " BB#" << SuccA->getNumber() << ", BB#" <<
> + SuccB->getNumber() <<
> + " with NCD = " << NCD->getNumber() << '\n');
> +
> + // FIXME: Need to figure out how to update LiveRange here so this pass
> + // will be able to preserve LiveInterval analysis.
> + BuildMI(*NCD, NCD->getFirstNonPHI(), DebugLoc(),
> + TII->get(AMDGPU::SGPR_USE))
> + .addReg(Reg, RegState::Implicit);
> + DEBUG(NCD->getFirstNonPHI()->dump());
> }
> }
>
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index 93cf7ac..294aa70 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -506,6 +506,10 @@ bool SIInstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI) const {
> MI->eraseFromParent();
> break;
> }
> + case AMDGPU::SGPR_USE:
> + // This is just a placeholder for register allocation.
> + MI->eraseFromParent();
> + break;
> }
> return true;
> }
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index 8bd5133..212c230 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1580,6 +1580,10 @@ def V_XOR_I1 : InstSI <
> [(set i1:$dst, (xor i1:$src0, i1:$src1))]
> >;
>
> +let hasSideEffects = 1 in {
> +def SGPR_USE : InstSI <(outs),(ins), "", []>;
> +}
> +
> // SI pseudo instructions. These are used by the CFG structurizer pass
> // and should be lowered to ISA instructions prior to codegen.
>
> -- 1.8.5.5
>
> 0004-R600-SI-Move-PHIs-that-define-SGPRs-to-the-VALU-in-m.patch
>
>
> From 955c1b069583de2affec155b9bfa6b4637aad0b3 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Mon, 8 Sep 2014 15:17:48 -0400
> Subject: [PATCH 4/5] R600/SI: Move PHIs that define SGPRs to the VALU in most
> cases
>
> This fixes a bug that is uncovered by a future commit and will
> be tested by the test/CodeGen/R600/sgpr-control-flow.ll test case.
> ---
> lib/Target/R600/SIFixSGPRCopies.cpp | 50 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/lib/Target/R600/SIFixSGPRCopies.cpp b/lib/Target/R600/SIFixSGPRCopies.cpp
> index c108571..c2f9444 100644
> --- a/lib/Target/R600/SIFixSGPRCopies.cpp
> +++ b/lib/Target/R600/SIFixSGPRCopies.cpp
> @@ -238,14 +238,64 @@ bool SIFixSGPRCopies::runOnMachineFunction(MachineFunction &MF) {
>
> // If a PHI node defines an SGPR and any of its operands are VGPRs,
> // then we need to move it to the VALU.
> + //
> + // Also, if a PHI node defines an SGPR and has all SGPR operands
> + // we must move it to the VALU, because the SGPR operands will
> + // all end up being assigned the same register, which means
> + // there is a potential for a conflict if different threads.
"for a conflict if different threads." doesn't sound right
> + //
> + // For Example:
> + //
> + // sgpr0 = def;
> + // ...
> + // sgpr1 = def;
> + // ...
> + // sgpr2 = PHI sgpr0, sgpr1
> + // use sgpr2;
> + //
> + // Will Become:
> + //
> + // sgpr2 = def;
> + // ...
> + // sgpr2 = def;
> + // ...
> + // use sgpr2
> + //
> + // FIXME: This is OK if the branching decision is made based on an
> + // SGPR value.
> + bool SGPRBranch = false;
> +
> + // There is one exception to this rule is when one of the operands
The one exception to this rule is when one of the operands
> + // is defined by a SI_BREAK, SI_IF_BREAK, or SI_ELSE_BREAK
> + // instruction. In this case, there we know the program will
> + // never enter the second block (the loop) without entering
> + // the first block (where the condition is computed), so there
> + // is no chance for values to be over-written.
> +
> + bool HasBreakDef = false;
> for (unsigned i = 1; i < MI.getNumOperands(); i+=2) {
> unsigned Reg = MI.getOperand(i).getReg();
> if (TRI->hasVGPRs(MRI.getRegClass(Reg))) {
> TII->moveToVALU(MI);
> break;
> }
> + MachineInstr *DefInstr = MRI.getUniqueVRegDef(Reg);
> + assert(DefInstr);
> + switch(DefInstr->getOpcode()) {
Missing space after switch
> + case AMDGPU::SI_BREAK:
> + case AMDGPU::SI_IF_BREAK:
> + case AMDGPU::SI_ELSE_BREAK:
> + // If we see a PHI instruction that defines an SGPR, then that PHI
> + // instruction has already been considered and should have
> + // a *_BREAK as an operand.
> + case AMDGPU::PHI:
> + HasBreakDef = true;
> + break;
> + }
> }
>
> + if (!SGPRBranch && !HasBreakDef)
> + TII->moveToVALU(MI);
> break;
> }
> case AMDGPU::REG_SEQUENCE: {
> -- 1.8.5.5
>
> 0005-R600-SI-Enable-selecting-SALU-inside-branches.patch
>
>
> From 05ee026c38c23dceec0ceac7c9dd351a7c6be308 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Thu, 31 Jul 2014 18:59:36 -0400
> Subject: [PATCH 5/5] R600/SI: Enable selecting SALU inside branches
>
> We can do this now that the FixSGPRLiveRanges pass is working.
> ---
> lib/Target/R600/AMDGPUISelDAGToDAG.cpp | 18 ---------------
> lib/Target/R600/SIInstructions.td | 9 --------
> test/CodeGen/R600/add.ll | 4 ++--
> test/CodeGen/R600/ctpop.ll | 22 +++++++++---------
> test/CodeGen/R600/ctpop64.ll | 27 +++++++++++-----------
> test/CodeGen/R600/mul.ll | 4 ++--
> test/CodeGen/R600/sgpr-control-flow.ll | 41 ++++++++++++++++++++++++++++++++--
> test/CodeGen/R600/xor.ll | 3 +--
> 8 files changed, 68 insertions(+), 60 deletions(-)
>
> diff --git a/lib/Target/R600/AMDGPUISelDAGToDAG.cpp b/lib/Target/R600/AMDGPUISelDAGToDAG.cpp
> index 3c68e45..a75a2ac 100644
> --- a/lib/Target/R600/AMDGPUISelDAGToDAG.cpp
> +++ b/lib/Target/R600/AMDGPUISelDAGToDAG.cpp
> @@ -77,11 +77,6 @@ private:
> bool isLocalLoad(const LoadSDNode *N) const;
> bool isRegionLoad(const LoadSDNode *N) const;
>
> - /// \returns True if the current basic block being selected is at control
> - /// flow depth 0. Meaning that the current block dominates the
> - // exit block.
> - bool isCFDepth0() const;
> -
> const TargetRegisterClass *getOperandRegClass(SDNode *N, unsigned OpNo) const;
> bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
> bool SelectGlobalValueVariableOffset(SDValue Addr, SDValue &BaseReg,
> @@ -591,14 +586,6 @@ bool AMDGPUDAGToDAGISel::isPrivateLoad(const LoadSDNode *N) const {
> return false;
> }
>
> -bool AMDGPUDAGToDAGISel::isCFDepth0() const {
> - // FIXME: Figure out a way to use DominatorTree analysis here.
> - const BasicBlock *CurBlock = FuncInfo->MBB->getBasicBlock();
> - const Function *Fn = FuncInfo->Fn;
> - return &Fn->front() == CurBlock || &Fn->back() == CurBlock;
> -}
> -
> -
> const char *AMDGPUDAGToDAGISel::getPassName() const {
> return "AMDGPU DAG->DAG Pattern Instruction Selection";
> }
> @@ -704,11 +691,6 @@ SDNode *AMDGPUDAGToDAGISel::SelectADD_SUB_I64(SDNode *N) {
> unsigned Opc = IsAdd ? AMDGPU::S_ADD_U32 : AMDGPU::S_SUB_U32;
> unsigned CarryOpc = IsAdd ? AMDGPU::S_ADDC_U32 : AMDGPU::S_SUBB_U32;
>
> - if (!isCFDepth0()) {
> - Opc = IsAdd ? AMDGPU::V_ADD_I32_e32 : AMDGPU::V_SUB_I32_e32;
> - CarryOpc = IsAdd ? AMDGPU::V_ADDC_U32_e32 : AMDGPU::V_SUBB_U32_e32;
> - }
> -
> SDNode *AddLo = CurDAG->getMachineNode( Opc, DL, VTList, AddLoArgs);
> SDValue Carry(AddLo, 1);
> SDNode *AddHi
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index 212c230..9418622 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -32,12 +32,9 @@ def isSI : Predicate<"Subtarget.getGeneration() "
> def isCI : Predicate<"Subtarget.getGeneration() "
> ">= AMDGPUSubtarget::SEA_ISLANDS">;
>
> -def isCFDepth0 : Predicate<"isCFDepth0()">;
> -
> def WAIT_FLAG : InstFlag<"printWaitFlag">;
>
> let SubtargetPredicate = isSI in {
> -let OtherPredicates = [isCFDepth0] in {
>
> //===----------------------------------------------------------------------===//
> // SMRD Instructions
> @@ -364,8 +361,6 @@ def S_GETREG_REGRD_B32 : SOPK_32 <0x00000014, "S_GETREG_REGRD_B32", []>;
> //def S_SETREG_IMM32_B32 : SOPK_32 <0x00000015, "S_SETREG_IMM32_B32", []>;
> //def EXP : EXP_ <0x00000000, "EXP", []>;
>
> -} // End let OtherPredicates = [isCFDepth0]
> -
> //===----------------------------------------------------------------------===//
> // SOPP Instructions
> //===----------------------------------------------------------------------===//
> @@ -1851,8 +1846,6 @@ def : Pat <
> // SOP1 Patterns
> //===----------------------------------------------------------------------===//
>
> -let Predicates = [isSI, isCFDepth0] in {
> -
> def : Pat <
> (i64 (ctpop i64:$src)),
> (INSERT_SUBREG (INSERT_SUBREG (i64 (IMPLICIT_DEF)),
> @@ -1871,8 +1864,6 @@ def : Pat <
> (S_ADD_U32 $src0, $src1)
> >;
>
> -} // Predicates = [isSI, isCFDepth0]
> -
> let Predicates = [isSI] in {
>
> //===----------------------------------------------------------------------===//
> diff --git a/test/CodeGen/R600/add.ll b/test/CodeGen/R600/add.ll
> index f62c9d6..fbe6437 100644
> --- a/test/CodeGen/R600/add.ll
> +++ b/test/CodeGen/R600/add.ll
> @@ -145,8 +145,8 @@ entry:
> ; branches.
> ; FIXME: We are being conservative here. We could allow this in some cases.
This FIXME can be removed now
> ; FUNC-LABEL: @add64_in_branch
> -; SI-CHECK-NOT: S_ADD_I32
> -; SI-CHECK-NOT: S_ADDC_U32
> +; SI-CHECK: S_ADD_U32
> +; SI-CHECK: S_ADDC_U32
> define void @add64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
> entry:
> %0 = icmp eq i64 %a, 0
> diff --git a/test/CodeGen/R600/ctpop.ll b/test/CodeGen/R600/ctpop.ll
> index c7c406a..fe46900 100644
> --- a/test/CodeGen/R600/ctpop.ll
> +++ b/test/CodeGen/R600/ctpop.ll
> @@ -257,28 +257,28 @@ define void @v_ctpop_i32_add_vvar_inv(i32 addrspace(1)* noalias %out, i32 addrsp
> ; but there are some cases when the should be allowed.
>
> ; FUNC-LABEL: @ctpop_i32_in_br
> -; SI: BUFFER_LOAD_DWORD [[VAL:v[0-9]+]],
> -; SI: V_BCNT_U32_B32_e64 [[RESULT:v[0-9]+]], [[VAL]], 0
> +; SI: S_LOAD_DWORD [[VAL:s[0-9]+]], s[{{[0-9]+:[0-9]+}}], 0xd
> +; SI: S_BCNT1_I32_B32 [[SRESULT:s[0-9]+]], [[VAL]]
> +; SI: V_MOV_B32_e32 [[RESULT]], [[SRESULT]]
> ; SI: BUFFER_STORE_DWORD [[RESULT]],
> ; SI: S_ENDPGM
> ; EG: BCNT_INT
> -define void @ctpop_i32_in_br(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %cond) {
> +define void @ctpop_i32_in_br(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %ctpop_arg, i32 %cond) {
> entry:
> - %0 = icmp eq i32 %cond, 0
> - br i1 %0, label %if, label %else
> + %tmp0 = icmp eq i32 %cond, 0
> + br i1 %tmp0, label %if, label %else
>
> if:
> - %1 = load i32 addrspace(1)* %in
> - %2 = call i32 @llvm.ctpop.i32(i32 %1)
> + %tmp2 = call i32 @llvm.ctpop.i32(i32 %ctpop_arg)
> br label %endif
>
> else:
> - %3 = getelementptr i32 addrspace(1)* %in, i32 1
> - %4 = load i32 addrspace(1)* %3
> + %tmp3 = getelementptr i32 addrspace(1)* %in, i32 1
> + %tmp4 = load i32 addrspace(1)* %tmp3
> br label %endif
>
> endif:
> - %5 = phi i32 [%2, %if], [%4, %else]
> - store i32 %5, i32 addrspace(1)* %out
> + %tmp5 = phi i32 [%tmp2, %if], [%tmp4, %else]
> + store i32 %tmp5, i32 addrspace(1)* %out
> ret void
> }
> diff --git a/test/CodeGen/R600/ctpop64.ll b/test/CodeGen/R600/ctpop64.ll
> index 37a174f..76091c5 100644
> --- a/test/CodeGen/R600/ctpop64.ll
> +++ b/test/CodeGen/R600/ctpop64.ll
> @@ -94,29 +94,28 @@ define void @v_ctpop_v4i64(<4 x i32> addrspace(1)* noalias %out, <4 x i64> addrs
> ; but there are some cases when the should be allowed.
>
> ; FUNC-LABEL: @ctpop_i64_in_br
> -; SI: V_BCNT_U32_B32_e64 [[BCNT_LO:v[0-9]+]], v{{[0-9]+}}, 0
> -; SI: V_BCNT_U32_B32_e32 v[[BCNT:[0-9]+]], v{{[0-9]+}}, [[BCNT_LO]]
> -; SI: V_MOV_B32_e32 v[[ZERO:[0-9]+]], 0
> -; SI: BUFFER_STORE_DWORDX2 v[
> -; SI: [[BCNT]]:[[ZERO]]]
> +; SI: S_LOAD_DWORDX2 s{{\[}}[[LOVAL:[0-9]+]]:[[HIVAL:[0-9]+]]{{\]}}, s[{{[0-9]+:[0-9]+}}], 0xd
> +; SI: S_BCNT1_I32_B64 [[RESULT:s[0-9]+]], {{s\[}}[[LOVAL]]:[[HIVAL]]{{\]}}
> +; SI: V_MOV_B32_e32 v[[VLO:[0-9]+]], [[RESULT]]
> +; SI: V_MOV_B32_e32 v[[VHI:[0-9]+]], s[[HIVAL]]
> +; SI: BUFFER_STORE_DWORDX2 {{v\[}}[[VLO]]:[[VHI]]{{\]}}
> ; SI: S_ENDPGM
> -define void @ctpop_i64_in_br(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i32 %cond) {
> +define void @ctpop_i64_in_br(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %ctpop_arg, i32 %cond) {
> entry:
> - %0 = icmp eq i32 %cond, 0
> - br i1 %0, label %if, label %else
> + %tmp0 = icmp eq i32 %cond, 0
> + br i1 %tmp0, label %if, label %else
>
> if:
> - %1 = load i64 addrspace(1)* %in
> - %2 = call i64 @llvm.ctpop.i64(i64 %1)
> + %tmp2 = call i64 @llvm.ctpop.i64(i64 %ctpop_arg)
> br label %endif
>
> else:
> - %3 = getelementptr i64 addrspace(1)* %in, i32 1
> - %4 = load i64 addrspace(1)* %3
> + %tmp3 = getelementptr i64 addrspace(1)* %in, i32 1
> + %tmp4 = load i64 addrspace(1)* %tmp3
> br label %endif
>
> endif:
> - %5 = phi i64 [%2, %if], [%4, %else]
> - store i64 %5, i64 addrspace(1)* %out
> + %tmp5 = phi i64 [%tmp2, %if], [%tmp4, %else]
> + store i64 %tmp5, i64 addrspace(1)* %out
> ret void
> }
> diff --git a/test/CodeGen/R600/mul.ll b/test/CodeGen/R600/mul.ll
> index fe9c1b9..11de3e3 100644
> --- a/test/CodeGen/R600/mul.ll
> +++ b/test/CodeGen/R600/mul.ll
> @@ -155,7 +155,7 @@ define void @v_mul_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %aptr, i64 addr
> }
>
> ; FUNC-LABEL: @mul32_in_branch
> -; SI: V_MUL_LO_I32
> +; SI: S_MUL_I32
> define void @mul32_in_branch(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %a, i32 %b, i32 %c) {
> entry:
> %0 = icmp eq i32 %a, 0
> @@ -176,7 +176,7 @@ endif:
> }
>
> ; FUNC-LABEL: @mul64_in_branch
> -; SI-DAG: V_MUL_LO_I32
> +; SI-DAG: S_MUL_I32
> ; SI-DAG: V_MUL_HI_U32
> ; SI: S_ENDPGM
> define void @mul64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
> diff --git a/test/CodeGen/R600/sgpr-control-flow.ll b/test/CodeGen/R600/sgpr-control-flow.ll
> index 06ad24d..326b37a 100644
> --- a/test/CodeGen/R600/sgpr-control-flow.ll
> +++ b/test/CodeGen/R600/sgpr-control-flow.ll
> @@ -4,9 +4,14 @@
> ; Most SALU instructions ignore control flow, so we need to make sure
> ; they don't overwrite values from other blocks.
>
> -; SI-NOT: S_ADD
> +; If the branch decision is made based on a value in an SGPR then all
> +; threads will execute the same code paths, so we don't need to worry
> +; about instructions in different blocks overwriting each other.
> +; SI-LABEL: @sgpr_if_else_salu_br
> +; SI: S_ADD
> +; SI: S_ADD
>
> -define void @sgpr_if_else(i32 addrspace(1)* %out, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
> +define void @sgpr_if_else_salu_br(i32 addrspace(1)* %out, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
> entry:
> %0 = icmp eq i32 %a, 0
> br i1 %0, label %if, label %else
> @@ -25,3 +30,35 @@ endif:
> store i32 %4, i32 addrspace(1)* %out
> ret void
> }
> +
> +; The two S_ADD instructions should write to different registers, since
> +; different threads will take different control flow paths.
> +
> +; SI-LABEL: @sgpr_if_else_valu_br
> +; SI: S_ADD_I32 [[SGPR:s[0-9]+]]
> +; SI-NOT: S_ADD_I32 [[SGPR]]
> +
> +define void @sgpr_if_else_valu_br(i32 addrspace(1)* %out, float %a, i32 %b, i32 %c, i32 %d, i32 %e) {
> +entry:
> + %tid = call i32 @llvm.r600.read.tidig.x() #0
> + %tid_f = uitofp i32 %tid to float
> + %tmp1 = fcmp ueq float %tid_f, 0.0
> + br i1 %tmp1, label %if, label %else
> +
> +if:
> + %tmp2 = add i32 %b, %c
> + br label %endif
> +
> +else:
> + %tmp3 = add i32 %d, %e
> + br label %endif
> +
> +endif:
> + %tmp4 = phi i32 [%tmp2, %if], [%tmp3, %else]
> + store i32 %tmp4, i32 addrspace(1)* %out
> + ret void
> +}
> +
> +declare i32 @llvm.r600.read.tidig.x() #0
> +
> +attributes #0 = { readnone }
> diff --git a/test/CodeGen/R600/xor.ll b/test/CodeGen/R600/xor.ll
> index e14bd71..8c2c80e 100644
> --- a/test/CodeGen/R600/xor.ll
> +++ b/test/CodeGen/R600/xor.ll
> @@ -136,8 +136,7 @@ define void @vector_not_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in0, i64
> ; use an SALU instruction for this.
>
> ; SI-CHECK-LABEL: @xor_cf
> -; SI-CHECK: V_XOR
> -; SI-CHECK: V_XOR
> +; SI-CHECK: S_XOR_B64
> define void @xor_cf(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b) {
> entry:
> %0 = icmp eq i64 %a, 0
> -- 1.8.5.5
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140908/5b83bc2e/attachment.html>
More information about the llvm-commits
mailing list