R600: Various Fixes
Vincent Lejeune
vljn at ovi.com
Tue Jul 9 06:34:42 PDT 2013
Patch 2 can be a good candidate for the stable branch,
Patch 5 too (updated version attached) but I'm not sure it will apply cleanly.
----- Mail original -----
> De : Tom Stellard <tom at stellard.net>
> À : Vincent Lejeune <vljn at ovi.com>
> Cc : "llvm-commits at cs.uiuc.edu" <llvm-commits at cs.uiuc.edu>
> Envoyé le : Lundi 8 juillet 2013 18h11
> Objet : Re: R600: Various Fixes
>
> On Sun, Jul 07, 2013 at 07:34:27AM -0700, Vincent Lejeune wrote:
>> Hi,
>>
>> This patches serie fixes some bugs uncovered by running some webgl samples.
>>
>> Patches 1, 3, 4 fix 2 bugs in the VectorMerger pass.
>> Patch 2 makes fcos/fsin works for all values, and not only the one between
> -Pi and Pi.
>> Last patch rework the predication we're doing by taking clause marker
> into account.
>> Currently we loose some predicate opportunities (all machine block with
> Kcache constant
>> becomes unpredicable atm, this can be solved later but need the
> isPredicableFunction to accept
>> an extra argument) but the pass output should be correct in most
> situations.
>>
>>
>
> Patches 1-4 are:
>
> Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
>
> Are any of these candidates for stable?
>
> -Tom
>
>> From cc427492b38d672f33ffd13ac3762d8064476663 Mon Sep 17 00:00:00 2001
>> From: Vincent Lejeune <vljn at ovi.com>
>> Date: Wed, 3 Jul 2013 21:25:38 +0200
>> Subject: [PATCH 5/5] R600: Do not predicated basic block with multiple alu
>> clause
>>
>> Test is not included as it is several 1000 lines long.
>> To test this functionnality, a test case must generate at least 2 ALU
> clauses,
>> where an ALU clause is ~110 instructions long.
>> ---
>> lib/Target/R600/AMDGPUTargetMachine.cpp | 3 +-
>> lib/Target/R600/R600ControlFlowFinalizer.cpp | 2 ++
>> lib/Target/R600/R600EmitClauseMarkers.cpp | 10 ++++---
>> lib/Target/R600/R600InstrInfo.cpp | 45
> ++++++++++++++++++++++++++++
>> lib/Target/R600/R600Instructions.td | 2 +-
>> lib/Target/R600/R600Packetizer.cpp | 3 +-
>> test/CodeGen/R600/jump-address.ll | 2 +-
>> 7 files changed, 58 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp
> b/lib/Target/R600/AMDGPUTargetMachine.cpp
>> index 90f72de..59b1fd6 100644
>> --- a/lib/Target/R600/AMDGPUTargetMachine.cpp
>> +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
>> @@ -148,7 +148,7 @@ bool AMDGPUPassConfig::addPostRegAlloc() {
>> }
>>
>> bool AMDGPUPassConfig::addPreSched2() {
>> -
>> + addPass(createR600EmitClauseMarkers(*TM));
>
> Does adding this pass here mean it will be run for SI too?
>
>> addPass(&IfConverterID);
>> return false;
>> }
>> @@ -158,7 +158,6 @@ bool AMDGPUPassConfig::addPreEmitPass() {
>> if (ST.getGeneration() <= AMDGPUSubtarget::NORTHERN_ISLANDS) {
>> addPass(createAMDGPUCFGPreparationPass(*TM));
>> addPass(createAMDGPUCFGStructurizerPass(*TM));
>> - addPass(createR600EmitClauseMarkers(*TM));
>> addPass(createR600ExpandSpecialInstrsPass(*TM));
>> addPass(&FinalizeMachineBundlesID);
>> addPass(createR600Packetizer(*TM));
>> diff --git a/lib/Target/R600/R600ControlFlowFinalizer.cpp
> b/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> index 887c808..932a6a7 100644
>> --- a/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> +++ b/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> @@ -256,6 +256,7 @@ private:
>> ClauseContent.push_back(MILit);
>> }
>> }
>> + assert(ClauseContent.size() < 128 && "ALU clause is
> too big");
>> ClauseHead->getOperand(7).setImm(ClauseContent.size() - 1);
>> return ClauseFile(ClauseHead, ClauseContent);
>> }
>> @@ -276,6 +277,7 @@ private:
>> void
>> EmitALUClause(MachineBasicBlock::iterator InsertPos, ClauseFile
> &Clause,
>> unsigned &CfCount) {
>> + Clause.first->getOperand(0).setImm(0);
>> CounterPropagateAddr(Clause.first, CfCount);
>> MachineBasicBlock *BB = Clause.first->getParent();
>> BuildMI(BB, InsertPos->getDebugLoc(),
> TII->get(AMDGPU::ALU_CLAUSE))
>> diff --git a/lib/Target/R600/R600EmitClauseMarkers.cpp
> b/lib/Target/R600/R600EmitClauseMarkers.cpp
>> index 0aea2d7..8056634 100644
>> --- a/lib/Target/R600/R600EmitClauseMarkers.cpp
>> +++ b/lib/Target/R600/R600EmitClauseMarkers.cpp
>> @@ -32,6 +32,7 @@ class R600EmitClauseMarkersPass : public
> MachineFunctionPass {
>> private:
>> static char ID;
>> const R600InstrInfo *TII;
>> + int address;
>
> Coding Style address should be capitalized.
>>
>> unsigned OccupiedDwords(MachineInstr *MI) const {
>> switch (MI->getOpcode()) {
>> @@ -159,7 +160,7 @@ private:
>> }
>>
>> MachineBasicBlock::iterator
>> - MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator I)
> const {
>> + MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator I)
> {
>> MachineBasicBlock::iterator ClauseHead = I;
>> std::vector<std::pair<unsigned, unsigned> > KCacheBanks;
>> bool PushBeforeModifier = false;
>> @@ -199,20 +200,21 @@ private:
>> unsigned Opcode = PushBeforeModifier ?
>> AMDGPU::CF_ALU_PUSH_BEFORE : AMDGPU::CF_ALU;
>> BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead),
> TII->get(Opcode))
>> - .addImm(0) // ADDR
>> + .addImm(address++) // ADDR
>> .addImm(KCacheBanks.empty()?0:KCacheBanks[0].first) // KB0
>> .addImm((KCacheBanks.size() < 2)?0:KCacheBanks[1].first) // KB1
>> .addImm(KCacheBanks.empty()?0:2) // KM0
>> .addImm((KCacheBanks.size() < 2)?0:2) // KM1
>> .addImm(KCacheBanks.empty()?0:KCacheBanks[0].second) // KLINE0
>> .addImm((KCacheBanks.size() < 2)?0:KCacheBanks[1].second) //
> KLINE1
>> - .addImm(AluInstCount); // COUNT
>> + .addImm(AluInstCount) // COUNT
>> + .addImm(1); // Enabled
>> return I;
>> }
>>
>> public:
>> R600EmitClauseMarkersPass(TargetMachine &tm) :
> MachineFunctionPass(ID),
>> - TII(0) { }
>> + TII(0), address(0) { }
>>
>> virtual bool runOnMachineFunction(MachineFunction &MF) {
>> TII = static_cast<const R600InstrInfo
> *>(MF.getTarget().getInstrInfo());
>> diff --git a/lib/Target/R600/R600InstrInfo.cpp
> b/lib/Target/R600/R600InstrInfo.cpp
>> index 974a579..8d622d6 100644
>> --- a/lib/Target/R600/R600InstrInfo.cpp
>> +++ b/lib/Target/R600/R600InstrInfo.cpp
>> @@ -650,6 +650,17 @@ int R600InstrInfo::getBranchInstr(const MachineOperand
> &op) const {
>> };
>> }
>>
>> +static
>> +MachineBasicBlock::iterator FindLastAluClause(MachineBasicBlock &MBB)
> {
>> + for (MachineBasicBlock::reverse_iterator It = MBB.rbegin(), E =
> MBB.rend();
>> + It != E; ++It) {
>> + if (It->getOpcode() == AMDGPU::CF_ALU ||
>> + It->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE)
>> + return llvm::prior(It.base());
>> + }
>> + return MBB.end();
>> +}
>> +
>> unsigned
>> R600InstrInfo::InsertBranch(MachineBasicBlock &MBB,
>> MachineBasicBlock *TBB,
>> @@ -671,6 +682,11 @@ R600InstrInfo::InsertBranch(MachineBasicBlock
> &MBB,
>> BuildMI(&MBB, DL, get(AMDGPU::JUMP_COND))
>> .addMBB(TBB)
>> .addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);
>> + MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
>> + if (CfAlu == MBB.end())
>> + return 1;
>> + assert (CfAlu->getOpcode() == AMDGPU::CF_ALU);
>> + CfAlu->setDesc(get(AMDGPU::CF_ALU_PUSH_BEFORE));
>> return 1;
>> }
>> } else {
>> @@ -682,6 +698,11 @@ R600InstrInfo::InsertBranch(MachineBasicBlock
> &MBB,
>> .addMBB(TBB)
>> .addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);
>> BuildMI(&MBB, DL, get(AMDGPU::JUMP)).addMBB(FBB);
>> + MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
>> + if (CfAlu == MBB.end())
>> + return 2;
>> + assert (CfAlu->getOpcode() == AMDGPU::CF_ALU);
>> + CfAlu->setDesc(get(AMDGPU::CF_ALU_PUSH_BEFORE));
>> return 2;
>> }
>> }
>> @@ -705,6 +726,11 @@ R600InstrInfo::RemoveBranch(MachineBasicBlock
> &MBB) const {
>> MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);
>> clearFlag(predSet, 0, MO_FLAG_PUSH);
>> I->eraseFromParent();
>> + MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
>> + if (CfAlu == MBB.end())
>> + break;
>> + assert (CfAlu->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE);
>> + CfAlu->setDesc(get(AMDGPU::CF_ALU));
>> break;
>> }
>> case AMDGPU::JUMP:
>> @@ -725,6 +751,11 @@ R600InstrInfo::RemoveBranch(MachineBasicBlock
> &MBB) const {
>> MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);
>> clearFlag(predSet, 0, MO_FLAG_PUSH);
>> I->eraseFromParent();
>> + MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
>> + if (CfAlu == MBB.end())
>> + break;
>> + assert (CfAlu->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE);
>> + CfAlu->setDesc(get(AMDGPU::CF_ALU));
>> break;
>> }
>> case AMDGPU::JUMP:
>> @@ -759,6 +790,15 @@ R600InstrInfo::isPredicable(MachineInstr *MI) const {
>>
>> if (MI->getOpcode() == AMDGPU::KILLGT) {
>> return false;
>> + } else if (MI->getOpcode() == AMDGPU::CF_ALU) {
>> + // If the clause start in the middle of MBB then the MBB has more
>> + // than a single clause, unable to predicate several clauses.
>> + if (MI->getParent()->begin() != MachineBasicBlock::iterator(MI))
>> + return false;
>> + // TODO: We don't support KC merging atm
>> + if (MI->getOperand(3).getImm() != 0 ||
> MI->getOperand(4).getImm() != 0)
>> + return false;
>> + return true;
>> } else if (isVector(*MI)) {
>> return false;
>> } else {
>> @@ -854,6 +894,11 @@ R600InstrInfo::PredicateInstruction(MachineInstr *MI,
>> const SmallVectorImpl<MachineOperand>
> &Pred) const {
>> int PIdx = MI->findFirstPredOperandIdx();
>>
>> + if (MI->getOpcode() == AMDGPU::CF_ALU) {
>> + MI->getOperand(8).setImm(0);
>> + return true;
>> + }
>> +
>> if (PIdx != -1) {
>> MachineOperand &PMO = MI->getOperand(PIdx);
>> PMO.setReg(Pred[2].getReg());
>> diff --git a/lib/Target/R600/R600Instructions.td
> b/lib/Target/R600/R600Instructions.td
>> index 735dcfc..df5c438 100644
>> --- a/lib/Target/R600/R600Instructions.td
>> +++ b/lib/Target/R600/R600Instructions.td
>> @@ -563,7 +563,7 @@ class ALU_CLAUSE<bits<4> inst, string
> OpName> : AMDGPUInst <(outs),
>> (ins i32imm:$ADDR, i32imm:$KCACHE_BANK0, i32imm:$KCACHE_BANK1,
>> KCACHE:$KCACHE_MODE0, KCACHE:$KCACHE_MODE1,
>> i32imm:$KCACHE_ADDR0, i32imm:$KCACHE_ADDR1,
>> -i32imm:$COUNT),
>> +i32imm:$COUNT, i32imm:$Enabled),
>> !strconcat(OpName, " $COUNT, @$ADDR, "
>> "KC0[$KCACHE_MODE0], KC1[$KCACHE_MODE1]"),
>> [] >, CF_ALU_WORD0, CF_ALU_WORD1 {
>> diff --git a/lib/Target/R600/R600Packetizer.cpp
> b/lib/Target/R600/R600Packetizer.cpp
>> index 5ee51fa..f4219bd 100644
>> --- a/lib/Target/R600/R600Packetizer.cpp
>> +++ b/lib/Target/R600/R600Packetizer.cpp
>> @@ -304,7 +304,8 @@ bool
> R600Packetizer::runOnMachineFunction(MachineFunction &Fn) {
>> MachineBasicBlock::iterator End = MBB->end();
>> MachineBasicBlock::iterator MI = MBB->begin();
>> while (MI != End) {
>> - if (MI->isKill()) {
>> + if (MI->isKill() ||
>> + (MI->getOpcode() == AMDGPU::CF_ALU &&
> !MI->getOperand(8).getImm())) {
>> MachineBasicBlock::iterator DeleteMI = MI;
>> ++MI;
>> MBB->erase(DeleteMI);
>> diff --git a/test/CodeGen/R600/jump-address.ll
> b/test/CodeGen/R600/jump-address.ll
>> index ae9c8bb..9a5f1bc 100644
>> --- a/test/CodeGen/R600/jump-address.ll
>> +++ b/test/CodeGen/R600/jump-address.ll
>> @@ -1,6 +1,6 @@
>> ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>>
>> -; CHECK: JUMP @3
>> +; CHECK: JUMP @7
>> ; CHECK: EXPORT
>> ; CHECK-NOT: EXPORT
>>
>> --
>> 1.8.3.1
>>
>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-R600-Do-not-predicated-basic-block-with-multiple-alu.patch
Type: text/x-patch
Size: 10242 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130709/913ababc/attachment.bin>
More information about the llvm-commits
mailing list