R600: Generate native ALU instructions
Vincent Lejeune
vljn at ovi.com
Mon Apr 29 15:06:04 PDT 2013
Hi,
Here are the updated patches.
Vincent
----- Original Message -----
> From: Tom Stellard <tom at stellard.net>
> To: Vincent Lejeune <vljn at ovi.com>
> Cc: "llvm-commits at cs.uiuc.edu" <llvm-commits at cs.uiuc.edu>
> Sent: Monday, April 29, 2013, 7:16 PM
> Subject: Re: R600: Generate native ALU instructions
>
> On Mon, Apr 29, 2013 at 10:09:00AM -0700, Tom Stellard wrote:
>> On Sat, Apr 27, 2013 at 04:12:36PM -0700, Vincent Lejeune wrote:
>> > I worked on the patch series.
>> > I added a FeatureVertexCache to determine which cache to use for vtx fetch, and fixed the mega fetch bit.
>> > On the packetizer side, isTransOnly now uses scheduling info. I also implemented PV substitution.
>> >
>> > Vincent
>> >
>>
>> Hi Vincent,
>>
>> Looks good, just a few more comments.
>>
>
> Also, I forgot to mention: we should always be using the texture cache
> for compute shaders. See the attached patch.
>
> -Tom
>>
>> > From 2f50294f2ac7b52bb2fcf40a9048a19f7421cf5d Mon Sep 17 00:00:00 2001
>> > From: Vincent Lejeune <vljn at ovi.com>
>> > Date: Sat, 20 Apr 2013 03:10:21 +0200
>> > Subject: [PATCH 9/9] R600: use native for alu
>> >
>> > ---
>> > lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp | 7 +-
>> > lib/Target/R600/R600ControlFlowFinalizer.cpp | 110 ++++++++++++++++++++-
>> > lib/Target/R600/R600Instructions.td | 17 ++++
>> > lib/Target/R600/R600RegisterInfo.td | 5 +-
>> > test/CodeGen/R600/alu-split.ll | 1 +
>> > .../CodeGen/R600/disconnected-predset-break-bug.ll | 2 +-
>> > test/CodeGen/R600/predicates.ll | 8 +-
>> > 7 files changed, 141 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
> b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
>> > index bc5c9d8..7c83d86 100644
>> > --- a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
>> > +++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
>> > @@ -143,6 +143,7 @@ void R600MCCodeEmitter::EncodeInstruction(const
> MCInst &MI, raw_ostream &OS,
>> > EmitFCInstr(MI, OS);
>> > } else if (MI.getOpcode() == AMDGPU::RETURN ||
>> > MI.getOpcode() == AMDGPU::FETCH_CLAUSE ||
>> > + MI.getOpcode() == AMDGPU::ALU_CLAUSE ||
>> > MI.getOpcode() == AMDGPU::BUNDLE ||
>> > MI.getOpcode() == AMDGPU::KILL) {
>> > return;
>> > @@ -255,7 +256,7 @@ void R600MCCodeEmitter::EncodeInstruction(const
> MCInst &MI, raw_ostream &OS,
>> > case AMDGPU::CF_ALU:
>> > case AMDGPU::CF_ALU_PUSH_BEFORE: {
>> > uint64_t Inst = getBinaryCodeForInstr(MI, Fixups);
>> > - EmitByte(INSTR_CFALU, OS);
>> > + EmitByte(INSTR_NATIVE, OS);
>>
>> Can we eliminate all of the instruction type bytes now? Is everything
>> INSTR_NATIVE?
>>
>> > Emit(Inst, OS);
>> > break;
>> > }
>> > @@ -294,7 +295,9 @@ void R600MCCodeEmitter::EncodeInstruction(const
> MCInst &MI, raw_ostream &OS,
>> > break;
>> > }
>> > default:
>> > - EmitALUInstr(MI, Fixups, OS);
>> > + uint64_t Inst = getBinaryCodeForInstr(MI, Fixups);
>> > + EmitByte(INSTR_NATIVE, OS);
>> > + Emit(Inst, OS);
>> > break;
>> > }
>> > }
>> > diff --git a/lib/Target/R600/R600ControlFlowFinalizer.cpp
> b/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> > index 9e23c25..9471cec 100644
>> > --- a/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> > +++ b/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> > @@ -165,6 +165,97 @@ private:
>> > return ClauseFile(MIb, ClauseContent);
>> > }
>> >
>> > + void getLiteral(MachineInstr *MI, std::vector<unsigned> &Lits) const {
>> > + unsigned LiteralRegs[] = {
>> > + AMDGPU::ALU_LITERAL_X,
>> > + AMDGPU::ALU_LITERAL_Y,
>> > + AMDGPU::ALU_LITERAL_Z,
>> > + AMDGPU::ALU_LITERAL_W
>> > + };
>> > + for (unsigned i = 0, e = MI->getNumOperands(); i < e; ++i) {
>> > + MachineOperand &MO = MI->getOperand(i);
>> > + if (!MO.isReg())
>> > + continue;
>> > + if (MO.getReg() != AMDGPU::ALU_LITERAL_X)
>> > + continue;
>> > + unsigned ImmIdx = TII->getOperandIdx(MI->getOpcode(), R600Operands::IMM);
>> > + int64_t Imm = MI->getOperand(ImmIdx).getImm();
>> > + std::vector<unsigned>::iterator It =
>> > + std::find(Lits.begin(), Lits.end(), Imm);
>> > + if (It != Lits.end()) {
>> > + unsigned Index = It - Lits.begin();
>> > + MO.setReg(LiteralRegs[Index]);
>> > + } else {
>> > + assert(Lits.size() < 4 && "Too many literals in Instruction Group");
>> > + MO.setReg(LiteralRegs[Lits.size()]);
>> > + Lits.push_back(Imm);
>> > + }
>> > + }
>> > + }
>> > +
>> > + MachineBasicBlock::iterator insertLiterals(
>> > + MachineBasicBlock::iterator InsertPos,
>> > + const std::vector<unsigned> &Literals) const {
>> > + MachineBasicBlock *MBB = InsertPos->getParent();
>> > + for (unsigned i = 0, e = Literals.size(); i < e; i+=2) {
>> > + unsigned LiteralPair0 = Literals[i];
>> > + unsigned LiteralPair1 = (i + 1 < e)?Literals[i + 1]:0;
>> > + InsertPos = BuildMI(MBB, InsertPos->getDebugLoc(),
>> > + TII->get(AMDGPU::LITERALS))
>> > + .addImm(LiteralPair0)
>> > + .addImm(LiteralPair1);
>> > + }
>> > + return InsertPos;
>> > + }
>> > +
>> > + ClauseFile
>> > + MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator &I)
>> > + const {
>> > + MachineBasicBlock::iterator ClauseHead = I;
>> > + std::vector<MachineInstr *> ClauseContent;
>> > + I++;
>> > + for (MachineBasicBlock::instr_iterator E = MBB.instr_end(); I != E;) {
>> > + if (IsTrivialInst(I)) {
>> > + ++I;
>> > + continue;
>> > + }
>> > + if (!I->isBundle() && !TII->isALUInstr(I->getOpcode()))
>> > + break;
>> > + std::vector<unsigned> Literals;
>> > + if (I->isBundle()) {
>> > + MachineInstr *DeleteMI = I;
>> > + MachineBasicBlock::instr_iterator BI = I.getInstrIterator();
>> > + while (++BI != E && BI->isBundledWithPred()) {
>> > + BI->unbundleFromPred();
>> > + for (unsigned i = 0, e = BI->getNumOperands(); i != e; ++i) {
>> > + MachineOperand &MO = BI->getOperand(i);
>> > + if (MO.isReg() && MO.isInternalRead())
>> > + MO.setIsInternalRead(false);
>> > + }
>> > + getLiteral(BI, Literals);
>> > + ClauseContent.push_back(BI);
>> > + }
>> > + I = BI;
>> > + DeleteMI->eraseFromParent();
>> > + } else {
>> > + getLiteral(I, Literals);
>> > + ClauseContent.push_back(I);
>> > + I++;
>> > + }
>> > + for (unsigned i = 0, e = Literals.size(); i < e; i+=2) {
>> > + unsigned literal0 = Literals[i];
>> > + unsigned literal2 = (i + 1 < e)?Literals[i + 1]:0;
>> > + MachineInstr *MILit = BuildMI(MBB, I, I->getDebugLoc(),
>> > + TII->get(AMDGPU::LITERALS))
>> > + .addImm(literal0)
>> > + .addImm(literal2);
>> > + ClauseContent.push_back(MILit);
>> > + }
>> > + }
>> > + ClauseHead->getOperand(7).setImm(ClauseContent.size() - 1);
>> > + return ClauseFile(ClauseHead, ClauseContent);
>> > + }
>> > +
>> > void
>> > EmitFetchClause(MachineBasicBlock::iterator InsertPos, ClauseFile
> &Clause,
>> > unsigned &CfCount) {
>> > @@ -178,6 +269,19 @@ private:
>> > CfCount += 2 * Clause.second.size();
>> > }
>> >
>> > + void
>> > + EmitALUClause(MachineBasicBlock::iterator InsertPos, ClauseFile
> &Clause,
>> > + unsigned &CfCount) {
>> > + CounterPropagateAddr(Clause.first, CfCount);
>> > + MachineBasicBlock *BB = Clause.first->getParent();
>> > + BuildMI(BB, InsertPos->getDebugLoc(),
> TII->get(AMDGPU::ALU_CLAUSE))
>> > + .addImm(CfCount);
>> > + for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {
>> > + BB->splice(InsertPos, BB, Clause.second[i]);
>> > + }
>> > + CfCount += Clause.second.size();
>> > + }
>> > +
>> > void CounterPropagateAddr(MachineInstr *MI, unsigned Addr) const {
>> > MI->getOperand(0).setImm(Addr +
> MI->getOperand(0).getImm());
>> > }
>> > @@ -234,7 +338,7 @@ public:
>> > getHWInstrDesc(CF_CALL_FS));
>> > CfCount++;
>> > }
>> > - std::vector<ClauseFile> FetchClauses;
>> > + std::vector<ClauseFile> FetchClauses, AluClauses;
>> > for (MachineBasicBlock::iterator I = MBB.begin(), E =
> MBB.end();
>> > I != E;) {
>> > if (TII->usesTextureCache(I) ||
> TII->usesVertexCache(I)) {
>> > @@ -252,6 +356,8 @@ public:
>> > MaxStack = std::max(MaxStack, CurrentStack);
>> > hasPush = true;
>> > case AMDGPU::CF_ALU:
>> > + I = MI;
>> > + AluClauses.push_back(MakeALUClause(MBB, I));
>> > case AMDGPU::EG_ExportBuf:
>> > case AMDGPU::EG_ExportSwz:
>> > case AMDGPU::R600_ExportBuf:
>> > @@ -362,6 +468,8 @@ public:
>> > }
>> > for (unsigned i = 0, e = FetchClauses.size(); i < e;
> i++)
>> > EmitFetchClause(I, FetchClauses[i], CfCount);
>> > + for (unsigned i = 0, e = AluClauses.size(); i < e; i++)
>> > + EmitALUClause(I, AluClauses[i], CfCount);
>> > }
>> > default:
>> > break;
>> > diff --git a/lib/Target/R600/R600Instructions.td
> b/lib/Target/R600/R600Instructions.td
>> > index bb8e145..ea8ee05 100644
>> > --- a/lib/Target/R600/R600Instructions.td
>> > +++ b/lib/Target/R600/R600Instructions.td
>> > @@ -941,6 +941,23 @@ def FETCH_CLAUSE : AMDGPUInst <(outs),
>> > let Inst = num;
>> > }
>> >
>> > +def ALU_CLAUSE : AMDGPUInst <(outs),
>> > +(ins i32imm:$addr), "ALU clause starting at $addr:", [] > {
>> > + field bits<8> Inst;
>> > + bits<8> num;
>> > + let Inst = num;
>> > +}
>> > +
>> > +def LITERALS : AMDGPUInst <(outs),
>> > +(ins LITERAL:$literal1, LITERAL:$literal2), "$literal1, $literal2", [] > {
>> > + field bits<64> Inst;
>> > + bits<32> literal1;
>> > + bits<32> literal2;
>> > +
>> > + let Inst{31-0} = literal1;
>> > + let Inst{63-32} = literal2;
>> > +}
>> > +
>> > def PAD : AMDGPUInst <(outs), (ins), "PAD", [] > {
>> > field bits<64> Inst;
>> > }
>> > diff --git a/lib/Target/R600/R600RegisterInfo.td
> b/lib/Target/R600/R600RegisterInfo.td
>> > index 6944319..5a2e65c 100644
>> > --- a/lib/Target/R600/R600RegisterInfo.td
>> > +++ b/lib/Target/R600/R600RegisterInfo.td
>> > @@ -88,7 +88,10 @@ def NEG_ONE : R600Reg<"-1.0", 249>;
>> > def ONE_INT : R600Reg<"1", 250>;
>> > def HALF : R600Reg<"0.5", 252>;
>> > def NEG_HALF : R600Reg<"-0.5", 252>;
>> > -def ALU_LITERAL_X : R600Reg<"literal.x", 253>;
>> > +def ALU_LITERAL_X : R600RegWithChan<"literal.x", 253, "X">;
>> > +def ALU_LITERAL_Y : R600RegWithChan<"literal.y", 253, "Y">;
>> > +def ALU_LITERAL_Z : R600RegWithChan<"literal.z", 253, "Z">;
>> > +def ALU_LITERAL_W : R600RegWithChan<"literal.w", 253, "W">;
>> > def PV_X : R600RegWithChan<"PV.x", 254,
> "X">;
>> > def PV_Y : R600RegWithChan<"PV.y", 254,
> "Y">;
>> > def PV_Z : R600RegWithChan<"PV.z", 254,
> "Z">;
>> > diff --git a/test/CodeGen/R600/alu-split.ll
> b/test/CodeGen/R600/alu-split.ll
>> > index afefcd9..48496f6 100644
>> > --- a/test/CodeGen/R600/alu-split.ll
>> > +++ b/test/CodeGen/R600/alu-split.ll
>> > @@ -4,6 +4,7 @@
>> > ;CHECK: ALU
>> > ;CHECK: ALU
>> > ;CHECK-NOT: ALU
>> > +;CHECK: CF_END
>> >
>> > define void @main() #0 {
>> > main_body:
>> > diff --git a/test/CodeGen/R600/disconnected-predset-break-bug.ll
> b/test/CodeGen/R600/disconnected-predset-break-bug.ll
>> > index 09baee7..012c17b 100644
>> > --- a/test/CodeGen/R600/disconnected-predset-break-bug.ll
>> > +++ b/test/CodeGen/R600/disconnected-predset-break-bug.ll
>> > @@ -6,7 +6,7 @@
>> >
>> > ; CHECK: @loop_ge
>> > ; CHECK: LOOP_START_DX10
>> > -; CHECK: PRED_SET
>> > +; CHECK: ALU_PUSH_BEFORE
>> > ; CHECK-NEXT: JUMP
>> > ; CHECK-NEXT: LOOP_BREAK
>> > define void @loop_ge(i32 addrspace(1)* nocapture %out, i32
> %iterations) nounwind {
>> > diff --git a/test/CodeGen/R600/predicates.ll
> b/test/CodeGen/R600/predicates.ll
>> > index eb8b052..fb093ed 100644
>> > --- a/test/CodeGen/R600/predicates.ll
>> > +++ b/test/CodeGen/R600/predicates.ll
>> > @@ -46,11 +46,11 @@ ENDIF:
>> >
>> > ; CHECK: @nested_if
>> > ; CHECK: ALU_PUSH_BEFORE
>> > -; CHECK: PRED_SET{{[EGN][ET]*}}_INT Exec
>> > ; CHECK: JUMP
>> > +; CHECK: POP
>> > +; CHECK: PRED_SET{{[EGN][ET]*}}_INT Exec
>> > ; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
>> > ; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1,
> 0(0.000000e+00) Pred_sel
>> > -; CHECK: POP
>> > define void @nested_if(i32 addrspace(1)* %out, i32 %in) {
>> > entry:
>> > %0 = icmp sgt i32 %in, 0
>> > @@ -73,12 +73,12 @@ ENDIF:
>> >
>> > ; CHECK: @nested_if_else
>> > ; CHECK: ALU_PUSH_BEFORE
>> > -; CHECK: PRED_SET{{[EGN][ET]*}}_INT Exec
>> > ; CHECK: JUMP
>> > +; CHECK: POP
>> > +; CHECK: PRED_SET{{[EGN][ET]*}}_INT Exec
>> > ; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
>> > ; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1,
> 0(0.000000e+00) Pred_sel
>> > ; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1,
> 0(0.000000e+00) Pred_sel
>> > -; CHECK: POP
>> > define void @nested_if_else(i32 addrspace(1)* %out, i32 %in) {
>> > entry:
>> > %0 = icmp sgt i32 %in, 0
>> > --
>> > 1.8.1.4
>> >
>>
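The literal handling in getLiteral and the LITERALS encoding above can be sketched outside of LLVM as follows. This is only a simplified model: assignLiteralSlot and packLiterals are hypothetical names that do not appear in the patch, and plain integers stand in for MachineOperand and the ALU_LITERAL_{X,Y,Z,W} registers. It assumes, as the patch does, at most four distinct literals per instruction group, packed two per 64-bit word with the first literal in bits 31-0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the slot (0..3, i.e. X, Y, Z, W) that holds Imm, reusing an
// existing slot when the same value already appears in the group.
// Hypothetical helper; it models the find/push_back logic of getLiteral.
unsigned assignLiteralSlot(std::vector<uint32_t> &Lits, uint32_t Imm) {
  auto It = std::find(Lits.begin(), Lits.end(), Imm);
  if (It != Lits.end())
    return static_cast<unsigned>(It - Lits.begin());
  assert(Lits.size() < 4 && "Too many literals in instruction group");
  Lits.push_back(Imm);
  return static_cast<unsigned>(Lits.size() - 1);
}

// Packs literals two per 64-bit word: the first of each pair goes to
// bits 31-0 and the second to bits 63-32. An odd count is padded with 0.
std::vector<uint64_t> packLiterals(const std::vector<uint32_t> &Lits) {
  std::vector<uint64_t> Words;
  for (std::size_t i = 0, e = Lits.size(); i < e; i += 2) {
    uint64_t Lo = Lits[i];
    uint64_t Hi = (i + 1 < e) ? Lits[i + 1] : 0;
    Words.push_back(Lo | (Hi << 32));
  }
  return Words;
}
```

With this model, an instruction group that uses the immediates 7, 9, 7 and 11 ends up with three slots and two packed words, since the repeated 7 reuses slot X.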
>> > From 70b24a0417c9ac3e1d87558bc84b8b1c7f787b75 Mon Sep 17 00:00:00 2001
>> > From: Vincent Lejeune <vljn at ovi.com>
>> > Date: Mon, 22 Apr 2013 18:14:07 +0200
>> > Subject: [PATCH 7/9] R600: Rework Scheduling to handle difference between
>> > VLIW4 and VLIW5 chips
>> >
>> > ---
>> > lib/Target/R600/Processors.td | 28 ++++++++---------
>> > lib/Target/R600/R600InstrInfo.cpp | 15 +++++++++
>> > lib/Target/R600/R600InstrInfo.h | 3 ++
>> > lib/Target/R600/R600Instructions.td | 62
> ++++++++++++++++++++++++++++---------
>> > lib/Target/R600/R600Schedule.td | 13 +++++++-
>> > 5 files changed, 91 insertions(+), 30 deletions(-)
>> >
>> > diff --git a/lib/Target/R600/Processors.td
> b/lib/Target/R600/Processors.td
>> > index 046125b..a3dae34 100644
>> > --- a/lib/Target/R600/Processors.td
>> > +++ b/lib/Target/R600/Processors.td
>> > @@ -13,33 +13,33 @@
>> >
>> > class Proc<string Name, ProcessorItineraries itin,
> list<SubtargetFeature> Features>
>> > : Processor<Name, itin, Features>;
>> > -def : Proc<"", R600_EG_Itin,
>> > +def : Proc<"", R600_VLIW5_Itin,
>> > [FeatureR600ALUInst, FeatureVertexCache]>;
>> > -def : Proc<"r600", R600_EG_Itin,
>> > +def : Proc<"r600", R600_VLIW5_Itin,
>> > [FeatureR600ALUInst , FeatureVertexCache]>;
>> > -def : Proc<"rv670", R600_EG_Itin,
>> > +def : Proc<"rv670", R600_VLIW5_Itin,
>> > [FeatureR600ALUInst, FeatureFP64, FeatureVertexCache]>;
>> > -def : Proc<"rv710", R600_EG_Itin,
>> > +def : Proc<"rv710", R600_VLIW5_Itin,
>> > [FeatureVertexCache]>;
>> > -def : Proc<"rv730", R600_EG_Itin,
>> > +def : Proc<"rv730", R600_VLIW5_Itin,
>> > [FeatureVertexCache]>;
>> > -def : Proc<"rv770", R600_EG_Itin,
>> > +def : Proc<"rv770", R600_VLIW5_Itin,
>> > [FeatureFP64, FeatureVertexCache]>;
>> > -def : Proc<"cedar", R600_EG_Itin,
>> > +def : Proc<"cedar", R600_VLIW5_Itin,
>> > [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > -def : Proc<"redwood", R600_EG_Itin,
>> > +def : Proc<"redwood", R600_VLIW5_Itin,
>> > [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > -def : Proc<"juniper", R600_EG_Itin,
>> > +def : Proc<"juniper", R600_VLIW5_Itin,
>> > [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > -def : Proc<"cypress", R600_EG_Itin,
>> > +def : Proc<"cypress", R600_VLIW5_Itin,
>> > [FeatureByteAddress, FeatureImages, FeatureFP64,
> FeatureVertexCache]>;
>> > -def : Proc<"barts", R600_EG_Itin,
>> > +def : Proc<"barts", R600_VLIW5_Itin,
>> > [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > -def : Proc<"turks", R600_EG_Itin,
>> > +def : Proc<"turks", R600_VLIW5_Itin,
>> > [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > -def : Proc<"caicos", R600_EG_Itin,
>> > +def : Proc<"caicos", R600_VLIW5_Itin,
>> > [FeatureByteAddress, FeatureImages]>;
>> > -def : Proc<"cayman", R600_EG_Itin,
>> > +def : Proc<"cayman", R600_VLIW4_Itin,
>> > [FeatureByteAddress, FeatureImages, FeatureFP64]>;
>> > def : Proc<"SI", SI_Itin, [Feature64BitPtr,
> FeatureFP64]>;
>> > def : Proc<"tahiti", SI_Itin, [Feature64BitPtr,
> FeatureFP64]>;
>> > diff --git a/lib/Target/R600/R600InstrInfo.cpp
> b/lib/Target/R600/R600InstrInfo.cpp
>> > index 3c4f181..8442101 100644
>> > --- a/lib/Target/R600/R600InstrInfo.cpp
>> > +++ b/lib/Target/R600/R600InstrInfo.cpp
>> > @@ -140,6 +140,21 @@ bool R600InstrInfo::isALUInstr(unsigned Opcode)
> const {
>> > (TargetFlags & R600_InstFlag::OP3));
>> > }
>> >
>> > +bool R600InstrInfo::isTransOnly(unsigned Opcode) const {
>> > + const MCInstrDesc& MIDesc = get(Opcode);
>> > + // From AMDGPUGenSubtargetInfo.inc
>> > + // 0 NoInstrModel
>> > + // 1 AnyALU
>> > + // 2 NullALU
>> > + // 3 VecALU
>> > + // 4 TransALU
>> > + return MIDesc.getSchedClass() == 4;
>>
>> I don't think we can count on these values remaining the same. Can you
>> add a TransOnly bit to the tablegen instruction classes?
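A rough sketch of what a flag-based query could look like, as opposed to comparing against a generated scheduling class number. Everything here is a stand-alone model: InstrDescModel stands in for MCInstrDesc, and the TRANS_ONLY name and its bit position are assumptions (bits 12 and 13 mirror VTX_INST and TEX_INST from patch 3/9; the next free bit is simply picked for illustration).

```cpp
#include <cstdint>

// Hypothetical flag layout. Only VTX_INST and TEX_INST come from the
// patch series; TRANS_ONLY and its position are assumed.
namespace InstFlag {
enum : uint64_t {
  VTX_INST   = 1ull << 12,
  TEX_INST   = 1ull << 13,
  TRANS_ONLY = 1ull << 14  // assumed, not in the patch
};
}

// Stand-in for MCInstrDesc: only the target-specific flags are modeled.
struct InstrDescModel {
  uint64_t TSFlags;
};

// The answer depends only on a bit set in the instruction definition,
// so renumbering of generated scheduling classes cannot change it.
inline bool isTransOnly(const InstrDescModel &Desc) {
  return (Desc.TSFlags & InstFlag::TRANS_ONLY) != 0;
}
```

The design point is that the bit is declared next to the instruction definition, whereas the value 4 for TransALU is an artifact of how the generated tables happen to be numbered.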
>>
>> > +}
>> > +
>> > +bool R600InstrInfo::isTransOnly(const MachineInstr *MI) const {
>> > + return isTransOnly(MI->getOpcode());
>> > +}
>> > +
>> > bool R600InstrInfo::isCayman() const {
>> > return ST.device()->getGeneration() > AMDGPUDeviceInfo::HD5XXX;
>> > }
>> > diff --git a/lib/Target/R600/R600InstrInfo.h
> b/lib/Target/R600/R600InstrInfo.h
>> > index 136023f..e0ba12b 100644
>> > --- a/lib/Target/R600/R600InstrInfo.h
>> > +++ b/lib/Target/R600/R600InstrInfo.h
>> > @@ -55,6 +55,9 @@ namespace llvm {
>> > /// \returns true if this \p Opcode represents an ALU
> instruction.
>> > bool isALUInstr(unsigned Opcode) const;
>> >
>> > + bool isTransOnly(unsigned Opcode) const;
>> > + bool isTransOnly(const MachineInstr *MI) const;
>> > +
>> > bool usesVertexCache(unsigned Opcode) const;
>> > bool usesVertexCache(const MachineInstr *MI) const;
>> > bool usesTextureCache(unsigned Opcode) const;
>> > diff --git a/lib/Target/R600/R600Instructions.td
> b/lib/Target/R600/R600Instructions.td
>> > index 619af96..bb8e145 100644
>> > --- a/lib/Target/R600/R600Instructions.td
>> > +++ b/lib/Target/R600/R600Instructions.td
>> > @@ -1300,23 +1300,33 @@ multiclass CUBE_Common <bits<11>
> inst> {
>> >
>> > class EXP_IEEE_Common <bits<11> inst> : R600_1OP_Helper
> <
>> > inst, "EXP_IEEE", fexp2
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class FLT_TO_INT_Common <bits<11> inst> : R600_1OP_Helper
> <
>> > inst, "FLT_TO_INT", fp_to_sint
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class INT_TO_FLT_Common <bits<11> inst> : R600_1OP_Helper
> <
>> > inst, "INT_TO_FLT", sint_to_fp
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class FLT_TO_UINT_Common <bits<11> inst> :
> R600_1OP_Helper <
>> > inst, "FLT_TO_UINT", fp_to_uint
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class UINT_TO_FLT_Common <bits<11> inst> :
> R600_1OP_Helper <
>> > inst, "UINT_TO_FLT", uint_to_fp
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class LOG_CLAMPED_Common <bits<11> inst> : R600_1OP <
>> > inst, "LOG_CLAMPED", []
>> > @@ -1324,50 +1334,72 @@ class LOG_CLAMPED_Common <bits<11>
> inst> : R600_1OP <
>> >
>> > class LOG_IEEE_Common <bits<11> inst> : R600_1OP_Helper
> <
>> > inst, "LOG_IEEE", flog2
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class LSHL_Common <bits<11> inst> : R600_2OP_Helper
> <inst, "LSHL", shl>;
>> > class LSHR_Common <bits<11> inst> : R600_2OP_Helper
> <inst, "LSHR", srl>;
>> > class ASHR_Common <bits<11> inst> : R600_2OP_Helper
> <inst, "ASHR", sra>;
>> > class MULHI_INT_Common <bits<11> inst> : R600_2OP_Helper
> <
>> > inst, "MULHI_INT", mulhs
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> > class MULHI_UINT_Common <bits<11> inst> : R600_2OP_Helper
> <
>> > inst, "MULHI", mulhu
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> > class MULLO_INT_Common <bits<11> inst> : R600_2OP_Helper
> <
>> > inst, "MULLO_INT", mul
>> > ->;
>> > -class MULLO_UINT_Common <bits<11> inst> : R600_2OP
> <inst, "MULLO_UINT", []>;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> > +class MULLO_UINT_Common <bits<11> inst> : R600_2OP
> <inst, "MULLO_UINT", []> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class RECIP_CLAMPED_Common <bits<11> inst> : R600_1OP
> <
>> > inst, "RECIP_CLAMPED", []
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class RECIP_IEEE_Common <bits<11> inst> : R600_1OP <
>> > inst, "RECIP_IEEE", [(set R600_Reg32:$dst, (fdiv FP_ONE,
> R600_Reg32:$src0))]
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class RECIP_UINT_Common <bits<11> inst> : R600_1OP_Helper
> <
>> > inst, "RECIP_UINT", AMDGPUurecip
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class RECIPSQRT_CLAMPED_Common <bits<11> inst> :
> R600_1OP_Helper <
>> > inst, "RECIPSQRT_CLAMPED", int_AMDGPU_rsq
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class RECIPSQRT_IEEE_Common <bits<11> inst> : R600_1OP
> <
>> > inst, "RECIPSQRT_IEEE", []
>> > ->;
>> > +> {
>> > + let Itinerary = TransALU;
>> > +}
>> >
>> > class SIN_Common <bits<11> inst> : R600_1OP <
>> > inst, "SIN", []>{
>> > let Trig = 1;
>> > + let Itinerary = TransALU;
>> > }
>> >
>> > class COS_Common <bits<11> inst> : R600_1OP <
>> > inst, "COS", []> {
>> > let Trig = 1;
>> > + let Itinerary = TransALU;
>> > }
>> >
>> >
> //===----------------------------------------------------------------------===//
>> > diff --git a/lib/Target/R600/R600Schedule.td
> b/lib/Target/R600/R600Schedule.td
>> > index 7ede181..78a460a 100644
>> > --- a/lib/Target/R600/R600Schedule.td
>> > +++ b/lib/Target/R600/R600Schedule.td
>> > @@ -24,7 +24,7 @@ def AnyALU : InstrItinClass;
>> > def VecALU : InstrItinClass;
>> > def TransALU : InstrItinClass;
>> >
>> > -def R600_EG_Itin : ProcessorItineraries <
>> > +def R600_VLIW5_Itin : ProcessorItineraries <
>> > [ALU_X, ALU_Y, ALU_Z, ALU_W, TRANS, ALU_NULL],
>> > [],
>> > [
>> > @@ -34,3 +34,14 @@ def R600_EG_Itin : ProcessorItineraries <
>> > InstrItinData<NullALU, [InstrStage<1, [ALU_NULL]>]>
>> > ]
>> > >;
>> > +
>> > +def R600_VLIW4_Itin : ProcessorItineraries <
>> > + [ALU_X, ALU_Y, ALU_Z, ALU_W, ALU_NULL],
>> > + [],
>> > + [
>> > + InstrItinData<AnyALU, [InstrStage<1, [ALU_X, ALU_Y, ALU_Z, ALU_W]>]>,
>> > + InstrItinData<VecALU, [InstrStage<1, [ALU_X, ALU_Y, ALU_Z, ALU_W]>]>,
>> > + InstrItinData<TransALU, [InstrStage<1, [ALU_NULL]>]>,
>> > + InstrItinData<NullALU, [InstrStage<1, [ALU_NULL]>]>
>> > + ]
>> > +>;
>> > --
>> > 1.8.1.4
>> >
>>
>> > From 66560a1ec844847f4e54659271e84e86033f41cf Mon Sep 17 00:00:00 2001
>> > From: Tom Stellard <thomas.stellard at amd.com>
>> > Date: Thu, 11 Apr 2013 13:47:43 -0700
>> > Subject: [PATCH 3/9] R600: Add FetchInst bit to instruction defs to denote
>> > vertex/tex instructions
>> >
>> > v2[Vincent Lejeune]: Split FetchInst into usesTextureCache/usesVertexCache
>> > ---
>> > lib/Target/R600/AMDGPUSubtarget.cpp | 5 ++++
>> > lib/Target/R600/AMDGPUSubtarget.h | 2 ++
>> > lib/Target/R600/AMDILBase.td | 4 +++
>> > lib/Target/R600/Processors.td | 42
> +++++++++++++++++---------
>> > lib/Target/R600/R600ControlFlowFinalizer.cpp | 45
> ++++++----------------------
>> > lib/Target/R600/R600Defines.h | 4 ++-
>> > lib/Target/R600/R600InstrInfo.cpp | 24 ++++++++++++++-
>> > lib/Target/R600/R600InstrInfo.h | 7 +++++
>> > lib/Target/R600/R600Instructions.td | 18 +++++++++--
>> > test/CodeGen/R600/loop-address.ll | 4 +--
>> > 10 files changed, 98 insertions(+), 57 deletions(-)
>> >
>> > diff --git a/lib/Target/R600/AMDGPUSubtarget.cpp
> b/lib/Target/R600/AMDGPUSubtarget.cpp
>> > index 0f356a1..a7e1d7b 100644
>> > --- a/lib/Target/R600/AMDGPUSubtarget.cpp
>> > +++ b/lib/Target/R600/AMDGPUSubtarget.cpp
>> > @@ -33,6 +33,7 @@ AMDGPUSubtarget::AMDGPUSubtarget(StringRef TT,
> StringRef CPU, StringRef FS) :
>> > DefaultSize[0] = 64;
>> > DefaultSize[1] = 1;
>> > DefaultSize[2] = 1;
>> > + HasVertexCache = false;
>> > ParseSubtargetFeatures(GPU, FS);
>> > DevName = GPU;
>> > Device = AMDGPUDeviceInfo::getDeviceFromName(DevName, this,
> Is64bit);
>> > @@ -53,6 +54,10 @@ AMDGPUSubtarget::is64bit() const {
>> > return Is64bit;
>> > }
>> > bool
>> > +AMDGPUSubtarget::hasVertexCache() const {
>> > + return HasVertexCache;
>> > +}
>> > +bool
>> > AMDGPUSubtarget::isTargetELF() const {
>> > return false;
>> > }
>> > diff --git a/lib/Target/R600/AMDGPUSubtarget.h
> b/lib/Target/R600/AMDGPUSubtarget.h
>> > index 1973fc6..b6501a4 100644
>> > --- a/lib/Target/R600/AMDGPUSubtarget.h
>> > +++ b/lib/Target/R600/AMDGPUSubtarget.h
>> > @@ -36,6 +36,7 @@ private:
>> > bool Is32on64bit;
>> > bool DumpCode;
>> > bool R600ALUInst;
>> > + bool HasVertexCache;
>> >
>> > InstrItineraryData InstrItins;
>> >
>> > @@ -48,6 +49,7 @@ public:
>> >
>> > bool isOverride(AMDGPUDeviceInfo::Caps) const;
>> > bool is64bit() const;
>> > + bool hasVertexCache() const;
>> >
>> > // Helper functions to simplify if statements
>> > bool isTargetELF() const;
>> > diff --git a/lib/Target/R600/AMDILBase.td
> b/lib/Target/R600/AMDILBase.td
>> > index c12cedc..e221110 100644
>> > --- a/lib/Target/R600/AMDILBase.td
>> > +++ b/lib/Target/R600/AMDILBase.td
>> > @@ -74,6 +74,10 @@ def FeatureR600ALUInst :
> SubtargetFeature<"R600ALUInst",
>> > "false",
>> > "Older version of ALU instructions encoding.">;
>> >
>> > +def FeatureVertexCache :
> SubtargetFeature<"HasVertexCache",
>> > + "HasVertexCache",
>> > + "true",
>> > + "Specify use of dedicated vertex cache.">;
>> >
>> >
> //===----------------------------------------------------------------------===//
>> > // Register File, Calling Conv, Instruction Descriptions
>> > diff --git a/lib/Target/R600/Processors.td
> b/lib/Target/R600/Processors.td
>> > index b9229d4..046125b 100644
>> > --- a/lib/Target/R600/Processors.td
>> > +++ b/lib/Target/R600/Processors.td
>> > @@ -13,20 +13,34 @@
>> >
>> > class Proc<string Name, ProcessorItineraries itin,
> list<SubtargetFeature> Features>
>> > : Processor<Name, itin, Features>;
>> > -def : Proc<"", R600_EG_Itin,
> [FeatureR600ALUInst]>;
>> > -def : Proc<"r600", R600_EG_Itin,
> [FeatureR600ALUInst]>;
>> > -def : Proc<"rv670", R600_EG_Itin,
> [FeatureR600ALUInst, FeatureFP64]>;
>> > -def : Proc<"rv710", R600_EG_Itin, []>;
>> > -def : Proc<"rv730", R600_EG_Itin, []>;
>> > -def : Proc<"rv770", R600_EG_Itin,
> [FeatureFP64]>;
>> > -def : Proc<"cedar", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages]>;
>> > -def : Proc<"redwood", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages]>;
>> > -def : Proc<"juniper", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages]>;
>> > -def : Proc<"cypress", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages, FeatureFP64]>;
>> > -def : Proc<"barts", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages]>;
>> > -def : Proc<"turks", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages]>;
>> > -def : Proc<"caicos", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages]>;
>> > -def : Proc<"cayman", R600_EG_Itin,
> [FeatureByteAddress, FeatureImages, FeatureFP64]>;
>> > +def : Proc<"", R600_EG_Itin,
>> > + [FeatureR600ALUInst, FeatureVertexCache]>;
>> > +def : Proc<"r600", R600_EG_Itin,
>> > + [FeatureR600ALUInst , FeatureVertexCache]>;
>> > +def : Proc<"rv670", R600_EG_Itin,
>> > + [FeatureR600ALUInst, FeatureFP64, FeatureVertexCache]>;
>> > +def : Proc<"rv710", R600_EG_Itin,
>> > + [FeatureVertexCache]>;
>> > +def : Proc<"rv730", R600_EG_Itin,
>> > + [FeatureVertexCache]>;
>> > +def : Proc<"rv770", R600_EG_Itin,
>> > + [FeatureFP64, FeatureVertexCache]>;
>> > +def : Proc<"cedar", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > +def : Proc<"redwood", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > +def : Proc<"juniper", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > +def : Proc<"cypress", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages, FeatureFP64,
> FeatureVertexCache]>;
>> > +def : Proc<"barts", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > +def : Proc<"turks", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages, FeatureVertexCache]>;
>> > +def : Proc<"caicos", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages]>;
>> > +def : Proc<"cayman", R600_EG_Itin,
>> > + [FeatureByteAddress, FeatureImages, FeatureFP64]>;
>> > def : Proc<"SI", SI_Itin, [Feature64BitPtr,
> FeatureFP64]>;
>> > def : Proc<"tahiti", SI_Itin, [Feature64BitPtr,
> FeatureFP64]>;
>> > def : Proc<"pitcairn", SI_Itin, [Feature64BitPtr,
> FeatureFP64]>;
>> > diff --git a/lib/Target/R600/R600ControlFlowFinalizer.cpp
> b/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> > index 9271b39..cc6e0b2 100644
>> > --- a/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> > +++ b/lib/Target/R600/R600ControlFlowFinalizer.cpp
>> > @@ -32,6 +32,7 @@ class R600ControlFlowFinalizer : public
> MachineFunctionPass {
>> > private:
>> > enum ControlFlowInstruction {
>> > CF_TC,
>> > + CF_VC,
>> > CF_CALL_FS,
>> > CF_WHILE_LOOP,
>> > CF_END_LOOP,
>> > @@ -48,39 +49,6 @@ private:
>> > unsigned MaxFetchInst;
>> > const AMDGPUSubtarget &ST;
>> >
>> > - bool isFetch(const MachineInstr *MI) const {
>> > - switch (MI->getOpcode()) {
>> > - case AMDGPU::TEX_VTX_CONSTBUF:
>> > - case AMDGPU::TEX_VTX_TEXBUF:
>> > - case AMDGPU::TEX_LD:
>> > - case AMDGPU::TEX_GET_TEXTURE_RESINFO:
>> > - case AMDGPU::TEX_GET_GRADIENTS_H:
>> > - case AMDGPU::TEX_GET_GRADIENTS_V:
>> > - case AMDGPU::TEX_SET_GRADIENTS_H:
>> > - case AMDGPU::TEX_SET_GRADIENTS_V:
>> > - case AMDGPU::TEX_SAMPLE:
>> > - case AMDGPU::TEX_SAMPLE_C:
>> > - case AMDGPU::TEX_SAMPLE_L:
>> > - case AMDGPU::TEX_SAMPLE_C_L:
>> > - case AMDGPU::TEX_SAMPLE_LB:
>> > - case AMDGPU::TEX_SAMPLE_C_LB:
>> > - case AMDGPU::TEX_SAMPLE_G:
>> > - case AMDGPU::TEX_SAMPLE_C_G:
>> > - case AMDGPU::TXD:
>> > - case AMDGPU::TXD_SHADOW:
>> > - case AMDGPU::VTX_READ_GLOBAL_8_eg:
>> > - case AMDGPU::VTX_READ_GLOBAL_32_eg:
>> > - case AMDGPU::VTX_READ_GLOBAL_128_eg:
>> > - case AMDGPU::VTX_READ_PARAM_8_eg:
>> > - case AMDGPU::VTX_READ_PARAM_16_eg:
>> > - case AMDGPU::VTX_READ_PARAM_32_eg:
>> > - case AMDGPU::VTX_READ_PARAM_128_eg:
>> > - return true;
>> > - default:
>> > - return false;
>> > - }
>> > - }
>> > -
>> > bool IsTrivialInst(MachineInstr *MI) const {
>> > switch (MI->getOpcode()) {
>> > case AMDGPU::KILL:
>> > @@ -98,6 +66,9 @@ private:
>> > case CF_TC:
>> > Opcode = isEg ? AMDGPU::CF_TC_EG : AMDGPU::CF_TC_R600;
>> > break;
>> > + case CF_VC:
>> > + Opcode = isEg ? AMDGPU::CF_VC_EG : AMDGPU::CF_VC_R600;
>> > + break;
>> > case CF_CALL_FS:
>> > Opcode = isEg ? AMDGPU::CF_CALL_FS_EG :
> AMDGPU::CF_CALL_FS_R600;
>> > break;
>> > @@ -139,17 +110,19 @@ private:
>> > unsigned CfAddress) const {
>> > MachineBasicBlock::iterator ClauseHead = I;
>> > unsigned AluInstCount = 0;
>> > + bool IsTex = TII->usesTextureCache(ClauseHead);
>> > for (MachineBasicBlock::iterator E = MBB.end(); I != E; ++I) {
>> > if (IsTrivialInst(I))
>> > continue;
>> > - if (!isFetch(I))
>> > + if ((IsTex && !TII->usesTextureCache(I)) ||
>> > + (!IsTex && !TII->usesVertexCache(I)))
>> > break;
>> > AluInstCount ++;
>> > if (AluInstCount > MaxFetchInst)
>> > break;
>> > }
>> > BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead),
>> > - getHWInstrDesc(CF_TC))
>> > + getHWInstrDesc(IsTex?CF_TC:CF_VC))
>> > .addImm(CfAddress) // ADDR
>> > .addImm(AluInstCount); // COUNT
>> > return I;
>> > @@ -211,7 +184,7 @@ public:
>> > }
>> > for (MachineBasicBlock::iterator I = MBB.begin(), E =
> MBB.end();
>> > I != E;) {
>> > - if (isFetch(I)) {
>> > + if (TII->usesTextureCache(I) ||
> TII->usesVertexCache(I)) {
>> > DEBUG(dbgs() << CfCount << ":";
> I->dump(););
>> > I = MakeFetchClause(MBB, I, 0);
>> > CfCount++;
>> > diff --git a/lib/Target/R600/R600Defines.h
> b/lib/Target/R600/R600Defines.h
>> > index 16cfcf5..bdda232 100644
>> > --- a/lib/Target/R600/R600Defines.h
>> > +++ b/lib/Target/R600/R600Defines.h
>> > @@ -39,7 +39,9 @@ namespace R600_InstFlag {
>> > //FlagOperand bits 7, 8
>> > NATIVE_OPERANDS = (1 << 9),
>> > OP1 = (1 << 10),
>> > - OP2 = (1 << 11)
>> > + OP2 = (1 << 11),
>> > + VTX_INST = (1 << 12),
>> > + TEX_INST = (1 << 13)
>> > };
>> > }
>> >
>> > diff --git a/lib/Target/R600/R600InstrInfo.cpp
> b/lib/Target/R600/R600InstrInfo.cpp
>> > index b232188..b3996ba 100644
>> > --- a/lib/Target/R600/R600InstrInfo.cpp
>> > +++ b/lib/Target/R600/R600InstrInfo.cpp
>> > @@ -29,7 +29,8 @@ using namespace llvm;
>> >
>> > R600InstrInfo::R600InstrInfo(AMDGPUTargetMachine &tm)
>> > : AMDGPUInstrInfo(tm),
>> > - RI(tm, *this)
>> > + RI(tm, *this),
>> > + ST(tm.getSubtarget<AMDGPUSubtarget>())
>> > { }
>> >
>> > const R600RegisterInfo &R600InstrInfo::getRegisterInfo() const {
>> > @@ -139,6 +140,27 @@ bool R600InstrInfo::isALUInstr(unsigned Opcode) const {
>> > (TargetFlags & R600_InstFlag::OP3));
>> > }
>> >
>> > +bool R600InstrInfo::isCayman() const {
>> > + return ST.device()->getGeneration() > AMDGPUDeviceInfo::HD5XXX;
>> > +}
>> > +
>>
>> This function is incorrect, see the isCayman predicate in
>> R600Instructions.td
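
For reference, a minimal sketch of why the generation comparison above is too broad, assuming the .td predicate identifies Cayman by an explicit device check rather than by generation ordering. Generation, DeviceFlag, Device and both helper functions are hypothetical stand-ins for illustration, not the actual AMDGPU classes, and the device list is assumed (my understanding is that the HD6XXX generation also covers non-Cayman parts such as Barts).

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the real AMDGPUDeviceInfo types.
enum Generation { HD4XXX, HD5XXX, HD6XXX };
enum DeviceFlag { DEVICE_CYPRESS, DEVICE_BARTS, DEVICE_CAYMAN };

struct Device {
  Generation Gen;
  DeviceFlag Flag;
};

// Same shape as the check in the patch: true for every HD6XXX part,
// whether or not it is Cayman.
inline bool isCaymanByGeneration(const Device &D) { return D.Gen > HD5XXX; }

// Assumed alternative: match the Cayman device explicitly.
inline bool isCaymanByFlag(const Device &D) {
  return D.Flag == DEVICE_CAYMAN;
}
```

With these stand-ins, a Device{HD6XXX, DEVICE_BARTS} is reported as Cayman by the first helper but not by the second, which is the kind of mismatch the comment above points at.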
>>
>> > +bool R600InstrInfo::usesVertexCache(unsigned Opcode) const {
>> > + return ST.hasVertexCache() && get(Opcode).TSFlags & R600_InstFlag::VTX_INST;
>> > +}
>> > +
>> > +bool R600InstrInfo::usesVertexCache(const MachineInstr *MI) const {
>> > + return usesVertexCache(MI->getOpcode());
>> > +}
>> > +
>> > +bool R600InstrInfo::usesTextureCache(unsigned Opcode) const {
>> > + return (!ST.hasVertexCache() && get(Opcode).TSFlags & R600_InstFlag::VTX_INST) ||
>> > + (get(Opcode).TSFlags & R600_InstFlag::TEX_INST);
>> > +}
>> > +
>> > +bool R600InstrInfo::usesTextureCache(const MachineInstr *MI) const {
>> > + return usesTextureCache(MI->getOpcode());
>> > +}
>> > +
>> > bool
>> > R600InstrInfo::fitsConstReadLimitations(const std::vector<unsigned> &Consts)
>> > const {
>> > diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h
>> > index dbae900..136023f 100644
>> > --- a/lib/Target/R600/R600InstrInfo.h
>> > +++ b/lib/Target/R600/R600InstrInfo.h
>> > @@ -33,7 +33,9 @@ namespace llvm {
>> > class R600InstrInfo : public AMDGPUInstrInfo {
>> > private:
>> > const R600RegisterInfo RI;
>> > + const AMDGPUSubtarget &ST;
>> >
>> > + bool isCayman() const;
>> > int getBranchInstr(const MachineOperand &op) const;
>> >
>> > public:
>> > @@ -53,6 +55,11 @@ namespace llvm {
>> > /// \returns true if this \p Opcode represents an ALU instruction.
>> > bool isALUInstr(unsigned Opcode) const;
>> >
>> > + bool usesVertexCache(unsigned Opcode) const;
>> > + bool usesVertexCache(const MachineInstr *MI) const;
>> > + bool usesTextureCache(unsigned Opcode) const;
>> > + bool usesTextureCache(const MachineInstr *MI) const;
>> > +
>> > bool fitsConstReadLimitations(const std::vector<unsigned>&) const;
>> > bool canBundle(const std::vector<MachineInstr *> &) const;
>> >
>> > diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td
>> > index 1bde3d5..5eb6797 100644
>> > --- a/lib/Target/R600/R600Instructions.td
>> > +++ b/lib/Target/R600/R600Instructions.td
>> > @@ -25,6 +25,8 @@ class InstR600 <dag outs, dag ins, string asm, list<dag> pattern,
>> > bit Op1 = 0;
>> > bit Op2 = 0;
>> > bit HasNativeOperands = 0;
>> > + bit VTXInst = 0;
>> > + bit TEXInst = 0;
>> >
>> > let Namespace = "AMDGPU";
>> > let OutOperandList = outs;
>> > @@ -43,6 +45,8 @@ class InstR600 <dag outs, dag ins, string asm, list<dag> pattern,
>> > let TSFlags{9} = HasNativeOperands;
>> > let TSFlags{10} = Op1;
>> > let TSFlags{11} = Op2;
>> > + let TSFlags{12} = VTXInst;
>> > + let TSFlags{13} = TEXInst;
>> > }
>> >
>> > class InstR600ISA <dag outs, dag ins, string asm, list<dag> pattern> :
>> > @@ -478,6 +482,8 @@ class R600_TEX <bits<11> inst, string opName, list<dag> pattern,
>> > let COORD_TYPE_Y = 0;
>> > let COORD_TYPE_Z = 0;
>> > let COORD_TYPE_W = 0;
>> > +
>> > + let TEXInst = 1;
>> > }
>> >
>> > } // End mayLoad = 1, mayStore = 0, hasSideEffects = 0
>> > @@ -1783,6 +1789,8 @@ class VTX_READ_eg <string name, bits<8> buffer_id, dag outs, list<dag> pattern>
>> > // VTX_WORD3 (Padding)
>> > //
>> > // Inst{127-96} = 0;
>> > +
>> > + let VTXInst = 1;
>> > }
>> >
>> > class VTX_READ_8_eg <bits<8> buffer_id, list<dag> pattern>
>> > @@ -2011,15 +2019,17 @@ def TXD: InstR600 <
>> > (ins R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2, i32imm:$resourceId, i32imm:$samplerId, i32imm:$textureTarget),
>> > "TXD $dst, $src0, $src1, $src2, $resourceId, $samplerId, $textureTarget",
>> > [(set R600_Reg128:$dst, (int_AMDGPU_txd R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2, imm:$resourceId, imm:$samplerId, imm:$textureTarget))], NullALU> {
>> > ->;
>> > + let TEXInst = 1;
>> > +}
>> >
>> > def TXD_SHADOW: InstR600 <
>> > (outs R600_Reg128:$dst),
>> > (ins R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2, i32imm:$resourceId, i32imm:$samplerId, i32imm:$textureTarget),
>> > "TXD_SHADOW $dst, $src0, $src1, $src2, $resourceId, $samplerId, $textureTarget",
>> > [(set R600_Reg128:$dst, (int_AMDGPU_txd R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2, imm:$resourceId, imm:$samplerId, TEX_SHADOW:$textureTarget))], NullALU
>> > ->;
>> > -
>> > +> {
>> > + let TEXInst = 1;
>> > +}
>> > } // End isPseudo = 1
>> > } // End usesCustomInserter = 1
>> >
>> > @@ -2105,6 +2115,7 @@ def TEX_VTX_CONSTBUF :
>> > // VTX_WORD3 (Padding)
>> > //
>> > // Inst{127-96} = 0;
>> > + let VTXInst = 1;
>> > }
>> >
>> > def TEX_VTX_TEXBUF:
>> > @@ -2158,6 +2169,7 @@ let Inst{63-32} = Word1;
>> > // VTX_WORD3 (Padding)
>> > //
>> > // Inst{127-96} = 0;
>> > + let VTXInst = 1;
>> > }
>> >
>> >
>> > diff --git a/test/CodeGen/R600/loop-address.ll b/test/CodeGen/R600/loop-address.ll
>> > index dc9295e..a3986b2 100644
>> > --- a/test/CodeGen/R600/loop-address.ll
>> > +++ b/test/CodeGen/R600/loop-address.ll
>> > @@ -1,10 +1,10 @@
>> > ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>> >
>> > -;CHECK: TEX
>> > +;CHECK: VTX
>> > ;CHECK: ALU_PUSH
>> > ;CHECK: JUMP @4
>> > ;CHECK: ELSE @16
>> > -;CHECK: TEX
>> > +;CHECK: VTX
>> > ;CHECK: LOOP_START_DX10 @15
>> > ;CHECK: LOOP_BREAK @14
>> > ;CHECK: POP @16
>> > --
>> > 1.8.1.4
>> >
>
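
To make the cache selection in the quoted patch easier to follow, here is a small standalone model of the usesVertexCache / usesTextureCache split and of how a fetch clause groups consecutive instructions that go through the same cache. It is only a sketch: CacheModel and fetchClauseLength are hypothetical names, instructions are reduced to their TSFlags word, trivial instructions are not modelled, and the clause is simply capped at MaxFetchInst entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the TSFlags bits the patch adds to R600Defines.h.
enum : unsigned { VTX_INST = 1u << 12, TEX_INST = 1u << 13 };

// Hypothetical reduced model of the two R600InstrInfo predicates.
struct CacheModel {
  bool HasVertexCache;

  // VTX instructions use the vertex cache only when the chip has one.
  bool usesVertexCache(unsigned TSFlags) const {
    return HasVertexCache && (TSFlags & VTX_INST) != 0;
  }
  // TEX instructions always use the texture cache; VTX instructions
  // fall back to it when there is no vertex cache.
  bool usesTextureCache(unsigned TSFlags) const {
    return (!HasVertexCache && (TSFlags & VTX_INST) != 0) ||
           (TSFlags & TEX_INST) != 0;
  }
};

// Number of consecutive instructions, starting at Start, that use the
// same cache as the clause head (at most MaxFetchInst of them).
inline std::size_t fetchClauseLength(const CacheModel &M,
                                     const std::vector<unsigned> &Insts,
                                     std::size_t Start,
                                     std::size_t MaxFetchInst) {
  const bool IsTex = M.usesTextureCache(Insts[Start]);
  std::size_t Count = 0;
  for (std::size_t I = Start; I < Insts.size() && Count < MaxFetchInst; ++I) {
    const bool SameCache = IsTex ? M.usesTextureCache(Insts[I])
                                 : M.usesVertexCache(Insts[I]);
    if (!SameCache)
      break;
    ++Count;
  }
  return Count;
}
```

In this model, a VTX, VTX, TEX, TEX sequence splits into two clauses on a chip with a vertex cache and stays in a single texture clause on a chip without one, which matches the TEX to VTX change in the loop-address.ll checks for redwood.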
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-R600-Rework-Scheduling-to-handle-difference-between-.patch
Type: application/octet-stream
Size: 9113 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130429/1c222fad/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-R600-Add-FetchInst-bit-to-instruction-defs-to-denote.patch
Type: application/octet-stream
Size: 14464 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130429/1c222fad/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-R600-Turn-TEX-VTX-into-native-instructions.patch
Type: application/octet-stream
Size: 6727 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130429/1c222fad/attachment-0002.obj>