R600: Initial support for vliw5 scheduling
Vincent Lejeune
vljn at ovi.com
Fri Jun 28 14:25:14 PDT 2013
Sorry, I made a mistake in a rebase and the grouping wasn't occurring.
Here are the fixed patches.
----- Original Message -----
> From: Vincent Lejeune <vljn at ovi.com>
> To: Tom Stellard <tom at stellard.net>
> Cc: "llvm-commits at cs.uiuc.edu" <llvm-commits at cs.uiuc.edu>
> Sent: Friday, June 28, 2013, 22:21
> Subject: Re: R600: Initial support for vliw5 scheduling
>
>
>
>
>
> ----- Original Message -----
>> From: Tom Stellard <tom at stellard.net>
>> To: Vincent Lejeune <vljn at ovi.com>
>> Cc: "llvm-commits at cs.uiuc.edu" <llvm-commits at cs.uiuc.edu>
>> Sent: Thursday, June 27, 2013, 23:59
>> Subject: Re: R600: Initial support for vliw5 scheduling
>>
>> On Thu, Jun 27, 2013 at 01:56:58PM -0700, Vincent Lejeune wrote:
>>> Hi,
>>>
>>> These 2 patches allow trans-only instructions to be grouped with vector
>>> instructions to form 5-instruction bundles on vliw5 processors.
>>> I had to remove the isTransOnly attribute of the FLT_TO_INT_eg instructions
>>> because it looks like the docs are wrong in saying it's a trans-only
>>> instruction:
>>> a single FLT_TO_INT_eg instruction does not write to the PS register
>>> (trans-only instructions always write to the PS register), and no regression
>>> is introduced when making it non-trans-only. (The SB backend seems to mark it
>>> as not trans-only, too.)
>>>
>>
>> Patch 1 is:
>>
>> Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
>>
>> See my inline comments for patch #2.
>>
>> -Tom
>>
>>> I have another set of patches to generalize vliw5 support to all
>>> non-vector-only instructions, but it's still WIP at the moment.
>>>
>>>
>>> Vincent
>>
>>> From 619dd4ef5223050e81a5f80eae2582f588efe3c7 Mon Sep 17 00:00:00 2001
>>> From: Vincent Lejeune <vljn at ovi.com>
>>> Date: Wed, 26 Jun 2013 18:09:58 +0200
>>> Subject: [PATCH 2/2] R600: Support schedule and packetization of trans-only inst
>>>
>>> ---
>>> lib/Target/R600/R600InstrInfo.cpp | 160 +++++++++++++++++++++++++------
>>> lib/Target/R600/R600InstrInfo.h | 6 +-
>>> lib/Target/R600/R600Instructions.td | 2 +
>>> lib/Target/R600/R600MachineScheduler.cpp | 23 +++--
>>> lib/Target/R600/R600MachineScheduler.h | 1 +
>>> lib/Target/R600/R600Packetizer.cpp | 88 ++++++++++-------
>>> lib/Target/R600/R600RegisterInfo.td | 1 +
>>> test/CodeGen/R600/fdiv.ll | 8 +-
>>> test/CodeGen/R600/llvm.cos.ll | 2 +-
>>> test/CodeGen/R600/llvm.pow.ll | 4 +-
>>> test/CodeGen/R600/llvm.sin.ll | 2 +-
>>> 11 files changed, 218 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/lib/Target/R600/R600InstrInfo.cpp b/lib/Target/R600/R600InstrInfo.cpp
>>> index 8f65cc2..f972bcf 100644
>>> --- a/lib/Target/R600/R600InstrInfo.cpp
>>> +++ b/lib/Target/R600/R600InstrInfo.cpp
>>> @@ -225,24 +225,27 @@ R600InstrInfo::getSrcs(MachineInstr *MI) const {
>>>
>>> std::vector<std::pair<int, unsigned> >
>>> R600InstrInfo::ExtractSrcs(MachineInstr *MI,
>>> - const DenseMap<unsigned, unsigned> &PV)
>>> - const {
>>> + const DenseMap<unsigned, unsigned> &PV,
>>> + unsigned &ConstCount) const {
>>> + ConstCount = 0;
>>> const SmallVector<std::pair<MachineOperand *, int64_t>, 3> Srcs = getSrcs(MI);
>>> const std::pair<int, unsigned> DummyPair(-1, 0);
>>> std::vector<std::pair<int, unsigned> > Result;
>>> unsigned i = 0;
>>> for (unsigned n = Srcs.size(); i < n; ++i) {
>>> unsigned Reg = Srcs[i].first->getReg();
>>> - unsigned Index = RI.getEncodingValue(Reg) & 0xff;
>>> - unsigned Chan = RI.getHWRegChan(Reg);
>>> - if (Index > 127) {
>>> - Result.push_back(DummyPair);
>>> + if (PV.find(Reg) != PV.end()) {
>>> + // 255 is used to tells its a PS/PV reg
>>> + Result.push_back(std::pair<int, unsigned>(255, 0));
>>> continue;
>>> }
>>> - if (PV.find(Reg) != PV.end()) {
>>> + unsigned Index = RI.getEncodingValue(Reg) & 0xff;
>>> + if (Index > 127) {
>>> + ConstCount++;
>>> Result.push_back(DummyPair);
>>> continue;
>>> }
>>> + unsigned Chan = RI.getHWRegChan(Reg);
>>> Result.push_back(std::pair<int, unsigned>(Index, Chan));
>>> }
>>> for (; i < 3; ++i)
>>> @@ -277,66 +280,161 @@ Swizzle(std::vector<std::pair<int, unsigned> > Src,
>>> return Src;
>>> }
>>>
>>> -static bool
>>> -isLegal(const std::vector<std::vector<std::pair<int, unsigned> > > &IGSrcs,
>>> +static unsigned
>>> +getTransSwizzle(R600InstrInfo::BankSwizzle Swz, unsigned Op) {
>>> + switch (Swz) {
>>> + case R600InstrInfo::ALU_VEC_012_SCL_210: {
>>> + unsigned Cycles[3] = { 2, 1, 0};
>>> + return Cycles[Op];
>>> + }
>>> + case R600InstrInfo::ALU_VEC_021_SCL_122: {
>>> + unsigned Cycles[3] = { 1, 2, 2};
>>> + return Cycles[Op];
>>> + }
>>> + case R600InstrInfo::ALU_VEC_120_SCL_212: {
>>> + unsigned Cycles[3] = { 2, 1, 2};
>>> + return Cycles[Op];
>>> + }
>>> + case R600InstrInfo::ALU_VEC_102_SCL_221: {
>>> + unsigned Cycles[3] = { 2, 2, 1};
>>> + return Cycles[Op];
>>> + }
>>> + default:
>>> + llvm_unreachable("Wrong Swizzle for Trans Slot");
>>> + return 0;
>>> + }
>>> +}
>>> +
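The switch in getTransSwizzle is effectively a lookup table from (bank swizzle, operand index) to a read cycle. As a reading aid, here is a minimal table-driven sketch of the same mapping. The enum `TransBankSwizzle` and the function `transReadCycle` are hypothetical stand-ins, not names from the patch; only the cycle values are copied from it.

```cpp
#include <cassert>

// Hypothetical stand-in for R600InstrInfo::BankSwizzle; only the four
// values the patch accepts for the Trans slot are listed.
enum class TransBankSwizzle {
  ALU_VEC_012_SCL_210,
  ALU_VEC_021_SCL_122,
  ALU_VEC_120_SCL_212,
  ALU_VEC_102_SCL_221
};

// Returns the read cycle (0..2) assigned to operand `op` of the Trans
// instruction, using the same values as the switch in the patch.
inline unsigned transReadCycle(TransBankSwizzle swz, unsigned op) {
  static const unsigned kCycles[4][3] = {
      {2, 1, 0}, // ALU_VEC_012_SCL_210
      {1, 2, 2}, // ALU_VEC_021_SCL_122
      {2, 1, 2}, // ALU_VEC_120_SCL_212
      {2, 2, 1}, // ALU_VEC_102_SCL_221
  };
  assert(op < 3 && "an ALU instruction has at most three sources");
  return kCycles[static_cast<unsigned>(swz)][op];
}
```

A table keeps the four rows side by side, which makes it easier to compare them against the SCL part of each swizzle name.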
>>> +static unsigned
>>> +isLegalUpTo(const std::vector<std::vector<std::pair<int, unsigned> > > &IGSrcs,
>>> const std::vector<R600InstrInfo::BankSwizzle> &Swz,
>>> - unsigned CheckedSize) {
>>> + const std::vector<std::pair<int, unsigned> > &TransSrcs,
>>> + R600InstrInfo::BankSwizzle TransSwz) {
>>> int Vector[4][3];
>>> memset(Vector, -1, sizeof(Vector));
>>> - for (unsigned i = 0; i < CheckedSize; i++) {
>>> + for (unsigned i = 0, e = IGSrcs.size(); i < e; i++) {
>>> const std::vector<std::pair<int, unsigned> > &Srcs =
>>> Swizzle(IGSrcs[i], Swz[i]);
>>> for (unsigned j = 0; j < 3; j++) {
>>> const std::pair<int, unsigned> &Src = Srcs[j];
>>> - if (Src.first < 0)
>>> + if (Src.first < 0 || Src.first == 255)
>>> continue;
>>> if (Vector[Src.second][j] < 0)
>>> Vector[Src.second][j] = Src.first;
>>> if (Vector[Src.second][j] != Src.first)
>>> - return false;
>>> + return i;
>>> }
>>> }
>>> + // Now check Trans Alu
>>> + for (unsigned i = 0, e = TransSrcs.size(); i < e; ++i) {
>>> + const std::pair<int, unsigned> &Src = TransSrcs[i];
>>> + unsigned Cycle = getTransSwizzle(TransSwz, i);
>>> + if (Src.first < 0)
>>> + continue;
>>> + if (Src.first == 255)
>>> + continue;
>>> + if (Vector[Src.second][Cycle] < 0)
>>> + Vector[Src.second][Cycle] = Src.first;
>>> + if (Vector[Src.second][Cycle] != Src.first)
>>> + return IGSrcs.size() - 1;
>>> + }
>>> + return IGSrcs.size();
>>> +}
>>> +
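The core of isLegalUpTo is a small table indexed by (channel, cycle): each cell remembers which GPR index is read there, and a second read of a different index in the same cell is a conflict. The following is a simplified sketch of that check for the vector slots only, assuming the operands are already swizzled so that position j in an instruction's list is the cycle it is read on. The names `Src` and `firstConflict` are illustrative and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One source operand: (GPR index, channel). As in the patch, a negative
// index needs no read port and 255 marks a PV/PS forwarded value.
using Src = std::pair<int, unsigned>;

// Returns the index of the first instruction whose reads conflict with
// earlier ones, or group.size() when every read fits.
inline std::size_t firstConflict(const std::vector<std::vector<Src>> &group) {
  int port[4][3];
  for (auto &chan : port)
    for (int &cell : chan)
      cell = -1; // -1 means the (channel, cycle) cell is still free
  for (std::size_t i = 0; i < group.size(); ++i) {
    for (std::size_t cycle = 0; cycle < 3 && cycle < group[i].size(); ++cycle) {
      const Src &s = group[i][cycle];
      if (s.first < 0 || s.first == 255)
        continue;
      assert(s.second < 4 && "channel must be X, Y, Z or W");
      int &cell = port[s.second][cycle];
      if (cell < 0)
        cell = s.first; // first read of this cell: claim it
      else if (cell != s.first)
        return i; // a different GPR is already read here
    }
  }
  return group.size();
}
```

Returning the failing index instead of a bool is what lets the caller skip every candidate that shares the same failing prefix.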
>>> +static bool
>>> +NextPossibleSolution(
>>> + std::vector<R600InstrInfo::BankSwizzle> &SwzCandidate,
>>> + unsigned From) {
>>> + assert(From < SwzCandidate.size());
>>> + int ResetFrom = From;
>>> + while (ResetFrom > -1 && SwzCandidate[ResetFrom] == R600InstrInfo::ALU_VEC_210)
>>> + ResetFrom --;
>>> + for (unsigned i = ResetFrom + 1, e = SwzCandidate.size(); i < e; i++) {
>>> + SwzCandidate[i] = R600InstrInfo::ALU_VEC_012_SCL_210;
>>> + }
>>> + if (ResetFrom == -1)
>>> + return false;
>>> + SwzCandidate[ResetFrom]++;
>>> return true;
>>> }
>>>
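NextPossibleSolution reads like an odometer step: find the rightmost position at or before `From` that is not yet at its maximum, zero everything after it, and increment it. Here is a minimal sketch of that idea over plain integers, assuming each position ranges over 0..5 (six swizzle values). The name `nextCandidate` and the `maxDigit` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Advances `cand` to the next candidate that differs at position `from`
// or earlier. Each digit lies in [0, maxDigit]. Returns false once the
// search space is exhausted (all digits are then reset to 0).
inline bool nextCandidate(std::vector<unsigned> &cand, std::size_t from,
                          unsigned maxDigit = 5) {
  assert(from < cand.size());
  int pos = static_cast<int>(from);
  // Skip positions that cannot be incremented any further.
  while (pos >= 0 && cand[static_cast<std::size_t>(pos)] == maxDigit)
    --pos;
  // Everything to the right of the incremented digit restarts at 0.
  for (std::size_t i = static_cast<std::size_t>(pos + 1); i < cand.size(); ++i)
    cand[i] = 0;
  if (pos < 0)
    return false;
  ++cand[static_cast<std::size_t>(pos)];
  return true;
}
```

For example, starting from {2, 1, 4} with `from == 1`, the next candidate is {2, 2, 0}: the digit at the failing position changes, so candidates that only differ further right are never tried.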
>>> static bool recursiveFitsFPLimitation(
>>> const std::vector<std::vector<std::pair<int, unsigned> > > &IGSrcs,
>>> std::vector<R600InstrInfo::BankSwizzle> &SwzCandidate,
>>> -unsigned Depth = 0) {
>>> - if (!isLegal(IGSrcs, SwzCandidate, Depth))
>>> - return false;
>>> - if (IGSrcs.size() == Depth)
>>> - return true;
>>> - unsigned i = SwzCandidate[Depth];
>>> - for (; i < 6; i++) {
>>> - SwzCandidate[Depth] = (R600InstrInfo::BankSwizzle) i;
>>> - if (recursiveFitsFPLimitation(IGSrcs, SwzCandidate, Depth + 1))
>>> +const std::vector<std::pair<int, unsigned> > &TransSrcs,
>>> +R600InstrInfo::BankSwizzle TransSwz) {
>>> + unsigned ValidUpTo = 0;
>>> + do {
>>> + ValidUpTo = isLegalUpTo(IGSrcs, SwzCandidate, TransSrcs, TransSwz);
>>> + if (ValidUpTo == IGSrcs.size())
>>> return true;
>>> - }
>>> - SwzCandidate[Depth] = R600InstrInfo::ALU_VEC_012;
>>> + } while (NextPossibleSolution(SwzCandidate, ValidUpTo));
>>> return false;
>>> }
>>>
>>> +static bool
>>> +isConstCompatible(R600InstrInfo::BankSwizzle TransSwz,
>>> + const std::vector<std::pair<int, unsigned> > &TransOps,
>>> + unsigned ConstCount) {
>>> + for (unsigned i = 0, e = TransOps.size(); i < e; ++i) {
>>> + const std::pair<int, unsigned> &Src = TransOps[i];
>>> + unsigned Cycle = getTransSwizzle(TransSwz, i);
>>> + if (Src.first < 0)
>>> + continue;
>>> + if (ConstCount > 0 && Cycle == 0)
>>> + return false;
>>> + if (ConstCount > 1 && Cycle == 1)
>>> + return false;
>>> + }
>>> + return true;
>>> +}
>>> +
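As written, isConstCompatible rejects a Trans swizzle when a register operand would be read on cycle 0 while ConstCount is at least 1, or on cycle 1 while ConstCount is at least 2. A standalone sketch of just that rule follows, with the per-operand cycles passed in directly rather than derived from a swizzle. The names `Src` and `constCompatible` are illustrative, and the hardware reason behind the rule is not restated here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One source operand: (GPR index, channel); a negative index is skipped,
// mirroring the `Src.first < 0` test in the patch.
using Src = std::pair<int, unsigned>;

// cycleOf[i] is the read cycle assigned to Trans operand i.
inline bool constCompatible(const std::vector<Src> &transOps,
                            const unsigned (&cycleOf)[3],
                            unsigned constCount) {
  for (std::size_t i = 0; i < transOps.size() && i < 3; ++i) {
    if (transOps[i].first < 0)
      continue;
    if (constCount > 0 && cycleOf[i] == 0)
      return false; // cycle 0 is unavailable once one constant is read
    if (constCount > 1 && cycleOf[i] == 1)
      return false; // cycle 1 is unavailable once two constants are read
  }
  return true;
}
```

Note that in the patch ConstCount is reset on every ExtractSrcs call, so after the loop in fitsReadPortLimitations it reflects the last instruction of the group only.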
>>> bool
>>> R600InstrInfo::fitsReadPortLimitations(const std::vector<MachineInstr *> &IG,
>>> - const DenseMap<unsigned, unsigned> &PV,
>>> - std::vector<BankSwizzle> &ValidSwizzle)
>>> + const DenseMap<unsigned, unsigned> &PV,
>>> + std::vector<BankSwizzle> &ValidSwizzle,
>>> + bool isLastAluTrans)
>>> const {
>>> //Todo : support shared src0 - src1 operand
>>>
>>> std::vector<std::vector<std::pair<int, unsigned> > > IGSrcs;
>>> ValidSwizzle.clear();
>>> + unsigned ConstCount;
>>> + BankSwizzle TransBS;
>>> for (unsigned i = 0, e = IG.size(); i < e; ++i) {
>>> - IGSrcs.push_back(ExtractSrcs(IG[i], PV));
>>> + IGSrcs.push_back(ExtractSrcs(IG[i], PV, ConstCount));
>>> unsigned Op = getOperandIdx(IG[i]->getOpcode(),
>>> R600Operands::BANK_SWIZZLE);
>>> ValidSwizzle.push_back( (R600InstrInfo::BankSwizzle)
>>> IG[i]->getOperand(Op).getImm());
>>> }
>>> - bool Result = recursiveFitsFPLimitation(IGSrcs, ValidSwizzle);
>>> - if (!Result)
>>> - return false;
>>> - return true;
>>> + std::vector<std::pair<int, unsigned> > TransOps;
>>> + if (!isLastAluTrans)
>>> + return recursiveFitsFPLimitation(IGSrcs, ValidSwizzle, TransOps, TransBS);
>>> +
>>> + TransOps = IGSrcs.back();
>>> + IGSrcs.pop_back();
>>> + ValidSwizzle.pop_back();
>>> +
>>> + static const R600InstrInfo::BankSwizzle TransSwz[] = {
>>> + ALU_VEC_012_SCL_210,
>>> + ALU_VEC_021_SCL_122,
>>> + ALU_VEC_120_SCL_212,
>>> + ALU_VEC_102_SCL_221
>>> + };
>>> + for (unsigned i = 0; i < 4; i++) {
>>> + TransBS = TransSwz[i];
>>> + if (!isConstCompatible(TransBS, TransOps, ConstCount))
>>> + continue;
>>> + bool Result = recursiveFitsFPLimitation(IGSrcs, ValidSwizzle, TransOps,
>>> + TransBS);
>>> + if (Result) {
>>> + ValidSwizzle.push_back(TransBS);
>>> + return true;
>>> + }
>>> + }
>>> +
>>> + return false;
>>> }
>>>
>>>
>>> diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h
>>> index 79c7cdc..28fcbfd 100644
>>> --- a/lib/Target/R600/R600InstrInfo.h
>>> +++ b/lib/Target/R600/R600InstrInfo.h
>>> @@ -85,10 +85,14 @@ namespace llvm {
>>> /// starting from the one already provided in the Instruction Group MIs that
>>> /// fits Read Port limitations in BS if available. Otherwise returns false
>>> /// and undefined content in BS.
>>> + /// isLastAluTrans should be set if the last Alu of MIs will be executed on
>>> + /// Trans ALU. In this case, ValidTSwizzle returns the BankSwizzle value to
>>> + /// apply to the last instruction.
>>> /// PV holds GPR to PV registers in the Instruction Group MIs.
>>> bool fitsReadPortLimitations(const std::vector<MachineInstr *> &MIs,
>>> const DenseMap<unsigned, unsigned> &PV,
>>> - std::vector<BankSwizzle> &BS) const;
>>> + std::vector<BankSwizzle> &BS,
>>> + bool isLastAluTrans) const;
>>> bool fitsConstReadLimitations(const std::vector<unsigned>&) const;
>>> bool canBundle(const std::vector<MachineInstr *> &) const;
>>>
>>> diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td
>>> index 83d735f..f324146 100644
>>> --- a/lib/Target/R600/R600Instructions.td
>>> +++ b/lib/Target/R600/R600Instructions.td
>>> @@ -1478,12 +1478,14 @@ let hasSideEffects = 1 in {
>>>
>>> def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
>>> let Pattern = [];
>>> + let TransOnly = 0;
>>> }
>>
>> I've confirmed that this is correct. FLT_TO_INT is trans only for
>> r6xx/r7xx and has no restrictions on Evergreen. However, it looks like
>> you forgot to change the itinerary, so the packetizer still thinks it's
>> trans only.
>>
>>>
>>> def INT_TO_FLT_eg : INT_TO_FLT_Common<0x9B>;
>>>
>>> def FLT_TO_UINT_eg : FLT_TO_UINT_Common<0x9A> {
>>> let Pattern = [];
>>> + let TransOnly = 0;
>>> }
>>>
>>
>> FLT_TO_UINT is trans only for all GPU families. My guess is you didn't
>> see any regressions because the itinerary is still TransALU.
>>
>>> def UINT_TO_FLT_eg : UINT_TO_FLT_Common<0x9C>;
>>> diff --git a/lib/Target/R600/R600MachineScheduler.cpp b/lib/Target/R600/R600MachineScheduler.cpp
>>> index a330d88..050a12f 100644
>>> --- a/lib/Target/R600/R600MachineScheduler.cpp
>>> +++ b/lib/Target/R600/R600MachineScheduler.cpp
>>> @@ -32,7 +32,7 @@ void R600SchedStrategy::initialize(ScheduleDAGMI *dag) {
>>> MRI = &DAG->MRI;
>>> CurInstKind = IDOther;
>>> CurEmitted = 0;
>>> - OccupedSlotsMask = 15;
>>> + OccupedSlotsMask = 31;
>>> InstKindLimit[IDAlu] = TII->getMaxAlusPerClause();
>>> InstKindLimit[IDOther] = 32;
>>>
>>> @@ -160,7 +160,7 @@ void R600SchedStrategy::schedNode(SUnit *SU, bool IsTopNode) {
>>> if (NextInstKind != CurInstKind) {
>>> DEBUG(dbgs() << "Instruction Type Switch\n");
>>> if (NextInstKind != IDAlu)
>>> - OccupedSlotsMask = 15;
>>> + OccupedSlotsMask |= 31;
>>> CurEmitted = 0;
>>> CurInstKind = NextInstKind;
>>> }
>>> @@ -251,6 +251,9 @@ bool R600SchedStrategy::regBelongsToClass(unsigned Reg,
>>> R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {
>>> MachineInstr *MI = SU->getInstr();
>>>
>>> + if (TII->isTransOnly(MI))
>>> + return AluTrans;
>>> +
>>> switch (MI->getOpcode()) {
>>> case AMDGPU::PRED_X:
>>> return AluPredX;
>>> @@ -409,7 +412,8 @@ unsigned R600SchedStrategy::AvailablesAluCount() const {
>>> return AvailableAlus[AluAny].size() + AvailableAlus[AluT_XYZW].size() +
>>> AvailableAlus[AluT_X].size() + AvailableAlus[AluT_Y].size() +
>>> AvailableAlus[AluT_Z].size() + AvailableAlus[AluT_W].size() +
>>> - AvailableAlus[AluDiscarded].size() + AvailableAlus[AluPredX].size();
>>> + AvailableAlus[AluTrans].size() + AvailableAlus[AluDiscarded].size() +
>>> + AvailableAlus[AluPredX].size();
>>> }
>>>
>>> SUnit* R600SchedStrategy::pickAlu() {
>>> @@ -417,20 +421,27 @@ SUnit* R600SchedStrategy::pickAlu() {
>>> if (!OccupedSlotsMask) {
>>> // Bottom up scheduling : predX must comes first
>>> if (!AvailableAlus[AluPredX].empty()) {
>>> - OccupedSlotsMask = 15;
>>> + OccupedSlotsMask |= 31;
>>> return PopInst(AvailableAlus[AluPredX]);
>>> }
>>> // Flush physical reg copies (RA will discard them)
>>> if (!AvailableAlus[AluDiscarded].empty()) {
>>> - OccupedSlotsMask = 15;
>>> + OccupedSlotsMask |= 31;
>>> return PopInst(AvailableAlus[AluDiscarded]);
>>> }
>>> // If there is a T_XYZW alu available, use it
>>> if (!AvailableAlus[AluT_XYZW].empty()) {
>>> - OccupedSlotsMask = 15;
>>> + OccupedSlotsMask |= 15;
>>> return PopInst(AvailableAlus[AluT_XYZW]);
>>> }
>>> }
>>> + bool TransSlotOccuped = OccupedSlotsMask & 16;
>>> + if (!TransSlotOccuped) {
>>> + if (!AvailableAlus[AluTrans].empty()) {
>>> + OccupedSlotsMask |= 16;
>>> + return PopInst(AvailableAlus[AluTrans]);
>>> + }
>>> + }
>>> for (int Chan = 3; Chan > -1; --Chan) {
>>> bool isOccupied = OccupedSlotsMask & (1 << Chan);
>>> if (!isOccupied) {
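The scheduler changes above widen OccupedSlotsMask from four bits to five: judging from the constants 15, 16 and 31 in the patch, bits 0..3 stand for the X, Y, Z and W vector slots and bit 4 for the Trans slot. A small sketch of that bookkeeping, with hypothetical helper names (`claimTransSlot`, `claimChannel`) that do not exist in the patch:

```cpp
#include <cassert>

// Bit layout assumed from the constants used in the patch.
enum : unsigned { kVectorSlots = 15, kTransSlot = 16, kAllSlots = 31 };

// Tries to reserve the Trans slot; returns false if it is already taken.
inline bool claimTransSlot(unsigned &mask) {
  if (mask & kTransSlot)
    return false;
  mask |= kTransSlot;
  return true;
}

// Tries to reserve vector channel `chan` (0 = X ... 3 = W).
inline bool claimChannel(unsigned &mask, unsigned chan) {
  assert(chan < 4 && "only four vector channels exist");
  const unsigned bit = 1u << chan;
  if (mask & bit)
    return false;
  mask |= bit;
  return true;
}
```

With this layout, a T_XYZW instruction fills `kVectorSlots` and still leaves the Trans slot free, which matches the `|= 15` in the T_XYZW branch versus `|= 31` for PRED_X and discarded copies.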
>>> diff --git a/lib/Target/R600/R600MachineScheduler.h b/lib/Target/R600/R600MachineScheduler.h
>>> index aae8b3f..f8965d8 100644
>>> --- a/lib/Target/R600/R600MachineScheduler.h
>>> +++ b/lib/Target/R600/R600MachineScheduler.h
>>> @@ -46,6 +46,7 @@ class R600SchedStrategy : public MachineSchedStrategy {
>>> AluT_W,
>>> AluT_XYZW,
>>> AluPredX,
>>> + AluTrans,
>>> AluDiscarded, // LLVM Instructions that are going to be eliminated
>>> AluLast
>>> };
>>> diff --git a/lib/Target/R600/R600Packetizer.cpp b/lib/Target/R600/R600Packetizer.cpp
>>> index da614c7..7d6eef1 100644
>>> --- a/lib/Target/R600/R600Packetizer.cpp
>>> +++ b/lib/Target/R600/R600Packetizer.cpp
>>> @@ -77,12 +77,14 @@ private:
>>> do {
>>> if (TII->isPredicated(BI))
>>> continue;
>>> - if (TII->isTransOnly(BI))
>>> - continue;
>>> int OperandIdx = TII->getOperandIdx(BI->getOpcode(), R600Operands::WRITE);
>>> if (OperandIdx > -1 && BI->getOperand(OperandIdx).getImm() == 0)
>>> continue;
>>> unsigned Dst = BI->getOperand(0).getReg();
>>> + if (TII->isTransOnly(BI)) {
>>> + Result[Dst] = AMDGPU::PS;
>>> + continue;
>>> + }
>>> if (BI->getOpcode() == AMDGPU::DOT4_r600 ||
>>> BI->getOpcode() == AMDGPU::DOT4_eg) {
>>> Result[Dst] = AMDGPU::PV_X;
>>> @@ -150,10 +152,6 @@ public:
>>> return true;
>>> if (!TII->isALUInstr(MI->getOpcode()))
>>> return true;
>>> - if (TII->get(MI->getOpcode()).TSFlags & R600_InstFlag::TRANS_ONLY)
>>> - return true;
>>> - if (TII->isTransOnly(MI))
>>> - return true;
>>> return false;
>>> }
>>>
>>> @@ -195,11 +193,16 @@ public:
>>> MI->getOperand(LastOp).setImm(Bit);
>>> }
>>>
>>> - MachineBasicBlock::iterator addToPacket(MachineInstr *MI) {
>>> + bool isBundlableWithCurrentPMI(MachineInstr *MI,
>>> + const DenseMap<unsigned, unsigned> &PV,
>>> + std::vector<R600InstrInfo::BankSwizzle> &BS,
>>> + bool &isTransSlot) {
>>> + isTransSlot = TII->isTransOnly(MI);
>>> +
>>> + // Are the Constants limitations met ?
>>> CurrentPacketMIs.push_back(MI);
>>> - bool FitsConstLimits = TII->canBundle(CurrentPacketMIs);
>>> - DEBUG(
>>> - if (!FitsConstLimits) {
>>> + if (!TII->canBundle(CurrentPacketMIs)) {
>>> + DEBUG(
>>> dbgs() << "Couldn't pack :\n";
>>> MI->dump();
>>> dbgs() << "with the following packets :\n";
>>> @@ -208,14 +211,15 @@ public:
>>> dbgs() << "\n";
>>> }
>>> dbgs() << "because of Consts read limitations\n";
>>> - });
>>> - const DenseMap<unsigned, unsigned> &PV =
>>> - getPreviousVector(CurrentPacketMIs.front());
>>> - std::vector<R600InstrInfo::BankSwizzle> BS;
>>> - bool FitsReadPortLimits =
>>> - TII->fitsReadPortLimitations(CurrentPacketMIs, PV, BS);
>>> - DEBUG(
>>> - if (!FitsReadPortLimits) {
>>> + );
>>> + CurrentPacketMIs.pop_back();
>>> + return false;
>>> + }
>>> +
>>> + // Is there a BankSwizzle set that meet Read Port limitations ?
>>> + if (!TII->fitsReadPortLimitations(CurrentPacketMIs,
>>> + PV, BS, isTransSlot)) {
>>> + DEBUG(
>>> dbgs() << "Couldn't pack :\n";
>>> MI->dump();
>>> dbgs() << "with the following packets :\n";
>>> @@ -224,25 +228,43 @@ public:
>>> dbgs() << "\n";
>>> }
>>> dbgs() << "because of Read port limitations\n";
>>> - });
>>> - bool isBundlable = FitsConstLimits && FitsReadPortLimits;
>>> - if (isBundlable) {
>>> + );
>>> + CurrentPacketMIs.pop_back();
>>> + return false;
>>> + }
>>> +
>>> + CurrentPacketMIs.pop_back();
>>> + return true;
>>> + }
>>> +
>>> + MachineBasicBlock::iterator addToPacket(MachineInstr *MI) {
>>> + MachineBasicBlock::iterator FirstInBundle =
>>> + CurrentPacketMIs.empty() ? MI : CurrentPacketMIs.front();
>>> + const DenseMap<unsigned, unsigned> &PV =
>>> + getPreviousVector(FirstInBundle);
>>> + std::vector<R600InstrInfo::BankSwizzle> BS;
>>> + bool isTransSlot;
>>> +
>>> + if (isBundlableWithCurrentPMI(MI, PV, BS, isTransSlot)) {
>>> for (unsigned i = 0, e = CurrentPacketMIs.size(); i < e; i++) {
>>> MachineInstr *MI = CurrentPacketMIs[i];
>>> - unsigned Op = TII->getOperandIdx(MI->getOpcode(),
>>> - R600Operands::BANK_SWIZZLE);
>>> - MI->getOperand(Op).setImm(BS[i]);
>>> + unsigned Op = TII->getOperandIdx(MI->getOpcode(),
>>> + R600Operands::BANK_SWIZZLE);
>>> + MI->getOperand(Op).setImm(BS[i]);
>>> }
>>> + unsigned Op = TII->getOperandIdx(MI->getOpcode(),
>>> + R600Operands::BANK_SWIZZLE);
>>> + MI->getOperand(Op).setImm(BS.back());
>>> + if (!CurrentPacketMIs.empty())
>>> + setIsLastBit(CurrentPacketMIs.back(), 0);
>>> + substitutePV(MI, PV);
>>> + MachineBasicBlock::iterator It = VLIWPacketizerList::addToPacket(MI);
>>> + if (isTransSlot) {
>>> + endPacket(llvm::next(It)->getParent(), llvm::next(It));
>>> + }
>>> + return It;
>>> }
>>> - CurrentPacketMIs.pop_back();
>>> - if (!isBundlable) {
>>> - endPacket(MI->getParent(), MI);
>>> - substitutePV(MI, getPreviousVector(MI));
>>> - return VLIWPacketizerList::addToPacket(MI);
>>> - }
>>> - if (!CurrentPacketMIs.empty())
>>> - setIsLastBit(CurrentPacketMIs.back(), 0);
>>> - substitutePV(MI, PV);
>>> + endPacket(MI->getParent(), MI);
>>> return VLIWPacketizerList::addToPacket(MI);
>>> }
>>> };
>>> diff --git a/lib/Target/R600/R600RegisterInfo.td b/lib/Target/R600/R600RegisterInfo.td
>>> index a8b9b70..323bf9f 100644
>>> --- a/lib/Target/R600/R600RegisterInfo.td
>>> +++ b/lib/Target/R600/R600RegisterInfo.td
>>> @@ -96,6 +96,7 @@ def PV_X : R600RegWithChan<"PV.X", 254, "X">;
>>> def PV_Y : R600RegWithChan<"PV.Y", 254, "Y">;
>>> def PV_Z : R600RegWithChan<"PV.Z", 254, "Z">;
>>> def PV_W : R600RegWithChan<"PV.W", 254, "W">;
>>> +def PS: R600Reg<"PS", 255>;
>>> def PREDICATE_BIT : R600Reg<"PredicateBit", 0>;
>>> def PRED_SEL_OFF: R600Reg<"Pred_sel_off", 0>;
>>> def PRED_SEL_ZERO : R600Reg<"Pred_sel_zero", 2>;
>>> diff --git a/test/CodeGen/R600/fdiv.ll b/test/CodeGen/R600/fdiv.ll
>>> index 003590b..21ed486 100644
>>> --- a/test/CodeGen/R600/fdiv.ll
>>> +++ b/test/CodeGen/R600/fdiv.ll
>>> @@ -1,13 +1,13 @@
>>> ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>>>
>>> ;CHECK: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> -;CHECK: MUL_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> +;CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], PS}}
>>> ;CHECK: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> +;CHECK: MUL_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], PS}}
>>> ;CHECK: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> -;CHECK: MUL_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> -;CHECK: MUL_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> +;CHECK: MUL_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], PS}}
>>> ;CHECK: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> -;CHECK: MUL_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> +;CHECK: MUL_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], PS}}
>>>
>>> define void @test(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* %in) {
>>> %b_ptr = getelementptr <4 x float> addrspace(1)* %in, i32 1
>>> diff --git a/test/CodeGen/R600/llvm.cos.ll b/test/CodeGen/R600/llvm.cos.ll
>>> index 9b28167..b444fa7 100644
>>> --- a/test/CodeGen/R600/llvm.cos.ll
>>> +++ b/test/CodeGen/R600/llvm.cos.ll
>>> @@ -1,6 +1,6 @@
>>> ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>>>
>>> -;CHECK: COS * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> +;CHECK: COS * T{{[0-9]+\.[XYZW], PV\.[XYZW]}}
>>>
>>> define void @test() {
>>> %r0 = call float @llvm.R600.load.input(i32 0)
>>> diff --git a/test/CodeGen/R600/llvm.pow.ll b/test/CodeGen/R600/llvm.pow.ll
>>> index 1422083..0f51cf4 100644
>>> --- a/test/CodeGen/R600/llvm.pow.ll
>>> +++ b/test/CodeGen/R600/llvm.pow.ll
>>> @@ -1,8 +1,8 @@
>>> ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>>>
>>> ;CHECK: LOG_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> -;CHECK: MUL NON-IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> -;CHECK-NEXT: EXP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> +;CHECK: MUL NON-IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], PS}}
>>> +;CHECK-NEXT: EXP_IEEE * T{{[0-9]+\.[XYZW], PV\.[XYZW]}}
>>>
>>> define void @test() {
>>> %r0 = call float @llvm.R600.load.input(i32 0)
>>> diff --git a/test/CodeGen/R600/llvm.sin.ll b/test/CodeGen/R600/llvm.sin.ll
>>> index 803dc2d..09cc3d2 100644
>>> --- a/test/CodeGen/R600/llvm.sin.ll
>>> +++ b/test/CodeGen/R600/llvm.sin.ll
>>> @@ -1,6 +1,6 @@
>>> ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>>>
>>> -;CHECK: SIN * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>>> +;CHECK: SIN * T{{[0-9]+\.[XYZW], PV\.[XYZW]}}
>>>
>>> define void @test() {
>>> %r0 = call float @llvm.R600.load.input(i32 0)
>>> --
>>> 1.8.3.1
>>>
>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-R600-Support-schedule-and-packetization-of-trans-onl.patch
Type: text/x-patch
Size: 25661 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130628/97f2ef4a/attachment.bin>