[PATCHES] R600/SI: VI fixes for bit shifts, GS

Wed Jan 28 12:17:22 PST 2015

On 01/28/2015 05:28 AM, Marek Olšák wrote:
> Hi Matt,
>
> When can I expect the hazard recognizer?
I'm not sure. I haven't had a chance to look at it recently. I had it 
roughly working for the case I was working on (except without the cycle 
data at the time). I can send you the WIP patches for it if you want to 
try adding this to it.

> Will it insert S_NOPs as
> required by section "3.1.2. Manually inserted wait states" in the VI
> shader programming doc?
Yes. So far I have mostly been working on the other cases, such as a write
of VCC followed by a div_fmas use.
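
For reference, the simplest of these wait states (an instruction writing M0
followed directly by S_SENDMSG, which patch 1 quoted below handles) can be
sketched outside of LLVM roughly as follows. This is only a minimal standalone
illustration: Inst, Opcode and insertSendMsgNops are hypothetical stand-ins,
not LLVM types or APIs, and a real hazard recognizer would presumably be driven
by the scheduler rather than by a linear scan like this one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for machine instructions.
// These are NOT LLVM types.
enum class Opcode { S_MOV_B32, S_NOP, S_SENDMSG, S_ENDPGM };

struct Inst {
  Opcode Op;
  bool WritesM0; // true if the instruction defines M0
};

// Returns a copy of Insts with one S_NOP inserted between an instruction that
// writes M0 and an S_SENDMSG that immediately follows it. When IsVI is false
// the stream is returned unchanged.
std::vector<Inst> insertSendMsgNops(const std::vector<Inst> &Insts, bool IsVI) {
  std::vector<Inst> Out;
  bool LastInstWritesM0 = false;
  for (const Inst &I : Insts) {
    if (IsVI && LastInstWritesM0 && I.Op == Opcode::S_SENDMSG)
      Out.push_back({Opcode::S_NOP, false});
    Out.push_back(I);
    // Only the directly preceding instruction matters for this wait state.
    LastInstWritesM0 = I.WritesM0;
  }
  return Out;
}
```

With IsVI set, the sequence s_mov_b32 m0 / s_sendmsg / s_endpgm gains one
S_NOP before s_sendmsg, which is what the VI check lines in the test from
patch 1 expect; with IsVI clear the sequence is left alone.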


> There is at least one more case that we should
> handle, specifically case #9. I was about to implement it, but would
> it conflict with your work?
>
> Thanks,
>
> Marek


>
> On Tue, Jan 27, 2015 at 11:34 PM, Matt Arsenault
> <Matthew.Arsenault at amd.com> wrote:
>> On 01/27/2015 02:16 PM, Marek Olšák wrote:
>>
>> Hi,
>>
>> This is another set of fixes for VI. Patches 1-2, 5-6 fix real issues.
>> Patches 3-4, 7-8 are mostly cosmetic. Only patch 1 should fix an issue
>> that is reproducible by piglit.
>>
>> I couldn't test these, because my VI hw is very unstable. I'll try and
>> test Bonaire tomorrow. That said, I'm pretty sure patches 1-7 are
>> important improvements over the current state. I'm not sure about
>> patch 8.
>>
>> Please review.
>>
>> Michel, would you be so kind as to test whether the first patch fixes
>> the GS hang? Sorry, I'm not able to tell the difference. Please apply
>> patch 1 alone and please don't update your LLVM repo (just in case it
>> uncovers some other bug).
>>
>> Thank you very much,
>>
>> Marek
>>
>>
>> 0001-R600-SI-Fix-dependency-between-an-instruction-writin.patch
>>
>>  From 4740298959a4ebb361415019eaa15d899c80614e Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Mon, 26 Jan 2015 16:54:35 +0100
>> Subject: [PATCH 1/8] R600/SI: Fix dependency between an instruction writing M0
>>   and S_SENDMSG on VI
>>
>> This fixes a hang when using an empty geometry shader.
>> ---
>>   lib/Target/R600/SIInsertWaits.cpp       | 33 +++++++++++++++++++++++++++++++++
>>   test/CodeGen/R600/llvm.SI.sendmsg-m0.ll | 25 +++++++++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>   create mode 100644 test/CodeGen/R600/llvm.SI.sendmsg-m0.ll
>>
>> diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
>> index 181b116..6075001 100644
>> --- a/lib/Target/R600/SIInsertWaits.cpp
>> +++ b/lib/Target/R600/SIInsertWaits.cpp
>> @@ -82,6 +82,8 @@ private:
>>     /// \brief Type of the last opcode.
>>     InstType LastOpcodeType;
>>
>> +  bool LastInstWritesM0;
>> +
>>     /// \brief Get increment/decrement amount for this instruction.
>>     Counters getHwCounts(MachineInstr &MI);
>>
>> @@ -106,6 +108,9 @@ private:
>>     /// \brief Resolve all operand dependencies to counter requirements
>>     Counters handleOperands(MachineInstr &MI);
>>
>> +  /// \brief Insert S_NOP between an instruction writing M0 and S_SENDMSG.
>> +  void handleSendMsg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I);
>> +
>>   public:
>>     SIInsertWaits(TargetMachine &tm) :
>>       MachineFunctionPass(ID),
>> @@ -403,6 +408,31 @@ Counters SIInsertWaits::handleOperands(MachineInstr &MI) {
>>     return Result;
>>   }
>>
>> +void SIInsertWaits::handleSendMsg(MachineBasicBlock &MBB,
>> +                                  MachineBasicBlock::iterator I)
>> +{
>>
>> Brace on previous line
>>
>> +  if (TRI->ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
>> +    return;
>> +
>> +  // There must be "S_NOP 0" between an instruction writing M0 and S_SENDMSG.
>>
>> I think this should be handled by the scheduling hazard recognizer. I have
>> this partially implemented already, but this is probably fine for now.
>>
>> +  if (LastInstWritesM0 && I->getOpcode() == AMDGPU::S_SENDMSG) {
>> +    BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_NOP)).addImm(0);
>> +    LastInstWritesM0 = false;
>> +    return;
>> +  }
>> +
>> +  // Set whether this instruction sets M0
>> +  LastInstWritesM0 = false;
>> +
>> +  unsigned NumOperands = I->getNumOperands();
>> +  for (unsigned i = 0; i < NumOperands; i++) {
>> +    const MachineOperand &Op = I->getOperand(i);
>> +
>> +    if (Op.isReg() && Op.isDef() && Op.getReg() == AMDGPU::M0)
>> +      LastInstWritesM0 = true;
>> +  }
>> +}
>> +
>>   // FIXME: Insert waits listed in Table 4.2 "Required User-Inserted Wait States"
>>   // around other non-memory instructions.
>>   bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
>> @@ -417,6 +447,7 @@ bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
>>     WaitedOn = ZeroCounts;
>>     LastIssued = ZeroCounts;
>>     LastOpcodeType = OTHER;
>> +  LastInstWritesM0 = false;
>>
>>     memset(&UsedRegs, 0, sizeof(UsedRegs));
>>     memset(&DefinedRegs, 0, sizeof(DefinedRegs));
>> @@ -433,6 +464,8 @@ bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
>>           Changes |= insertWait(MBB, I, LastIssued);
>>         else
>>           Changes |= insertWait(MBB, I, handleOperands(*I));
>> +
>> +      handleSendMsg(MBB, I);
>>         pushInstruction(MBB, I);
>>       }
>>
>> diff --git a/test/CodeGen/R600/llvm.SI.sendmsg-m0.ll b/test/CodeGen/R600/llvm.SI.sendmsg-m0.ll
>> new file mode 100644
>> index 0000000..4de8993
>> --- /dev/null
>> +++ b/test/CodeGen/R600/llvm.SI.sendmsg-m0.ll
>> @@ -0,0 +1,25 @@
>> +;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI %s
>> +;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=VI %s
>> +
>> +; SI-LABEL: {{^}}main:
>>
>> These should use a common check prefix for the -LABEL and other common parts
>>
>> +; SI: s_mov_b32 m0, s0
>> +; SI-NEXT: s_sendmsg Gs_done(nop)
>> +; SI-NEXT: s_endpgm
>> +
>> +; VI-LABEL: {{^}}main:
>> +; VI: s_mov_b32 m0, s0
>> +; VI-NEXT: s_nop 0
>> +; VI-NEXT: s_sendmsg Gs_done(nop)
>> +; VI-NEXT: s_endpgm
>> +
>> +define void @main(i32 inreg %a) #0 {
>> +main_body:
>> +  call void @llvm.SI.sendmsg(i32 3, i32 %a)
>> +  ret void
>> +}
>> +
>> +; Function Attrs: nounwind
>> +declare void @llvm.SI.sendmsg(i32, i32) #1
>> +
>> +attributes #0 = { "ShaderType"="2" "unsafe-fp-math"="true" }
>> +attributes #1 = { nounwind }
>> --
>> 2.1.0
>>
>>
>> 0002-R600-SI-Determine-target-specific-encoding-of-READLA.patch
>>
>>  From 259e212087782161d4b4ff069b768cd3f04ff0eb Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Mon, 26 Jan 2015 22:41:09 +0100
>> Subject: [PATCH 2/8] R600/SI: Determine target-specific encoding of READLANE
>>   and WRITELANE early
>>
>> These are VOP2 on SI and VOP3 on VI, and their pseudos are neither, which can
>> be a problem. In order to make isVOP2 and isVOP3 queries behave as expected,
>> the encoding must be determined first.
>>
>> This doesn't fix any known issue, but better safe than sorry.
>> ---
>>   lib/Target/R600/SIRegisterInfo.cpp | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
>> index 380c98b..2bc6416 100644
>> --- a/lib/Target/R600/SIRegisterInfo.cpp
>> +++ b/lib/Target/R600/SIRegisterInfo.cpp
>> @@ -183,7 +183,9 @@ void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
>>              Ctx.emitError("Ran out of VGPRs for spilling SGPR");
>>           }
>>
>> -        BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_WRITELANE_B32), Spill.VGPR)
>> +        BuildMI(*MBB, MI, DL,
>> +                TII->get(TII->pseudoToMCOpcode(AMDGPU::V_WRITELANE_B32)),
>>
>> I would introduce a helper TII->getPseudoMCOpcode() or something like that
>>
>> +                Spill.VGPR)
>>                   .addReg(SubReg)
>>                   .addImm(Spill.Lane);
>>
>> @@ -217,7 +219,9 @@ void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
>>             SubReg = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0);
>>           }
>>
>> -        BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_READLANE_B32), SubReg)
>> +        BuildMI(*MBB, MI, DL,
>> +                TII->get(TII->pseudoToMCOpcode(AMDGPU::V_READLANE_B32)),
>> +                SubReg)
>>                   .addReg(Spill.VGPR)
>>                   .addImm(Spill.Lane)
>>                   .addReg(MI->getOperand(0).getReg(), RegState::ImplicitDefine);
>> --
>> 2.1.0
>>
>>
>> 0003-R600-SI-Trivial-instruction-definition-corrections-f.patch
>>
>>  From 9e1f9c0f62b6a1f54da6d2cdbbf09bb671902632 Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Tue, 27 Jan 2015 13:04:32 +0100
>> Subject: [PATCH 3/8] R600/SI: Trivial instruction definition corrections for
>>   VI
>>
>> - V_MAC_LEGACY_F32 exists on VI, but it's VOP3-only.
>>
>> - Remove V_MUL_LO_U32, because it's identical to V_MUL_LO_I32.
>>    Both instructions are even defined the same on VI.
>>
>>
>> I'm not sure removing this is ideal, since we probably want the AsmParser to
>> be able to understand it.
>>
>>
>> 0004-R600-SI-Remove-VOP2_REV-definitions-from-target-spec.patch
>>
>>  From 8a0f3ecec050dc28d8cce14531ae8720c08b5e17 Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Wed, 14 Jan 2015 20:47:28 +0100
>> Subject: [PATCH 4/8] R600/SI: Remove VOP2_REV definitions from target-specific
>>   instructions
>>
>> The getCommute* functions are only used with pseudos, so this commit doesn't
>> change anything.
>>
>> The issue with missing non-rev versions of shift instructions on VI will be
>> fixed separately.
>> ---
>>   lib/Target/R600/SIInstrInfo.td    | 45 +++++++++++++++++----------------------
>>   lib/Target/R600/SIInstructions.td |  9 +++-----
>>   2 files changed, 22 insertions(+), 32 deletions(-)
>>
>> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
>> index a4e258e..5699d49 100644
>> --- a/lib/Target/R600/SIInstrInfo.td
>> +++ b/lib/Target/R600/SIInstrInfo.td
>> @@ -850,25 +850,22 @@ class VOP2_Pseudo <dag outs, dag ins, list<dag> pattern, string opName> :
>>   }
>>
>>   multiclass VOP2SI_m <vop2 op, dag outs, dag ins, string asm, list<dag> pattern,
>> -                     string opName, string revOpSI> {
>> +                     string opName, string revOp> {
>>     def "" : VOP2_Pseudo <outs, ins, pattern, opName>,
>> -           VOP2_REV<revOpSI#"_e32", !eq(revOpSI, opName)>;
>> +           VOP2_REV<revOp#"_e32", !eq(revOp, opName)>;
>>
>>     def _si : VOP2 <op.SI, outs, ins, opName#asm, []>,
>> -            VOP2_REV<revOpSI#"_e32_si", !eq(revOpSI, opName)>,
>>               SIMCInstr <opName#"_e32", SISubtarget.SI>;
>>   }
>>
>>   multiclass VOP2_m <vop2 op, dag outs, dag ins, string asm, list<dag> pattern,
>> -                   string opName, string revOpSI, string revOpVI> {
>> +                   string opName, string revOp> {
>>     def "" : VOP2_Pseudo <outs, ins, pattern, opName>,
>> -           VOP2_REV<revOpSI#"_e32", !eq(revOpSI, opName)>;
>> +           VOP2_REV<revOp#"_e32", !eq(revOp, opName)>;
>>
>>     def _si : VOP2 <op.SI, outs, ins, opName#asm, []>,
>> -            VOP2_REV<revOpSI#"_e32_si", !eq(revOpSI, opName)>,
>>               SIMCInstr <opName#"_e32", SISubtarget.SI>;
>>     def _vi : VOP2 <op.VI, outs, ins, opName#asm, []>,
>> -            VOP2_REV<revOpVI#"_e32_vi", !eq(revOpVI, opName)>,
>>               SIMCInstr <opName#"_e32", SISubtarget.VI>;
>>   }
>>
>> @@ -942,20 +939,18 @@ multiclass VOP3_1_m <vop op, dag outs, dag ins, string asm,
>>   }
>>
>>   multiclass VOP3_2_m <vop op, dag outs, dag ins, string asm,
>> -                     list<dag> pattern, string opName, string revOpSI, string revOpVI,
>> +                     list<dag> pattern, string opName, string revOp,
>>                        bit HasMods = 1, bit UseFullOp = 0> {
>>
>>     def "" : VOP3_Pseudo <outs, ins, pattern, opName>,
>> -           VOP2_REV<revOpSI#"_e64", !eq(revOpSI, opName)>;
>> +           VOP2_REV<revOp#"_e64", !eq(revOp, opName)>;
>>
>>     def _si : VOP3_Real_si <op.SI3,
>>                 outs, ins, asm, opName>,
>> -            VOP2_REV<revOpSI#"_e64_si", !eq(revOpSI, opName)>,
>>               VOP3DisableFields<1, 0, HasMods>;
>>
>>     def _vi : VOP3_Real_vi <op.VI3,
>>                 outs, ins, asm, opName>,
>> -            VOP2_REV<revOpVI#"_e64_vi", !eq(revOpVI, opName)>,
>>               VOP3DisableFields<1, 0, HasMods>;
>>   }
>>
>> @@ -971,14 +966,12 @@ multiclass VOP3b_2_m <vop op, dag outs, dag ins, string asm,
>>     let sdst = SIOperand.VCC, Defs = [VCC] in {
>>       def _si : VOP3b <op.SI3, outs, ins, asm, []>,
>>                 VOP3DisableFields<1, 0, HasMods>,
>> -              SIMCInstr<opName#"_e64", SISubtarget.SI>,
>> -              VOP2_REV<revOp#"_e64_si", !eq(revOp, opName)>;
>> +              SIMCInstr<opName#"_e64", SISubtarget.SI>;
>>
>>       // TODO: Do we need this VI variant here?
>>       /*def _vi : VOP3b_vi <op.VI3, outs, ins, asm, []>,
>>                 VOP3DisableFields<1, 0, HasMods>,
>> -              SIMCInstr<opName#"_e64", SISubtarget.VI>,
>> -              VOP2_REV<revOp#"_e64_vi", !eq(revOp, opName)>;*/
>> +              SIMCInstr<opName#"_e64", SISubtarget.VI>;*/
>>     } // End sdst = SIOperand.VCC, Defs = [VCC]
>>   }
>>
>> @@ -1057,17 +1050,17 @@ multiclass VOP1InstSI <vop1 op, string opName, VOPProfile P,
>>   multiclass VOP2_Helper <vop2 op, string opName, dag outs,
>>                           dag ins32, string asm32, list<dag> pat32,
>>                           dag ins64, string asm64, list<dag> pat64,
>> -                        string revOpSI, string revOpVI, bit HasMods> {
>> -  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOpSI, revOpVI>;
>> +                        string revOp, bit HasMods> {
>> +  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOp>;
>>
>>     defm _e64 : VOP3_2_m <op,
>> -    outs, ins64, opName#"_e64"#asm64, pat64, opName, revOpSI, revOpVI, HasMods
>> +    outs, ins64, opName#"_e64"#asm64, pat64, opName, revOp, HasMods
>>     >;
>>   }
>>
>>   multiclass VOP2Inst <vop2 op, string opName, VOPProfile P,
>>                        SDPatternOperator node = null_frag,
>> -                     string revOpSI = opName, string revOpVI = revOpSI> : VOP2_Helper <
>> +                     string revOp = opName> : VOP2_Helper <
>>     op, opName, P.Outs,
>>     P.Ins32, P.Asm32, [],
>>     P.Ins64, P.Asm64,
>> @@ -1077,7 +1070,7 @@ multiclass VOP2Inst <vop2 op, string opName, VOPProfile P,
>>                                         i1:$clamp, i32:$omod)),
>>                    (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
>>         [(set P.DstVT:$dst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
>> -  revOpSI, revOpVI, P.HasModifiers
>> +  revOp, P.HasModifiers
>>   >;
>>
>>   multiclass VOP2b_Helper <vop2 op, string opName, dag outs,
>> @@ -1085,7 +1078,7 @@ multiclass VOP2b_Helper <vop2 op, string opName, dag outs,
>>                            dag ins64, string asm64, list<dag> pat64,
>>                            string revOp, bit HasMods> {
>>
>> -  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOp, revOp>;
>> +  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOp>;
>>
>>     defm _e64 : VOP3b_2_m <op,
>>       outs, ins64, opName#"_e64"#asm64, pat64, opName, revOp, HasMods
>> @@ -1111,16 +1104,16 @@ multiclass VOP2bInst <vop2 op, string opName, VOPProfile P,
>>   multiclass VOP2_VI3_Helper <vop23 op, string opName, dag outs,
>>                               dag ins32, string asm32, list<dag> pat32,
>>                               dag ins64, string asm64, list<dag> pat64,
>> -                            string revOpSI, string revOpVI, bit HasMods> {
>> -  defm _e32 : VOP2SI_m <op, outs, ins32, asm32, pat32, opName, revOpSI>;
>> +                            string revOp, bit HasMods> {
>> +  defm _e32 : VOP2SI_m <op, outs, ins32, asm32, pat32, opName, revOp>;
>>
>>     defm _e64 : VOP3_2_m <op, outs, ins64, opName#"_e64"#asm64, pat64, opName,
>> -                        revOpSI, revOpVI, HasMods>;
>> +                        revOp, HasMods>;
>>   }
>>
>>   multiclass VOP2_VI3_Inst <vop23 op, string opName, VOPProfile P,
>>                             SDPatternOperator node = null_frag,
>> -                          string revOpSI = opName, string revOpVI = revOpSI>
>> +                          string revOp = opName>
>>                             : VOP2_VI3_Helper <
>>     op, opName, P.Outs,
>>     P.Ins32, P.Asm32, [],
>> @@ -1131,7 +1124,7 @@ multiclass VOP2_VI3_Inst <vop23 op, string opName, VOPProfile P,
>>                                         i1:$clamp, i32:$omod)),
>>                    (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
>>         [(set P.DstVT:$dst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
>> -  revOpSI, revOpVI, P.HasModifiers
>> +  revOp, P.HasModifiers
>>   >;
>>
>>   class VOPC_Pseudo <dag outs, dag ins, list<dag> pattern, string opName> :
>> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
>> index 953c360..ca2abf8 100644
>> --- a/lib/Target/R600/SIInstructions.td
>> +++ b/lib/Target/R600/SIInstructions.td
>> @@ -1457,22 +1457,19 @@ defm V_MAX_U32 : VOP2Inst <vop2<0x14, 0xf>, "v_max_u32", VOP_I32_I32_I32,
>>     AMDGPUumax
>>   >;
>>
>> -// No non-Rev Op on VI
>>   defm V_LSHRREV_B32 : VOP2Inst <
>>     vop2<0x16, 0x10>, "v_lshrrev_b32", VOP_I32_I32_I32, null_frag,
>> -    "v_lshr_b32", "v_lshrrev_b32"
>> +    "v_lshr_b32"
>>   >;
>>
>> -// No non-Rev OP on VI
>>   defm V_ASHRREV_I32 : VOP2Inst <
>>     vop2<0x18, 0x11>, "v_ashrrev_i32", VOP_I32_I32_I32, null_frag,
>> -    "v_ashr_i32", "v_ashrrev_i32"
>> +    "v_ashr_i32"
>>   >;
>>
>> -// No non-Rev OP on VI
>>   defm V_LSHLREV_B32 : VOP2Inst <
>>     vop2<0x1a, 0x12>, "v_lshlrev_b32", VOP_I32_I32_I32, null_frag,
>> -    "v_lshl_b32", "v_lshlrev_b32"
>> +    "v_lshl_b32"
>>   >;
>>
>>   defm V_AND_B32 : VOP2Inst <vop2<0x1b, 0x13>, "v_and_b32",
>> --
>> 2.1.0
>>
>>
>> 0005-R600-SI-Don-t-generate-non-existent-LSHL-LSHR-ASHR-B.patch
>>
>>  From 448f8a654ac800b80eba512545425bcb0e3f8cc9 Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Tue, 27 Jan 2015 15:29:32 +0100
>> Subject: [PATCH 5/8] R600/SI: Don't generate non-existent LSHL, LSHR, ASHR B32
>>   variants on VI
>>
>> This can happen when a REV instruction is commuted.
>>
>> The trick is not to define the _vi versions of instructions, which has these
>> consequences:
>> - code generation will always fail if a pseudo cannot be lowered
>>    (very useful to catch bugs where an unsupported instruction somehow makes
>>     it to the printer)
>> - ability to query if a pseudo can be lowered, which is done in commuteOpcode
>>    to prevent REV from commuting to non-REV on VI
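
To make the second consequence concrete, here is a minimal standalone sketch
of the lookup order, assuming a toy opcode table. Everything in namespace
sketch is a hypothetical reimplementation for illustration only; the real
pseudoToMCOpcode, getCommuteRev and getCommuteOrig are LLVM functions backed
by generated tables, and only the "no V_LSHL_B32 on VI" entry reflects this
patch.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical pseudo opcodes; the values are arbitrary.
enum Pseudo { V_LSHL_B32, V_LSHLREV_B32, V_AND_B32 };
enum class Gen { SI, VI };

// Toy lookup: returns -1 when the target has no real encoding for the pseudo,
// i.e. when no _vi (or _si) definition exists.
int pseudoToMCOpcode(int Opc, Gen G) {
  if (G == Gen::VI && Opc == V_LSHL_B32)
    return -1;
  return 1000 + Opc; // any non-negative value means "exists"
}

// Toy versions of the original <-> REV mapping tables.
int getCommuteRev(int Opc) { return Opc == V_LSHL_B32 ? V_LSHLREV_B32 : -1; }
int getCommuteOrig(int Opc) { return Opc == V_LSHLREV_B32 ? V_LSHL_B32 : -1; }

// Commute only to an opcode that can actually be lowered on this target.
int commuteOpcode(int Opc, Gen G) {
  int NewOpc = getCommuteRev(Opc);
  if (NewOpc != -1 && pseudoToMCOpcode(NewOpc, G) != -1)
    return NewOpc;
  NewOpc = getCommuteOrig(Opc);
  if (NewOpc != -1 && pseudoToMCOpcode(NewOpc, G) != -1)
    return NewOpc;
  return Opc; // nothing valid to commute to; keep the opcode
}

} // namespace sketch
```

In this sketch V_LSHLREV_B32 commutes to V_LSHL_B32 on SI but stays
V_LSHLREV_B32 on VI, because the non-REV pseudo has no VI encoding to map to.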
>> ---
>>   lib/Target/R600/SIInstrInfo.cpp   |  8 ++++++--
>>   lib/Target/R600/SIInstrInfo.td    | 34 ++++++++++++++++++++++++++++++----
>>   lib/Target/R600/SIInstructions.td | 10 +++++-----
>>   test/CodeGen/R600/shl.ll          | 25 ++++++++++++++++++++++++-
>>   test/CodeGen/R600/sra.ll          | 30 +++++++++++++++++++++++++++++-
>>   5 files changed, 94 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
>> index 80b560e..53a1d8b 100644
>> --- a/lib/Target/R600/SIInstrInfo.cpp
>> +++ b/lib/Target/R600/SIInstrInfo.cpp
>> @@ -408,11 +408,15 @@ unsigned SIInstrInfo::commuteOpcode(unsigned Opcode) const {
>>     int NewOpc;
>>
>>     // Try to map original to commuted opcode
>> -  if ((NewOpc = AMDGPU::getCommuteRev(Opcode)) != -1)
>> +  NewOpc = AMDGPU::getCommuteRev(Opcode);
>> +  // Check if the commuted (REV) opcode exists on the target.
>> +  if (NewOpc != -1 && pseudoToMCOpcode(NewOpc) != -1)
>>       return NewOpc;
>>
>>     // Try to map commuted to original opcode
>> -  if ((NewOpc = AMDGPU::getCommuteOrig(Opcode)) != -1)
>> +  NewOpc = AMDGPU::getCommuteOrig(Opcode);
>> +  // Check if the original (non-REV) opcode exists on the target.
>> +  if (NewOpc != -1 && pseudoToMCOpcode(NewOpc) != -1)
>>       return NewOpc;
>>
>>     return Opcode;
>> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
>> index 5699d49..fd0dfd3 100644
>> --- a/lib/Target/R600/SIInstrInfo.td
>> +++ b/lib/Target/R600/SIInstrInfo.td
>> @@ -945,13 +945,24 @@ multiclass VOP3_2_m <vop op, dag outs, dag ins, string asm,
>>     def "" : VOP3_Pseudo <outs, ins, pattern, opName>,
>>              VOP2_REV<revOp#"_e64", !eq(revOp, opName)>;
>>
>> -  def _si : VOP3_Real_si <op.SI3,
>> -              outs, ins, asm, opName>,
>> +  def _si : VOP3_Real_si <op.SI3, outs, ins, asm, opName>,
>>               VOP3DisableFields<1, 0, HasMods>;
>>
>> -  def _vi : VOP3_Real_vi <op.VI3,
>> -              outs, ins, asm, opName>,
>> +  def _vi : VOP3_Real_vi <op.VI3, outs, ins, asm, opName>,
>> +            VOP3DisableFields<1, 0, HasMods>;
>> +}
>> +
>> +multiclass VOP3SI_2_m <vop op, dag outs, dag ins, string asm,
>> +                     list<dag> pattern, string opName, string revOp,
>> +                     bit HasMods = 1, bit UseFullOp = 0> {
>> +
>> +  def "" : VOP3_Pseudo <outs, ins, pattern, opName>,
>> +           VOP2_REV<revOp#"_e64", !eq(revOp, opName)>;
>> +
>> +  def _si : VOP3_Real_si <op.SI3, outs, ins, asm, opName>,
>>               VOP3DisableFields<1, 0, HasMods>;
>> +
>> +  // No VI instruction. This class is for SI only.
>>   }
>>
>>   multiclass VOP3b_2_m <vop op, dag outs, dag ins, string asm,
>> @@ -1073,6 +1084,21 @@ multiclass VOP2Inst <vop2 op, string opName, VOPProfile P,
>>     revOp, P.HasModifiers
>>   >;
>>
>> +multiclass VOP2InstSI <vop2 op, string opName, VOPProfile P,
>> +                       SDPatternOperator node = null_frag,
>> +                       string revOp = opName> {
>> +  defm _e32 : VOP2SI_m <op, P.Outs, P.Ins32, P.Asm32, [], opName, revOp>;
>> +
>> +  defm _e64 : VOP3SI_2_m <op, P.Outs, P.Ins64, opName#"_e64"#P.Asm64,
>> +    !if(P.HasModifiers,
>> +        [(set P.DstVT:$dst,
>> +             (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers,
>> +                                        i1:$clamp, i32:$omod)),
>> +                   (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
>> +        [(set P.DstVT:$dst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
>> +    opName, revOp, P.HasModifiers>;
>> +}
>> +
>>   multiclass VOP2b_Helper <vop2 op, string opName, dag outs,
>>                            dag ins32, string asm32, list<dag> pat32,
>>                            dag ins64, string asm64, list<dag> pat64,
>> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
>> index ca2abf8..e62306d 100644
>> --- a/lib/Target/R600/SIInstructions.td
>> +++ b/lib/Target/R600/SIInstructions.td
>> @@ -1540,21 +1540,21 @@ defm V_WRITELANE_B32 : VOP2SI_3VI_m <
>>   // These instructions only exist on SI and CI
>>   let SubtargetPredicate = isSICI in {
>>
>> -defm V_MIN_LEGACY_F32 : VOP2Inst <vop2<0xd>, "v_min_legacy_f32",
>> +defm V_MIN_LEGACY_F32 : VOP2InstSI <vop2<0xd>, "v_min_legacy_f32",
>>     VOP_F32_F32_F32, AMDGPUfmin_legacy
>>   >;
>> -defm V_MAX_LEGACY_F32 : VOP2Inst <vop2<0xe>, "v_max_legacy_f32",
>> +defm V_MAX_LEGACY_F32 : VOP2InstSI <vop2<0xe>, "v_max_legacy_f32",
>>     VOP_F32_F32_F32, AMDGPUfmax_legacy
>>   >;
>>
>>   let isCommutable = 1 in {
>> -defm V_LSHR_B32 : VOP2Inst <vop2<0x15>, "v_lshr_b32", VOP_I32_I32_I32, srl>;
>> -defm V_ASHR_I32 : VOP2Inst <vop2<0x17>, "v_ashr_i32",
>> +defm V_LSHR_B32 : VOP2InstSI <vop2<0x15>, "v_lshr_b32", VOP_I32_I32_I32, srl>;
>> +defm V_ASHR_I32 : VOP2InstSI <vop2<0x17>, "v_ashr_i32",
>>     VOP_I32_I32_I32, sra
>>   >;
>>
>>   let hasPostISelHook = 1 in {
>> -defm V_LSHL_B32 : VOP2Inst <vop2<0x19>, "v_lshl_b32", VOP_I32_I32_I32, shl>;
>> +defm V_LSHL_B32 : VOP2InstSI <vop2<0x19>, "v_lshl_b32", VOP_I32_I32_I32, shl>;
>>   }
>>
>>   } // End isCommutable = 1
>> diff --git a/test/CodeGen/R600/shl.ll b/test/CodeGen/R600/shl.ll
>> index 75341a2..ff2f096 100644
>> --- a/test/CodeGen/R600/shl.ll
>> +++ b/test/CodeGen/R600/shl.ll
>> @@ -1,6 +1,6 @@
>>   ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
>>   ;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
>> -;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
>> +;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=VI-CHECK %s
>>
>>   ;EG-CHECK: {{^}}shl_v2i32:
>>   ;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>> @@ -10,6 +10,10 @@
>>   ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>   ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>
>> +;VI-CHECK: {{^}}shl_v2i32:
>> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +
>>   define void @shl_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
>>     %a = load <2 x i32> addrspace(1) * %in
>> @@ -31,6 +35,12 @@ define void @shl_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in
>>   ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>   ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>
>> +;VI-CHECK: {{^}}shl_v4i32:
>> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +
>>   define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
>>     %a = load <4 x i32> addrspace(1) * %in
>> @@ -55,6 +65,9 @@ define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in
>>   ;SI-CHECK: {{^}}shl_i64:
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>> +;VI-CHECK: {{^}}shl_i64:
>> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +
>>   define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>>     %b_ptr = getelementptr i64 addrspace(1)* %in, i64 1
>>     %a = load i64 addrspace(1) * %in
>> @@ -90,6 +103,10 @@ define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>> +;VI-CHECK: {{^}}shl_v2i64:
>> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +
>>   define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
>>     %a = load <2 x i64> addrspace(1) * %in
>> @@ -147,6 +164,12 @@ define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>> +;VI-CHECK: {{^}}shl_v4i64:
>> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +
>>   define void @shl_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
>>     %a = load <4 x i64> addrspace(1) * %in
>> diff --git a/test/CodeGen/R600/sra.ll b/test/CodeGen/R600/sra.ll
>> index f062e4c..44c1101 100644
>> --- a/test/CodeGen/R600/sra.ll
>> +++ b/test/CodeGen/R600/sra.ll
>> @@ -1,6 +1,6 @@
>>   ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
>>   ;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
>> -;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
>> +;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=VI-CHECK %s
>>
>>
>> Please remove the -CHECK parts of these. Most of the tests use only "SI",
>> and using both naming conventions in different tests has proven to be
>> error-prone.
>>
>>
>>   ;EG-CHECK-LABEL: {{^}}ashr_v2i32:
>>   ;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>> @@ -10,6 +10,10 @@
>>   ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>   ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>
>> +;VI-CHECK-LABEL: {{^}}ashr_v2i32:
>> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +
>>   define void @ashr_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
>>     %a = load <2 x i32> addrspace(1) * %in
>> @@ -31,6 +35,12 @@ define void @ashr_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %i
>>   ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>   ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>>
>> +;VI-CHECK-LABEL: {{^}}ashr_v4i32:
>> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>> +
>>   define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
>>     %a = load <4 x i32> addrspace(1) * %in
>> @@ -45,6 +55,10 @@ define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %i
>>
>>   ;SI-CHECK-LABEL: {{^}}ashr_i64:
>>   ;SI-CHECK: s_ashr_i64 s[{{[0-9]}}:{{[0-9]}}], s[{{[0-9]}}:{{[0-9]}}], 8
>> +
>> +;VI-CHECK-LABEL: {{^}}ashr_i64:
>> +;VI-CHECK: s_ashr_i64 s[{{[0-9]}}:{{[0-9]}}], s[{{[0-9]}}:{{[0-9]}}], 8
>> +
>>   define void @ashr_i64(i64 addrspace(1)* %out, i32 %in) {
>>   entry:
>>     %0 = sext i32 %in to i64
>> @@ -69,6 +83,10 @@ entry:
>>
>>   ;SI-CHECK-LABEL: {{^}}ashr_i64_2:
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +
>> +;VI-CHECK-LABEL: {{^}}ashr_i64_2:
>> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +
>>   define void @ashr_i64_2(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>>   entry:
>>     %b_ptr = getelementptr i64 addrspace(1)* %in, i64 1
>> @@ -109,6 +127,10 @@ entry:
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>> +;VI-CHECK-LABEL: {{^}}ashr_v2i64:
>> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +
>>   define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
>>     %a = load <2 x i64> addrspace(1) * %in
>> @@ -174,6 +196,12 @@ define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %i
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>> +;VI-CHECK-LABEL: {{^}}ashr_v4i64:
>> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +
>>   define void @ashr_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
>>     %a = load <4 x i64> addrspace(1) * %in
>> --
>> 2.1.0
>>
>>
>> 0006-R600-SI-Fix-B64-VALU-shifts-on-VI.patch
>>
>>  From cefe07504b0534fc864d2ec2189423a8208a1501 Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Tue, 27 Jan 2015 20:34:20 +0100
>> Subject: [PATCH 6/8] R600/SI: Fix B64 VALU shifts on VI
>>
>> SI only has standard versions. VI only has REV versions.
>> ---
>>   lib/Target/R600/SIInstrInfo.cpp   | 18 ++++++++++++++++++
>>   lib/Target/R600/SIInstrInfo.td    |  1 +
>>   lib/Target/R600/SIInstructions.td | 14 ++++++++++++++
>>   test/CodeGen/R600/rotl.i64.ll     | 28 +++++++++++++++-------------
>>   test/CodeGen/R600/rotr.i64.ll     | 28 +++++++++++++++-------------
>>   test/CodeGen/R600/shl.ll          | 14 +++++++-------
>>   test/CodeGen/R600/sra.ll          | 14 +++++++-------
>>   7 files changed, 77 insertions(+), 40 deletions(-)
>>
>> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
>> index 53a1d8b..bf8d589 100644
>> --- a/lib/Target/R600/SIInstrInfo.cpp
>> +++ b/lib/Target/R600/SIInstrInfo.cpp
>> @@ -2047,6 +2047,24 @@ void SIInstrInfo::moveToVALU(MachineInstr &TopInst) const {
>>           swapOperands(Inst);
>>         }
>>         break;
>> +    case AMDGPU::S_LSHL_B64:
>> +      if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
>> +        NewOpcode = AMDGPU::V_LSHLREV_B64;
>> +        swapOperands(Inst);
>> +      }
>> +      break;
>> +    case AMDGPU::S_ASHR_I64:
>> +      if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
>> +        NewOpcode = AMDGPU::V_ASHRREV_I64;
>> +        swapOperands(Inst);
>> +      }
>> +      break;
>> +    case AMDGPU::S_LSHR_B64:
>> +      if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
>> +        NewOpcode = AMDGPU::V_LSHRREV_B64;
>> +        swapOperands(Inst);
>> +      }
>> +      break;
>>
>>       case AMDGPU::S_BFE_U64:
>>       case AMDGPU::S_BFM_B64:
>> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
>> index fd0dfd3..2cd5adc 100644
>> --- a/lib/Target/R600/SIInstrInfo.td
>> +++ b/lib/Target/R600/SIInstrInfo.td
>> @@ -803,6 +803,7 @@ def VOP_I1_F64_I32 : VOPProfile <[i1, f64, i32, untyped]> {
>>   }
>>
>>   def VOP_I64_I64_I32 : VOPProfile <[i64, i64, i32, untyped]>;
>> +def VOP_I64_I32_I64 : VOPProfile <[i64, i32, i64, untyped]>;
>>   def VOP_I64_I64_I64 : VOPProfile <[i64, i64, i64, untyped]>;
>>
>>   def VOP_F32_F32_F32_F32 : VOPProfile <[f32, f32, f32, f32]>;
>> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
>> index e62306d..19710a3 100644
>> --- a/lib/Target/R600/SIInstructions.td
>> +++ b/lib/Target/R600/SIInstructions.td
>> @@ -1803,6 +1803,20 @@ defm V_MULLIT_F32 : VOP3Inst <vop3<0x150>, "v_mullit_f32",
>>
>>   } // End SubtargetPredicate = isSICI
>>
>> +let SubtargetPredicate = isVI in {
>> +
>> +defm V_LSHLREV_B64 : VOP3Inst <vop3<0, 0x28f>, "v_lshlrev_b64",
>> +  VOP_I64_I32_I64
>> +>;
>> +defm V_LSHRREV_B64 : VOP3Inst <vop3<0, 0x290>, "v_lshrrev_b64",
>> +  VOP_I64_I32_I64
>> +>;
>> +defm V_ASHRREV_I64 : VOP3Inst <vop3<0, 0x291>, "v_ashrrev_i64",
>> +  VOP_I64_I32_I64
>> +>;
>> +
>> +} // End SubtargetPredicate = isVI
>> +
>>   //===----------------------------------------------------------------------===//
>>   // Pseudo Instructions
>>   //===----------------------------------------------------------------------===//
>> diff --git a/test/CodeGen/R600/rotl.i64.ll b/test/CodeGen/R600/rotl.i64.ll
>> index f094ece..6da17a4 100644
>> --- a/test/CodeGen/R600/rotl.i64.ll
>> +++ b/test/CodeGen/R600/rotl.i64.ll
>> @@ -1,12 +1,12 @@
>> -; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
>> -; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
>> +; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=BOTH %s
>> +; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=BOTH %s
>>
>> -; FUNC-LABEL: {{^}}s_rotl_i64:
>> -; SI-DAG: s_lshl_b64
>> -; SI-DAG: s_sub_i32
>> -; SI-DAG: s_lshr_b64
>> -; SI: s_or_b64
>> -; SI: s_endpgm
>> +; BOTH-LABEL: {{^}}s_rotl_i64:
>> +; BOTH-DAG: s_lshl_b64
>> +; BOTH-DAG: s_sub_i32
>> +; BOTH-DAG: s_lshr_b64
>> +; BOTH: s_or_b64
>> +; BOTH: s_endpgm
>>   define void @s_rotl_i64(i64 addrspace(1)* %in, i64 %x, i64 %y) {
>>   entry:
>>     %0 = shl i64 %x, %y
>> @@ -17,13 +17,15 @@ entry:
>>     ret void
>>   }
>>
>> -; FUNC-LABEL: {{^}}v_rotl_i64:
>> +; BOTH-LABEL: {{^}}v_rotl_i64:
>>   ; SI-DAG: v_lshl_b64
>> -; SI-DAG: v_sub_i32
>> +; VI-DAG: v_lshlrev_b64
>> +; BOTH-DAG: v_sub_i32
>>   ; SI: v_lshr_b64
>> -; SI: v_or_b32
>> -; SI: v_or_b32
>> -; SI: s_endpgm
>> +; VI: v_lshrrev_b64
>> +; BOTH: v_or_b32
>> +; BOTH: v_or_b32
>> +; BOTH: s_endpgm
>>   define void @v_rotl_i64(i64 addrspace(1)* %in, i64 addrspace(1)* %xptr, i64 addrspace(1)* %yptr) {
>>   entry:
>>     %x = load i64 addrspace(1)* %xptr, align 8
>> diff --git a/test/CodeGen/R600/rotr.i64.ll b/test/CodeGen/R600/rotr.i64.ll
>> index a637f71..f1d1d26 100644
>> --- a/test/CodeGen/R600/rotr.i64.ll
>> +++ b/test/CodeGen/R600/rotr.i64.ll
>> @@ -1,11 +1,11 @@
>> -; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
>> -; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
>> +; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=BOTH %s
>> +; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=BOTH %s
>>
>> -; FUNC-LABEL: {{^}}s_rotr_i64:
>> -; SI-DAG: s_sub_i32
>> -; SI-DAG: s_lshr_b64
>> -; SI-DAG: s_lshl_b64
>> -; SI: s_or_b64
>> +; BOTH-LABEL: {{^}}s_rotr_i64:
>> +; BOTH-DAG: s_sub_i32
>> +; BOTH-DAG: s_lshr_b64
>> +; BOTH-DAG: s_lshl_b64
>> +; BOTH: s_or_b64
>>   define void @s_rotr_i64(i64 addrspace(1)* %in, i64 %x, i64 %y) {
>>   entry:
>>     %tmp0 = sub i64 64, %y
>> @@ -16,12 +16,14 @@ entry:
>>     ret void
>>   }
>>
>> -; FUNC-LABEL: {{^}}v_rotr_i64:
>> -; SI-DAG: v_sub_i32
>> +; BOTH-LABEL: {{^}}v_rotr_i64:
>> +; BOTH-DAG: v_sub_i32
>>   ; SI-DAG: v_lshr_b64
>>   ; SI-DAG: v_lshl_b64
>> -; SI: v_or_b32
>> -; SI: v_or_b32
>> +; VI-DAG: v_lshrrev_b64
>> +; VI-DAG: v_lshlrev_b64
>> +; BOTH: v_or_b32
>> +; BOTH: v_or_b32
>>   define void @v_rotr_i64(i64 addrspace(1)* %in, i64 addrspace(1)* %xptr, i64 addrspace(1)* %yptr) {
>>   entry:
>>     %x = load i64 addrspace(1)* %xptr, align 8
>> @@ -34,7 +36,7 @@ entry:
>>     ret void
>>   }
>>
>> -; FUNC-LABEL: {{^}}s_rotr_v2i64:
>> +; BOTH-LABEL: {{^}}s_rotr_v2i64:
>>   define void @s_rotr_v2i64(<2 x i64> addrspace(1)* %in, <2 x i64> %x, <2 x i64> %y) {
>>   entry:
>>     %tmp0 = sub <2 x i64> <i64 64, i64 64>, %y
>> @@ -45,7 +47,7 @@ entry:
>>     ret void
>>   }
>>
>> -; FUNC-LABEL: {{^}}v_rotr_v2i64:
>> +; BOTH-LABEL: {{^}}v_rotr_v2i64:
>>   define void @v_rotr_v2i64(<2 x i64> addrspace(1)* %in, <2 x i64> addrspace(1)* %xptr, <2 x i64> addrspace(1)* %yptr) {
>>   entry:
>>     %x = load <2 x i64> addrspace(1)* %xptr, align 8
>> diff --git a/test/CodeGen/R600/shl.ll b/test/CodeGen/R600/shl.ll
>> index ff2f096..c6a18bf 100644
>> --- a/test/CodeGen/R600/shl.ll
>> +++ b/test/CodeGen/R600/shl.ll
>> @@ -66,7 +66,7 @@ define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>>   ;VI-CHECK: {{^}}shl_i64:
>> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>>
>>   define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>>     %b_ptr = getelementptr i64 addrspace(1)* %in, i64 1
>> @@ -104,8 +104,8 @@ define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>>   ;VI-CHECK: {{^}}shl_v2i64:
>> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>>
>>   define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
>> @@ -165,10 +165,10 @@ define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in
>>   ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>>   ;VI-CHECK: {{^}}shl_v4i64:
>> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>>
>>   define void @shl_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
>> diff --git a/test/CodeGen/R600/sra.ll b/test/CodeGen/R600/sra.ll
>> index 44c1101..7b461ca 100644
>> --- a/test/CodeGen/R600/sra.ll
>> +++ b/test/CodeGen/R600/sra.ll
>> @@ -85,7 +85,7 @@ entry:
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>>   ;VI-CHECK-LABEL: {{^}}ashr_i64_2:
>> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>>
>>   define void @ashr_i64_2(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>>   entry:
>> @@ -128,8 +128,8 @@ entry:
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>>   ;VI-CHECK-LABEL: {{^}}ashr_v2i64:
>> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>>
>>   define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
>> @@ -197,10 +197,10 @@ define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %i
>>   ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>>
>>   ;VI-CHECK-LABEL: {{^}}ashr_v4i64:
>> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>>
>>   define void @ashr_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>>     %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
>> --
>> 2.1.0
>>
>>
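A side note on patch 6, as a rough sketch rather than anything from the tree: the REV opcodes take the shift amount as the first source and the value as the second, which is presumably why moveToVALU calls swapOperands when it switches opcodes. The helper names below are made up for illustration, and masking the amount to its low 6 bits is my assumption about the hardware behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the two operand orders; not LLVM or hardware code.
// Assumption: only the low 6 bits of the shift amount are used.

// Standard form (SI): value is src0, shift amount is src1.
static uint64_t lshl_b64(uint64_t value, uint32_t amount) {
  return value << (amount & 63u);
}
static uint64_t lshr_b64(uint64_t value, uint32_t amount) {
  return value >> (amount & 63u);
}

// REV form (VI): shift amount is src0, value is src1.
static uint64_t lshlrev_b64(uint32_t amount, uint64_t value) {
  return value << (amount & 63u);
}
static uint64_t lshrrev_b64(uint32_t amount, uint64_t value) {
  return value >> (amount & 63u);
}

// Swapping the sources when moving from the standard form to the REV form
// keeps the result unchanged.
static void selfCheck() {
  const uint64_t v = 0x0123456789abcdefULL;
  for (uint32_t s = 0; s < 64; ++s) {
    assert(lshl_b64(v, s) == lshlrev_b64(s, v));
    assert(lshr_b64(v, s) == lshrrev_b64(s, v));
  }
}
```

Without the swap, the 64-bit value would end up in the shift-amount slot, which is the kind of mismatch the new VOP_I64_I32_I64 profile is there to rule out.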
>> 0007-R600-SI-Rewrite-VOP1InstSI-to-contain-a-pseudo-and-_.patch
>>
>>  From 4457a80bbb0972a530a1294179347b6e99bfa21c Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Tue, 27 Jan 2015 16:28:47 +0100
>> Subject: [PATCH 7/8] R600/SI: Rewrite VOP1InstSI to contain a pseudo and _si
>>   opcode
>>
>> If these instructions are accidentally selected on VI, code generation
>> will fail, because the pseudo -> _vi mapping is undefined.
>>
>> This makes such bugs easy to catch in the future.
>> ---
>>   lib/Target/R600/SIInstrInfo.td | 30 +++++++++++++++++++++++-------
>>   1 file changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
>> index 2cd5adc..c16b84b 100644
>> --- a/lib/Target/R600/SIInstrInfo.td
>> +++ b/lib/Target/R600/SIInstrInfo.td
>> @@ -843,6 +843,15 @@ multiclass VOP1_m <vop1 op, dag outs, dag ins, string asm, list<dag> pattern,
>>               SIMCInstr <opName#"_e32", SISubtarget.VI>;
>>   }
>>
>> +multiclass VOP1SI_m <vop1 op, dag outs, dag ins, string asm, list<dag> pattern,
>> +                   string opName> {
>> +  def "" : VOP1_Pseudo <outs, ins, pattern, opName>;
>> +
>> +  def _si : VOP1<op.SI, outs, ins, asm, []>,
>> +            SIMCInstr <opName#"_e32", SISubtarget.SI>;
>> +  // No VI instruction. This class is for SI only.
>> +}
>> +
>>   class VOP2_Pseudo <dag outs, dag ins, list<dag> pattern, string opName> :
>>     VOP2Common <outs, ins, "", pattern>,
>>     VOP <opName>,
>> @@ -939,6 +948,16 @@ multiclass VOP3_1_m <vop op, dag outs, dag ins, string asm,
>>               VOP3DisableFields<0, 0, HasMods>;
>>   }
>>
>> +multiclass VOP3SI_1_m <vop op, dag outs, dag ins, string asm,
>> +                     list<dag> pattern, string opName, bit HasMods = 1> {
>> +
>> +  def "" : VOP3_Pseudo <outs, ins, pattern, opName>;
>> +
>> +  def _si : VOP3_Real_si <op.SI3, outs, ins, asm, opName>,
>> +            VOP3DisableFields<0, 0, HasMods>;
>> +  // No VI instruction. This class is for SI only.
>> +}
>> +
>>   multiclass VOP3_2_m <vop op, dag outs, dag ins, string asm,
>>                        list<dag> pattern, string opName, string revOp,
>>                        bit HasMods = 1, bit UseFullOp = 0> {
>> @@ -1046,17 +1065,14 @@ multiclass VOP1Inst <vop1 op, string opName, VOPProfile P,
>>   multiclass VOP1InstSI <vop1 op, string opName, VOPProfile P,
>>                          SDPatternOperator node = null_frag> {
>>
>> -  def _e32 : VOP1 <op.SI, P.Outs, P.Ins32, opName#P.Asm32, []>,
>> -             VOP <opName>;
>> +  defm _e32 : VOP1SI_m <op, P.Outs, P.Ins32, opName#P.Asm32, [], opName>;
>>
>> -  def _e64 : VOP3Common <P.Outs, P.Ins64, opName#P.Asm64,
>> +  defm _e64 : VOP3SI_1_m <op, P.Outs, P.Ins64, opName#P.Asm64,
>>       !if(P.HasModifiers,
>>         [(set P.DstVT:$dst, (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0,
>>                                   i32:$src0_modifiers, i1:$clamp, i32:$omod))))],
>> -      [(set P.DstVT:$dst, (node P.Src0VT:$src0))])>,
>> -            VOP <opName>,
>> -            VOP3e <op.SI3>,
>> -            VOP3DisableFields<0, 0, P.HasModifiers>;
>> +      [(set P.DstVT:$dst, (node P.Src0VT:$src0))]),
>> +    opName, P.HasModifiers>;
>>   }
>>
>>   multiclass VOP2_Helper <vop2 op, string opName, dag outs,
>> --
>> 2.1.0
>>
>>
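To illustrate the idea behind patch 7, here is a minimal sketch of a pseudo-to-real opcode lookup that has no entry for SI-only instructions on VI. Everything in it is hypothetical: the table, the lookupRealOpcode helper, the V_FOO / V_BAR names and the encodings are placeholders, not the TableGen-generated mapping that LLVM actually uses.

```cpp
#include <map>
#include <string>
#include <utility>

enum class Subtarget { SI, VI };

// Hypothetical table: (pseudo name, subtarget) -> real encoding.
// "V_FOO" stands for an SI-only instruction, so it has no VI row.
static const std::map<std::pair<std::string, Subtarget>, int> RealOpcode = {
    {{"V_BAR", Subtarget::SI}, 0x01},
    {{"V_BAR", Subtarget::VI}, 0x01},
    {{"V_FOO", Subtarget::SI}, 0x02},
};

// Returns -1 when the pseudo has no real instruction on the subtarget,
// so a wrong selection shows up as a lookup failure rather than as a
// silently emitted SI encoding.
static int lookupRealOpcode(const std::string &Pseudo, Subtarget ST) {
  auto It = RealOpcode.find({Pseudo, ST});
  return It == RealOpcode.end() ? -1 : It->second;
}
```

With the old definitions there was only a single real instruction and no pseudo, so nothing comparable to the missing VI row existed to trip over.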
>> 0008-R600-SI-Remove-useless-patterns-in-VALU-which-are-al.patch
>>
>>  From 86cdd84c7a4ba10d09d8186cf80a881521681c7e Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>> Date: Tue, 27 Jan 2015 18:57:55 +0100
>> Subject: [PATCH 8/8] R600/SI: Remove useless patterns in VALU which are
>>   already covered by SALU
>>
>> Also remove hasPostISelHook=1 from V_LSHL_B32. It's defined by InstSI
>> already.
>> ---
>>   lib/Target/R600/SIInstructions.td | 61 ++++++++++-----------------------------
>>   1 file changed, 16 insertions(+), 45 deletions(-)
>>
>>
>> LGTM
>>