[PATCHES] R600/SI: VI fixes for bit shifts, GS

Marek Olšák maraeo at gmail.com
Wed Jan 28 05:28:14 PST 2015


Hi Matt,

When can I expect the hazard recognizer? Will it insert S_NOPs as
required by section "3.1.2. Manually inserted wait states" in the VI
shader programming doc? There is at least one more case that we should
handle, specifically case #9. I was about to implement it, but would
it conflict with your work?
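For reference, here is a sketch of the rule from patch 1 as a standalone pass over a simplified instruction list. The rule: on VI, an "S_NOP 0" must separate an instruction that writes M0 from a following S_SENDMSG. This is only an illustration. The Inst struct, the mnemonic strings and insertSendMsgNops are hypothetical. They are neither LLVM's MachineInstr API nor the hazard recognizer discussed in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified instruction: a mnemonic plus the registers it writes.
struct Inst {
  std::string Opcode;
  std::vector<std::string> Defs;
};

// True if the instruction writes M0.
static bool writesM0(const Inst &I) {
  for (const std::string &Reg : I.Defs)
    if (Reg == "m0")
      return true;
  return false;
}

// Returns a copy of Prog in which "s_nop 0" is placed between any instruction
// that writes M0 and an immediately following s_sendmsg. The rule is applied
// only when IsVI is true, mirroring the generation check in handleSendMsg.
std::vector<Inst> insertSendMsgNops(const std::vector<Inst> &Prog, bool IsVI) {
  std::vector<Inst> Out;
  bool LastWritesM0 = false;
  for (const Inst &I : Prog) {
    if (IsVI && LastWritesM0 && I.Opcode == "s_sendmsg")
      Out.push_back({"s_nop", {}}); // stands for "s_nop 0"
    Out.push_back(I);
    LastWritesM0 = writesM0(I);
  }
  return Out;
}
```

Take the sequence from the patch's test (s_mov_b32 m0, s0; s_sendmsg; s_endpgm). The VI variant gains one s_nop before the s_sendmsg. The SI variant is returned unchanged.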

Thanks,

Marek

On Tue, Jan 27, 2015 at 11:34 PM, Matt Arsenault
<Matthew.Arsenault at amd.com> wrote:
> On 01/27/2015 02:16 PM, Marek Olšák wrote:
>
> Hi,
>
> This is another set of fixes for VI. Patches 1-2, 5-6 fix real issues.
> Patches 3-4, 7-8 are mostly cosmetic. Only patch 1 should fix an issue
> that is reproducible by piglit.
>
> I couldn't test these, because my VI hw is very unstable. I'll try and
> test Bonaire tomorrow. That said, I'm pretty sure patches 1-7 are
> important improvements over the current state. I'm not sure about
> patch 8.
>
> Please review.
>
> Michel, would you be so kind as to test whether the first patch
> fixes the GS hang? Sorry, I'm not able to tell the difference. Please
> apply patch 1 alone and don't update your LLVM repo (just in case an
> update uncovers some other bug).
>
> Thank you very much,
>
> Marek
>
>
> 0001-R600-SI-Fix-dependency-between-an-instruction-writin.patch
>
> From 4740298959a4ebb361415019eaa15d899c80614e Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Mon, 26 Jan 2015 16:54:35 +0100
> Subject: [PATCH 1/8] R600/SI: Fix dependency between an instruction writing M0
>  and S_SENDMSG on VI
>
> This fixes a hang when using an empty geometry shader.
> ---
>  lib/Target/R600/SIInsertWaits.cpp       | 33 +++++++++++++++++++++++++++++++++
>  test/CodeGen/R600/llvm.SI.sendmsg-m0.ll | 25 +++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
>  create mode 100644 test/CodeGen/R600/llvm.SI.sendmsg-m0.ll
>
> diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
> index 181b116..6075001 100644
> --- a/lib/Target/R600/SIInsertWaits.cpp
> +++ b/lib/Target/R600/SIInsertWaits.cpp
> @@ -82,6 +82,8 @@ private:
>    /// \brief Type of the last opcode.
>    InstType LastOpcodeType;
>
> +  bool LastInstWritesM0;
> +
>    /// \brief Get increment/decrement amount for this instruction.
>    Counters getHwCounts(MachineInstr &MI);
>
> @@ -106,6 +108,9 @@ private:
>    /// \brief Resolve all operand dependencies to counter requirements
>    Counters handleOperands(MachineInstr &MI);
>
> +  /// \brief Insert S_NOP between an instruction writing M0 and S_SENDMSG.
> +  void handleSendMsg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I);
> +
>  public:
>    SIInsertWaits(TargetMachine &tm) :
>      MachineFunctionPass(ID),
> @@ -403,6 +408,31 @@ Counters SIInsertWaits::handleOperands(MachineInstr &MI) {
>    return Result;
>  }
>
> +void SIInsertWaits::handleSendMsg(MachineBasicBlock &MBB,
> +                                  MachineBasicBlock::iterator I)
> +{
>
> Put the brace on the previous line.
>
> +  if (TRI->ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
> +    return;
> +
> +  // There must be "S_NOP 0" between an instruction writing M0 and S_SENDMSG.
>
> I think this should be handled by the scheduling hazard recognizer. I have
> this partially implemented already, but this is probably fine for now.
>
> +  if (LastInstWritesM0 && I->getOpcode() == AMDGPU::S_SENDMSG) {
> +    BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_NOP)).addImm(0);
> +    LastInstWritesM0 = false;
> +    return;
> +  }
> +
> +  // Set whether this instruction sets M0
> +  LastInstWritesM0 = false;
> +
> +  unsigned NumOperands = I->getNumOperands();
> +  for (unsigned i = 0; i < NumOperands; i++) {
> +    const MachineOperand &Op = I->getOperand(i);
> +
> +    if (Op.isReg() && Op.isDef() && Op.getReg() == AMDGPU::M0)
> +      LastInstWritesM0 = true;
> +  }
> +}
> +
>  // FIXME: Insert waits listed in Table 4.2 "Required User-Inserted Wait States"
>  // around other non-memory instructions.
>  bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
> @@ -417,6 +447,7 @@ bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
>    WaitedOn = ZeroCounts;
>    LastIssued = ZeroCounts;
>    LastOpcodeType = OTHER;
> +  LastInstWritesM0 = false;
>
>    memset(&UsedRegs, 0, sizeof(UsedRegs));
>    memset(&DefinedRegs, 0, sizeof(DefinedRegs));
> @@ -433,6 +464,8 @@ bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
>          Changes |= insertWait(MBB, I, LastIssued);
>        else
>          Changes |= insertWait(MBB, I, handleOperands(*I));
> +
> +      handleSendMsg(MBB, I);
>        pushInstruction(MBB, I);
>      }
>
> diff --git a/test/CodeGen/R600/llvm.SI.sendmsg-m0.ll b/test/CodeGen/R600/llvm.SI.sendmsg-m0.ll
> new file mode 100644
> index 0000000..4de8993
> --- /dev/null
> +++ b/test/CodeGen/R600/llvm.SI.sendmsg-m0.ll
> @@ -0,0 +1,25 @@
> +;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI %s
> +;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=VI %s
> +
> +; SI-LABEL: {{^}}main:
>
> These should use a common check prefix for the -LABEL line and the other parts shared by SI and VI.
>
> +; SI: s_mov_b32 m0, s0
> +; SI-NEXT: s_sendmsg Gs_done(nop)
> +; SI-NEXT: s_endpgm
> +
> +; VI-LABEL: {{^}}main:
> +; VI: s_mov_b32 m0, s0
> +; VI-NEXT: s_nop 0
> +; VI-NEXT: s_sendmsg Gs_done(nop)
> +; VI-NEXT: s_endpgm
> +
> +define void @main(i32 inreg %a) #0 {
> +main_body:
> +  call void @llvm.SI.sendmsg(i32 3, i32 %a)
> +  ret void
> +}
> +
> +; Function Attrs: nounwind
> +declare void @llvm.SI.sendmsg(i32, i32) #1
> +
> +attributes #0 = { "ShaderType"="2" "unsafe-fp-math"="true" }
> +attributes #1 = { nounwind }
> --
> 2.1.0
>
>
> 0002-R600-SI-Determine-target-specific-encoding-of-READLA.patch
>
> From 259e212087782161d4b4ff069b768cd3f04ff0eb Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Mon, 26 Jan 2015 22:41:09 +0100
> Subject: [PATCH 2/8] R600/SI: Determine target-specific encoding of READLANE
>  and WRITELANE early
>
> These are VOP2 on SI and VOP3 on VI, and their pseudos are neither, which
> can be a problem. In order to make isVOP2 and isVOP3 queries behave as
> expected, the encoding must be determined first.
>
> This doesn't fix any known issue, but better safe than sorry.
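Here is a toy model of why the encoding has to be resolved first. The same pseudo maps to a different real instruction, with a different encoding, on each generation. Encoding queries are therefore only meaningful on the real instruction. The table, the opcode numbers (100, 200) and the function names are hypothetical placeholders. pseudoToReal returns -1 when no real instruction exists, matching how pseudoToMCOpcode is used later in this series.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class Gen { SI, VI };
enum class Encoding { VOP2, VOP3 };

// A real (hardware) instruction: placeholder opcode number and its encoding.
struct RealInst {
  int MCOpcode;
  Encoding Enc;
};

// Hypothetical mapping (pseudo name, generation) -> real instruction.
// Per patch 2, V_READLANE_B32 is VOP2 on SI and VOP3 on VI.
static const std::map<std::pair<std::string, Gen>, RealInst> RealTable = {
    {{"V_READLANE_B32", Gen::SI}, {100, Encoding::VOP2}},
    {{"V_READLANE_B32", Gen::VI}, {200, Encoding::VOP3}},
};

// Returns the real opcode for Pseudo on G, or -1 if it cannot be lowered there.
int pseudoToReal(const std::string &Pseudo, Gen G) {
  auto It = RealTable.find({Pseudo, G});
  return It == RealTable.end() ? -1 : It->second.MCOpcode;
}

// Encoding queries are answered on the real instruction, not on the pseudo.
bool isVOP3(const std::string &Pseudo, Gen G) {
  auto It = RealTable.find({Pseudo, G});
  return It != RealTable.end() && It->second.Enc == Encoding::VOP3;
}
```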
> ---
>  lib/Target/R600/SIRegisterInfo.cpp | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
> index 380c98b..2bc6416 100644
> --- a/lib/Target/R600/SIRegisterInfo.cpp
> +++ b/lib/Target/R600/SIRegisterInfo.cpp
> @@ -183,7 +183,9 @@ void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
>             Ctx.emitError("Ran out of VGPRs for spilling SGPR");
>          }
>
> -        BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_WRITELANE_B32), Spill.VGPR)
> +        BuildMI(*MBB, MI, DL,
> +                TII->get(TII->pseudoToMCOpcode(AMDGPU::V_WRITELANE_B32)),
>
> I would introduce a helper, TII->getPseudoMCOpcode() or something like that.
>
> +                Spill.VGPR)
>                  .addReg(SubReg)
>                  .addImm(Spill.Lane);
>
> @@ -217,7 +219,9 @@ void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
>            SubReg = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0);
>          }
>
> -        BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_READLANE_B32), SubReg)
> +        BuildMI(*MBB, MI, DL,
> +                TII->get(TII->pseudoToMCOpcode(AMDGPU::V_READLANE_B32)),
> +                SubReg)
>                  .addReg(Spill.VGPR)
>                  .addImm(Spill.Lane)
>                  .addReg(MI->getOperand(0).getReg(), RegState::ImplicitDefine);
> --
> 2.1.0
>
>
> 0003-R600-SI-Trivial-instruction-definition-corrections-f.patch
>
> From 9e1f9c0f62b6a1f54da6d2cdbbf09bb671902632 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Tue, 27 Jan 2015 13:04:32 +0100
> Subject: [PATCH 3/8] R600/SI: Trivial instruction definition corrections for
>  VI
>
> - V_MAC_LEGACY_F32 exists on VI, but it's VOP3-only.
>
> - Remove V_MUL_LO_U32, because it's identical to V_MUL_LO_I32.
>   Both instructions are even defined the same on VI.
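The claim that V_MUL_LO_U32 and V_MUL_LO_I32 are identical follows from two's-complement arithmetic. The low 32 bits of a product do not depend on whether the operands are read as signed or unsigned. The small C++ check below demonstrates that property; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of A * B with unsigned operands (unsigned arithmetic wraps mod 2^32).
std::uint32_t mulLoU32(std::uint32_t A, std::uint32_t B) { return A * B; }

// Low 32 bits of A * B with signed operands. The product is formed in 64 bits
// to avoid signed overflow, then truncated (conversion to uint32_t is modulo 2^32).
std::uint32_t mulLoI32(std::int32_t A, std::int32_t B) {
  std::int64_t Full = static_cast<std::int64_t>(A) * static_cast<std::int64_t>(B);
  return static_cast<std::uint32_t>(Full);
}
```

Both 0xFFFFFFFF * 3 (unsigned) and -1 * 3 (signed) produce the bit pattern 0xFFFFFFFD. This is why a single opcode can serve both interpretations.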
>
>
> I'm not sure removing this is ideal, since we probably want the AsmParser to
> be able to understand it.
>
>
> 0004-R600-SI-Remove-VOP2_REV-definitions-from-target-spec.patch
>
> From 8a0f3ecec050dc28d8cce14531ae8720c08b5e17 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Wed, 14 Jan 2015 20:47:28 +0100
> Subject: [PATCH 4/8] R600/SI: Remove VOP2_REV definitions from target-specific
>  instructions
>
> The getCommute* functions are only used with pseudos, so this commit doesn't
> change anything.
>
> The issue of the missing non-REV versions of shift instructions on VI will be
> fixed separately.
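This is for readers unfamiliar with the REV forms. A REV shift takes the shift amount in the first source operand and the value in the second, the reverse of the standard form. Turning one into the other therefore means swapping the operands. Below is a C++ model of the 32-bit logical right shift pair. It assumes D = S0 >> S1[4:0] for v_lshr_b32 and D = S1 >> S0[4:0] for v_lshrrev_b32; the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// v_lshr_b32 D, S0, S1 (assumed semantics): D = S0 >> S1[4:0]
std::uint32_t vLshrB32(std::uint32_t S0, std::uint32_t S1) { return S0 >> (S1 & 31u); }

// v_lshrrev_b32 D, S0, S1 (assumed semantics): D = S1 >> S0[4:0], operand roles reversed
std::uint32_t vLshrrevB32(std::uint32_t S0, std::uint32_t S1) { return S1 >> (S0 & 31u); }

// Expressing "Value >> Amount" with the REV form requires swapping the operands.
std::uint32_t lshrViaRev(std::uint32_t Value, std::uint32_t Amount) {
  return vLshrrevB32(Amount, Value);
}
```

lshrViaRev(0xF0, 36) and vLshrB32(0xF0, 36) both yield 0xF, because only the low five bits of the amount are used.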
> ---
>  lib/Target/R600/SIInstrInfo.td    | 45 +++++++++++++++++----------------------
>  lib/Target/R600/SIInstructions.td |  9 +++-----
>  2 files changed, 22 insertions(+), 32 deletions(-)
>
> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index a4e258e..5699d49 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -850,25 +850,22 @@ class VOP2_Pseudo <dag outs, dag ins, list<dag> pattern, string opName> :
>  }
>
>  multiclass VOP2SI_m <vop2 op, dag outs, dag ins, string asm, list<dag> pattern,
> -                     string opName, string revOpSI> {
> +                     string opName, string revOp> {
>    def "" : VOP2_Pseudo <outs, ins, pattern, opName>,
> -           VOP2_REV<revOpSI#"_e32", !eq(revOpSI, opName)>;
> +           VOP2_REV<revOp#"_e32", !eq(revOp, opName)>;
>
>    def _si : VOP2 <op.SI, outs, ins, opName#asm, []>,
> -            VOP2_REV<revOpSI#"_e32_si", !eq(revOpSI, opName)>,
>              SIMCInstr <opName#"_e32", SISubtarget.SI>;
>  }
>
>  multiclass VOP2_m <vop2 op, dag outs, dag ins, string asm, list<dag> pattern,
> -                   string opName, string revOpSI, string revOpVI> {
> +                   string opName, string revOp> {
>    def "" : VOP2_Pseudo <outs, ins, pattern, opName>,
> -           VOP2_REV<revOpSI#"_e32", !eq(revOpSI, opName)>;
> +           VOP2_REV<revOp#"_e32", !eq(revOp, opName)>;
>
>    def _si : VOP2 <op.SI, outs, ins, opName#asm, []>,
> -            VOP2_REV<revOpSI#"_e32_si", !eq(revOpSI, opName)>,
>              SIMCInstr <opName#"_e32", SISubtarget.SI>;
>    def _vi : VOP2 <op.VI, outs, ins, opName#asm, []>,
> -            VOP2_REV<revOpVI#"_e32_vi", !eq(revOpVI, opName)>,
>              SIMCInstr <opName#"_e32", SISubtarget.VI>;
>  }
>
> @@ -942,20 +939,18 @@ multiclass VOP3_1_m <vop op, dag outs, dag ins, string asm,
>  }
>
>  multiclass VOP3_2_m <vop op, dag outs, dag ins, string asm,
> -                     list<dag> pattern, string opName, string revOpSI, string revOpVI,
> +                     list<dag> pattern, string opName, string revOp,
>                       bit HasMods = 1, bit UseFullOp = 0> {
>
>    def "" : VOP3_Pseudo <outs, ins, pattern, opName>,
> -           VOP2_REV<revOpSI#"_e64", !eq(revOpSI, opName)>;
> +           VOP2_REV<revOp#"_e64", !eq(revOp, opName)>;
>
>    def _si : VOP3_Real_si <op.SI3,
>                outs, ins, asm, opName>,
> -            VOP2_REV<revOpSI#"_e64_si", !eq(revOpSI, opName)>,
>              VOP3DisableFields<1, 0, HasMods>;
>
>    def _vi : VOP3_Real_vi <op.VI3,
>                outs, ins, asm, opName>,
> -            VOP2_REV<revOpVI#"_e64_vi", !eq(revOpVI, opName)>,
>              VOP3DisableFields<1, 0, HasMods>;
>  }
>
> @@ -971,14 +966,12 @@ multiclass VOP3b_2_m <vop op, dag outs, dag ins, string asm,
>    let sdst = SIOperand.VCC, Defs = [VCC] in {
>      def _si : VOP3b <op.SI3, outs, ins, asm, []>,
>                VOP3DisableFields<1, 0, HasMods>,
> -              SIMCInstr<opName#"_e64", SISubtarget.SI>,
> -              VOP2_REV<revOp#"_e64_si", !eq(revOp, opName)>;
> +              SIMCInstr<opName#"_e64", SISubtarget.SI>;
>
>      // TODO: Do we need this VI variant here?
>      /*def _vi : VOP3b_vi <op.VI3, outs, ins, asm, []>,
>                VOP3DisableFields<1, 0, HasMods>,
> -              SIMCInstr<opName#"_e64", SISubtarget.VI>,
> -              VOP2_REV<revOp#"_e64_vi", !eq(revOp, opName)>;*/
> +              SIMCInstr<opName#"_e64", SISubtarget.VI>;*/
>    } // End sdst = SIOperand.VCC, Defs = [VCC]
>  }
>
> @@ -1057,17 +1050,17 @@ multiclass VOP1InstSI <vop1 op, string opName, VOPProfile P,
>  multiclass VOP2_Helper <vop2 op, string opName, dag outs,
>                          dag ins32, string asm32, list<dag> pat32,
>                          dag ins64, string asm64, list<dag> pat64,
> -                        string revOpSI, string revOpVI, bit HasMods> {
> -  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOpSI, revOpVI>;
> +                        string revOp, bit HasMods> {
> +  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOp>;
>
>    defm _e64 : VOP3_2_m <op,
> -    outs, ins64, opName#"_e64"#asm64, pat64, opName, revOpSI, revOpVI, HasMods
> +    outs, ins64, opName#"_e64"#asm64, pat64, opName, revOp, HasMods
>    >;
>  }
>
>  multiclass VOP2Inst <vop2 op, string opName, VOPProfile P,
>                       SDPatternOperator node = null_frag,
> -                     string revOpSI = opName, string revOpVI = revOpSI> : VOP2_Helper <
> +                     string revOp = opName> : VOP2_Helper <
>    op, opName, P.Outs,
>    P.Ins32, P.Asm32, [],
>    P.Ins64, P.Asm64,
> @@ -1077,7 +1070,7 @@ multiclass VOP2Inst <vop2 op, string opName, VOPProfile P,
>                                        i1:$clamp, i32:$omod)),
>                   (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
>        [(set P.DstVT:$dst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
> -  revOpSI, revOpVI, P.HasModifiers
> +  revOp, P.HasModifiers
>  >;
>
>  multiclass VOP2b_Helper <vop2 op, string opName, dag outs,
> @@ -1085,7 +1078,7 @@ multiclass VOP2b_Helper <vop2 op, string opName, dag outs,
>                           dag ins64, string asm64, list<dag> pat64,
>                           string revOp, bit HasMods> {
>
> -  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOp, revOp>;
> +  defm _e32 : VOP2_m <op, outs, ins32, asm32, pat32, opName, revOp>;
>
>    defm _e64 : VOP3b_2_m <op,
>      outs, ins64, opName#"_e64"#asm64, pat64, opName, revOp, HasMods
> @@ -1111,16 +1104,16 @@ multiclass VOP2bInst <vop2 op, string opName, VOPProfile P,
>  multiclass VOP2_VI3_Helper <vop23 op, string opName, dag outs,
>                              dag ins32, string asm32, list<dag> pat32,
>                              dag ins64, string asm64, list<dag> pat64,
> -                            string revOpSI, string revOpVI, bit HasMods> {
> -  defm _e32 : VOP2SI_m <op, outs, ins32, asm32, pat32, opName, revOpSI>;
> +                            string revOp, bit HasMods> {
> +  defm _e32 : VOP2SI_m <op, outs, ins32, asm32, pat32, opName, revOp>;
>
>    defm _e64 : VOP3_2_m <op, outs, ins64, opName#"_e64"#asm64, pat64, opName,
> -                        revOpSI, revOpVI, HasMods>;
> +                        revOp, HasMods>;
>  }
>
>  multiclass VOP2_VI3_Inst <vop23 op, string opName, VOPProfile P,
>                            SDPatternOperator node = null_frag,
> -                          string revOpSI = opName, string revOpVI = revOpSI>
> +                          string revOp = opName>
>                            : VOP2_VI3_Helper <
>    op, opName, P.Outs,
>    P.Ins32, P.Asm32, [],
> @@ -1131,7 +1124,7 @@ multiclass VOP2_VI3_Inst <vop23 op, string opName, VOPProfile P,
>                                        i1:$clamp, i32:$omod)),
>                   (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
>        [(set P.DstVT:$dst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
> -  revOpSI, revOpVI, P.HasModifiers
> +  revOp, P.HasModifiers
>  >;
>
>  class VOPC_Pseudo <dag outs, dag ins, list<dag> pattern, string opName> :
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index 953c360..ca2abf8 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1457,22 +1457,19 @@ defm V_MAX_U32 : VOP2Inst <vop2<0x14, 0xf>, "v_max_u32", VOP_I32_I32_I32,
>    AMDGPUumax
>  >;
>
> -// No non-Rev Op on VI
>  defm V_LSHRREV_B32 : VOP2Inst <
>    vop2<0x16, 0x10>, "v_lshrrev_b32", VOP_I32_I32_I32, null_frag,
> -    "v_lshr_b32", "v_lshrrev_b32"
> +    "v_lshr_b32"
>  >;
>
> -// No non-Rev OP on VI
>  defm V_ASHRREV_I32 : VOP2Inst <
>    vop2<0x18, 0x11>, "v_ashrrev_i32", VOP_I32_I32_I32, null_frag,
> -    "v_ashr_i32", "v_ashrrev_i32"
> +    "v_ashr_i32"
>  >;
>
> -// No non-Rev OP on VI
>  defm V_LSHLREV_B32 : VOP2Inst <
>    vop2<0x1a, 0x12>, "v_lshlrev_b32", VOP_I32_I32_I32, null_frag,
> -    "v_lshl_b32", "v_lshlrev_b32"
> +    "v_lshl_b32"
>  >;
>
>  defm V_AND_B32 : VOP2Inst <vop2<0x1b, 0x13>, "v_and_b32",
> --
> 2.1.0
>
>
> 0005-R600-SI-Don-t-generate-non-existent-LSHL-LSHR-ASHR-B.patch
>
> From 448f8a654ac800b80eba512545425bcb0e3f8cc9 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Tue, 27 Jan 2015 15:29:32 +0100
> Subject: [PATCH 5/8] R600/SI: Don't generate non-existent LSHL, LSHR, ASHR B32
>  variants on VI
>
> This can happen when a REV instruction is commuted.
>
> The trick is not to define the _vi versions of these instructions, which has
> these consequences:
> - code generation will always fail if a pseudo cannot be lowered
>   (very useful for catching bugs where an unsupported instruction somehow
>   makes it to the printer)
> - it becomes possible to query whether a pseudo can be lowered, which is
>   done in commuteOpcode to prevent REV from commuting to non-REV on VI
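The commuteOpcode change can be modelled without LLVM. A commute is only accepted when the resulting opcode has an encoding on the current target. The tables and names below are hypothetical stand-ins for AMDGPU::getCommuteRev, AMDGPU::getCommuteOrig and the pseudoToMCOpcode(...) != -1 check. The control flow mirrors the patched function: try the REV mapping, then the original mapping, else keep the opcode.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class Gen { SI, VI };

// Hypothetical commute tables: standard form -> REV form, and back.
static const std::map<std::string, std::string> CommuteRev = {
    {"V_LSHR_B32", "V_LSHRREV_B32"},
    {"V_ASHR_I32", "V_ASHRREV_I32"},
    {"V_LSHL_B32", "V_LSHLREV_B32"}};
static const std::map<std::string, std::string> CommuteOrig = {
    {"V_LSHRREV_B32", "V_LSHR_B32"},
    {"V_ASHRREV_I32", "V_ASHR_I32"},
    {"V_LSHLREV_B32", "V_LSHL_B32"}};

// Stand-in for "pseudoToMCOpcode(Opc) != -1": the non-REV 32-bit shifts
// have no VI encoding.
bool existsOn(const std::string &Opc, Gen G) {
  static const std::set<std::string> SIOnly = {"V_LSHR_B32", "V_ASHR_I32",
                                               "V_LSHL_B32"};
  return G == Gen::SI || SIOnly.count(Opc) == 0;
}

std::string commuteOpcode(const std::string &Opcode, Gen G) {
  // Try to map original to commuted opcode, if the target has it.
  auto It = CommuteRev.find(Opcode);
  if (It != CommuteRev.end() && existsOn(It->second, G))
    return It->second;
  // Try to map commuted to original opcode, if the target has it.
  It = CommuteOrig.find(Opcode);
  if (It != CommuteOrig.end() && existsOn(It->second, G))
    return It->second;
  return Opcode;
}
```

On SI, V_LSHRREV_B32 commutes to V_LSHR_B32. On VI, the same query returns V_LSHRREV_B32 unchanged, because the non-REV form does not exist there.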
> ---
>  lib/Target/R600/SIInstrInfo.cpp   |  8 ++++++--
>  lib/Target/R600/SIInstrInfo.td    | 34 ++++++++++++++++++++++++++++++----
>  lib/Target/R600/SIInstructions.td | 10 +++++-----
>  test/CodeGen/R600/shl.ll          | 25 ++++++++++++++++++++++++-
>  test/CodeGen/R600/sra.ll          | 30 +++++++++++++++++++++++++++++-
>  5 files changed, 94 insertions(+), 13 deletions(-)
>
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index 80b560e..53a1d8b 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -408,11 +408,15 @@ unsigned SIInstrInfo::commuteOpcode(unsigned Opcode) const {
>    int NewOpc;
>
>    // Try to map original to commuted opcode
> -  if ((NewOpc = AMDGPU::getCommuteRev(Opcode)) != -1)
> +  NewOpc = AMDGPU::getCommuteRev(Opcode);
> +  // Check if the commuted (REV) opcode exists on the target.
> +  if (NewOpc != -1 && pseudoToMCOpcode(NewOpc) != -1)
>      return NewOpc;
>
>    // Try to map commuted to original opcode
> -  if ((NewOpc = AMDGPU::getCommuteOrig(Opcode)) != -1)
> +  NewOpc = AMDGPU::getCommuteOrig(Opcode);
> +  // Check if the original (non-REV) opcode exists on the target.
> +  if (NewOpc != -1 && pseudoToMCOpcode(NewOpc) != -1)
>      return NewOpc;
>
>    return Opcode;
> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index 5699d49..fd0dfd3 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -945,13 +945,24 @@ multiclass VOP3_2_m <vop op, dag outs, dag ins, string asm,
>    def "" : VOP3_Pseudo <outs, ins, pattern, opName>,
>             VOP2_REV<revOp#"_e64", !eq(revOp, opName)>;
>
> -  def _si : VOP3_Real_si <op.SI3,
> -              outs, ins, asm, opName>,
> +  def _si : VOP3_Real_si <op.SI3, outs, ins, asm, opName>,
>              VOP3DisableFields<1, 0, HasMods>;
>
> -  def _vi : VOP3_Real_vi <op.VI3,
> -              outs, ins, asm, opName>,
> +  def _vi : VOP3_Real_vi <op.VI3, outs, ins, asm, opName>,
> +            VOP3DisableFields<1, 0, HasMods>;
> +}
> +
> +multiclass VOP3SI_2_m <vop op, dag outs, dag ins, string asm,
> +                     list<dag> pattern, string opName, string revOp,
> +                     bit HasMods = 1, bit UseFullOp = 0> {
> +
> +  def "" : VOP3_Pseudo <outs, ins, pattern, opName>,
> +           VOP2_REV<revOp#"_e64", !eq(revOp, opName)>;
> +
> +  def _si : VOP3_Real_si <op.SI3, outs, ins, asm, opName>,
>              VOP3DisableFields<1, 0, HasMods>;
> +
> +  // No VI instruction. This class is for SI only.
>  }
>
>  multiclass VOP3b_2_m <vop op, dag outs, dag ins, string asm,
> @@ -1073,6 +1084,21 @@ multiclass VOP2Inst <vop2 op, string opName, VOPProfile P,
>    revOp, P.HasModifiers
>  >;
>
> +multiclass VOP2InstSI <vop2 op, string opName, VOPProfile P,
> +                       SDPatternOperator node = null_frag,
> +                       string revOp = opName> {
> +  defm _e32 : VOP2SI_m <op, P.Outs, P.Ins32, P.Asm32, [], opName, revOp>;
> +
> +  defm _e64 : VOP3SI_2_m <op, P.Outs, P.Ins64, opName#"_e64"#P.Asm64,
> +    !if(P.HasModifiers,
> +        [(set P.DstVT:$dst,
> +             (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers,
> +                                        i1:$clamp, i32:$omod)),
> +                   (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
> +        [(set P.DstVT:$dst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
> +    opName, revOp, P.HasModifiers>;
> +}
> +
>  multiclass VOP2b_Helper <vop2 op, string opName, dag outs,
>                           dag ins32, string asm32, list<dag> pat32,
>                           dag ins64, string asm64, list<dag> pat64,
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index ca2abf8..e62306d 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1540,21 +1540,21 @@ defm V_WRITELANE_B32 : VOP2SI_3VI_m <
>  // These instructions only exist on SI and CI
>  let SubtargetPredicate = isSICI in {
>
> -defm V_MIN_LEGACY_F32 : VOP2Inst <vop2<0xd>, "v_min_legacy_f32",
> +defm V_MIN_LEGACY_F32 : VOP2InstSI <vop2<0xd>, "v_min_legacy_f32",
>    VOP_F32_F32_F32, AMDGPUfmin_legacy
>  >;
> -defm V_MAX_LEGACY_F32 : VOP2Inst <vop2<0xe>, "v_max_legacy_f32",
> +defm V_MAX_LEGACY_F32 : VOP2InstSI <vop2<0xe>, "v_max_legacy_f32",
>    VOP_F32_F32_F32, AMDGPUfmax_legacy
>  >;
>
>  let isCommutable = 1 in {
> -defm V_LSHR_B32 : VOP2Inst <vop2<0x15>, "v_lshr_b32", VOP_I32_I32_I32, srl>;
> -defm V_ASHR_I32 : VOP2Inst <vop2<0x17>, "v_ashr_i32",
> +defm V_LSHR_B32 : VOP2InstSI <vop2<0x15>, "v_lshr_b32", VOP_I32_I32_I32, srl>;
> +defm V_ASHR_I32 : VOP2InstSI <vop2<0x17>, "v_ashr_i32",
>    VOP_I32_I32_I32, sra
>  >;
>
>  let hasPostISelHook = 1 in {
> -defm V_LSHL_B32 : VOP2Inst <vop2<0x19>, "v_lshl_b32", VOP_I32_I32_I32, shl>;
> +defm V_LSHL_B32 : VOP2InstSI <vop2<0x19>, "v_lshl_b32", VOP_I32_I32_I32, shl>;
>  }
>
>  } // End isCommutable = 1
> diff --git a/test/CodeGen/R600/shl.ll b/test/CodeGen/R600/shl.ll
> index 75341a2..ff2f096 100644
> --- a/test/CodeGen/R600/shl.ll
> +++ b/test/CodeGen/R600/shl.ll
> @@ -1,6 +1,6 @@
>  ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
>  ;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
> -;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
> +;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=VI-CHECK %s
>
>  ;EG-CHECK: {{^}}shl_v2i32:
>  ;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> @@ -10,6 +10,10 @@
>  ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>  ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>
> +;VI-CHECK: {{^}}shl_v2i32:
> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +
>  define void @shl_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
>    %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
>    %a = load <2 x i32> addrspace(1) * %in
> @@ -31,6 +35,12 @@ define void @shl_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in
>  ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>  ;SI-CHECK: v_lshl_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>
> +;VI-CHECK: {{^}}shl_v4i32:
> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_lshlrev_b32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +
>  define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
>    %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
>    %a = load <4 x i32> addrspace(1) * %in
> @@ -55,6 +65,9 @@ define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in
>  ;SI-CHECK: {{^}}shl_i64:
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
> +;VI-CHECK: {{^}}shl_i64:
> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +
>  define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>    %b_ptr = getelementptr i64 addrspace(1)* %in, i64 1
>    %a = load i64 addrspace(1) * %in
> @@ -90,6 +103,10 @@ define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
> +;VI-CHECK: {{^}}shl_v2i64:
> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +
>  define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
>    %a = load <2 x i64> addrspace(1) * %in
> @@ -147,6 +164,12 @@ define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
> +;VI-CHECK: {{^}}shl_v4i64:
> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +
>  define void @shl_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
>    %a = load <4 x i64> addrspace(1) * %in
> diff --git a/test/CodeGen/R600/sra.ll b/test/CodeGen/R600/sra.ll
> index f062e4c..44c1101 100644
> --- a/test/CodeGen/R600/sra.ll
> +++ b/test/CodeGen/R600/sra.ll
> @@ -1,6 +1,6 @@
>  ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
>  ;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
> -;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK %s
> +;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=VI-CHECK %s
>
>
> Please remove the -CHECK parts of these. Most of the tests use only
> "SI", and using both naming conventions in different tests has proven to be
> error-prone.
>
>
>  ;EG-CHECK-LABEL: {{^}}ashr_v2i32:
>  ;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> @@ -10,6 +10,10 @@
>  ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>  ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>
> +;VI-CHECK-LABEL: {{^}}ashr_v2i32:
> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +
>  define void @ashr_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
>    %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
>    %a = load <2 x i32> addrspace(1) * %in
> @@ -31,6 +35,12 @@ define void @ashr_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %i
>  ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>  ;SI-CHECK: v_ashr_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
>
> +;VI-CHECK-LABEL: {{^}}ashr_v4i32:
> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +;VI-CHECK: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
> +
>  define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
>    %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
>    %a = load <4 x i32> addrspace(1) * %in
> @@ -45,6 +55,10 @@ define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %i
>
>  ;SI-CHECK-LABEL: {{^}}ashr_i64:
>  ;SI-CHECK: s_ashr_i64 s[{{[0-9]}}:{{[0-9]}}], s[{{[0-9]}}:{{[0-9]}}], 8
> +
> +;VI-CHECK-LABEL: {{^}}ashr_i64:
> +;VI-CHECK: s_ashr_i64 s[{{[0-9]}}:{{[0-9]}}], s[{{[0-9]}}:{{[0-9]}}], 8
> +
>  define void @ashr_i64(i64 addrspace(1)* %out, i32 %in) {
>  entry:
>    %0 = sext i32 %in to i64
> @@ -69,6 +83,10 @@ entry:
>
>  ;SI-CHECK-LABEL: {{^}}ashr_i64_2:
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +
> +;VI-CHECK-LABEL: {{^}}ashr_i64_2:
> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +
>  define void @ashr_i64_2(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>  entry:
>    %b_ptr = getelementptr i64 addrspace(1)* %in, i64 1
> @@ -109,6 +127,10 @@ entry:
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
> +;VI-CHECK-LABEL: {{^}}ashr_v2i64:
> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +
>  define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
>    %a = load <2 x i64> addrspace(1) * %in
> @@ -174,6 +196,12 @@ define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %i
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
> +;VI-CHECK-LABEL: {{^}}ashr_v4i64:
> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +
>  define void @ashr_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
>    %a = load <4 x i64> addrspace(1) * %in
> --
> 2.1.0
>
>
> 0006-R600-SI-Fix-B64-VALU-shifts-on-VI.patch
>
> From cefe07504b0534fc864d2ec2189423a8208a1501 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Tue, 27 Jan 2015 20:34:20 +0100
> Subject: [PATCH 6/8] R600/SI: Fix B64 VALU shifts on VI
>
> SI only has standard versions. VI only has REV versions.
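The moveToVALU change can be pictured with a small model. On VI, the scalar 64-bit shift is rewritten to the REV VALU form, so the operands must be swapped to keep the same result. The VOp struct and the functions are hypothetical. The evaluation assumes the shift amount is taken from the low six bits of the amount operand.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class Gen { SI, VI };

// Hypothetical VALU instruction: mnemonic plus its two source operands.
struct VOp {
  std::string Opcode;
  std::uint64_t Src0;
  std::uint64_t Src1;
};

// Lowers "Value << Amount" from s_lshl_b64: the standard form on SI, the REV
// form with swapped operands on VI.
VOp lowerShl64(std::uint64_t Value, std::uint32_t Amount, Gen G) {
  if (G == Gen::VI)
    return {"v_lshlrev_b64", Amount, Value};
  return {"v_lshl_b64", Value, Amount};
}

// Evaluates the two forms produced above (amount in S1 for v_lshl_b64,
// in S0 for v_lshlrev_b64), using the low 6 bits of the amount.
std::uint64_t evaluate(const VOp &I) {
  if (I.Opcode == "v_lshlrev_b64")
    return I.Src1 << (I.Src0 & 63);
  return I.Src0 << (I.Src1 & 63);
}
```

Both lowerings of a shift of 1 by 40 evaluate to 1 << 40; only the opcode and the operand order differ.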
> ---
>  lib/Target/R600/SIInstrInfo.cpp   | 18 ++++++++++++++++++
>  lib/Target/R600/SIInstrInfo.td    |  1 +
>  lib/Target/R600/SIInstructions.td | 14 ++++++++++++++
>  test/CodeGen/R600/rotl.i64.ll     | 28 +++++++++++++++-------------
>  test/CodeGen/R600/rotr.i64.ll     | 28 +++++++++++++++-------------
>  test/CodeGen/R600/shl.ll          | 14 +++++++-------
>  test/CodeGen/R600/sra.ll          | 14 +++++++-------
>  7 files changed, 77 insertions(+), 40 deletions(-)
>
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index 53a1d8b..bf8d589 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -2047,6 +2047,24 @@ void SIInstrInfo::moveToVALU(MachineInstr &TopInst) const {
>          swapOperands(Inst);
>        }
>        break;
> +    case AMDGPU::S_LSHL_B64:
> +      if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
> +        NewOpcode = AMDGPU::V_LSHLREV_B64;
> +        swapOperands(Inst);
> +      }
> +      break;
> +    case AMDGPU::S_ASHR_I64:
> +      if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
> +        NewOpcode = AMDGPU::V_ASHRREV_I64;
> +        swapOperands(Inst);
> +      }
> +      break;
> +    case AMDGPU::S_LSHR_B64:
> +      if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
> +        NewOpcode = AMDGPU::V_LSHRREV_B64;
> +        swapOperands(Inst);
> +      }
> +      break;
>
>      case AMDGPU::S_BFE_U64:
>      case AMDGPU::S_BFM_B64:
> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index fd0dfd3..2cd5adc 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -803,6 +803,7 @@ def VOP_I1_F64_I32 : VOPProfile <[i1, f64, i32, untyped]> {
>  }
>
>  def VOP_I64_I64_I32 : VOPProfile <[i64, i64, i32, untyped]>;
> +def VOP_I64_I32_I64 : VOPProfile <[i64, i32, i64, untyped]>;
>  def VOP_I64_I64_I64 : VOPProfile <[i64, i64, i64, untyped]>;
>
>  def VOP_F32_F32_F32_F32 : VOPProfile <[f32, f32, f32, f32]>;
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index e62306d..19710a3 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1803,6 +1803,20 @@ defm V_MULLIT_F32 : VOP3Inst <vop3<0x150>, "v_mullit_f32",
>
>  } // End SubtargetPredicate = isSICI
>
> +let SubtargetPredicate = isVI in {
> +
> +defm V_LSHLREV_B64 : VOP3Inst <vop3<0, 0x28f>, "v_lshlrev_b64",
> +  VOP_I64_I32_I64
> +>;
> +defm V_LSHRREV_B64 : VOP3Inst <vop3<0, 0x290>, "v_lshrrev_b64",
> +  VOP_I64_I32_I64
> +>;
> +defm V_ASHRREV_I64 : VOP3Inst <vop3<0, 0x291>, "v_ashrrev_i64",
> +  VOP_I64_I32_I64
> +>;
> +
> +} // End SubtargetPredicate = isVI
> +
>  //===----------------------------------------------------------------------===//
>  // Pseudo Instructions
>  //===----------------------------------------------------------------------===//
> diff --git a/test/CodeGen/R600/rotl.i64.ll b/test/CodeGen/R600/rotl.i64.ll
> index f094ece..6da17a4 100644
> --- a/test/CodeGen/R600/rotl.i64.ll
> +++ b/test/CodeGen/R600/rotl.i64.ll
> @@ -1,12 +1,12 @@
> -; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> -; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> +; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=BOTH %s
> +; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=BOTH %s
>
> -; FUNC-LABEL: {{^}}s_rotl_i64:
> -; SI-DAG: s_lshl_b64
> -; SI-DAG: s_sub_i32
> -; SI-DAG: s_lshr_b64
> -; SI: s_or_b64
> -; SI: s_endpgm
> +; BOTH-LABEL: {{^}}s_rotl_i64:
> +; BOTH-DAG: s_lshl_b64
> +; BOTH-DAG: s_sub_i32
> +; BOTH-DAG: s_lshr_b64
> +; BOTH: s_or_b64
> +; BOTH: s_endpgm
>  define void @s_rotl_i64(i64 addrspace(1)* %in, i64 %x, i64 %y) {
>  entry:
>    %0 = shl i64 %x, %y
> @@ -17,13 +17,15 @@ entry:
>    ret void
>  }
>
> -; FUNC-LABEL: {{^}}v_rotl_i64:
> +; BOTH-LABEL: {{^}}v_rotl_i64:
>  ; SI-DAG: v_lshl_b64
> -; SI-DAG: v_sub_i32
> +; VI-DAG: v_lshlrev_b64
> +; BOTH-DAG: v_sub_i32
>  ; SI: v_lshr_b64
> -; SI: v_or_b32
> -; SI: v_or_b32
> -; SI: s_endpgm
> +; VI: v_lshrrev_b64
> +; BOTH: v_or_b32
> +; BOTH: v_or_b32
> +; BOTH: s_endpgm
>  define void @v_rotl_i64(i64 addrspace(1)* %in, i64 addrspace(1)* %xptr, i64 addrspace(1)* %yptr) {
>  entry:
>    %x = load i64 addrspace(1)* %xptr, align 8
> diff --git a/test/CodeGen/R600/rotr.i64.ll b/test/CodeGen/R600/rotr.i64.ll
> index a637f71..f1d1d26 100644
> --- a/test/CodeGen/R600/rotr.i64.ll
> +++ b/test/CodeGen/R600/rotr.i64.ll
> @@ -1,11 +1,11 @@
> -; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> -; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
> +; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=BOTH %s
> +; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=BOTH %s
>
> -; FUNC-LABEL: {{^}}s_rotr_i64:
> -; SI-DAG: s_sub_i32
> -; SI-DAG: s_lshr_b64
> -; SI-DAG: s_lshl_b64
> -; SI: s_or_b64
> +; BOTH-LABEL: {{^}}s_rotr_i64:
> +; BOTH-DAG: s_sub_i32
> +; BOTH-DAG: s_lshr_b64
> +; BOTH-DAG: s_lshl_b64
> +; BOTH: s_or_b64
>  define void @s_rotr_i64(i64 addrspace(1)* %in, i64 %x, i64 %y) {
>  entry:
>    %tmp0 = sub i64 64, %y
> @@ -16,12 +16,14 @@ entry:
>    ret void
>  }
>
> -; FUNC-LABEL: {{^}}v_rotr_i64:
> -; SI-DAG: v_sub_i32
> +; BOTH-LABEL: {{^}}v_rotr_i64:
> +; BOTH-DAG: v_sub_i32
>  ; SI-DAG: v_lshr_b64
>  ; SI-DAG: v_lshl_b64
> -; SI: v_or_b32
> -; SI: v_or_b32
> +; VI-DAG: v_lshrrev_b64
> +; VI-DAG: v_lshlrev_b64
> +; BOTH: v_or_b32
> +; BOTH: v_or_b32
>  define void @v_rotr_i64(i64 addrspace(1)* %in, i64 addrspace(1)* %xptr, i64 addrspace(1)* %yptr) {
>  entry:
>    %x = load i64 addrspace(1)* %xptr, align 8
> @@ -34,7 +36,7 @@ entry:
>    ret void
>  }
>
> -; FUNC-LABEL: {{^}}s_rotr_v2i64:
> +; BOTH-LABEL: {{^}}s_rotr_v2i64:
>  define void @s_rotr_v2i64(<2 x i64> addrspace(1)* %in, <2 x i64> %x, <2 x i64> %y) {
>  entry:
>    %tmp0 = sub <2 x i64> <i64 64, i64 64>, %y
> @@ -45,7 +47,7 @@ entry:
>    ret void
>  }
>
> -; FUNC-LABEL: {{^}}v_rotr_v2i64:
> +; BOTH-LABEL: {{^}}v_rotr_v2i64:
>  define void @v_rotr_v2i64(<2 x i64> addrspace(1)* %in, <2 x i64> addrspace(1)* %xptr, <2 x i64> addrspace(1)* %yptr) {
>  entry:
>    %x = load <2 x i64> addrspace(1)* %xptr, align 8
> diff --git a/test/CodeGen/R600/shl.ll b/test/CodeGen/R600/shl.ll
> index ff2f096..c6a18bf 100644
> --- a/test/CodeGen/R600/shl.ll
> +++ b/test/CodeGen/R600/shl.ll
> @@ -66,7 +66,7 @@ define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
>  ;VI-CHECK: {{^}}shl_i64:
> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>
>  define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>    %b_ptr = getelementptr i64 addrspace(1)* %in, i64 1
> @@ -104,8 +104,8 @@ define void @shl_i64(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
>  ;VI-CHECK: {{^}}shl_v2i64:
> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>
>  define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
> @@ -165,10 +165,10 @@ define void @shl_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in
>  ;SI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
>  ;VI-CHECK: {{^}}shl_v4i64:
> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_lshl_b64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_lshlrev_b64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>
>  define void @shl_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
> diff --git a/test/CodeGen/R600/sra.ll b/test/CodeGen/R600/sra.ll
> index 44c1101..7b461ca 100644
> --- a/test/CodeGen/R600/sra.ll
> +++ b/test/CodeGen/R600/sra.ll
> @@ -85,7 +85,7 @@ entry:
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
>  ;VI-CHECK-LABEL: {{^}}ashr_i64_2:
> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>
>  define void @ashr_i64_2(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
>  entry:
> @@ -128,8 +128,8 @@ entry:
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
>  ;VI-CHECK-LABEL: {{^}}ashr_v2i64:
> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>
>  define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <2 x i64> addrspace(1)* %in, i64 1
> @@ -197,10 +197,10 @@ define void @ashr_v2i64(<2 x i64> addrspace(1)* %out, <2 x i64> addrspace(1)* %i
>  ;SI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
>
>  ;VI-CHECK-LABEL: {{^}}ashr_v4i64:
> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> -;VI-CHECK: v_ashr_i64 {{v\[[0-9]+:[0-9]+\], v\[[0-9]+:[0-9]+\], v[0-9]+}}
> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
> +;VI-CHECK: v_ashrrev_i64 {{v\[[0-9]+:[0-9]+\], v[0-9]+, v\[[0-9]+:[0-9]+\]}}
>
>  define void @ashr_v4i64(<4 x i64> addrspace(1)* %out, <4 x i64> addrspace(1)* %in) {
>    %b_ptr = getelementptr <4 x i64> addrspace(1)* %in, i64 1
> --
> 2.1.0
>
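Patch 6 hinges on the operand order of the REV shifts. Per its commit message, SI provides only the standard 64-bit VALU shifts and VI provides only the REV variants, whose two sources are reversed: shift amount first, 64-bit value second. This matches the new VOP_I64_I32_I64 profile (i32 source before the i64 source) and the updated VI-CHECK lines, and is presumably why moveToVALU() calls swapOperands() before switching to V_LSHLREV_B64, V_LSHRREV_B64 or V_ASHRREV_I64. The C++ below is a hypothetical model for illustration only, not LLVM code: the function names are invented, and it assumes the hardware uses only the low 6 bits of the shift amount, which the patch does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the 64-bit VALU shifts (illustration, not LLVM code).
// Assumption: only the low 6 bits of the shift amount are used.

// Standard (SI) order: value first, amount second,
// as in "v_lshl_b64 v[0:1], v[2:3], v4".
uint64_t lshl_b64(uint64_t value, uint32_t amount) {
  return value << (amount & 63);
}

// REV (VI) order: amount first, value second,
// as in "v_lshlrev_b64 v[0:1], v4, v[2:3]".
uint64_t lshlrev_b64(uint32_t amount, uint64_t value) {
  return value << (amount & 63);
}

uint64_t lshrrev_b64(uint32_t amount, uint64_t value) {
  return value >> (amount & 63);  // logical shift: value is unsigned
}

int64_t ashrrev_i64(uint32_t amount, int64_t value) {
  // Right-shifting a negative signed value is arithmetic in C++20 and on
  // mainstream compilers before that (implementation-defined pre-C++20).
  return value >> (amount & 63);
}

// Turning an SI-order shift into a REV shift only swaps the two sources,
// which is the idea behind calling swapOperands() before the opcode change.
uint64_t lshlViaRev(uint64_t value, uint32_t amount) {
  return lshlrev_b64(amount, value);
}
```

In this model swapping the sources is the only difference between the two forms, so the computed result is the same on both generations.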
>
> 0007-R600-SI-Rewrite-VOP1InstSI-to-contain-a-pseudo-and-_.patch
>
> From 4457a80bbb0972a530a1294179347b6e99bfa21c Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Tue, 27 Jan 2015 16:28:47 +0100
> Subject: [PATCH 7/8] R600/SI: Rewrite VOP1InstSI to contain a pseudo and _si
>  opcode
>
> With this change, if these instructions are accidentally selected on VI,
> code generation will fail because no pseudo -> _vi mapping is defined.
>
> The idea is to make such bugs easy to catch in the future.
> ---
>  lib/Target/R600/SIInstrInfo.td | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index 2cd5adc..c16b84b 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -843,6 +843,15 @@ multiclass VOP1_m <vop1 op, dag outs, dag ins, string asm, list<dag> pattern,
>              SIMCInstr <opName#"_e32", SISubtarget.VI>;
>  }
>
> +multiclass VOP1SI_m <vop1 op, dag outs, dag ins, string asm, list<dag> pattern,
> +                   string opName> {
> +  def "" : VOP1_Pseudo <outs, ins, pattern, opName>;
> +
> +  def _si : VOP1<op.SI, outs, ins, asm, []>,
> +            SIMCInstr <opName#"_e32", SISubtarget.SI>;
> +  // No VI instruction. This class is for SI only.
> +}
> +
>  class VOP2_Pseudo <dag outs, dag ins, list<dag> pattern, string opName> :
>    VOP2Common <outs, ins, "", pattern>,
>    VOP <opName>,
> @@ -939,6 +948,16 @@ multiclass VOP3_1_m <vop op, dag outs, dag ins, string asm,
>              VOP3DisableFields<0, 0, HasMods>;
>  }
>
> +multiclass VOP3SI_1_m <vop op, dag outs, dag ins, string asm,
> +                     list<dag> pattern, string opName, bit HasMods = 1> {
> +
> +  def "" : VOP3_Pseudo <outs, ins, pattern, opName>;
> +
> +  def _si : VOP3_Real_si <op.SI3, outs, ins, asm, opName>,
> +            VOP3DisableFields<0, 0, HasMods>;
> +  // No VI instruction. This class is for SI only.
> +}
> +
>  multiclass VOP3_2_m <vop op, dag outs, dag ins, string asm,
>                       list<dag> pattern, string opName, string revOp,
>                       bit HasMods = 1, bit UseFullOp = 0> {
> @@ -1046,17 +1065,14 @@ multiclass VOP1Inst <vop1 op, string opName, VOPProfile P,
>  multiclass VOP1InstSI <vop1 op, string opName, VOPProfile P,
>                         SDPatternOperator node = null_frag> {
>
> -  def _e32 : VOP1 <op.SI, P.Outs, P.Ins32, opName#P.Asm32, []>,
> -             VOP <opName>;
> +  defm _e32 : VOP1SI_m <op, P.Outs, P.Ins32, opName#P.Asm32, [], opName>;
>
> -  def _e64 : VOP3Common <P.Outs, P.Ins64, opName#P.Asm64,
> +  defm _e64 : VOP3SI_1_m <op, P.Outs, P.Ins64, opName#P.Asm64,
>      !if(P.HasModifiers,
>        [(set P.DstVT:$dst, (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0,
>                                  i32:$src0_modifiers, i1:$clamp, i32:$omod))))],
> -      [(set P.DstVT:$dst, (node P.Src0VT:$src0))])>,
> -            VOP <opName>,
> -            VOP3e <op.SI3>,
> -            VOP3DisableFields<0, 0, P.HasModifiers>;
> +      [(set P.DstVT:$dst, (node P.Src0VT:$src0))]),
> +    opName, P.HasModifiers>;
>  }
>
>  multiclass VOP2_Helper <vop2 op, string opName, dag outs,
> --
> 2.1.0
>
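Patch 7's reasoning depends on how a pseudo instruction is lowered: each subtarget needs its own real encoding (_si or _vi), tied to the pseudo through the SIMCInstr records, and the new VOP1SI_m and VOP3SI_1_m multiclasses define only the _si one. The C++ below is a hypothetical stand-in for that lookup, not LLVM's actual implementation; the table, the V_EXAMPLE_* names and the exception type are invented for illustration. It shows the intended failure mode: an SI-only pseudo reaching the VI path finds no entry and is rejected instead of being silently encoded.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

enum class Subtarget { SI, VI };

using PseudoKey = std::pair<std::string, Subtarget>;

// Hypothetical pseudo -> real-encoding table; all names are illustrative.
// An SI-only definition contributes an _si entry but no _vi entry.
const std::map<PseudoKey, std::string>& realOpcodes() {
  static const std::map<PseudoKey, std::string> table = {
      {{"V_EXAMPLE_e32", Subtarget::SI}, "V_EXAMPLE_e32_si"},
      {{"V_EXAMPLE_e32", Subtarget::VI}, "V_EXAMPLE_e32_vi"},
      {{"V_EXAMPLE_SI_ONLY_e32", Subtarget::SI}, "V_EXAMPLE_SI_ONLY_e32_si"},
  };
  return table;
}

// Lowers a pseudo to its real encoding and fails loudly when the
// subtarget has no mapping, the behaviour patch 7 aims to trigger.
std::string lowerPseudo(const std::string& pseudo, Subtarget st) {
  const auto& table = realOpcodes();
  auto it = table.find({pseudo, st});
  if (it == table.end())
    throw std::runtime_error("no real encoding for " + pseudo +
                             " on this subtarget");
  return it->second;
}
```

Failing at lowering time, rather than emitting an SI encoding on VI, is what makes an accidental selection visible immediately.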
>
> 0008-R600-SI-Remove-useless-patterns-in-VALU-which-are-al.patch
>
> From 86cdd84c7a4ba10d09d8186cf80a881521681c7e Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
> Date: Tue, 27 Jan 2015 18:57:55 +0100
> Subject: [PATCH 8/8] R600/SI: Remove useless patterns in VALU which are
>  already covered by SALU
>
> Also remove hasPostISelHook=1 from V_LSHL_B32. It's defined by InstSI
> already.
> ---
>  lib/Target/R600/SIInstructions.td | 61 ++++++++++-----------------------------
>  1 file changed, 16 insertions(+), 45 deletions(-)
>
>
> LGTM
>




More information about the llvm-commits mailing list