<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 04/23/2015 12:58 PM, Tom Stellard

      wrote:<br>

    </div>

    <blockquote cite="mid:20150423195807.GD8373@yyz" type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">Hi,

The attached patches reduce overall shader code size by preferring

v_mac_f32 over v_mad_f32 and also using the 32-bit encoding for

v_cndmask when src2 is vcc.

-Tom

</pre>

      </div>

    </blockquote>

    <br>

    <blockquote cite="mid:20150423195807.GD8373@yyz" type="cite">


      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0002-R600-SI-Fix-crash-on-physical-registers-in-SIInstrIn.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From 762f0757d4f237c474026c949e66a0f36d0d4ae7 Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com">&lt;thomas.stellard@amd.com&gt;</a>

Date: Mon, 20 Apr 2015 18:16:23 +0000

Subject: [PATCH 2/5] R600/SI: Fix crash on physical registers in

 SIInstrInfo::isOperandLegal()

No test case for this.  I ran into it while working on some improvements

to SIShrinkInstructions.cpp.

---

 lib/Target/R600/SIInstrInfo.cpp | 5 ++++-

 1 file changed, 4 insertions(+), 1 deletion(-)

</pre>

      </div>

    </blockquote>

    LGTM<br>

    <br>

    <blockquote cite="mid:20150423195807.GD8373@yyz" type="cite">


      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0003-R600-SI-The-SIShrinkInstructions-pass-should-only-fo.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From 426618d94a397e42d2c966fe192cc0642983d7f1 Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com">&lt;thomas.stellard@amd.com&gt;</a>

Date: Tue, 21 Apr 2015 20:31:37 +0000

Subject: [PATCH 3/5] R600/SI: The SIShrinkInstructions pass should only fold

 immediates with one use

This is covered by existing testcases and will be exposed by a future

commit.

---

 lib/Target/R600/SIShrinkInstructions.cpp | 2 +-

 1 file changed, 1 insertion(+), 1 deletion(-)

</pre>

      </div>

    </blockquote>

    LGTM<br>

    <br>

    <blockquote cite="mid:20150423195807.GD8373@yyz" type="cite">


      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0004-R600-SI-Select-mad-patterns-to-v_mac_f32.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From 5ca6edd9e5a1e79608720601088442be611bb8ab Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com">&lt;thomas.stellard@amd.com&gt;</a>

Date: Mon, 20 Apr 2015 18:18:54 +0000

Subject: [PATCH 4/5] R600/SI: Select mad patterns to v_mac_f32

The two-address instruction pass will convert these back to v_mad_f32

if necessary.

shader-db stats:

979 shaders

Totals:

SGPRS: 34792 -> 35048 (0.74 %)

VGPRS: 20740 -> 20560 (-0.87 %)

Code Size: 747712 -> 657436 (-12.07 %) bytes

LDS: 11 -> 11 (0.00 %) blocks

Scratch: 12288 -> 18432 (50.00 %) bytes per wave

Totals from affected shaders:

SGPRS: 31272 -> 31488 (0.69 %)

VGPRS: 18788 -> 18608 (-0.96 %)

Code Size: 728328 -> 638092 (-12.39 %) bytes

LDS: 11 -> 11 (0.00 %) blocks

Scratch: 12288 -> 18432 (50.00 %) bytes per wave

Increases:

SGPRS: 36 (0.04 %)

VGPRS: 13 (0.01 %)

Code Size: 0 (0.00 %)

LDS: 0 (0.00 %)

Scratch: 1 (0.00 %)

Decreases:

SGPRS: 12 (0.01 %)

VGPRS: 48 (0.05 %)

Code Size: 779 (0.80 %)

LDS: 0 (0.00 %)

Scratch: 0 (0.00 %)

---

 lib/Target/R600/AMDGPUISelDAGToDAG.cpp   |  19 ++++

 lib/Target/R600/SIFoldOperands.cpp       |  31 +++++++

 lib/Target/R600/SIInstrInfo.cpp          |  56 ++++++++++-

 lib/Target/R600/SIInstrInfo.h            |   4 +

 lib/Target/R600/SIInstrInfo.td           |   9 ++

 lib/Target/R600/SIInstructions.td        |  14 ++-

 lib/Target/R600/SIShrinkInstructions.cpp |  16 +++-

 test/CodeGen/R600/fmuladd.ll             |  30 +++---

 test/CodeGen/R600/llvm.amdgpu.lrp.ll     |   2 +-

 test/CodeGen/R600/mad-combine.ll         |  25 +++--

 test/CodeGen/R600/mad-sub.ll             |   6 +-

 test/CodeGen/R600/madak.ll               |  12 +--

 test/CodeGen/R600/madmk.ll               |  10 +-

 test/CodeGen/R600/v_mac.ll               | 155 +++++++++++++++++++++++++++++++

 14 files changed, 341 insertions(+), 48 deletions(-)

 create mode 100644 test/CodeGen/R600/v_mac.ll

diff --git a/lib/Target/R600/AMDGPUISelDAGToDAG.cpp b/lib/Target/R600/AMDGPUISelDAGToDAG.cpp

index def252a..85cdf62 100644

--- a/lib/Target/R600/AMDGPUISelDAGToDAG.cpp

+++ b/lib/Target/R600/AMDGPUISelDAGToDAG.cpp

@@ -109,8 +109,11 @@ private:

                          SDValue &Offset, SDValue &GLC) const;

   SDNode *SelectAddrSpaceCast(SDNode *N);

   bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const;

+  bool SelectVOP3NoMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;

   bool SelectVOP3Mods0(SDValue In, SDValue &Src, SDValue &SrcMods,

                        SDValue &Clamp, SDValue &Omod) const;

+  bool SelectVOP3NoMods0(SDValue In, SDValue &Src, SDValue &SrcMods,

+                         SDValue &Clamp, SDValue &Omod) const;

   bool SelectVOP3Mods0Clamp(SDValue In, SDValue &Src, SDValue &SrcMods,

                             SDValue &Omod) const;

@@ -1264,6 +1267,12 @@ bool AMDGPUDAGToDAGISel::SelectVOP3Mods(SDValue In, SDValue &Src,

   return true;

 }

+bool AMDGPUDAGToDAGISel::SelectVOP3NoMods(SDValue In, SDValue &Src,

+                                         SDValue &SrcMods) const {

+  bool Res = SelectVOP3Mods(In, Src, SrcMods);

+  return Res && cast<ConstantSDNode>(SrcMods)->isNullValue();

+}

+

 bool AMDGPUDAGToDAGISel::SelectVOP3Mods0(SDValue In, SDValue &Src,

                                          SDValue &SrcMods, SDValue &Clamp,

                                          SDValue &Omod) const {

@@ -1274,6 +1283,16 @@ bool AMDGPUDAGToDAGISel::SelectVOP3Mods0(SDValue In, SDValue &Src,

   return SelectVOP3Mods(In, Src, SrcMods);

 }

+bool AMDGPUDAGToDAGISel::SelectVOP3NoMods0(SDValue In, SDValue &Src,

+                                           SDValue &SrcMods, SDValue &Clamp,

+                                           SDValue &Omod) const {

+  bool Res = SelectVOP3Mods0(In, Src, SrcMods, Clamp, Omod);

+

+  return Res && cast<ConstantSDNode>(SrcMods)->isNullValue() &&

+                cast<ConstantSDNode>(Clamp)->isNullValue() &&

+                cast<ConstantSDNode>(Omod)->isNullValue();

+}

+

 bool AMDGPUDAGToDAGISel::SelectVOP3Mods0Clamp(SDValue In, SDValue &Src,

                                               SDValue &SrcMods,

                                               SDValue &Omod) const {

diff --git a/lib/Target/R600/SIFoldOperands.cpp b/lib/Target/R600/SIFoldOperands.cpp

index 7ba5a6d..c4de645 100644

--- a/lib/Target/R600/SIFoldOperands.cpp

+++ b/lib/Target/R600/SIFoldOperands.cpp

@@ -126,11 +126,42 @@ static bool updateOperand(FoldCandidate &Fold,

   return false;

 }

+static bool isUseMIInFoldList(const std::vector<FoldCandidate> &FoldList,

+                              const MachineInstr *MI) {

+  for (const auto &Candidate : FoldList) {

+    if (Candidate.UseMI == MI)

+      return true;

+  }

+  return false;

+}

+

 static bool tryAddToFoldList(std::vector<FoldCandidate> &FoldList,

                              MachineInstr *MI, unsigned OpNo,

                              MachineOperand *OpToFold,

                              const SIInstrInfo *TII) {

   if (!TII->isOperandLegal(MI, OpNo, OpToFold)) {

+

+    // Special case for v_mac_f32_e64 if we are trying to fold into src2

+    unsigned Opc = MI->getOpcode();

+    if (Opc == AMDGPU::V_MAC_F32_e64 &&

+        (int)OpNo == AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2)) {

+      // Check if changing this to a v_mad_f32 instruction will allow us to

+      // fold the operand.

+      MI->setDesc(TII->get(AMDGPU::V_MAD_F32));

+      bool FoldAsMAD = tryAddToFoldList(FoldList, MI, OpNo, OpToFold, TII);

+      if (FoldAsMAD) {

+        MI->untieRegOperand(OpNo);

+        return true;

+      }

+      MI->setDesc(TII->get(Opc));

+    }

+

+    // If we are already folding into another operand of MI, then

+    // we can't commute the instruction, otherwise we risk making the

+    // other fold illegal.

+    if (isUseMIInFoldList(FoldList, MI))

+      return false;

+

     // Operand is not legal, so try to commute the instruction to

     // see if this makes it possible to fold.

     unsigned CommuteIdx0;

diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp

index 931e984..223f0bf 100644

--- a/lib/Target/R600/SIInstrInfo.cpp

+++ b/lib/Target/R600/SIInstrInfo.cpp

@@ -906,7 +906,7 @@ bool SIInstrInfo::FoldImmediate(MachineInstr *UseMI, MachineInstr *DefMI,

     return false;

   unsigned Opc = UseMI->getOpcode();

-  if (Opc == AMDGPU::V_MAD_F32) {

+  if (Opc == AMDGPU::V_MAD_F32 || Opc == AMDGPU::V_MAC_F32_e64) {

     // Don't fold if we are using source modifiers. The new VOP2 instructions

     // don't have them.

     if (hasModifiersSet(*UseMI, AMDGPU::OpName::src0_modifiers) ||

@@ -945,9 +945,9 @@ bool SIInstrInfo::FoldImmediate(MachineInstr *UseMI, MachineInstr *DefMI,

       // instead of having to modify in place.

       // Remove these first since they are at the end.

-      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(AMDGPU::V_MAD_F32,

+      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(Opc,

                                                       AMDGPU::OpName::omod));

-      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(AMDGPU::V_MAD_F32,

+      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(Opc,

                                                       AMDGPU::OpName::clamp));

       unsigned Src1Reg = Src1->getReg();

@@ -959,6 +959,14 @@ bool SIInstrInfo::FoldImmediate(MachineInstr *UseMI, MachineInstr *DefMI,

       Src1->setReg(Src2Reg);

       Src1->setSubReg(Src2SubReg);

+      if (Opc == AMDGPU::V_MAC_F32_e64) {

+        UseMI->untieRegOperand(

+          AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2));

+      }

+

+      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(Opc,

+                                                      AMDGPU::OpName::src2));

+      // ChangeToImmediate adds Src2 back to the instruction.

       Src2->ChangeToImmediate(Imm);

       removeModOperands(*UseMI);

@@ -989,11 +997,17 @@ bool SIInstrInfo::FoldImmediate(MachineInstr *UseMI, MachineInstr *DefMI,

       // instead of having to modify in place.

       // Remove these first since they are at the end.

-      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(AMDGPU::V_MAD_F32,

+      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(Opc,

                                                       AMDGPU::OpName::omod));

-      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(AMDGPU::V_MAD_F32,

+      UseMI->RemoveOperand(AMDGPU::getNamedOperandIdx(Opc,

                                                       AMDGPU::OpName::clamp));

+      if (Opc == AMDGPU::V_MAC_F32_e64) {

+        UseMI->untieRegOperand(

+          AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2));

+      }

+

+      // ChangeToImmediate adds Src2 back to the instruction.

       Src2->ChangeToImmediate(Imm);

       // These come before src2.

@@ -1105,6 +1119,38 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(MachineInstr *MIa,

   return false;

 }

+MachineInstr *SIInstrInfo::convertToThreeAddress(MachineFunction::iterator &MBB,

+                                                MachineBasicBlock::iterator &MI,

+                                                LiveVariables *LV) const {

+

+  switch (MI->getOpcode()) {

+    default: return nullptr;

+    case AMDGPU::V_MAC_F32_e64: break;

+    case AMDGPU::V_MAC_F32_e32: {

+      const MachineOperand *Src0 = getNamedOperand(*MI, AMDGPU::OpName::src0);

+      if (Src0->isImm() && !isInlineConstant(*Src0, 4))

+        return nullptr;

+      break;

+    }

+  }

+

+  const MachineOperand *Dst = getNamedOperand(*MI, AMDGPU::OpName::dst);

+  const MachineOperand *Src0 = getNamedOperand(*MI, AMDGPU::OpName::src0);

+  const MachineOperand *Src1 = getNamedOperand(*MI, AMDGPU::OpName::src1);

+  const MachineOperand *Src2 = getNamedOperand(*MI, AMDGPU::OpName::src2);

+

+  return BuildMI(*MBB, MI, MI->getDebugLoc(), get(AMDGPU::V_MAD_F32))

+                 .addOperand(*Dst)

+                 .addImm(0) // Src0 mods

+                 .addOperand(*Src0)

+                 .addImm(0) // Src1 mods

+                 .addOperand(*Src1)

+                 .addImm(0) // Src2 mods

+                 .addOperand(*Src2)

+                 .addImm(0)  // clamp

+                 .addImm(0); // omod

+}

+

 bool SIInstrInfo::isInlineConstant(const APInt &Imm) const {

   int64_t SVal = Imm.getSExtValue();

   if (SVal >= -16 && SVal <= 64)

diff --git a/lib/Target/R600/SIInstrInfo.h b/lib/Target/R600/SIInstrInfo.h

index a9aa99f..45a1dec 100644

--- a/lib/Target/R600/SIInstrInfo.h

+++ b/lib/Target/R600/SIInstrInfo.h

@@ -139,6 +139,10 @@ public:

   bool FoldImmediate(MachineInstr *UseMI, MachineInstr *DefMI,

                      unsigned Reg, MachineRegisterInfo *MRI) const final;

+  MachineInstr *convertToThreeAddress(MachineFunction::iterator &MBB,

+                                      MachineBasicBlock::iterator &MI,

+                                      LiveVariables *LV) const override;

+

   bool isSALU(uint16_t Opcode) const {

     return get(Opcode).TSFlags & SIInstrFlags::SALU;

   }

diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td

index 076a0ce..6310e1f 100644

--- a/lib/Target/R600/SIInstrInfo.td

+++ b/lib/Target/R600/SIInstrInfo.td

@@ -404,9 +404,11 @@ def MUBUFOffset : ComplexPattern<i64, 6, "SelectMUBUFOffset">;

 def MUBUFOffsetAtomic : ComplexPattern<i64, 4, "SelectMUBUFOffset">;

 def VOP3Mods0 : ComplexPattern<untyped, 4, "SelectVOP3Mods0">;

+def VOP3NoMods0 : ComplexPattern<untyped, 4, "SelectVOP3NoMods0">;

 def VOP3Mods0Clamp : ComplexPattern<untyped, 3, "SelectVOP3Mods0Clamp">;

 def VOP3Mods0Clamp0OMod : ComplexPattern<untyped, 4, "SelectVOP3Mods0Clamp0OMod">;

 def VOP3Mods  : ComplexPattern<untyped, 2, "SelectVOP3Mods">;

+def VOP3NoMods : ComplexPattern<untyped, 2, "SelectVOP3NoMods">;

 //===----------------------------------------------------------------------===//

 // SI assembler operands

@@ -978,6 +980,13 @@ def VOP_MADK : VOPProfile <[f32, f32, f32, f32]> {

   field dag Ins = (ins VCSrc_32:$src0, VGPR_32:$vsrc1, u32imm:$src2);

   field string Asm = "$dst, $src0, $vsrc1, $src2";

 }

+def VOP_MAC : VOPProfile <[f32, f32, f32, f32]> {

+  let Ins32 = (ins Src0RC32:$src0, Src1RC32:$src1, VGPR_32:$src2);

+  let Ins64 = getIns64<Src0RC64, Src1RC64, RegisterOperand<VGPR_32>, 3,

+                             HasModifiers>.ret;

+  let Asm32 = getAsm32<2>.ret;

+  let Asm64 = getAsm64<2, HasModifiers>.ret;

+}

 def VOP_F64_F64_F64_F64 : VOPProfile <[f64, f64, f64, f64]>;

 def VOP_I32_I32_I32_I32 : VOPProfile <[i32, i32, i32, i32]>;

 def VOP_I64_I32_I32_I64 : VOPProfile <[i64, i32, i32, i64]>;

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td

index 91e8c8c..9ddd3e7 100644

--- a/lib/Target/R600/SIInstructions.td

+++ b/lib/Target/R600/SIInstructions.td

@@ -1539,7 +1539,10 @@ defm V_AND_B32 : VOP2Inst <vop2<0x1b, 0x13>, "v_and_b32", VOP_I32_I32_I32>;

 defm V_OR_B32 : VOP2Inst <vop2<0x1c, 0x14>, "v_or_b32", VOP_I32_I32_I32>;

 defm V_XOR_B32 : VOP2Inst <vop2<0x1d, 0x15>, "v_xor_b32", VOP_I32_I32_I32>;

-defm V_MAC_F32 : VOP2Inst <vop2<0x1f, 0x16>, "v_mac_f32", VOP_F32_F32_F32>;

+let Constraints = "$dst = $src2", DisableEncoding="$src2",

+    isConvertibleToThreeAddress = 1 in {

+defm V_MAC_F32 : VOP2Inst <vop2<0x1f, 0x16>, "v_mac_f32", VOP_MAC>;

+}

 } // End isCommutable = 1

 defm V_MADMK_F32 : VOP2MADK <vop2<0x20, 0x17>, "v_madmk_f32">;

@@ -2251,6 +2254,15 @@ def : Pat <

   (V_CNDMASK_B32_e64 $src2, $src1, $src0)

 >;

+// Pattern for V_MAC_F32

+def : Pat <

+  (fmad  (VOP3NoMods0 f32:$src0, i32:$src0_modifiers, i1:$clamp, i32:$omod),

+         (VOP3NoMods f32:$src1, i32:$src1_modifiers),

+         (VOP3NoMods f32:$src2, i32:$src2_modifiers)),

+  (V_MAC_F32_e64 $src0_modifiers, $src0, $src1_modifiers, $src1,

+                 $src2_modifiers, $src2, $clamp, $omod)

+>;</pre>

      </div>

    </blockquote>

    <br>

    If there are modifiers, I assume this will still select v_mad_f32?<br>

    <br>

    Is there any reason to use v_mac_f32_e64 if the modifiers are never

    going to be used?<br>

    <br>

    It probably isn't that helpful, but using the modifiers with

    v_mac_f32 might slightly reduce register pressure in some cases

    compared with v_mad_f32 plus modifiers.<br>
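    <br>
    To make the question concrete, here is a minimal, self-contained
    sketch of how I read the selection rule. It is not LLVM code:
    FMadOperands, Selected, matchesNoMods, matchesNoMods0 and selectFMad
    are hypothetical names, and the fallback to v_mad_f32 when the
    v_mac_f32 pattern fails is my assumption, not something the patch
    states.<br>
    <pre wrap="">
```cpp
#include <cassert>

// Hypothetical stand-in for the operands of an fmad node after the VOP3
// modifier selectors have run. This is not an LLVM type.
struct FMadOperands {
  unsigned Src0Mods; // source modifier bits for src0, 0 means none
  unsigned Src1Mods; // source modifier bits for src1, 0 means none
  unsigned Src2Mods; // source modifier bits for src2, 0 means none
  unsigned Clamp;    // 0 means no clamp
  unsigned Omod;     // 0 means no output modifier
};

enum class Selected { V_MAC_F32_e64, V_MAD_F32 };

// Models SelectVOP3NoMods from the patch: match only when the source
// modifiers are zero.
bool matchesNoMods(unsigned SrcMods) { return SrcMods == 0; }

// Models SelectVOP3NoMods0 from the patch: additionally require clamp and
// omod to be zero.
bool matchesNoMods0(unsigned SrcMods, unsigned Clamp, unsigned Omod) {
  return SrcMods == 0 && Clamp == 0 && Omod == 0;
}

// Assumption: when the v_mac_f32 pattern does not match, selection falls
// through to the existing v_mad_f32 pattern.
Selected selectFMad(const FMadOperands &Ops) {
  bool IsMac = matchesNoMods0(Ops.Src0Mods, Ops.Clamp, Ops.Omod) &&
               matchesNoMods(Ops.Src1Mods) && matchesNoMods(Ops.Src2Mods);
  return IsMac ? Selected::V_MAC_F32_e64 : Selected::V_MAD_F32;
}
```
</pre>
    Under that reading, any non-zero modifier, clamp or omod value keeps
    the node on v_mad_f32, and v_mac_f32_e64 is only ever created with
    all of those fields zero.<br>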

    <br>

    <blockquote cite="mid:20150423195807.GD8373@yyz" type="cite">

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0005-R600-SI-Add-support-for-shrinking-v_cndmask_b32_e32-.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From 1ab9290ffa4835062bd563496fa71ca27b7ed8cd Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com">&lt;thomas.stellard@amd.com&gt;</a>

Date: Tue, 21 Apr 2015 23:24:40 +0000

Subject: [PATCH 5/5] R600/SI: Add support for shrinking v_cndmask_b32_e32

 instructions

shader-db stats:

979 shaders

Totals:

SGPRS: 35048 -> 35176 (0.37 %)

VGPRS: 20560 -> 20560 (0.00 %)

Code Size: 657436 -> 651536 (-0.90 %) bytes

LDS: 11 -> 11 (0.00 %) blocks

Scratch: 18432 -> 18432 (0.00 %) bytes per wave

Totals from affected shaders:

SGPRS: 5504 -> 5632 (2.33 %)

VGPRS: 3456 -> 3456 (0.00 %)

Code Size: 242948 -> 237048 (-2.43 %) bytes

LDS: 1 -> 1 (0.00 %) blocks

Scratch: 8192 -> 8192 (0.00 %) bytes per wave

Increases:

SGPRS: 16 (0.02 %)

VGPRS: 0 (0.00 %)

Code Size: 0 (0.00 %)

LDS: 0 (0.00 %)

Scratch: 0 (0.00 %)

Decreases:

SGPRS: 0 (0.00 %)

VGPRS: 0 (0.00 %)

Code Size: 104 (0.11 %)

LDS: 0 (0.00 %)

Scratch: 0 (0.00 %)

---

 lib/Target/R600/SIShrinkInstructions.cpp |  29 ++++++--

 test/CodeGen/R600/llvm.round.ll          |   4 +-

 test/CodeGen/R600/select-vectors.ll      | 116 +++++++++++++++----------------

 test/CodeGen/R600/select64.ll            |   4 +-

 test/CodeGen/R600/sint_to_fp.f64.ll      |   8 +--

 test/CodeGen/R600/uint_to_fp.f64.ll      |  10 +--

 test/CodeGen/R600/vselect.ll             |  34 ++++-----

 test/CodeGen/R600/xor.ll                 |   4 +-

 8 files changed, 115 insertions(+), 94 deletions(-)

diff --git a/lib/Target/R600/SIShrinkInstructions.cpp b/lib/Target/R600/SIShrinkInstructions.cpp

index e7511e6..0f181d3 100644

--- a/lib/Target/R600/SIShrinkInstructions.cpp

+++ b/lib/Target/R600/SIShrinkInstructions.cpp

@@ -95,13 +95,19 @@ static bool canShrink(MachineInstr &MI, const SIInstrInfo *TII,

   // a register allocation hint pre-regalloc and then do the shrinking

   // post-regalloc.

   if (Src2) {

-    if (MI.getOpcode() != AMDGPU::V_MAC_F32_e64)

-      return false;

-

     const MachineOperand *Src2Mod =

         TII->getNamedOperand(MI, AMDGPU::OpName::src2_modifiers);

-    if (!isVGPR(Src2, TRI, MRI) || (Src2Mod && Src2Mod->getImm() != 0))

-      return false;

+    switch (MI.getOpcode()) {

+      default: return false;

+

+      case AMDGPU::V_MAC_F32_e64:

+        if (!isVGPR(Src2, TRI, MRI) || (Src2Mod && Src2Mod->getImm() != 0))

+          return false;

+        break;</pre>

      </div>

    </blockquote>

    You can simplify this Src2Mod check slightly with

    SIInstrInfo::hasModifiersSet.<br>
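    <br>
    A rough sketch of what I mean, written against mock types so it
    compiles on its own. MockInstr and getNamedImm are hypothetical
    stand-ins for MachineInstr and getNamedOperand, and the body of
    hasModifiersSet here is my assumption about how the real helper
    behaves (a missing modifier operand counts as "not set"), so please
    check it against SIInstrInfo before relying on it.<br>
    <pre wrap="">
```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a machine instruction: only its named
// immediate operands are modelled.
struct MockInstr {
  std::map<std::string, long> Imms; // operand name -> immediate value
};

// Stand-in for getNamedOperand(): returns nullptr when the operand is absent.
const long *getNamedImm(const MockInstr &MI, const std::string &Name) {
  auto It = MI.Imms.find(Name);
  return It == MI.Imms.end() ? nullptr : &It->second;
}

// Assumed behaviour of hasModifiersSet: true only if the modifier operand
// exists and is non-zero.
bool hasModifiersSet(const MockInstr &MI, const std::string &Name) {
  const long *Mods = getNamedImm(MI, Name);
  return Mods && *Mods != 0;
}

// The check as written in the patch: fetch the operand, then test it by hand.
bool src2ModsBlockShrinkOriginal(const MockInstr &MI) {
  const long *Src2Mod = getNamedImm(MI, "src2_modifiers");
  return Src2Mod && *Src2Mod != 0;
}

// The suggested form: one call, same result.
bool src2ModsBlockShrinkSimplified(const MockInstr &MI) {
  return hasModifiersSet(MI, "src2_modifiers");
}
```
</pre>
    In the patch itself the condition would presumably become
    !isVGPR(Src2, TRI, MRI) || TII->hasModifiersSet(MI,
    AMDGPU::OpName::src2_modifiers), which matches how FoldImmediate
    already calls the helper.<br>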

    <blockquote cite="mid:20150423195807.GD8373@yyz" type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

+

+      case AMDGPU::V_CNDMASK_B32_e64:

+        break;

+    }

   }

   const MachineOperand *Src1 = TII->getNamedOperand(MI, AMDGPU::OpName::src1);

@@ -250,6 +256,19 @@ bool SIShrinkInstructions::runOnMachineFunction(MachineFunction &MF) {

           continue;

       }

+      if (Op32 == AMDGPU::V_CNDMASK_B32_e32) {

+        // We shrink V_CNDMASK_B32_e64 using regalloc hints like we do for VOPC

+        // instructions.

+        unsigned SReg =

+            TII->getNamedOperand(MI, AMDGPU::OpName::src2)->getReg();</pre>

      </div>

    </blockquote>

    I think it might be possible, though unlikely, for a

    V_CNDMASK_B32_e64 to be emitted with an immediate or other non-register

    value for src2, in which case getReg() would assert here.<br>
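    <br>
    One way to guard against that is to check isReg() before calling
    getReg(). The sketch below models the decision with mock types:
    MockOperand, Action, kVCC, kFirstVirtual and classifyCndmaskSrc2 are
    all hypothetical names, and the register numbers are made up for the
    example. Only the order of the checks is the point.<br>
    <pre wrap="">
```cpp
#include <cassert>

// Hypothetical stand-in for a machine operand that is either a register
// or an immediate.
struct MockOperand {
  bool IsReg;
  unsigned Reg; // meaningful only when IsReg is true
  bool isReg() const { return IsReg; }
  unsigned getReg() const {
    assert(IsReg && "getReg() called on a non-register operand");
    return Reg;
  }
};

enum class Action { Skip, HintVCC, Shrink };

// Made-up register numbering for the example.
constexpr unsigned kVCC = 1;                 // pretend physical VCC
constexpr unsigned kFirstVirtual = 1u << 31; // pretend virtual registers start here

bool isVirtual(unsigned Reg) { return Reg >= kFirstVirtual; }

// Mirrors the control flow of the V_CNDMASK_B32 hunk, with the extra
// isReg() check placed first.
Action classifyCndmaskSrc2(const MockOperand &Src2) {
  if (!Src2.isReg())
    return Action::Skip;    // immediate src2: leave the e64 form alone
  unsigned SReg = Src2.getReg();
  if (isVirtual(SReg))
    return Action::HintVCC; // set the regalloc hint, shrink later
  return SReg == kVCC ? Action::Shrink : Action::Skip;
}
```
</pre>
    With the guard in place, an immediate src2 simply leaves the
    instruction in its 64-bit encoding instead of tripping the assert.<br>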

    <blockquote cite="mid:20150423195807.GD8373@yyz" type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

+        if (TargetRegisterInfo::isVirtualRegister(SReg)) {

+          MRI.setRegAllocationHint(SReg, 0, AMDGPU::VCC);

+          continue;

+        }

+        if (SReg != AMDGPU::VCC)

+          continue;

+      }

+

       // We can shrink this instruction

       DEBUG(dbgs() << "Shrinking "; MI.dump(); dbgs() << '\n';);

</pre>

      </div>

    </blockquote>

    <br>

  </body>

</html>