<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 12/18/2014 08:02 PM, Tom Stellard

      wrote:<br>

    </div>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">Hi,

This series of patches removes the legacy operand folding that was done on

the SelectionDAG.  The SIFoldOperands MachineInstr pass now provides the

same functionality.

-Tom

</pre>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0001-R600-SI-Use-immediates-in-the-first-operand-in-fabs-.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From d6dc4e6bbf378ad86e925982d38297a2622b4ffa Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>

Date: Thu, 11 Dec 2014 18:34:11 -0500

Subject: [PATCH 1/7] R600/SI: Use immediates in the first operand in fabs/fneg

 patterns

This is for the stand-alone patterns that are lowered using v_or_b32

and v_xor_b32.  Putting the immediate in the first operand ensures

that it will be folded into the instruction.</pre>

      </div>

    </blockquote>

    <br>

    I looked at doing this before, and wasn't sure it was the best idea.

    In general I think the folding of literals should be smarter

    considering<br>

    the uses literal has. For example, in the test update, an unrolled

    vector xor with immediate will now have a copy of the 32-bit

    immediate for<br>

    every vector component. It would probably be smarter to move the

    immediate into a register in this case, and re-use for every

    operand. It's 8 * N vs. 8 + 4 * (N - 1) bytes for the whole vector

    to save only 4 cycles, so it ends up being smaller. Maybe this

    should be an -Os vs. -O3 kind of option. This also applies for SGPR

    uses.<br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

---

 lib/Target/R600/SIInstructions.td  | 18 +++++++++---------

 test/CodeGen/R600/fneg-fabs.f64.ll | 26 ++++++++++----------------

 test/CodeGen/R600/fneg-fabs.ll     | 26 +++++++++-----------------

 3 files changed, 28 insertions(+), 42 deletions(-)

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td

index 463287e..96f75f9 100644

--- a/lib/Target/R600/SIInstructions.td

+++ b/lib/Target/R600/SIInstructions.td

@@ -2491,7 +2491,7 @@ def : Pat <

 // FIXME: Should use S_OR_B32

 def : Pat <

   (fneg (fabs f32:$src)),

-  (V_OR_B32_e32 $src, (V_MOV_B32_e32 0x80000000)) /* Set sign bit */

+  (V_OR_B32_e32 (V_MOV_B32_e32 0x80000000), $src) /* Set sign bit */

 >;

 // FIXME: Should use S_OR_B32

@@ -2500,19 +2500,19 @@ def : Pat <

   (REG_SEQUENCE VReg_64,

     (i32 (EXTRACT_SUBREG f64:$src, sub0)),

     sub0,

-    (V_OR_B32_e32 (EXTRACT_SUBREG f64:$src, sub1),

-                  (V_MOV_B32_e32 0x80000000)), // Set sign bit.

+    (V_OR_B32_e32 (V_MOV_B32_e32 0x80000000),  // Set sign bit.

+                  (EXTRACT_SUBREG f64:$src, sub1)),

     sub1)

 >;

 def : Pat <

   (fabs f32:$src),

-  (V_AND_B32_e32 $src, (V_MOV_B32_e32 0x7fffffff))

+  (V_AND_B32_e32 (V_MOV_B32_e32 0x7fffffff), $src)

 >;

 def : Pat <

   (fneg f32:$src),

-  (V_XOR_B32_e32 $src, (V_MOV_B32_e32 0x80000000))

+  (V_XOR_B32_e32 (V_MOV_B32_e32 0x80000000), $src)

 >;

 def : Pat <

@@ -2520,8 +2520,8 @@ def : Pat <

   (REG_SEQUENCE VReg_64,

     (i32 (EXTRACT_SUBREG f64:$src, sub0)),

     sub0,

-    (V_AND_B32_e32 (EXTRACT_SUBREG f64:$src, sub1),

-                   (V_MOV_B32_e32 0x7fffffff)), // Set sign bit.

+    (V_AND_B32_e32 (V_MOV_B32_e32 0x7fffffff), // Set sign bit.

+                   (EXTRACT_SUBREG f64:$src, sub1)),

      sub1)

 >;

@@ -2530,8 +2530,8 @@ def : Pat <

   (REG_SEQUENCE VReg_64,

     (i32 (EXTRACT_SUBREG f64:$src, sub0)),

     sub0,

-    (V_XOR_B32_e32 (EXTRACT_SUBREG f64:$src, sub1),

-                   (V_MOV_B32_e32 0x80000000)),

+    (V_XOR_B32_e32 (V_MOV_B32_e32 0x80000000),

+                   (EXTRACT_SUBREG f64:$src, sub1)),

     sub1)

 >;

diff --git a/test/CodeGen/R600/fneg-fabs.f64.ll b/test/CodeGen/R600/fneg-fabs.f64.ll

index 555f4cc..6584108 100644

--- a/test/CodeGen/R600/fneg-fabs.f64.ll

+++ b/test/CodeGen/R600/fneg-fabs.f64.ll

@@ -4,8 +4,7 @@

 ; into 2 modifiers, although theoretically that should work.

 ; FUNC-LABEL: {{^}}fneg_fabs_fadd_f64:

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x7fffffff

-; SI: v_and_b32_e32 v[[FABS:[0-9]+]], {{s[0-9]+}}, [[IMMREG]]

+; SI: v_and_b32_e32 v[[FABS:[0-9]+]], 0x7fffffff, {{v[0-9]+}}

 ; SI: v_add_f64 {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, -v{{\[[0-9]+}}:[[FABS]]{{\]}}

 define void @fneg_fabs_fadd_f64(double addrspace(1)* %out, double %x, double %y) {

   %fabs = call double @llvm.fabs.f64(double %x)

@@ -45,8 +44,7 @@ define void @fneg_fabs_free_f64(double addrspace(1)* %out, i64 %in) {

 }

 ; FUNC-LABEL: {{^}}fneg_fabs_fn_free_f64:

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_fn_free_f64(double addrspace(1)* %out, i64 %in) {

   %bc = bitcast i64 %in to double

   %fabs = call double @fabs(double %bc)

@@ -58,8 +56,8 @@ define void @fneg_fabs_fn_free_f64(double addrspace(1)* %out, i64 %in) {

 ; FUNC-LABEL: {{^}}fneg_fabs_f64:

 ; SI: s_load_dwordx2

 ; SI: s_load_dwordx2 s{{\[}}[[LO_X:[0-9]+]]:[[HI_X:[0-9]+]]{{\]}}

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI-DAG: v_or_b32_e32 v[[HI_V:[0-9]+]], s[[HI_X]], [[IMMREG]]

+; SI: v_mov_b32_e32 [[HI_X_V:v[0-9]+]], s[[HI_X]]

+; SI-DAG: v_or_b32_e32 v[[HI_V:[0-9]+]], 0x80000000, [[HI_X_V]]

 ; SI-DAG: v_mov_b32_e32 v[[LO_V:[0-9]+]], s[[LO_X]]

 ; SI: buffer_store_dwordx2 v{{\[}}[[LO_V]]:[[HI_V]]{{\]}}

 define void @fneg_fabs_f64(double addrspace(1)* %out, double %in) {

@@ -70,10 +68,8 @@ define void @fneg_fabs_f64(double addrspace(1)* %out, double %in) {

 }

 ; FUNC-LABEL: {{^}}fneg_fabs_v2f64:

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI-NOT: 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_v2f64(<2 x double> addrspace(1)* %out, <2 x double> %in) {

   %fabs = call <2 x double> @llvm.fabs.v2f64(<2 x double> %in)

   %fsub = fsub <2 x double> <double -0.000000e+00, double -0.000000e+00>, %fabs

@@ -82,12 +78,10 @@ define void @fneg_fabs_v2f64(<2 x double> addrspace(1)* %out, <2 x double> %in)

 }

 ; FUNC-LABEL: {{^}}fneg_fabs_v4f64:

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI-NOT: 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_v4f64(<4 x double> addrspace(1)* %out, <4 x double> %in) {

   %fabs = call <4 x double> @llvm.fabs.v4f64(<4 x double> %in)

   %fsub = fsub <4 x double> <double -0.000000e+00, double -0.000000e+00, double -0.000000e+00, double -0.000000e+00>, %fabs

diff --git a/test/CodeGen/R600/fneg-fabs.ll b/test/CodeGen/R600/fneg-fabs.ll

index 3cc832f..12cc2a6 100644

--- a/test/CodeGen/R600/fneg-fabs.ll

+++ b/test/CodeGen/R600/fneg-fabs.ll

@@ -33,8 +33,7 @@ define void @fneg_fabs_fmul_f32(float addrspace(1)* %out, float %x, float %y) {

 ; R600: |PV.{{[XYZW]}}|

 ; R600: -PV

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_free_f32(float addrspace(1)* %out, i32 %in) {

   %bc = bitcast i32 %in to float

   %fabs = call float @llvm.fabs.f32(float %bc)

@@ -48,8 +47,7 @@ define void @fneg_fabs_free_f32(float addrspace(1)* %out, i32 %in) {

 ; R600: |PV.{{[XYZW]}}|

 ; R600: -PV

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_fn_free_f32(float addrspace(1)* %out, i32 %in) {

   %bc = bitcast i32 %in to float

   %fabs = call float @fabs(float %bc)

@@ -59,8 +57,7 @@ define void @fneg_fabs_fn_free_f32(float addrspace(1)* %out, i32 %in) {

 }

 ; FUNC-LABEL: {{^}}fneg_fabs_f32:

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_f32(float addrspace(1)* %out, float %in) {

   %fabs = call float @llvm.fabs.f32(float %in)

   %fsub = fsub float -0.000000e+00, %fabs

@@ -84,11 +81,8 @@ define void @v_fneg_fabs_f32(float addrspace(1)* %out, float addrspace(1)* %in)

 ; R600: |{{(PV|T[0-9])\.[XYZW]}}|

 ; R600: -PV

-; FIXME: SGPR should be used directly for first src operand.

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI-NOT: 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %in) {

   %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %in)

   %fsub = fsub <2 x float> <float -0.000000e+00, float -0.000000e+00>, %fabs

@@ -98,12 +92,10 @@ define void @fneg_fabs_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %in) {

 ; FIXME: SGPR should be used directly for first src operand.

 ; FUNC-LABEL: {{^}}fneg_fabs_v4f32:

-; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000

-; SI-NOT: 0x80000000

-; SI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[IMMREG]]

-; SI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[IMMREG]]

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

+; SI: v_or_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}

 define void @fneg_fabs_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %in) {

   %fabs = call <4 x float> @llvm.fabs.v4f32(<4 x float> %in)

   %fsub = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %fabs

<div class="moz-txt-sig">-- 

1.8.5.5

</div></pre>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0002-R600-SI-Make-sure-non-inline-constants-aren-t-folded.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From 9dacc28f2ff57dafa62c091cb3bc5b995a7c2255 Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>

Date: Thu, 18 Dec 2014 16:42:54 -0500

Subject: [PATCH 2/7] R600/SI: Make sure non-inline constants aren't folded

 into mubuf soffset operand

mubuf instructions now define the soffset field using the SCSrc_32

register class which indicates that only SGPRs and inline constants

are allowed.

---

 lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp |  5 ++--

 lib/Target/R600/SIInstrInfo.td                   | 30 ++++++++++++------------

 lib/Target/R600/SIRegisterInfo.cpp               |  1 +

 lib/Target/R600/SIRegisterInfo.td                |  6 +++++

 test/CodeGen/R600/mubuf.ll                       | 25 ++++++++++++++++++++

 5 files changed, 50 insertions(+), 17 deletions(-)</pre>

      </div>

    </blockquote>

    <br>

    <br>

    LGTM. I didn't know soffset could be a constant. I think more tests

    should be included that use inline immediate values for the

    constant, and at the limits of the offset range (those might already

    be there)<br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

</pre>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0003-R600-SI-isLegalOperand-shouldn-t-check-constant-bus-.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From 47155e7618b04c2640f3dc1b2f77cfa6c70f6b2a Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>

Date: Thu, 11 Dec 2014 18:47:07 -0500

Subject: [PATCH 3/7] R600/SI: isLegalOperand() shouldn't check constant bus

 for SALU instructions

The constant bus restrictions only apply to VALU instructions.  This

enables SIFoldOperands to fold immediates into SALU instructions.

---

 lib/Target/R600/SIInstrInfo.cpp | 2 +-

 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp

index 08bfc5e..a58a46e 100644

--- a/lib/Target/R600/SIInstrInfo.cpp

+++ b/lib/Target/R600/SIInstrInfo.cpp

@@ -1405,7 +1405,7 @@ bool SIInstrInfo::isOperandLegal(const MachineInstr *MI, unsigned OpIdx,

   if (!MO)

     MO = &MI->getOperand(OpIdx);

-  if (usesConstantBus(MRI, *MO)) {

+  if (isVALU(InstDesc.Opcode) && usesConstantBus(MRI, *MO)) {

     unsigned SGPRUsed =

         MO->isReg() ? MO->getReg() : (unsigned)AMDGPU::NoRegister;

     for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {

<div class="moz-txt-sig">-- 

1.8.5.5

</div></pre>

      </div>

    </blockquote>

    LGTM<br>

    <br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap=""><div class="moz-txt-sig">

</div></pre>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0004-R600-SI-Refactor-SIFoldOperands-to-simplify-immediat.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From b25bd5edfda7d9a16682e45ff40faa4411f81a73 Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>

Date: Wed, 17 Dec 2014 11:41:55 -0500

Subject: [PATCH 4/7] R600/SI: Refactor SIFoldOperands to simplify immediate

 folding

This will make a future patch much less intrusive.

---

 lib/Target/R600/SIFoldOperands.cpp | 79 ++++++++++++++++++++++++++------------

 1 file changed, 54 insertions(+), 25 deletions(-)

diff --git a/lib/Target/R600/SIFoldOperands.cpp b/lib/Target/R600/SIFoldOperands.cpp

index 761e866..b172ee8 100644

--- a/lib/Target/R600/SIFoldOperands.cpp

+++ b/lib/Target/R600/SIFoldOperands.cpp

@@ -49,6 +49,23 @@ public:

   }

 };

+struct FoldCandidate {

+  MachineInstr *UseMI;

+  unsigned UseOpNo;

+  MachineOperand *OpToFold;

+  uint64_t ImmToFold;

+

+  FoldCandidate(MachineInstr *MI, unsigned OpNo, MachineOperand *FoldOp) :

+      UseMI(MI), UseOpNo(OpNo), OpToFold(FoldOp), ImmToFold(0) { }

+

+  FoldCandidate(MachineInstr *MI, unsigned OpNo, uint64_t Imm) :

+      UseMI(MI), UseOpNo(OpNo), OpToFold(nullptr), ImmToFold(Imm) { }

+

+  bool IsImm() const {</pre>

      </div>

    </blockquote>

    Should be isImm()<br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

+    return !OpToFold;

+  }

+};

+

 } // End anonymous namespace.

 INITIALIZE_PASS_BEGIN(SIFoldOperands, DEBUG_TYPE,

@@ -78,30 +95,24 @@ static bool isSafeToFold(unsigned Opcode) {

   }

 }

-static bool updateOperand(MachineInstr *MI, unsigned OpNo,

-                          const MachineOperand &New,

+static bool updateOperand(FoldCandidate &Fold,

                           const TargetRegisterInfo &TRI) {

-  MachineOperand &Old = MI->getOperand(OpNo);

+  MachineInstr *MI = Fold.UseMI;

+  MachineOperand &Old = MI->getOperand(Fold.UseOpNo);

   assert(Old.isReg());

-  if (New.isImm()) {

-    Old.ChangeToImmediate(New.getImm());

+  if (Fold.IsImm()) {

+    Old.ChangeToImmediate(Fold.ImmToFold);

     return true;

   }

-  if (New.isFPImm()) {

-    Old.ChangeToFPImmediate(New.getFPImm());

+  MachineOperand *New = Fold.OpToFold;

+  if (TargetRegisterInfo::isVirtualRegister(Old.getReg()) &&

+      TargetRegisterInfo::isVirtualRegister(New->getReg())) {

+    Old.substVirtReg(New->getReg(), New->getSubReg(), TRI);

     return true;

   }

-  if (New.isReg())  {

-    if (TargetRegisterInfo::isVirtualRegister(Old.getReg()) &&

-        TargetRegisterInfo::isVirtualRegister(New.getReg())) {

-      Old.substVirtReg(New.getReg(), New.getSubReg(), TRI);

-      return true;

-    }

-  }

-

   // FIXME: Handle physical registers.

   return false;

@@ -133,7 +144,7 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {

            OpToFold.getSubReg()))

         continue;

-      std::vector<std::pair<MachineInstr *, unsigned>> FoldList;

+      std::vector<FoldCandidate> FoldList;

       for (MachineRegisterInfo::use_iterator

            Use = MRI.use_begin(MI.getOperand(0).getReg()), E = MRI.use_end();

            Use != E; ++Use) {

@@ -146,10 +157,11 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {

           continue;

         }

+        bool FoldingImm = OpToFold.isImm() || OpToFold.isFPImm();

+</pre>

      </div>

    </blockquote>

    Have you tried replacing all FP immediates with integers instead?<br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

         // In order to fold immediates into copies, we need to change the

         // copy to a MOV.

-        if ((OpToFold.isImm() || OpToFold.isFPImm()) &&

-             UseMI->getOpcode() == AMDGPU::COPY) {

+        if (FoldingImm && UseMI->getOpcode() == AMDGPU::COPY) {

           const TargetRegisterClass *TRC =

               MRI.getRegClass(UseMI->getOperand(0).getReg());

@@ -173,9 +185,24 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {

             UseDesc.OpInfo[Use.getOperandNo()].RegClass == -1)

           continue;

-        // Normal substitution

+        if (FoldingImm) {

+          uint64_t Imm;

+          if (OpToFold.isFPImm()) {

+            Imm = OpToFold.getFPImm()->getValueAPF().bitcastToAPInt().getSExtValue();

+          } else {

+            Imm = OpToFold.getImm();

+          }

+

+          const MachineOperand ImmOp = MachineOperand::CreateImm(Imm);

+          if (TII->isOperandLegal(UseMI, Use.getOperandNo(), &ImmOp)) {

+            FoldList.push_back(FoldCandidate(UseMI, Use.getOperandNo(), Imm));

+            continue;

+          }

+        }

+

+        // Normal substitution with registers

         if (TII->isOperandLegal(UseMI, Use.getOperandNo(), &OpToFold)) {

-          FoldList.push_back(std::make_pair(UseMI, Use.getOperandNo()));

+          FoldList.push_back(FoldCandidate(UseMI, Use.getOperandNo(), &OpToFold));

           continue;

         }

@@ -187,13 +214,15 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {

         // already does this.

       }

-      for (std::pair<MachineInstr *, unsigned> Fold : FoldList) {

-        if (updateOperand(Fold.first, Fold.second, OpToFold, TRI)) {

+      for (FoldCandidate &Fold : FoldList) {

+        if (updateOperand(Fold, TRI)) {

           // Clear kill flags.

-          if (OpToFold.isReg())

-            OpToFold.setIsKill(false);

+          if (!Fold.IsImm()) {

+            assert(Fold.OpToFold && Fold.OpToFold->isReg());

+            Fold.OpToFold->setIsKill(false);

+          }

           DEBUG(dbgs() << "Folded source from " << MI << " into OpNo " <<

-                Fold.second << " of " << *Fold.first << '\n');

+                Fold.UseOpNo << " of " << *Fold.UseMI << '\n');

         }

       }

     }

<div class="moz-txt-sig">-- 

1.8.5.5

</div></pre>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0005-R600-SI-Teach-SIFoldOperands-to-split-64-bit-constan.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From a8f3e690476d07a0b87adc624c690a6a68b22736 Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>

Date: Thu, 11 Dec 2014 18:49:19 -0500

Subject: [PATCH 5/7] R600/SI: Teach SIFoldOperands to split 64-bit constants

 when folding

This allows folding of sequences like:

s[0:1] = s_mov_b64 4

v_add_i32 v0, s0, v0

v_addc_u32 v1, s1, v1

into

v_add_i32 v0, 4, v0

v_add_i32 v1, 0, v1

---

 lib/Target/R600/SIFoldOperands.cpp   | 69 +++++++++++++++++++++++-------------

 test/CodeGen/R600/operand-folding.ll | 17 +++++++++

 test/CodeGen/R600/sint_to_fp.f64.ll  |  8 ++---

 test/CodeGen/R600/uint_to_fp.f64.ll  |  8 ++---

 4 files changed, 69 insertions(+), 33 deletions(-)

diff --git a/lib/Target/R600/SIFoldOperands.cpp b/lib/Target/R600/SIFoldOperands.cpp

index b172ee8..c661b0a 100644

--- a/lib/Target/R600/SIFoldOperands.cpp

+++ b/lib/Target/R600/SIFoldOperands.cpp

@@ -153,27 +153,51 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {

         const MachineOperand &UseOp = UseMI->getOperand(Use.getOperandNo());

         // FIXME: Fold operands with subregs.

-        if (UseOp.isReg() && UseOp.getSubReg()) {

+        if (UseOp.isReg() && UseOp.getSubReg() && OpToFold.isReg()) {

           continue;

         }

         bool FoldingImm = OpToFold.isImm() || OpToFold.isFPImm();

+        APInt Imm;

-        // In order to fold immediates into copies, we need to change the

-        // copy to a MOV.

-        if (FoldingImm && UseMI->getOpcode() == AMDGPU::COPY) {

-          const TargetRegisterClass *TRC =

-              MRI.getRegClass(UseMI->getOperand(0).getReg());

-

-          if (TRC->getSize() == 4) {

-            if (TRI.isSGPRClass(TRC))

-              UseMI->setDesc(TII->get(AMDGPU::S_MOV_B32));

-            else

-              UseMI->setDesc(TII->get(AMDGPU::V_MOV_B32_e32));

-          } else if (TRC->getSize() == 8 && TRI.isSGPRClass(TRC)) {

-            UseMI->setDesc(TII->get(AMDGPU::S_MOV_B64));

+        if (FoldingImm) {

+          const TargetRegisterClass *UseRC = MRI.getRegClass(UseOp.getReg());

+

+          if (OpToFold.isFPImm()) {

+            Imm = OpToFold.getFPImm()->getValueAPF().bitcastToAPInt();

           } else {

-            continue;

+            Imm = APInt(64, OpToFold.getImm());

+          }

+

+          // Split 64-bit constants into 32-bits for folding.

+          if (UseOp.getSubReg()) {

+            if (UseRC->getSize() != 8)

+              continue;

+

+            if (UseOp.getSubReg() == AMDGPU::sub0) {

+              Imm = Imm.getLoBits(32);

+            } else {

+              assert(UseOp.getSubReg() == AMDGPU::sub1);

+              Imm = Imm.getHiBits(32);

+            }

+          }

+

+          // In order to fold immediates into copies, we need to change the

+          // copy to a MOV.

+          if (UseMI->getOpcode() == AMDGPU::COPY) {

+            const TargetRegisterClass *TRC =

+                MRI.getRegClass(UseMI->getOperand(0).getReg());

+

+            if (TRC->getSize() == 4) {

+              if (TRI.isSGPRClass(TRC))

+                UseMI->setDesc(TII->get(AMDGPU::S_MOV_B32));

+              else

+                UseMI->setDesc(TII->get(AMDGPU::V_MOV_B32_e32));

+            } else if (TRC->getSize() == 8 && TRI.isSGPRClass(TRC)) {

+              UseMI->setDesc(TII->get(AMDGPU::S_MOV_B64));

+            } else {

+              continue;

+            }

</pre>

      </div>

    </blockquote>

    It would probably be useful to factor this into a getMovOpcode()

    somewhere<br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

           }

         }

@@ -185,19 +209,14 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {

             UseDesc.OpInfo[Use.getOperandNo()].RegClass == -1)

           continue;

-        if (FoldingImm) {

-          uint64_t Imm;

-          if (OpToFold.isFPImm()) {

-            Imm = OpToFold.getFPImm()->getValueAPF().bitcastToAPInt().getSExtValue();

-          } else {

-            Imm = OpToFold.getImm();

-          }

-          const MachineOperand ImmOp = MachineOperand::CreateImm(Imm);

+        if (FoldingImm) {

+          const MachineOperand ImmOp = MachineOperand::CreateImm(Imm.getSExtValue());

           if (TII->isOperandLegal(UseMI, Use.getOperandNo(), &ImmOp)) {

-            FoldList.push_back(FoldCandidate(UseMI, Use.getOperandNo(), Imm));

-            continue;

+            FoldList.push_back(FoldCandidate(UseMI, Use.getOperandNo(),

+                               Imm.getSExtValue()));

           }

+          continue;

         }

         // Normal substitution with registers

diff --git a/test/CodeGen/R600/operand-folding.ll b/test/CodeGen/R600/operand-folding.ll

index 05177b4..f62aa09 100644

--- a/test/CodeGen/R600/operand-folding.ll

+++ b/test/CodeGen/R600/operand-folding.ll

@@ -36,5 +36,22 @@ endif:

   ret void

 }

+; CHECK-LABEL: {{^}}fold_64bit_constant_add:

+; CHECK-NOT: s_mov_b64

+; FIXME: It would be better if we clud use v_add here and drop the extra</pre>

      </div>

    </blockquote>

    Typo 'clud'<br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

+; v_mov_b32 instructions.

+; CHECK-DAG: s_add_u32 [[LO:s[0-9]+]], s{{[0-9]+}}, 1

+; CHECK-DAG: s_addc_u32 [[HI:s[0-9]+]], s{{[0-9]+}}, 0

+; CHECK-DAG: v_mov_b32_e32 v[[VLO:[0-9]+]], [[LO]]

+; CHECK-DAG: v_mov_b32_e32 v[[VHI:[0-9]+]], [[HI]]

+; CHECK: buffer_store_dwordx2 v{{\[}}[[VLO]]:[[VHI]]{{\]}},

+

+define void @fold_64bit_constant_add(i64 addrspace(1)* %out, i32 %cmp, i64 %val) {

+entry:

+  %tmp0 = add i64 %val, 1

+  store i64 %tmp0, i64 addrspace(1)* %out

+  ret void

+}

+</pre>

      </div>

    </blockquote>

    One that will use vector adds would be useful if one isn't there

    already<br>

    <br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

 declare i32 @llvm.r600.read.tidig.x() #0

 attributes #0 = { readnone }

diff --git a/test/CodeGen/R600/sint_to_fp.f64.ll b/test/CodeGen/R600/sint_to_fp.f64.ll

index 6e4f87c..efbdf25 100644

--- a/test/CodeGen/R600/sint_to_fp.f64.ll

+++ b/test/CodeGen/R600/sint_to_fp.f64.ll

@@ -12,10 +12,10 @@ define void @sint_to_fp_i32_to_f64(double addrspace(1)* %out, i32 %in) {

 ; SI-LABEL: {{^}}sint_to_fp_i1_f64:

 ; SI: v_cmp_eq_i32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],

-; FIXME: We should the VGPR sources for V_CNDMASK are copied from SGPRs,

-; we should be able to fold the SGPRs into the V_CNDMASK instructions.

-; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]

-; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]

+; We can't fold the SGPRs into v_cndmask_b32_e64, because it already

+; uses an SGPR for [[CMP]]

+; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]

+; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]

 ; SI: buffer_store_dwordx2

 ; SI: s_endpgm

 define void @sint_to_fp_i1_f64(double addrspace(1)* %out, i32 %in) {

diff --git a/test/CodeGen/R600/uint_to_fp.f64.ll b/test/CodeGen/R600/uint_to_fp.f64.ll

index d16872b..fa70bdf 100644

--- a/test/CodeGen/R600/uint_to_fp.f64.ll

+++ b/test/CodeGen/R600/uint_to_fp.f64.ll

@@ -72,10 +72,10 @@ define void @s_uint_to_fp_v4i32_to_v4f64(<4 x double> addrspace(1)* %out, <4 x i

 ; SI-LABEL: {{^}}uint_to_fp_i1_to_f64:

 ; SI: v_cmp_eq_i32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],

-; FIXME: We should the VGPR sources for V_CNDMASK are copied from SGPRs,

-; we should be able to fold the SGPRs into the V_CNDMASK instructions.

-; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]

-; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]

+; We can't fold the SGPRs into v_cndmask_b32_e64, because it already

+; uses an SGPR for [[CMP]]

+; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]

+; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]

 ; SI: buffer_store_dwordx2

 ; SI: s_endpgm

 define void @uint_to_fp_i1_to_f64(double addrspace(1)* %out, i32 %in) {

<div class="moz-txt-sig">-- 

1.8.5.5

</div></pre>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0006-R600-SI-Add-a-V_MOV_B64-pseudo-instruction.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From 8ff850ac46ab082a06882e743946297c2791d6e3 Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>

Date: Wed, 17 Dec 2014 16:35:19 -0500

Subject: [PATCH 6/7] R600/SI: Add a V_MOV_B64 pseudo instruction

This is used to simplify the SIFoldOperands pass and make it easier to

fold immediates.

---

 lib/Target/R600/SIFoldOperands.cpp         |  3 +++

 lib/Target/R600/SIInstrInfo.cpp            | 29 +++++++++++++++++++++++++++

 lib/Target/R600/SIInstructions.td          |  6 ++++++

 test/CodeGen/R600/atomic_cmp_swap_local.ll | 10 ++++------

 test/CodeGen/R600/imm.ll                   |  7 ++-----

 test/CodeGen/R600/local-atomics64.ll       | 32 ++++++++++++------------------

 6 files changed, 57 insertions(+), 30 deletions(-)</pre>

      </div>

    </blockquote>

    <br>

    I thought you were avoiding this with the previous patch? Does the

    pass handle constants specially and this is for SGPRs only?<br>

    <br>

    I think the name might need something to indicate it's a

    pseudoinstruction. v_mov_b64 sounds like a reasonable real

    instruction name which I wish would be added which might confuse

    people who aren't aware it doesn't exist.<br>

    <blockquote cite="mid:20141219010234.GA712@freedesktop.org"

      type="cite">

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">

diff --git a/lib/Target/R600/SIFoldOperands.cpp b/lib/Target/R600/SIFoldOperands.cpp

index c661b0a..1b0e09e 100644

--- a/lib/Target/R600/SIFoldOperands.cpp

+++ b/lib/Target/R600/SIFoldOperands.cpp

@@ -86,6 +86,7 @@ static bool isSafeToFold(unsigned Opcode) {

   switch(Opcode) {

   case AMDGPU::V_MOV_B32_e32:

   case AMDGPU::V_MOV_B32_e64:

+  case AMDGPU::V_MOV_B64:

   case AMDGPU::S_MOV_B32:

   case AMDGPU::S_MOV_B64:

   case AMDGPU::COPY:

@@ -195,6 +196,8 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {

                 UseMI->setDesc(TII->get(AMDGPU::V_MOV_B32_e32));

             } else if (TRC->getSize() == 8 && TRI.isSGPRClass(TRC)) {

               UseMI->setDesc(TII->get(AMDGPU::S_MOV_B64));

+            } else if (TRC->getSize() == 8 && !TRI.isSGPRClass(TRC)) {

+              UseMI->setDesc(TII->get(AMDGPU::V_MOV_B64));

             } else {

               continue;

             }

diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp

index a58a46e..fe29a50 100644

--- a/lib/Target/R600/SIInstrInfo.cpp

+++ b/lib/Target/R600/SIInstrInfo.cpp

@@ -662,6 +662,35 @@ bool SIInstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI) const {

     // This is just a placeholder for register allocation.

     MI->eraseFromParent();

     break;

+

+  case AMDGPU::V_MOV_B64: {

+    unsigned Dst = MI->getOperand(0).getReg();

+    unsigned DstLo = RI.getSubReg(Dst, AMDGPU::sub0);

+    unsigned DstHi = RI.getSubReg(Dst, AMDGPU::sub1);

+

+    const MachineOperand &SrcOp = MI->getOperand(1);

+    // FIXME: Will this work for 64-bit floating point immediates?

+    assert(!SrcOp.isFPImm());

+    if (SrcOp.isImm()) {

+      APInt Imm(64, SrcOp.getImm());

+      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstLo)

+              .addImm(Imm.getLoBits(32).getZExtValue())

+              .addReg(Dst, RegState::Implicit);

+      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstHi)

+              .addImm(Imm.getHiBits(32).getZExtValue())

+              .addReg(Dst, RegState::Implicit);

+    } else {

+      assert(SrcOp.isReg());

+      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstLo)

+              .addReg(RI.getSubReg(SrcOp.getReg(), AMDGPU::sub0))

+              .addReg(Dst, RegState::Implicit);

+      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstHi)

+              .addReg(RI.getSubReg(SrcOp.getReg(), AMDGPU::sub1))

+              .addReg(Dst, RegState::Implicit);

+    }

+    MI->eraseFromParent();

+    break;

+  }

   }

   return true;

 }

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td

index 96f75f9..dbc090d 100644

--- a/lib/Target/R600/SIInstructions.td

+++ b/lib/Target/R600/SIInstructions.td

@@ -1742,6 +1742,12 @@ defm V_TRIG_PREOP_F64 : VOP3Inst <

 //===----------------------------------------------------------------------===//

 let isCodeGenOnly = 1, isPseudo = 1 in {

+let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in {

+// 64-bit vector move instruction.  This is mainly used by the SIFoldOperands

+// pass to enable folding of inline immediates.

+def V_MOV_B64 : InstSI <(outs VReg_64:$dst), (ins VSrc_64:$src0), "", []>;

+} // end let hasSideEffects = 0, mayLoad = 0, mayStore = 0

+

 let hasSideEffects = 1 in {

 def SGPR_USE : InstSI <(outs),(ins), "", []>;

 }

diff --git a/test/CodeGen/R600/atomic_cmp_swap_local.ll b/test/CodeGen/R600/atomic_cmp_swap_local.ll

index 223f4d3..a77627c 100644

--- a/test/CodeGen/R600/atomic_cmp_swap_local.ll

+++ b/test/CodeGen/R600/atomic_cmp_swap_local.ll

@@ -20,9 +20,8 @@ define void @lds_atomic_cmpxchg_ret_i32_offset(i32 addrspace(1)* %out, i32 addrs

 ; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i64_offset:

 ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb

 ; SI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd

-; SI: s_mov_b64  s{{\[}}[[LOSCMP:[0-9]+]]:[[HISCMP:[0-9]+]]{{\]}}, 7

-; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], s[[LOSCMP]]

-; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], s[[HISCMP]]

+; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], 7

+; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], 0

 ; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]

 ; SI-DAG: v_mov_b32_e32 v[[LOSWAPV:[0-9]+]], s[[LOSWAP]]

 ; SI-DAG: v_mov_b32_e32 v[[HISWAPV:[0-9]+]], s[[HISWAP]]

@@ -69,9 +68,8 @@ define void @lds_atomic_cmpxchg_noret_i32_offset(i32 addrspace(3)* %ptr, i32 %sw

 ; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_noret_i64_offset:

 ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x9

 ; SI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xb

-; SI: s_mov_b64  s{{\[}}[[LOSCMP:[0-9]+]]:[[HISCMP:[0-9]+]]{{\]}}, 7

-; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], s[[LOSCMP]]

-; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], s[[HISCMP]]

+; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], 7

+; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], 0

 ; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]

 ; SI-DAG: v_mov_b32_e32 v[[LOSWAPV:[0-9]+]], s[[LOSWAP]]

 ; SI-DAG: v_mov_b32_e32 v[[HISWAPV:[0-9]+]], s[[HISWAP]]

diff --git a/test/CodeGen/R600/imm.ll b/test/CodeGen/R600/imm.ll

index 79f36b6..1cc03b8 100644

--- a/test/CodeGen/R600/imm.ll

+++ b/test/CodeGen/R600/imm.ll

@@ -474,12 +474,9 @@ define void @add_inline_imm_64_f64(double addrspace(1)* %out, double %x) {

 }

-; FIXME: These shoudn't bother materializing in SGPRs

-

 ; CHECK-LABEL: {{^}}store_inline_imm_0.0_f64

-; CHECK: s_mov_b64 s{{\[}}[[LO_SREG:[0-9]+]]:[[HI_SREG:[0-9]+]]{{\]}}, 0{{$}}

-; CHECK-DAG: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG]]

-; CHECK-DAG: v_mov_b32_e32 v[[HI_VREG:[0-9]+]], s[[HI_SREG]]

+; CHECK: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], 0

+; CHECK: v_mov_b32_e32 v[[HI_VREG:[0-9]+]], 0

 ; CHECK: buffer_store_dwordx2 v{{\[}}[[LO_VREG]]:[[HI_VREG]]{{\]}}

 define void @store_inline_imm_0.0_f64(double addrspace(1)* %out) {

   store double 0.0, double addrspace(1)* %out

diff --git a/test/CodeGen/R600/local-atomics64.ll b/test/CodeGen/R600/local-atomics64.ll

index ce0cf59..b39581e 100644

--- a/test/CodeGen/R600/local-atomics64.ll

+++ b/test/CodeGen/R600/local-atomics64.ll

@@ -30,9 +30,8 @@ define void @lds_atomic_add_ret_i64(i64 addrspace(1)* %out, i64 addrspace(3)* %p

 ; FUNC-LABEL: {{^}}lds_atomic_add_ret_i64_offset:

 ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb

-; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, 9

-; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]

-; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]

+; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], 9

+; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], 0

 ; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]

 ; SI: ds_add_rtn_u64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}} offset:32 [M0]

 ; SI: buffer_store_dwordx2 [[RESULT]],

@@ -45,9 +44,8 @@ define void @lds_atomic_add_ret_i64_offset(i64 addrspace(1)* %out, i64 addrspace

 }

 ; FUNC-LABEL: {{^}}lds_atomic_inc_ret_i64:

-; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1

-; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]

-; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]

+; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1

+; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1

 ; SI: ds_inc_rtn_u64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}

 ; SI: buffer_store_dwordx2 [[RESULT]],

 ; SI: s_endpgm

@@ -87,9 +85,8 @@ define void @lds_atomic_sub_ret_i64_offset(i64 addrspace(1)* %out, i64 addrspace

 }

 ; FUNC-LABEL: {{^}}lds_atomic_dec_ret_i64:

-; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1

-; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]

-; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]

+; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1

+; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1

 ; SI: ds_dec_rtn_u64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}

 ; SI: buffer_store_dwordx2 [[RESULT]],

 ; SI: s_endpgm

@@ -277,10 +274,9 @@ define void @lds_atomic_add_noret_i64(i64 addrspace(3)* %ptr) nounwind {

 ; FUNC-LABEL: {{^}}lds_atomic_add_noret_i64_offset:

 ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x9

-; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, 9

-; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]

-; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]

-; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]

+; SI: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]

+; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], 9

+; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], 0

 ; SI: ds_add_u64 [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}} offset:32 [M0]

 ; SI: s_endpgm

 define void @lds_atomic_add_noret_i64_offset(i64 addrspace(3)* %ptr) nounwind {

@@ -290,9 +286,8 @@ define void @lds_atomic_add_noret_i64_offset(i64 addrspace(3)* %ptr) nounwind {

 }

 ; FUNC-LABEL: {{^}}lds_atomic_inc_noret_i64:

-; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1

-; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]

-; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]

+; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1

+; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1

 ; SI: ds_inc_u64 [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}

 ; SI: s_endpgm

 define void @lds_atomic_inc_noret_i64(i64 addrspace(3)* %ptr) nounwind {

@@ -327,9 +322,8 @@ define void @lds_atomic_sub_noret_i64_offset(i64 addrspace(3)* %ptr) nounwind {

 }

 ; FUNC-LABEL: {{^}}lds_atomic_dec_noret_i64:

-; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1

-; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]

-; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]

+; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1

+; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1

 ; SI: ds_dec_u64 [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}

 ; SI: s_endpgm

 define void @lds_atomic_dec_noret_i64(i64 addrspace(3)* %ptr) nounwind {

<div class="moz-txt-sig">-- 

1.8.5.5

</div></pre>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"><legend

          class="mimeAttachmentHeaderName">0007-R600-SI-Remove-SIISelLowering-legalizeOperands.patch</legend></fieldset>

      <br>

      <div class="moz-text-plain" wrap="true" graphical-quote="true"

        style="font-family: -moz-fixed; font-size: 12px;"

        lang="x-western">

        <pre wrap="">From dd174475c08471d665d33c6dbe665dafb35fe47c Mon Sep 17 00:00:00 2001

From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>

Date: Thu, 11 Dec 2014 19:05:21 -0500

Subject: [PATCH 7/7] R600/SI: Remove SIISelLowering::legalizeOperands()

Its functionality has been replaced by calling

SIInstrInfo::legalizeOperands() from

SIISelLowering::AdjstInstrPostInstrSelection() and running the

SIFoldOperands and SIShrinkInstructions passes.

---

 lib/Target/R600/SIISelLowering.cpp  | 174 +-----------------------------------

 lib/Target/R600/SIISelLowering.h    |   3 -

 test/CodeGen/R600/fneg.ll           |   2 +-

 test/CodeGen/R600/imm.ll            |   4 +-

 test/CodeGen/R600/seto.ll           |   2 +-

 test/CodeGen/R600/setuo.ll          |   2 +-

 test/CodeGen/R600/sint_to_fp.f64.ll |   4 +-

 test/CodeGen/R600/sint_to_fp.ll     |   2 +-

 test/CodeGen/R600/uint_to_fp.f64.ll |   4 +-

 test/CodeGen/R600/uint_to_fp.ll     |   2 +-

 test/CodeGen/R600/xor.ll            |   2 +-

 11 files changed, 13 insertions(+), 188 deletions(-)</pre>

      </div>

    </blockquote>

    <br>

    LGTM<br>

  </body>

</html>