<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 12/05/2014 07:38 PM, Tom Stellard
      wrote:<br>
    </div>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">On Tue, Dec 02, 2014 at 05:52:32PM -0500, Matt Arsenault wrote:
</pre>
        <blockquote type="cite" style="color: #000000;">
          <pre wrap=""><span class="moz-txt-citetags">> </span>
</pre>
        </blockquote>
        <pre wrap="">Hi Matt,

Here are some updated patches.

-Tom
</pre>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"><legend
          class="mimeAttachmentHeaderName">0001-R600-SI-Move-setting-of-the-lds-bit-to-the-base-MUBU.patch</legend></fieldset>
      <br>
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">From 57edec3cb6061a6ec794c5db6937c30b8c669973 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Thu, 4 Dec 2014 15:21:31 +0000
Subject: [PATCH 1/6] R600/SI: Move setting of the lds bit to the base MUBUF
 class

---
 lib/Target/R600/SIInstrFormats.td | 2 ++
 lib/Target/R600/SIInstrInfo.td    | 7 +++----
 2 files changed, 5 insertions(+), 4 deletions(-)
</pre>
      </div>
    </blockquote>
    LGTM<br>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">
</pre>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"><legend
          class="mimeAttachmentHeaderName">0002-R600-SI-Set-MayStore-0-on-MUBUF-loads.patch</legend></fieldset>
      <br>
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">From 7401e81c0113ad5c2928ef6d84f26500f604a967 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Mon, 24 Nov 2014 20:56:28 +0000
Subject: [PATCH 2/6] R600/SI: Set MayStore = 0 on MUBUF loads

---
 lib/Target/R600/SIInstrInfo.td | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 5e479bc..23c4781 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -1191,7 +1191,7 @@ multiclass MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass,
                               ValueType load_vt = i32,
                               SDPatternOperator ld = null_frag> {
 
-  let mayLoad = 1 in {
+  let mayLoad = 1, mayStore = 0 in {
 
     let addr64 = 0 in {
 
<div class="moz-txt-sig">-- 
2.0.4
</div></pre>
      </div>
    </blockquote>
    <br>
    LGTM<br>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap=""><div class="moz-txt-sig">
</div></pre>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"><legend
          class="mimeAttachmentHeaderName">0003-R600-SI-Be-more-conservative-about-inserting-s_waitc.patch</legend></fieldset>
      <br>
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">From c0f83f4b60d4cb5bf8998297e1401ce142a9386d Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Fri, 5 Dec 2014 22:10:28 +0000
Subject: [PATCH 3/6] R600/SI: Be more conservative about inserting s_waitcnt

Based on testing it seems that a ds_read instruction may
overwrite the address of a previous ds_read.  Here is an
example:

ds_read_b32 v18, v19
ds_read_b32 v19, v20

So we need to make sure we insert an s_waitcnt between these
two instructions.  This changes the SIInsertWait pass to
handle this scenario for all instruction types.  This is likely
overly conservative, but more information is needed to implement
this in an optimal way.
---
 lib/Target/R600/SIInsertWaits.cpp | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)</pre>
      </div>
    </blockquote>
    <br>
    LGTM but I think the comment could be clearer.<br>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">

diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
index 712d97d..5bdd8d7 100644
--- a/lib/Target/R600/SIInsertWaits.cpp
+++ b/lib/Target/R600/SIInsertWaits.cpp
@@ -172,18 +172,16 @@ bool SIInsertWaits::isOpRelevant(MachineOperand &Op) {
   if (MI.getOpcode() == AMDGPU::EXP)
     return true;
 
-  // For stores the stored value is also relevant
-  if (!MI.getDesc().mayStore())
-    return false;
-
-  for (MachineInstr::mop_iterator I = MI.operands_begin(),
-       E = MI.operands_end(); I != E; ++I) {
-
-    if (I->isReg() && I->isUse())
-      return Op.isIdenticalTo(*I);
-  }
-
-  return false;
+  // We used to not consider address operands as relevant.  However testing
+  // has shown that a ds_read_b32 may overwrite the address of a previous
+  // ds_read_b32 instruction.  Therefore, we need to consider address
+  // registers relevant.</pre>
      </div>
    </blockquote>
    To be more clear, how about:<br>
    <br>
    However, testing has shown that a ds instruction may require its
    address register to be preserved until the operation is completed.
    If the address register is overwritten before the operation is
    complete, it may not work correctly.<br>
    <br>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">
+  //
+  // FIXME: We are being conservative and doing this for all instruction
+  // types.  Is this necessary.</pre>
      </div>
    </blockquote>
    Missing question mark<br>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">
+  //
+  //
+  return Op.isReg() && Op.isUse();
 }
 
 RegInterval SIInsertWaits::getRegInterval(MachineOperand &Op) {
<div class="moz-txt-sig">-- 
2.0.4

</div></pre>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"><legend
          class="mimeAttachmentHeaderName">0004-R600-SI-Spill-VGPRs-to-scratch-space-for-compute-sha.patch</legend></fieldset>
      <br>
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">From 218db2e2540f3391fc9af28948960e119bb6dcc8 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Thu, 27 Nov 2014 21:13:48 +0000
Subject: [PATCH 4/6] R600/SI: Spill VGPRs to scratch space for compute shaders

---
 lib/Target/R600/AMDGPU.h                  |   1 +
 lib/Target/R600/AMDGPUTargetMachine.cpp   |   1 +
 lib/Target/R600/CMakeLists.txt            |   1 +
 lib/Target/R600/SIInstrInfo.cpp           |  26 ++--
 lib/Target/R600/SIInstrInfo.td            |   2 +
 lib/Target/R600/SIInstructions.td         |  51 ++++----
 lib/Target/R600/SIMachineFunctionInfo.cpp |   3 +-
 lib/Target/R600/SIMachineFunctionInfo.h   |   3 +
 lib/Target/R600/SIPrepareScratchRegs.cpp  | 196 ++++++++++++++++++++++++++++++
 lib/Target/R600/SIRegisterInfo.cpp        | 156 ++++++++++++++----------
 lib/Target/R600/SIRegisterInfo.h          |   9 +-
 11 files changed, 354 insertions(+), 95 deletions(-)
 create mode 100644 lib/Target/R600/SIPrepareScratchRegs.cpp</pre>
      </div>
    </blockquote>
    <br>
    LGTM<br>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">
+
+    for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
+         I != E; ++I) {
+      MachineInstr &MI = *I;
+      DebugLoc DL = MI.getDebugLoc();
+      switch(MI.getOpcode()) {
+        default: break;;
+        case AMDGPU::SI_SPILL_V512_SAVE:
+        case AMDGPU::SI_SPILL_V256_SAVE:
+        case AMDGPU::SI_SPILL_V128_SAVE:
+        case AMDGPU::SI_SPILL_V96_SAVE:
+        case AMDGPU::SI_SPILL_V64_SAVE:
+        case AMDGPU::SI_SPILL_V32_SAVE:
+        case AMDGPU::SI_SPILL_V32_RESTORE:
+        case AMDGPU::SI_SPILL_V64_RESTORE:
+        case AMDGPU::SI_SPILL_V128_RESTORE:
+        case AMDGPU::SI_SPILL_V256_RESTORE:
+        case AMDGPU::SI_SPILL_V512_RESTORE:
+
</pre>
      </div>
    </blockquote>
    <br>
    Maybe these should get an isSpillSave bit?<br>
    <br>
    <blockquote cite="mid:20141206003809.GA26139@freedesktop.org"
      type="cite">
      <fieldset class="mimeAttachmentHeader"><legend
          class="mimeAttachmentHeaderName">0006-R600-SI-Define-a-schedule-model-and-enable-the-gener.patch</legend></fieldset>
      <br>
      <div class="moz-text-plain" wrap="true" graphical-quote="true"
        style="font-family: -moz-fixed; font-size: 12px;"
        lang="x-western">
        <pre wrap="">From 0375ddd7acffe50c650454766c04229877f42908 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Fri, 19 Jul 2013 11:50:00 -0700
Subject: [PATCH 6/6] R600/SI: Define a schedule model and enable the generic
 machine scheduler

The schedule model is not complete yet, and could be improved.
---
 lib/Target/R600/AMDGPUSubtarget.cpp              | 21 ++++++-
 lib/Target/R600/AMDGPUSubtarget.h                | 13 +++-
 lib/Target/R600/Processors.td                    | 22 +++----
 lib/Target/R600/SIInstrFormats.td                | 10 ++-
 lib/Target/R600/SIInstructions.td                | 48 +++++++++++++-
 lib/Target/R600/SIRegisterInfo.cpp               | 55 +++++++++++++++-
 lib/Target/R600/SIRegisterInfo.h                 | 12 +++-
 lib/Target/R600/SISchedule.td                    | 80 +++++++++++++++++++++++-
 test/CodeGen/R600/atomic_cmp_swap_local.ll       |  6 +-
 test/CodeGen/R600/ctpop.ll                       |  4 +-
 test/CodeGen/R600/ds_read2st64.ll                |  4 +-
 test/CodeGen/R600/fceil64.ll                     |  4 +-
 test/CodeGen/R600/ffloor.ll                      |  4 +-
 test/CodeGen/R600/fmax3.ll                       |  6 +-
 test/CodeGen/R600/fmin3.ll                       |  6 +-
 test/CodeGen/R600/fneg-fabs.f64.ll               |  2 +-
 test/CodeGen/R600/ftrunc.f64.ll                  |  4 +-
 test/CodeGen/R600/llvm.memcpy.ll                 | 34 +++++-----
 test/CodeGen/R600/local-atomics.ll               |  4 +-
 test/CodeGen/R600/local-atomics64.ll             |  2 +-
 test/CodeGen/R600/local-memory-two-objects.ll    |  4 +-
 test/CodeGen/R600/si-triv-disjoint-mem-access.ll |  2 +-
 test/CodeGen/R600/smrd.ll                        | 10 +--
 test/CodeGen/R600/valu-i1.ll                     |  3 +-
 test/CodeGen/R600/wait.ll                        |  5 +-
 test/CodeGen/R600/xor.ll                         |  2 +-
 test/CodeGen/R600/zero_extend.ll                 |  2 +-
 27 files changed, 290 insertions(+), 79 deletions(-)
</pre>
      </div>
    </blockquote>
    <br>
    LGTM<br>
    <br>
  </body>
</html>