<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 12/02/2014 04:47 PM, Tom Stellard
wrote:<br>
</div>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">Hi,
The attached patches enable the machine scheduler on SI subtargets as well as
enable register spilling for compute. Enabling the scheduler improves performance
on the Unigine demos by 10% and Luxmark by 20%.
There is still a lot of room for improvement in the scheduler definitions and
the associated callbacks. These patches just enable the scheduler and implement
a simple machine model.
-Tom
</pre>
</div>
<br>
<fieldset class="mimeAttachmentHeader"><legend
class="mimeAttachmentHeaderName">0001-R600-SI-Don-t-run-SI-passes-on-R600-subtargets.patch</legend></fieldset>
<br>
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">From d30579a680c62e5dcb2e2c9e02ee203cf4d7bba3 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Fri, 28 Nov 2014 14:38:48 +0000
Subject: [PATCH 1/6] R600/SI: Don't run SI passes on R600 subtargets
---
lib/Target/R600/AMDGPUTargetMachine.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)</pre>
</div>
</blockquote>
<br>
LGTM<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<fieldset class="mimeAttachmentHeader"><legend
class="mimeAttachmentHeaderName">0002-R600-SI-Move-SIInsertWaits-into-AMDGPUPassConfig-add.patch</legend></fieldset>
<br>
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">From a25b877f29865facce0aff53295356e336bd2b0a Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Fri, 28 Nov 2014 14:34:24 +0000
Subject: [PATCH 2/6] R600/SI: Move SIInsertWaits into
AMDGPUPassConfig::addPreSched2()
This pass needs to be run after PrologEpilogInserter, because
that pass may inserter spill code which reads or writes memory.
---
lib/Target/R600/AMDGPUTargetMachine.cpp | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)</pre>
</div>
</blockquote>
<br>
LGTM
<pre wrap=""><div class="moz-txt-sig">
</div></pre>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite"><br>
<fieldset class="mimeAttachmentHeader"><legend
class="mimeAttachmentHeaderName">0003-R600-SI-Set-MayStore-0-on-MUBUF-loads.patch</legend></fieldset>
<br>
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">From c0688d069d3de09a75241ec2f0f0cc605320f3f8 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Mon, 24 Nov 2014 20:56:28 +0000
Subject: [PATCH 3/6] R600/SI: Set MayStore = 0 on MUBUF loads
---
lib/Target/R600/SIInstrInfo.td | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 392c272..8cc84c9 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -1187,7 +1187,7 @@ multiclass MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass,
ValueType load_vt = i32,
SDPatternOperator ld = null_frag> {
- let lds = 0, mayLoad = 1 in {
+ let lds = 0, mayLoad = 1, mayStore = 0 in {
let addr64 = 0 in {
<div class="moz-txt-sig">--
2.0.4
</div></pre>
</div>
</blockquote>
LGTM. Shouldn't this be getting set in the base MUBUF instruction
class? Why do these need to be set again here?<br>
<br>
<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap=""><div class="moz-txt-sig">
</div></pre>
</div>
<br>
<fieldset class="mimeAttachmentHeader"><legend
class="mimeAttachmentHeaderName">0004-R600-SI-Spill-VGPRs-to-scratch-space-for-compute-sha.patch</legend></fieldset>
<br>
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">From 2688eabde67974f5720c9abc99777a97923c5e8e Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Thu, 27 Nov 2014 21:13:48 +0000
Subject: [PATCH 4/6] R600/SI: Spill VGPRs to scratch space for compute shaders
---
lib/Target/R600/AMDGPU.h | 1 +
lib/Target/R600/AMDGPUTargetMachine.cpp | 1 +
lib/Target/R600/CMakeLists.txt | 1 +
lib/Target/R600/SIInstrInfo.cpp | 33 ++++-
lib/Target/R600/SIInstrInfo.td | 2 +
lib/Target/R600/SIInstructions.td | 10 +-
lib/Target/R600/SIMachineFunctionInfo.cpp | 3 +-
lib/Target/R600/SIMachineFunctionInfo.h | 3 +
lib/Target/R600/SIPrepareScratchRegs.cpp | 198 ++++++++++++++++++++++++++++++
lib/Target/R600/SIRegisterInfo.cpp | 156 +++++++++++++----------
lib/Target/R600/SIRegisterInfo.h | 9 +-
11 files changed, 344 insertions(+), 73 deletions(-)
create mode 100644 lib/Target/R600/SIPrepareScratchRegs.cpp
diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
index 13379e7..6819808 100644
--- a/lib/Target/R600/AMDGPU.h
+++ b/lib/Target/R600/AMDGPU.h
@@ -47,6 +47,7 @@ FunctionPass *createSIFixSGPRCopiesPass(TargetMachine &tm);
FunctionPass *createSIFixSGPRLiveRangesPass();
FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
FunctionPass *createSIInsertWaits(TargetMachine &tm);
+FunctionPass *createSIPrepareScratchRegs();
void initializeSIFoldOperandsPass(PassRegistry &);
extern char &SIFoldOperandsID;
diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
index d4ee738..c035af0 100644
--- a/lib/Target/R600/AMDGPUTargetMachine.cpp
+++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
@@ -190,6 +190,7 @@ bool AMDGPUPassConfig::addPostRegAlloc() {
const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
if (ST.getGeneration() > AMDGPUSubtarget::NORTHERN_ISLANDS) {
+ addPass(createSIPrepareScratchRegs());
addPass(createSIShrinkInstructionsPass());
}
return false;
diff --git a/lib/Target/R600/CMakeLists.txt b/lib/Target/R600/CMakeLists.txt
index 3b703e7..5a4bae2 100644
--- a/lib/Target/R600/CMakeLists.txt
+++ b/lib/Target/R600/CMakeLists.txt
@@ -51,6 +51,7 @@ add_llvm_target(R600CodeGen
SILowerControlFlow.cpp
SILowerI1Copies.cpp
SIMachineFunctionInfo.cpp
+ SIPrepareScratchRegs.cpp
SIRegisterInfo.cpp
SIShrinkInstructions.cpp
SITypeRewriter.cpp
diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
index 1a0010c..acdb0fa 100644
--- a/lib/Target/R600/SIInstrInfo.cpp
+++ b/lib/Target/R600/SIInstrInfo.cpp
@@ -426,8 +426,7 @@ static bool shouldTryToSpillVGPRs(MachineFunction *MF) {
// FIXME: Even though it can cause problems, we need to enable
// spilling at -O0, since the fast register allocator always
// spills registers that are live at the end of blocks.
- return MFI->getShaderType() == ShaderType::COMPUTE &&
- TM.getOptLevel() == CodeGenOpt::None;
+ return MFI->getShaderType() == ShaderType::COMPUTE;
</pre>
</div>
</blockquote>
I still don't think conditionally enabling spilling makes any sense.
It just changes how it breaks for non-compute shaders.<br>
<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
}
@@ -438,7 +437,9 @@ void SIInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const {
MachineFunction *MF = MBB.getParent();
+ SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
MachineFrameInfo *FrameInfo = MF->getFrameInfo();
+ MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
DebugLoc DL = MBB.findDebugLoc(MI);
int Opcode = -1;
@@ -454,6 +455,19 @@ void SIInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
case 512: Opcode = AMDGPU::SI_SPILL_S512_SAVE; break;
}
} else if(shouldTryToSpillVGPRs(MF) && RI.hasVGPRs(RC)) {
+ MFI->setHasSpilledVGPRs();
+#if 0
+ unsigned ScratchPtr =
+ RI.getPreloadedValue(*MF, SIRegisterInfo::SCRATCH_PTR);
+ unsigned ScratchOffset =
+ RI.getPreloadedValue(*MF, SIRegisterInfo::SCRATCH_WAVE_OFFSET);
+ if (!MRI.isLiveIn(ScratchPtr))
+ MRI.addLiveIn(ScratchPtr);
+
+ if (!MRI.isLiveIn(ScratchOffset))
+ MRI.addLiveIn(ScratchOffset);
+#endif
+</pre>
</div>
</blockquote>
Dead code<br>
<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
switch(RC->getSize() * 8) {
case 32: Opcode = AMDGPU::SI_SPILL_V32_SAVE; break;
case 64: Opcode = AMDGPU::SI_SPILL_V64_SAVE; break;
@@ -468,7 +482,11 @@ void SIInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
FrameInfo->setObjectAlignment(FrameIndex, 4);
BuildMI(MBB, MI, DL, get(Opcode))
.addReg(SrcReg)
- .addFrameIndex(FrameIndex);
+ .addFrameIndex(FrameIndex)
+ // Place-holder registers, these will be filled in by
+ // SIPrepareScratchRegs.</pre>
</div>
</blockquote>
Why do you need to reserve physical registers for this? Can you
define virtual registers with IMPLICIT_DEF, or virtual registers
with<br>
a spill register class for the pseudo instructions?<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ .addReg(AMDGPU::SGPR0_SGPR1, RegState::Undef)
+ .addReg(AMDGPU::SGPR0, RegState::Undef);
} else {
LLVMContext &Ctx = MF->getFunction()->getContext();
Ctx.emitError("SIInstrInfo::storeRegToStackSlot - Do not know how to"
@@ -510,7 +528,12 @@ void SIInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
if (Opcode != -1) {
FrameInfo->setObjectAlignment(FrameIndex, 4);
BuildMI(MBB, MI, DL, get(Opcode), DestReg)
- .addFrameIndex(FrameIndex);
+ .addFrameIndex(FrameIndex)
+ // Place-holder registers, these will be filled in by
+ // SIPrepareScratchRegs.
+ .addReg(AMDGPU::SGPR0_SGPR1, RegState::Undef)
+ .addReg(AMDGPU::SGPR0, RegState::Undef);
+
} else {
LLVMContext &Ctx = MF->getFunction()->getContext();
Ctx.emitError("SIInstrInfo::loadRegFromStackSlot - Do not know how to"
@@ -541,7 +564,7 @@ unsigned SIInstrInfo::calculateLDSSpillAddress(MachineBasicBlock &MBB,
MachineBasicBlock::iterator Insert = Entry.front();
DebugLoc DL = Insert->getDebugLoc();
- TIDReg = RI.findUnusedVGPR(MF->getRegInfo());
+ TIDReg = RI.findUnusedRegister(MF->getRegInfo(), &AMDGPU::VGPR_32RegClass);
if (TIDReg == AMDGPU::NoRegister)
return TIDReg;
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 8cc84c9..afdc200 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -1240,6 +1240,7 @@ multiclass MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass,
multiclass MUBUF_Store_Helper <bits<7> op, string name, RegisterClass vdataClass,
ValueType store_vt, SDPatternOperator st> {
+ let mayLoad = 0, mayStore = 1 in {
let addr64 = 0, lds = 0 in {
def "" : MUBUF <
@@ -1298,6 +1299,7 @@ multiclass MUBUF_Store_Helper <bits<7> op, string name, RegisterClass vdataClass
let tfe = 0;
let soffset = 128; // ZERO
}
+ } // End mayLoad = 0, mayStore = 1
}
class FLAT_Load_Helper <bits<7> op, string asm, RegisterClass regClass> :
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 00ce9bf..3a969e7 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1856,13 +1856,14 @@ multiclass SI_SPILL_SGPR <RegisterClass sgpr_class> {
def _SAVE : InstSI <
(outs),
- (ins sgpr_class:$src, i32imm:$frame_idx),
+ (ins sgpr_class:$src, i32imm:$frame_idx, SReg_64:$scratch_ptr,
+ SReg_32:$scratch_offset),
"", []
>;
def _RESTORE : InstSI <
(outs sgpr_class:$dst),
- (ins i32imm:$frame_idx),
+ (ins i32imm:$frame_idx, SReg_64:$scratch_ptr, SReg_32:$scratch_offset),
"", []
>;
@@ -1877,13 +1878,14 @@ defm SI_SPILL_S512 : SI_SPILL_SGPR <SReg_512>;
multiclass SI_SPILL_VGPR <RegisterClass vgpr_class> {
def _SAVE : InstSI <
(outs),
- (ins vgpr_class:$src, i32imm:$frame_idx),
+ (ins vgpr_class:$src, i32imm:$frame_idx, SReg_64:$scratch_ptr,
+ SReg_32:$scratch_offset),
"", []
>;
def _RESTORE : InstSI <
(outs vgpr_class:$dst),
- (ins i32imm:$frame_idx),
+ (ins i32imm:$frame_idx, SReg_64:$scratch_ptr, SReg_32:$scratch_offset),
"", []
>;
}
diff --git a/lib/Target/R600/SIMachineFunctionInfo.cpp b/lib/Target/R600/SIMachineFunctionInfo.cpp
index d58f31d..198dd56 100644
--- a/lib/Target/R600/SIMachineFunctionInfo.cpp
+++ b/lib/Target/R600/SIMachineFunctionInfo.cpp
@@ -29,6 +29,7 @@ void SIMachineFunctionInfo::anchor() {}
SIMachineFunctionInfo::SIMachineFunctionInfo(const MachineFunction &MF)
: AMDGPUMachineFunction(MF),
TIDReg(AMDGPU::NoRegister),
+ HasSpilledVGPRs(false),
PSInputAddr(0),
NumUserSGPRs(0),
LDSWaveSpillSize(0) { }
@@ -50,7 +51,7 @@ SIMachineFunctionInfo::SpilledReg SIMachineFunctionInfo::getSpilledReg(
struct SpilledReg Spill;
if (!LaneVGPRs.count(LaneVGPRIdx)) {
- unsigned LaneVGPR = TRI->findUnusedVGPR(MRI);
+ unsigned LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass);
LaneVGPRs[LaneVGPRIdx] = LaneVGPR;
MRI.setPhysRegUsed(LaneVGPR);
diff --git a/lib/Target/R600/SIMachineFunctionInfo.h b/lib/Target/R600/SIMachineFunctionInfo.h
index 6bb8f9d..7185271 100644
--- a/lib/Target/R600/SIMachineFunctionInfo.h
+++ b/lib/Target/R600/SIMachineFunctionInfo.h
@@ -29,6 +29,7 @@ class SIMachineFunctionInfo : public AMDGPUMachineFunction {
void anchor() override;
unsigned TIDReg;
+ bool HasSpilledVGPRs;
public:
@@ -52,6 +53,8 @@ public:
bool hasCalculatedTID() const { return TIDReg != AMDGPU::NoRegister; };
unsigned getTIDReg() const { return TIDReg; };
void setTIDReg(unsigned Reg) { TIDReg = Reg; }
+ bool hasSpilledVGPRs() const { return HasSpilledVGPRs; }
+ void setHasSpilledVGPRs(bool Spill = true) { HasSpilledVGPRs = Spill; }
unsigned getMaximumWorkGroupSize(const MachineFunction &MF) const;
};
diff --git a/lib/Target/R600/SIPrepareScratchRegs.cpp b/lib/Target/R600/SIPrepareScratchRegs.cpp
new file mode 100644
index 0000000..32010f0
--- /dev/null
+++ b/lib/Target/R600/SIPrepareScratchRegs.cpp
@@ -0,0 +1,198 @@
+//===-- SIPrepareScratchRegs.cpp - Use predicates for control flow --------===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+///
+/// This pass loads scratch pointer and scratch offset into a register or a
+/// frame index which can be used anywhere in the program. These values will
+/// be used for spilling VGPRs.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "AMDGPUSubtarget.h"
+#include "SIDefines.h"
+#include "SIInstrInfo.h"
+#include "SIMachineFunctionInfo.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/RegisterScavenging.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/LLVMContext.h"
+
+using namespace llvm;
+
+namespace {
+
+class SIPrepareScratchRegs : public MachineFunctionPass {
+
+private:
+ static char ID;
+
+public:
+ SIPrepareScratchRegs() : MachineFunctionPass(ID) { }
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ const char *getPassName() const override {
+ return "SI prepare scratch registers";
+ }
+
+};
+
+} // End anonymous namespace
+
+char SIPrepareScratchRegs::ID = 0;
+
+FunctionPass *llvm::createSIPrepareScratchRegs() {
+ return new SIPrepareScratchRegs();
+}
+
+// FIXME: Insert waits listed in Table 4.2 "Required User-Inserted Wait States"
+// around other non-memory instructions.</pre>
</div>
</blockquote>
Comment copied from other pass?<br>
<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+bool SIPrepareScratchRegs::runOnMachineFunction(MachineFunction &MF) {
+ SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+ const SIInstrInfo *TII =
+ static_cast<const SIInstrInfo *>(MF.getSubtarget().getInstrInfo());
+ const SIRegisterInfo *TRI = &TII->getRegisterInfo();
+ MachineRegisterInfo &MRI = MF.getRegInfo();
+ MachineFrameInfo *FrameInfo = MF.getFrameInfo();
+ MachineBasicBlock *Entry = MF.begin();
+ MachineBasicBlock::iterator I = Entry->begin();
+ DebugLoc DL = I->getDebugLoc();
+
+ // FIXME: If we don't have enough VGPRs for SGPR spilling we will need to do
+ // run this pass.</pre>
</div>
</blockquote>
Grammar: "we will need to do"<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ if (!MFI->hasSpilledVGPRs())
+ return false;
+
+ unsigned ScratchPtrPreloadReg =
+ TRI->getPreloadedValue(MF, SIRegisterInfo::SCRATCH_PTR);
+ unsigned ScratchOffsetPreloadReg =
+ TRI->getPreloadedValue(MF, SIRegisterInfo::SCRATCH_WAVE_OFFSET);
+
+ if (!Entry->isLiveIn(ScratchPtrPreloadReg))
+ Entry->addLiveIn(ScratchPtrPreloadReg);
+
+ if (!Entry->isLiveIn(ScratchOffsetPreloadReg))
+ Entry->addLiveIn(ScratchOffsetPreloadReg);
+
+ // Load the scratch pointer
+ unsigned ScratchPtrReg =
+ TRI->findUnusedRegister(MRI, &AMDGPU::SGPR_64RegClass);
+ int ScratchPtrFI = ~0;</pre>
</div>
</blockquote>
Initialize to -1 since it's signed<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+
+ if (ScratchPtrReg != AMDGPU::NoRegister) {
+ // Found a SGPR to use.</pre>
</div>
</blockquote>
Grammar: an SGPR<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ MRI.setPhysRegUsed(ScratchPtrReg);
+ BuildMI(*Entry, I, DL, TII->get(AMDGPU::S_MOV_B64), ScratchPtrReg)
+ .addReg(ScratchPtrPreloadReg);
+ } else {
+ // No SGPR is available, we must spill.
+ ScratchPtrFI = FrameInfo->CreateSpillStackObject(8, 4);
+ BuildMI(*Entry, I, DL, TII->get(AMDGPU::SI_SPILL_S64_SAVE))
+ .addReg(ScratchPtrPreloadReg)
+ .addFrameIndex(ScratchPtrFI);
+ }
+
+ // load the scratch offset</pre>
</div>
</blockquote>
Capitalize / period comment<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ unsigned ScratchOffsetReg =
+ TRI->findUnusedRegister(MRI, &AMDGPU::SGPR_32RegClass);
+ int ScratchOffsetFI = ~0;
+
+ if (ScratchOffsetReg != AMDGPU::NoRegister) {
+ // Found an SGPR to use
+ MRI.setPhysRegUsed(ScratchOffsetReg);
+ BuildMI(*Entry, I, DL, TII->get(AMDGPU::S_MOV_B32), ScratchOffsetReg)
+ .addReg(ScratchOffsetPreloadReg);
+ } else {
+ // No SGPR is available, we must spill.
+ ScratchOffsetFI = FrameInfo->CreateSpillStackObject(4,4);
+ BuildMI(*Entry, I, DL, TII->get(AMDGPU::SI_SPILL_S32_SAVE))
+ .addReg(ScratchOffsetPreloadReg)
+ .addFrameIndex(ScratchOffsetFI);
+ }
+
+
+ // Now that we have the scratch pointer and offset values, we need to
+ // add them to all the SI_SPILL_V* instructions.
+
+ RegScavenger RS;
+ bool UseRegScavenger =
+ (ScratchPtrReg == AMDGPU::NoRegister ||
+ ScratchOffsetReg == AMDGPU::NoRegister);
+ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
+ BI != BE; ++BI) {
+
+ MachineBasicBlock &MBB = *BI;
+ if (UseRegScavenger)
+ RS.enterBasicBlock(&MBB);
+
+ for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
+ I != E; ++I) {
+ MachineInstr &MI = *I;
+ DebugLoc DL = MI.getDebugLoc();
+ switch(MI.getOpcode()) {
+ default: break;;
+ case AMDGPU::SI_SPILL_V512_SAVE:
+ case AMDGPU::SI_SPILL_V256_SAVE:
+ case AMDGPU::SI_SPILL_V128_SAVE:
+ case AMDGPU::SI_SPILL_V96_SAVE:
+ case AMDGPU::SI_SPILL_V64_SAVE:
+ case AMDGPU::SI_SPILL_V32_SAVE:
+ case AMDGPU::SI_SPILL_V32_RESTORE:
+ case AMDGPU::SI_SPILL_V64_RESTORE:
+ case AMDGPU::SI_SPILL_V128_RESTORE:
+ case AMDGPU::SI_SPILL_V256_RESTORE:
+ case AMDGPU::SI_SPILL_V512_RESTORE:
+
+ // Scratch Pointer
+ if (ScratchPtrReg == AMDGPU::NoRegister) {
+ ScratchPtrReg = RS.scavengeRegister(&AMDGPU::SGPR_64RegClass, 0);
+ BuildMI(MBB, I, DL, TII->get(AMDGPU::SI_SPILL_S64_RESTORE),
+ ScratchPtrReg)
+ .addFrameIndex(ScratchPtrFI)
+ .addReg(AMDGPU::NoRegister)
+ .addReg(AMDGPU::NoRegister);
+ } else if (!MBB.isLiveIn(ScratchPtrReg)) {
+ MBB.addLiveIn(ScratchPtrReg);
+ }
+
+ if (ScratchOffsetReg == AMDGPU::NoRegister) {
+ ScratchOffsetReg = RS.scavengeRegister(&AMDGPU::SGPR_32RegClass, 0);
+ BuildMI(MBB, I, DL, TII->get(AMDGPU::SI_SPILL_S32_RESTORE),
+ ScratchOffsetReg)
+ .addFrameIndex(ScratchOffsetFI)
+ .addReg(AMDGPU::NoRegister)
+ .addReg(AMDGPU::NoRegister);
+ } else if (!MBB.isLiveIn(ScratchOffsetReg)) {
+ MBB.addLiveIn(ScratchOffsetReg);
+ }
+
+ if (ScratchPtrReg == AMDGPU::NoRegister ||
+ ScratchOffsetReg == AMDGPU::NoRegister) {
+ LLVMContext &Ctx = MF.getFunction()->getContext();
+ Ctx.emitError("Ran out of SGPRs for spilling VGPRs");
+ ScratchPtrReg = AMDGPU::SGPR0;
+ ScratchOffsetReg = AMDGPU::SGPR0;
+ }
+ MI.getOperand(2).setReg(ScratchPtrReg);
+ MI.getOperand(3).setReg(ScratchOffsetReg);
+
+ break;
+ }
+ if (UseRegScavenger)
+ RS.forward();
+ }
+ }
+ return true;
+}
diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
index cffea12..27abe9a 100644
--- a/lib/Target/R600/SIRegisterInfo.cpp
+++ b/lib/Target/R600/SIRegisterInfo.cpp
@@ -23,6 +23,7 @@
#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"
+#include "llvm/Support/Debug.h"
using namespace llvm;
SIRegisterInfo::SIRegisterInfo(const AMDGPUSubtarget &st)
@@ -92,6 +93,84 @@ static unsigned getNumSubRegsForSpillOp(unsigned Op) {
}
}
+void SIRegisterInfo::buildScratchLoadStore(MachineBasicBlock::iterator MI,
+ unsigned LoadStoreOp,
+ unsigned Value,
+ unsigned ScratchPtr,
+ unsigned ScratchOffset,
+ int64_t Offset,
+ RegScavenger *RS) const {
+
+ const SIInstrInfo <b class="moz-txt-star"><span class="moz-txt-tag">*</span>TII = static_cast<const SIInstrInfo<span class="moz-txt-tag">*</span></b>>(ST.getInstrInfo());
+ MachineBasicBlock *MBB = MI->getParent();
+ const MachineFunction *MF = MI->getParent()->getParent();
+ LLVMContext &Ctx = MF->getFunction()->getContext();
+ DebugLoc DL = MI->getDebugLoc();
+ bool IsLoad = TII->get(LoadStoreOp).mayLoad();
+
+ bool RanOutOfSGPRs = false;
+ unsigned SOffset = ScratchOffset;
+
+ unsigned RsrcReg = RS->scavengeRegister(&AMDGPU::SReg_128RegClass, MI, 0);
+ if (RsrcReg == AMDGPU::NoRegister) {
+ RanOutOfSGPRs = true;
+ RsrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;
+ }
+
+ unsigned NumSubRegs = getNumSubRegsForSpillOp(MI->getOpcode());
+ unsigned Size = NumSubRegs * 4;
+
+ uint64_t Rsrc = AMDGPU::RSRC_DATA_FORMAT | AMDGPU::RSRC_TID_ENABLE |
+ 0xffffffff; // Size
+
+ BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B64),
+ getSubReg(RsrcReg, AMDGPU::sub0_sub1))
+ .addReg(ScratchPtr)
+ .addReg(RsrcReg, RegState::ImplicitDefine);
+
+ BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32),
+ getSubReg(RsrcReg, AMDGPU::sub2))
+ .addImm(Rsrc & 0xffffffff)
+ .addReg(RsrcReg, RegState::ImplicitDefine);
+
+ BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32),
+ getSubReg(RsrcReg, AMDGPU::sub3))
+ .addImm(Rsrc >> 32)
+ .addReg(RsrcReg, RegState::ImplicitDefine);
+
+ if (!isUInt<12>(Offset + Size)) {
+ SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0);
+ if (SOffset == AMDGPU::NoRegister) {
+ RanOutOfSGPRs = true;
+ SOffset = AMDGPU::SGPR0;
+ }
+ BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)
+ .addReg(ScratchOffset)
+ .addImm(Offset);
+ Offset = 0;
+ }
+
+ if (RanOutOfSGPRs)
+ Ctx.emitError("Ran out of SGPRs for spilling VGPRS");</pre>
</div>
</blockquote>
Errors should be lowercased<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+
+ for (unsigned i = 0, e = NumSubRegs; i != e; ++i, Offset += 4) {
+ unsigned SubReg = NumSubRegs > 1 ?
+ getPhysRegSubReg(Value, &AMDGPU::VGPR_32RegClass, i) :
+ Value;
+ bool IsKill = (i == e - 1);
+
+ BuildMI(*MBB, MI, DL, TII->get(LoadStoreOp))
+ .addReg(SubReg, getDefRegState(IsLoad))
+ .addReg(RsrcReg, getKillRegState(IsKill))
+ .addImm(Offset)
+ .addReg(SOffset, getKillRegState(IsKill))
+ .addImm(0) // glc
+ .addImm(0) // slc
+ .addImm(0) // tfe
+ .addReg(Value, RegState::Implicit | getDefRegState(IsLoad));
+ }
+}
+
void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
int SPAdj, unsigned FIOperandNum,
RegScavenger *RS) const {
@@ -160,7 +239,8 @@ void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_READLANE_B32), SubReg)
.addReg(Spill.VGPR)
- .addImm(Spill.Lane);
+ .addImm(Spill.Lane)
+ .addReg(MI->getOperand(0).getReg(), RegState::ImplicitDefine);
if (isM0) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0)
.addReg(SubReg);
@@ -177,71 +257,24 @@ void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
case AMDGPU::SI_SPILL_V128_SAVE:
case AMDGPU::SI_SPILL_V96_SAVE:
case AMDGPU::SI_SPILL_V64_SAVE:
- case AMDGPU::SI_SPILL_V32_SAVE: {
- unsigned NumSubRegs = getNumSubRegsForSpillOp(MI->getOpcode());
- unsigned SrcReg = MI->getOperand(0).getReg();
- int64_t Offset = FrameInfo->getObjectOffset(Index);
- unsigned Size = NumSubRegs * 4;
- unsigned TmpReg = RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);
-
- for (unsigned i = 0, e = NumSubRegs; i != e; ++i) {
- unsigned SubReg = NumSubRegs > 1 ?
- getPhysRegSubReg(SrcReg, &AMDGPU::VGPR_32RegClass, i) :
- SrcReg;
- Offset += (i * 4);
- MFI->LDSWaveSpillSize = std::max((unsigned)Offset + 4, (unsigned)MFI->LDSWaveSpillSize);
-
- unsigned AddrReg = TII->calculateLDSSpillAddress(*MBB, MI, RS, TmpReg,
- Offset, Size);
-
- if (AddrReg == AMDGPU::NoRegister) {
- LLVMContext &Ctx = MF->getFunction()->getContext();
- Ctx.emitError("Ran out of VGPRs for spilling VGPRS");
- AddrReg = AMDGPU::VGPR0;
- }
-
- // Store the value in LDS
- BuildMI(*MBB, MI, DL, TII->get(AMDGPU::DS_WRITE_B32))
- .addImm(0) // gds
- .addReg(AddrReg, RegState::Kill) // addr
- .addReg(SubReg) // data0
- .addImm(0); // offset
- }
-
+ case AMDGPU::SI_SPILL_V32_SAVE:
+ buildScratchLoadStore(MI, AMDGPU::BUFFER_STORE_DWORD_OFFSET,
+ MI->getOperand(0).getReg(),
+ MI->getOperand(2).getReg(),
+ MI->getOperand(3).getReg(),
+ FrameInfo->getObjectOffset(Index), RS);
MI->eraseFromParent();
break;
- }
case AMDGPU::SI_SPILL_V32_RESTORE:
case AMDGPU::SI_SPILL_V64_RESTORE:
case AMDGPU::SI_SPILL_V128_RESTORE:
case AMDGPU::SI_SPILL_V256_RESTORE:
case AMDGPU::SI_SPILL_V512_RESTORE: {
- unsigned NumSubRegs = getNumSubRegsForSpillOp(MI->getOpcode());
- unsigned DstReg = MI->getOperand(0).getReg();
- int64_t Offset = FrameInfo->getObjectOffset(Index);
- unsigned Size = NumSubRegs * 4;
- unsigned TmpReg = RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);
-
- // FIXME: We could use DS_READ_B64 here to optimize for larger registers.
- for (unsigned i = 0, e = NumSubRegs; i != e; ++i) {
- unsigned SubReg = NumSubRegs > 1 ?
- getPhysRegSubReg(DstReg, &AMDGPU::VGPR_32RegClass, i) :
- DstReg;
-
- Offset += (i * 4);
- unsigned AddrReg = TII->calculateLDSSpillAddress(*MBB, MI, RS, TmpReg,
- Offset, Size);
- if (AddrReg == AMDGPU::NoRegister) {
- LLVMContext &Ctx = MF->getFunction()->getContext();
- Ctx.emitError("Ran out of VGPRs for spilling VGPRs");
- AddrReg = AMDGPU::VGPR0;
- }
-
- BuildMI(*MBB, MI, DL, TII->get(AMDGPU::DS_READ_B32), SubReg)
- .addImm(0) // gds
- .addReg(AddrReg, RegState::Kill) // addr
- .addImm(0); //offset
- }
+ buildScratchLoadStore(MI, AMDGPU::BUFFER_LOAD_DWORD_OFFSET,
+ MI->getOperand(0).getReg(),
+ MI->getOperand(2).getReg(),
+ MI->getOperand(3).getReg(),
+ FrameInfo->getObjectOffset(Index), RS);</pre>
</div>
</blockquote>
Use named operands?<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
MI->eraseFromParent();
break;
}
@@ -452,9 +485,8 @@ unsigned SIRegisterInfo::getPreloadedValue(const MachineFunction &MF,
/// \brief Returns a register that is not used at any point in the function.
/// If all registers are used, then this function will return
// AMDGPU::NoRegister.
-unsigned SIRegisterInfo::findUnusedVGPR(const MachineRegisterInfo &MRI) const {
-
- const TargetRegisterClass *RC = &AMDGPU::VGPR_32RegClass;
+unsigned SIRegisterInfo::findUnusedRegister(const MachineRegisterInfo &MRI,
+ const TargetRegisterClass *RC) const {
for (TargetRegisterClass::iterator I = RC->begin(), E = RC->end();
I != E; ++I) {
diff --git a/lib/Target/R600/SIRegisterInfo.h b/lib/Target/R600/SIRegisterInfo.h
index c7e54db..f1d78b4 100644
--- a/lib/Target/R600/SIRegisterInfo.h
+++ b/lib/Target/R600/SIRegisterInfo.h
@@ -113,7 +113,14 @@ struct SIRegisterInfo : public AMDGPURegisterInfo {
unsigned getPreloadedValue(const MachineFunction &MF,
enum PreloadedValue Value) const;
- unsigned findUnusedVGPR(const MachineRegisterInfo &MRI) const;
+ unsigned findUnusedRegister(const MachineRegisterInfo &MRI,
+ const TargetRegisterClass *RC) const;
+
+private:
+ void buildScratchLoadStore(MachineBasicBlock::iterator MI,
+ unsigned LoadStoreOp, unsigned Value,
+ unsigned ScratchPtr, unsigned ScratchOffset,
+ int64_t Offset, RegScavenger *RS) const;
};
} // End namespace llvm
<div class="moz-txt-sig">--
2.0.4
</div></pre>
</div>
<br>
<fieldset class="mimeAttachmentHeader"><legend
class="mimeAttachmentHeaderName">0005-MISched-Fix-moving-stores-across-barriers.patch</legend></fieldset>
<br>
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">From c151bc73cdf7e1f40ec90c2d8dbc93ae34673890 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Tue, 2 Dec 2014 17:26:08 +0000
Subject: [PATCH 5/6] MISched: Fix moving stores across barriers
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as a predecessor to the current BarrierChain.
---
lib/CodeGen/ScheduleDAGInstrs.cpp | 9 +++++--
test/CodeGen/R600/store-barrier.ll | 52 ++++++++++++++++++++++++++++++++++++++
2 files changed, 59 insertions(+), 2 deletions(-)
create mode 100644 test/CodeGen/R600/store-barrier.ll
diff --git a/lib/CodeGen/ScheduleDAGInstrs.cpp b/lib/CodeGen/ScheduleDAGInstrs.cpp
index d8d8422..ee8b5c2 100644
--- a/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -794,6 +794,7 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
for (MachineBasicBlock::iterator MII = RegionEnd, MIE = RegionBegin;
MII != MIE; --MII) {
MachineInstr *MI = std::prev(MII);
+
if (MI && DbgMI) {
DbgValues.push_back(std::make_pair(DbgMI, MI));
DbgMI = nullptr;
@@ -920,6 +921,12 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
AliasMemDefs.clear();
AliasMemUses.clear();
} else if (MI->mayStore()) {
+ // Add dependence on barrier chain, if needed.
+ // There is no point to check aliasing on barrier event. Even if
+ // SU and barrier <span class="moz-txt-underscore"><span class="moz-txt-tag">_</span>could<span class="moz-txt-tag">_</span></span> be reordered, they should not. In addition,
+ // we have lost all RejectMemNodes below barrier.
+ if (BarrierChain)
+ BarrierChain->addPred(SDep(SU, SDep::Barrier));
UnderlyingObjectsVector Objs;
getUnderlyingObjectsForInstr(MI, MFI, Objs);
@@ -993,8 +1000,6 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
// There is no point to check aliasing on barrier event. Even if
// SU and barrier <span class="moz-txt-underscore"><span class="moz-txt-tag">_</span>could<span class="moz-txt-tag">_</span></span> be reordered, they should not. In addition,
// we have lost all RejectMemNodes below barrier.
- if (BarrierChain)
- BarrierChain->addPred(SDep(SU, SDep::Barrier));
} else if (MI->mayLoad()) {
bool MayAlias = true;
if (MI->isInvariantLoad(AA)) {
diff --git a/test/CodeGen/R600/store-barrier.ll b/test/CodeGen/R600/store-barrier.ll
new file mode 100644
index 0000000..229cd8f
--- /dev/null
+++ b/test/CodeGen/R600/store-barrier.ll
@@ -0,0 +1,52 @@
+; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs -mattr=+load-store-opt -enable-misched < %s | FileCheck --check-prefix=CHECK %s
+; RUN: llc -march=r600 -mcpu=bonaire -verify-machineinstrs -mattr=+load-store-opt -enable-misched < %s | FileCheck --check-prefix=CHECK %s
+
+; This test is for a bug in the machine scheduler where stores without
+; an underlying object would be moved across the barrier. In this
+; test, the <2 x i8> store will be split into two i8 stores, so they
+; won't have an underlying object.
+
+; CHECK-LABEL: {{^}}test:
+; CHECK: ds_write_b8
+; CHECK: ds_write_b8
+; CHECK: s_barrier</pre>
</div>
</blockquote>
Should probably check something after the barrier also<br>
<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+; Function Attrs: nounwind
+define void @test(<2 x i8> addrspace(3)* nocapture %arg, <2 x i8> addrspace(1)* nocapture readonly %arg1, i32 addrspace(1)* nocapture readonly %arg2, <2 x i8> addrspace(1)* nocapture %arg3, i32 %arg4, i64 %tmp9) #0 {
+bb:
+ %tmp10 = getelementptr inbounds i32 addrspace(1)* %arg2, i64 %tmp9
+ %tmp13 = load i32 addrspace(1)* %tmp10, align 2
+ %tmp14 = getelementptr inbounds <2 x i8> addrspace(3)* %arg, i32 %tmp13
+ %tmp15 = load <2 x i8> addrspace(3)* %tmp14, align 2
+ %tmp16 = add i32 %tmp13, 1
+ %tmp17 = getelementptr inbounds <2 x i8> addrspace(3)* %arg, i32 %tmp16
+ store <2 x i8> %tmp15, <2 x i8> addrspace(3)* %tmp17, align 2
+ tail call void @llvm.AMDGPU.barrier.local() #2
+ %tmp25 = load i32 addrspace(1)* %tmp10, align 4
+ %tmp26 = sext i32 %tmp25 to i64
+ %tmp27 = sext i32 %arg4 to i64
+ %tmp28 = getelementptr inbounds <2 x i8> addrspace(3)* %arg, i32 %tmp25, i32 %arg4
+ %tmp29 = load i8 addrspace(3)* %tmp28, align 1
+ %tmp30 = getelementptr inbounds <2 x i8> addrspace(1)* %arg3, i64 %tmp26, i64 %tmp27
+ store i8 %tmp29, i8 addrspace(1)* %tmp30, align 1
+ %tmp32 = getelementptr inbounds <2 x i8> addrspace(3)* %arg, i32 %tmp25, i32 0
+ %tmp33 = load i8 addrspace(3)* %tmp32, align 1
+ %tmp35 = getelementptr inbounds <2 x i8> addrspace(1)* %arg3, i64 %tmp26, i64 0
+ store i8 %tmp33, i8 addrspace(1)* %tmp35, align 1
+ ret void
+}
+
+; Function Attrs: noduplicate nounwind
+declare void @llvm.AMDGPU.barrier.local() #2
+
+attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { nounwind readnone }
+attributes #2 = { noduplicate nounwind }
+
+!opencl.kernels = !{!0}
+
+!0 = metadata !{void (<2 x i8> addrspace(3)*, <2 x i8> addrspace(1)*, i32 addrspace(1)*, <2 x i8> addrspace(1)*, i32, i64)* @test}
+!3 = metadata !{metadata !4, metadata !4, i64 0}
+!4 = metadata !{metadata !"int", metadata !5, i64 0}
+!5 = metadata !{metadata !"omnipotent char", metadata !6, i64 0}
+!6 = metadata !{metadata !"Simple C/C++ TBAA"}
+!7 = metadata !{metadata !5, metadata !5, i64 0}
<div class="moz-txt-sig">--
2.0.4
</div></pre>
</div>
<br>
<fieldset class="mimeAttachmentHeader"><legend
class="mimeAttachmentHeaderName">0006-R600-SI-Define-a-schedule-model-and-enable-the-gener.patch</legend></fieldset>
<br>
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">From db8ef9632e44dc76668758fd83981057e3bcfac1 Mon Sep 17 00:00:00 2001
From: Tom Stellard <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:thomas.stellard@amd.com"><thomas.stellard@amd.com></a>
Date: Fri, 19 Jul 2013 11:50:00 -0700
Subject: [PATCH 6/6] R600/SI: Define a schedule model and enable the generic
machine scheduler
The schedule model is not complete yet, and could be improved.
---
lib/Target/R600/AMDGPUSubtarget.cpp | 14 ++++-
lib/Target/R600/AMDGPUSubtarget.h | 6 +-
lib/Target/R600/Processors.td | 24 ++++----
lib/Target/R600/SIInstrFormats.td | 10 +++-
lib/Target/R600/SIInstructions.td | 48 ++++++++++++++-
lib/Target/R600/SIRegisterInfo.cpp | 54 ++++++++++++++++-
lib/Target/R600/SIRegisterInfo.h | 12 +++-
lib/Target/R600/SISchedule.td | 76 +++++++++++++++++++++++-
test/CodeGen/R600/atomic_cmp_swap_local.ll | 6 +-
test/CodeGen/R600/ctpop.ll | 4 +-
test/CodeGen/R600/ds_read2st64.ll | 4 +-
test/CodeGen/R600/fceil64.ll | 4 +-
test/CodeGen/R600/ffloor.ll | 4 +-
test/CodeGen/R600/fmax3.ll | 6 +-
test/CodeGen/R600/fmin3.ll | 6 +-
test/CodeGen/R600/fneg-fabs.f64.ll | 2 +-
test/CodeGen/R600/ftrunc.f64.ll | 4 +-
test/CodeGen/R600/llvm.memcpy.ll | 34 +++++------
test/CodeGen/R600/local-atomics.ll | 4 +-
test/CodeGen/R600/local-atomics64.ll | 2 +-
test/CodeGen/R600/local-memory-two-objects.ll | 4 +-
test/CodeGen/R600/si-triv-disjoint-mem-access.ll | 2 +-
test/CodeGen/R600/smrd.ll | 10 ++--
test/CodeGen/R600/wait.ll | 5 +-
test/CodeGen/R600/zero_extend.ll | 2 +-
25 files changed, 271 insertions(+), 76 deletions(-)
diff --git a/lib/Target/R600/AMDGPUSubtarget.cpp b/lib/Target/R600/AMDGPUSubtarget.cpp
index 9d09a19..5a3785f 100644
--- a/lib/Target/R600/AMDGPUSubtarget.cpp
+++ b/lib/Target/R600/AMDGPUSubtarget.cpp
@@ -19,8 +19,7 @@
#include "SIInstrInfo.h"
#include "SIISelLowering.h"
#include "llvm/ADT/SmallString.h"
-
-#include "llvm/ADT/SmallString.h"
+#include "llvm/CodeGen/MachineScheduler.h"
using namespace llvm;
@@ -107,3 +106,14 @@ unsigned AMDGPUSubtarget::getStackEntrySize() const {
llvm_unreachable("Illegal wavefront size.");
}
}
+
+void AMDGPUSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
+ MachineInstr *begin,
+ MachineInstr *end,
+ unsigned NumRegionInstrs) const {
+ if (getGeneration() >= SOUTHERN_ISLANDS) {
+ Policy.ShouldTrackPressure = true;;</pre>
</div>
</blockquote>
Double ;<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ Policy.OnlyTopDown = false;
+ Policy.OnlyBottomUp = false;</pre>
</div>
</blockquote>
Is there a reason for selecting this? There should be a comment for
why the policy is what it is<br>
<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ }
+}
diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h
index f71d80a..3e44c66 100644
--- a/lib/Target/R600/AMDGPUSubtarget.h
+++ b/lib/Target/R600/AMDGPUSubtarget.h
@@ -199,9 +199,13 @@ public:
}
bool enableMachineScheduler() const override {
- return getGeneration() <= NORTHERN_ISLANDS;
+ return true;
}
+ void overrideSchedPolicy(MachineSchedPolicy &Policy,
+ MachineInstr *begin, MachineInstr *end,
+ unsigned NumRegionInstrs) const override;
+
// Helper functions to simplify if statements
bool isTargetELF() const {
return false;
diff --git a/lib/Target/R600/Processors.td b/lib/Target/R600/Processors.td
index ce17d7c..17422f9 100644
--- a/lib/Target/R600/Processors.td
+++ b/lib/Target/R600/Processors.td
@@ -83,28 +83,30 @@ def : Proc<"cayman", R600_VLIW4_Itin,
// Southern Islands
//===----------------------------------------------------------------------===//
-def : Proc<"SI", SI_Itin, [FeatureSouthernIslands]>;
+// FIXME: Which of these should use the half speed?</pre>
</div>
</blockquote>
I believe this can be different for different versions of tahiti
also. This should probably be a subtarget feature settable by the
driver.<br>
<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
-def : Proc<"tahiti", SI_Itin, [FeatureSouthernIslands]>;
+def : ProcessorModel<"SI", SIFullSpeedModel, [FeatureSouthernIslands]>;
-def : Proc<"pitcairn", SI_Itin, [FeatureSouthernIslands]>;
+def : ProcessorModel<"tahiti", SIFullSpeedModel, [FeatureSouthernIslands]>;
-def : Proc<"verde", SI_Itin, [FeatureSouthernIslands]>;
+def : ProcessorModel<"pitcairn", SIFullSpeedModel, [FeatureSouthernIslands]>;</pre>
</div>
</blockquote>
verde, bonaire and pitcairn are all 1/16th FP64<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
-def : Proc<"oland", SI_Itin, [FeatureSouthernIslands]>;
+def : ProcessorModel<"verde", SIFullSpeedModel, [FeatureSouthernIslands]>;
-def : Proc<"hainan", SI_Itin, [FeatureSouthernIslands]>;
+def : ProcessorModel<"oland", SIFullSpeedModel, [FeatureSouthernIslands]>;
+
+def : ProcessorModel<"hainan", SIFullSpeedModel, [FeatureSouthernIslands]>;
//===----------------------------------------------------------------------===//
// Sea Islands
//===----------------------------------------------------------------------===//
-def : Proc<"bonaire", SI_Itin, [FeatureSeaIslands]>;
+def : ProcessorModel<"bonaire", SIFullSpeedModel, [FeatureSeaIslands]>;
-def : Proc<"kabini", SI_Itin, [FeatureSeaIslands]>;
+def : ProcessorModel<"kabini", SIFullSpeedModel, [FeatureSeaIslands]>;
-def : Proc<"kaveri", SI_Itin, [FeatureSeaIslands]>;
+def : ProcessorModel<"kaveri", SIFullSpeedModel, [FeatureSeaIslands]>;
-def : Proc<"hawaii", SI_Itin, [FeatureSeaIslands]>;
+def : ProcessorModel<"hawaii", SIFullSpeedModel, [FeatureSeaIslands]>;</pre>
</div>
</blockquote>
Hawaii is 1/8th FP64, but 1/4th on the workstation versions<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
-def : Proc<"mullins", SI_Itin, [FeatureSeaIslands]>;
+def : ProcessorModel<"mullins", SIFullSpeedModel, [FeatureSeaIslands]>;
diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
index ee1a52b..4b688e0 100644
--- a/lib/Target/R600/SIInstrFormats.td
+++ b/lib/Target/R600/SIInstrFormats.td
@@ -46,6 +46,7 @@ class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
// Most instructions require adjustments after selection to satisfy
// operand requirements.
let hasPostISelHook = 1;
+ let SchedRW = [Write32Bit];
}
class Enc32 {
@@ -161,6 +162,8 @@ class SMRDe <bits<5> op, bits<1> imm> : Enc32 {
let Inst{31-27} = 0x18; //encoding
}
+let SchedRW = [WriteSALU] in {
+
class SOP1 <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
InstSI<outs, ins, asm, pattern>, SOP1e <op> {
@@ -216,6 +219,8 @@ class SOPP <bits<7> op, dag ins, string asm, list<dag> pattern = []> :
let UseNamedOperandTable = 1;
}
+} // let SchedRW = [WriteSALU]
+
class SMRD <dag outs, dag ins, string asm, list<dag> pattern> :
InstSI<outs, ins, asm, pattern> {
@@ -225,6 +230,7 @@ class SMRD <dag outs, dag ins, string asm, list<dag> pattern> :
let mayLoad = 1;
let hasSideEffects = 0;
let UseNamedOperandTable = 1;
+ let SchedRW = [WriteSMEM];
}
//===----------------------------------------------------------------------===//
@@ -547,6 +553,7 @@ class DS <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
let LGKM_CNT = 1;
let UseNamedOperandTable = 1;
let DisableEncoding = "$m0";
+ let SchedRW = [WriteLDS];
}
class MUBUF <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -558,6 +565,7 @@ class MUBUF <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
let hasSideEffects = 0;
let UseNamedOperandTable = 1;
+ let SchedRW = [WriteVMEM];
}
class MTBUF <dag outs, dag ins, string asm, list<dag> pattern> :
@@ -569,6 +577,7 @@ class MTBUF <dag outs, dag ins, string asm, list<dag> pattern> :
let neverHasSideEffects = 1;
let UseNamedOperandTable = 1;
+ let SchedRW = [WriteVMEM];
}
class FLAT <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -597,5 +606,4 @@ class MIMG <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
}
-
} // End Uses = [EXEC]
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 3a969e7..fd860e5 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1160,6 +1160,8 @@ defm V_MOV_B32 : VOP1Inst <vop1<0x1>, "v_mov_b32", VOP_I32_I32>;
let Uses = [EXEC] in {
+// FIXME: Specify SchedRW for READFIRSTLANE+B32</pre>
</div>
</blockquote>
Comment typo in instruction name +<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+
def V_READFIRSTLANE_B32 : VOP1 <
0x00000002,
(outs SReg_32:$vdst),
@@ -1170,6 +1172,8 @@ def V_READFIRSTLANE_B32 : VOP1 <
}
+let SchedRW = [WriteConversion] in {
+
defm V_CVT_I32_F64 : VOP1Inst <vop1<0x3>, "v_cvt_i32_f64",
VOP_I32_F64, fp_to_sint
>;
@@ -1223,6 +1227,8 @@ defm V_CVT_F64_U32 : VOP1Inst <vop1<0x16>, "v_cvt_f64_u32",
VOP_F64_I32, uint_to_fp
>;
+} // let SchedRW = [WriteConversion]
+
defm V_FRACT_F32 : VOP1Inst <vop1<0x20>, "v_fract_f32",
VOP_F32_F32, AMDGPUfract
>;
@@ -1241,6 +1247,9 @@ defm V_FLOOR_F32 : VOP1Inst <vop1<0x24>, "v_floor_f32",
defm V_EXP_F32 : VOP1Inst <vop1<0x25>, "v_exp_f32",
VOP_F32_F32, fexp2
>;
+
+let SchedRW = [WriteFloatTrans] in {</pre>
</div>
</blockquote>
I don't think WriteFloatTrans or some of these others are useful
ways to characterize the instructions. Are these standard
classifications the generic scheduler uses? More useful would be
something like QuarterRate32<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+
defm V_LOG_CLAMP_F32 : VOP1Inst <vop1<0x26>, "v_log_clamp_f32", VOP_F32_F32>;
defm V_LOG_F32 : VOP1Inst <vop1<0x27>, "v_log_f32",
VOP_F32_F32, flog2
@@ -1261,6 +1270,11 @@ defm V_RSQ_LEGACY_F32 : VOP1Inst <vop1<0x2d>, "v_rsq_legacy_f32",
defm V_RSQ_F32 : VOP1Inst <vop1<0x2e>, "v_rsq_f32",
VOP_F32_F32, AMDGPUrsq
>;
+
+} //let SchedRW = [WriteFloatTrans]
+
+let SchedRW = [WriteDouble] in {
+
defm V_RCP_F64 : VOP1Inst <vop1<0x2f>, "v_rcp_f64",
VOP_F64_F64, AMDGPUrcp
>;
@@ -1271,12 +1285,21 @@ defm V_RSQ_F64 : VOP1Inst <vop1<0x31>, "v_rsq_f64",
defm V_RSQ_CLAMP_F64 : VOP1Inst <vop1<0x32>, "v_rsq_clamp_f64",
VOP_F64_F64, AMDGPUrsq_clamped
>;
+
+} // let SchedRW = [WriteDouble];
+
defm V_SQRT_F32 : VOP1Inst <vop1<0x33>, "v_sqrt_f32",
VOP_F32_F32, fsqrt
>;
+
+let SchedRW = [WriteDouble] in {
+
defm V_SQRT_F64 : VOP1Inst <vop1<0x34>, "v_sqrt_f64",
VOP_F64_F64, fsqrt
>;
+
+} // let SchedRW = [WriteDouble]
+
defm V_SIN_F32 : VOP1Inst <vop1<0x35>, "v_sin_f32",
VOP_F32_F32, AMDGPUsin
>;
@@ -1303,6 +1326,8 @@ defm V_MOVRELSD_B32 : VOP1Inst <vop1<0x44>, "v_movrelsd_b32", VOP_I32_I32>;
// VINTRP Instructions
//===----------------------------------------------------------------------===//
+// FIXME: Specify SchedRW for VINTRP insturctions.
+
def V_INTERP_P1_F32 : VINTRP <
0x00000000,
(outs VReg_32:$dst),
@@ -1337,6 +1362,8 @@ def V_INTERP_MOV_F32 : VINTRP <
// VOP2 Instructions
//===----------------------------------------------------------------------===//
+// FIXME: Specify SchedRW for V_CNDMASK and V_*LANE_B32
+
def V_CNDMASK_B32_e32 : VOP2 <0x00000000, (outs VReg_32:$dst),
(ins VSrc_32:$src0, VReg_32:$src1, VCCReg:$vcc),
"v_cndmask_b32_e32 $dst, $src0, $src1, [$vcc]",
@@ -1405,7 +1432,6 @@ defm V_MUL_U32_U24 : VOP2Inst <vop2<0xb>, "v_mul_u32_u24",
>;
//defm V_MUL_HI_U32_U24 : VOP2_32 <0x0000000c, "v_mul_hi_u32_u24", []>;
-
defm V_MIN_LEGACY_F32 : VOP2Inst <vop2<0xd>, "v_min_legacy_f32",
VOP_F32_F32_F32, AMDGPUfmin_legacy
>;
@@ -1608,10 +1634,15 @@ defm V_SAD_U32 : VOP3Inst <vop3<0x15d>, "v_sad_u32",
defm V_DIV_FIXUP_F32 : VOP3Inst <
vop3<0x15f>, "v_div_fixup_f32", VOP_F32_F32_F32_F32, AMDGPUdiv_fixup
>;
+
+let SchedRW = [WriteDouble] in {
+
defm V_DIV_FIXUP_F64 : VOP3Inst <
vop3<0x160>, "v_div_fixup_f64", VOP_F64_F64_F64_F64, AMDGPUdiv_fixup
>;
+} // let SchedRW = [WriteDouble]
+
defm V_LSHL_B64 : VOP3Inst <vop3<0x161>, "v_lshl_b64",
VOP_I64_I64_I32, shl
>;
@@ -1622,6 +1653,7 @@ defm V_ASHR_I64 : VOP3Inst <vop3<0x163>, "v_ashr_i64",
VOP_I64_I64_I32, sra
>;
+let SchedRW = [WriteDouble] in {
let isCommutable = 1 in {
defm V_ADD_F64 : VOP3Inst <vop3<0x164>, "v_add_f64",
@@ -1644,7 +1676,9 @@ defm V_LDEXP_F64 : VOP3Inst <vop3<0x168>, "v_ldexp_f64",
VOP_F64_F64_I32, AMDGPUldexp
>;
-let isCommutable = 1 in {
+} // let SchedRW = [WriteDouble]
+
+let isCommutable = 1, SchedRW = [WriteIntMUL] in {
defm V_MUL_LO_U32 : VOP3Inst <vop3<0x169>, "v_mul_lo_u32",
VOP_I32_I32_I32
@@ -1659,30 +1693,38 @@ defm V_MUL_HI_I32 : VOP3Inst <vop3<0x16c>, "v_mul_hi_i32",
VOP_I32_I32_I32
>;
-} // isCommutable = 1
+} // isCommutable = 1, SchedRW = [WriteIntMUL]
defm V_DIV_SCALE_F32 : VOP3b_32 <vop3<0x16d>, "v_div_scale_f32", []>;
+let SchedRW = [WriteDouble] in {
// Double precision division pre-scale.
defm V_DIV_SCALE_F64 : VOP3b_64 <vop3<0x16e>, "v_div_scale_f64", []>;
+} // let SchedRW = [WriteDouble]
let isCommutable = 1 in {
defm V_DIV_FMAS_F32 : VOP3Inst <vop3<0x16f>, "v_div_fmas_f32",
VOP_F32_F32_F32_F32, AMDGPUdiv_fmas
>;
+
+let SchedRW = [WriteDouble] in {
defm V_DIV_FMAS_F64 : VOP3Inst <vop3<0x170>, "v_div_fmas_f64",
VOP_F64_F64_F64_F64, AMDGPUdiv_fmas
>;
+} // End SchedRW = [WriteDouble]
} // End isCommutable = 1
//def V_MSAD_U8 : VOP3_U8 <0x00000171, "v_msad_u8", []>;
//def V_QSAD_U8 : VOP3_U8 <0x00000172, "v_qsad_u8", []>;
//def V_MQSAD_U8 : VOP3_U8 <0x00000173, "v_mqsad_u8", []>;
+let SchedRW = [WriteDouble] in {
defm V_TRIG_PREOP_F64 : VOP3Inst <
vop3<0x174>, "v_trig_preop_f64", VOP_F64_F64_I32, AMDGPUtrig_preop
>;
+} // let SchedRW = [WriteDouble]
+
//===----------------------------------------------------------------------===//
// Pseudo Instructions
//===----------------------------------------------------------------------===//
diff --git a/lib/Target/R600/SIRegisterInfo.cpp b/lib/Target/R600/SIRegisterInfo.cpp
index 27abe9a..a05ce7b 100644
--- a/lib/Target/R600/SIRegisterInfo.cpp
+++ b/lib/Target/R600/SIRegisterInfo.cpp
@@ -49,9 +49,31 @@ BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
return Reserved;
}
-unsigned SIRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
- MachineFunction &MF) const {
- return RC->getNumRegs();
+unsigned SIRegisterInfo::getRegPressureSetLimit(unsigned Idx) const {
+
+ unsigned SGPRLimit = getNumSGPRsAllowed(10);
+ unsigned VGPRLimit = getNumVGPRsAllowed(10);</pre>
</div>
</blockquote>
Magic number 10. Should have some kind of named constant or
subtarget feature for maximum number of waves.<br>
There should probably also be a TODO based on the known other
constraints on number of waves, like the used LDS size<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+
+ for (regclass_iterator I = regclass_begin(), E = regclass_end();
+ I != E; ++I) {
+
+ unsigned NumSubRegs = std::max((int)(*I)->getSize() / 4, 1);
+ unsigned Limit;
+
+ if (isSGPRClass(*I)) {
+ Limit = SGPRLimit / NumSubRegs;
+ } else {
+ Limit = VGPRLimit / NumSubRegs;
+ }
+
+ const int *Sets = getRegClassPressureSets(*I);
+ assert(Sets);
+ for (unsigned i = 0; Sets[i] != -1; ++i) {
+ if (Sets[i] == (int)Idx)
+ return Limit;</pre>
</div>
</blockquote>
Weird indentation<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ }
+ }
+ return 256;
}
bool SIRegisterInfo::requiresRegisterScavenging(const MachineFunction &Fn) const {
@@ -496,3 +518,29 @@ unsigned SIRegisterInfo::findUnusedRegister(const MachineRegisterInfo &MRI,
return AMDGPU::NoRegister;
}
+unsigned SIRegisterInfo::getNumVGPRsAllowed(unsigned WaveCount) const {
+ switch(WaveCount) {</pre>
</div>
</blockquote>
Space between switch and (<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+ case 10: return 24;
+ case 9: return 28;
+ case 8: return 32;
+ case 7: return 36;
+ case 6: return 40;
+ case 5: return 48;
+ case 4: return 64;
+ case 3: return 84;
+ case 2: return 128;
+ default: return 256;
+ }
+}
+
+unsigned SIRegisterInfo::getNumSGPRsAllowed(unsigned WaveCount) const {
+ switch(WaveCount) {
+ case 10: return 48;
+ case 9: return 56;
+ case 8: return 64;
+ case 7: return 72;
+ case 6: return 80;
+ case 5: return 96;
+ default: return 103;
+ }
+}
diff --git a/lib/Target/R600/SIRegisterInfo.h b/lib/Target/R600/SIRegisterInfo.h
index f1d78b4..40acfa1 100644
--- a/lib/Target/R600/SIRegisterInfo.h
+++ b/lib/Target/R600/SIRegisterInfo.h
@@ -17,6 +17,7 @@
#define LLVM_LIB_TARGET_R600_SIREGISTERINFO_H
#include "AMDGPURegisterInfo.h"
+#include "llvm/Support/Debug.h"
namespace llvm {
@@ -26,8 +27,7 @@ struct SIRegisterInfo : public AMDGPURegisterInfo {
BitVector getReservedRegs(const MachineFunction &MF) const override;
- unsigned getRegPressureLimit(const TargetRegisterClass *RC,
- MachineFunction &MF) const override;
+ unsigned getRegPressureSetLimit(unsigned Idx) const override;
bool requiresRegisterScavenging(const MachineFunction &Fn) const override;
@@ -113,6 +113,14 @@ struct SIRegisterInfo : public AMDGPURegisterInfo {
unsigned getPreloadedValue(const MachineFunction &MF,
enum PreloadedValue Value) const;
+ /// \brief Give the maximum number of VGPRs that can be used by \p WaveCount
+ /// concurrent waves.
+ unsigned getNumVGPRsAllowed(unsigned WaveCount) const;
+
+ /// \brief Give the maximum number of SGPRs that can be used by \p WaveCount
+ /// concurrent waves.
+ unsigned getNumSGPRsAllowed(unsigned WaveCount) const;
+
unsigned findUnusedRegister(const MachineRegisterInfo &MRI,
const TargetRegisterClass *RC) const;
diff --git a/lib/Target/R600/SISchedule.td b/lib/Target/R600/SISchedule.td
index 28b65b8..5a1ae29 100644
--- a/lib/Target/R600/SISchedule.td
+++ b/lib/Target/R600/SISchedule.td
@@ -7,9 +7,81 @@
//
//===----------------------------------------------------------------------===//
//
-// TODO: This is just a place holder for now.
+// MachineModel definitions for Southern Islands (SI)
//
//===----------------------------------------------------------------------===//
-
def SI_Itin : ProcessorItineraries <[], [], []>;
+
+
+def WriteBranch : SchedWrite;
+def WriteExport : SchedWrite;
+def WriteLDS : SchedWrite;
+def WriteSALU : SchedWrite;
+def WriteSMEM : SchedWrite;
+def WriteVMEM : SchedWrite;
+
+// Vector ALU instructions
+def Write32Bit : SchedWrite;
+def WriteIntMUL : SchedWrite;
+
+def WriteConversion : SchedWrite;
+
+def WriteFloatFMA : SchedWrite;
+def WriteFloatTrans : SchedWrite;
+
+def WriteDouble : SchedWrite;
+def WriteDoubleAdd : SchedWrite;
+
+def SIFullSpeedModel : SchedMachineModel;
+
+// BufferSize = 0 means the processors are in-order.
+let BufferSize = 0 in {
+
+// XXX: Are the resource counts correct?
+def HWBranch : ProcResource<1>;
+def HWExport : ProcResource<7>; // Taken from S_WAITCNT
+def HWLGKM : ProcResource<31>; // Taken from S_WAITCNT
+def HWSALU : ProcResource<1>;
+def HWVMEM : ProcResource<15>; // Taken from S_WAITCNT
+def HWVALU : ProcResource<1>;
+
+}
+
+let SchedModel = SIFullSpeedModel in {
+
+class HWWriteRes<SchedWrite write, list<ProcResourceKind> resources,
+ int latency> : WriteRes<write, resources> {
+ let Latency = latency;
+}
+
+class HWVALUWriteRes<SchedWrite write, int latency> :
+ HWWriteRes<write, [HWVALU], latency>;
+
+// The latency numbers are taken from AMD Accelerated Parallel Processing
+// guide. They may not be acurate.
+
+def : HWWriteRes<WriteBranch, [HWBranch], 100>; // XXX: Guessed ???
+def : HWWriteRes<WriteExport, [HWExport], 100>; // XXX: Guessed ???</pre>
</div>
</blockquote>
I would assume export is the same as for VMEM?<br>
<blockquote cite="mid:20141202214751.GA4494@freedesktop.org"
type="cite">
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">
+def : HWWriteRes<WriteLDS, [HWLGKM], 32>; // 2 - 64
+def : HWWriteRes<WriteSALU, [HWSALU], 1>;
+def : HWWriteRes<WriteSMEM, [HWLGKM], 10>; // XXX: Guessed ???
+def : HWWriteRes<WriteVMEM, [HWVMEM], 450>; // 300 - 600
+
+// XXX: These definitions assume full double-precision speed, some devices are
+// slower. These are also taken from the AMD Accelerated Parallel Processing
+// guide and may not be accurate.
+
+// The latency values are 1 / (operations / cycle) / 4.
+def : HWVALUWriteRes<Write32Bit, 1>;
+def : HWVALUWriteRes<WriteIntMUL, 4>;
+
+def : HWVALUWriteRes<WriteConversion, 4>;
+
+def : HWVALUWriteRes<WriteFloatFMA, 1>; // 16 For single speed
+def : HWVALUWriteRes<WriteFloatTrans, 4>;
+
+def : HWVALUWriteRes<WriteDouble, 4>; // 16 for single speed
+def : HWVALUWriteRes<WriteDoubleAdd, 2>; // 8 for single speed
+
+} // End SchedModel = SIFullSpeedModel
diff --git a/test/CodeGen/R600/atomic_cmp_swap_local.ll b/test/CodeGen/R600/atomic_cmp_swap_local.ll
index 223f4d3..35e8ade 100644
--- a/test/CodeGen/R600/atomic_cmp_swap_local.ll
+++ b/test/CodeGen/R600/atomic_cmp_swap_local.ll
@@ -2,9 +2,9 @@
; RUN: llc -march=r600 -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -strict-whitespace -check-prefix=CI -check-prefix=FUNC %s
; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i32_offset:
+; SI: v_mov_b32_e32 [[VCMP:v[0-9]+]], 7
; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
; SI: s_load_dword [[SWAP:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
-; SI-DAG: v_mov_b32_e32 [[VCMP:v[0-9]+]], 7
; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
; SI-DAG: v_mov_b32_e32 [[VSWAP:v[0-9]+]], [[SWAP]]
; SI: ds_cmpst_rtn_b32 [[RESULT:v[0-9]+]], [[VPTR]], [[VCMP]], [[VSWAP]] offset:16 [M0]
@@ -18,11 +18,11 @@ define void @lds_atomic_cmpxchg_ret_i32_offset(i32 addrspace(1)* %out, i32 addrs
}
; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i64_offset:
-; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
-; SI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
; SI: s_mov_b64 s{{\[}}[[LOSCMP:[0-9]+]]:[[HISCMP:[0-9]+]]{{\]}}, 7
; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], s[[LOSCMP]]
; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], s[[HISCMP]]
+; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
+; SI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
; SI-DAG: v_mov_b32_e32 v[[LOSWAPV:[0-9]+]], s[[LOSWAP]]
; SI-DAG: v_mov_b32_e32 v[[HISWAPV:[0-9]+]], s[[HISWAP]]
diff --git a/test/CodeGen/R600/ctpop.ll b/test/CodeGen/R600/ctpop.ll
index 5cfdaef..eba2e21 100644
--- a/test/CodeGen/R600/ctpop.ll
+++ b/test/CodeGen/R600/ctpop.ll
@@ -38,11 +38,11 @@ define void @v_ctpop_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noali
}
; FUNC-LABEL: {{^}}v_ctpop_add_chain_i32:
-; SI: buffer_load_dword [[VAL0:v[0-9]+]],
; SI: buffer_load_dword [[VAL1:v[0-9]+]],
+; SI: buffer_load_dword [[VAL0:v[0-9]+]],
; SI: v_mov_b32_e32 [[VZERO:v[0-9]+]], 0
; SI: v_bcnt_u32_b32_e32 [[MIDRESULT:v[0-9]+]], [[VAL1]], [[VZERO]]
-; SI-NEXT: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL0]], [[MIDRESULT]]
+; SI: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL0]], [[MIDRESULT]]
; SI: buffer_store_dword [[RESULT]],
; SI: s_endpgm
diff --git a/test/CodeGen/R600/ds_read2st64.ll b/test/CodeGen/R600/ds_read2st64.ll
index 3e98e59..5e2fa3f 100644
--- a/test/CodeGen/R600/ds_read2st64.ll
+++ b/test/CodeGen/R600/ds_read2st64.ll
@@ -65,8 +65,8 @@ define void @simple_read2st64_f32_max_offset(float addrspace(1)* %out, float add
; SI-LABEL: @simple_read2st64_f32_over_max_offset
; SI-NOT: ds_read2st64_b32
-; SI: ds_read_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:256
; SI: v_add_i32_e32 [[BIGADD:v[0-9]+]], 0x10000, {{v[0-9]+}}
+; SI: ds_read_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:256
; SI: ds_read_b32 {{v[0-9]+}}, [[BIGADD]]
; SI: s_endpgm
define void @simple_read2st64_f32_over_max_offset(float addrspace(1)* %out, float addrspace(3)* %lds) #0 {
@@ -197,8 +197,8 @@ define void @simple_read2st64_f64_max_offset(double addrspace(1)* %out, double a
; SI-LABEL: @simple_read2st64_f64_over_max_offset
; SI-NOT: ds_read2st64_b64
-; SI: ds_read_b64 {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}} offset:512
; SI: v_add_i32_e32 [[BIGADD:v[0-9]+]], 0x10000, {{v[0-9]+}}
+; SI: ds_read_b64 {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}} offset:512
; SI: ds_read_b64 {{v\[[0-9]+:[0-9]+\]}}, [[BIGADD]]
; SI: s_endpgm
define void @simple_read2st64_f64_over_max_offset(double addrspace(1)* %out, double addrspace(3)* %lds) #0 {
diff --git a/test/CodeGen/R600/fceil64.ll b/test/CodeGen/R600/fceil64.ll
index 029f41d..7fcaec2 100644
--- a/test/CodeGen/R600/fceil64.ll
+++ b/test/CodeGen/R600/fceil64.ll
@@ -11,12 +11,12 @@ declare <16 x double> @llvm.ceil.v16f64(<16 x double>) nounwind readnone
; FUNC-LABEL: {{^}}fceil_f64:
; CI: v_ceil_f64_e32
; SI: s_bfe_u32 [[SEXP:s[0-9]+]], {{s[0-9]+}}, 0xb0014
+; SI: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
; SI: s_add_i32 s{{[0-9]+}}, [[SEXP]], 0xfffffc01
; SI: s_lshr_b64
; SI: s_not_b64
; SI: s_and_b64
-; SI-DAG: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
-; SI-DAG: cmp_lt_i32
+; SI: cmp_lt_i32
; SI: cndmask_b32
; SI: cndmask_b32
; SI: cmp_gt_i32
diff --git a/test/CodeGen/R600/ffloor.ll b/test/CodeGen/R600/ffloor.ll
index 166f705..2ca428e 100644
--- a/test/CodeGen/R600/ffloor.ll
+++ b/test/CodeGen/R600/ffloor.ll
@@ -12,12 +12,12 @@ declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone
; CI: v_floor_f64_e32
; SI: s_bfe_u32 [[SEXP:s[0-9]+]], {{s[0-9]+}}, 0xb0014
+; SI: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
; SI: s_add_i32 s{{[0-9]+}}, [[SEXP]], 0xfffffc01
; SI: s_lshr_b64
; SI: s_not_b64
; SI: s_and_b64
-; SI-DAG: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
-; SI-DAG: cmp_lt_i32
+; SI: cmp_lt_i32
; SI: cndmask_b32
; SI: cndmask_b32
; SI: cmp_gt_i32
diff --git a/test/CodeGen/R600/fmax3.ll b/test/CodeGen/R600/fmax3.ll
index cf371b3..baa38d0 100644
--- a/test/CodeGen/R600/fmax3.ll
+++ b/test/CodeGen/R600/fmax3.ll
@@ -3,9 +3,9 @@
declare float @llvm.maxnum.f32(float, float) nounwind readnone
; SI-LABEL: {{^}}test_fmax3_olt_0:
-; SI: buffer_load_dword [[REGA:v[0-9]+]]
-; SI: buffer_load_dword [[REGB:v[0-9]+]]
; SI: buffer_load_dword [[REGC:v[0-9]+]]
+; SI: buffer_load_dword [[REGB:v[0-9]+]]
+; SI: buffer_load_dword [[REGA:v[0-9]+]]
; SI: v_max3_f32 [[RESULT:v[0-9]+]], [[REGC]], [[REGB]], [[REGA]]
; SI: buffer_store_dword [[RESULT]],
; SI: s_endpgm
@@ -21,8 +21,8 @@ define void @test_fmax3_olt_0(float addrspace(1)* %out, float addrspace(1)* %apt
; Commute operand of second fmax
; SI-LABEL: {{^}}test_fmax3_olt_1:
-; SI: buffer_load_dword [[REGA:v[0-9]+]]
; SI: buffer_load_dword [[REGB:v[0-9]+]]
+; SI: buffer_load_dword [[REGA:v[0-9]+]]
; SI: buffer_load_dword [[REGC:v[0-9]+]]
; SI: v_max3_f32 [[RESULT:v[0-9]+]], [[REGC]], [[REGB]], [[REGA]]
; SI: buffer_store_dword [[RESULT]],
diff --git a/test/CodeGen/R600/fmin3.ll b/test/CodeGen/R600/fmin3.ll
index 7420368..78d04b5 100644
--- a/test/CodeGen/R600/fmin3.ll
+++ b/test/CodeGen/R600/fmin3.ll
@@ -3,9 +3,9 @@
declare float @llvm.minnum.f32(float, float) nounwind readnone
; SI-LABEL: {{^}}test_fmin3_olt_0:
-; SI: buffer_load_dword [[REGA:v[0-9]+]]
-; SI: buffer_load_dword [[REGB:v[0-9]+]]
; SI: buffer_load_dword [[REGC:v[0-9]+]]
+; SI: buffer_load_dword [[REGB:v[0-9]+]]
+; SI: buffer_load_dword [[REGA:v[0-9]+]]
; SI: v_min3_f32 [[RESULT:v[0-9]+]], [[REGC]], [[REGB]], [[REGA]]
; SI: buffer_store_dword [[RESULT]],
; SI: s_endpgm
@@ -21,8 +21,8 @@ define void @test_fmin3_olt_0(float addrspace(1)* %out, float addrspace(1)* %apt
; Commute operand of second fmin
; SI-LABEL: {{^}}test_fmin3_olt_1:
-; SI: buffer_load_dword [[REGA:v[0-9]+]]
; SI: buffer_load_dword [[REGB:v[0-9]+]]
+; SI: buffer_load_dword [[REGA:v[0-9]+]]
; SI: buffer_load_dword [[REGC:v[0-9]+]]
; SI: v_min3_f32 [[RESULT:v[0-9]+]], [[REGC]], [[REGB]], [[REGA]]
; SI: buffer_store_dword [[RESULT]],
diff --git a/test/CodeGen/R600/fneg-fabs.f64.ll b/test/CodeGen/R600/fneg-fabs.f64.ll
index 555f4cc..60209a8 100644
--- a/test/CodeGen/R600/fneg-fabs.f64.ll
+++ b/test/CodeGen/R600/fneg-fabs.f64.ll
@@ -56,8 +56,8 @@ define void @fneg_fabs_fn_free_f64(double addrspace(1)* %out, i64 %in) {
}
; FUNC-LABEL: {{^}}fneg_fabs_f64:
-; SI: s_load_dwordx2
; SI: s_load_dwordx2 s{{\[}}[[LO_X:[0-9]+]]:[[HI_X:[0-9]+]]{{\]}}
+; SI: s_load_dwordx2
; SI: v_mov_b32_e32 [[IMMREG:v[0-9]+]], 0x80000000
; SI-DAG: v_or_b32_e32 v[[HI_V:[0-9]+]], s[[HI_X]], [[IMMREG]]
; SI-DAG: v_mov_b32_e32 v[[LO_V:[0-9]+]], s[[LO_X]]
diff --git a/test/CodeGen/R600/ftrunc.f64.ll b/test/CodeGen/R600/ftrunc.f64.ll
index fba6154..5547c2f 100644
--- a/test/CodeGen/R600/ftrunc.f64.ll
+++ b/test/CodeGen/R600/ftrunc.f64.ll
@@ -23,12 +23,12 @@ define void @v_ftrunc_f64(double addrspace(1)* %out, double addrspace(1)* %in) {
; CI: v_trunc_f64_e32
; SI: s_bfe_u32 [[SEXP:s[0-9]+]], {{s[0-9]+}}, 0xb0014
+; SI: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
; SI: s_add_i32 s{{[0-9]+}}, [[SEXP]], 0xfffffc01
; SI: s_lshr_b64
+; SI: cmp_lt_i32
; SI: s_not_b64
; SI: s_and_b64
-; SI: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
-; SI: cmp_lt_i32
; SI: cndmask_b32
; SI: cndmask_b32
; SI: cmp_gt_i32
diff --git a/test/CodeGen/R600/llvm.memcpy.ll b/test/CodeGen/R600/llvm.memcpy.ll
index 5f2710a..ba4085c 100644
--- a/test/CodeGen/R600/llvm.memcpy.ll
+++ b/test/CodeGen/R600/llvm.memcpy.ll
@@ -6,39 +6,23 @@ declare void @llvm.memcpy.p1i8.p1i8.i64(i8 addrspace(1)* nocapture, i8 addrspace
; FUNC-LABEL: {{^}}test_small_memcpy_i64_lds_to_lds_align1:
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
-
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
+
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
-
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
-; SI: ds_write_b8
; SI: ds_read_u8
; SI: ds_read_u8
-
; SI: ds_read_u8
; SI: ds_read_u8
; SI: ds_read_u8
@@ -65,6 +49,14 @@ declare void @llvm.memcpy.p1i8.p1i8.i64(i8 addrspace(1)* nocapture, i8 addrspace
; SI: ds_write_b8
; SI: ds_write_b8
; SI: ds_write_b8
+
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
; SI: ds_write_b8
; SI: ds_write_b8
@@ -75,6 +67,14 @@ declare void @llvm.memcpy.p1i8.p1i8.i64(i8 addrspace(1)* nocapture, i8 addrspace
; SI: ds_write_b8
; SI: ds_write_b8
; SI: ds_write_b8
+
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
+; SI: ds_write_b8
; SI: ds_write_b8
; SI: s_endpgm
diff --git a/test/CodeGen/R600/local-atomics.ll b/test/CodeGen/R600/local-atomics.ll
index e9baa08..cbcae60 100644
--- a/test/CodeGen/R600/local-atomics.ll
+++ b/test/CodeGen/R600/local-atomics.ll
@@ -4,8 +4,8 @@
; FUNC-LABEL: {{^}}lds_atomic_xchg_ret_i32:
; EG: LDS_WRXCHG_RET *
-; SI: s_load_dword [[SPTR:s[0-9]+]],
; SI: v_mov_b32_e32 [[DATA:v[0-9]+]], 4
+; SI: s_load_dword [[SPTR:s[0-9]+]],
; SI: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[SPTR]]
; SI: ds_wrxchg_rtn_b32 [[RESULT:v[0-9]+]], [[VPTR]], [[DATA]] [M0]
; SI: buffer_store_dword [[RESULT]],
@@ -30,8 +30,8 @@ define void @lds_atomic_xchg_ret_i32_offset(i32 addrspace(1)* %out, i32 addrspac
; XXX - Is it really necessary to load 4 into VGPR?
; FUNC-LABEL: {{^}}lds_atomic_add_ret_i32:
; EG: LDS_ADD_RET *
-; SI: s_load_dword [[SPTR:s[0-9]+]],
; SI: v_mov_b32_e32 [[DATA:v[0-9]+]], 4
+; SI: s_load_dword [[SPTR:s[0-9]+]],
; SI: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[SPTR]]
; SI: ds_add_rtn_u32 [[RESULT:v[0-9]+]], [[VPTR]], [[DATA]] [M0]
; SI: buffer_store_dword [[RESULT]],
diff --git a/test/CodeGen/R600/local-atomics64.ll b/test/CodeGen/R600/local-atomics64.ll
index ce0cf59..8e6d5c1 100644
--- a/test/CodeGen/R600/local-atomics64.ll
+++ b/test/CodeGen/R600/local-atomics64.ll
@@ -29,10 +29,10 @@ define void @lds_atomic_add_ret_i64(i64 addrspace(1)* %out, i64 addrspace(3)* %p
}
; FUNC-LABEL: {{^}}lds_atomic_add_ret_i64_offset:
-; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, 9
; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]
; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]
+; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
; SI: ds_add_rtn_u64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}} offset:32 [M0]
; SI: buffer_store_dwordx2 [[RESULT]],
diff --git a/test/CodeGen/R600/local-memory-two-objects.ll b/test/CodeGen/R600/local-memory-two-objects.ll
index 88ef05d..83ab70d 100644
--- a/test/CodeGen/R600/local-memory-two-objects.ll
+++ b/test/CodeGen/R600/local-memory-two-objects.ll
@@ -31,8 +31,8 @@
; EG-CHECK-NOT: LDS_READ_RET {{[*]*}} OQAP, T[[ADDRR]]
; SI: v_add_i32_e32 [[SIPTR:v[0-9]+]], 16, v{{[0-9]+}}
; SI: ds_read_b32 {{v[0-9]+}}, [[SIPTR]] [M0]
-; CI: ds_read_b32 {{v[0-9]+}}, [[ADDRR:v[0-9]+]] offset:16 [M0]
-; CI: ds_read_b32 {{v[0-9]+}}, [[ADDRR]] [M0]
+; CI: ds_read_b32 {{v[0-9]+}}, [[ADDRR:v[0-9]+]] [M0]
+; CI: ds_read_b32 {{v[0-9]+}}, [[ADDRR]] offset:16 [M0]
define void @local_memory_two_objects(i32 addrspace(1)* %out) {
entry:
diff --git a/test/CodeGen/R600/si-triv-disjoint-mem-access.ll b/test/CodeGen/R600/si-triv-disjoint-mem-access.ll
index 2c146eb..2e2d551 100644
--- a/test/CodeGen/R600/si-triv-disjoint-mem-access.ll
+++ b/test/CodeGen/R600/si-triv-disjoint-mem-access.ll
@@ -51,8 +51,8 @@ define void @no_reorder_local_load_volatile_global_store_local_load(i32 addrspac
; FUNC-LABEL: @no_reorder_barrier_local_load_global_store_local_load
; CI: ds_read_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:4
-; CI: buffer_store_dword
; CI: ds_read_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:8
+; CI: buffer_store_dword
define void @no_reorder_barrier_local_load_global_store_local_load(i32 addrspace(1)* %out, i32 addrspace(1)* %gptr) #0 {
%ptr0 = load i32 addrspace(3)* addrspace(3)* @stored_lds_ptr, align 4
diff --git a/test/CodeGen/R600/smrd.ll b/test/CodeGen/R600/smrd.ll
index 1c7df16..858e3f8 100644
--- a/test/CodeGen/R600/smrd.ll
+++ b/test/CodeGen/R600/smrd.ll
@@ -37,10 +37,12 @@ entry:
; SMRD load with a 64-bit offset
; CHECK-LABEL: {{^}}smrd3:
-; CHECK-DAG: s_mov_b32 s[[SHI:[0-9]+]], 4
-; CHECK-DAG: s_mov_b32 s[[SLO:[0-9]+]], 0 ;
-; FIXME: We don't need to copy these values to VGPRs
-; CHECK-DAG: v_mov_b32_e32 v[[VLO:[0-9]+]], s[[SLO]]
+; FIXME: There are too many copies here because we don't fold immediates
+; through REG_SEQUENCE
+; CHECK: s_mov_b32 s[[SLO:[0-9]+]], 0 ;
+; CHECK: s_mov_b32 s[[SHI:[0-9]+]], 4
+; CHECK: s_mov_b32 s[[SSLO:[0-9]+]], s[[SLO]]
+; CHECK-DAG: v_mov_b32_e32 v[[VLO:[0-9]+]], s[[SSLO]]
; CHECK-DAG: v_mov_b32_e32 v[[VHI:[0-9]+]], s[[SHI]]
; FIXME: We should be able to use s_load_dword here
; CHECK: buffer_load_dword v{{[0-9]+}}, v{{\[}}[[VLO]]:[[VHI]]{{\]}}, s[{{[0-9]+:[0-9]+}}], 0 addr64
diff --git a/test/CodeGen/R600/wait.ll b/test/CodeGen/R600/wait.ll
index 735eabd..cfb446f 100644
--- a/test/CodeGen/R600/wait.ll
+++ b/test/CodeGen/R600/wait.ll
@@ -3,9 +3,8 @@
; CHECK-LABEL: {{^}}main:
; CHECK: s_load_dwordx4
; CHECK: s_load_dwordx4
-; CHECK: s_waitcnt lgkmcnt(0){{$}}
-; CHECK: s_waitcnt vmcnt(0){{$}}
-; CHECK: s_waitcnt expcnt(0) lgkmcnt(0){{$}}
+; CHECK: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
+; CHECK: s_endpgm
define void @main(<16 x i8> addrspace(2)* inreg %arg, <16 x i8> addrspace(2)* inreg %arg1, <32 x i8> addrspace(2)* inreg %arg2, <16 x i8> addrspace(2)* inreg %arg3, <16 x i8> addrspace(2)* inreg %arg4, i32 inreg %arg5, i32 %arg6, i32 %arg7, i32 %arg8, i32 %arg9, float addrspace(2)* inreg %constptr) #0 {
main_body:
%tmp = getelementptr <16 x i8> addrspace(2)* %arg3, i32 0
diff --git a/test/CodeGen/R600/zero_extend.ll b/test/CodeGen/R600/zero_extend.ll
index 0fe1f15..1c746b0 100644
--- a/test/CodeGen/R600/zero_extend.ll
+++ b/test/CodeGen/R600/zero_extend.ll
@@ -29,9 +29,9 @@ entry:
}
; SI-CHECK-LABEL: {{^}}zext_i1_to_i64:
+; SI-CHECK: s_mov_b32 s{{[0-9]+}}, 0
; SI-CHECK: v_cmp_eq_i32
; SI-CHECK: v_cndmask_b32
-; SI-CHECK: s_mov_b32 s{{[0-9]+}}, 0
define void @zext_i1_to_i64(i64 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
%cmp = icmp eq i32 %a, %b
%ext = zext i1 %cmp to i64
<div class="moz-txt-sig">--
2.0.4
</div></pre>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<div class="moz-text-plain" wrap="true" graphical-quote="true"
style="font-family: -moz-fixed; font-size: 12px;"
lang="x-western">
<pre wrap="">_______________________________________________
llvm-commits mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>
</pre>
</div>
</blockquote>
<br>
</body>
</html>