PATCH: R600/SI: Bug fixes for mi-sched

Tom Stellard tom at stellard.net
Wed Nov 19 12:05:55 PST 2014


On Tue, Nov 18, 2014 at 05:49:04PM -0800, Matt Arsenault wrote:
> On 11/18/2014 05:10 PM, Tom Stellard wrote:
> >Hi Matt,
> >
> >Here is the latest set of patches for fixing bugs uncovered by enabling the
> >MI Scheduler.
> >
> >-Tom
> >
> >0001-R600-SI-Make-SIInstrInfo-isOperandLegal-more-strict.patch
> >
> >
> > From 14b5c10f1d0af13f7155765deb3a687e3a133cf1 Mon Sep 17 00:00:00 2001
> >From: Tom Stellard<thomas.stellard at amd.com>
> >Date: Tue, 18 Nov 2014 11:42:13 -0500
> >Subject: [PATCH 1/6] R600/SI: Make SIInstrInfo::isOperandLegal() more strict
> >
> >A register operand that has a common sub-class with its instruction's
> >defined register class is not always legal.  For example,
> >SReg_32 and M0Reg both have a common sub-class, but we can't
> >use an SReg_32 in instructions that expect a M0Reg.
> >
> >This prevents the llvm.SI.sendmsg.ll test from failing when the fold
> >operand pass is added.
> >---
> 
> LGTM. Might want to change the example to use s0 instead of sgpr0 to
> match the asm output
> 

r222368

> >0002-R600-SI-Make-sure-the-LoadStore-optimizer-emits-inst.patch
> >
> >
> > From 43a26609e9291d3c2b927e1b93bc7bffe4a4f6c7 Mon Sep 17 00:00:00 2001
> >From: Tom Stellard<thomas.stellard at amd.com>
> >Date: Tue, 18 Nov 2014 13:37:03 -0500
> >Subject: [PATCH 2/6] R600/SI: Make sure the LoadStore optimizer emits
> >  instructions with offset0 < offset1
> >
> >This will help reduce the number of changes needed to the test
> >cases when the MI scheduler is enabled.
> >---
> >  lib/Target/R600/SILoadStoreOptimizer.cpp | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> >diff --git a/lib/Target/R600/SILoadStoreOptimizer.cpp b/lib/Target/R600/SILoadStoreOptimizer.cpp
> >index 4140196..006b869 100644
> >--- a/lib/Target/R600/SILoadStoreOptimizer.cpp
> >+++ b/lib/Target/R600/SILoadStoreOptimizer.cpp
> >@@ -255,6 +255,9 @@ MachineBasicBlock::iterator  SILoadStoreOptimizer::mergeRead2Pair(
> >      = (EltSize == 4) ? &AMDGPU::VReg_64RegClass : &AMDGPU::VReg_128RegClass;
> >    unsigned DestReg = MRI->createVirtualRegister(SuperRC);
> >+  if (NewOffset0 > NewOffset1)
> >+    std::swap(NewOffset0, NewOffset1);
> I don't think this is correct. It will still write offset0 into the
> low register, and offset1 into the high register of the destination
> pair. Just swapping the offsets will change which load the extracted
> result gets. I would expect the scheduler to order the offsets so
> that this ends up happening anyway (I thought there was already a
> check to prevent folding if offset1 > offset0).

The reason the offsets were out of order was that the loads
weren't being clustered by the scheduler and the sorting only
seems to happen when they get clustered.

Even though they weren't being clustered, ds_read2 was still
matching, because the load instructions  happened to
be next to each other by luck.

Your two patches from yesterday:
"R600/SI: Set hasSideEffects = 0 on load and store instructions."
"R600/SI: Implement areMemAccessesTriviallyDisjoit"

allowed the loads to be clustered, which made this patch
unnecessary, so I've dropped it.

> >0003-R600-SI-Add-SIFoldOperands-pass.patch
> >
> >
> > From 41409a4ba20a0a74acc58ae07a5130144bff8c39 Mon Sep 17 00:00:00 2001
> >From: Tom Stellard<thomas.stellard at amd.com>
> >Date: Tue, 18 Nov 2014 11:45:28 -0500
> >Subject: [PATCH 3/6] R600/SI: Add SIFoldOperands pass
> >
> >This pass attempts to fold the source operands of mov and copy
> >instructions into their uses.
> 
> Is this supposed to replace what's left of
> SITargetLowering::legalizeOperands? This doesn't delete it so this
> probably isn't going to hit many cases. I'm also slightly worried
> this might break my f64 inline immediate patches

I was planning to remove legalizeOperands in a separate patch.  It does
seem to hit a fair amount of cases even with legalizeOperands still there.

> >---
> >  lib/Target/R600/AMDGPU.h                |   4 +
> >  lib/Target/R600/AMDGPUTargetMachine.cpp |   2 +
> >  lib/Target/R600/CMakeLists.txt          |   1 +
> >  lib/Target/R600/SIFoldOperands.cpp      | 200 ++++++++++++++++++++++++++++++++
> >  test/CodeGen/R600/extload.ll            |  21 ++--
> >  test/CodeGen/R600/local-atomics.ll      |  24 ++--
> >  test/CodeGen/R600/operand-folding.ll    |  40 +++++++
> >  7 files changed, 264 insertions(+), 28 deletions(-)
> >  create mode 100644 lib/Target/R600/SIFoldOperands.cpp
> >  create mode 100644 test/CodeGen/R600/operand-folding.ll
> >
> >diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
> >index 261075e..13379e7 100644
> >--- a/lib/Target/R600/AMDGPU.h
> >+++ b/lib/Target/R600/AMDGPU.h
> >@@ -38,6 +38,7 @@ FunctionPass *createAMDGPUCFGStructurizerPass();
> >  // SI Passes
> >  FunctionPass *createSITypeRewriter();
> >  FunctionPass *createSIAnnotateControlFlowPass();
> >+FunctionPass *createSIFoldOperandsPass();
> >  FunctionPass *createSILowerI1CopiesPass();
> >  FunctionPass *createSIShrinkInstructionsPass();
> >  FunctionPass *createSILoadStoreOptimizerPass(TargetMachine &tm);
> >@@ -47,6 +48,9 @@ FunctionPass *createSIFixSGPRLiveRangesPass();
> >  FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
> >  FunctionPass *createSIInsertWaits(TargetMachine &tm);
> >+void initializeSIFoldOperandsPass(PassRegistry &);
> >+extern char &SIFoldOperandsID;
> >+
> >  void initializeSILowerI1CopiesPass(PassRegistry &);
> >  extern char &SILowerI1CopiesID;
> >diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
> >index b2cd988..80142f0 100644
> >--- a/lib/Target/R600/AMDGPUTargetMachine.cpp
> >+++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
> >@@ -159,6 +159,8 @@ bool AMDGPUPassConfig::addInstSelector() {
> >      addPass(createSIFixSGPRCopiesPass(*TM));
> >    }
> >+  addPass(createSILowerI1CopiesPass());
> >+  addPass(createSIFoldOperandsPass());
> >    return false;
> >  }
> >diff --git a/lib/Target/R600/CMakeLists.txt b/lib/Target/R600/CMakeLists.txt
> >index ed0a216..3b703e7 100644
> >--- a/lib/Target/R600/CMakeLists.txt
> >+++ b/lib/Target/R600/CMakeLists.txt
> >@@ -43,6 +43,7 @@ add_llvm_target(R600CodeGen
> >    SIAnnotateControlFlow.cpp
> >    SIFixSGPRCopies.cpp
> >    SIFixSGPRLiveRanges.cpp
> >+  SIFoldOperands.cpp
> >    SIInsertWaits.cpp
> >    SIInstrInfo.cpp
> >    SIISelLowering.cpp
> >diff --git a/lib/Target/R600/SIFoldOperands.cpp b/lib/Target/R600/SIFoldOperands.cpp
> >new file mode 100644
> >index 0000000..1b0ba24
> >--- /dev/null
> >+++ b/lib/Target/R600/SIFoldOperands.cpp
> >@@ -0,0 +1,200 @@
> >+//===-- SIFoldOperands.cpp - Fold operands --- ----------------------------===//
> >+//
> >+//                     The LLVM Compiler Infrastructure
> >+//
> >+// This file is distributed under the University of Illinois Open Source
> >+// License. See LICENSE.TXT for details.
> >+//
> >+/// \file
> >+//===----------------------------------------------------------------------===//
> >+//
> >+
> >+#include "AMDGPU.h"
> >+#include "AMDGPUSubtarget.h"
> >+#include "SIInstrInfo.h"
> >+#include "llvm/CodeGen/LiveIntervalAnalysis.h"
> >+#include "llvm/CodeGen/MachineDominators.h"
> >+#include "llvm/CodeGen/MachineFunctionPass.h"
> >+#include "llvm/CodeGen/MachineInstrBuilder.h"
> >+#include "llvm/CodeGen/MachineRegisterInfo.h"
> >+#include "llvm/IR/LLVMContext.h"
> >+#include "llvm/IR/Function.h"
> >+#include "llvm/Support/Debug.h"
> >+#include "llvm/Target/TargetMachine.h"
> >+
> >+#define DEBUG_TYPE "si-fold-operands"
> >+using namespace llvm;
> >+
> >+namespace {
> >+
> >+class SIFoldOperands : public MachineFunctionPass {
> >+public:
> >+  static char ID;
> >+
> >+public:
> >+  SIFoldOperands() : MachineFunctionPass(ID) {
> >+    initializeSIFoldOperandsPass(*PassRegistry::getPassRegistry());
> >+  }
> >+
> >+  bool runOnMachineFunction(MachineFunction &MF) override;
> >+
> >+  const char *getPassName() const override {
> >+    return "SI Fold Operands";
> >+  }
> >+
> >+  void getAnalysisUsage(AnalysisUsage &AU) const override {
> >+    AU.addRequired<MachineDominatorTree>();
> >+    AU.setPreservesCFG();
> >+    MachineFunctionPass::getAnalysisUsage(AU);
> >+  }
> >+};
> >+
> >+} // End anonymous namespace.
> >+
> >+INITIALIZE_PASS_BEGIN(SIFoldOperands, DEBUG_TYPE,
> >+                      "SI Fold Operands", false, false)
> >+INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
> >+INITIALIZE_PASS_END(SIFoldOperands, DEBUG_TYPE,
> >+                    "SI Fold Operands", false, false)
> >+
> >+char SIFoldOperands::ID = 0;
> >+
> >+char &llvm::SIFoldOperandsID = SIFoldOperands::ID;
> >+
> >+FunctionPass *llvm::createSIFoldOperandsPass() {
> >+  return new SIFoldOperands();
> >+}
> >+
> >+static bool isSafeToFold(unsigned Opcode) {
> >+  switch(Opcode) {
> >+  case AMDGPU::V_MOV_B32_e32:
> >+  case AMDGPU::V_MOV_B32_e64:
> >+  case AMDGPU::S_MOV_B32:
> >+  case AMDGPU::S_MOV_B64:
> >+  case AMDGPU::COPY:
> >+    return true;
> >+  default:
> >+    return false;
> >+  }
> >+}
> >+
> >+static bool updateOperand(MachineInstr *MI, unsigned OpNo,
> >+                          const MachineOperand &New,
> >+                          const TargetRegisterInfo &TRI) {
> >+  MachineOperand &Old = MI->getOperand(OpNo);
> >+  assert(Old.isReg());
> >+
> >+  if (New.isImm()) {
> >+    Old.ChangeToImmediate(New.getImm());
> >+    return true;
> >+  }
> >+
> >+  if (New.isFPImm()) {
> >+    Old.ChangeToFPImmediate(New.getFPImm());
> >+    return true;
> >+  }
> I've been considering replacing all fp immediates with the bitcasted
> integers, since handling fp immediates everywhere is a pain and
> there isn't much point to distinguishing them. Where do you think
> the best place to do this would be?
> 

Would it work to convert them in AMDGPUDAGToDAGISel::Select() ?

> >+
> >+  if (New.isReg())  {
> >+    if (TargetRegisterInfo::isVirtualRegister(Old.getReg()) &&
> >+        TargetRegisterInfo::isVirtualRegister(New.getReg())) {
> >+      Old.substVirtReg(New.getReg(), New.getSubReg(), TRI);
> >+      return true;
> >+    }
> >+  }
> >+
> >+  // FIXME: Handle physical registers.
> 
> What cases that can be folded will ever have physical registers at
> this point?

If this pass was run after register allocation it would have to deal with
physical registers.

> 
> >+
> >+  return false;
> >+}
> >+
> >+bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {
> >+  MachineRegisterInfo &MRI = MF.getRegInfo();
> >+  const SIInstrInfo *TII =
> >+      static_cast<const SIInstrInfo *>(MF.getSubtarget().getInstrInfo());
> >+  const SIRegisterInfo &TRI = TII->getRegisterInfo();
> >+
> >+  for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
> >+                                                  BI != BE; ++BI) {
> >+
> >+    MachineBasicBlock &MBB = *BI;
> >+    MachineBasicBlock::iterator I, Next;
> >+    for (I = MBB.begin(); I != MBB.end(); I = Next) {
> >+      Next = std::next(I);
> >+      MachineInstr &MI = *I;
> >+
> >+      if (!isSafeToFold(MI.getOpcode()))
> >+        continue;
> >+
> >+      const MachineOperand &OpToFold = MI.getOperand(1);
> >+
> >+      // FIXME: Fold operands with subregs.
> >+      if (OpToFold.isReg() &&
> >+          (!TargetRegisterInfo::isVirtualRegister(OpToFold.getReg()) ||
> >+           OpToFold.getSubReg()))
> >+        continue;
> >+
> >+      std::vector<std::pair<MachineInstr *, unsigned>> FoldList;
> >+      for (MachineRegisterInfo::use_iterator
> >+           Use = MRI.use_begin(MI.getOperand(0).getReg()), E = MRI.use_end();
> >+           Use != E; ++Use) {
> I think you can use a range for here with MRI.use_instructions(Op.getReg())

I'm using the iterator, because it has a reference to the operand number.
I can't get that using a range.

I've attached updated patches.  Let me know what you think.

-Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-R600-SI-Add-SIFoldOperands-pass.patch
Type: text/x-diff
Size: 16304 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141119/c46149b5/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-R600-SI-Mark-s_mov_b32-and-s_mov_b64-as-rematerializ.patch
Type: text/x-diff
Size: 1011 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141119/c46149b5/attachment-0001.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-R600-SI-Emit-s_mov_b32-m0-1-before-every-DS-instruct.patch
Type: text/x-diff
Size: 14461 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141119/c46149b5/attachment-0002.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-R600-SI-Add-an-s_mov_b32-to-patterns-which-use-the-M.patch
Type: text/x-diff
Size: 2898 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141119/c46149b5/attachment-0003.patch>


More information about the llvm-commits mailing list