[PATCH] D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST

Thu Oct 4 02:27:09 PDT 2018

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/FLATInstructions.td:179
   let is_flat_global = 1 in {
-    def "" : FLAT_Load_Pseudo<opName, regClass, HasTiedInput, 1, 1>;
-    def _SADDR : FLAT_Load_Pseudo<opName, regClass, HasTiedInput, 1, 1, 1>;
+    def "" : FLAT_Load_Pseudo<opName, regClass, HasTiedInput, 1, 1>, GlobalSaddrTable<0, opName>;
+    def _SADDR : FLAT_Load_Pseudo<opName, regClass, HasTiedInput, 1, 1, 1>, GlobalSaddrTable<1, opName>;
----------------
Put these on the following line and indent

================
Comment at: lib/Target/AMDGPU/SIFixupVectorISel.cpp:47-51
+// Opt used by a few lit tests to preserve older CHECK patterns.
+static cl::opt<bool> EnableSIGlobalSAddr(
+  "amdgpu-enable-si-global-saddr",
+  cl::desc("Enable use of SGPR regs for global load/store instructions"),
+  cl::init(true));
----------------
I'd rather not have this option. We have an already excessively long list of band-aids to avoid updating tests

================
Comment at: lib/Target/AMDGPU/SIFixupVectorISel.cpp:100-101
+      continue;
+    MachineInstr *DefInst = MRI.getUniqueVRegDef(WOp->getReg());
+    switch (DefInst->getOpcode()) {
+    default:
----------------
I think you could break this with a shader using an inreg/sgpr argument

================
Comment at: lib/Target/AMDGPU/SIFixupVectorISel.cpp:107-110
+    case AMDGPU::REG_SEQUENCE:
+      Worklist.push_back(&DefInst->getOperand(1));
+      Worklist.push_back(&DefInst->getOperand(3));
+      break;
----------------
This is assuming too much about the REG_SEQUENCE, so at least needs an assert. It could be a wider reg_sequence

================
Comment at: lib/Target/AMDGPU/SIFixupVectorISel.cpp:112-117
+      BaseReg = DefInst->getOperand(2).getReg();
+      if (!TargetRegisterInfo::isVirtualRegister(BaseReg))
+        continue;
+      IndexReg = DefInst->getOperand(3).getReg();
+      if (!TargetRegisterInfo::isVirtualRegister(IndexReg))
+        continue;
----------------
You don't need these. We'll never emit a physical register operand like this. I might worry about seeing a subreg index on the use operand though

================
Comment at: lib/Target/AMDGPU/SIFixupVectorISel.cpp:125
+      MI = MRI.getUniqueVRegDef(BaseReg);
+      if (!(MI && MI->isCopy()))
+        continue;
----------------
de Morgan this

================
Comment at: lib/Target/AMDGPU/SIFixupVectorISel.cpp:207
+bool SIFixupVectorISel::runOnMachineFunction(MachineFunction &MF) {
+  // This pass does not run at -O0, avoid putting correctness patches here.
+  if (skipFunction(MF.getFunction()))
----------------
Comment not needed

================
Comment at: lib/Target/AMDGPU/SIFixupVectorISel.cpp:217-218
+  bool FuncModified = false;
+  for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
+                                                  BI != BE; ++BI) {
+    // Cleanup missed Saddr opportunites from ISel.
----------------
Range loop

================
Comment at: test/CodeGen/AMDGPU/conv2d-saddr.ll:13
+
+target triple = "amdgcn-unknown-amdhsa"
+
----------------
Move the triple to the command line and drop this

================
Comment at: test/CodeGen/AMDGPU/conv2d-saddr.ll:16
+; Function Attrs: convergent nounwind
+define hidden amdgpu_kernel void @simpleConv2d(i32 addrspace(1)* nocapture %dst_image, i32 addrspace(1)* nocapture readonly %src_image, i32 addrspace(1)* nocapture readonly %conv_kernel) local_unnamed_addr #0 !kernel_arg_addr_space !5 !kernel_arg_access_qual !6 !kernel_arg_type !7 !kernel_arg_base_type !7 !kernel_arg_type_qual !8 {
+entry:
----------------
This testcase is way too big. Most of the tests should be only a handful of instructions. There should be ones stressing the immediate limits. I also don't see ones for the atomics that the comments say work

https://reviews.llvm.org/D52846