[PATCH] D73880: AMDGPU/GlobalISel: Fix move s.buffer.load to VALU
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 5 08:40:34 PST 2020
nhaehnle added a comment.
Thank you for doing this. A couple of concerns...
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1375-1377
+ // Use the alignment to ensure that the required offsets will fit into the
+ // immediate offsets.
+ const unsigned Align = NumLoads > 1 ? 16 * NumLoads : 4;
----------------
What's the justification for the alignment in the `NumLoads > 1` case?
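For reference, my mental model of what this alignment is buying is roughly the following: a wide load gets split into `NumLoads` 16-byte sub-loads at byte offsets `16*i` from the base. A small self-contained sketch (the helper name is hypothetical, not the actual LLVM code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration only: compute the byte offsets of the NumLoads 16-byte
// sub-loads that a wide s.buffer.load would be split into, given a base
// offset that honors the chosen alignment. Not the actual LLVM helper.
std::vector<uint32_t> splitLoadOffsets(uint32_t BaseOffset, unsigned NumLoads) {
  // Mirrors the patch: 16 * NumLoads when splitting, else natural 4-byte align.
  const unsigned Align = NumLoads > 1 ? 16 * NumLoads : 4;
  assert(BaseOffset % Align == 0 && "base must honor the chosen alignment");
  std::vector<uint32_t> Offsets;
  for (unsigned I = 0; I != NumLoads; ++I)
    Offsets.push_back(BaseOffset + 16 * I); // each sub-load covers 16 bytes
  return Offsets;
}
```

But that only requires the base to be 16-aligned for the sub-loads themselves; it's not obvious to me why the *total* size `16 * NumLoads` is the right alignment for the immediate-offset argument, hence the question.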
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1437-1448
+ if (RSrcBank != &AMDGPU::SGPRRegBank) {
+ // Remove the original instruction to avoid potentially confusing the
+ // waterfall loop logic.
+ B.setInstr(*Span.begin());
+ MI.eraseFromParent();
+
+ SmallSet<Register, 4> OpsToWaterfall;
----------------
This case should waterfall the original load if only the srsrc is a VGPR...
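To spell out what I mean by waterfalling: readfirstlane picks the descriptor of the first still-active lane, every lane holding that same descriptor executes the (now uniform) load, and those lanes drop out of the loop. A scalar model of the idiom, purely illustrative (lane count, types, and the `waterfall` name are not GFX- or LLVM-accurate):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar simulation of the waterfall loop over a non-uniform value.
// LaneRsrc[i] is lane i's (possibly divergent) descriptor value.
template <typename Fn>
void waterfall(const std::vector<uint64_t> &LaneRsrc, Fn ExecuteWith) {
  uint64_t Exec = (LaneRsrc.size() >= 64) ? ~0ull
                                          : (1ull << LaneRsrc.size()) - 1;
  while (Exec) {
    unsigned First = __builtin_ctzll(Exec); // v_readfirstlane: first active lane
    uint64_t Rsrc = LaneRsrc[First];        // value is now uniform (SGPR)
    uint64_t SameMask = 0;                  // lanes that share this descriptor
    for (unsigned L = 0; L < LaneRsrc.size(); ++L)
      if (((Exec >> L) & 1) && LaneRsrc[L] == Rsrc)
        SameMask |= 1ull << L;
    ExecuteWith(Rsrc, SameMask);            // loop body runs with exec = SameMask
    Exec &= ~SameMask;                      // retire those lanes, loop back
  }
}
```

The point being: when only the srsrc is divergent, the body run under `SameMask` can still be the original scalar load with the readfirstlaned descriptor, rather than a VALU buffer load.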
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn-s-buffer-load.mir:79
; CHECK: [[BUILD_VECTOR:%[0-9]+]]:sgpr(<4 x s32>) = G_BUILD_VECTOR [[V_READFIRSTLANE_B32_]](s32), [[V_READFIRSTLANE_B32_1]](s32), [[V_READFIRSTLANE_B32_2]](s32), [[V_READFIRSTLANE_B32_3]](s32)
- ; CHECK: [[INT:%[0-9]+]]:sgpr(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.s.buffer.load), [[BUILD_VECTOR]](<4 x s32>), [[COPY1]](s32), 0
+ ; CHECK: [[AMDGPU_BUFFER_LOAD:%[0-9]+]]:vgpr(<4 x s32>) = G_AMDGPU_BUFFER_LOAD [[BUILD_VECTOR]](<4 x s32>), [[C1]](s32), [[COPY2]], [[C]], 0, 0, 0 :: (dereferenceable invariant load 16, align 4)
; CHECK: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_]], implicit-def $exec, implicit-def $scc, implicit $exec
----------------
This should still be an s_buffer_load...
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73880/new/
https://reviews.llvm.org/D73880