[PATCH] D115401: AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos
Sebastian Neubauer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 14 04:28:22 PST 2021
sebastian-ne added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:189
+ TmpVGPRUsed = RS->isRegUsed(TmpVGPR);
+ if (TmpVGPRUsed) {
----------------
What is the difference between `TmpVGPRUsed` and `TmpVGPRLive`? It looks like the same thing to me?
================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:228
if (TmpVGPRLive)
- TRI.buildVGPRSpillLoadStore(*this, TmpVGPRIndex, 0, /*IsLoad*/ false,
- /*IsKill*/ false);
+ TRI.buildVGPRSpillLoadStore(*this, TmpVGPRIndex, 0, /*IsLoad*/ false);
// Spill inactive lanes
----------------
Isn’t TmpVGPR still used in the second `buildVGPRSpillLoadStore`? How does this work if it’s killed here?
================
Comment at: llvm/test/CodeGen/AMDGPU/swdev309419-agpr.mir:1588-1593
+ ; GFX908-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
+ ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
+ ; GFX908-NEXT: $vgpr0 = V_MOV_B32_e32 8200, implicit $exec
+ ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec
+ ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.1, addrspace 5)
+ ; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
----------------
This looks incorrect. It’s trying to use v0 as offset and agpr intermediate at the same time.
================
Comment at: llvm/test/CodeGen/AMDGPU/swdev309419.mir:8
+
+# Test that if we have a spill with a live SCC def, and we the offset
+# does not fit in the immediate offset we do not clobber SCC.
----------------
Typo: we the offset
================
Comment at: llvm/test/CodeGen/AMDGPU/swdev309419.mir:1074-1078
+ ; GFX10-FLATSCR-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
+ ; GFX10-FLATSCR-NEXT: $vcc_lo = S_ADD_I32 $sgpr32, 8200, implicit-def $scc
+ ; GFX10-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vcc_lo, implicit $exec
+ ; GFX10-FLATSCR-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
+ ; GFX10-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
----------------
I guess this is expected to still incorrectly overwrite $scc?
Maybe add a fixme?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115401/new/
https://reviews.llvm.org/D115401
More information about the llvm-commits
mailing list