[PATCH] D96336: [AMDGPU] Save VGPR of whole wave when spilling

Wed Apr 7 06:30:32 PDT 2021

sebastian-ne marked 2 inline comments as done.
sebastian-ne added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll:49
 ; GFX6: s_add_u32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]
+; GFX6-NEXT: s_waitcnt expcnt(0)
 ; GFX6-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9:]+}}], s32
----------------
t-tye wrote:
> sebastian-ne wrote:
> > foad wrote:
> > > What causes this change?
> > Above these tested lines, the VGPR gets saved to scratch in a buffer_store_dword.
> > The same VGPR is the destination in buffer_load_dword below, so waiting for expcnt(0) makes sure we do not overwrite it before the store happened (the docs say expcnt waits until writes to the last level cache happened, so I guess the store→load is the reason).
> Are you sure exp_cnt does what you describe? In older hardware exp_cnt was used to ensure input registers had been consumed by an instruction, but that is not longer true as the hardware now has interlocks making using expr_cnt no longer serve this purpose (although are hazards in some multi-dword cases.
> 
> The other wait_cnt counters act to indicate if the memory operation is visible. But the hardware ensures single location coherence per thread so why must this be waited on?
The test checks GFX6, does that count as old hardware? :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96336/new/

https://reviews.llvm.org/D96336