[llvm] [AMDGPU] Don't DEALLOC_VGPRS from callable functions (PR #72245)

Tue Nov 14 05:09:26 PST 2023

================
@@ -1039,10 +1041,13 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   // Identify S_ENDPGM instructions which may have to wait for outstanding VMEM
   // stores. In this case it can be useful to send a message to explicitly
   // release all VGPRs before the stores have completed, but it is only safe to
-  // do this if there are no outstanding scratch stores.
+  // do this if:
+  // * there are no outstanding scratch stores
+  // * this is not a callable function
   else if (MI.getOpcode() == AMDGPU::S_ENDPGM ||
            MI.getOpcode() == AMDGPU::S_ENDPGM_SAVED) {
----------------
jasilvanus wrote:

Ah, is this about the case where the caller writes to scratch and the callee then terminates the wave? In that case we should not send the DEALLOC_VGPRS message, because the caller's write might still be pending.
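
If that reading is right, the two conditions in the updated comment could be modelled roughly as below. This is only a standalone sketch, not the code in `SIInsertWaitcnts`: `WaveState`, its fields, and `mayReleaseVGPRsBeforeEnd` are hypothetical names made up for illustration, and the assumption is that a callable function cannot see scratch stores its caller still has in flight.

```cpp
// Hypothetical stand-in for the facts the pass would consult; not LLVM types.
struct WaveState {
  // Scratch stores issued by the current function that are still outstanding.
  bool HasPendingScratchStores;
  // True for a callable function, false for an entry point (kernel/shader).
  bool IsCallableFunction;
};

// Returns true only when both conditions from the patch comment hold:
// no outstanding scratch stores, and the function is not callable.
// In a callable function the caller may still have scratch stores pending
// that the callee cannot account for, so the early release is refused.
bool mayReleaseVGPRsBeforeEnd(const WaveState &S) {
  return !S.HasPendingScratchStores && !S.IsCallableFunction;
}
```

With this model, an entry point with no pending scratch stores is the only case that allows the early release; a callable function is rejected even when it has issued no scratch stores itself.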

https://github.com/llvm/llvm-project/pull/72245