[PATCH] D124192: [AMDGPU] Callee must always spill writelane VGPRs

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 29 06:23:58 PDT 2022


nhaehnle added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIFrameLowering.cpp:1263
+      if (MI.getOpcode() == AMDGPU::V_WRITELANE_B32)
+        MFI->addToLaneVGPRs(MF, MI.getOperand(0).getReg());
+    }
----------------
cdevadas wrote:
> nhaehnle wrote:
> > arsenm wrote:
> > > cdevadas wrote:
> > > > arsenm wrote:
> > > > > cdevadas wrote:
> > > > > > arsenm wrote:
> > > > > > > > I don't think this is the right place to determine these registers. It's adding an extra loop over the function, and adding statefulness to determineCalleeSaves. Theoretically we should be able to call it multiple times.
> > > > > > Any recommendation?
> > > > > > This should be done before identifying the CSR registers and they should be skipped from the default CSR spill insertion routines.
> > > > > Logically I think this belongs in PEI::calculateCallFrameInfo, but this isn't really a reasonable thing to add to the generic code.
> > > > Yes, we can't accommodate the code into PEI::calculateCallFrameInfo.  The processFunctionBeforeFrameFinalized is also not a viable option. At this moment, determineCalleeSaves seemed a better place. The need to iterate over the MBB, in this case, can't be avoided.
> > > I don't like it, but I guess we can go with this for now. Is there a way to assert that this was only computed once? I know at one point I had a patch that tried to speculatively call this to see what spills will be later inserted
> > I don't understand why this is necessary in the first place. Can't you call `allocateWWMSpill` when the V_WRITELANE_B32 for spilling is inserted? Actually, isn't that already happening?
> Yes, that's the behavior with the upstream compiler today. This patch is a prerequisite for D124196 which changes the SGPR spilling strategy by spilling them into virtual VGPRs instead of using the physical registers. These virtual registers get physVGPRs during the second iteration of allocation passes that allocates only VGPR classes. 
> At the time of WRITELANE insertion at SILowerSGPRSpills pass, we only have the virtual registers with D124196. That's the reason we need to iterate over all BBs looking for WRITELANE instructions later during PrologEpilogInserter.
Do you have a plan to fix this? And what about the correctness of D124196? See the question about the register allocator introducing COPYs when allocating VGPRs.

I have a feeling that the compiler really does need to track, at some level, the nature of VGPRs, i.e. whether they're used at lane scope or at wave scope, in order to fix this potential miscompilation due to register allocation. Once you have that, you don't need the scan for V_WRITELANE either.
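
A minimal standalone sketch of the tracking idea, not taken from either patch. Everything in it is hypothetical: `VGPRScope`, `VGPRScopeInfo`, `Reg`, and all method names are illustration-only stand-ins and are not LLVM APIs. It assumes the scope is recorded when the register is created and carried over when a virtual register is assigned a physical one, so the set of wave-scope VGPRs can be read off directly without scanning for V_WRITELANE.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model, not LLVM API: how the lanes of a VGPR are used.
enum class VGPRScope { Lane, Wave };

// Stand-in for a register number (virtual or physical) in this sketch.
using Reg = std::uint32_t;

class VGPRScopeInfo {
  std::map<Reg, VGPRScope> Scopes;

public:
  // Record the scope when the register is created, e.g. at the point
  // where an SGPR spill is lowered to a lane write.
  void setScope(Reg R, VGPRScope S) {
    assert(R != 0 && "register 0 means 'no register' in this sketch");
    Scopes[R] = S;
  }

  // Carry the scope over when a virtual register is assigned a physical one.
  void transfer(Reg From, Reg To) {
    auto It = Scopes.find(From);
    if (It == Scopes.end())
      return; // untracked registers stay untracked
    VGPRScope S = It->second;
    Scopes.erase(It);
    Scopes[To] = S;
  }

  // Untracked registers are treated as ordinary lane-scope VGPRs.
  VGPRScope getScope(Reg R) const {
    auto It = Scopes.find(R);
    return It == Scopes.end() ? VGPRScope::Lane : It->second;
  }

  // Registers the callee would have to spill with all lanes enabled,
  // returned in ascending register order.
  std::vector<Reg> waveScopeRegs() const {
    std::vector<Reg> Result;
    for (const auto &Entry : Scopes)
      if (Entry.second == VGPRScope::Wave)
        Result.push_back(Entry.first);
    return Result;
  }
};
```

With something along these lines, frame lowering could query `waveScopeRegs()` as often as it likes, since the query has no side effects.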


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124192/new/

https://reviews.llvm.org/D124192


