[PATCH] D91110: [AMDGPU] Remove scratch rsrc from spill pseudos
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 11 12:14:40 PST 2020
rampitec added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:1391-1393
+ if (Register ScratchRSrcReg = MFI->getScratchRSrcReg()) {
+ MIB.addReg(ScratchRSrcReg, RegState::Implicit);
+ }
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > I don't see why this would be conditional, or why this would need to be added
> > It is conditional because I do not want to add "implicit $noreg" in case of flat scratch.
> > It is needed because SIFrameLowering::getEntryFunctionReservedScratchRsrcReg() does that:
> >
> >
> > ```
> > if (!ScratchRsrcReg || !MRI.isPhysRegUsed(ScratchRsrcReg))
> > return Register();
> > ```
> >
> > I.e. if register is unused it will not be initialized.
> I think we can remove this check and replace it with a check for stack objects now. In the spill case it should have always been redundant
Theoretically yes, we could replace it with:
```
if (!ScratchRsrcReg || (!MRI.isPhysRegUsed(ScratchRsrcReg) &&
allStackObjectsAreDead(MF.getFrameInfo())))
```
Then isPhysRegUsed() is still needed. Example is in the amdpal.ll, function @scratch:
```
bb.0.entry:
liveins: $sgpr0_sgpr1
%1:sgpr_64(p4) = COPY $sgpr0_sgpr1
%4:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 11, 0, 0 :: (dereferenceable invariant load 8 from %ir.0, align 4, addrspace 4)
%5:sreg_32 = COPY %4.sub0:sreg_64_xexec
%6:sreg_32 = S_MOV_B32 0
%7:sreg_64 = REG_SEQUENCE killed %5:sreg_32, %subreg.sub0, killed %6:sreg_32, %subreg.sub1
%8:sreg_32 = S_MOV_B32 2
%9:sreg_64 = S_LSHL_B64 killed %7:sreg_64, killed %8:sreg_32, implicit-def dead $scc
%10:sreg_64 = S_ADD_U64_PSEUDO %1:sgpr_64(p4), killed %9:sreg_64, implicit-def dead $scc
%11:sreg_32 = COPY %4.sub1:sreg_64_xexec
%12:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM killed %10:sreg_64, 9, 0, 0 :: (dereferenceable invariant load 4, addrspace 4)
%13:vgpr_32 = COPY %12:sreg_32_xm0_xexec
%14:vgpr_32 = COPY %11:sreg_32
BUFFER_STORE_DWORD_OFFEN killed %13:vgpr_32, killed %14:vgpr_32, $private_rsrc_reg, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.2, addrspace 5)
S_ENDPGM 0
```
It has $private_rsrc_reg use right after selection and no stack objects.
Then allStackObjectsAreDead() is also needed. Unfortunatelly doing so worsens the codegen. For example in idot8s.ll there is test @idot8_acc32. It ends up requesting 12 bytes of scratch, but never uses it. Stack is allocated at the selection and comes dead, but not marked as dead and not eliminated:
```
Frame Objects:
fi#-1: size=4, align=16, fixed, at location [SP]
fi#0: size=4, align=4, at location [SP+4]
fi#1: size=4, align=4, at location [SP+8]
```
I believe it was supposed to be eliminated by the StackSlotColoring, but it was not because LiveStacks comes out empty.
I.e. until we resolve dead stack objects elimination issue it is better to rely on the scratch rsrc register usage.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D91110/new/
https://reviews.llvm.org/D91110
More information about the llvm-commits
mailing list