[PATCH] D114652: [AMDGPU] Treat call clobbered return address registers s[30:31] as callee saved
Venkata Ramanaiah Nalamothu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 6 06:20:23 PST 2021
RamNalamothu added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp:232
+ Register Reg = TRI->getReturnAddressReg(MF);
+ if (!FuncInfo->isEntryFunction() && MFI.hasCalls()) {
+ const TargetRegisterClass *RC =
----------------
RamNalamothu wrote:
> sebastian-ne wrote:
> > RamNalamothu wrote:
> > > sebastian-ne wrote:
> > > > sebastian-ne wrote:
> > > > > RamNalamothu wrote:
> > > > > > sebastian-ne wrote:
> > > > > > > RamNalamothu wrote:
> > > > > > > > sebastian-ne wrote:
> > > > > > > > > arsenm wrote:
> > > > > > > > > > RamNalamothu wrote:
> > > > > > > > > > > sebastian-ne wrote:
> > > > > > > > > > > > Why is the return address only marked as callee-save if this function has calls? I assume it can’t it be overwritten if there are no calls because it is marked as reserved?
> > > > > > > > > > > >
> > > > > > > > > > > > Coming back here after reading the tests, I think s[30:31] should also be marked as callee-save if there are no calls. It shouldn’t cause harm if there are no calls (the register is reserved anyway) and it should fix the tests.
> > > > > > > > > > > > Why is the return address only marked as callee-save if this function has calls? I assume it can’t it be overwritten if there are no calls because it is marked as reserved?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes.
> > > > > > > > > > >
> > > > > > > > > > > > Coming back here after reading the tests, I think s[30:31] should also be marked as callee-save if there are no calls. It shouldn’t cause harm if there are no calls (the register is reserved anyway) and it should fix the tests.
> > > > > > > > > > >
> > > > > > > > > > > AFAIK that's the eventual goal and there is a discussion to pick the ABI registers for return address from the callee saved range. And these changes should work with no/minimal changes.
> > > > > > > > > > >
> > > > > > > > > > > However, I don't think I understand how the tests will get fixed automatically if the return address registers are marked as callee saved because even the SGPR CSR save/restore happens during SGPRSpillLowering which is where I am manually adding s[30:31] to the CSR list.
> > > > > > > > > > I don't think this register should be reserved, which avoids the need to do this
> > > > > > > > > Why is the return address not callee-save if there are no calls?
> > > > > > > > >
> > > > > > > > > Does removing the `&& MFI.hasCalls()` condition have any bad effects?
> > > > > > > > No functional issues but its just that it will be unnecessary to save/restore when the registers are not clobbered i.e. when we have no calls.
> > > > > > > But if we have no calls and the register is not clobbered, marking the registers as callee-save shouldn’t save and restore them?
> > > > > > >
> > > > > > > I think it doesn’t hurt to remove this condition. On the contrary, I expect that removing the hasCalls condition will fix the case where s30 or s31 are overwritten in inline-assembly.
> > > > > > Sorry, I didn't get your point well.
> > > > > >
> > > > > > Are you saying we **should** save/restore s[30:31] even when we don't have calls i.e. even when s[30:31] is not modified in the current function? Does that help in any way, other than the inline asm scenario?
> > > > > >
> > > > > > For the inline asm scenario, as I already mentioned. I heard that we currently don't have any policy and the ABI registers cannot be clobbered like any other scratch register right?
> > > > > > And even if we want to handle that scenario, I guess a blanket removal of `hasCalls()` condition is not the best/optimal IMO because we will be emitting unnecessary save/restore if the code inside inline asm does not actually modify s[30:31].
> > > > > Ups, ignore my previous comment. I should have read the code again before making that comment.
> > > > >
> > > > > What I thought this code would do, is add s[30:31] to the callee-save register list if this condition is satisfied. I did not see that it is already spilling it, based on the condition.
> > > > >
> > > > > Let me try what happens if s[30:31] is (unconditionally) added to the CSR list and the code here is removed.
> > > > I tested it and I think marking s[30:31] as callee-save register instead of explicitly saving them here is better.
> > > >
> > > > 1. We don’t need `hasCall` to check if s[30:31] may be overwritten.
> > > > 2. We shouldn’t need to mark the register as reserved (didn’t test this).
> > > > 3. It “just works”, even if there is inline assembly overwriting s30 or s31
> > > > (and there are no unnecessary saves/restores if it isn’t overwritten).
> > > >
> > > > The last point also means you don’t need to add work-arounds to tests by introducing artificial calls or changing the registers in inline assembly, because it will keep working as is.
> > > >
> > > > See this sketched commit: https://github.com/Flakebi/amdvlk-llvm-project/commit/240040601e0f8849acb85141142408b3e6a32ff9
> > > > E.g. it fixed the test here: https://github.com/Flakebi/amdvlk-llvm-project/commit/240040601e0f8849acb85141142408b3e6a32ff9#diff-25d99c15c21155a13557b29a1c0edfa792c19040f02a0d7cd676865b46275713L75-R88
> > > What you did there is you just took a part of this patch and doing that is incorrect. If you still want to do those experiments, please take the full patch and try making your changes on top of that.
> > >
> > > This patch makes changes to return instruction and its use of return address during the instruction selection. I think you missed to fully understand that aspect.
> > >
> > > Please read again my earlier response to @arsenm which I copied here.
> > >
> > > >> I don't think it's necessary to reserve this
> > >
> > > > **But otherwise the register allocation thinks it is available for allocation as `s[30:31]` are scratch registers at the moment and its usage with return instruction is hidden with using SI_RETURN through the register allocation and we don't emit Live-ins anymore.**
> > > >
> > > > And we will have to reserve it here even after these registers actually get picked from **CSR range** because we want to emit save/restore exclusively so that the unwind info machinery knows it and can generate CFI as needed for return address. That's the reason why all of this got started right.
> > >
> > >
> > I used `arc patch` to get the diff (https://github.com/Flakebi/amdvlk-llvm-project/commit/240040601e0f8849acb85141142408b3e6a32ff9) and based my commit on top of that, so I hope it got all of your changes?
> >
> > Unfortunately I’m not familiar with CFI (I guess it refers to call-frame-information and not control-flow-integrity?). Are there more upcoming patches that need the explicit spill here and can’t rely on the llvm infrastructure for CSRs?
> >
> > I like that this patch simplifies the call-related code in amdgpu, but as you probably noticed, I’m not that happy that inline assembly that touches s[30:31] now ends up with non-working code.
> >
> > If this is an intended change (and I don’t feel authoritative to say if this is fine or not), I think it should be mentioned in the commit message that s[30:31] is not saved anymore if used in inline assembly. And the tests that intend to test that (like @void_func_void_clobber_s30_s31) should probably just be removed.
> Ah, my bad. I didn't see the parent commit.
>
> It is call frame information and there will be additional CFI changes but only in the downstream. They are not/can't be up-streamed yet.
>
> This patch is written with the assumption that s[30:31] can't be moved to CSR range right now. Like I mentioned in the beginning, there is already a discussion to pick a different register pair (i.e. s[28:29]) for return address which will be part of the (new) CSR range.
>
> Yes, the condition check in `SILowerSGPRSpills` can be removed if we can move s[30:31] to CSR range now. @t-tye @arsenm what do you say?
>
> However s[30:31] spill should not come from the register allocation and must be generated only during SGPR/frame lowering for the unwind info (CFI) generation. For this, I still believe s[30:31] has to be reserved so that the register allocation won't interfere with that. Or am I missing something here @scott.linder ?
@t-tye @arsenm @scott.linder ping.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D114652/new/
https://reviews.llvm.org/D114652
More information about the llvm-commits
mailing list