[PATCH] D75138: [WIP][AMDGPU] Eliminate the ScratchWaveOffset register from the calling convention

Tue Feb 25 19:14:24 PST 2020

scott.linder marked an inline comment as done.
scott.linder added a comment.

Yes, there are a lot of test updates, and more new tests are likely needed; for now I have just posted some tests that exercise the parts I'm currently stuck on.

I will try to articulate the issue with `hasFP` better tomorrow morning, but in short: we currently decide whether to have a distinct FP (i.e. S34) before we actually know whether we use the stack. If a function has a call but no stack use early on, and we later need to reference the stack, we end up at PEI time updating the same register both as the ABI SP and as the entry function's FP, which obviously isn't right.

The right thing seems to be to not have any stack or frame pointer at all, but I am not sure how to implement that and wanted to ask for some help estimating how reasonable that would be.

================
Comment at: llvm/lib/Target/AMDGPU/SIFrameLowering.cpp:602-603
+
+    // Save and restore SRSRC bits [48:63]. We only want to update the base
+    // address in bits [0:47].
+    BuildMI(MBB, I, DL, TII->get(AMDGPU::S_AND_B32), SavedWord)
----------------
arsenm wrote:
> arsenm wrote:
> > arsenm wrote:
> > > arsenm wrote:
> > > > arsenm wrote:
> > > > > Do we actually need these bits? I'm fairly confident these are always 0 in the HSA resource descriptor (or at least are a known constant we can just reproduce later)
> > > > According to this it's hardcoded: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/core/runtime/amd_aql_queue.cpp#L1015
> > > > 
> > > > We just need to worry about SWIZZLE_ENABLE being set to 1. This is the high bit, so all it can do is trigger a carry on the second add. So I think that means you can get away with just doing the add, and then using s_bitset1_b32 to ensure it wasn't carried away
> > > Actually, I don't think any add that fits in the 48-bit address space should ever touch the high bits (although I usually manage to be wrong about known bits optimizations with adds)
> > I think this means it's OK to just not worry about the high bits: https://rise4fun.com/Alive/i24
> > 
> As long as we know bit 48 is 0, this seems fine. As this is hardcoded in the driver, this is probably OK https://rise4fun.com/Alive/KmH
That makes sense to me, and it would simplify things a lot. I don't quite understand whether we need to ensure bits [48:62] are 0, though. If the addc carries into bit 48, is that an issue? I.e. https://rise4fun.com/Alive/qsv

At the very least, it seems we can avoid saving anything and just mask in a constant; but if we can avoid that too, it removes a couple of additional instructions from nearly every kernel prologue.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75138/new/

https://reviews.llvm.org/D75138