[PATCH] D95071: [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.

Wed Jan 20 14:27:49 PST 2021

scott.linder added a comment.

Thank you for the patch, I definitely think this is a step in the right direction! However, I think we are still broken in common cases, the first of which can be seen by simply removing `amdgpu_kernel` from `llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll` and noticing that the `soffset` of the `buffer_load_dword`s in `%bb.1` do not correctly refer to the frame pointer, they are constant `0` just like in the kernel case:

  $ sed 's/amdgpu_kernel//' llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll | release/bin/llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | grep '%bb' -A3
  ; %bb.0:                                ; %entry
          s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
          s_waitcnt_vscnt null, 0x0
          s_or_saveexec_b32 s4, -1
  --
  ; %bb.1:                                ; %if.then4.i
          s_clause 0x1
          buffer_load_dword v0, v40, s[0:3], 0 offen
          buffer_load_dword v1, v40, s[0:3], 0 offen offset:4

This was true before this patch, and also true with a slightly different result back when the `FIXME` was first added:

  $ git switch --detach 60b1967c3933c && ninja -C release/ llc && sed 's/amdgpu_kernel//' llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll | release/bin/llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | grep '%bb' -A3
  HEAD is now at 60b1967c3933 [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
  ninja: Entering directory `release/'
  ninja: no work to do.
  ; %bb.0:                                ; %entry
          s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
          s_waitcnt_vscnt null, 0x0
          s_or_saveexec_b32 s4, -1
  --
  ; %bb.1:                                ; %if.then4.i
          buffer_load_dword v0, v32, s[0:3], s32 offen
          buffer_load_dword v1, v32, s[0:3], s32 offen offset:4
          s_waitcnt vmcnt(0)

The fundamental issue is in all of these cases is that we still are relying on `eliminateFrameIndex` to correct the `soffset` of an `MUBUF` instruction, when it is not guaranteed that the frame index will survive to this point in it's original place (i.e. as an operand of the `MUBUF` we care about updating). From the comment, it seems like `LocalStackSlotAllocation` is one such place that can move the frame index. One mitigating factor is that we also try to fold the frame index back into the `MUBUF` when possible in another pass, but as we can't do this in all cases, we can't rely on it for generating correct code.

I think using a pseudo for these `MUBUF` cases early, and lowering them around the same time as `eliminateFrameIndex` (i.e. some time after we know how to populate the `soffset` field of the corresponding "real" `MUBUF` instruction) should work in all cases. IIRC this was the approach Matt liked back when the `FIXME` was added.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:495-502

   if (IsFlat) {
     assert(TII->isLegalFLATOffset(NewOffset, AMDGPUAS::PRIVATE_ADDRESS, true) &&
            "offset should be legal");
     FIOp->ChangeToRegister(BaseReg, false);
     OffsetOp->setImm(NewOffset);
     return;
----------------
Not exactly related to this patch, but I feel like I must be reading this wrong. Why is there a different assert and an early return for DEBUG builds? If isLegalFLATOffset implies isLegalMUBUFImmOffset then this is just an additional, stricter check to what is below, but it still seems odd to have a `return` here then. Why not just fall through in the DEBUG case instead of copy-pasting the code?

================
Comment at: llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll:19

-define amdgpu_kernel void @local_stack_offset_uses_sp(i64 addrspace(1)* %out, i8 addrspace(1)* %in) {
 ; MUBUF-LABEL: local_stack_offset_uses_sp:
----------------
I don't see the connection to the patch at hand, is this just unrelated cleanup or does this patch affect the dead code here?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95071/new/

https://reviews.llvm.org/D95071