[PATCH] D102177: [AMDGPU][RFC] Improve sgpr function arguments

Sebastian Neubauer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 10 09:18:49 PDT 2021


sebastian-ne created this revision.
sebastian-ne added reviewers: t-tye, arsenm, madhur13490.
Herald added subscribers: kerbowa, hiraditya, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl.
sebastian-ne requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

The SGPR layout on function calls currently looks like this:

+------------+--------------+-------------------------+-------------------+-------------------+------------------+
| s[0:3] SRD | arguments... | s[30:31] return address | s32 stack pointer | s33 frame pointer | s34 base pointer |
+------------+--------------+-------------------------+-------------------+-------------------+------------------+

The return address and stack pointer occupy multiple 4-aligned blocks
of SGPRs.
Large scalar memory reads require a 4-aligned block of SGPRs, so the
fewer of these blocks are available, the harder register allocation
becomes.

The stack resource descriptor (SRD) occupies s[0:3]. If we want to pass
user-data SGPRs to a function, the SGPRs need to be moved from s[0:...]
to s[4:...] before the call.
This is also the case when flat scratch is used instead of the SRD, even
though s[0:3] is then unused, because the same calling convention is used.
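
To make the cost concrete, here is a minimal Python sketch of the copies
implied above. It is an illustration only: `user_data_copies` and its
parameters are hypothetical names, not part of the patch, and it assumes
the user data arrives in consecutive SGPRs starting at s0. The copies are
listed from the highest register down, because the source and destination
ranges can overlap.

```python
def user_data_copies(num_user_sgprs, first_arg_sgpr=4):
    """Return the (src, dst) SGPR copies needed before a call.

    first_arg_sgpr is the first SGPR available for arguments; 4 matches
    the current layout, where s[0:3] is reserved for the SRD.
    """
    if first_arg_sgpr == 0:
        # User data is already where the callee expects it.
        return []
    # Copy the highest register first so that an upward shift never
    # overwrites a source register that has not been read yet.
    return [(i, i + first_arg_sgpr) for i in reversed(range(num_user_sgprs))]


if __name__ == "__main__":
    for src, dst in user_data_copies(3):
        print(f"s_mov_b32 s{dst}, s{src}")
```

With arguments starting at s0 (`first_arg_sgpr=0`) the list is empty, so
no moves would be needed in that case.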

To improve this, I propose the following layout:

+--------------+-------------------+-------------------+-------------------------+--------------+------------------+
| arguments... | s32 stack pointer | s33 frame pointer | s[34:35] return address | s[36:39] SRD | s40 base pointer |
+--------------+-------------------+-------------------+-------------------------+--------------+------------------+

The stack and frame pointer stay where they are and the return address
is moved behind them, into s[34:35]. That way, only one 4-aligned block
is occupied.
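
The block counting can be checked with a short Python sketch. The
register indices are taken from the two layout tables in this mail; the
helper name `aligned_blocks` is made up for the illustration.

```python
def aligned_blocks(registers, width=4):
    """Return the start indices of the width-aligned blocks touched."""
    return sorted({(r // width) * width for r in registers})


# Return address, stack pointer and frame pointer in each layout.
current = [30, 31, 32, 33]   # s[30:31], s32, s33
proposed = [32, 33, 34, 35]  # s32, s33, s[34:35]

if __name__ == "__main__":
    print("current: ", aligned_blocks(current))
    print("proposed:", aligned_blocks(proposed))
```

In the current layout these registers touch the blocks starting at s28
and s32; in the proposed layout they fit into the block starting at s32.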

To free s[0:3] for arguments, the SRD is moved to s[36:39]. As a
result, all of s[0:31] can be used for arguments.
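
As a rough check of the gain, the following Python sketch counts the
argument SGPRs in both layouts. It assumes, for illustration only, that
arguments are taken from SGPRs below s32 and that the reserved registers
are exactly those shown in the tables; `argument_sgprs` is a hypothetical
helper.

```python
def argument_sgprs(reserved, limit=32):
    """Return the SGPR indices below `limit` that are not reserved."""
    return [r for r in range(limit) if r not in reserved]


# Current layout: SRD in s[0:3], return address in s[30:31].
current_reserved = {0, 1, 2, 3, 30, 31}
# Proposed layout: every special register sits at s32 or above.
proposed_reserved = set(range(32, 41))

if __name__ == "__main__":
    print(len(argument_sgprs(current_reserved)))
    print(len(argument_sgprs(proposed_reserved)))
```

Under these assumptions the current layout leaves s[4:29], i.e. 26
SGPRs, for arguments, and the proposed layout leaves 32.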

The base pointer is not used in the general case, so it is moved to
s40.

Using flat scratch instead of the SRD will leave a hole between the
return address and the base pointer (if used). Alternatively, we
could swap the base pointer and the SRD, leaving a 4-wide hole in the
common case and when the SRD is in use.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D102177

Files:
  llvm/docs/AMDGPUUsage.rst
  llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
  llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
  llvm/lib/Target/AMDGPU/SIRegisterInfo.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D102177.344085.patch
Type: text/x-patch
Size: 8040 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210510/bda195a9/attachment.bin>

