[PATCH] D102177: [AMDGPU][RFC] Improve sgpr function arguments
Sebastian Neubauer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 10 09:18:49 PDT 2021
sebastian-ne created this revision.
sebastian-ne added reviewers: t-tye, arsenm, madhur13490.
Herald added subscribers: kerbowa, hiraditya, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl.
sebastian-ne requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.
The SGPR layout for function calls currently looks like this:
+------------+--------------+-------------------------+-------------------+-------------------+------------------+
| s[0:3] SRD | arguments... | s[30:31] return address | s32 stack pointer | s33 frame pointer | s34 base pointer |
+------------+--------------+-------------------------+-------------------+-------------------+------------------+
The return address and the stack pointer sit in different 4-aligned
blocks of SGPRs (s[28:31] and s[32:35]), so together they occupy two
such blocks.
Large scalar memory reads require a 4-aligned block of SGPRs, so the
fewer of these blocks are available, the harder register allocation
becomes.
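As a rough illustration of the point above, the following sketch lists
which 4-aligned blocks the reserved registers of the current layout
touch. The register numbers are read off the diagram; the helper name
aligned_blocks is hypothetical and not part of the patch.

```python
def aligned_blocks(registers, width=4):
    """Return the start indices of the width-aligned blocks touched."""
    return sorted({(r // width) * width for r in registers})


# Current layout, as shown in the diagram (SRD in s[0:3] left out here,
# since it already fills exactly one aligned block).
current_reserved = {
    "return address": [30, 31],
    "stack pointer": [32],
    "frame pointer": [33],
    "base pointer": [34],
}

current_regs = [r for regs in current_reserved.values() for r in regs]
current_blocks = aligned_blocks(current_regs)
print(current_blocks)  # [28, 32]
```

Two blocks (s[28:31] and s[32:35]) are partly taken, so neither can hold
a 4-wide scalar load result.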
The stack resource descriptor (SRD) occupies s[0:3]. If we want to pass
user-data SGPRs to a function, they need to be moved from s[0:...]
to s[4:...] before the call.
This is also the case when flat scratch is used instead of the SRD, even
though s[0:3] is unused then, because the same calling convention is used.
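To make the cost of that shift concrete, here is a minimal sketch of the
copies it implies. It is only an illustration: the function name
shift_moves and the count of six user-data registers are assumptions,
and it does not describe how the backend actually emits the moves.

```python
def shift_moves(count, offset=4):
    """List (src, dst) copies that shift s[0:count-1] up by offset.

    Copies run highest-first so that no source register is overwritten
    before it has been read.
    """
    return [(src, src + offset) for src in reversed(range(count))]


# Hypothetical example: six user-data SGPRs arriving in s[0:5].
moves = shift_moves(6)

# Simulate the copies on a register file keyed by SGPR index.
regs = {i: f"user{i}" for i in range(6)}
for src, dst in moves:
    regs[dst] = regs[src]

shifted = [regs[i] for i in range(4, 10)]
print(moves)
print(shifted)
```

Every user-data register needs one copy here, which is the overhead the
proposed layout is meant to avoid.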
To improve this, I propose the following layout:
+--------------+-------------------+-------------------+-------------------------+--------------+------------------+
| arguments... | s32 stack pointer | s33 frame pointer | s[34:35] return address | s[36:39] SRD | s40 base pointer |
+--------------+-------------------+-------------------+-------------------------+--------------+------------------+
The stack and frame pointer stay where they are and the return address
is moved behind them, into s[34:35]. That way, only one 4-aligned block
is occupied.
To free s[0:3] for arguments, the SRD is moved to s[36:39]. As a
result, all of s[0:31] can be used for arguments.
The base pointer is not used in the general case, so it is moved to
s40.
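The proposed layout can be checked with a small sketch like the one
below. The register numbers come from the diagram; the helper
aligned_blocks and the variable names are hypothetical and only serve
the illustration.

```python
def aligned_blocks(registers, width=4):
    """Return the start indices of the width-aligned blocks touched."""
    return sorted({(r // width) * width for r in registers})


# Proposed layout, as shown in the diagram.
proposed = {
    "stack pointer": [32],
    "frame pointer": [33],
    "return address": [34, 35],
    "SRD": [36, 37, 38, 39],
    "base pointer": [40],
}

# Stack pointer, frame pointer and return address now share one block.
pointer_regs = (
    proposed["stack pointer"]
    + proposed["frame pointer"]
    + proposed["return address"]
)
pointer_blocks = aligned_blocks(pointer_regs)

# None of the reserved registers fall inside s[0:31] any more.
reserved = {r for regs in proposed.values() for r in regs}
argument_regs = [r for r in range(32) if r not in reserved]

print(pointer_blocks)      # [32]
print(len(argument_regs))  # 32
```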
Using flat scratch instead of the SRD will leave a hole between the
return address and the base pointer (if used). So, alternatively, we
could swap the base pointer and the SRD, leaving a 4-wide hole in the
common case and when the SRD is in use.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D102177
Files:
llvm/docs/AMDGPUUsage.rst
llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
llvm/lib/Target/AMDGPU/SIRegisterInfo.h
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D102177.344085.patch
Type: text/x-patch
Size: 8040 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210510/bda195a9/attachment.bin>