[llvm] [WIP][AMDGPU] Change `CC_AMDGPU_Func` to only use SGPR0 to SGPR27 for `inreg` argument passing (PR #115753)
Shilei Tian via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 11 10:34:30 PST 2024
https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/115753
In `emitCSRSpillStores`, a caller-saved SGPR is required to save `exec`, which
limits us to SGPR0 through SGPR29.
Currently, we assume that one is always available; however, this isn’t always
the case, because SGPR0 to SGPR29 are also used for `inreg` argument passing.
This PR tries to fix the issue by not using all caller-saved SGPRs for `inreg`
argument passing: only SGPR0 to SGPR27 are used, so at least two SGPRs (SGPR28
and SGPR29) are always available when CSR spilling is needed.
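To make the register accounting concrete, here is a small Python sketch. It is a simplified, hypothetical model and not LLVM code: the helper names `sgpr_range` and `free_caller_saved` are invented for illustration, and it assumes (as the description states) that SGPR0 to SGPR29 are the caller-saved SGPRs that can hold the saved `exec`. It also mirrors the half-open semantics of TableGen's `!range(0, 28)`, which yields indices 0 through 27.

```python
# Simplified, hypothetical model of the SGPR accounting; this is not LLVM code.

def sgpr_range(begin, end):
    """Mimic TableGen's !range(begin, end): half-open, `end` is excluded."""
    return [f"SGPR{i}" for i in range(begin, end)]

# Caller-saved SGPRs that can hold the saved `exec`, per the description.
CALLER_SAVED = sgpr_range(0, 30)  # SGPR0-29

def free_caller_saved(arg_regs, num_inreg_args):
    """Caller-saved SGPRs left after assigning `inreg` arguments in order.

    Arguments that do not fit in `arg_regs` are not modeled here.
    """
    used = set(arg_regs[:num_inreg_args])
    return [reg for reg in CALLER_SAVED if reg not in used]

old_arg_regs = sgpr_range(0, 30)  # before this patch: SGPR0-29
new_arg_regs = sgpr_range(0, 28)  # after this patch:  SGPR0-27

# Worst case: at least as many `inreg` arguments as argument SGPRs.
print(free_caller_saved(old_arg_regs, 30))  # nothing left to save `exec`
print(free_caller_saved(new_arg_regs, 30))  # SGPR28 and SGPR29 stay free
```

In this model, the old register list can consume every caller-saved SGPR, while the new list always leaves two of them untouched regardless of how many `inreg` arguments a function takes.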
Fixes #113782.
From f49d280977a1bfb463d3ec31c4f088dca1fcdea0 Mon Sep 17 00:00:00 2001
From: Shilei Tian <i at tianshilei.me>
Date: Mon, 11 Nov 2024 13:33:59 -0500
Subject: [PATCH] [WIP][AMDGPU] Change `CC_AMDGPU_Func` to only use SGPR0 to
SGPR27 for `inreg` argument passing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In `emitCSRSpillStores`, a caller-saved SGPR is required to save `exec`, which
limits us to SGPR0 through SGPR29.
Currently, we assume that one is always available; however, this isn’t always
the case, because SGPR0 to SGPR29 are also used for `inreg` argument passing.
This PR tries to fix the issue by not using all caller-saved SGPRs for `inreg`
argument passing: only SGPR0 to SGPR27 are used, so at least two SGPRs (SGPR28
and SGPR29) are always available when CSR spilling is needed.
Fixes #113782.
---
llvm/docs/AMDGPUUsage.rst | 4 ++--
llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 5b83ea428c0bff..b17226ccde712b 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -16545,7 +16545,7 @@ On entry to a function:
:ref:`amdgpu-amdhsa-kernel-prolog-m0`.
4. The EXEC register is set to the lanes active on entry to the function.
5. MODE register: *TBD*
-6. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
+6. VGPR0-31 and SGPR4-27 are used to pass function input arguments as described
below.
7. SGPR30-31 return address (RA). The code address that the function must
return to when it completes. The value is undefined if the function is *no
@@ -16796,7 +16796,7 @@ The input and result arguments are assigned in order in the following manner:
How are overly aligned structures allocated on the stack?
* SGPR arguments are assigned to consecutive SGPRs starting at SGPR0 up to
- SGPR29.
+ SGPR27.
If there are more arguments than will fit in these registers, the remaining
arguments are allocated on the stack in order on naturally aligned
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
index 80969fce3d77fb..1787fe76a31ea5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
@@ -127,7 +127,7 @@ def CC_AMDGPU_Func : CallingConv<[
CCIfType<[i8, i16], CCIfExtend<CCPromoteToType<i32>>>,
CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16, bf16, v2bf16] , CCAssignToReg<
- !foreach(i, !range(0, 30), !cast<Register>("SGPR"#i)) // SGPR0-29
+ !foreach(i, !range(0, 28), !cast<Register>("SGPR"#i)) // SGPR0-27
>>>,
CCIfType<[i32, f32, i16, f16, v2i16, v2f16, i1, bf16, v2bf16], CCAssignToReg<