[llvm] [WIP][AMDGPU] Change `CC_AMDGPU_Func` to only use SGPR0 to SGPR27 for `inreg` argument passing (PR #115753)

Shilei Tian via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 11 10:34:30 PST 2024


https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/115753

In `emitCSRSpillStores`, a caller-saved SGPR is required to save `exec`, which
limits us to using SGPR0 through SGPR29
Currently, we assume that one is always available; however, this isn’t always
the case, as SGPR0 to SGPR29 are also used for inreg argument passing.

This PR is trying to fix this issue by not using all caller-saved SGPRs for
`inreg` argument passing. This will make sure that we will always have at least
two SGPRs available when it needs CSR spilling.

Fixes #113782.

>From f49d280977a1bfb463d3ec31c4f088dca1fcdea0 Mon Sep 17 00:00:00 2001
From: Shilei Tian <i at tianshilei.me>
Date: Mon, 11 Nov 2024 13:33:59 -0500
Subject: [PATCH] [WIP][AMDGPU] Change `CC_AMDGPU_Func` to only use SGPR0 to
 SGPR27 for `inreg` argument passing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In `emitCSRSpillStores`, a caller-saved SGPR is required to save `exec`, which
limits us to using SGPR0 through SGPR29
Currently, we assume that one is always available; however, this isn’t always
the case, as SGPR0 to SGPR29 are also used for inreg argument passing.

This PR is trying to fix this issue by not using all caller-saved SGPRs for
`inreg` argument passing. This will make sure that we will always have at least
two SGPRs available when it needs CSR spilling.

Fixes #113782.
---
 llvm/docs/AMDGPUUsage.rst                   | 4 ++--
 llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 5b83ea428c0bff..b17226ccde712b 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -16545,7 +16545,7 @@ On entry to a function:
     :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
 4.  The EXEC register is set to the lanes active on entry to the function.
 5.  MODE register: *TBD*
-6.  VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
+6.  VGPR0-31 and SGPR4-27 are used to pass function input arguments as described
     below.
 7.  SGPR30-31 return address (RA). The code address that the function must
     return to when it completes. The value is undefined if the function is *no
@@ -16796,7 +16796,7 @@ The input and result arguments are assigned in order in the following manner:
     How are overly aligned structures allocated on the stack?
 
 * SGPR arguments are assigned to consecutive SGPRs starting at SGPR0 up to
-  SGPR29.
+  SGPR27.
 
   If there are more arguments than will fit in these registers, the remaining
   arguments are allocated on the stack in order on naturally aligned
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
index 80969fce3d77fb..1787fe76a31ea5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
@@ -127,7 +127,7 @@ def CC_AMDGPU_Func : CallingConv<[
   CCIfType<[i8, i16], CCIfExtend<CCPromoteToType<i32>>>,
 
   CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16, bf16, v2bf16] , CCAssignToReg<
-    !foreach(i, !range(0, 30), !cast<Register>("SGPR"#i))  // SGPR0-29
+    !foreach(i, !range(0, 28), !cast<Register>("SGPR"#i))  // SGPR0-27
   >>>,
 
   CCIfType<[i32, f32, i16, f16, v2i16, v2f16, i1, bf16, v2bf16], CCAssignToReg<



More information about the llvm-commits mailing list