[llvm] [AMDGPU] Include unused preload kernarg in KD total SGPR count (PR #104743)

Austin Kerbow via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 19 01:08:17 PDT 2024


https://github.com/kerbowa created https://github.com/llvm/llvm-project/pull/104743

Unlike with implicitly preloaded data UserSGPRs, firmware is unable to handle cases where SGPRs for kernel arguments contain preloaded data but are not explicitly referenced in the kernel. We need to include these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation to avoid clobbering SGPRs in adjacent waves.
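
As a rough illustration of the calculation described above, the following standalone C++ sketch mirrors the shape of the change using plain integers instead of MCExpr nodes. The function names, the granule value, and the register counts in the usage note are assumptions made for illustration, not values taken from the patch; the block formula (round up to the granule, divide, subtract one) is assumed to match what GetNumGPRBlocks computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GetNumGPRBlocks: round the register count up to
// the encoding granule, divide by the granule, and subtract one.
inline uint32_t numGPRBlocks(uint32_t numGPRs, uint32_t granule) {
  assert(granule != 0 && "granule must be non-zero");
  uint32_t n = std::max<uint32_t>(1u, numGPRs); // at least one register
  return (n + granule - 1) / granule - 1;
}

// Before the patch: only the SGPR count derived from referenced registers
// feeds the block calculation.
inline uint32_t sgprBlocksBefore(uint32_t numSGPRsForWavesPerEU,
                                 uint32_t granule) {
  return numGPRBlocks(numSGPRsForWavesPerEU, granule);
}

// After the patch: take the max of that count and (user SGPRs + extra SGPRs),
// so preloaded but unreferenced SGPRs are still covered.
inline uint32_t sgprBlocksAfter(uint32_t numSGPRsForWavesPerEU,
                                uint32_t numUserSGPRs, uint32_t extraSGPRs,
                                uint32_t granule) {
  uint32_t maxUserSGPRs = numUserSGPRs + extraSGPRs;
  return numGPRBlocks(std::max(numSGPRsForWavesPerEU, maxUserSGPRs), granule);
}
```

With hypothetical inputs of 0 referenced SGPRs, 4 user SGPRs, 6 extra SGPRs and a granule of 8, sgprBlocksBefore returns 0 while sgprBlocksAfter returns 1.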

From af950b636f7880291f824b0d18b30fd077d14079 Mon Sep 17 00:00:00 2001
From: Austin Kerbow <Austin.Kerbow at amd.com>
Date: Mon, 19 Aug 2024 01:02:17 -0700
Subject: [PATCH] [AMDGPU] Include unused preload kernarg in KD total SGPR
 count

Unlike with implicitly preloaded data UserSGPRs, firmware is unable to
handle cases where SGPRs for kernel arguments contain preloaded data but
are not explicitly referenced in the kernel. We need to include these
preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation to
avoid clobbering SGPRs in adjacent waves.
---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp        | 11 +++++++++--
 .../AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll     | 14 ++++++++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index b90d245b7bd394..cfa5216c8c54b1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -970,8 +970,15 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return SubGPR;
   };
 
-  ProgInfo.SGPRBlocks = GetNumGPRBlocks(ProgInfo.NumSGPRsForWavesPerEU,
-                                        IsaInfo::getSGPREncodingGranule(&STM));
+  // Consider cases where the total number of UserSGPRs plus extra SGPRs is
+  // greater than the number of explicitly referenced SGPRs.
+  const MCExpr *MaxUserSGPRs = MCBinaryExpr::createAdd(
+      CreateExpr(MFI->getNumUserSGPRs()), ExtraSGPRs, Ctx);
+
+  ProgInfo.SGPRBlocks =
+      GetNumGPRBlocks(AMDGPUMCExpr::createMax(
+                          {ProgInfo.NumSGPRsForWavesPerEU, MaxUserSGPRs}, Ctx),
+                      IsaInfo::getSGPREncodingGranule(&STM));
   ProgInfo.VGPRBlocks = GetNumGPRBlocks(ProgInfo.NumVGPRsForWavesPerEU,
                                         IsaInfo::getVGPREncodingGranule(&STM));
 
diff --git a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
new file mode 100644
index 00000000000000..34bef81171e812
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
@@ -0,0 +1,14 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s > %t
+; RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+
+; OBJDUMP: Contents of section .rodata:
+; OBJDUMP-NEXT: 0000 00000000 00000000 10010000 00000000
+; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
+; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
+; OBJDUMP-NEXT: 0030 4000af00 94130000 1a000400 00000000
+; OBJDUMP-NOT: 0030 0000af00 94130000 1a000400 00000000
+
+; Include preloaded SGPRs that are not explicitly used in the kernel in
+; GRANULATED_WAVEFRONT_SGPR_COUNT.
+
+define amdgpu_kernel void @amdhsa_kernarg_preload_num_sgprs(i128 inreg) { ret void }

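To make the bytes checked by the new test easier to read, here is a small C++ sketch that decodes the first word at offset 0x30 of .rodata. It assumes that this word is the compute_pgm_rsrc1 field of the kernel descriptor, stored little-endian, and that GRANULATED_WAVEFRONT_SGPR_COUNT occupies bits 9:6 of it; the helper names are hypothetical and not part of the patch.

```cpp
#include <cstdint>

// Assemble a little-endian 32-bit word from four consecutive bytes.
inline uint32_t loadLE32(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Extract `width` bits starting at bit `lo` (width must be less than 32).
inline uint32_t extractBits(uint32_t word, unsigned lo, unsigned width) {
  return (word >> lo) & ((1u << width) - 1u);
}

// Assumed field position: bits 9:6 of compute_pgm_rsrc1.
inline uint32_t granulatedWavefrontSGPRCount(uint32_t computePgmRsrc1) {
  return extractBits(computePgmRsrc1, 6, 4);
}
```

Under those assumptions, the expected bytes 40 00 af 00 decode to a field value of 1, and the bytes 00 00 af 00 rejected by the OBJDUMP-NOT line decode to 0.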

