[llvm] [AMDGPU] Include unused preload kernarg in KD total SGPR count (PR #104743)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 19 01:08:50 PDT 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-amdgpu
Author: Austin Kerbow (kerbowa)
<details>
<summary>Changes</summary>
Unlike with implicitly preloaded data UserSGPRs firmware is unable to handle cases where SGPRs for kernel arguments contain prelaoded data but not are not explicitly referenced in the kernel. We need to include these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation to not clobber SGPRs in adjacent waves.
---
Full diff: https://github.com/llvm/llvm-project/pull/104743.diff
2 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+9-2)
- (added) llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll (+14)
``````````diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index b90d245b7bd394..cfa5216c8c54b1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -970,8 +970,15 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
return SubGPR;
};
- ProgInfo.SGPRBlocks = GetNumGPRBlocks(ProgInfo.NumSGPRsForWavesPerEU,
- IsaInfo::getSGPREncodingGranule(&STM));
+ // Consider cases where the total number of UserSGPRs plus extra SGPRs is
+ // greater than the number of explicitly referenced SGPRs.
+ const MCExpr *MaxUserSGPRs = MCBinaryExpr::createAdd(
+ CreateExpr(MFI->getNumUserSGPRs()), ExtraSGPRs, Ctx);
+
+ ProgInfo.SGPRBlocks =
+ GetNumGPRBlocks(AMDGPUMCExpr::createMax(
+ {ProgInfo.NumSGPRsForWavesPerEU, MaxUserSGPRs}, Ctx),
+ IsaInfo::getSGPREncodingGranule(&STM));
ProgInfo.VGPRBlocks = GetNumGPRBlocks(ProgInfo.NumVGPRsForWavesPerEU,
IsaInfo::getVGPREncodingGranule(&STM));
diff --git a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
new file mode 100644
index 00000000000000..34bef81171e812
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
@@ -0,0 +1,14 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s > %t
+; RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+
+; OBJDUMP: Contents of section .rodata:
+; OBJDUMP-NEXT: 0000 00000000 00000000 10010000 00000000
+; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
+; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
+; OBJDUMP-NEXT: 0030 4000af00 94130000 1a000400 00000000
+; OBJDUMP-NOT: 0030 0000af00 94130000 1a000400 00000000
+
+; Include preloaded SGPRs that are not explicitly used in the kernel in
+; GRANULATED_WAVEFRONT_SGPR_COUNT.
+
+define amdgpu_kernel void @amdhsa_kernarg_preload_num_sgprs(i128 inreg) { ret void }
``````````
</details>
https://github.com/llvm/llvm-project/pull/104743
More information about the llvm-commits
mailing list