[llvm] [AMDGPU] Skip register uses in AMDGPUResourceUsageAnalysis (PR #133242)

Diana Picus via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 9 05:36:35 PDT 2025


rovka wrote:

> I created some small tests to make sure this works as intended in all cases. It probably makes sense to add them here. One of them yields surprising results, which probably should be fixed.
> 
> PAL tests:
> 
> ```llvm
> ; RUN: llc -mcpu=gfx1200 -o - < %s | FileCheck %s
> ; Check that reads of a VGPR in kernels count towards the VGPR count, but in functions only writes of VGPRs count towards it.
> target triple = "amdgcn--amdpal"
> 
> @global = addrspace(1) global i32 poison, align 4
> 
> ; CHECK-LABEL: amdpal.pipelines:
> 
> ; Neither uses nor writes a VGPR, but the hardware initializes the VGPRs that the kernel receives, so they count as used.
> ; CHECK-LABEL: .entry_point_symbol: kernel_use
> ; CHECK: .vgpr_count:     0x20
> define amdgpu_cs void @kernel_use([32 x i32] %args) {
> entry:
>   %a = extractvalue [32 x i32] %args, 14
>   store i32 %a, ptr addrspace(1) @global
>   ret void
> }
> 
> ; Neither uses nor writes a VGPR
> ; CHECK-LABEL: gfx_func:
> ; CHECK: .vgpr_count:     0x20
> define amdgpu_gfx [32 x i32] @gfx_func([32 x i32] %args) {
> entry:
>   ret [32 x i32] %args
> }
> 
> ; Neither uses nor writes a VGPR
> ; CHECK-LABEL: chain_func:
> ; CHECK: .vgpr_count:     0x1
> define amdgpu_cs_chain void @chain_func([32 x i32] %args) {
> entry:
>   call void (ptr, i32, {}, [32 x i32], i32, ...) @llvm.amdgcn.cs.chain.p0.i32.s.a(
>         ptr @chain_func, i32 0, {} inreg {}, [32 x i32] %args, i32 0)
>   unreachable
> }
> ```
> 
> The (to me) surprising one is `gfx_func`: it only contains SALU instructions, so it should have no defs of VGPRs and only uses for the return. I would expect it to have `vgpr_count: 0x0` or maybe `0x1`.
> 
> This one works as expected:
> 
> ```llvm
> ; RUN: llc -mcpu=gfx1200 -o - < %s | FileCheck %s
> target triple = "amdgcn--amdpal"
> 
> declare amdgpu_gfx void @gfx_dummy([32 x i32] %args)
> 
> ; CHECK-LABEL: .entry_point_symbol: kernel_call
> ; CHECK: .vgpr_count:     0x20
> define amdgpu_cs void @kernel_call([32 x i32] %args) {
> entry:
>   call amdgpu_gfx void @gfx_dummy([32 x i32] %args)
>   ret void
> }
> ```
> 
> Carefully crafted compute test (the hardware initializes at most one VGPR, so the test needs to ensure that no VGPR is ever written by any instruction). It also works as expected (correctly marks one VGPR as used).
> 
> ```llvm
> ; RUN: llc -mcpu=gfx1200 -o - < %s | FileCheck %s
> target triple = "amdgcn-amd-amdhsa"
> 
> @global = addrspace(1) global i32 poison, align 4
> 
> ; Carefully crafted kernel that uses v0 but never writes a VGPR or reads another VGPR.
> ; Only hardware-initialized VGPRs (v0) are read in this kernel.
> 
> ; CHECK-LABEL: amdhsa.kernels:
> ; CHECK: .vgpr_count:     1
> define amdgpu_kernel void @kernel(ptr addrspace(8) %rsrc) #0 {
> entry:
>   %id = call i32 @llvm.amdgcn.workitem.id.x()
>   call void @llvm.amdgcn.raw.ptr.buffer.store.i32(i32 %id, ptr addrspace(8) %rsrc, i32 0, i32 0, i32 0)
>   ret void
> }
> 
> attributes #0 = { "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }
> ```

Hi Sebastian, thanks for looking into this!

I don't think the existing test coverage is really that bad. The testcase that you pointed out (amdgpu_gfx leaf function) is not conceptually that different from the many amdgpu_gfx testcases we have [here](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/amdpal-callable.ll), or even from [the leaf function test](https://github.com/llvm/llvm-project/pull/133242/files#diff-a2f98b42243e1ef9f9428c36e0940ae63895ef4d6502d8f4bf9cac65f8e38984) I added. What all of these have in common is that they follow the code path for leaf functions, which gets the VGPR/SGPR/AGPR usage by just checking TRI.
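
To make the difference between the two counting policies concrete, here is a toy Python model. It is not LLVM code: the `(defs, uses)` instruction shape and both function names are made up for illustration, and the only thing taken from the tests above is that `gfx_func` reads 32 VGPRs for its return value without writing any.

```python
# Toy model, not LLVM code. An instruction is a (defs, uses) pair,
# each a set of VGPR indices. All names here are hypothetical.

def count_defs_and_uses(insts):
    """Highest VGPR index touched at all, plus one (what the leaf path reports)."""
    touched = set()
    for defs, uses in insts:
        touched |= defs | uses
    return max(touched) + 1 if touched else 0

def count_defs_only(insts):
    """Highest VGPR index written, plus one (the defs-only policy)."""
    defined = set()
    for defs, _uses in insts:
        defined |= defs
    return max(defined) + 1 if defined else 0

# Shaped like gfx_func: no VGPR writes, the return reads v0..v31.
gfx_func = [(set(), set(range(32)))]

print(hex(count_defs_and_uses(gfx_func)))  # 0x20
print(hex(count_defs_only(gfx_func)))      # 0x0
```

Under these assumptions the first policy gives the `0x20` that the test currently checks for, and the second gives the `0x0` that Sebastian expected.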

I tried updating this code path to look only at register defs, so we're consistent with non-leaf functions, but that produces different results in a lot of tests, including for things like `.numbered_sgpr`, `granulated_wavefront_sgpr_count`, `wavefront_sgpr_count`, `.amdhsa_next_free_sgpr`... you get the idea. I'm a bit uncomfortable updating all of those, because I'm not sure what they're used for and whether it's actually ok for them to ignore uses. In any case, over-reporting the register usage of leaf functions is benign. It's still going to be no more than the usage of whatever function/kernel actually defines those registers, so we won't be allocating too much. Can I get away with just a comment or something? :D
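
A rough sketch of why the over-reporting should be harmless, again as a toy Python model and not LLVM code. It assumes that a function's total is the maximum of its own count and its callees' totals, and that every register a leaf reads is defined by some caller; `total_vgprs`, the call graph, and the counts are all hypothetical.

```python
# Toy model, not LLVM code; all names and numbers are hypothetical.

def total_vgprs(func, own_count, callees):
    """Total for func: max of its own count and the totals of its callees."""
    totals = [own_count[func]]
    for callee in callees.get(func, []):
        totals.append(total_vgprs(callee, own_count, callees))
    return max(totals)

callees = {"kernel": ["leaf"]}

# The kernel defines v0..v31 before the call; the leaf only reads them.
defs_only = {"kernel": 32, "leaf": 0}       # leaf counted by defs only
defs_and_uses = {"kernel": 32, "leaf": 32}  # leaf counted by defs and uses

kernel_exact = total_vgprs("kernel", defs_only, callees)
kernel_over = total_vgprs("kernel", defs_and_uses, callees)
print(kernel_exact, kernel_over)  # 32 32
```

In this model the leaf's own number changes, but the kernel's total (the one that decides the allocation) does not.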

https://github.com/llvm/llvm-project/pull/133242

