[llvm] [AMDGPU] Architected SGPRs for GFX12 (PR #76140)

Thu Jan 4 03:30:17 PST 2024

================
@@ -0,0 +1,34 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -mattr=+architected-sgprs --verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9,GFX9-SDAG %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -mattr=+architected-sgprs -global-isel --verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9,GFX9-GISEL %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx1200 --verify-machineinstrs < %s | FileCheck -check-prefixes=GFX12,GFX12-SDAG %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx1200 -global-isel --verify-machineinstrs < %s | FileCheck -check-prefixes=GFX12,GFX12-GISEL %s
+
+define amdgpu_cs void @test_wave_id(ptr addrspace(1) %out) {
+; GFX9-LABEL: test_wave_id:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_bfe_u32 s0, ttmp8, 0x50019
+; GFX9-NEXT:    v_mov_b32_e32 v2, s0
+; GFX9-NEXT:    global_store_dword v[0:1], v2, off
+; GFX9-NEXT:    s_endpgm
+;
+; GFX12-LABEL: test_wave_id:
+; GFX12:       ; %bb.0:
+; GFX12-NEXT:    s_bfe_u32 s0, ttmp8, 0x50019
+; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:    v_mov_b32_e32 v2, s0
+; GFX12-NEXT:    global_store_b32 v[0:1], v2, off
+; GFX12-NEXT:    s_nop 0
+; GFX12-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX12-NEXT:    s_endpgm
+  %waveid = call i32 @llvm.amdgcn.wave.id()
+  store i32 %waveid, ptr addrspace(1) %out
+  ret void
+}
+
----------------
arsenm wrote:

OK, I see what you mean. But that means this is routing through a lot of complexity in order to support function ABI handling. This case is much simpler and can bypass all of it. The intrinsic lowering just needs to directly copy from the TTMP register. The existing inputs are a poor analogy if we have one dedicated, constant register.
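
For reference, the checked output above amounts to a single bitfield extract from ttmp8. The sketch below models s_bfe_u32 in plain C++ to show what the 0x50019 operand selects: offset 0x19 (25) in bits [4:0] and width 5 in bits [22:16]. The function names are hypothetical, for illustration only; this is not LLVM code and not the lowering in the PR.

```cpp
#include <cstdint>

// Illustrative model of s_bfe_u32: src1[4:0] is the bit offset and
// src1[22:16] is the field width. Hypothetical helper, not an LLVM API.
static uint32_t s_bfe_u32(uint32_t src0, uint32_t src1) {
  const uint32_t offset = src1 & 0x1Fu;         // bits [4:0]
  const uint32_t width = (src1 >> 16) & 0x7Fu;  // bits [22:16]
  if (width == 0)
    return 0;
  // Avoid shifting a 32-bit value by 32 or more, which is undefined in C++.
  const uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return (src0 >> offset) & mask;
}

// What the test's check lines compute: s_bfe_u32 s0, ttmp8, 0x50019,
// i.e. the 5-bit field at ttmp8[29:25].
static uint32_t wave_id_from_ttmp8(uint32_t ttmp8) {
  return s_bfe_u32(ttmp8, 0x50019u);
}
```

With this model, wave_id_from_ttmp8(ttmp8) is equivalent to (ttmp8 >> 25) & 0x1f, which is all the intrinsic needs to produce once the source register is fixed.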

https://github.com/llvm/llvm-project/pull/76140