[llvm] [AMDGPU] Move WWM register pre-allocation to during regalloc (PR #70618)

Carl Ritson via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 31 20:08:48 PDT 2023


================
@@ -725,36 +725,36 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_div_value_agent_scope_
 ; GFX9-DPP-NEXT:    s_swappc_b64 s[30:31], s[16:17]
 ; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v1, exec_lo, 0
 ; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v1, exec_hi, v1
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, v0
+; GFX9-DPP-NEXT:    v_mov_b32_e32 v40, v0
 ; GFX9-DPP-NEXT:    s_not_b64 exec, exec
-; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v3, 1
+; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v40, 1
 ; GFX9-DPP-NEXT:    s_not_b64 exec, exec
 ; GFX9-DPP-NEXT:    s_or_saveexec_b64 s[0:1], -1
-; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v5, 1
-; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v4, 1
+; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v42, 1
+; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v41, 1
 ; GFX9-DPP-NEXT:    s_nop 0
-; GFX9-DPP-NEXT:    v_mov_b32_dpp v5, v3 row_shr:1 row_mask:0xf bank_mask:0xf
-; GFX9-DPP-NEXT:    v_add_f32_e32 v3, v3, v5
-; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v5, 1
+; GFX9-DPP-NEXT:    v_mov_b32_dpp v42, v40 row_shr:1 row_mask:0xf bank_mask:0xf
+; GFX9-DPP-NEXT:    v_add_f32_e32 v40, v40, v42
----------------
perlfu wrote:

I have addressed this by disabling the UsedPhysRegMask test in isPhysRegUsed.
Because the pass now runs after the first register rewrite (for SGPRs), clobbers from calls have already been inserted into UsedPhysRegMask, even for VGPRs.
Ignoring them seems safe because: 1) we never knew about them before; 2) they will be correctly spilled if live ranges are split.
Additionally, as far as I understand, the only thing in our code path that updates UsedPhysRegMask is VirtRegMap, so ignoring the mask for WWM register assignment is valid.
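
To illustrate the idea, here is a minimal standalone sketch, not the actual LLVM code. ToyRegInfo, ExplicitlyUsed, makeExample and the register numbers are hypothetical names invented for this example, and the SkipRegMaskTest parameter is an assumption about how the mask test could be bypassed. It only models the distinction between registers marked used via call clobber masks and registers that have real defs or uses.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical toy model; this is not LLVM's MachineRegisterInfo.
constexpr std::size_t NumRegs = 64;

struct ToyRegInfo {
  // Registers marked as clobbered by call register masks.
  std::bitset<NumRegs> UsedPhysRegMask;
  // Registers that have an explicit def or use in the function.
  std::bitset<NumRegs> ExplicitlyUsed;

  // With SkipRegMaskTest set, call clobbers alone do not make a
  // register count as used; only explicit defs/uses do.
  bool isPhysRegUsed(std::size_t Reg, bool SkipRegMaskTest = false) const {
    if (!SkipRegMaskTest && UsedPhysRegMask.test(Reg))
      return true;
    return ExplicitlyUsed.test(Reg);
  }
};

// Builds a small example: register 3 is only call-clobbered,
// register 40 has a real def.
ToyRegInfo makeExample() {
  ToyRegInfo RI;
  RI.UsedPhysRegMask.set(3);
  RI.ExplicitlyUsed.set(40);
  return RI;
}
```

In this model, a query for register 3 reports it as used by default, but as free when the mask test is skipped, which is the behaviour WWM register assignment relies on here.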

https://github.com/llvm/llvm-project/pull/70618

