[PATCH] D113784: [RFC][AMDGPU][GlobalISel] Fix RegBanks for G_CONSTANT

Fri Nov 12 10:38:07 PST 2021

mbrkusanin added a comment.

This is another attempt at trying to get of cross reg bank copies for which I needed manual checks when matching (like in https://reviews.llvm.org/D110937, https://reviews.llvm.org/D98040)

I'm wondering if this would be considered more of a hack then an appropriate way of getting rid of these annoying copies.

We get decreased instruction count in a lot of test.

I'll highlight below (inline) cases that I noticed where instruction count actually increases.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.i16.ll:2101
 ; GFX10-NEXT:    v_cmp_eq_u32_e64 s1, 3, v6
+; GFX10-NEXT:    s_mov_b32 null, 0
 ; GFX10-NEXT:    v_cmp_eq_u32_e64 s2, 0, v6
----------------
This is because of SMEMtoVectorWriteHazards on gfx10. Inserted by GCNHazardRecognizer::fixSMEMtoVectorWriteHazards.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll:74
+; GFX1030-NEXT:    v_lshl_or_b32 v7, v9, 16, v8
+; GFX1030-NEXT:    image_bvh_intersect_ray v[0:3], [v0, v1, v2, v3, v4, v6, v5, v7], s[0:3] a16
+; GFX1030-NEXT:    s_waitcnt vmcnt(0)
----------------
Longer code due to a vgpr (v5) used for const.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll:33-35
+; GFX10-NEXT:    v_mov_b32_e32 v2, 0xff
+; GFX10-NEXT:    v_and_b32_e32 v1, v1, v2
+; GFX10-NEXT:    v_and_b32_e32 v0, v0, v2
----------------
SIFoldOperands won't inline this constant because it has multiple uses. It does this because this might increase code size. Previously this were two different constants (two different instructions) but they had same value. SDag also does not inline const but uses sgpr.

================
Comment at: llvm/test/CodeGen/AMDGPU/cttz.ll:973-981
+; GFX10-GISEL-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX10-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX10-GISEL-NEXT:    global_load_dword v0, v0, s[2:3]
 ; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10-GISEL-NEXT:    v_ffbl_b32_e32 v1, v0
 ; GFX10-GISEL-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
 ; GFX10-GISEL-NEXT:    v_min_u32_e32 v1, 32, v1
----------------
Extra vgpr used here (now 3 was 2)
Previously we had "S_MOV_B32 0" and "V_MOV_B32_e32 0". Now both are V_MOV*.
CSE will combine them and keep first one (earlier one).
SIFoldOperands inlines first use (the one that was originally sgpr) leaving us with less optimal scheduling the what we originally had.
The only "V_MOV_B32_e32 0" now need to be allocated to a v2 because const is def'd earlier than it really needs to.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113784/new/

https://reviews.llvm.org/D113784