[llvm-branch-commits] [llvm] [AMDGPU] Allow 16-bit imm folding in real true16 (PR #173318)

Tue Jan 13 11:20:35 PST 2026

================
@@ -19,27 +19,16 @@ define amdgpu_kernel void @store_inline_imm_neg_0.0_i16(ptr addrspace(1) %out) {
 ; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0 ; encoding: [0x00,0x00,0xfd,0xbb]
 ; GFX10-NEXT:    s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
 ;
-; GFX11-TRUE16-LABEL: store_inline_imm_neg_0.0_i16:
-; GFX11-TRUE16:       ; %bb.0:
-; GFX11-TRUE16-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0 ; encoding: [0x02,0x00,0x04,0xf4,0x00,0x00,0x00,0xf8]
-; GFX11-TRUE16-NEXT:    v_mov_b16_e32 v0.l, 0x8000 ; encoding: [0xff,0x38,0x00,0x7e,0x00,0x80,0xff,0xff]
-; GFX11-TRUE16-NEXT:    s_mov_b32 s3, 0x31016000 ; encoding: [0xff,0x00,0x83,0xbe,0x00,0x60,0x01,0x31]
-; GFX11-TRUE16-NEXT:    s_mov_b32 s2, -1 ; encoding: [0xc1,0x00,0x82,0xbe]
-; GFX11-TRUE16-NEXT:    s_waitcnt lgkmcnt(0) ; encoding: [0x07,0xfc,0x89,0xbf]
-; GFX11-TRUE16-NEXT:    buffer_store_b16 v0, off, s[0:3], 0 dlc ; encoding: [0x00,0x20,0x64,0xe0,0x00,0x00,0x00,0x80]
-; GFX11-TRUE16-NEXT:    s_waitcnt_vscnt null, 0x0 ; encoding: [0x00,0x00,0x7c,0xbc]
-; GFX11-TRUE16-NEXT:    s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
-;
-; GFX11-FAKE16-LABEL: store_inline_imm_neg_0.0_i16:
-; GFX11-FAKE16:       ; %bb.0:
-; GFX11-FAKE16-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0 ; encoding: [0x02,0x00,0x04,0xf4,0x00,0x00,0x00,0xf8]
-; GFX11-FAKE16-NEXT:    v_mov_b32_e32 v0, 0xffff8000 ; encoding: [0xff,0x02,0x00,0x7e,0x00,0x80,0xff,0xff]
-; GFX11-FAKE16-NEXT:    s_mov_b32 s3, 0x31016000 ; encoding: [0xff,0x00,0x83,0xbe,0x00,0x60,0x01,0x31]
-; GFX11-FAKE16-NEXT:    s_mov_b32 s2, -1 ; encoding: [0xc1,0x00,0x82,0xbe]
-; GFX11-FAKE16-NEXT:    s_waitcnt lgkmcnt(0) ; encoding: [0x07,0xfc,0x89,0xbf]
-; GFX11-FAKE16-NEXT:    buffer_store_b16 v0, off, s[0:3], 0 dlc ; encoding: [0x00,0x20,0x64,0xe0,0x00,0x00,0x00,0x80]
-; GFX11-FAKE16-NEXT:    s_waitcnt_vscnt null, 0x0 ; encoding: [0x00,0x00,0x7c,0xbc]
-; GFX11-FAKE16-NEXT:    s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
+; GFX11-LABEL: store_inline_imm_neg_0.0_i16:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0 ; encoding: [0x02,0x00,0x04,0xf4,0x00,0x00,0x00,0xf8]
+; GFX11-NEXT:    v_mov_b32_e32 v0, 0xffff8000 ; encoding: [0xff,0x02,0x00,0x7e,0x00,0x80,0xff,0xff]
----------------
rampitec wrote:

This is one of the few regressions, yes. It only happens with `-mattr=-flat-for-global` though. The reason is that data argument of the `BUFFER_STORE_SHORT_OFFSET` is `VGPR_32`, so peephole optimizer promoted it:
```
  %20:vgpr_32 = V_MOV_B32_e32 -32768, implicit $exec
  BUFFER_STORE_SHORT_OFFSET killed %20:vgpr_32, killed %18:sgpr_128, 0, 0, 0, 0, implicit $exec :: (volatile store (s16) into %ir.out.load, addrspace 1)
```
The `GLOBAL_STORE_SHORT_SADDR_t16` takes `VGPR_16` though and it does not happen:
```
  %15:vgpr_16 = V_MOV_B16_t16_e64 0, -32768, 0, implicit $exec
  GLOBAL_STORE_SHORT_SADDR_t16 killed %14:vgpr_32, killed %15:vgpr_16, killed %13:sreg_64_xexec_xnull, 0, 0, implicit $exec :: (volatile store (s16) into %ir.out.load, addrspace 1)
```
SO, I guess we need `BUFFER_STORE_SHORT_OFFSET_t16` taking a 16-bit operand.

https://github.com/llvm/llvm-project/pull/173318