<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/123728>123728</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AMDGPU][GISel] masking instructions (`and`) not necessary
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
qcolombet
</td>
</tr>
</table>
<pre>
When comparing SDISel with GISel, I stumbled on a case where GISel keeps a bunch of `and` instructions that mask the values down to an `i8` type.
These masks are actually useless because the value is fed to a `truncstore i8`, which does the masking itself.
SDISel removes all the redundant bit logic but GISel doesn't.
I haven't dug into the details, so I don't know exactly how the simplification is implemented in SDISel. It is certainly done by some combines, but I don't know which ones.
Note that this is likely an issue for all targets, so a generic combiner helper would likely be welcome.
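To illustrate why the masks are redundant, here is a minimal Python sketch of the bit-level identity. It only models the arithmetic and is not LLVM code; the function names `truncstore_i8`, `store_with_mask`, and `store_without_mask` are hypothetical and chosen for illustration.

```python
def truncstore_i8(value: int) -> int:
    """Model a truncating i8 store: only the low 8 bits reach memory."""
    return value & 0xFF


def store_with_mask(value: int) -> int:
    """What GISel currently emits: an explicit `and 0xff` before the store."""
    masked = value & 0xFF
    return truncstore_i8(masked)


def store_without_mask(value: int) -> int:
    """What SDISel emits: the store alone performs the truncation."""
    return truncstore_i8(value)


# The byte that lands in memory is identical for every input,
# so the explicit mask can be dropped.
for v in (0, 1, 0x7F, 0xFF, 0x100, 0x12345678, -1, -129):
    assert store_with_mask(v) == store_without_mask(v)
```

The same reasoning applies to any bit logic that only changes bits above bit 7 of a value whose sole use is a byte-sized truncating store.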
# To Reproduce #
Download the attached IR or copy/paste the LLVM IR input from the section below.
[repro.ll.txt](https://github.com/user-attachments/files/18489129/repro.ll.txt)
Then run:
```bash
llc -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa -global-isel=<0|1> repro.ll -o -
```
# Results #
GISel ends up with a bunch of bit manipulation operations, in particular `and`s, whereas SDISel doesn't.
With GISel:
```asm
s_load_dwordx2 s[0:1], s[4:5], 0x24
v_mov_b32_e32 v2, 8
v_mov_b32_e32 v0, 0xff
s_waitcnt lgkmcnt(0)
s_lshr_b32 s2, s0, 16
s_lshr_b32 s3, s1, 16
v_cvt_f32_f16_e32 v1, s2
v_cvt_f32_f16_e32 v3, s0
v_cvt_f32_f16_e32 v4, s1
v_cvt_f32_f16_e32 v5, s3
v_cvt_i32_f32_e32 v1, v1
v_cvt_i32_f32_e32 v3, v3
v_cvt_i32_f32_e32 v4, v4
v_cvt_i32_f32_e32 v5, v5
v_and_b32_e32 v6, 0xff, v1 <-- the AND I'm talking about
v_lshlrev_b32_sdwa v1, v2, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
v_and_or_b32 v0, v3, v0, v1
v_and_b32_e32 v1, 0xff, v4 <-- here too
v_and_b32_e32 v4, 0xff, v5 <-- here too
v_lshlrev_b32_e32 v1, 16, v1
v_lshlrev_b32_e32 v4, 24, v4
v_or3_b32 v4, v0, v1, v4
v_lshlrev_b16_e32 v0, 8, v6
v_or_b32_sdwa v0, v3, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_lshrrev_b16_e32 v3, 8, v0
v_mov_b64_e32 v[0:1], 0
global_store_byte v[0:1], v4, off
v_mov_b64_e32 v[0:1], 1
global_store_byte v[0:1], v3, off
v_mov_b64_e32 v[0:1], 2
v_lshrrev_b16_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
global_store_byte_d16_hi v[0:1], v4, off
v_mov_b64_e32 v[0:1], 3
global_store_byte v[0:1], v2, off
```
With SDISel:
```asm
s_load_dwordx2 s[0:1], s[4:5], 0x24
v_mov_b64_e32 v[0:1], 2
s_waitcnt lgkmcnt(0)
v_cvt_i16_f16_e32 v3, s1
s_lshr_b32 s3, s1, 16
v_cvt_i16_f16_e32 v2, s0
global_store_byte v[0:1], v3, off
v_mov_b64_e32 v[0:1], 0
s_lshr_b32 s2, s0, 16
v_cvt_i16_f16_e32 v5, s3
global_store_byte v[0:1], v2, off
v_mov_b64_e32 v[0:1], 3
v_cvt_i16_f16_e32 v4, s2
global_store_byte v[0:1], v5, off
v_mov_b64_e32 v[0:1], 1
global_store_byte v[0:1], v4, off
```
# Note #
Input IR:
```llvm
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"
define amdgpu_kernel void @foo(<4 x half> %i35) {
bb:
%i90 = fptosi <4 x half> %i35 to <4 x i8>
store <4 x i8> %i90, ptr addrspace(1) null, align 1
ret void
}
```
</pre>