<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/147863>147863</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[LLVM] Suboptimal code generated for rounding right shifts on NEON, AArch64 SIMD, LSX, and RISC-V V
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
johnplatts
</td>
</tr>
</table>
<pre>
Here is a link to a snippet that generates suboptimal code on NEON, AArch64 SIMD, LSX, and RISC-V V:
https://alive2.llvm.org/ce/z/V82FtF
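Alive2 reports that the transformation of `(a >> b) + ((b == 0) ? 0 : ((a >> (b - 1)) & 1))` to `(b == 0) ? a : ((a >> (b - 1)) - (a >> b))` is correct. Both forms compute a rounding right shift of `a` by `b`: with `x = a >> (b - 1)`, each one evaluates to `(x >> 1) + (x & 1)`.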
The snippet linked above can be further optimized to the following on ARMv7 NEON (arm-linux-gnueabihf):
```
src1: @ @src1
vneg.s32 q9, q1
vrshl.u32 q0, q0, q9
mov pc, lr
tgt1: @ @tgt1
vneg.s32 q9, q1
vrshl.u32 q0, q0, q9
mov pc, lr
src2: @ @src2
vneg.s32 q9, q1
vrshl.s32 q0, q0, q9
mov pc, lr
tgt2: @ @tgt2
vneg.s32 q9, q1
vrshl.s32 q0, q0, q9
mov pc, lr
```
The snippet linked above can be further optimized to the following on AArch64:
```
src1: // @src1
neg v1.4s, v1.4s
urshl v0.4s, v0.4s, v1.4s
ret
tgt1: // @tgt1
neg v1.4s, v1.4s
urshl v0.4s, v0.4s, v1.4s
ret
src2: // @src2
neg v1.4s, v1.4s
srshl v0.4s, v0.4s, v1.4s
ret
tgt2: // @tgt2
neg v1.4s, v1.4s
srshl v0.4s, v0.4s, v1.4s
ret
```
The snippet linked above can be further optimized to the following on LoongArch64 with LSX:
```
src1: # @src1
vsrlr.w $vr0, $vr0, $vr1
ret
tgt1: # @tgt1
vsrlr.w $vr0, $vr0, $vr1
ret
src2: # @src2
vsrar.w $vr0, $vr0, $vr1
ret
tgt2: # @tgt2
vsrar.w $vr0, $vr0, $vr1
ret
```
The snippet linked above can be further optimized to the following on 64-bit RISC-V with the "V" extension:
```
src1: # @src1
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssrl.vv v8, v8, v9
ret
tgt1: # @tgt1
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssrl.vv v8, v8, v9
ret
src2: # @src2
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssra.vv v8, v8, v9
ret
tgt2: # @tgt2
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssra.vv v8, v8, v9
ret
```
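For reference, the scalar identity behind the Alive2 result can be spot-checked with a small script. This is only a sketch, not a proof and not part of the linked snippet: the names `rshr_src`, `rshr_tgt`, and `all_lanes_agree` are made up for illustration, shift amounts are assumed to lie in 0..31, and Python's `>>` on negative integers is used as a stand-in for an arithmetic shift (non-negative inputs model the logical shift).
```python
import random

def rshr_src(a, b):
    # Source pattern: truncating shift plus the last bit shifted out.
    return (a >> b) + (0 if b == 0 else (a >> (b - 1)) & 1)

def rshr_tgt(a, b):
    # Target pattern: with x = a >> (b - 1), x - (x >> 1) equals ceil(x / 2).
    return a if b == 0 else (a >> (b - 1)) - (a >> b)

def all_lanes_agree(values, shifts=range(32)):
    # True when both patterns match for every value and shift amount.
    return all(rshr_src(a, b) == rshr_tgt(a, b) for a in values for b in shifts)

rng = random.Random(0)  # fixed seed so the check is reproducible
# Unsigned 32-bit lane values (models lshr): edge cases plus random samples.
U32 = [0, 1, 2**31, 2**32 - 1] + [rng.randrange(2**32) for _ in range(1000)]
# Signed 32-bit lane values (models ashr): edge cases plus random samples.
I32 = [-2**31, -1, 0, 1, 2**31 - 1] + [rng.randrange(-2**31, 2**31) for _ in range(1000)]

print(all_lanes_agree(U32), all_lanes_agree(I32))
```
Random sampling only complements the Alive2 verification, which covers the full input space for the vector types in the linked snippet.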
</pre>