<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/112925>112925</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
LLVM fails to optimize right shift by constant+saturating narrow to single narrowing right shift on AArch64
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
johnplatts
</td>
</tr>
</table>
<pre>
LLVM fails to fold a right shift by a constant followed by a saturating narrow into a single saturating narrowing right shift instruction on AArch64, as in the following functions:
```
define dso_local noundef <4 x i16> @NarrowShrI32By5(<4 x i32> noundef %0) #0 {
  %2 = ashr <4 x i32> %0, <i32 5, i32 5, i32 5, i32 5>
  %3 = tail call noundef <4 x i16> @llvm.aarch64.neon.sqxtn.v4i16(<4 x i32> %2)
  ret <4 x i16> %3
}
declare <4 x i16> @llvm.aarch64.neon.sqxtn.v4i16(<4 x i32>) #1
define dso_local noundef <4 x i16> @NarrowShrU32By5(<4 x i32> noundef %0) #0 {
  %2 = lshr <4 x i32> %0, <i32 5, i32 5, i32 5, i32 5>
  %3 = tail call noundef <4 x i16> @llvm.aarch64.neon.uqxtn.v4i16(<4 x i32> %2)
  ret <4 x i16> %3
}
declare <4 x i16> @llvm.aarch64.neon.uqxtn.v4i16(<4 x i32>) #1
define dso_local noundef <4 x i16> @NarrowShrI32By5ToU16(<4 x i32> noundef %0) #0 {
  %2 = lshr <4 x i32> %0, <i32 5, i32 5, i32 5, i32 5>
  %3 = tail call noundef <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32> %2)
  ret <4 x i16> %3
}
declare <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32>) #1
attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+fp-armv8,+neon,+outline-atomics,+v8a,-fmv" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
```
Here is the assembly that is currently generated when the above code is compiled with llc:
```
NarrowShrI32By5:                        // @NarrowShrI32By5
  sshr v0.4s, v0.4s, #5
  sqxtn v0.4h, v0.4s
  ret
NarrowShrU32By5:                        // @NarrowShrU32By5
  ushr v0.4s, v0.4s, #5
  uqxtn v0.4h, v0.4s
  ret
NarrowShrI32By5ToU16:                   // @NarrowShrI32By5ToU16
  ushr v0.4s, v0.4s, #5
  sqxtun v0.4h, v0.4s
  ret
```
The snippet above can be found at https://godbolt.org/z/jq3zMKPz1.
Each function could instead be lowered to a single saturating narrowing shift:
```
NarrowShrI32By5:
  sqshrn v0.4h, v0.4s, #5
  ret
NarrowShrU32By5:
  uqshrn v0.4h, v0.4s, #5
  ret
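NarrowShrI32By5ToU16:
  // The IR uses lshr, so the shifted value is non-negative and sqxtun
  // saturates like uqxtn; sqshrun would only match an ashr input.
  uqshrn v0.4h, v0.4s, #5
  ret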
```
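The folds can be sanity-checked with a scalar model of a single 32-bit lane. The sketch below is an illustration under stated assumptions, not LLVM code: the Python helper names are hypothetical and only mirror the IR operations and AArch64 mnemonics, and the non-rounding narrowing shifts are modelled as "shift, then saturate". It also shows that a logical shift followed by sqxtun behaves like uqshrn, whereas sqshrun corresponds to an arithmetic shift followed by sqxtun.
```python
# Scalar model of one 32-bit lane narrowed to 16 bits.
# All helper names are hypothetical; they mirror the mnemonics for readability.

MASK32 = 0xFFFFFFFF


def to_signed32(bits):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


# Separate shift steps: each returns the 32-bit register pattern.
def ashr(bits, shift):
    return (to_signed32(bits) >> shift) & MASK32


def lshr(bits, shift):
    return (bits & MASK32) >> shift


# Separate saturating narrows: each reads a 32-bit register pattern.
def sqxtn(bits):
    return clamp(to_signed32(bits), -32768, 32767)


def uqxtn(bits):
    return clamp(bits & MASK32, 0, 65535)


def sqxtun(bits):
    # Signed input, unsigned saturated result.
    return clamp(to_signed32(bits), 0, 65535)


# Fused saturating narrowing shifts (non-rounding variants).
def sqshrn(bits, shift):
    return clamp(to_signed32(bits) >> shift, -32768, 32767)


def uqshrn(bits, shift):
    return clamp((bits & MASK32) >> shift, 0, 65535)


def sqshrun(bits, shift):
    return clamp(to_signed32(bits) >> shift, 0, 65535)


# Boundary-heavy sample of 32-bit input patterns.
SAMPLES = [
    0x00000000, 0x00000001, 0x0000001F, 0x00000020,
    0x000FFFE0, 0x00100000, 0x001FFFE0, 0x00200000,
    0x7FFFFFFF, 0x80000000, 0xFFEFFFFF, 0xFFF00000,
    0xFFFFFFE0, 0xFFFFFFFF,
]


def check_folds(shift=5):
    """Return True if every two-step form matches its fused form on SAMPLES."""
    for bits in SAMPLES:
        assert sqxtn(ashr(bits, shift)) == sqshrn(bits, shift)
        assert uqxtn(lshr(bits, shift)) == uqshrn(bits, shift)
        # lshr clears the sign bit, so sqxtun sees a non-negative value.
        assert sqxtun(lshr(bits, shift)) == uqshrn(bits, shift)
        # sqshrun is the fused form of ashr followed by sqxtun.
        assert sqxtun(ashr(bits, shift)) == sqshrun(bits, shift)
    return True


if __name__ == "__main__":
    print(check_folds(5))
```
Calling check_folds with any shift from 1 to 16 exercises the same comparisons; the model checks sample values only and is not a proof over all inputs.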
</pre>