<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/68466">68466</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Saturating truncation produces extra instructions
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
calebzulawski
</td>
</tr>
</table>
<pre>
See: https://llvm.godbolt.org/z/4KdejfEsG
The following two functions:
```
declare <4 x i16> @llvm.smax.v4i16(<4 x i16>, <4 x i16>)
declare <4 x i16> @llvm.smin.v4i16(<4 x i16>, <4 x i16>)
declare <8 x i16> @llvm.smax.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.smin.v8i16(<8 x i16>, <8 x i16>)
define <4 x i8> @saturate4(<4 x i16> %x) {
  %1 = tail call <4 x i16> @llvm.smax.v4i16(<4 x i16> %x, <4 x i16> zeroinitializer)
  %2 = tail call <4 x i16> @llvm.smin.v4i16(<4 x i16> %1, <4 x i16> <i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <4 x i16> %2 to <4 x i8>
  ret <4 x i8> %3
}
define <8 x i8> @saturate8(<8 x i16> %x) {
  %1 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %x, <8 x i16> zeroinitializer)
  %2 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %1, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <8 x i16> %2 to <8 x i8>
  ret <8 x i8> %3
}
```
produce the following x86-64 assembly:
```
.LCPI0_0:
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .zero   2
        .zero   2
        .zero   2
        .zero   2
saturate4:                              # @saturate4
        pxor    xmm1, xmm1
        pmaxsw  xmm0, xmm1
        pminsw  xmm0, xmmword ptr [rip + .LCPI0_0]
        packuswb        xmm0, xmm0
        ret
saturate8:                              # @saturate8
        packuswb        xmm0, xmm0
        ret
```
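The `saturate4` function emits a redundant `pxor`/`pmaxsw`/`pminsw` sequence before the `packuswb`. Since `packuswb` already saturates each signed 16-bit lane to the unsigned 8-bit range [0, 255], `saturate4` could lower to the same single `packuswb` that `saturate8` gets. I believe the `trunc` followed by `shufflevector` is being optimized before the saturating truncation can be detected.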
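To make the redundancy concrete, here is a minimal scalar sketch of a single lane, written in Python only as a reference model. The function names `clamp_then_trunc` and `packuswb_lane` are hypothetical and do not come from LLVM. The first mirrors the smax/smin/trunc sequence in the IR above, the second models what `packuswb` does to one signed 16-bit input lane.

```python
# Scalar model of one 16-bit lane. Both function names are illustrative only.

def clamp_then_trunc(x: int) -> int:
    """Mirror the IR: smax with 0, smin with 255, then trunc i16 to i8."""
    clamped = min(max(x, 0), 255)
    return clamped & 0xFF  # trunc keeps the low 8 bits


def packuswb_lane(x: int) -> int:
    """Pack a signed 16-bit value into an unsigned byte with saturation."""
    if x < 0:
        return 0
    if x > 255:
        return 255
    return x


# Exhaustively compare every signed 16-bit input value.
mismatches = [
    x for x in range(-32768, 32768)
    if clamp_then_trunc(x) != packuswb_lane(x)
]
print(len(mismatches))
```

Under this model the two functions agree on every input, which is why the explicit min/max ahead of `packuswb` in `saturate4` does no additional work.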
Discovered in https://github.com/rust-lang/portable-simd/issues/369#issuecomment-1751589313
</pre>