<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/68466>68466</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Saturating truncation produces extra instructions
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          calebzulawski
      </td>
    </tr>
</table>

<pre>
See: https://llvm.godbolt.org/z/4KdejfEsG

The following two functions:

```
declare <4 x i16> @llvm.smax.v4i16(<4 x i16>, <4 x i16>)
declare <4 x i16> @llvm.smin.v4i16(<4 x i16>, <4 x i16>)
declare <8 x i16> @llvm.smax.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.smin.v8i16(<8 x i16>, <8 x i16>)

define <4 x i8> @saturate4(<4 x i16> %x) {
  %1 = tail call <4 x i16> @llvm.smax.v4i16(<4 x i16> %x, <4 x i16> zeroinitializer)
  %2 = tail call <4 x i16> @llvm.smin.v4i16(<4 x i16> %1, <4 x i16> <i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <4 x i16> %2 to <4 x i8>
  ret <4 x i8> %3
}

define <8 x i8> @saturate8(<8 x i16> %x) {
  %1 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %x, <8 x i16> zeroinitializer)
  %2 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %1, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <8 x i16> %2 to <8 x i8>
  ret <8 x i8> %3
}
```

produce the following:

```
.LCPI0_0:
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .zero   2
        .zero   2
        .zero   2
        .zero   2
saturate4:                              # @saturate4
        pxor    xmm1, xmm1
        pmaxsw  xmm0, xmm1
        pminsw  xmm0, xmmword ptr [rip + .LCPI0_0]
        packuswb        xmm0, xmm0
        ret
saturate8:                              # @saturate8
        packuswb        xmm0, xmm0
        ret
```

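The `saturate4` function emits a redundant `pmaxsw`/`pminsw` pair (plus a constant-pool load for the 255 splat) before `packuswb`, whereas `saturate8` lowers to a single `packuswb`. I believe the `trunc` followed by a `shufflevector` is being optimized before the saturating truncation pattern can be detected.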
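Why the min/max is redundant can be checked with a scalar model of a single lane. The sketch below is illustrative only: the function names are hypothetical and not part of LLVM, and `packuswb_lane` models only the per-lane signed-word to unsigned-byte saturation of `packuswb`, not how the instruction interleaves its two source registers. It compares the IR's `smax(x, 0)` / `smin(.., 255)` / `trunc` sequence against that saturation for every i16 input.

```python
# Scalar per-lane models, assuming i16 inputs in [-32768, 32767].

def clamp_then_trunc(x: int) -> int:
    """IR semantics: smax(x, 0), then smin(.., 255), then trunc i16 -> i8 (read as unsigned)."""
    lo = max(x, 0)       # llvm.smax with zeroinitializer
    hi = min(lo, 255)    # llvm.smin with the 255 splat
    return hi & 0xFF     # trunc keeps the low 8 bits


def packuswb_lane(x: int) -> int:
    """Per-lane packuswb: signed 16-bit word -> unsigned 8-bit byte with saturation."""
    if x < 0:
        return 0
    if x > 255:
        return 255
    return x


def count_mismatches() -> int:
    """Exhaustively compare both models over every i16 value."""
    return sum(
        1
        for x in range(-32768, 32768)
        if clamp_then_trunc(x) != packuswb_lane(x)
    )


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

Zero mismatches would mean `packuswb` on its own already performs the clamp, so the extra `pmaxsw`/`pminsw` in `saturate4` does no useful work.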

Discovered in https://github.com/rust-lang/portable-simd/issues/369#issuecomment-1751589313
</pre>