<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/55758">55758</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Incorrect optimization of _mm_xor_si128() to no-op under -ffast-math
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          achurch
      </td>
    </tr>
</table>

<pre>
    Given the following C function using x86 SSE intrinsics:

```c
#include <xmmintrin.h>
#include <emmintrin.h>
void f(const float *C, float *d, float *e) {
    const __m128i sign_1010 = _mm_set1_epi64x(1ull<<63);
    const __m128 C_0 = _mm_load_ps(C);
    const __m128 d_0 = _mm_load_ps(d);
    const __m128 e_0 = _mm_load_ps(e);
    const __m128 e_2 = (__m128)_mm_xor_si128(
        sign_1010, (__m128i)_mm_shuffle_ps(e_0, e_0, _MM_SHUFFLE(1,0,3,2)));
    const __m128 sub = _mm_sub_ps(d_0, e_2);
    const __m128 add = _mm_add_ps(d_0, e_2);
    const __m128 C02 = (__m128)_mm_xor_si128(
        sign_1010, (__m128i)_mm_shuffle_ps(C_0, C_0, _MM_SHUFFLE(2,2,0,0)));
    const __m128 C13 = _mm_shuffle_ps(C_0, C_0, _MM_SHUFFLE(3,3,1,1));
    const __m128 mix = _mm_add_ps(
        _mm_mul_ps(C13, sub),
        _mm_mul_ps(C02, _mm_shuffle_ps(sub, sub, _MM_SHUFFLE(2,3,0,1))));
    const __m128 e_temp = _mm_sub_ps(add, mix);
    const __m128 e_out = (__m128)_mm_xor_si128(
        sign_1010, (__m128i)_mm_shuffle_ps(e_temp, e_temp, _MM_SHUFFLE(1,0,3,2)));
    _mm_store_ps(d, _mm_add_ps(add, mix));
    _mm_store_ps(e, e_out);
}
```

compiling under Clang 11.0.0 or later with `-O -ffast-math` causes the third call to `_mm_xor_si128()` to be incorrectly optimized out, as though `sign_1010` were a constant zero value:

```
0000000000000000 <f>:
// ...
  3f:   0f 28 c4                movaps %xmm4,%xmm0
  42:   0f 5c c1                subps  %xmm1,%xmm0
  45:   66 0f 70 c0 4e          pshufd $0x4e,%xmm0,%xmm0
// pxor/xorps missing here!
  4a:   0f 58 e1                addps  %xmm1,%xmm4
  4d:   0f 29 26                movaps %xmm4,(%rsi)
  50:   66 0f 7f 02             movdqa %xmm0,(%rdx)
  54:   c3                      ret    
```

This is presumably caused by the compiler improperly treating the sign-bit constant (`1u<<31` in each odd 32-bit lane, i.e. the upper half of each `1ull<<63` element of `sign_1010`) as floating-point `-0.0` and optimizing it to `+0.0`; the explicit casts to and from `__m128i` are intended to suppress this optimization (and were introduced to solve the same problem when it appeared in GCC 8).
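
To illustrate why folding the constant away is wrong, here is a minimal scalar sketch in portable C (no intrinsics). The helper names `bits_to_float`, `float_to_bits` and `flip_sign` are hypothetical and do not appear in the original code. It shows that the bit pattern `1u<<31` is what `-0.0f` looks like in memory and compares equal to `+0.0f` as a float, yet XORing it into another float as an integer negates that float:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing issues). */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Integer XOR with the sign bit: this negates x, so it is not a no-op,
 * even though the mask itself reads as -0.0 when viewed as a float. */
float flip_sign(float x) {
    return bits_to_float(float_to_bits(x) ^ (UINT32_C(1) << 31));
}
```

So a transformation that is arguably acceptable for a floating-point operand under `-ffast-math` (ignoring the sign of zero) is not valid once the value is used as an operand of an integer `xor`.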

Compiling without `-ffast-math` correctly includes the xor operation (though placing it before the `pshufd`, presumably because the backend determines [correctly] that the result is the same regardless of instruction order).
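
The reordering is harmless because the mask holds the same value in both 64-bit elements, so swapping the halves before or after the xor gives the same bits. A small portable model of that claim, assuming a hypothetical `vec2x64` struct as a stand-in for `__m128i` (the helper names are illustrative only):

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register viewed as two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } vec2x64;

vec2x64 make_vec(uint64_t lo, uint64_t hi) {
    vec2x64 v = { { lo, hi } };
    return v;
}

/* Models pshufd $0x4e (or _MM_SHUFFLE(1,0,3,2)): swap the two 64-bit halves. */
vec2x64 swap_halves(vec2x64 v) {
    return make_vec(v.lane[1], v.lane[0]);
}

/* Models pxor with a mask that is identical in both lanes. */
vec2x64 xor_uniform(vec2x64 v, uint64_t mask) {
    return make_vec(v.lane[0] ^ mask, v.lane[1] ^ mask);
}

/* Returns 1 if xor-then-swap and swap-then-xor produce the same result. */
int orders_agree(vec2x64 v, uint64_t mask) {
    vec2x64 a = swap_halves(xor_uniform(v, mask));
    vec2x64 b = xor_uniform(swap_halves(v), mask);
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}
```

With `mask = 1ull<<63`, as in `sign_1010`, both orders agree for any input, which is consistent with the backend being free to move the xor but not to drop it.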

The LLVM bitcode does include the xor instruction (and in fact, the only difference when using `-ffast-math` is that `fadd`/`fsub`/`fmul` instructions get the `fast` flag):

```
; Function Attrs: nofree noinline norecurse nounwind uwtable willreturn
define void @f(float* nocapture readonly %0, float* nocapture %1, float* nocapture %2) local_unnamed_addr #0 {
; ...
  %23 = fadd fast <4 x float> %22, %20
  %24 = fsub fast <4 x float> %14, %23
  %25 = bitcast <4 x float> %24 to <2 x i64>
  %26 = xor <2 x i64> %25, <i64 -9223372036854775808, i64 -9223372036854775808>
  %27 = shufflevector <2 x i64> %26, <2 x i64> undef, <2 x i32> <i32 1, i32 0>
  %28 = fadd fast <4 x float> %23, %14
  store <4 x float> %28, <4 x float>* %6, align 16, !tbaa !3
  %29 = bitcast float* %2 to <2 x i64>*
  store <2 x i64> %27, <2 x i64>* %29, align 16, !tbaa !3
  ret void
}
```

`git bisect` blames bef6e67e, and reverting that change against HEAD (currently 5ff27fe1) results in correct code, though I suspect that the change simply exposes the bug by making the constant in question visible to a different transformation.
</pre>