<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/55713>55713</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            #pragma float_control(precise, on) doesn't work for SSE intrinsics
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          obfuscated
      </td>
    </tr>
</table>

<pre>
    This is the link to godbolt with the full reproducer: https://godbolt.org/z/qYczcba39

The problem is that the pragma has no effect when the SSE intrinsics are called directly, but it works when the operators on the __m128 type are used.

I originally discovered this in clang 14.0.1.

The following code reproduces the problem (compiled with -Ofast -msse4.2 -mrecip=none):
```
__m128 func(__m128 d, float oldLen, float newLen) {
        #pragma float_control(precise, on)
        return _mm_div_ps(
                _mm_mul_ps(d, _mm_set1_ps(oldLen)),
                _mm_set1_ps(newLen)
        );
}

__m128 func1(__m128 d, float oldLen, float newLen) {
        #pragma float_control(precise, on)
        return d*oldLen/newLen;
}
```
It produces the following assembly:
```
.LCPI1_0:
        .long   0x3f800000                      # float 1
func(float __vector(4), float, float):                         # @func(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        divss   xmm1, xmm2
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        ret
func1(float __vector(4), float, float):                        # @func1(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        shufps  xmm2, xmm2, 0                   # xmm2 = xmm2[0,0,0,0]
        divps   xmm0, xmm2
        ret
```

More generally, replacing the division with a multiplication by the reciprocal (`*(1/a)`) seems questionable here, and clang doesn't do it for scalars, only for vector/SIMD types. Is this another bug that needs to be reported separately?
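
To illustrate why the `*(1/a)` form is not equivalent to a real division, here is a minimal scalar sketch. It is not part of the reproducer: the function names are made up for illustration, and it assumes IEEE-754 binary32 floats with round-to-nearest and a build without fast-math flags. The `volatile` temporaries are only there to force each intermediate to be rounded to float.
```c
/* Direct division: one correctly rounded operation. */
float div_direct(float x, float a) {
    return x / a;
}

/* Reciprocal form: 1/a is rounded first, then the product is rounded again. */
float div_via_reciprocal(float x, float a) {
    volatile float r = 1.0f / a;  /* rounded reciprocal */
    volatile float p = x * r;     /* second rounding */
    return p;
}

/* Returns 1 when the two forms give different floats, 0 otherwise. */
int forms_disagree(float x, float a) {
    volatile float q = div_direct(x, a);
    return q != div_via_reciprocal(x, a);
}
```
Under those assumptions, `forms_disagree(5.0f, 3.0f)` returns 1: 5/3 rounds to 13981013 * 2^-23, while 5 * fl(1/3) rounds to 13981014 * 2^-23, a difference of one ulp. Cases where the reciprocal is exact, such as `forms_disagree(8.0f, 2.0f)`, return 0.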
</pre>