<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/55713>55713</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            #pragma float_control(precise, on) doesn't work for SSE intrinsics

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            new issue

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          obfuscated

      </td>

    </tr>

</table>

<pre>

Full reproducer on godbolt: https://godbolt.org/z/qYczcba39

The problem is that `#pragma float_control(precise, on)` has no effect on code that calls the SSE intrinsics directly, but it works as expected when the same computation uses the built-in operators on `__m128` values.

I originally discovered this in clang 14.0.1.

The following code reproduces the problem (compiled with `-Ofast -msse4.2 -mrecip=none`):

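```
// Requires the SSE intrinsics header xmmintrin.h (or immintrin.h).

// Intrinsics version: the pragma is not honored here.
__m128 func(__m128 d, float oldLen, float newLen) {
    #pragma float_control(precise, on)
    return _mm_div_ps(
        _mm_mul_ps(d, _mm_set1_ps(oldLen)),
        _mm_set1_ps(newLen)
    );
}

// Operator version: the pragma is honored here.
__m128 func1(__m128 d, float oldLen, float newLen) {
    #pragma float_control(precise, on)
    return d*oldLen/newLen;
}
```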

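This produces the following assembly. Note that `func` computes `1.0f / newLen` with `divss` and then multiplies by the result (`mulps`), while `func1` performs an actual division (`divps`):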

```
.LCPI1_0:
        .long   0x3f800000                      # float 1
func(float __vector(4), float, float):                         # @func(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        divss   xmm1, xmm2
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        ret
func1(float __vector(4), float, float):                        # @func1(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        shufps  xmm2, xmm2, 0                   # xmm2 = xmm2[0,0,0,0]
        divps   xmm0, xmm2
        ret
```

More generally, the `*(1/a)` optimization seems questionable here, and clang doesn't apply it to scalars, only to vector/SIMD types. Is this a separate bug that should be reported on its own?
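
For illustration, here is a minimal scalar sketch of why replacing a division with `*(1/a)` can change the result. The helper names `div_exact`, `mul_recip`, and `rewrite_changes_result` are made up for this example and are not part of the reproducer. It assumes IEEE-754 single precision, evaluation in `float` (as on x86-64 with SSE), and a build without fast-math flags.

```cpp
// Hypothetical helpers for illustration only (not from the reproducer).

// Plain division: the quotient is rounded once.
float div_exact(float x, float a) {
    return x / a;
}

// Multiply by reciprocal: 1/a is rounded first, then the product is
// rounded again, so the result can be off by one unit in the last place.
float mul_recip(float x, float a) {
    float r = 1.0f / a;
    return x * r;
}

// True when the two forms give different floats for this input.
bool rewrite_changes_result(float x, float a) {
    return div_exact(x, a) != mul_recip(x, a);
}
```

Under those assumptions, `rewrite_changes_result(5.0f, 3.0f)` should be true (the two results differ in the last bit), while `rewrite_changes_result(3.0f, 3.0f)` is false. Whether a given input is affected depends on how the two rounding steps interact, which is why the rewrite is normally only allowed under fast-math.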

</pre>
