<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/66803">66803</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            x86-32: floating-point return values undergo implicit format conversion
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jyknight
      </td>
    </tr>
</table>

<pre>
    With SSE2 disabled, the floating-point semantics are pretty hopeless, with implicit conversions happening all over the place (e.g. when spilling registers to the stack). Most of those issues go away with SSE2 enabled, because we then use SSE2 instructions and registers for all float/double operations.

The remaining issue with SSE2 enabled is that the default C ABI requires float and double values to be returned in x87 registers. Returning a float or double value thus converts it to x86_fp80 (and then back, in the caller). This conversion means that a signaling NaN cannot be returned, because the behind-the-scenes conversion to x86_fp80 will raise an FP invalid exception and quiet the NaN.

LLVM does support other ABIs which don't have this problem: you can either use an alternative calling convention on the function (such as "fastcc"), or annotate the return type with "inreg" (as seen here):
https://github.com/llvm/llvm-project/blob/575a6483062b8a77b35f48589b2acc1020195ac7/llvm/lib/Target/X86/X86CallingConv.td#L300-L304

While this is a fundamental problem with the x86-32 ABI, I believe we _could_ potentially fix it on the LLVM side, without breaking the ABI, because loading/storing an 80-bit value to/from an x87 FPU register does not trigger a conversion operation. Thus, we could potentially write custom conversion routines to go from 32/64-bit float to 80-bit float (and back), and use those at the call boundary.
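
As a rough illustration of what such a conversion routine would have to do, here is a minimal sketch in Python that models only the bit layouts for the 32-bit case. It is not LLVM code and does not touch the x87 FPU. The function names `f32_bits_to_x87` and `x87_to_f32_bits` are hypothetical, and the narrowing direction assumes the 80-bit value is exactly representable as a float32 (which holds when it was produced by widening a float32).

```python
# Sketch only: models the bit layouts with Python integers, not x87 hardware.
# Function names are hypothetical and chosen for illustration.
F32_BIAS = 127
X87_BIAS = 16383
INT_BIT = 1 << 63  # explicit integer bit of the 80-bit significand


def f32_bits_to_x87(bits: int) -> tuple:
    """Widen float32 bits to (sign_and_exponent, significand) of the 80-bit format."""
    sign = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:  # infinity or NaN: copy the payload unchanged, quiet bit included
        x_exp, sig = 0x7FFF, INT_BIT | (frac << 40)
    elif exp == 0 and frac == 0:  # signed zero
        x_exp, sig = 0, 0
    elif exp == 0:  # a float32 subnormal becomes a normal 80-bit value
        n = frac.bit_length()
        x_exp, sig = X87_BIAS - 150 + n, frac << (64 - n)
    else:  # normal number: rebias the exponent, make the integer bit explicit
        x_exp, sig = exp - F32_BIAS + X87_BIAS, INT_BIT | (frac << 40)
    return (sign << 15) | x_exp, sig


def x87_to_f32_bits(sign_exp: int, sig: int) -> int:
    """Narrow back, assuming the value is exactly representable as float32."""
    sign = sign_exp >> 15
    x_exp = sign_exp & 0x7FFF
    if x_exp == 0x7FFF:  # infinity or NaN: payload comes back bit for bit
        exp, frac = 0xFF, (sig >> 40) & 0x7FFFFF
    elif sig == 0:  # signed zero
        exp, frac = 0, 0
    else:
        e = x_exp - X87_BIAS
        if e >= -126:  # normal float32
            exp, frac = e + F32_BIAS, (sig >> 40) & 0x7FFFFF
        else:  # float32 subnormal: shift the integer bit back into the fraction
            exp, frac = 0, sig >> (-e - 86)
    return (sign << 31) | (exp << 23) | frac
```

Because both directions are pure integer manipulation, a signaling NaN such as 0x7FA00000 keeps its quiet bit clear in the 80-bit encoding and comes back unchanged. A real implementation would need the equivalent integer sequence emitted around the 80-bit load/store at the call boundary, plus a 64-bit variant.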

Such a routine would have runtime overhead vs using the X87 FPU's native conversion support, and it's also unclear whether anyone cares enough about precise x86-32 FP semantics in order to actually bother implementing it. But, it seemed worth at least recording the issue, and a possible resolution.
</pre>