<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/128450>128450</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [clang][X86] Wrong result for __builtin_elementwise_fma on _Float16

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            clang

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          SEt-t

      </td>

    </tr>

</table>

<pre>

    Godbolt: https://godbolt.org/z/Ydj17K17b

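Clang uses single-precision FMA to emulate half-precision FMA. This is wrong because single precision does not have enough extra precision: the result is rounded twice (first to float, then to half), and this double rounding can differ from a correctly rounded half-precision FMA.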

Example (round to nearest, ties to even): 0x1.400p+8 * 0x1.008p+7 + 0x1.000p-24

Precise result: 0x1.40a0000002p+15

Half-precision FMA: 0x1.40cp+15

Single-precision FMA: 0x1.40a000p+15

Clang (single-precision FMA, then converted to half precision): 0x1.408p+15

Another example: 0x1.eb8p-12 * 0x1.9p-11 - 0x1p-11
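Both examples can be checked without half-precision hardware. The sketch below is a model, not Clang's code: it uses Python's `fractions.Fraction` for exact arithmetic and hypothetical helpers (`round_to_bits`, `h`, `half_fma_two_ways`). A correctly rounded half FMA is modeled as one rounding of the exact `a*b + c` to an 11-bit significand; Clang's emulation is modeled as rounding to 24 bits (single precision) and then to 11 bits. It assumes round-to-nearest, ties-to-even and ignores the exponent range, which is fine here because every rounded value is in the normal range.

```python
from fractions import Fraction

HALF, SINGLE = 11, 24  # significand bits (incl. implicit bit) of binary16 / binary32


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits, round-to-nearest ties-to-even.

    Assumption: the exponent range is ignored (no overflow, no subnormals).
    """
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x < Fraction(2) ** e:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    # Fraction.__round__ rounds halfway cases to the even integer.
    return sign * round(x / ulp) * ulp


def h(s: str) -> Fraction:
    """Exact value of a hex float literal such as '0x1.400p+8'."""
    return Fraction(float.fromhex(s))


def half_fma_two_ways(a: Fraction, b: Fraction, c: Fraction):
    exact = a * b + c
    correct = round_to_bits(exact, HALF)                            # one rounding
    via_single = round_to_bits(round_to_bits(exact, SINGLE), HALF)  # two roundings
    return correct, via_single


ex1 = (h("0x1.400p+8"), h("0x1.008p+7"), h("0x1.000p-24"))
ex2 = (h("0x1.eb8p-12"), h("0x1.9p-11"), -h("0x1p-11"))
for a, b, c in (ex1, ex2):
    correct, via_single = half_fma_two_ways(a, b, c)
    print(float(correct).hex(), float(via_single).hex())
```

This prints `0x1.40c0000000000p+15 0x1.4080000000000p+15` and `-0x1.ffc0000000000p-12 -0x1.ff80000000000p-12`: in both cases the single-precision step lands exactly on a half-precision tie, and ties-to-even then picks the wrong neighbor.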

To produce the correct result, a single-precision multiplication (which is exact for half-precision inputs) followed by a double-precision addition and a final conversion to half precision seems to be enough.
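The suggested sequence can be modeled with exact rational arithmetic as a quick sanity check. This is a sketch under assumptions, not an implementation for Clang: `round_to_bits`, `fma_half_suggested` and `fma_half_exact` are hypothetical helpers, each format is modeled only by its significand width (11, 24, 53 bits) with ties-to-even, and the exponent range is ignored.

```python
from fractions import Fraction

HALF, SINGLE, DOUBLE = 11, 24, 53  # significand bits of binary16 / binary32 / binary64


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits, ties to even; exponent range ignored."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x < Fraction(2) ** e:  # make sure 2**e <= x < 2**(e + 1)
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    return sign * round(x / ulp) * ulp


def fma_half_suggested(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Models (half)((double)((float)a * (float)b) + (double)c)."""
    product = round_to_bits(a * b, SINGLE)      # exact for half inputs: 22 bits fit in 24
    total = round_to_bits(product + c, DOUBLE)  # double-precision addition
    return round_to_bits(total, HALF)           # final conversion to half


def fma_half_exact(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Correctly rounded half-precision FMA: a single rounding of a*b + c."""
    return round_to_bits(a * b + c, HALF)


examples = [
    ("0x1.400p+8", "0x1.008p+7", "0x1.000p-24"),
    ("0x1.eb8p-12", "0x1.9p-11", "-0x1p-11"),
]
for hex_args in examples:
    a, b, c = (Fraction(float.fromhex(s)) for s in hex_args)
    print(fma_half_suggested(a, b, c) == fma_half_exact(a, b, c))
```

Both lines print `True`. This only checks the two inputs from this issue; it does not prove that the sequence is correctly rounded for every half-precision input.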

</pre>
