<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/88633>88633</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            The `#pragma clang fp contract (off)` should suppress FMA when `-ffast-math` is used
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          wjristow
      </td>
    </tr>
</table>

<pre>
    For code like the following:
```
typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
__m128 foo(__m128 a, __m128 b, __m128 c) {
  return (a * b) + c;
}
```

FMA is used when compiled for an appropriate target (as expected).  For example:

`clang++ -S -march=haswell -O2 foo.cpp`

And if the test-case is modified to suppress the FMA via a pragma:

```
__m128 foo(__m128 a, __m128 b, __m128 c) {
#pragma clang fp contract (off)
 return (a * b) + c;
}
```

then that does correctly suppress the use of FMA.

But if `-ffast-math` is also used:

`clang++ -S -march=haswell -O2 -ffast-math foo.cpp`

then the pragma no longer successfully suppresses that use of the FMA.

I believe this is a bug in DAGCombine.  A patch like the following fixes this for me:

```
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index d9c6c28d5dac..4b033b5729c8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15623,8 +15623,7 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
 
   bool CanReassociate =
       Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();
-  bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
- Options.UnsafeFPMath || HasFMAD);
+  bool AllowFusionGlobally = N->getFlags().hasAllowContract() || HasFMAD;
   // If the addition is not contractable, do not combine.
   if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())
     return SDValue();
@@ -15859,8 +15858,7 @@ SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
 return SDValue();
 
   const SDNodeFlags Flags = N->getFlags();
- bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
-                              Options.UnsafeFPMath || HasFMAD);
+ bool AllowFusionGlobally = N->getFlags().hasAllowContract() || HasFMAD;
 
   // If the subtraction is not contractable, do not combine.
   if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())
@@ -16030,7 +16028,7 @@ SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
 
 // More folding opportunities when target permits.
   if (Aggressive && isReassociable(N)) {
-    bool CanFuse = Options.UnsafeFPMath || N->getFlags().hasAllowContract();
+    bool CanFuse = N->getFlags().hasAllowContract();
     // fold (fsub (fma x, y, (fmul u, v)), z)
     //   -> (fma x, y (fma u, v, (fneg z)))
     if (CanFuse && isFusedOp(N0) &&
```

I'm out of the office for the next two weeks, so I'm not in a position to propose the above patch.  I'll get to it at some point on my return, unless someone else wants to take this.

----

On the other hand, if we use the SSE intrinsics to express this, as in:

```
#include <xmmintrin.h>
__m128 foo(__m128 a, __m128 b, __m128 c) {
#pragma clang fp contract (off)
 return _mm_add_ps(_mm_mul_ps(a, b), c);
}
```

Then my proposed patch doesn't suppress the use of FMA.  I initially thought this was also just a bug, but on looking into it, I'm now thinking it's probably by design (although it's unfortunate from a use-expectation perspective -- we had a user report this as a problem of the "pragma not working").  The reason I think it may be by design, is that those "SSE intrinsics" are implemented as functions in <xmmintrin.h>, and in general, when a CALLER has a pragma that controls code-gen, we wouldn't want to apply the effect of that pragma to CALLEEs of that CALLER.

For reference, these functions are implemented as `static __inline__`.  Maybe some "enhancement" could be made to treat `static __inline__` as though the code was directly in-line for applying the semantics of pragmas?  Or maybe some new attribute can be defined for functions (and applied to things like `_mm_add_ps`, `_mm_mul_ps`, etc.) where  that new attribute means to apply the state in the function's CALLER to the function that has that new attribute?  Just some thoughts.

For reference, here's a godbolt link illustrating some of this: https://godbolt.org/z/cjW8jYdb5
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzMWF9v4zgO_zTqC-HAkZt_D3lIk_HcHLbTxXb3DvcUyDZta0aWDElu2v30B0rOv16n2OticVcUSWRL5I_kjxQl4ZxsNOKaze7YbHcjBt8auz58s9J5c7gpTPWyzo2F0lQISn5H8C1CbZQyB6kblm1YumPphs3T8T8M_UuPFdZQKyM87PfdlC9hvxfeW1kMHvd7xpeML_f7Jyy9sXsnf48Pp3PGV4xvabYibNXl4_Cf3UUlo9jamCApDERcGn4XF79LxlfAFuNKAIt-sBoYXwpgfENzV8D4HZQn8Wyxe9O2-Jnfb0A6GBxWcGhRQ2m6XiqsoDYWhAbR99b0VgqP4IVt0AdtDvC5x9JjxfhqAkDOxWfR9QrPzjyqLZXQDeN3BCx5hKQTtmxZtmuFO6BSkDxwMn9S9v0reBtdgaxDsDw6n5TCIeHtTCVriRV4A27oe4vOhVlkz5MUIKC3ounEf4K59MHHXc94FhVAMA7qHkqjvRVl8I-pawpxjNKfDpKnwPhWeKgMOiiNtVh69XJt-uAQTE0emFwuvhs8uZDN06SuhfNJJ3zL5im5UShnQuw_FrMLgT-I34gcx2iANqCMbtCCG8oSnasHdWEHumjmaMoY0CtzvkCBSuITZbB0wQgohgakht3m89Z0hdQ4AdhAL3zZvpHsUMvnoEi6QPLuDcZexqCSdQ1J0khP3MiVeuroSxaM51tT4WfUjOePqLD00ujd5jPj-RmLJbdQzD-2MmKQusJnqFblvOTLalaJcjK5LdIsK2YLviqXME3T-e1tnJwkyceRHul9N8b-TwJntym7TSGZzuY8Y3y7JOIfBwsYXz_u_iHUgBcRtBSTbPMknfT5ZrfLjc3vN-NLxpePu69UyhnffL0uicfKCIUxCrZC_4LCOVOGAsay3ek9_T30BN9NftNO1Jj_fE9MZostW2zha8KyTw36XInGhTK_mrTCbYhFZ5nS6PjulMrJqDpMzAcnjf6sTCGI5yzbURk4qo1Tfn7o47QAL9vB-Ul0Qi6cH1EdVbwH_G_C5feb3SUmCuT7qN6zdjsWtvj4tZrstBkB4znjOXyJmSuqShJKylFt_Kk-ikIh1dXKjI9jyp6kULEiTdM3sfI543NgfPrHEZ8rMf2N1Xik3Kvgnem6nK3OdF3Olv8FXR9_u_vDdH0XzZnMpdHOQ5QSDIb4-YPQXZDxr-fiu38fIOpfzdMf0NUNRVj9f8XYEx_naZZGCvK76TzlfxUfj1_RNffGho2zom3T9L2xftDSS3SxWRwbwh5tJ7177ZBN09CmLp_w6ATpTpUzuHT5NRp7BhDodCzdOfUBFPWP1ulXXr0qh2-o-YCoQPLRW-Qosrt2QxG-OwHPxJsX-ggPBgUDDZ6Ox4At_H5dnkZZAITlWspxdJQQZWpsoozXhS4G4WTfMQI0qh56cn4aG1F68U73-YXxRQdm8MeWzNS1LDH0TjTU-OzBHwwcEL87guUMxEWULVJTM25c3Au8ATpPGBebMlGYJ4yd2gTCIqWAKOUNUL_lwZkOoTdSezAaupexYpKaQSvqfWmG0QioHMJBaO9otRffY4t41T0mSZJcjh9id2p8ixZaoSuSK2s4xHaa3j0-fgKpvZXayTKIxudj0y2DucKB1O_3kIxnUpdqoJzLts9dFyVOWpZ9-h8cRfZdtxdVte-J5TToBhUHQWUxUrO8Spp3jyi_UjnoXo7Brcbum84rmvGF_-FRBeALSKopoVj61gxN62NzfhDjCeXb4Hxs8wO6IVBBGfOdypLUgSv05ki6A63X8a1nfOEIViEK9QLFC1ToZBOPYyrqO84adB0qHLWKtTUdCEKaxJNu6PWo1DkaUVFLEuJJK6o4z4JFqpARPGEPahV2x8RhnJ_OQR4OxhJExnk8Qv_aIlgUzmj4Eg2gFOjECxR4xh0IOh6TfGtCZvNrjjLOQVgESafxDrXHiuDUgw67G7H1DRYSkem0raFBjVYoehKqvIDt5qefPv0C7WhUMCEgCEQzyoWLlaTBAO-AcDCDqmLgKSMpa0TfhwAjYF1jOVYT4U_yTFTzyZ3eRLVXCZwbcnONFnUZ9mTfosML294wnM1TR-ErYb-XWkmN-z2bpxOAe_FSYCwxjHPUrdBlWEguLMkG8n0nKgwlxaLwP5JGikY2kY3hookIXMnxrC51QpPjvQr5gugZGg_shPZUW0w9OsOxLAd4sBT9I0CNBzhdPEEpNEGrsJZ6vKw5-4CoraugZbwjITo1Lh6G2Ty9yP95GjaS-GwsA_EZ-nJChebQokWIEbkG0aHQ7jq25BokEoUj94goJNfIoQDm_CqKbYV7Q35wwt8p94MDxtrg3ucDgQ36BDSmKozyoEIqKTU4b4Untwd5gWXSsWwDrfc9_Yib77huYmzDeP4743n57Z_Lb_-qitlNtc6qVbYSN7ieLqZZms0znt6069lsWqaiWmWiTqsaMS3LYs7Fcp5ixrO0vpFrnvLb9HaaTVd8xtPJjK8qnKKYCuSLar5gtyl2QqoJnbZJ9410bsD1cjnPshslClQu3G9yPt7NcDbb3dg1zU-KoXHsNlXSeXeW4KVXuKbCEmL6R_aJeQquDcw_Vez8fhMLwZt3SNRO3AxWrV85Ufp2KCal6S4uEAhpb803pD4qD_Y5xvNg4r8DAAD__1a4qTo">