<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - X86 should enable aggressive FMA formation (at least on skylake)"

   href="https://bugs.llvm.org/show_bug.cgi?id=36826">36826</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>X86 should enable aggressive FMA formation (at least on skylake)

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>unspecified

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>chandlerc@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>On Skylake, vector FP add has the same latency and throughput as FMA, so we

should definitely enable aggressive FMA formation.

On Haswell and Broadwell the issue is more complex. Add has shorter latency (3

cycles vs 5 cycles according to Agner Fog's instruction tables) but half the

throughput (one port rather than two ports). I suspect that the doubled

throughput is enough to make FMA worth using everywhere, but we should benchmark to confirm.
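
To make the trade-off concrete, here is a toy cost model, not a benchmark. The
names (OpCost, chainCycles, issueCycles) are hypothetical and exist only for
this sketch. The Haswell/Broadwell numbers are the ones quoted above; the
Skylake numbers (4 cycles, two ports) are an assumption, since all that is
stated above is that add and FMA match there. The model also ignores the
separate multiply that unfused code still has to issue.

```cpp
// Toy cost model; cycle counts are quoted figures, not measurements.
struct OpCost {
  int latency;  // cycles from inputs ready to result ready
  int ports;    // execution ports that can issue this op each cycle
};

// Haswell/Broadwell figures from the text: add is 3 cycles on one port,
// FMA is 5 cycles on two ports.
constexpr OpCost kHaswellAdd{3, 1};
constexpr OpCost kHaswellFma{5, 2};

// Assumption: Skylake add and FMA are both 4 cycles on two ports.
constexpr OpCost kSkylakeAdd{4, 2};
constexpr OpCost kSkylakeFma{4, 2};

// Cycles for n ops where each depends on the previous one (latency bound).
constexpr int chainCycles(OpCost op, int n) { return op.latency * n; }

// Cycles to issue n independent ops (throughput bound), rounded up.
constexpr int issueCycles(OpCost op, int n) {
  return (n + op.ports - 1) / op.ports;
}
```

Under this model, 8 dependent ops on Haswell cost 24 cycles as adds and 40 as
FMAs, while 8 independent ops need 8 issue cycles as adds and 4 as FMAs. That
is why the answer depends on whether real code is latency bound or throughput
bound.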

If we can enable it everywhere, the following patch will suffice:

```
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -838,6 +838,9 @@ namespace llvm {
       return 2;
     }
+    /// Force aggressive FMA fusion.
+    bool enableAggressiveFMAFusion(EVT VT) const override { return true; }
+
     /// Return the value type to use for ISD::SETCC.
     EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,
                            EVT VT) const override;
```

Otherwise, we'll need to factor the per-subtarget facility for this out of the

AArch64 backend and use it to enable this on a per-subtarget basis for the X86

backend as well.</pre>
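<pre>The per-subtarget alternative could be shaped roughly as follows. This is a
standalone sketch, not LLVM code: X86Cpu, SubtargetModel, LoweringModel and
hasAggressiveFMA are hypothetical names, and the choice to opt in only Skylake
is an assumption that stands in for whatever the benchmarks end up showing.

```cpp
// Standalone sketch; none of these types are LLVM APIs.
enum class X86Cpu { Haswell, Broadwell, Skylake };

// Hypothetical stand-in for a per-subtarget feature bit.
struct SubtargetModel {
  X86Cpu cpu;
  // Assumption: only Skylake opts in until Haswell/Broadwell are benchmarked.
  bool hasAggressiveFMA() const { return cpu == X86Cpu::Skylake; }
};

// Hypothetical stand-in for the lowering hook overridden in the patch above.
struct LoweringModel {
  SubtargetModel subtarget;
  // Defer to the subtarget instead of returning true unconditionally.
  bool enableAggressiveFMAFusion() const {
    return subtarget.hasAggressiveFMA();
  }
};
```

The point of the indirection is that the hook stays a one-liner and the policy
lives with the CPU definition, so enabling another CPU later does not touch
the lowering code.</pre>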

        </div>

      </p>


    </body>

</html>