<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [AArch64] Improve FNEG DAGCombine: (fneg (fmul c, x)) -> (fmul -c, x) on AArch64"
   href="https://bugs.llvm.org/show_bug.cgi?id=37269">37269</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[AArch64] Improve FNEG DAGCombine: (fneg (fmul c, x)) -> (fmul -c, x) on AArch64
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: AArch64
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>mcrosier@codeaurora.org
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>The generic DAG combiner has the following rule in visitFNEG:

 (fneg (fmul c, x)) -> (fmul -c, x)
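
As a source-level illustration of this rule, the sketch below writes both forms
as C functions.  The function names are hypothetical and not part of LLVM.  The
check assumes the default round-to-nearest mode, where negating a product and
multiplying by the negated constant should give the same value for non-NaN
results.

```c
/* Source-level view of the rewrite (fneg (fmul c, x)) -> (fmul -c, x).
 * All names here are hypothetical, chosen only for illustration. */

/* Before the combine: multiply, then negate the product. */
double neg_of_product(double c, double x) {
    return -(c * x);
}

/* After the combine: the negation is folded into the constant operand. */
double product_with_neg_const(double c, double x) {
    return (-c) * x;
}

/* Returns 1 when both forms compare equal for the given inputs.
 * NaN results compare unequal, so this returns 0 for them. */
int forms_agree(double c, double x) {
    return neg_of_product(c, x) == product_with_neg_const(c, x);
}
```

The combine is value-preserving; the question in this report is only whether
it is profitable when the original fmul has other uses.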

Given the following C code:

void t1(double a, double *ptr) {
  double b = 0.5 * a;
  ptr[0] = b;
  ptr[1] = -b;
}

Clang will generate the following assembly for AArch64 (with -O3
-ffp-contract=fast):
t1:
        fmov    d1, #0.50000000
        fmov    d2, #-0.50000000
        fmul    d1, d0, d1
        fmul    d0, d0, d2
        stp     d1, d0, [x0]
        ret

I can think of two alternative code sequences.

Sequence 1:
t1:
        fmov    d1, #0.50000000
        fmul    d2, d0, d1
        fnmul   d0, d0, d1
        stp     d2, d0, [x0]
        ret

Sequence 2:
t1:
        fmov    d1, #0.50000000
        fmul    d0, d0, d1
        fneg    d1, d0
        stp     d0, d1, [x0]
        ret

If we disable the aforementioned DAGCombine, we get the first sequence.  I
would expect this sequence to be strictly better than what Clang currently
generates, since it needs one fewer fmov (the extra fmov in the current output
is a result of the first fmul having multiple uses).  GCC generates the code in
sequence 2, but I haven't yet convinced myself that this sequence is better.
Specifically, while the fneg may have a lower latency than an fmul or fnmul (at
least this is the case on Falkor), the fneg must wait for the first fmul to
complete, so I expect the critical path to be longer.
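
The critical-path argument above can be sketched as a small model.  The struct,
the function names, and any latency values passed in are hypothetical; no
measured Falkor numbers are implied.  The model assumes fmul and fnmul have the
same latency and ignores issue width and resource contention.

```c
/* Hypothetical per-instruction latencies in cycles.  Illustrative only. */
struct latencies {
    int fmov; /* materialize a floating-point immediate */
    int fmul; /* fmul and fnmul, assumed equal in this model */
    int fneg; /* negate a register */
};

/* Current output: two independent fmov/fmul chains feed the stp. */
int path_current(struct latencies l) {
    return l.fmov + l.fmul;
}

/* Sequence 1: fmul and fnmul both depend only on the single fmov. */
int path_seq1(struct latencies l) {
    return l.fmov + l.fmul;
}

/* Sequence 2: the fneg depends on the fmul, so the chain is serialized. */
int path_seq2(struct latencies l) {
    return l.fmov + l.fmul + l.fneg;
}
```

Under these assumptions sequence 1 matches the critical path of the current
output with one fewer instruction, while sequence 2 is longer by the fneg
latency for any positive fneg latency.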

FWIW, I wanted to record my findings, but I'm not currently pursuing this
issue.</pre>
        </div>
      </p>


    </body>
</html>