<html>
    <head>
      <base href="http://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - GlobalISel -O0 for AArch64 moves floating point values through GPRs way too often"
   href="http://bugs.llvm.org/show_bug.cgi?id=32550">32550</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>GlobalISel -O0 for AArch64 moves floating point values through GPRs way too often
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>GlobalISel
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>kristof.beyls@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>While investigating the biggest performance and code size regressions I
reported in <a href="http://lists.llvm.org/pipermail/llvm-dev/2017-April/111713.html">http://lists.llvm.org/pipermail/llvm-dev/2017-April/111713.html</a>,
I found that most seem to boil down to floating-point values being moved
through general-purpose registers (GPRs).

A very simple example is:
$ cat t3.c
double f(double r) {
  return r*r;
}
$ clang -target aarch64  -O0 -o - -S t3.c
...
f:                                      // @f
// BB#0:                                // %entry
        sub     sp, sp, #16             // =16
        str     d0, [sp, #8]
        ldr     d0, [sp, #8]
        ldr     d1, [sp, #8]
        fmul    d0, d0, d1
        add     sp, sp, #16             // =16
        ret
...
$ clang -target aarch64 -mllvm -global-isel -O0 -o - -S t3.c
...
f:                                      // @f
// BB#0:                                // %entry
        sub     sp, sp, #16             // =16
        fmov    x8, d0
        str     x8, [sp, #8]
        ldr     x8, [sp, #8]
        ldr     x9, [sp, #8]
        fmov    d0, x8
        fmov    d1, x9
        fmul    d0, d0, d1
        add     sp, sp, #16             // =16
        ret
...
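
To make the extra work concrete, here is a rough C model of the data flow in
the GlobalISel output above. It is only a sketch: the names fp_bits,
fmov_to_gpr, fmov_to_fpr and f_via_gpr are made up for illustration, and the
union-based bit copy assumes that double and unsigned long long are both 64
bits wide.

```c
/* Sketch only: models the register traffic, not actual compiler output. */
_Static_assert(sizeof(double) == sizeof(unsigned long long),
               "model assumes a 64-bit double and a 64-bit GPR");

/* Hypothetical helper type: lets us copy the raw bits of a double. */
typedef union { double d; unsigned long long x; } fp_bits;

/* Models a 64-bit fmov from an FP register to a GPR (bits unchanged). */
static unsigned long long fmov_to_gpr(double d) {
  fp_bits u;
  u.d = d;
  return u.x;
}

/* Models a 64-bit fmov from a GPR back to an FP register. */
static double fmov_to_fpr(unsigned long long x) {
  fp_bits u;
  u.x = x;
  return u.d;
}

double f_via_gpr(double r) {
  unsigned long long slot = fmov_to_gpr(r); /* fmov x8, d0 ; str x8, [sp, #8] */
  unsigned long long x8 = slot;             /* ldr x8, [sp, #8] */
  unsigned long long x9 = slot;             /* ldr x9, [sp, #8] */
  double d0 = fmov_to_fpr(x8);              /* fmov d0, x8 */
  double d1 = fmov_to_fpr(x9);              /* fmov d1, x9 */
  return d0 * d1;                           /* fmul d0, d0, d1 */
}
```

The round trip through the integer type leaves the bits untouched, so the
result matches plain r*r. The three fmov instructions only add work.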

It might be that this happens only when a floating-point variable is loaded
from or stored to the stack, or it may happen more generally.
I think this needs to be fixed first, since it appears to be the source of the
most severe performance and code size regressions when enabling global-isel at
-O0. Once it is fixed, another round of investigation can identify the
remaining biggest code size and performance regressions.</pre>
        </div>
      </p>


    </body>
</html>