<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - LLVM misses reciprocal estimate instructions in ISel on ARMv7"
   href="https://llvm.org/bugs/show_bug.cgi?id=27107">27107</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>LLVM misses reciprocal estimate instructions in ISel on ARMv7
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>steven@uplinklabs.net
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>LLVM misses the opportunity to use VRSQRTE/VRSQRTS (and the related
reciprocal estimate instructions, VRECPE/VRECPS) on ARMv7:

$ cat rsqrt.c
#include <math.h>

float rsqrtf(float f)
{
        return 1.0f / sqrtf(f);
}

$ clang -O3 -mcpu=native -mfpu=neon -mfloat-abi=hard -ffast-math -S -o - rsqrt.c | showasm

rsqrtf:                                 @ @rsqrtf
        vsqrt.f32       s0, s0
        vmov.f32        s2, #1.000000e+00
        vdiv.f32        s0, s2, s0
        bx      lr


By contrast, on x86_64, LLVM does the right thing:

$ clang -O3 -march=core-avx2 -ffast-math -S -o - rsqrt.c | showasm
rsqrtf:                                 # @rsqrtf
        vrsqrtss        %xmm0, %xmm0, %xmm1
        vmulss  %xmm1, %xmm1, %xmm2
        vfmadd213ss     .LCPI1_0(%rip), %xmm0, %xmm2
        vmulss  .LCPI1_1(%rip), %xmm1, %xmm0
        vmulss  %xmm0, %xmm2, %xmm0
        retq

It also applies the same transformation to packed vectors (vrsqrtps) when the
code operates on vectors of floats.


Right now, the lack of reciprocal square root estimates on ARM prevents
auto-vectorization of a small program I wrote. Because ARMv7 NEON has no vector
square-root or divide instructions, vectorization fails and everything is
bottlenecked on vsqrt+vdiv, so my hand-written NEON intrinsics version beats the
auto-vectorized variants.

I looked at implementing this myself but got confused by the TableGen syntax.
It looks like all that is needed is an ARMTargetLowering implementation of
TargetLowering::getRsqrtEstimate and TargetLowering::getRecipEstimate.</pre>
        </div>
      </p>
    </body>
</html>