<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - LLVM misses reciprocal estimate instructions in ISel on ARMv7"

   href="https://llvm.org/bugs/show_bug.cgi?id=27107">27107</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>LLVM misses reciprocal estimate instructions in ISel on ARMv7

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>steven@uplinklabs.net

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>LLVM does not select VRSQRTE/VRSQRTS (or the other reciprocal estimate

instructions, such as VRECPE/VRECPS) on ARMv7, even with -ffast-math:

$ cat rsqrt.c

#include &lt;math.h&gt;

float rsqrtf(float f)

{

        return 1.0f / sqrtf(f);

}

$ clang -O3 -mcpu=native -mfpu=neon -mfloat-abi=hard -ffast-math -S -o - rsqrt.c | showasm

rsqrtf:                                 @ @rsqrtf

        vsqrt.f32       s0, s0

        vmov.f32        s2, #1.000000e+00

        vdiv.f32        s0, s2, s0

        bx      lr

Conversely, on x86_64, LLVM does the right thing:

$ clang -O3 -march=core-avx2 -ffast-math -S -o - rsqrt.c | showasm

rsqrtf:                                 # @rsqrtf

        vrsqrtss        %xmm0, %xmm0, %xmm1

        vmulss  %xmm1, %xmm1, %xmm2

        vfmadd213ss     .LCPI1_0(%rip), %xmm0, %xmm2

        vmulss  .LCPI1_1(%rip), %xmm1, %xmm0

        vmulss  %xmm0, %xmm2, %xmm0

        retq

It will even use the packed form of the estimate when the inputs can be

vectorized.
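
For reference, the x86 listing above looks like one Newton-Raphson refinement
of the hardware estimate. Here is a minimal sketch of that arithmetic in plain
C. The values of .LCPI1_0 and .LCPI1_1 are not shown in the listing, so -3.0
and -0.5 are assumptions, and rsqrt_refine is a made-up name for illustration.
The initial estimate (what vrsqrtss would produce) is passed in by the caller.

```c
/* One Newton-Raphson step for 1/sqrt(x), in the same shape as the x86
   listing: (est * C1) * (x * est * est + C0).
   ASSUMPTION: C0 = -3.0 and C1 = -0.5; the constant-pool values are not
   shown in the listing. */
static float rsqrt_refine(float x, float est)
{
    const float c0 = -3.0f;          /* assumed value of .LCPI1_0 */
    const float c1 = -0.5f;          /* assumed value of .LCPI1_1 */
    float t = x * est * est + c0;    /* vmulss + vfmadd213ss */
    return (est * c1) * t;           /* vmulss, vmulss */
}
```

With those constants the expression reduces to est * (1.5 - 0.5 * x * est^2),
the usual refinement step for a reciprocal square root.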

Right now the lack of reciprocal square root estimates on ARM defeats

auto-vectorization in a small test program I wrote: the hand-written NEON

intrinsics version beats the auto-vectorized variants, because

auto-vectorization fails and everything bottlenecks on vsqrt+vdiv.
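
To illustrate what the estimate-based sequence would compute on ARM, here is a
rough portable model in C. VRSQRTS produces (3 - a * b) / 2 per lane, so each
refinement is est = est * VRSQRTS(x * est, est). This is only a sketch: the
function names are invented, the initial estimate that VRSQRTE would supply is
passed in by the caller, and the number of steps needed for full float
precision is left as a parameter rather than claimed.

```c
/* Portable model of what VRSQRTS computes per lane: (3 - a * b) / 2. */
static float rsqrt_step_model(float a, float b)
{
    return (3.0f - a * b) * 0.5f;
}

/* Refine an initial estimate of 1/sqrt(x) with `steps` Newton-Raphson
   iterations. On real hardware the initial estimate would come from
   VRSQRTE; here the caller supplies it (hypothetical helper). */
static float rsqrt_refine_neon_style(float x, float est, int steps)
{
    while (steps-- > 0)
        est = est * rsqrt_step_model(x * est, est);
    return est;
}
```

In NEON intrinsics the two operations correspond to vrsqrte_f32 and
vrsqrts_f32 (or the q-register variants for four lanes at a time).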

I looked at implementing this myself but got lost in the TableGen syntax.

It looks like ARMTargetLowering just needs to override

TargetLowering::getRsqrtEstimate and TargetLowering::getRecipEstimate.</pre>

        </div>

      </p>


    </body>

</html>