[llvm-bugs] [Bug 27107] New: LLVM misses reciprocal estimate instructions in ISel on ARMv7

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Mar 28 21:08:58 PDT 2016


https://llvm.org/bugs/show_bug.cgi?id=27107

            Bug ID: 27107
           Summary: LLVM misses reciprocal estimate instructions in ISel
                    on ARMv7
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: steven at uplinklabs.net
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

With -ffast-math on ARMv7, LLVM does not select VRSQRTE/VRSQRTS (or the other
reciprocal estimate instructions, VRECPE/VRECPS) and emits a full square root
and divide instead:

$ cat rsqrt.c
#include <math.h>

float rsqrtf(float f)
{
        return 1.0f / sqrtf(f);
}

$ clang -O3 -mcpu=native -mfpu=neon -mfloat-abi=hard -ffast-math -S -o - rsqrt.c | showasm

rsqrtf:                                 @ @rsqrtf
        vsqrt.f32       s0, s0
        vmov.f32        s2, #1.000000e+00
        vdiv.f32        s0, s2, s0
        bx      lr


Conversely, on x86_64, LLVM does the right thing:

$ clang -O3 -march=core-avx2 -ffast-math -S -o - rsqrt.c | showasm
rsqrtf:                                 # @rsqrtf
        vrsqrtss        %xmm0, %xmm0, %xmm1
        vmulss  %xmm1, %xmm1, %xmm2
        vfmadd213ss     .LCPI1_0(%rip), %xmm0, %xmm2
        vmulss  .LCPI1_1(%rip), %xmm1, %xmm0
        vmulss  %xmm0, %xmm2, %xmm0
        retq

It also applies the same estimate-and-refine sequence to packed vectors when
the code can be vectorized.
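
For reference, the x86 sequence above looks like one Newton-Raphson refinement
of the VRSQRTSS estimate. A rough C model of the arithmetic follows. The values
of .LCPI1_0 and .LCPI1_1 are not shown in the listing, so -3.0 and -0.5 are
assumed here, and the function name is made up for illustration:

```c
/* Assumed constant-pool values; the listing does not show them. */
#define LCPI1_0 (-3.0f)
#define LCPI1_1 (-0.5f)

/* a:   the input value (xmm0)
 * est: stand-in for the VRSQRTSS result (xmm1)
 * Returns est * (1.5 - 0.5 * a * est * est), one Newton-Raphson step. */
float x86_style_refine(float a, float est)
{
        float t = est * est;      /* vmulss      %xmm1, %xmm1, %xmm2 */
        t = a * t + LCPI1_0;      /* vfmadd213ss: xmm2 = xmm0 * xmm2 + mem */
        float h = est * LCPI1_1;  /* vmulss      mem, %xmm1, %xmm0 */
        return h * t;             /* vmulss      %xmm0, %xmm2, %xmm0 */
}
```

Feeding it a = 4.0 and a rough estimate of 0.49 moves the result closer to the
exact value 0.5.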


Right now the lack of reciprocal square root estimates on ARM breaks
auto-vectorization for a silly program I wrote, and the hand-written NEON
intrinsics version is beating the auto-vectorized variants (because the
auto-vectorization fails and everything gets bottlenecked around vsqrt+vdiv).

I looked at implementing this myself but got confused trying to understand
tablegen syntax. It looks like there just needs to be an ARMTargetLowering
implementation for TargetLowering::getRsqrtEstimate and
TargetLowering::getRecipEstimate.
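
To make the request concrete, here is a portable C sketch of the refinement an
ARM lowering would presumably chain together. The two helpers model the
arithmetic of the step instructions (VRSQRTS computes (3 - x*y) / 2, VRECPS
computes 2 - x*y). The initial estimate is passed in as a plain argument
because VRSQRTE/VRECPE themselves are not modeled, and all function names are
hypothetical, not anything that exists in LLVM:

```c
/* Models the arithmetic of VRSQRTS.F32: (3 - x*y) / 2. */
static float vrsqrts_model(float x, float y)
{
        return (3.0f - x * y) * 0.5f;
}

/* Models the arithmetic of VRECPS.F32: 2 - x*y. */
static float vrecps_model(float x, float y)
{
        return 2.0f - x * y;
}

/* Refine est toward 1/sqrt(a); est stands in for a VRSQRTE result. */
float rsqrt_refine(float a, float est, int steps)
{
        for (int i = 0; i < steps; ++i)
                est *= vrsqrts_model(a * est, est);
        return est;
}

/* Refine est toward 1/a; est stands in for a VRECPE result. */
float recip_refine(float a, float est, int steps)
{
        for (int i = 0; i < steps; ++i)
                est *= vrecps_model(a, est);
        return est;
}
```

Each step is a couple of multiplies plus one step instruction, which is the
trade the backend would be making against vsqrt+vdiv. How many steps are
needed depends on the precision of the hardware estimate.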
