[llvm-bugs] [Bug 37344] New: vector approximate reciprocal square root generates bad code on x86

via llvm-bugs llvm-bugs at lists.llvm.org
Fri May 4 12:42:28 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=37344

            Bug ID: 37344
           Summary: vector approximate reciprocal square root generates
                    bad code on x86
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: gonzalobg88 at gmail.com
                CC: llvm-bugs at lists.llvm.org

The following LLVM IR (see it live: https://godbolt.org/g/88kuky) computes an
approximate vector reciprocal square root, rsqrt(x) ~= 1/sqrt(x):

declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
define <4 x float> @rsqrt(<4 x float>)  {
  %a = call afn <4 x float> @llvm.sqrt.v4f32(<4 x float> %0)
  %c = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %a
  ret <4 x float> %c
}

On x86_64 with -O3 and SSE4.2, this generates the following assembly:

.LCPI0_0:
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
rsqrt: # @rsqrt
  sqrtps xmm1, xmm0
  movaps xmm0, xmmword ptr [rip + .LCPI0_0]
  divps xmm0, xmm1
  ret

However, it should just emit a single rsqrtps instruction.

I've tried adding fast-math flags but haven't been able to get rsqrtps generated yet.
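
For reference, rsqrtps is the instruction that the SSE intrinsic _mm_rsqrt_ps
maps to. The C sketch below is not taken from the compiler output above; it only
illustrates the kind of code the IR could lower to, assuming an x86 target with
SSE. The helper names rsqrt_approx and rsqrt_refined are hypothetical, and the
Newton-Raphson step is an optional refinement that is an assumption here, not
something requested above.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, ... */

/* Hypothetical helper: approximate 1/sqrt(x) for four floats.
 * _mm_rsqrt_ps corresponds to a single rsqrtps instruction. */
static inline __m128 rsqrt_approx(__m128 x) {
    return _mm_rsqrt_ps(x);
}

/* Hypothetical helper: the same estimate followed by one Newton-Raphson
 * step, y' = y * (1.5 - 0.5 * x * y * y), to improve the precision. */
static inline __m128 rsqrt_refined(__m128 x) {
    __m128 y      = _mm_rsqrt_ps(x);                      /* initial estimate */
    __m128 half_x = _mm_mul_ps(_mm_set1_ps(0.5f), x);     /* 0.5 * x          */
    __m128 yy     = _mm_mul_ps(y, y);                     /* y * y            */
    __m128 t      = _mm_sub_ps(_mm_set1_ps(1.5f),
                               _mm_mul_ps(half_x, yy));   /* 1.5 - 0.5*x*y*y  */
    return _mm_mul_ps(y, t);
}
```

The raw rsqrtps estimate is only accurate to roughly 12 bits, so whether a
refinement step is wanted depends on how much precision loss the afn flag is
taken to permit.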
