[llvm-bugs] [Bug 37344] New: vector approximate reciprocal square root generates bad code on x86
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri May 4 12:42:28 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=37344
Bug ID: 37344
Summary: vector approximate reciprocal square root generates bad code on x86
Product: new-bugs
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: gonzalobg88 at gmail.com
CC: llvm-bugs at lists.llvm.org
The following LLVM IR (see it live: https://godbolt.org/g/88kuky) computes an
approximate vector reciprocal square root, rsqrt(x) ~= 1/sqrt(x):
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
define <4 x float> @rsqrt(<4 x float>) {
  %a = call afn <4 x float> @llvm.sqrt.v4f32(<4 x float> %0)
  %c = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %a
  ret <4 x float> %c
}
On x86_64 with -O3 and SSE4.2 enabled, this generates the following assembly:
.LCPI0_0:
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
rsqrt: # @rsqrt
  sqrtps xmm1, xmm0
  movaps xmm0, xmmword ptr [rip + .LCPI0_0]
  divps xmm0, xmm1
  ret
However, it should just generate an rsqrtps instruction.
I've tried adding fast-math flags, but haven't been able to get rsqrtps
generated yet.
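For comparison, here is a minimal C sketch of the code the report is asking for, written with the SSE intrinsic _mm_rsqrt_ps, which maps directly to rsqrtps. The helper name rsqrt4 and the scalar fallback for non-x86 targets are assumptions added for illustration; they are not part of the bug report or of LLVM. Note that rsqrtps is only an estimate (Intel documents a maximum relative error of 1.5 * 2^-12), so this is not a drop-in replacement for an exact sqrtps/divps sequence.

```c
#include <assert.h>
#include <math.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* SSE intrinsics: _mm_loadu_ps, _mm_rsqrt_ps, _mm_storeu_ps */
#define HAVE_SSE_RSQRT 1
#else
#define HAVE_SSE_RSQRT 0
#endif

/* rsqrt4 is a hypothetical helper name used only for this sketch.
   It writes an approximation of 1/sqrt(in[i]) to out[i] for i = 0..3. */
void rsqrt4(const float in[4], float out[4]) {
    assert(in && out);
#if HAVE_SSE_RSQRT
    __m128 x = _mm_loadu_ps(in);   /* unaligned load of four floats */
    __m128 y = _mm_rsqrt_ps(x);    /* hardware estimate: the rsqrtps instruction */
    _mm_storeu_ps(out, y);         /* unaligned store of the four results */
#else
    /* Assumed fallback for targets without SSE: exact scalar computation. */
    for (int i = 0; i < 4; ++i) {
        out[i] = 1.0f / sqrtf(in[i]);
    }
#endif
}
```

Calling rsqrt4 on {1, 4, 16, 0.25} should give values close to {1, 0.5, 0.25, 2}, within the estimate's error bound rather than bit-exact. If more precision is needed, the estimate is usually refined with a Newton-Raphson step, at the cost of a few extra multiplies.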