[LLVMbugs] [Bug 24063] New: clang with -ffast-math do not match behavior of gcc in case of reciprocal codegen
bugzilla-daemon at llvm.org
Wed Jul 8 06:12:57 PDT 2015
https://llvm.org/bugs/show_bug.cgi?id=24063
Bug ID: 24063
Summary: clang with -ffast-math do not match behavior of gcc in
case of reciprocal codegen
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: sgvozdar at gmail.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
After the following commit, clang started to emit approximate versions of sqrt
and div under -ffast-math, but its behavior does not precisely match that of
gcc.
According to both the documentation and the disassembly (see examples), gcc
uses the approximate version only for 1.0f / sqrtf(), whereas clang also uses
it for plain sqrtf().
commit 73aa02eb0979ae1d0643aee03c5d0c4b1926408f
Author: Sanjay Patel <spatel at rotateright.com>
Date: Mon Jun 22 18:29:44 2015 +0000
[x86] set default reciprocal (division and square root) codegen to match
GCC
D8982 ( checked in at http://reviews.llvm.org/rL239001 ) added command-line
options to allow reciprocal estimate instructions to be used in place of
divisions and square roots.
This patch changes the default settings for x86 targets to allow that recip
codegen (except for scalar division because that breaks too much code) when
using -ffast-math or its equivalent.
This matches GCC behavior for this kind of codegen.
Differential Revision: http://reviews.llvm.org/D10396
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240310
91177308-0d34-0410-b5e6-96231b3b80d8
https://gcc.gnu.org/onlinedocs/gcc-4.9.3/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
***
-mrecip
This option enables use of RCPSS and RSQRTSS instructions (and their vectorized
variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase
precision instead of DIVSS and SQRTSS (and their vectorized variants) for
single-precision floating-point arguments. These instructions are generated
only when -funsafe-math-optimizations is enabled together with
-finite-math-only and -fno-trapping-math. Note that while the throughput of the
sequence is higher than the throughput of the non-reciprocal instruction, the
precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of
1.0 equals 0.99999994).
Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already
with -ffast-math (or the above option combination), and doesn't need -mrecip.
Also note that GCC emits the above sequence with additional Newton-Raphson step
for vectorized single-float division and vectorized sqrtf(x) already with
-ffast-math (or the above option combination), and doesn't need -mrecip.
***
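For readers unfamiliar with the sequence the documentation describes, here is a minimal C sketch of a reciprocal square root estimate followed by one Newton-Raphson step, y1 = y0 * (1.5 - 0.5 * x * y0 * y0). The function names are hypothetical, and the non-SSE fallback that truncates the mantissa is only an assumed stand-in for the hardware estimate. The compilers load their constants from memory, so the exact arrangement of the arithmetic in the listings may differ from this form.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* Hardware estimate via RSQRTSS (roughly 12 bits of precision). */
static float rsqrt_estimate(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}
#else
/* Assumed stand-in for targets without SSE: compute the exact value and
   clear the low 12 mantissa bits to imitate a coarse estimate. */
static float rsqrt_estimate(float x)
{
    float r = 1.0f / sqrtf(x);
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;
    memcpy(&r, &bits, sizeof r);
    return r;
}
#endif

/* Hypothetical helper: estimate plus one Newton-Raphson refinement. */
float rsqrt_refined(float x)
{
    assert(x > 0.0f);                 /* sketch only covers positive inputs */
    float y0 = rsqrt_estimate(x);     /* coarse 1/sqrt(x) */
    return y0 * (1.5f - 0.5f * x * y0 * y0);
}
```

One refinement step roughly squares the relative error of the estimate, which is why the result is close to, but not necessarily bit-identical with, what DIVSS and SQRTSS would produce.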
====REPRODUCER 1.c ====
#include <math.h>

float foo(float a)
{
    return 1.0f / sqrtf(a);
}

float bar(float a)
{
    return sqrtf(a);
}
=======================
$gcc -m32 -O2 -ffast-math -march=core-avx2 -c 1.c -o 1.o && objdump -d 1.o
00000000 <foo>:
0: 83 ec 04 sub $0x4,%esp
3: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
9: c5 f2 52 c8 vrsqrtss %xmm0,%xmm1,%xmm1
d: c5 f2 59 c0 vmulss %xmm0,%xmm1,%xmm0
11: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
15: c5 fa 58 05 00 00 00 vaddss 0x0,%xmm0,%xmm0
1c: 00
1d: c5 f2 59 0d 04 00 00 vmulss 0x4,%xmm1,%xmm1
24: 00
25: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
29: c5 fa 11 04 24 vmovss %xmm0,(%esp)
2e: d9 04 24 flds (%esp)
31: 83 c4 04 add $0x4,%esp
34: c3 ret
35: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
39: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
00000040 <bar>:
40: 83 ec 04 sub $0x4,%esp
43: c5 fa 51 44 24 08 vsqrtss 0x8(%esp),%xmm0,%xmm0
49: c5 fa 11 04 24 vmovss %xmm0,(%esp)
4e: d9 04 24 flds (%esp)
51: 83 c4 04 add $0x4,%esp
54: c3 ret
# To use vrsqrtss for sqrtf, gcc requires an explicit -mrecip=sqrt
$gcc -m32 -O2 -ffast-math -march=core-avx2 -c 1.c -o 1.o -mrecip=sqrt && objdump -d 1.o
00000000 <foo>:
0: 83 ec 04 sub $0x4,%esp
3: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
9: c5 f2 52 c8 vrsqrtss %xmm0,%xmm1,%xmm1
d: c5 f2 59 c0 vmulss %xmm0,%xmm1,%xmm0
11: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
15: c5 fa 58 05 00 00 00 vaddss 0x0,%xmm0,%xmm0
1c: 00
1d: c5 f2 59 0d 04 00 00 vmulss 0x4,%xmm1,%xmm1
24: 00
25: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
29: c5 fa 11 04 24 vmovss %xmm0,(%esp)
2e: d9 04 24 flds (%esp)
31: 83 c4 04 add $0x4,%esp
34: c3 ret
35: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
39: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
00000040 <bar>:
40: 83 ec 04 sub $0x4,%esp
43: c5 fa 10 4c 24 08 vmovss 0x8(%esp),%xmm1
49: c5 f2 c2 15 08 00 00 vcmpneqss 0x8,%xmm1,%xmm2
50: 00 04
52: c5 fa 52 c1 vrsqrtss %xmm1,%xmm0,%xmm0
56: c5 f8 54 c2 vandps %xmm2,%xmm0,%xmm0
5a: c5 fa 59 c9 vmulss %xmm1,%xmm0,%xmm1
5e: c5 f2 59 c0 vmulss %xmm0,%xmm1,%xmm0
62: c5 fa 58 05 00 00 00 vaddss 0x0,%xmm0,%xmm0
69: 00
6a: c5 f2 59 0d 04 00 00 vmulss 0x4,%xmm1,%xmm1
71: 00
72: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
76: c5 fa 11 04 24 vmovss %xmm0,(%esp)
7b: d9 04 24 flds (%esp)
7e: 83 c4 04 add $0x4,%esp
81: c3 ret
===========================================================================
$clang -m32 -O2 -ffast-math -march=core-avx2 -c 1.c -lm && objdump -d 1.o
00000000 <foo>:
0: 50 push %eax
1: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
7: c5 fa 52 c8 vrsqrtss %xmm0,%xmm0,%xmm1
b: c5 f2 59 d1 vmulss %xmm1,%xmm1,%xmm2
f: c4 e2 79 a9 15 00 00 vfmadd213ss 0x0,%xmm0,%xmm2
16: 00 00
18: c5 f2 59 05 00 00 00 vmulss 0x0,%xmm1,%xmm0
1f: 00
20: c5 ea 59 c0 vmulss %xmm0,%xmm2,%xmm0
24: c5 fa 11 04 24 vmovss %xmm0,(%esp)
29: d9 04 24 flds (%esp)
2c: 58 pop %eax
2d: c3 ret
2e: 66 90 xchg %ax,%ax
00000030 <bar>:
30: 50 push %eax
31: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
37: c5 fa 52 c8 vrsqrtss %xmm0,%xmm0,%xmm1
3b: c5 f2 59 d1 vmulss %xmm1,%xmm1,%xmm2
3f: c4 e2 79 a9 15 00 00 vfmadd213ss 0x0,%xmm0,%xmm2
46: 00 00
48: c5 f2 59 0d 00 00 00 vmulss 0x0,%xmm1,%xmm1
4f: 00
50: c5 ea 59 c9 vmulss %xmm1,%xmm2,%xmm1
54: c5 fa 59 c9 vmulss %xmm1,%xmm0,%xmm1
58: c5 e8 57 d2 vxorps %xmm2,%xmm2,%xmm2
5c: c5 fa c2 c2 00 vcmpeqss %xmm2,%xmm0,%xmm0
61: c5 f8 55 c1 vandnps %xmm1,%xmm0,%xmm0
65: c5 fa 11 04 24 vmovss %xmm0,(%esp)
6a: d9 04 24 flds (%esp)
6d: 58 pop %eax
6e: c3 ret
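The compare-and-mask instructions in the estimate-based bar() listings (vcmpeqss/vandnps here, vcmpneqss/vandps in the gcc -mrecip=sqrt output) appear to guard the zero input: sqrt(x) is formed as x * (1/sqrt(x)), and for x == 0 the estimate is +inf, so the unguarded product is NaN. The C sketch below illustrates that reading. All function names are hypothetical, the non-SSE fallback is an assumed stand-in for the hardware estimate, and the scalar select only imitates the branchless mask.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* Hardware estimate via RSQRTSS; returns +inf for an input of +0. */
static float rsqrt_estimate(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}
#else
/* Assumed stand-in: exact value with the low 12 mantissa bits cleared. */
static float rsqrt_estimate(float x)
{
    float r = 1.0f / sqrtf(x);
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;
    memcpy(&r, &bits, sizeof r);
    return r;
}
#endif

/* sqrt(x) as x * refined(1/sqrt(x)), with no guard for x == 0. */
float sqrt_unmasked(float x)
{
    assert(x >= 0.0f);
    float y0 = rsqrt_estimate(x);
    float y1 = y0 * (1.5f - 0.5f * x * y0 * y0); /* one Newton-Raphson step */
    return x * y1;                               /* 0 * inf yields NaN */
}

/* Same computation, forcing the result to 0 when the input is 0. */
float sqrt_via_rsqrt(float x)
{
    float s = sqrt_unmasked(x);
    return (x == 0.0f) ? 0.0f : s; /* stands in for the compare + and mask */
}
```

The extra compare and mask are part of the cost clang now pays for plain sqrtf() under -ffast-math, whereas gcc emits a single vsqrtss unless -mrecip=sqrt is given.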
===COMPILERS INFO======
$gcc -v
***
GNU C (GCC) version 4.9.4 20150703 (prerelease) (x86_64-unknown-linux-gnu)
***
$clang -v
***
clang version 3.7.0 (http://llvm.org/git/clang.git
51ef4b95aad99852e704dea9c172f72df4ecb5d1) (http://llvm.org/git/llvm.git
73aa02eb0979ae1d0643aee03c5d0c4b1926408f)
***
Sergey Gvozdarev
===============
Software Engineer
Intel Compiler Team