<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW " title="NEW --- - clang with -ffast-math does not match behavior of gcc in case of reciprocal codegen" href="https://llvm.org/bugs/show_bug.cgi?id=24063">24063</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>clang with -ffast-math does not match behavior of gcc in case of reciprocal codegen
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>sgvozdar@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>After the following commit, clang started to emit approximate versions of
sqrt and div under -ffast-math, but its behavior does not precisely match that
of gcc.
According to its documentation and disassembly (see the examples below), gcc
uses the approximate version only for 1.0f / sqrtf(), while clang also uses it
for plain sqrtf().
commit 73aa02eb0979ae1d0643aee03c5d0c4b1926408f
Author: Sanjay Patel &lt;<a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a>&gt;
Date: Mon Jun 22 18:29:44 2015 +0000
[x86] set default reciprocal (division and square root) codegen to match
GCC
D8982 ( checked in at <a href="http://reviews.llvm.org/rL239001">http://reviews.llvm.org/rL239001</a> ) added command-line
options to allow reciprocal estimate instructions to be used in place of
divisions and square roots.
This patch changes the default settings for x86 targets to allow that recip
codegen (except for scalar division because that breaks too much code) when
using -ffast-math or its equivalent.
This matches GCC behavior for this kind of codegen.
Differential Revision: <a href="http://reviews.llvm.org/D10396">http://reviews.llvm.org/D10396</a>
git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@240310">https://llvm.org/svn/llvm-project/llvm/trunk@240310</a>
91177308-0d34-0410-b5e6-96231b3b80d8
<a href="https://gcc.gnu.org/onlinedocs/gcc-4.9.3/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options">https://gcc.gnu.org/onlinedocs/gcc-4.9.3/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options</a>
***
-mrecip
This option enables use of RCPSS and RSQRTSS instructions (and their vectorized
variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase
precision instead of DIVSS and SQRTSS (and their vectorized variants) for
single-precision floating-point arguments. These instructions are generated
only when -funsafe-math-optimizations is enabled together with
-finite-math-only and -fno-trapping-math. Note that while the throughput of the
sequence is higher than the throughput of the non-reciprocal instruction, the
precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of
1.0 equals 0.99999994).
Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already
with -ffast-math (or the above option combination), and doesn't need -mrecip.
Also note that GCC emits the above sequence with additional Newton-Raphson step
for vectorized single-float division and vectorized sqrtf(x) already with
-ffast-math (or the above option combination), and doesn't need -mrecip.
***
====REPRODUCER 1.c ====
#include "math.h"
float foo(float a)
{
    return 1.0f / sqrtf(a);
}
float bar(float a)
{
    return sqrtf(a);
}
=======================
$gcc -m32 -O2 -ffast-math -march=core-avx2 -c 1.c -o 1.o && objdump -d 1.o
00000000 &lt;foo&gt;:
0: 83 ec 04 sub $0x4,%esp
3: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
9: c5 f2 52 c8 vrsqrtss %xmm0,%xmm1,%xmm1
d: c5 f2 59 c0 vmulss %xmm0,%xmm1,%xmm0
11: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
15: c5 fa 58 05 00 00 00 vaddss 0x0,%xmm0,%xmm0
1c: 00
1d: c5 f2 59 0d 04 00 00 vmulss 0x4,%xmm1,%xmm1
24: 00
25: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
29: c5 fa 11 04 24 vmovss %xmm0,(%esp)
2e: d9 04 24 flds (%esp)
31: 83 c4 04 add $0x4,%esp
34: c3 ret
35: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
39: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
00000040 &lt;bar&gt;:
40: 83 ec 04 sub $0x4,%esp
43: c5 fa 51 44 24 08 vsqrtss 0x8(%esp),%xmm0,%xmm0
49: c5 fa 11 04 24 vmovss %xmm0,(%esp)
4e: d9 04 24 flds (%esp)
51: 83 c4 04 add $0x4,%esp
54: c3 ret
# To use vrsqrtss for plain sqrtf(), gcc requires an explicit -mrecip=sqrt:
$gcc -m32 -O2 -ffast-math -march=core-avx2 -c 1.c -o 1.o -mrecip=sqrt && objdump -d 1.o
00000000 &lt;foo&gt;:
0: 83 ec 04 sub $0x4,%esp
3: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
9: c5 f2 52 c8 vrsqrtss %xmm0,%xmm1,%xmm1
d: c5 f2 59 c0 vmulss %xmm0,%xmm1,%xmm0
11: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
15: c5 fa 58 05 00 00 00 vaddss 0x0,%xmm0,%xmm0
1c: 00
1d: c5 f2 59 0d 04 00 00 vmulss 0x4,%xmm1,%xmm1
24: 00
25: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
29: c5 fa 11 04 24 vmovss %xmm0,(%esp)
2e: d9 04 24 flds (%esp)
31: 83 c4 04 add $0x4,%esp
34: c3 ret
35: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
39: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
00000040 &lt;bar&gt;:
40: 83 ec 04 sub $0x4,%esp
43: c5 fa 10 4c 24 08 vmovss 0x8(%esp),%xmm1
49: c5 f2 c2 15 08 00 00 vcmpneqss 0x8,%xmm1,%xmm2
50: 00 04
52: c5 fa 52 c1 vrsqrtss %xmm1,%xmm0,%xmm0
56: c5 f8 54 c2 vandps %xmm2,%xmm0,%xmm0
5a: c5 fa 59 c9 vmulss %xmm1,%xmm0,%xmm1
5e: c5 f2 59 c0 vmulss %xmm0,%xmm1,%xmm0
62: c5 fa 58 05 00 00 00 vaddss 0x0,%xmm0,%xmm0
69: 00
6a: c5 f2 59 0d 04 00 00 vmulss 0x4,%xmm1,%xmm1
71: 00
72: c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
76: c5 fa 11 04 24 vmovss %xmm0,(%esp)
7b: d9 04 24 flds (%esp)
7e: 83 c4 04 add $0x4,%esp
81: c3 ret
===========================================================================
$clang -m32 -O2 -ffast-math -march=core-avx2 -c 1.c -lm && objdump -d 1.o
00000000 &lt;foo&gt;:
0: 50 push %eax
1: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
7: c5 fa 52 c8 vrsqrtss %xmm0,%xmm0,%xmm1
b: c5 f2 59 d1 vmulss %xmm1,%xmm1,%xmm2
f: c4 e2 79 a9 15 00 00 vfmadd213ss 0x0,%xmm0,%xmm2
16: 00 00
18: c5 f2 59 05 00 00 00 vmulss 0x0,%xmm1,%xmm0
1f: 00
20: c5 ea 59 c0 vmulss %xmm0,%xmm2,%xmm0
24: c5 fa 11 04 24 vmovss %xmm0,(%esp)
29: d9 04 24 flds (%esp)
2c: 58 pop %eax
2d: c3 ret
2e: 66 90 xchg %ax,%ax
00000030 &lt;bar&gt;:
30: 50 push %eax
31: c5 fa 10 44 24 08 vmovss 0x8(%esp),%xmm0
37: c5 fa 52 c8 vrsqrtss %xmm0,%xmm0,%xmm1
3b: c5 f2 59 d1 vmulss %xmm1,%xmm1,%xmm2
3f: c4 e2 79 a9 15 00 00 vfmadd213ss 0x0,%xmm0,%xmm2
46: 00 00
48: c5 f2 59 0d 00 00 00 vmulss 0x0,%xmm1,%xmm1
4f: 00
50: c5 ea 59 c9 vmulss %xmm1,%xmm2,%xmm1
54: c5 fa 59 c9 vmulss %xmm1,%xmm0,%xmm1
58: c5 e8 57 d2 vxorps %xmm2,%xmm2,%xmm2
5c: c5 fa c2 c2 00 vcmpeqss %xmm2,%xmm0,%xmm0
61: c5 f8 55 c1 vandnps %xmm1,%xmm0,%xmm0
65: c5 fa 11 04 24 vmovss %xmm0,(%esp)
6a: d9 04 24 flds (%esp)
6d: 58 pop %eax
6e: c3 ret
===COMPILERS INFO======
$gcc -v
***
GNU C (GCC) version 4.9.4 20150703 (prerelease) (x86_64-unknown-linux-gnu)
***
$clang -v
***
clang version 3.7.0 (<a href="http://llvm.org/git/clang.git">http://llvm.org/git/clang.git</a>
51ef4b95aad99852e704dea9c172f72df4ecb5d1) (<a href="http://llvm.org/git/llvm.git">http://llvm.org/git/llvm.git</a>
73aa02eb0979ae1d0643aee03c5d0c4b1926408f)
***
Sergey Gvozdarev
===============
Software Engineer
Intel Compiler Team</pre>
</div>
</p>
</body>
</html>