<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW " title="NEW --- - clang with -ffast-math does not match behavior of gcc in case of reciprocal codegen" href="https://llvm.org/bugs/show_bug.cgi?id=24063">24063</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>clang with -ffast-math does not match behavior of gcc in case of reciprocal codegen

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>sgvozdar@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>After the following commit, clang started to emit approximate versions of sqrt

and div under -ffast-math, but its behavior does not exactly match that of gcc.

According to its documentation and the disassembly (see the examples below), gcc

uses the approximate version only for 1.0f / sqrtf(), but clang also uses it for

plain sqrtf().

commit 73aa02eb0979ae1d0643aee03c5d0c4b1926408f

Author: Sanjay Patel &lt;<a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a>&gt;

Date:   Mon Jun 22 18:29:44 2015 +0000

    [x86] set default reciprocal (division and square root) codegen to match GCC

    D8982 ( checked in at <a href="http://reviews.llvm.org/rL239001">http://reviews.llvm.org/rL239001</a> ) added command-line

    options to allow reciprocal estimate instructions to be used in place of

    divisions and square roots.

    This patch changes the default settings for x86 targets to allow that recip

    codegen (except for scalar division because that breaks too much code) when

    using -ffast-math or its equivalent.

    This matches GCC behavior for this kind of codegen.

    Differential Revision: <a href="http://reviews.llvm.org/D10396">http://reviews.llvm.org/D10396</a>

    git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@240310">https://llvm.org/svn/llvm-project/llvm/trunk@240310</a> 91177308-0d34-0410-b5e6-96231b3b80d8

<a href="https://gcc.gnu.org/onlinedocs/gcc-4.9.3/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options">https://gcc.gnu.org/onlinedocs/gcc-4.9.3/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options</a>

***

-mrecip

This option enables use of RCPSS and RSQRTSS instructions (and their vectorized

variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase

precision instead of DIVSS and SQRTSS (and their vectorized variants) for

single-precision floating-point arguments. These instructions are generated

only when -funsafe-math-optimizations is enabled together with

-finite-math-only and -fno-trapping-math. Note that while the throughput of the

sequence is higher than the throughput of the non-reciprocal instruction, the

precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of

1.0 equals 0.99999994).

Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already

with -ffast-math (or the above option combination), and doesn't need -mrecip.

Also note that GCC emits the above sequence with additional Newton-Raphson step

for vectorized single-float division and vectorized sqrtf(x) already with

-ffast-math (or the above option combination), and doesn't need -mrecip. 

***
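
The RSQRTSS-plus-Newton-Raphson sequence described in the quoted documentation
can be sketched in plain C. This is only an illustration under stated
assumptions: rsqrt_estimate is a hypothetical software stand-in for the hardware
estimate (the classic integer bit trick, far cruder than RSQRTSS), and the
constants -3.0f and -0.5f are the usual Newton-Raphson values, assumed here
because the disassembly in this report shows them only as unresolved memory
operands.

```c
/* Quoted includes follow the style of the reproducer. */
#include "assert.h"
#include "stdint.h"
#include "string.h"

/* Hypothetical stand-in for the RSQRTSS estimate: the well-known integer
   bit trick. It is only accurate to a few percent, much worse than the
   hardware instruction, but it needs no intrinsics and no libm. */
float rsqrt_estimate(float a)
{
    uint32_t bits;
    float est;

    assert(a > 0.0f);                  /* sketch assumes a positive, finite input */
    memcpy(&bits, &a, sizeof bits);    /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&est, &bits, sizeof est);
    return est;
}

/* One Newton-Raphson step for 1/sqrt(a), written in the same shape as the
   multiply/add sequence in the listings: (a*x0*x0 - 3) * (x0 * -0.5).
   The constants are assumed, not read from the object file. */
float rsqrt_refined(float a)
{
    float x0 = rsqrt_estimate(a);
    return (a * x0 * x0 - 3.0f) * (x0 * -0.5f);
}
```

With this crude estimate, rsqrt_refined(4.0f) lands within a fraction of a
percent of 0.5; starting from the much better hardware estimate, the same single
step is what yields the precision the documentation describes.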

====REPRODUCER 1.c ====

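#include "math.h"

/* Reciprocal square root: gcc uses vrsqrtss here under -ffast-math. */

float foo(float a)

{

    return 1.0f / sqrtf(a);

}

/* Plain square root: gcc keeps vsqrtss unless -mrecip=sqrt is given. */

float bar(float a)

{

    return sqrtf(a);

}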

=======================

$gcc -m32 -O2  -ffast-math -march=core-avx2  -c  1.c -o 1.o  && objdump -d 1.o

00000000 &lt;foo&gt;:

   0:   83 ec 04                sub    $0x4,%esp

   3:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0

   9:   c5 f2 52 c8             vrsqrtss %xmm0,%xmm1,%xmm1

   d:   c5 f2 59 c0             vmulss %xmm0,%xmm1,%xmm0

  11:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0

  15:   c5 fa 58 05 00 00 00    vaddss 0x0,%xmm0,%xmm0

  1c:   00

  1d:   c5 f2 59 0d 04 00 00    vmulss 0x4,%xmm1,%xmm1

  24:   00

  25:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0

  29:   c5 fa 11 04 24          vmovss %xmm0,(%esp)

  2e:   d9 04 24                flds   (%esp)

  31:   83 c4 04                add    $0x4,%esp

  34:   c3                      ret

  35:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi

  39:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

00000040 &lt;bar&gt;:

  40:   83 ec 04                sub    $0x4,%esp

  43:   c5 fa 51 44 24 08       vsqrtss 0x8(%esp),%xmm0,%xmm0

  49:   c5 fa 11 04 24          vmovss %xmm0,(%esp)

  4e:   d9 04 24                flds   (%esp)

  51:   83 c4 04                add    $0x4,%esp

  54:   c3                      ret

# To use vrsqrtss for sqrtf, gcc requires an explicit -mrecip=sqrt

$gcc -m32 -O2  -ffast-math -march=core-avx2  -c  1.c -o 1.o -mrecip=sqrt && objdump -d 1.o

00000000 &lt;foo&gt;:

   0:   83 ec 04                sub    $0x4,%esp

   3:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0

   9:   c5 f2 52 c8             vrsqrtss %xmm0,%xmm1,%xmm1

   d:   c5 f2 59 c0             vmulss %xmm0,%xmm1,%xmm0

  11:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0

  15:   c5 fa 58 05 00 00 00    vaddss 0x0,%xmm0,%xmm0

  1c:   00

  1d:   c5 f2 59 0d 04 00 00    vmulss 0x4,%xmm1,%xmm1

  24:   00

  25:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0

  29:   c5 fa 11 04 24          vmovss %xmm0,(%esp)

  2e:   d9 04 24                flds   (%esp)

  31:   83 c4 04                add    $0x4,%esp

  34:   c3                      ret

  35:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi

  39:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

00000040 &lt;bar&gt;:

  40:   83 ec 04                sub    $0x4,%esp

  43:   c5 fa 10 4c 24 08       vmovss 0x8(%esp),%xmm1

  49:   c5 f2 c2 15 08 00 00    vcmpneqss 0x8,%xmm1,%xmm2

  50:   00 04

  52:   c5 fa 52 c1             vrsqrtss %xmm1,%xmm0,%xmm0

  56:   c5 f8 54 c2             vandps %xmm2,%xmm0,%xmm0

  5a:   c5 fa 59 c9             vmulss %xmm1,%xmm0,%xmm1

  5e:   c5 f2 59 c0             vmulss %xmm0,%xmm1,%xmm0

  62:   c5 fa 58 05 00 00 00    vaddss 0x0,%xmm0,%xmm0

  69:   00

  6a:   c5 f2 59 0d 04 00 00    vmulss 0x4,%xmm1,%xmm1

  71:   00

  72:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0

  76:   c5 fa 11 04 24          vmovss %xmm0,(%esp)

  7b:   d9 04 24                flds   (%esp)

  7e:   83 c4 04                add    $0x4,%esp

  81:   c3                      ret

===========================================================================

$clang -m32 -O2  -ffast-math -march=core-avx2  -c  1.c -lm && objdump -d 1.o

00000000 &lt;foo&gt;:

   0:   50                      push   %eax

   1:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0

   7:   c5 fa 52 c8             vrsqrtss %xmm0,%xmm0,%xmm1

   b:   c5 f2 59 d1             vmulss %xmm1,%xmm1,%xmm2

   f:   c4 e2 79 a9 15 00 00    vfmadd213ss 0x0,%xmm0,%xmm2

  16:   00 00

  18:   c5 f2 59 05 00 00 00    vmulss 0x0,%xmm1,%xmm0

  1f:   00

  20:   c5 ea 59 c0             vmulss %xmm0,%xmm2,%xmm0

  24:   c5 fa 11 04 24          vmovss %xmm0,(%esp)

  29:   d9 04 24                flds   (%esp)

  2c:   58                      pop    %eax

  2d:   c3                      ret

  2e:   66 90                   xchg   %ax,%ax

00000030 &lt;bar&gt;:

  30:   50                      push   %eax

  31:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0

  37:   c5 fa 52 c8             vrsqrtss %xmm0,%xmm0,%xmm1

  3b:   c5 f2 59 d1             vmulss %xmm1,%xmm1,%xmm2

  3f:   c4 e2 79 a9 15 00 00    vfmadd213ss 0x0,%xmm0,%xmm2

  46:   00 00

  48:   c5 f2 59 0d 00 00 00    vmulss 0x0,%xmm1,%xmm1

  4f:   00

  50:   c5 ea 59 c9             vmulss %xmm1,%xmm2,%xmm1

  54:   c5 fa 59 c9             vmulss %xmm1,%xmm0,%xmm1

  58:   c5 e8 57 d2             vxorps %xmm2,%xmm2,%xmm2

  5c:   c5 fa c2 c2 00          vcmpeqss %xmm2,%xmm0,%xmm0

  61:   c5 f8 55 c1             vandnps %xmm1,%xmm0,%xmm0

  65:   c5 fa 11 04 24          vmovss %xmm0,(%esp)

  6a:   d9 04 24                flds   (%esp)

  6d:   58                      pop    %eax

  6e:   c3                      ret
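
Both approximate sqrtf sequences above (gcc with -mrecip=sqrt, and clang by
default) compute sqrt(a) as a * rsqrt(a) and mask out the a == 0 case, since the
hardware estimate of 1/sqrt(0) is infinite and 0 * inf would give NaN. A minimal
C sketch of that idea follows, under stated assumptions: rsqrt_estimate is a
hypothetical software stand-in for vrsqrtss, the constants are the usual
Newton-Raphson values (the listings show only unresolved memory operands), and
the compare operand in the gcc listing is assumed to be zero.

```c
/* Quoted includes follow the style of the reproducer. */
#include "assert.h"
#include "stdint.h"
#include "string.h"

/* Hypothetical stand-in for vrsqrtss (integer bit trick, a few percent off). */
float rsqrt_estimate(float a)
{
    uint32_t bits;
    float est;

    memcpy(&bits, &a, sizeof bits);    /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&est, &bits, sizeof est);
    return est;
}

/* sqrt(a) as a * rsqrt(a) with one Newton-Raphson step folded in.
   The estimate is forced to 0 when a == 0, mirroring the compare-and-mask
   pair in the listings, so that 0 * inf can never produce NaN. */
float sqrt_via_rsqrt(float a)
{
    float x0, ax;

    assert(a >= 0.0f);                 /* sketch assumes a non-negative, finite input */
    x0 = (a == 0.0f) ? 0.0f : rsqrt_estimate(a);
    ax = a * x0;                       /* rough sqrt(a) */
    return (ax * x0 - 3.0f) * (ax * -0.5f);
}
```

gcc applies the mask to the estimate before refinement, as this sketch does,
while the clang listing zeroes the final result instead; both give 0 for a == 0.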

===COMPILERS INFO======

$gcc -v

***

GNU C (GCC) version 4.9.4 20150703 (prerelease) (x86_64-unknown-linux-gnu)

***

$clang -v

***

clang version 3.7.0 (<a href="http://llvm.org/git/clang.git">http://llvm.org/git/clang.git</a>

51ef4b95aad99852e704dea9c172f72df4ecb5d1) (<a href="http://llvm.org/git/llvm.git">http://llvm.org/git/llvm.git</a>

73aa02eb0979ae1d0643aee03c5d0c4b1926408f)

***

Sergey Gvozdarev

===============

Software Engineer

Intel Compiler Team</pre>

        </div>

      </p>


    </body>

</html>