[LLVMbugs] [Bug 24063] New: clang with -ffast-math does not match behavior of gcc in case of reciprocal codegen

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Wed Jul 8 06:12:57 PDT 2015


https://llvm.org/bugs/show_bug.cgi?id=24063

            Bug ID: 24063
           Summary: clang with -ffast-math does not match behavior of gcc in
                    case of reciprocal codegen
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: sgvozdar at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

After the following commit, clang started to emit approximate versions of sqrt
and division when -ffast-math is given, but its behavior does not precisely
match that of gcc.

gcc, in accordance with its documentation and the disassembly (see examples),
uses the approximate version only for 1.0f / sqrtf(), but clang uses it for
plain sqrtf() too.


commit 73aa02eb0979ae1d0643aee03c5d0c4b1926408f
Author: Sanjay Patel <spatel at rotateright.com>
Date:   Mon Jun 22 18:29:44 2015 +0000

    [x86] set default reciprocal (division and square root) codegen to match GCC

    D8982 ( checked in at http://reviews.llvm.org/rL239001 ) added command-line
    options to allow reciprocal estimate instructions to be used in place of
    divisions and square roots.

    This patch changes the default settings for x86 targets to allow that recip
    codegen (except for scalar division because that breaks too much code) when
    using -ffast-math or its equivalent.

    This matches GCC behavior for this kind of codegen.

    Differential Revision: http://reviews.llvm.org/D10396



    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240310 91177308-0d34-0410-b5e6-96231b3b80d8



https://gcc.gnu.org/onlinedocs/gcc-4.9.3/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options

***
-mrecip
This option enables use of RCPSS and RSQRTSS instructions (and their vectorized
variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase
precision instead of DIVSS and SQRTSS (and their vectorized variants) for
single-precision floating-point arguments. These instructions are generated
only when -funsafe-math-optimizations is enabled together with
-finite-math-only and -fno-trapping-math. Note that while the throughput of the
sequence is higher than the throughput of the non-reciprocal instruction, the
precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of
1.0 equals 0.99999994).
Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already
with -ffast-math (or the above option combination), and doesn't need -mrecip.

Also note that GCC emits the above sequence with additional Newton-Raphson step
for vectorized single-float division and vectorized sqrtf(x) already with
-ffast-math (or the above option combination), and doesn't need -mrecip. 
***

====REPRODUCER 1.c ====
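#include <math.h>

/* Reciprocal square root: gcc uses vrsqrtss here under -ffast-math. */
float foo(float a)
{
    return 1.0f / sqrtf(a);
}

/* Plain square root: gcc keeps vsqrtss unless -mrecip=sqrt is given. */
float bar(float a)
{
    return sqrtf(a);
}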
=======================
$gcc -m32 -O2  -ffast-math -march=core-avx2  -c  1.c -o 1.o  && objdump -d 1.o
00000000 <foo>:
   0:   83 ec 04                sub    $0x4,%esp
   3:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0
   9:   c5 f2 52 c8             vrsqrtss %xmm0,%xmm1,%xmm1
   d:   c5 f2 59 c0             vmulss %xmm0,%xmm1,%xmm0
  11:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0
  15:   c5 fa 58 05 00 00 00    vaddss 0x0,%xmm0,%xmm0
  1c:   00
  1d:   c5 f2 59 0d 04 00 00    vmulss 0x4,%xmm1,%xmm1
  24:   00
  25:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0
  29:   c5 fa 11 04 24          vmovss %xmm0,(%esp)
  2e:   d9 04 24                flds   (%esp)
  31:   83 c4 04                add    $0x4,%esp
  34:   c3                      ret
  35:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  39:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

00000040 <bar>:
  40:   83 ec 04                sub    $0x4,%esp
  43:   c5 fa 51 44 24 08       vsqrtss 0x8(%esp),%xmm0,%xmm0
  49:   c5 fa 11 04 24          vmovss %xmm0,(%esp)
  4e:   d9 04 24                flds   (%esp)
  51:   83 c4 04                add    $0x4,%esp
  54:   c3                      ret


# To use vrsqrtss for sqrtf, gcc requires an explicit -mrecip=sqrt

$gcc -m32 -O2  -ffast-math -march=core-avx2  -c  1.c -o 1.o -mrecip=sqrt &&
objdump -d 1.o
00000000 <foo>:
   0:   83 ec 04                sub    $0x4,%esp
   3:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0
   9:   c5 f2 52 c8             vrsqrtss %xmm0,%xmm1,%xmm1
   d:   c5 f2 59 c0             vmulss %xmm0,%xmm1,%xmm0
  11:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0
  15:   c5 fa 58 05 00 00 00    vaddss 0x0,%xmm0,%xmm0
  1c:   00
  1d:   c5 f2 59 0d 04 00 00    vmulss 0x4,%xmm1,%xmm1
  24:   00
  25:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0
  29:   c5 fa 11 04 24          vmovss %xmm0,(%esp)
  2e:   d9 04 24                flds   (%esp)
  31:   83 c4 04                add    $0x4,%esp
  34:   c3                      ret
  35:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  39:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

00000040 <bar>:
  40:   83 ec 04                sub    $0x4,%esp
  43:   c5 fa 10 4c 24 08       vmovss 0x8(%esp),%xmm1
  49:   c5 f2 c2 15 08 00 00    vcmpneqss 0x8,%xmm1,%xmm2
  50:   00 04
  52:   c5 fa 52 c1             vrsqrtss %xmm1,%xmm0,%xmm0
  56:   c5 f8 54 c2             vandps %xmm2,%xmm0,%xmm0
  5a:   c5 fa 59 c9             vmulss %xmm1,%xmm0,%xmm1
  5e:   c5 f2 59 c0             vmulss %xmm0,%xmm1,%xmm0
  62:   c5 fa 58 05 00 00 00    vaddss 0x0,%xmm0,%xmm0
  69:   00
  6a:   c5 f2 59 0d 04 00 00    vmulss 0x4,%xmm1,%xmm1
  71:   00
  72:   c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0
  76:   c5 fa 11 04 24          vmovss %xmm0,(%esp)
  7b:   d9 04 24                flds   (%esp)
  7e:   83 c4 04                add    $0x4,%esp
  81:   c3                      ret

===========================================================================
$clang -m32 -O2  -ffast-math -march=core-avx2  -c  1.c -lm && objdump -d 1.o
00000000 <foo>:
   0:   50                      push   %eax
   1:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0
   7:   c5 fa 52 c8             vrsqrtss %xmm0,%xmm0,%xmm1
   b:   c5 f2 59 d1             vmulss %xmm1,%xmm1,%xmm2
   f:   c4 e2 79 a9 15 00 00    vfmadd213ss 0x0,%xmm0,%xmm2
  16:   00 00
  18:   c5 f2 59 05 00 00 00    vmulss 0x0,%xmm1,%xmm0
  1f:   00
  20:   c5 ea 59 c0             vmulss %xmm0,%xmm2,%xmm0
  24:   c5 fa 11 04 24          vmovss %xmm0,(%esp)
  29:   d9 04 24                flds   (%esp)
  2c:   58                      pop    %eax
  2d:   c3                      ret
  2e:   66 90                   xchg   %ax,%ax

00000030 <bar>:
  30:   50                      push   %eax
  31:   c5 fa 10 44 24 08       vmovss 0x8(%esp),%xmm0
  37:   c5 fa 52 c8             vrsqrtss %xmm0,%xmm0,%xmm1
  3b:   c5 f2 59 d1             vmulss %xmm1,%xmm1,%xmm2
  3f:   c4 e2 79 a9 15 00 00    vfmadd213ss 0x0,%xmm0,%xmm2
  46:   00 00
  48:   c5 f2 59 0d 00 00 00    vmulss 0x0,%xmm1,%xmm1
  4f:   00
  50:   c5 ea 59 c9             vmulss %xmm1,%xmm2,%xmm1
  54:   c5 fa 59 c9             vmulss %xmm1,%xmm0,%xmm1
  58:   c5 e8 57 d2             vxorps %xmm2,%xmm2,%xmm2
  5c:   c5 fa c2 c2 00          vcmpeqss %xmm2,%xmm0,%xmm0
  61:   c5 f8 55 c1             vandnps %xmm1,%xmm0,%xmm0
  65:   c5 fa 11 04 24          vmovss %xmm0,(%esp)
  6a:   d9 04 24                flds   (%esp)
  6d:   58                      pop    %eax
  6e:   c3                      ret



===COMPILERS INFO======
$gcc -v
***
GNU C (GCC) version 4.9.4 20150703 (prerelease) (x86_64-unknown-linux-gnu)
***
$clang -v
***
clang version 3.7.0 (http://llvm.org/git/clang.git
51ef4b95aad99852e704dea9c172f72df4ecb5d1) (http://llvm.org/git/llvm.git
73aa02eb0979ae1d0643aee03c5d0c4b1926408f)
***


Sergey Gvozdarev
===============
Software Engineer
Intel Compiler Team
