[llvm-dev] Should llvm optimize 1.0 / x ?

Mon Aug 31 14:21:12 PDT 2020

Hi,

Here is a small C++ program:

vec.cc:

#include <cmath>

using v4f32 = float __attribute__((__vector_size__(16)));

v4f32 fct1(v4f32 x)
{
  return 1.0 / x;
}

v4f32 fct2(v4f32 x)
{
  return __builtin_ia32_rcpps(x);
}

It compiles to:

vec.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_Z4fct1Dv4_f>:
   0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1        # 9 <_Z4fct1Dv4_f+0x9>
   7: 00 00
   9: c5 f0 5e c0          vdivps %xmm0,%xmm1,%xmm0
   d: c3                    retq
   e: 66 90                xchg   %ax,%ax

0000000000000010 <_Z4fct2Dv4_f>:
  10: c5 f8 53 c0          vrcpps %xmm0,%xmm0
  14: c3                    retq

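As you can see, fct1 (1.0 / x) is compiled to a vbroadcastss followed by
vdivps, not to a single vrcpps as in fct2. Is that because vrcpps is less
precise than a real division, or is it a missing optimization?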
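
To make the precision part of the question concrete, here is a minimal
sketch that measures how far the reciprocal approximation is from a real
division. It is not part of vec.cc: the helper name
max_rcp_relative_error and the sample inputs are made up for
illustration, and it uses the portable SSE intrinsics _mm_rcp_ps and
_mm_div_ps from <xmmintrin.h> in place of __builtin_ia32_rcpps. As far
as I know, Intel documents the relative error of rcpps as at most
1.5 * 2^-12, so the result should stay below that bound on x86.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_rcp_ps, _mm_div_ps, ...
#include <cmath>        // std::fabs, std::fmax

// Hypothetical helper (not in vec.cc): largest relative error of the
// rcpps approximation versus a true division, over four sample inputs.
float max_rcp_relative_error()
{
  // Arbitrary positive sample values, chosen only for illustration.
  const float samples[4] = {1.0f, 3.0f, 7.0f, 1234.5f};

  __m128 x      = _mm_loadu_ps(samples);
  __m128 exact  = _mm_div_ps(_mm_set1_ps(1.0f), x);  // divps
  __m128 approx = _mm_rcp_ps(x);                     // rcpps

  float e[4], a[4];
  _mm_storeu_ps(e, exact);
  _mm_storeu_ps(a, approx);

  float worst = 0.0f;
  for (int i = 0; i < 4; ++i)
    worst = std::fmax(worst, std::fabs(a[i] - e[i]) / e[i]);
  return worst;
}
```

Printing the return value from a small main() shows the gap directly.
The exact number may differ between CPUs, since the approximation is
only specified up to an error bound.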

Regards,
-- 
Alexandre Bique