[llvm-dev] Should llvm optimize 1.0 / x ?

Mon Aug 31 15:59:19 PDT 2020

Hi Alexandre,

Have you tried compiling this with fast-math enabled (`-ffast-math`, see https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior)?

I would expect LLVM to require the `arcp` (allow reciprocal) fast-math flag before performing this optimization, since `vrcpps` only returns an approximation of 1.0 / x (see https://www.llvm.org/docs/LangRef.html#fast-math-flags).
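
To make the precision concern concrete, here is a minimal sketch (not what LLVM itself does internally) that compares the exact division against the raw `rcpps` estimate and against the estimate followed by one Newton-Raphson step. The helper names (`recip_exact`, `recip_estimate`, `recip_refined`, `max_rel_error`) are made up for illustration, and the code assumes an x86 target with SSE:

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_rcp_ps, _mm_div_ps, ...
#include <cmath>

// Exact, correctly rounded reciprocal: what `1.0f / x` has to produce
// when no fast-math flags are in effect.
static __m128 recip_exact(__m128 x) {
    return _mm_div_ps(_mm_set1_ps(1.0f), x);
}

// Raw hardware estimate (rcpps). This is only an approximation of 1/x.
static __m128 recip_estimate(__m128 x) {
    return _mm_rcp_ps(x);
}

// Estimate plus one Newton-Raphson step: r' = r * (2 - x * r).
// Each step roughly doubles the number of correct bits.
static __m128 recip_refined(__m128 x) {
    __m128 r = _mm_rcp_ps(x);
    __m128 t = _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(x, r));
    return _mm_mul_ps(r, t);
}

// Largest relative error over the four lanes, measured against `exact`.
// Assumes every lane of `exact` is finite and non-zero.
static float max_rel_error(__m128 approx, __m128 exact) {
    float a[4], e[4];
    _mm_storeu_ps(a, approx);
    _mm_storeu_ps(e, exact);
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        float err = std::fabs(a[i] - e[i]) / std::fabs(e[i]);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling `max_rel_error(recip_estimate(x), recip_exact(x))` on a few inputs should show an error on the order of 2^-12 (Intel documents a bound of 1.5 * 2^-12 for `rcpps`), which is far from the correctly rounded result that a plain division guarantees. That gap is why the compiler cannot substitute `vrcpps` for `vdivps` under strict floating-point semantics. As far as I know, even when reciprocals are allowed the backend may prefer an estimate plus a refinement step over a bare `vrcpps`, so I would not necessarily expect `fct1` and `fct2` to compile to identical code.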

Cheers,
-Quentin

> On Aug 31, 2020, at 2:21 PM, Alexandre Bique via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi,
> 
> Here is a small C++ program:
> 
> vec.cc:
> 
> #include <cmath>
> 
> using v4f32 = float __attribute__((__vector_size__(16)));
> 
> v4f32 fct1(v4f32 x)
> {
>  return 1.0 / x;
> }
> 
> v4f32 fct2(v4f32 x)
> {
>  return __builtin_ia32_rcpps(x);
> }
> 
> Which is compiled to:
> 
> vec.o:     file format elf64-x86-64
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <_Z4fct1Dv4_f>:
>   0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1        # 9 <_Z4fct1Dv4_f+0x9>
>   7: 00 00
>   9: c5 f0 5e c0          vdivps %xmm0,%xmm1,%xmm0
>   d: c3                    retq
>   e: 66 90                xchg   %ax,%ax
> 
> 0000000000000010 <_Z4fct2Dv4_f>:
>  10: c5 f8 53 c0          vrcpps %xmm0,%xmm0
>  14: c3                    retq
> 
> 
> As you can see, 1.0 / x is not turned into vrcpps. Is it because of
> precision or a missing optimization?
> 
> Regards,
> -- 
> Alexandre Bique