[llvm-dev] Should llvm optimize 1.0 / x ?

Mon Aug 31 23:44:37 PDT 2020

Hi Quentin,

You are correct: with fast-math enabled I managed to get clang to use
vrcpps, but not in a satisfying way:

clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
-Rpass-analysis=loop-vectorize \
-ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \
-c -o vec.o vec.cc

0000000000000140 <_Z4fct4Dv4_f>:
 140: c5 f8 53 c8          vrcpps %xmm0,%xmm1
 144: c4 e2 79 18 15 00 00 vbroadcastss 0x0(%rip),%xmm2        # 14d <_Z4fct4Dv4_f+0xd>
 14b: 00 00
 14d: c4 e2 71 ac c2        vfnmadd213ps %xmm2,%xmm1,%xmm0
 152: c4 e2 71 98 c1        vfmadd132ps %xmm1,%xmm1,%xmm0
 157: c3                    retq
 158: 0f 1f 84 00 00 00 00 nopl   0x0(%rax,%rax,1)
 15f: 00

0000000000000160 <_Z4fct5Dv4_f>:
 160: c5 f8 53 c0          vrcpps %xmm0,%xmm0
 164: c3                    retq

As you can see, fct4 is not equivalent to fct5: after vrcpps, fct4 still
executes a vfnmadd213ps/vfmadd132ps pair, while fct5 is a single vrcpps.
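
For reference, the two extra FMA instructions look like one Newton-Raphson
refinement step applied to the vrcpps estimate. Below is a minimal scalar
sketch of that step. It assumes the broadcast constant is 1.0 (the listing
only shows an unresolved 0x0(%rip) load), and crude_reciprocal() is a
hypothetical stand-in for vrcpps, not the real instruction:

```cpp
#include <cmath>

// One Newton-Raphson step for the reciprocal: given an estimate r of 1/x,
// return r + r * (1 - x * r). Written as two fused multiply-adds, mirroring
// the vfnmadd213ps / vfmadd132ps pair in the fct4 listing.
float refine_reciprocal(float x, float r)
{
    float e = std::fma(-x, r, 1.0f); // e = 1 - x * r  (assumed constant: 1.0)
    return std::fma(e, r, r);        // r + e * r
}

// Hypothetical stand-in for vrcpps: the exact reciprocal perturbed by a
// relative error of 2^-12, chosen only to imitate a low-precision estimate.
float crude_reciprocal(float x)
{
    return (1.0f / x) * (1.0f + 1.0f / 4096.0f);
}
```

Calling refine_reciprocal(x, crude_reciprocal(x)) roughly squares the relative
error of the estimate, so a single step gets close to float rounding level.
That is presumably why clang keeps the two FMAs when it replaces the division
on its own, while the explicit builtin returns the raw estimate.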

Regards,
Alexandre Bique

On Tue, Sep 1, 2020 at 12:59 AM Quentin Colombet <qcolombet at apple.com> wrote:
>
> Hi Alexandre,
>
> Have you tried to compile this with fast-math enabled (`-ffast-math` https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior)?
>
> I would expect LLVM to require the `arcp` flag to perform this optimization (https://www.llvm.org/docs/LangRef.html#fast-math-flags).
>
> Cheers,
> -Quentin
>
>
> > On Aug 31, 2020, at 2:21 PM, Alexandre Bique via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > Here is a small C++ program:
> >
> > vec.cc:
> >
> > #include <cmath>
> >
> > using v4f32 = float __attribute__((__vector_size__(16)));
> >
> > v4f32 fct1(v4f32 x)
> > {
> >  return 1.0 / x;
> > }
> >
> > v4f32 fct2(v4f32 x)
> > {
> >  return __builtin_ia32_rcpps(x);
> > }
> >
> > Which is compiled to:
> >
> > vec.o:     file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <_Z4fct1Dv4_f>:
> >   0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1        # 9 <_Z4fct1Dv4_f+0x9>
> >   7: 00 00
> >   9: c5 f0 5e c0          vdivps %xmm0,%xmm1,%xmm0
> >   d: c3                    retq
> >   e: 66 90                xchg   %ax,%ax
> >
> > 0000000000000010 <_Z4fct2Dv4_f>:
> >  10: c5 f8 53 c0          vrcpps %xmm0,%xmm0
> >  14: c3                    retq
> >
> >
> > As you can see, 1.0 / x is not turned into vrcpps. Is it because of
> > precision or a missing optimization?
> >
> > Regards,
> > --
> > Alexandre Bique