[llvm-dev] Should llvm optimize 1.0 / x ?
Alexandre Bique via llvm-dev
llvm-dev at lists.llvm.org
Mon Aug 31 23:44:37 PDT 2020
Hi Quentin,
You are correct: I managed to get clang to emit vrcpps, but not in
a satisfying way:
clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
-Rpass-analysis=loop-vectorize \
-ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \
-c -o vec.o vec.cc
0000000000000140 <_Z4fct4Dv4_f>:
140: c5 f8 53 c8 vrcpps %xmm0,%xmm1
144: c4 e2 79 18 15 00 00 vbroadcastss 0x0(%rip),%xmm2 # 14d <_Z4fct4Dv4_f+0xd>
14b: 00 00
14d: c4 e2 71 ac c2 vfnmadd213ps %xmm2,%xmm1,%xmm0
152: c4 e2 71 98 c1 vfmadd132ps %xmm1,%xmm1,%xmm0
157: c3 retq
158: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
15f: 00
0000000000000160 <_Z4fct5Dv4_f>:
160: c5 f8 53 c0 vrcpps %xmm0,%xmm0
164: c3 retq
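As you can see, fct4 is not equivalent to fct5: fct4 follows vrcpps with
one Newton-Raphson refinement step (vfnmadd213ps + vfmadd132ps), while
fct5 returns the raw vrcpps estimate.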
Regards,
Alexandre Bique
On Tue, Sep 1, 2020 at 12:59 AM Quentin Colombet <qcolombet at apple.com> wrote:
>
> Hi Alexandre,
>
> Have you tried to compile this with fast-math enabled (`-ffast-math` https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior)?
>
> I would expect LLVM to require the `arcp` flag to perform this optimization (https://www.llvm.org/docs/LangRef.html#fast-math-flags).
>
> Cheers,
> -Quentin
>
>
> > On Aug 31, 2020, at 2:21 PM, Alexandre Bique via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > Here is a small C++ program:
> >
> > vec.cc:
> >
> > #include <cmath>
> >
> > using v4f32 = float __attribute__((__vector_size__(16)));
> >
> > v4f32 fct1(v4f32 x)
> > {
> > return 1.0 / x;
> > }
> >
> > v4f32 fct2(v4f32 x)
> > {
> > return __builtin_ia32_rcpps(x);
> > }
> >
> > Which is compiled to:
> >
> > vec.o: file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <_Z4fct1Dv4_f>:
> > 0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9 <_Z4fct1Dv4_f+0x9>
> > 7: 00 00
> > 9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
> > d: c3 retq
> > e: 66 90 xchg %ax,%ax
> >
> > 0000000000000010 <_Z4fct2Dv4_f>:
> > 10: c5 f8 53 c0 vrcpps %xmm0,%xmm0
> > 14: c3 retq
> >
> >
> > As you can see, 1.0 / x is not turned into vrcpps. Is it because of
> > precision or a missing optimization?
> >
> > Regards,
> > --
> > Alexandre Bique
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>