[llvm-dev] Vectorization of math function failed?
Brian Cain via llvm-dev
llvm-dev at lists.llvm.org
Mon Aug 31 19:05:51 PDT 2020
If you're using clang, you could check whether it emits any hints about
the missed vectorization by enabling optimization remarks:
https://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports
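
As a rough sketch of how the remarks could be applied to a loop like fct3:
the file name vec_remarks.cc and the helper sin_inplace below are made up
for illustration, and the -Rpass options are the ones described in the
manual section linked above. The exact remark text depends on the clang
version, so none is shown here.

```cpp
// vec_remarks.cc (hypothetical file name, for illustration only)
//
// One possible invocation, all on a single command line, assuming a
// clang++ that supports the -Rpass family of options:
//
//   clang++ -O3 -march=native -fno-math-errno -c vec_remarks.cc
//       -Rpass=loop-vectorize
//       -Rpass-missed=loop-vectorize
//       -Rpass-analysis=loop-vectorize
//
// -Rpass reports loops that were vectorized, -Rpass-missed reports loops
// that were not, and -Rpass-analysis reports what the pass found along
// the way (often the reason for a missed vectorization).

#include <cassert>
#include <cmath>

// Hypothetical helper: replaces each of the first n elements of x with
// its sine. This is the same shape of loop as fct3, so the remarks for
// it should point at the same obstacle, if there is one.
void sin_inplace(float *x, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        x[i] = std::sin(x[i]);
}
```

If the remarks point at the call to sin as the obstacle, one thing worth
checking (an assumption on my part, not something verified against your
setup) is whether your clang accepts -fveclib=libmvec. Note that -lmvec by
itself is a linker option, so it has no effect on code generation when
compiling with -c.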
On Mon, Aug 31, 2020, 4:43 PM Alexandre Bique via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> After reading
> https://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls
> I decided to write the following C++ program:
>
> #include <cmath>
>
> using v4f32 = float __attribute__((__vector_size__(16)));
>
> v4f32 fct1(v4f32 x)
> {
>     v4f32 y;
>     y[0] = std::sin(x[0]);
>     y[1] = std::sin(x[1]);
>     y[2] = std::sin(x[2]);
>     y[3] = std::sin(x[3]);
>     return y;
> }
>
> v4f32 fct2(v4f32 x)
> {
>     v4f32 y;
>     for (int i = 0; i < 4; ++i)
>         y[i] = std::sin(x[i]);
>     return y;
> }
>
> void fct3(float *x)
> {
>     #pragma clang loop vectorize(enable)
>     for (int i = 0; i < 16; ++i)
>         x[i] = sinf(x[i]);
> }
>
> Which I compiled with:
>
> clang++ -O3 -march=native -mtune=native -c -o vec.o vec.cc -lmvec -fno-math-errno
>
> And here is what I get:
>
> vec.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 0000000000000000 <_Z4fct1Dv4_f>:
> 0: 48 83 ec 48 sub $0x48,%rsp
> 4: c5 f8 29 04 24 vmovaps %xmm0,(%rsp)
> 9: e8 00 00 00 00 callq e <_Z4fct1Dv4_f+0xe>
> e: c5 f8 29 44 24 30 vmovaps %xmm0,0x30(%rsp)
> 14: c5 fa 16 04 24 vmovshdup (%rsp),%xmm0
> 19: e8 00 00 00 00 callq 1e <_Z4fct1Dv4_f+0x1e>
> 1e: c5 f8 29 44 24 20 vmovaps %xmm0,0x20(%rsp)
> 24: c4 e3 79 05 04 24 01 vpermilpd $0x1,(%rsp),%xmm0
> 2b: e8 00 00 00 00 callq 30 <_Z4fct1Dv4_f+0x30>
> 30: c5 f9 29 44 24 10 vmovapd %xmm0,0x10(%rsp)
> 36: c4 e3 79 04 04 24 e7 vpermilps $0xe7,(%rsp),%xmm0
> 3d: e8 00 00 00 00 callq 42 <_Z4fct1Dv4_f+0x42>
> 42: c5 f8 28 4c 24 30 vmovaps 0x30(%rsp),%xmm1
> 48: c4 e3 71 21 4c 24 20 vinsertps $0x10,0x20(%rsp),%xmm1,%xmm1
> 4f: 10
> 50: c4 e3 71 21 4c 24 10 vinsertps $0x20,0x10(%rsp),%xmm1,%xmm1
> 57: 20
> 58: c4 e3 71 21 c0 30 vinsertps $0x30,%xmm0,%xmm1,%xmm0
> 5e: 48 83 c4 48 add $0x48,%rsp
> 62: c3 retq
> 63: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> 6a: 00 00 00
> 6d: 0f 1f 00 nopl (%rax)
>
> 0000000000000070 <_Z4fct2Dv4_f>:
> 70: 48 83 ec 48 sub $0x48,%rsp
> 74: c5 f8 29 04 24 vmovaps %xmm0,(%rsp)
> 79: e8 00 00 00 00 callq 7e <_Z4fct2Dv4_f+0xe>
> 7e: c5 f8 29 44 24 30 vmovaps %xmm0,0x30(%rsp)
> 84: c5 fa 16 04 24 vmovshdup (%rsp),%xmm0
> 89: e8 00 00 00 00 callq 8e <_Z4fct2Dv4_f+0x1e>
> 8e: c5 f8 29 44 24 20 vmovaps %xmm0,0x20(%rsp)
> 94: c4 e3 79 05 04 24 01 vpermilpd $0x1,(%rsp),%xmm0
> 9b: e8 00 00 00 00 callq a0 <_Z4fct2Dv4_f+0x30>
> a0: c5 f9 29 44 24 10 vmovapd %xmm0,0x10(%rsp)
> a6: c4 e3 79 04 04 24 e7 vpermilps $0xe7,(%rsp),%xmm0
> ad: e8 00 00 00 00 callq b2 <_Z4fct2Dv4_f+0x42>
> b2: c5 f8 28 4c 24 30 vmovaps 0x30(%rsp),%xmm1
> b8: c4 e3 71 21 4c 24 20 vinsertps $0x10,0x20(%rsp),%xmm1,%xmm1
> bf: 10
> c0: c4 e3 71 21 4c 24 10 vinsertps $0x20,0x10(%rsp),%xmm1,%xmm1
> c7: 20
> c8: c4 e3 71 21 c0 30 vinsertps $0x30,%xmm0,%xmm1,%xmm0
> ce: 48 83 c4 48 add $0x48,%rsp
> d2: c3 retq
> d3: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> da: 00 00 00
> dd: 0f 1f 00 nopl (%rax)
>
> 00000000000000e0 <_Z4fct3Pf>:
> e0: 53 push %rbx
> e1: 48 83 ec 10 sub $0x10,%rsp
> e5: 48 89 fb mov %rdi,%rbx
> e8: c5 fa 10 07 vmovss (%rdi),%xmm0
> ec: c5 fa 10 4f 04 vmovss 0x4(%rdi),%xmm1
> f1: c5 fa 11 4c 24 0c vmovss %xmm1,0xc(%rsp)
> f7: e8 00 00 00 00 callq fc <_Z4fct3Pf+0x1c>
> fc: c5 fa 11 03 vmovss %xmm0,(%rbx)
> 100: c5 fa 10 44 24 0c vmovss 0xc(%rsp),%xmm0
> 106: e8 00 00 00 00 callq 10b <_Z4fct3Pf+0x2b>
> 10b: c5 fa 11 43 04 vmovss %xmm0,0x4(%rbx)
> 110: c5 fa 10 43 08 vmovss 0x8(%rbx),%xmm0
> 115: e8 00 00 00 00 callq 11a <_Z4fct3Pf+0x3a>
> 11a: c5 fa 11 43 08 vmovss %xmm0,0x8(%rbx)
> 11f: c5 fa 10 43 0c vmovss 0xc(%rbx),%xmm0
> 124: e8 00 00 00 00 callq 129 <_Z4fct3Pf+0x49>
> 129: c5 fa 11 43 0c vmovss %xmm0,0xc(%rbx)
> 12e: c5 fa 10 43 10 vmovss 0x10(%rbx),%xmm0
> 133: e8 00 00 00 00 callq 138 <_Z4fct3Pf+0x58>
> 138: c5 fa 11 43 10 vmovss %xmm0,0x10(%rbx)
> 13d: c5 fa 10 43 14 vmovss 0x14(%rbx),%xmm0
> 142: e8 00 00 00 00 callq 147 <_Z4fct3Pf+0x67>
> 147: c5 fa 11 43 14 vmovss %xmm0,0x14(%rbx)
> 14c: c5 fa 10 43 18 vmovss 0x18(%rbx),%xmm0
> 151: e8 00 00 00 00 callq 156 <_Z4fct3Pf+0x76>
> 156: c5 fa 11 43 18 vmovss %xmm0,0x18(%rbx)
> 15b: c5 fa 10 43 1c vmovss 0x1c(%rbx),%xmm0
> 160: e8 00 00 00 00 callq 165 <_Z4fct3Pf+0x85>
> 165: c5 fa 11 43 1c vmovss %xmm0,0x1c(%rbx)
> 16a: c5 fa 10 43 20 vmovss 0x20(%rbx),%xmm0
> 16f: e8 00 00 00 00 callq 174 <_Z4fct3Pf+0x94>
> 174: c5 fa 11 43 20 vmovss %xmm0,0x20(%rbx)
> 179: c5 fa 10 43 24 vmovss 0x24(%rbx),%xmm0
> 17e: e8 00 00 00 00 callq 183 <_Z4fct3Pf+0xa3>
> 183: c5 fa 11 43 24 vmovss %xmm0,0x24(%rbx)
> 188: c5 fa 10 43 28 vmovss 0x28(%rbx),%xmm0
> 18d: e8 00 00 00 00 callq 192 <_Z4fct3Pf+0xb2>
> 192: c5 fa 11 43 28 vmovss %xmm0,0x28(%rbx)
> 197: c5 fa 10 43 2c vmovss 0x2c(%rbx),%xmm0
> 19c: e8 00 00 00 00 callq 1a1 <_Z4fct3Pf+0xc1>
> 1a1: c5 fa 11 43 2c vmovss %xmm0,0x2c(%rbx)
> 1a6: c5 fa 10 43 30 vmovss 0x30(%rbx),%xmm0
> 1ab: e8 00 00 00 00 callq 1b0 <_Z4fct3Pf+0xd0>
> 1b0: c5 fa 11 43 30 vmovss %xmm0,0x30(%rbx)
> 1b5: c5 fa 10 43 34 vmovss 0x34(%rbx),%xmm0
> 1ba: e8 00 00 00 00 callq 1bf <_Z4fct3Pf+0xdf>
> 1bf: c5 fa 11 43 34 vmovss %xmm0,0x34(%rbx)
> 1c4: c5 fa 10 43 38 vmovss 0x38(%rbx),%xmm0
> 1c9: e8 00 00 00 00 callq 1ce <_Z4fct3Pf+0xee>
> 1ce: c5 fa 11 43 38 vmovss %xmm0,0x38(%rbx)
> 1d3: c5 fa 10 43 3c vmovss 0x3c(%rbx),%xmm0
> 1d8: e8 00 00 00 00 callq 1dd <_Z4fct3Pf+0xfd>
> 1dd: c5 fa 11 43 3c vmovss %xmm0,0x3c(%rbx)
> 1e2: 48 83 c4 10 add $0x10,%rsp
> 1e6: 5b pop %rbx
> 1e7: c3 retq
>
> As you can see, there is no call to a vectorized version of sin.
> Did I do something wrong?
>
> By the way, I am on Linux with glibc 2.32, which has libmvec.
>
> Regards,
> --
> Alexandre Bique