[cfe-dev] Performance problem with SIMD support

Eric Christopher echristo at gmail.com
Fri Sep 6 14:01:29 PDT 2013


> With G++ 4.5.1, the test case runs in 69 sec. with SIMD and 84 sec. without
> SIMD.
>
> With Clang++ 3.3, the same test case runs in 73 sec. with SIMD and 64 sec.
> without SIMD.
>
> We discovered that the function gcopy2 was at the top of the profiler's
> list, and fcopy2 and dcopy2 were also in the top 5.  A stack trace pointed
> to our SIMD code as the caller, and this indicated we should try compiling
> without the SIMD code.
>
> Before I spend too much more time with various possibilities, can anyone
> comment on this issue?

It'd be good to see a testcase that shows the problem. We're
definitely interested in optimizing this path.

>
> Perhaps we should be using __builtin_ functions, when they are available,
> and _mm_ functions only when the __builtin_ forms are not available.
>

We'd prefer not. The __builtin forms are basically equivalent to
inline asm. The idea behind using only the _mm_* versions is that
the compiler can still optimize the resulting code.
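
As an illustration (not part of the original exchange), here is a
minimal sketch of a loop written only with the portable _mm_*
intrinsics from <xmmintrin.h>. It assumes an x86 target with SSE
enabled. The function name add_floats is hypothetical, and this is
not the poster's gcopy2/fcopy2/dcopy2 code, which was never shown.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper for illustration: dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && a != NULL && b != NULL);

    /* Main loop: four floats per iteration, using unaligned loads and stores
       so the caller does not need 16-byte aligned buffers. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail: handle the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Because the source names only the _mm_* interface, the same file
should build with either G++ or Clang, and each compiler is free to
lower the intrinsics however it sees fit.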

> Is there something that could be improved in Clang's SIMD support?
>

Probably, if you're having this problem.

-eric


