[cfe-dev] Performance problem with SIMD support
Eric Christopher
echristo at gmail.com
Fri Sep 6 14:01:29 PDT 2013
> With G++ 4.5.1, the test case runs in 69 sec. with SIMD and 84 sec. without
> SIMD.
>
> With Clang++ 3.3, the same test case runs in 73 sec. with SIMD and 64 sec.
> without SIMD.
>
> We discovered that the function gcopy2 was at the top of the profiler's
> list, and fcopy2 and dcopy2 were also in the top 5. A stack trace pointed
> to our SIMD code as the caller, and this indicated we should try compiling
> without the SIMD code.
>
> Before I spend too much more time with various possibilities, can anyone
> comment on this issue?
It'd be good to see a testcase that shows the problem. We're
definitely interested in optimizing this path.
>
> Perhaps we should be using __builtin_ functions, when they are available,
> and _mm_ functions only when the __builtin_ forms are not available.
>
We'd prefer not. The __builtin forms are essentially equivalent to
inline asm. The idea behind using only the _mm_* versions is that the
compiler can still optimize the code.
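As a rough illustration of the point above (a minimal sketch, not taken
from the reported test case; the function name add_floats is hypothetical),
here is a loop written only with the portable _mm_* intrinsics from
<xmmintrin.h>. In Clang's headers, as far as I know, _mm_add_ps is
defined as an ordinary vector addition, so the optimizer sees normal
arithmetic that it can fold, reorder, or combine, rather than an opaque
target-specific call.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper for illustration: dst[i] = a[i] + b[i] for n floats. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Process four floats per iteration using unaligned loads and stores,
     * so the caller does not have to guarantee 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The same source builds with both G++ and Clang, which also makes it a
convenient shape for a reduced testcase.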
> Is there something that could be improved in Clang's SIMD support?
>
Probably, if you're having this problem.
-eric