[cfe-dev] Performance problem with SIMD support

Richard Hadsell hadsell at blueskystudios.com
Fri Sep 6 13:43:20 PDT 2013


I am comparing the performance of our code generated with Clang++ 3.3 and with G++ 4.5.1 for Linux x86_64 (Fedora 14).

Clang++ code generally performs slightly better than G++ code, but in a few test cases one of the differences is attributable to some SIMD code.  It uses SSE2 and SSE4.1 instructions to accelerate some small, frequently called functions.

In January I reported the lack of support for __builtin_ia32_blendvpd as a bug.  I learned that we have to use _mm_blendv_pd from smmintrin.h instead of the __builtin_ form.  This was the final comment from Eli Friedman: "The _mm_ forms are preferred because they are standardized; we consider the __builtin_ versions an implementation detail."

I converted all of our code to use the _mm_ forms for Clang++ builds.  Now I have discovered that our code actually runs more slowly with the SIMD instructions than without.

With G++ 4.5.1, the test case runs in 69 sec. with SIMD and 84 sec. without SIMD.

With Clang++ 3.3, the same test case runs in 73 sec. with SIMD and 64 sec. without SIMD.

We discovered that the function gcopy2 was at the top of the profiler's list, and fcopy2 and dcopy2 were also in the top 5.  A stack trace pointed to our SIMD code as the caller, and this indicated we should try compiling without the SIMD code.

Before I spend much more time exploring the various possibilities, can anyone comment on this issue?

Perhaps we should be using __builtin_ functions, when they are available, and _mm_ functions only when the __builtin_ forms are not available.

Is there something that could be improved in Clang's SIMD support?

-- 
Dick Hadsell			203-992-6320  Fax: 203-992-6001
Reply-to:			hadsell at blueskystudios.com
Blue Sky Studios                http://www.blueskystudios.com
1 American Lane, Greenwich, CT 06831-2560