[cfe-dev] Performance problem with SIMD support

Richard Hadsell hadsell at blueskystudios.com
Fri Sep 6 14:16:21 PDT 2013


On 09/06/2013 05:01 PM, Eric Christopher wrote:
>> With G++ 4.5.1, the test case runs in 69 sec. with SIMD and 84 sec. without
>> SIMD.
>>
>> With Clang 3.3, the same test case runs in 73 sec. with SIMD and 64 sec.
>> without SIMD.
>>
>> We discovered that the function gcopy2 was at the top of the profiler's
>> list, and fcopy2 and dcopy2 were also in the top 5.  A stack trace pointed
>> to our SIMD code as the caller, and this indicated we should try compiling
>> without the SIMD code.
>>
>> Before I spend too much more time with various possibilities, can anyone
>> comment on this issue?
> It'd be good to see a testcase that shows the problem. We're
> definitely interested in optimizing this path.
>
>> Perhaps we should be using __builtin_ functions, when they are available,
>> and _mm_ functions only when the __builtin_ forms are not available.
> We'd prefer not. The __builtin forms are basically
> equivalent to inline asm. The idea behind using only the _mm_*
> versions is that the code can also be optimized.
>
>> Is there something that could be improved in Clang's SIMD support?
> Probably if you're having this problem.

I'll see what I can do about a simpler test case.

I just talked with a colleague who is more familiar with our SIMD code, and he pointed out that the function called in this test case uses inline asm for the SIMD instructions, not the intrinsic functions. My guess that the difference was due to the _mm_ functions was wrong. I apologize for jumping to the wrong conclusion.
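The distinction drawn above, between SSE intrinsics that the optimizer can see through and inline asm that it treats as opaque, can be illustrated with a minimal sketch. This is not taken from the code under discussion; the function names add4_intrinsic and add4_asm are hypothetical. Both functions add four packed floats. The first uses the _mm_ intrinsics from <xmmintrin.h>; the second performs the addition with a GCC/Clang-style inline asm statement. The sketch assumes an x86 target with SSE enabled (the default on x86-64).

```c
#include <xmmintrin.h>

/* Hypothetical example: add four floats using SSE intrinsics.
   The compiler sees an ordinary vector add, so it is free to
   schedule, fold, or combine it with surrounding code. */
void add4_intrinsic(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);   /* unaligned load of a[0..3] */
    __m128 vb = _mm_loadu_ps(b);   /* unaligned load of b[0..3] */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Hypothetical example: the same addition written as inline asm.
   The compiler cannot look inside the asm statement, so it cannot
   simplify or merge the addps with neighboring operations. */
void add4_asm(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* AT&T syntax: addps src, dst  =>  va = va + vb */
    __asm__("addps %1, %0" : "+x"(va) : "x"(vb));
    _mm_storeu_ps(out, va);
}
```

Both functions compute the same result; the difference lies only in how much the optimizer is allowed to know about the operation, which is why performance can differ between compilers for asm-based SIMD code.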

This colleague thinks the problem might be related to data packing, which different compilers may handle differently. I will try to send a test case built from our code that demonstrates the performance issue.
