[cfe-users] Why is my code 9 times slower with Clang than with gcc?

Radio młodych bandytów radiomlodychbandytow at o2.pl
Sun Jul 7 11:56:30 PDT 2013


Hello.
I'm developing a hash function based on Bob Jenkins' SpookyHash. Until
now I have compiled it with gcc 4.9. I have now tried Clang 3.4 and was
shocked to see that the results are terrible. What should I do to make
Clang perform well here too?
I also tried Clang 3.1 and gcc 4.2.1: the former was very slow, the
latter was OK.
Detailed results:
Clang:
pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
-------------------------------------------------------------------------------
--- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)

[[[ Speed Tests ]]]

Bulk speed test - 262144-byte keys
Alignment  0 -  0.353 bytes/cycle - 1009.75 MB/sec @ 3 ghz
Alignment  1 -  0.372 bytes/cycle - 1063.31 MB/sec @ 3 ghz
Alignment  2 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
Alignment  3 -  0.372 bytes/cycle - 1063.23 MB/sec @ 3 ghz
Alignment  4 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
Alignment  5 -  0.372 bytes/cycle - 1063.29 MB/sec @ 3 ghz
Alignment  6 -  0.372 bytes/cycle - 1063.33 MB/sec @ 3 ghz
Alignment  7 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz

Small key speed test -    1-byte keys -   215.77 cycles/hash
Small key speed test -    2-byte keys -   216.00 cycles/hash
Small key speed test -    3-byte keys -   216.00 cycles/hash
Small key speed test -    4-byte keys -   218.01 cycles/hash
Small key speed test -    5-byte keys -   219.00 cycles/hash
Small key speed test -    6-byte keys -   220.40 cycles/hash
Small key speed test -    7-byte keys -   225.79 cycles/hash
Small key speed test -    8-byte keys -   220.37 cycles/hash
Small key speed test -    9-byte keys -   220.96 cycles/hash
Small key speed test -   10-byte keys -   222.40 cycles/hash
Small key speed test -   11-byte keys -   228.68 cycles/hash
Small key speed test -   12-byte keys -   228.98 cycles/hash
Small key speed test -   13-byte keys -   230.40 cycles/hash
Small key speed test -   14-byte keys -   231.07 cycles/hash
Small key speed test -   15-byte keys -   232.00 cycles/hash
Small key speed test -   16-byte keys -   220.70 cycles/hash
Small key speed test -   17-byte keys -   216.00 cycles/hash
Small key speed test -   18-byte keys -   218.03 cycles/hash
Small key speed test -   19-byte keys -   223.99 cycles/hash
Small key speed test -   20-byte keys -   224.00 cycles/hash
Small key speed test -   21-byte keys -   225.97 cycles/hash
Small key speed test -   22-byte keys -   226.99 cycles/hash
Small key speed test -   23-byte keys -   233.85 cycles/hash
Small key speed test -   24-byte keys -   222.15 cycles/hash
Small key speed test -   25-byte keys -   218.99 cycles/hash
Small key speed test -   26-byte keys -   220.17 cycles/hash
Small key speed test -   27-byte keys -   225.97 cycles/hash
Small key speed test -   28-byte keys -   226.99 cycles/hash
Small key speed test -   29-byte keys -   228.23 cycles/hash
Small key speed test -   30-byte keys -   228.75 cycles/hash
Small key speed test -   31-byte keys -   235.19 cycles/hash


Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 15.664062 seconds
-------------------------------------------------------------------------------

gcc:
pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
-------------------------------------------------------------------------------
--- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)

[[[ Speed Tests ]]]

Bulk speed test - 262144-byte keys
Alignment  0 -  3.316 bytes/cycle - 9486.77 MB/sec @ 3 ghz
Alignment  1 -  2.749 bytes/cycle - 7865.85 MB/sec @ 3 ghz
Alignment  2 -  2.749 bytes/cycle - 7865.18 MB/sec @ 3 ghz
Alignment  3 -  2.749 bytes/cycle - 7865.73 MB/sec @ 3 ghz
Alignment  4 -  2.750 bytes/cycle - 7867.80 MB/sec @ 3 ghz
Alignment  5 -  2.750 bytes/cycle - 7866.92 MB/sec @ 3 ghz
Alignment  6 -  2.750 bytes/cycle - 7867.24 MB/sec @ 3 ghz
Alignment  7 -  2.749 bytes/cycle - 7865.46 MB/sec @ 3 ghz

Small key speed test -    1-byte keys -   214.65 cycles/hash
Small key speed test -    2-byte keys -   220.98 cycles/hash
Small key speed test -    3-byte keys -   222.41 cycles/hash
Small key speed test -    4-byte keys -   223.90 cycles/hash
Small key speed test -    5-byte keys -   224.00 cycles/hash
Small key speed test -    6-byte keys -   225.19 cycles/hash
Small key speed test -    7-byte keys -   226.03 cycles/hash
Small key speed test -    8-byte keys -   210.04 cycles/hash
Small key speed test -    9-byte keys -   220.78 cycles/hash
Small key speed test -   10-byte keys -   222.40 cycles/hash
Small key speed test -   11-byte keys -   223.79 cycles/hash
Small key speed test -   12-byte keys -   224.00 cycles/hash
Small key speed test -   13-byte keys -   225.27 cycles/hash
Small key speed test -   14-byte keys -   226.03 cycles/hash
Small key speed test -   15-byte keys -   226.99 cycles/hash
Small key speed test -   16-byte keys -   211.00 cycles/hash
Small key speed test -   17-byte keys -   217.62 cycles/hash
Small key speed test -   18-byte keys -   222.00 cycles/hash
Small key speed test -   19-byte keys -   223.99 cycles/hash
Small key speed test -   20-byte keys -   225.31 cycles/hash
Small key speed test -   21-byte keys -   226.14 cycles/hash
Small key speed test -   22-byte keys -   227.19 cycles/hash
Small key speed test -   23-byte keys -   228.92 cycles/hash
Small key speed test -   24-byte keys -   212.25 cycles/hash
Small key speed test -   25-byte keys -   217.94 cycles/hash
Small key speed test -   26-byte keys -   218.99 cycles/hash
Small key speed test -   27-byte keys -   226.99 cycles/hash
Small key speed test -   28-byte keys -   227.56 cycles/hash
Small key speed test -   29-byte keys -   228.97 cycles/hash
Small key speed test -   30-byte keys -   228.92 cycles/hash
Small key speed test -   31-byte keys -   230.00 cycles/hash


Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 11.031250 seconds
-------------------------------------------------------------------------------
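
To put a number on the gap, the bulk figures from the two runs above can be compared directly. The following is only a small sketch of that arithmetic: the helper names `slowdown` and `mb_per_sec` are made up for illustration, and the assumption that SMHasher's "MB" means 2**20 bytes at a nominal 3 GHz clock is inferred from the printed values rather than taken from SMHasher's source. Note that the small-key results of the two compilers are within about 5% of each other, so the difference is confined to the bulk test.

```python
# Bulk-test throughput in bytes/cycle, copied from the two SMHasher runs above
# (index = alignment 0..7).
clang_bpc = [0.353, 0.372, 0.372, 0.372, 0.372, 0.372, 0.372, 0.372]
gcc_bpc = [3.316, 2.749, 2.749, 2.749, 2.750, 2.750, 2.750, 2.749]


def slowdown(fast, slow):
    """Return the per-alignment throughput ratio fast/slow (hypothetical helper)."""
    return [f / s for f, s in zip(fast, slow)]


def mb_per_sec(bytes_per_cycle, hz=3.0e9):
    """Convert bytes/cycle to MB/sec.

    Assumption: "MB" is 2**20 bytes and the clock is the nominal 3 GHz that
    SMHasher prints; this roughly reproduces the MB/sec column in the logs.
    """
    return bytes_per_cycle * hz / 2**20


ratios = slowdown(gcc_bpc, clang_bpc)
for alignment, ratio in enumerate(ratios):
    print(f"Alignment {alignment}: gcc build is {ratio:.2f}x faster than Clang build")
```

With these inputs the ratio is a little over 9 for the aligned case and a little over 7 for the unaligned ones, which is where the "9 times" in the subject line comes from.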


The code is 300 lines long and relatively simple:
http://pastebin.com/zrqthX9c
Invocation:
http://pastebin.com/RvbdJwdE

Bundled with the benchmark that generated the numbers above:
http://www.multiupload.nl/2AXC4JYTL0

I run PC-BSD 9.1 on a Phenom II @ 3.2 GHz.

Is the problem in Clang or is it in my code? Is anybody willing to take
a look?

Regards,
-- 
Twoje radio


