[cfe-users] Why is my code 9 times slower with Clang than with gcc?
Radio młodych bandytów
radiomlodychbandytow at o2.pl
Sun Jul 7 11:56:30 PDT 2013
Hello.
I'm developing a hash function based on Bob Jenkins' SpookyHash. Until
now I have compiled it with gcc 4.9. I recently tried Clang 3.4 and was
shocked to see that the results are terrible. What should I do to make
Clang perform comparably?
I also tried Clang 3.1 and gcc 4.2.1: the former was very slow, the
latter was OK.
Detailed results:
Clang:
pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
-------------------------------------------------------------------------------
--- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)
[[[ Speed Tests ]]]
Bulk speed test - 262144-byte keys
Alignment 0 - 0.353 bytes/cycle - 1009.75 MB/sec @ 3 ghz
Alignment 1 - 0.372 bytes/cycle - 1063.31 MB/sec @ 3 ghz
Alignment 2 - 0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
Alignment 3 - 0.372 bytes/cycle - 1063.23 MB/sec @ 3 ghz
Alignment 4 - 0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
Alignment 5 - 0.372 bytes/cycle - 1063.29 MB/sec @ 3 ghz
Alignment 6 - 0.372 bytes/cycle - 1063.33 MB/sec @ 3 ghz
Alignment 7 - 0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
Small key speed test - 1-byte keys - 215.77 cycles/hash
Small key speed test - 2-byte keys - 216.00 cycles/hash
Small key speed test - 3-byte keys - 216.00 cycles/hash
Small key speed test - 4-byte keys - 218.01 cycles/hash
Small key speed test - 5-byte keys - 219.00 cycles/hash
Small key speed test - 6-byte keys - 220.40 cycles/hash
Small key speed test - 7-byte keys - 225.79 cycles/hash
Small key speed test - 8-byte keys - 220.37 cycles/hash
Small key speed test - 9-byte keys - 220.96 cycles/hash
Small key speed test - 10-byte keys - 222.40 cycles/hash
Small key speed test - 11-byte keys - 228.68 cycles/hash
Small key speed test - 12-byte keys - 228.98 cycles/hash
Small key speed test - 13-byte keys - 230.40 cycles/hash
Small key speed test - 14-byte keys - 231.07 cycles/hash
Small key speed test - 15-byte keys - 232.00 cycles/hash
Small key speed test - 16-byte keys - 220.70 cycles/hash
Small key speed test - 17-byte keys - 216.00 cycles/hash
Small key speed test - 18-byte keys - 218.03 cycles/hash
Small key speed test - 19-byte keys - 223.99 cycles/hash
Small key speed test - 20-byte keys - 224.00 cycles/hash
Small key speed test - 21-byte keys - 225.97 cycles/hash
Small key speed test - 22-byte keys - 226.99 cycles/hash
Small key speed test - 23-byte keys - 233.85 cycles/hash
Small key speed test - 24-byte keys - 222.15 cycles/hash
Small key speed test - 25-byte keys - 218.99 cycles/hash
Small key speed test - 26-byte keys - 220.17 cycles/hash
Small key speed test - 27-byte keys - 225.97 cycles/hash
Small key speed test - 28-byte keys - 226.99 cycles/hash
Small key speed test - 29-byte keys - 228.23 cycles/hash
Small key speed test - 30-byte keys - 228.75 cycles/hash
Small key speed test - 31-byte keys - 235.19 cycles/hash
Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 15.664062 seconds
-------------------------------------------------------------------------------
gcc:
pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
-------------------------------------------------------------------------------
--- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)
[[[ Speed Tests ]]]
Bulk speed test - 262144-byte keys
Alignment 0 - 3.316 bytes/cycle - 9486.77 MB/sec @ 3 ghz
Alignment 1 - 2.749 bytes/cycle - 7865.85 MB/sec @ 3 ghz
Alignment 2 - 2.749 bytes/cycle - 7865.18 MB/sec @ 3 ghz
Alignment 3 - 2.749 bytes/cycle - 7865.73 MB/sec @ 3 ghz
Alignment 4 - 2.750 bytes/cycle - 7867.80 MB/sec @ 3 ghz
Alignment 5 - 2.750 bytes/cycle - 7866.92 MB/sec @ 3 ghz
Alignment 6 - 2.750 bytes/cycle - 7867.24 MB/sec @ 3 ghz
Alignment 7 - 2.749 bytes/cycle - 7865.46 MB/sec @ 3 ghz
Small key speed test - 1-byte keys - 214.65 cycles/hash
Small key speed test - 2-byte keys - 220.98 cycles/hash
Small key speed test - 3-byte keys - 222.41 cycles/hash
Small key speed test - 4-byte keys - 223.90 cycles/hash
Small key speed test - 5-byte keys - 224.00 cycles/hash
Small key speed test - 6-byte keys - 225.19 cycles/hash
Small key speed test - 7-byte keys - 226.03 cycles/hash
Small key speed test - 8-byte keys - 210.04 cycles/hash
Small key speed test - 9-byte keys - 220.78 cycles/hash
Small key speed test - 10-byte keys - 222.40 cycles/hash
Small key speed test - 11-byte keys - 223.79 cycles/hash
Small key speed test - 12-byte keys - 224.00 cycles/hash
Small key speed test - 13-byte keys - 225.27 cycles/hash
Small key speed test - 14-byte keys - 226.03 cycles/hash
Small key speed test - 15-byte keys - 226.99 cycles/hash
Small key speed test - 16-byte keys - 211.00 cycles/hash
Small key speed test - 17-byte keys - 217.62 cycles/hash
Small key speed test - 18-byte keys - 222.00 cycles/hash
Small key speed test - 19-byte keys - 223.99 cycles/hash
Small key speed test - 20-byte keys - 225.31 cycles/hash
Small key speed test - 21-byte keys - 226.14 cycles/hash
Small key speed test - 22-byte keys - 227.19 cycles/hash
Small key speed test - 23-byte keys - 228.92 cycles/hash
Small key speed test - 24-byte keys - 212.25 cycles/hash
Small key speed test - 25-byte keys - 217.94 cycles/hash
Small key speed test - 26-byte keys - 218.99 cycles/hash
Small key speed test - 27-byte keys - 226.99 cycles/hash
Small key speed test - 28-byte keys - 227.56 cycles/hash
Small key speed test - 29-byte keys - 228.97 cycles/hash
Small key speed test - 30-byte keys - 228.92 cycles/hash
Small key speed test - 31-byte keys - 230.00 cycles/hash
Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 11.031250 seconds
-------------------------------------------------------------------------------
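To put a number on the gap, here is a small Python sketch that divides the gcc bulk speeds by the Clang bulk speeds for each alignment. It is not part of SMHasher; the variable and function names are made up for illustration, and the figures are simply copied from the two runs above.

```python
# Bulk-speed figures in bytes/cycle, copied from the two SMHasher runs above
# (alignments 0 through 7, in order).
clang_bulk = [0.353, 0.372, 0.372, 0.372, 0.372, 0.372, 0.372, 0.372]
gcc_bulk = [3.316, 2.749, 2.749, 2.749, 2.750, 2.750, 2.750, 2.749]


def slowdown(slow, fast):
    """Return the per-alignment ratio fast/slow (hypothetical helper)."""
    return [f / s for s, f in zip(slow, fast)]


ratios = slowdown(clang_bulk, gcc_bulk)
for alignment, ratio in enumerate(ratios):
    print(f"Alignment {alignment}: gcc is {ratio:.2f}x faster than Clang")
```

By this measure the gap is roughly 9.4x for aligned input and roughly 7.4x for unaligned input. The small-key timings are nearly identical between the two compilers, so the difference shows up only in the bulk test.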
The code is 300 lines long and relatively simple:
http://pastebin.com/zrqthX9c
Invocation:
http://pastebin.com/RvbdJwdE
Bundled with the benchmark that generated the numbers above:
http://www.multiupload.nl/2AXC4JYTL0
I run PC-BSD 9.1 on a Phenom II @ 3.2 GHz.
Is the problem in Clang or is it in my code? Is anybody willing to take
a look?
Regards,
--
Twoje radio