[cfe-users] Why is my code 9 times slower with Clang than with gcc?

Radio młodych bandytów radiomlodychbandytow at o2.pl
Sun Jul 7 22:15:12 PDT 2013


Wow...that's it:
pcbsd-8973% ./SMHasher Spooky128
-------------------------------------------------------------------------------
--- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)

[[[ Speed Tests ]]]

Bulk speed test - 262144-byte keys
Alignment  0 -  2.625 bytes/cycle - 7510.45 MB/sec @ 3 ghz
Alignment  1 -  2.334 bytes/cycle - 6678.03 MB/sec @ 3 ghz
Alignment  2 -  2.334 bytes/cycle - 6678.08 MB/sec @ 3 ghz
Alignment  3 -  2.334 bytes/cycle - 6678.04 MB/sec @ 3 ghz
Alignment  4 -  2.334 bytes/cycle - 6678.11 MB/sec @ 3 ghz
Alignment  5 -  2.334 bytes/cycle - 6678.11 MB/sec @ 3 ghz
Alignment  6 -  2.334 bytes/cycle - 6678.11 MB/sec @ 3 ghz
Alignment  7 -  2.334 bytes/cycle - 6678.11 MB/sec @ 3 ghz

Small key speed test -    1-byte keys -   206.40 cycles/hash
Small key speed test -    2-byte keys -   207.99 cycles/hash
Small key speed test -    3-byte keys -   208.00 cycles/hash
Small key speed test -    4-byte keys -   210.00 cycles/hash
Small key speed test -    5-byte keys -   215.86 cycles/hash
Small key speed test -    6-byte keys -   215.99 cycles/hash
Small key speed test -    7-byte keys -   217.95 cycles/hash
Small key speed test -    8-byte keys -   206.40 cycles/hash
Small key speed test -    9-byte keys -   206.99 cycles/hash
Small key speed test -   10-byte keys -   208.00 cycles/hash
Small key speed test -   11-byte keys -   213.99 cycles/hash
Small key speed test -   12-byte keys -   212.25 cycles/hash
Small key speed test -   13-byte keys -   215.99 cycles/hash
Small key speed test -   14-byte keys -   215.99 cycles/hash
Small key speed test -   15-byte keys -   217.98 cycles/hash
Small key speed test -   16-byte keys -   207.12 cycles/hash
Small key speed test -   17-byte keys -   208.00 cycles/hash
Small key speed test -   18-byte keys -   209.59 cycles/hash
Small key speed test -   19-byte keys -   210.99 cycles/hash
Small key speed test -   20-byte keys -   212.94 cycles/hash
Small key speed test -   21-byte keys -   212.66 cycles/hash
Small key speed test -   22-byte keys -   218.99 cycles/hash
Small key speed test -   23-byte keys -   220.34 cycles/hash
Small key speed test -   24-byte keys -   207.99 cycles/hash
Small key speed test -   25-byte keys -   209.23 cycles/hash
Small key speed test -   26-byte keys -   211.20 cycles/hash
Small key speed test -   27-byte keys -   212.75 cycles/hash
Small key speed test -   28-byte keys -   211.62 cycles/hash
Small key speed test -   29-byte keys -   213.00 cycles/hash
Small key speed test -   30-byte keys -   219.00 cycles/hash
Small key speed test -   31-byte keys -   218.99 cycles/hash


Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 10.695312 seconds
-------------------------------------------------------------------------------

Could you please tell me what difference the namespace makes?
If you (or anybody else) are still interested, main() (along with the
rest of the code making up the benchmark) is here:
http://www.multiupload.nl/2AXC4JYTL0
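
For anyone following along, the suggested change amounts to roughly the
sketch below. It is only an illustration: ToyHash, Mix, Hash and toy_hash
are hypothetical stand-ins, not the actual SpookyHash code. The point is
that everything inside an anonymous namespace gets internal linkage, so
the compiler knows no other translation unit can call it. If every call
is inlined, no out-of-line copy has to be kept, which can make an inliner
more willing to inline large functions.

```cpp
#include <cstddef>
#include <cstdint>

namespace {  // anonymous namespace: contents have internal linkage

// Hypothetical stand-in for the real template class (assumption, not the
// code from the pastebin link).
template <int Rounds>
class ToyHash {
public:
    // Small mixing step, called from the hot loop in Hash().
    static std::uint64_t Mix(std::uint64_t h, std::uint64_t v) {
        for (int i = 0; i < Rounds; ++i) {
            h ^= v;
            h = (h << 13) | (h >> 51);   // rotate left by 13 bits
            h *= 0x9E3779B97F4A7C15ULL;  // multiply by an odd constant
        }
        return h;
    }

    // Feeds the input through Mix() one byte at a time.
    static std::uint64_t Hash(const void* data, std::size_t len,
                              std::uint64_t seed) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        std::uint64_t h = seed;
        for (std::size_t i = 0; i < len; ++i)
            h = Mix(h, p[i]);
        return h;
    }
};

}  // namespace

// The only symbol with external linkage: a thin public entry point.
std::uint64_t toy_hash(const void* data, std::size_t len,
                       std::uint64_t seed) {
    return ToyHash<4>::Hash(data, len, seed);
}
```

Building with and without the namespace wrapper (for example with
clang++ -O2 -S) and comparing the generated assembly should show whether
the calls were inlined.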

Thanks,
Twoje radio

On 08/07/2013 01:12, Nick Lewycky wrote:
> Here's something to try: wrap template class SpookyHash in an anonymous
> namespace. What impact does this have on performance?
> 
> You didn't include a main() function so I can't run it and see concrete
> numbers. I think the problem is that it looks like the code is manually
> unrolled in parts (h0 through h11?!) and in turn that's causing the
> functions to be so big that llvm is refusing to inline them.
> 
> Nick
> 
> On 7 July 2013 11:56, Radio młodych bandytów
> <radiomlodychbandytow at o2.pl> wrote:
> 
>     Hello.
>     I'm developing a hash function based on one by Bob Jenkins. So far I
>     have compiled it with gcc 4.9. I have now tried Clang 3.4 and was
>     shocked to see that the results are just terrible. I wonder what I
>     should do to make Clang perform well here too.
>     I also tried Clang 3.1 and gcc 4.2.1: the former was very slow, the
>     latter OK.
>     Detailed results:
>     Clang:
>     pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
>     -------------------------------------------------------------------------------
>     --- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)
> 
>     [[[ Speed Tests ]]]
> 
>     Bulk speed test - 262144-byte keys
>     Alignment  0 -  0.353 bytes/cycle - 1009.75 MB/sec @ 3 ghz
>     Alignment  1 -  0.372 bytes/cycle - 1063.31 MB/sec @ 3 ghz
>     Alignment  2 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
>     Alignment  3 -  0.372 bytes/cycle - 1063.23 MB/sec @ 3 ghz
>     Alignment  4 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
>     Alignment  5 -  0.372 bytes/cycle - 1063.29 MB/sec @ 3 ghz
>     Alignment  6 -  0.372 bytes/cycle - 1063.33 MB/sec @ 3 ghz
>     Alignment  7 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
> 
>     Small key speed test -    1-byte keys -   215.77 cycles/hash
>     Small key speed test -    2-byte keys -   216.00 cycles/hash
>     Small key speed test -    3-byte keys -   216.00 cycles/hash
>     Small key speed test -    4-byte keys -   218.01 cycles/hash
>     Small key speed test -    5-byte keys -   219.00 cycles/hash
>     Small key speed test -    6-byte keys -   220.40 cycles/hash
>     Small key speed test -    7-byte keys -   225.79 cycles/hash
>     Small key speed test -    8-byte keys -   220.37 cycles/hash
>     Small key speed test -    9-byte keys -   220.96 cycles/hash
>     Small key speed test -   10-byte keys -   222.40 cycles/hash
>     Small key speed test -   11-byte keys -   228.68 cycles/hash
>     Small key speed test -   12-byte keys -   228.98 cycles/hash
>     Small key speed test -   13-byte keys -   230.40 cycles/hash
>     Small key speed test -   14-byte keys -   231.07 cycles/hash
>     Small key speed test -   15-byte keys -   232.00 cycles/hash
>     Small key speed test -   16-byte keys -   220.70 cycles/hash
>     Small key speed test -   17-byte keys -   216.00 cycles/hash
>     Small key speed test -   18-byte keys -   218.03 cycles/hash
>     Small key speed test -   19-byte keys -   223.99 cycles/hash
>     Small key speed test -   20-byte keys -   224.00 cycles/hash
>     Small key speed test -   21-byte keys -   225.97 cycles/hash
>     Small key speed test -   22-byte keys -   226.99 cycles/hash
>     Small key speed test -   23-byte keys -   233.85 cycles/hash
>     Small key speed test -   24-byte keys -   222.15 cycles/hash
>     Small key speed test -   25-byte keys -   218.99 cycles/hash
>     Small key speed test -   26-byte keys -   220.17 cycles/hash
>     Small key speed test -   27-byte keys -   225.97 cycles/hash
>     Small key speed test -   28-byte keys -   226.99 cycles/hash
>     Small key speed test -   29-byte keys -   228.23 cycles/hash
>     Small key speed test -   30-byte keys -   228.75 cycles/hash
>     Small key speed test -   31-byte keys -   235.19 cycles/hash
> 
> 
>     Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
>     Verification value is 0x00000001 - Testing took 15.664062 seconds
>     -------------------------------------------------------------------------------
> 
>     gcc:
>     pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
>     -------------------------------------------------------------------------------
>     --- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)
> 
>     [[[ Speed Tests ]]]
> 
>     Bulk speed test - 262144-byte keys
>     Alignment  0 -  3.316 bytes/cycle - 9486.77 MB/sec @ 3 ghz
>     Alignment  1 -  2.749 bytes/cycle - 7865.85 MB/sec @ 3 ghz
>     Alignment  2 -  2.749 bytes/cycle - 7865.18 MB/sec @ 3 ghz
>     Alignment  3 -  2.749 bytes/cycle - 7865.73 MB/sec @ 3 ghz
>     Alignment  4 -  2.750 bytes/cycle - 7867.80 MB/sec @ 3 ghz
>     Alignment  5 -  2.750 bytes/cycle - 7866.92 MB/sec @ 3 ghz
>     Alignment  6 -  2.750 bytes/cycle - 7867.24 MB/sec @ 3 ghz
>     Alignment  7 -  2.749 bytes/cycle - 7865.46 MB/sec @ 3 ghz
> 
>     Small key speed test -    1-byte keys -   214.65 cycles/hash
>     Small key speed test -    2-byte keys -   220.98 cycles/hash
>     Small key speed test -    3-byte keys -   222.41 cycles/hash
>     Small key speed test -    4-byte keys -   223.90 cycles/hash
>     Small key speed test -    5-byte keys -   224.00 cycles/hash
>     Small key speed test -    6-byte keys -   225.19 cycles/hash
>     Small key speed test -    7-byte keys -   226.03 cycles/hash
>     Small key speed test -    8-byte keys -   210.04 cycles/hash
>     Small key speed test -    9-byte keys -   220.78 cycles/hash
>     Small key speed test -   10-byte keys -   222.40 cycles/hash
>     Small key speed test -   11-byte keys -   223.79 cycles/hash
>     Small key speed test -   12-byte keys -   224.00 cycles/hash
>     Small key speed test -   13-byte keys -   225.27 cycles/hash
>     Small key speed test -   14-byte keys -   226.03 cycles/hash
>     Small key speed test -   15-byte keys -   226.99 cycles/hash
>     Small key speed test -   16-byte keys -   211.00 cycles/hash
>     Small key speed test -   17-byte keys -   217.62 cycles/hash
>     Small key speed test -   18-byte keys -   222.00 cycles/hash
>     Small key speed test -   19-byte keys -   223.99 cycles/hash
>     Small key speed test -   20-byte keys -   225.31 cycles/hash
>     Small key speed test -   21-byte keys -   226.14 cycles/hash
>     Small key speed test -   22-byte keys -   227.19 cycles/hash
>     Small key speed test -   23-byte keys -   228.92 cycles/hash
>     Small key speed test -   24-byte keys -   212.25 cycles/hash
>     Small key speed test -   25-byte keys -   217.94 cycles/hash
>     Small key speed test -   26-byte keys -   218.99 cycles/hash
>     Small key speed test -   27-byte keys -   226.99 cycles/hash
>     Small key speed test -   28-byte keys -   227.56 cycles/hash
>     Small key speed test -   29-byte keys -   228.97 cycles/hash
>     Small key speed test -   30-byte keys -   228.92 cycles/hash
>     Small key speed test -   31-byte keys -   230.00 cycles/hash
> 
> 
>     Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
>     Verification value is 0x00000001 - Testing took 11.031250 seconds
>     -------------------------------------------------------------------------------
> 
> 
>     The code is 300 lines long and relatively simple:
>     http://pastebin.com/zrqthX9c
>     Invocation:
>     http://pastebin.com/RvbdJwdE
> 
>     Bundled with the benchmark that generated the numbers above:
>     http://www.multiupload.nl/2AXC4JYTL0
> 
>     I run PC-BSD 9.1 on a Phenom II @ 3.2 GHz.
> 
>     Is the problem in Clang or is it in my code? Is anybody willing to take
>     a look?
> 
>     Regards,
>     --
>     Twoje radio
>     _______________________________________________
>     cfe-users mailing list
>     cfe-users at cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/cfe-users
> 
> 

-- 
Twoje radio


