[cfe-users] Why is my code 9 times slower with Clang than with gcc?

Nick Lewycky nlewycky at google.com
Sun Jul 7 16:12:17 PDT 2013


Here's something to try: wrap the SpookyHash class template in an
anonymous namespace. What impact does this have on performance?
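
For concreteness, here is a minimal sketch of what that wrapping could
look like. SpookyHashSketch, its mixing step and hash_bytes() are
hypothetical stand-ins, not the code from the pastebin. The only point
is the namespace { ... } around the template, which gives its members
internal linkage.

```cpp
#include <cstddef>
#include <cstdint>

namespace {  // everything in here has internal linkage

// Hypothetical stand-in for the real SpookyHash template.
// The mixing below is illustrative only and is NOT SpookyHash.
template <int Rounds>
class SpookyHashSketch {
public:
    static std::uint64_t Hash(const unsigned char* data, std::size_t len,
                              std::uint64_t seed) {
        std::uint64_t h = seed;
        for (std::size_t i = 0; i < len; ++i) {
            h ^= data[i];
            for (int r = 0; r < Rounds; ++r) {
                h = (h << 13) | (h >> 51);   // rotate left by 13 bits
                h *= 0x9E3779B97F4A7C15ULL;  // multiply by an odd constant
            }
        }
        return h;
    }
};

}  // namespace

// Externally visible entry point (hypothetical name). Only this symbol
// needs to be emitted for other translation units.
std::uint64_t hash_bytes(const unsigned char* data, std::size_t len) {
    return SpookyHashSketch<2>::Hash(data, len, 0);
}
```

Since nothing outside the translation unit can refer to the template's
members, the compiler knows it sees every caller, and that may make it
more willing to inline. Whether it changes anything for this code is
exactly what the experiment is meant to show.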

You didn't include a main() function, so I can't run it and see concrete
numbers. My guess is that parts of the code are manually unrolled (h0
through h11?!), and that this makes the functions so big that LLVM
refuses to inline them.
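
To illustrate the inlining guess, here is a rough sketch of two things
one could try: keep the mixing step as a loop over an array instead of
twelve hand-written variables, and ask for inlining explicitly with the
always_inline attribute that GCC and Clang both accept. The names
(SKETCH_FORCE_INLINE, MixSketch, HashBlocksSketch) and the mixing
arithmetic are made up for this example and are not taken from the
original code.

```cpp
#include <cstddef>
#include <cstdint>

// GCC and Clang both support always_inline; fall back to plain inline
// on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

namespace {

constexpr int kNumVars = 12;  // mirrors h0..h11 from the post

// Rotate left; k must be in 1..63 to avoid an undefined shift.
SKETCH_FORCE_INLINE std::uint64_t Rot64(std::uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

// Illustrative mixing step written as a loop (NOT the real SpookyHash
// mix). The compiler is free to unroll this itself.
SKETCH_FORCE_INLINE void MixSketch(const std::uint64_t* data,
                                   std::uint64_t* h) {
    for (int i = 0; i < kNumVars; ++i) {
        h[i] += data[i];
        h[(i + 2) % kNumVars] ^= h[(i + 10) % kNumVars];
        h[i] = Rot64(h[i], 11);
    }
}

// Hot loop: consumes numBlocks blocks of kNumVars 64-bit words each.
std::uint64_t HashBlocksSketch(const std::uint64_t* data,
                               std::size_t numBlocks) {
    std::uint64_t h[kNumVars] = {};
    for (std::size_t b = 0; b < numBlocks; ++b) {
        MixSketch(data + b * kNumVars, h);
    }
    std::uint64_t result = 0;
    for (int i = 0; i < kNumVars; ++i) {
        result ^= h[i];
    }
    return result;
}

}  // namespace
```

Forcing inlining is a blunt tool, so treat this as a way to test the
hypothesis rather than as a fix. If the bulk speed recovers with the
attribute in place, the inliner's size threshold is the likely culprit.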

Nick

On 7 July 2013 11:56, Radio młodych bandytów <radiomlodychbandytow at o2.pl> wrote:

> Hello.
> I'm developing a hash function based on Bob Jenkins' SpookyHash. From
> the start, I have compiled it with gcc 4.9. I have now tried Clang 3.4
> and was shocked to see that the results are just terrible. I wonder
> what I should do to make Clang perform well here too.
> I also tried Clang 3.1 and gcc 4.2.1 - the former was very slow, the
> latter OK.
> Detailed results:
> Clang:
> pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
>
> -------------------------------------------------------------------------------
> --- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)
>
> [[[ Speed Tests ]]]
>
> Bulk speed test - 262144-byte keys
> Alignment  0 -  0.353 bytes/cycle - 1009.75 MB/sec @ 3 ghz
> Alignment  1 -  0.372 bytes/cycle - 1063.31 MB/sec @ 3 ghz
> Alignment  2 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
> Alignment  3 -  0.372 bytes/cycle - 1063.23 MB/sec @ 3 ghz
> Alignment  4 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
> Alignment  5 -  0.372 bytes/cycle - 1063.29 MB/sec @ 3 ghz
> Alignment  6 -  0.372 bytes/cycle - 1063.33 MB/sec @ 3 ghz
> Alignment  7 -  0.372 bytes/cycle - 1063.52 MB/sec @ 3 ghz
>
> Small key speed test -    1-byte keys -   215.77 cycles/hash
> Small key speed test -    2-byte keys -   216.00 cycles/hash
> Small key speed test -    3-byte keys -   216.00 cycles/hash
> Small key speed test -    4-byte keys -   218.01 cycles/hash
> Small key speed test -    5-byte keys -   219.00 cycles/hash
> Small key speed test -    6-byte keys -   220.40 cycles/hash
> Small key speed test -    7-byte keys -   225.79 cycles/hash
> Small key speed test -    8-byte keys -   220.37 cycles/hash
> Small key speed test -    9-byte keys -   220.96 cycles/hash
> Small key speed test -   10-byte keys -   222.40 cycles/hash
> Small key speed test -   11-byte keys -   228.68 cycles/hash
> Small key speed test -   12-byte keys -   228.98 cycles/hash
> Small key speed test -   13-byte keys -   230.40 cycles/hash
> Small key speed test -   14-byte keys -   231.07 cycles/hash
> Small key speed test -   15-byte keys -   232.00 cycles/hash
> Small key speed test -   16-byte keys -   220.70 cycles/hash
> Small key speed test -   17-byte keys -   216.00 cycles/hash
> Small key speed test -   18-byte keys -   218.03 cycles/hash
> Small key speed test -   19-byte keys -   223.99 cycles/hash
> Small key speed test -   20-byte keys -   224.00 cycles/hash
> Small key speed test -   21-byte keys -   225.97 cycles/hash
> Small key speed test -   22-byte keys -   226.99 cycles/hash
> Small key speed test -   23-byte keys -   233.85 cycles/hash
> Small key speed test -   24-byte keys -   222.15 cycles/hash
> Small key speed test -   25-byte keys -   218.99 cycles/hash
> Small key speed test -   26-byte keys -   220.17 cycles/hash
> Small key speed test -   27-byte keys -   225.97 cycles/hash
> Small key speed test -   28-byte keys -   226.99 cycles/hash
> Small key speed test -   29-byte keys -   228.23 cycles/hash
> Small key speed test -   30-byte keys -   228.75 cycles/hash
> Small key speed test -   31-byte keys -   235.19 cycles/hash
>
>
> Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
> Verification value is 0x00000001 - Testing took 15.664062 seconds
>
> -------------------------------------------------------------------------------
>
> gcc:
> pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128
>
> -------------------------------------------------------------------------------
> --- Testing Spooky128 (Bob Jenkins' SpookyHash, 128-bit result)
>
> [[[ Speed Tests ]]]
>
> Bulk speed test - 262144-byte keys
> Alignment  0 -  3.316 bytes/cycle - 9486.77 MB/sec @ 3 ghz
> Alignment  1 -  2.749 bytes/cycle - 7865.85 MB/sec @ 3 ghz
> Alignment  2 -  2.749 bytes/cycle - 7865.18 MB/sec @ 3 ghz
> Alignment  3 -  2.749 bytes/cycle - 7865.73 MB/sec @ 3 ghz
> Alignment  4 -  2.750 bytes/cycle - 7867.80 MB/sec @ 3 ghz
> Alignment  5 -  2.750 bytes/cycle - 7866.92 MB/sec @ 3 ghz
> Alignment  6 -  2.750 bytes/cycle - 7867.24 MB/sec @ 3 ghz
> Alignment  7 -  2.749 bytes/cycle - 7865.46 MB/sec @ 3 ghz
>
> Small key speed test -    1-byte keys -   214.65 cycles/hash
> Small key speed test -    2-byte keys -   220.98 cycles/hash
> Small key speed test -    3-byte keys -   222.41 cycles/hash
> Small key speed test -    4-byte keys -   223.90 cycles/hash
> Small key speed test -    5-byte keys -   224.00 cycles/hash
> Small key speed test -    6-byte keys -   225.19 cycles/hash
> Small key speed test -    7-byte keys -   226.03 cycles/hash
> Small key speed test -    8-byte keys -   210.04 cycles/hash
> Small key speed test -    9-byte keys -   220.78 cycles/hash
> Small key speed test -   10-byte keys -   222.40 cycles/hash
> Small key speed test -   11-byte keys -   223.79 cycles/hash
> Small key speed test -   12-byte keys -   224.00 cycles/hash
> Small key speed test -   13-byte keys -   225.27 cycles/hash
> Small key speed test -   14-byte keys -   226.03 cycles/hash
> Small key speed test -   15-byte keys -   226.99 cycles/hash
> Small key speed test -   16-byte keys -   211.00 cycles/hash
> Small key speed test -   17-byte keys -   217.62 cycles/hash
> Small key speed test -   18-byte keys -   222.00 cycles/hash
> Small key speed test -   19-byte keys -   223.99 cycles/hash
> Small key speed test -   20-byte keys -   225.31 cycles/hash
> Small key speed test -   21-byte keys -   226.14 cycles/hash
> Small key speed test -   22-byte keys -   227.19 cycles/hash
> Small key speed test -   23-byte keys -   228.92 cycles/hash
> Small key speed test -   24-byte keys -   212.25 cycles/hash
> Small key speed test -   25-byte keys -   217.94 cycles/hash
> Small key speed test -   26-byte keys -   218.99 cycles/hash
> Small key speed test -   27-byte keys -   226.99 cycles/hash
> Small key speed test -   28-byte keys -   227.56 cycles/hash
> Small key speed test -   29-byte keys -   228.97 cycles/hash
> Small key speed test -   30-byte keys -   228.92 cycles/hash
> Small key speed test -   31-byte keys -   230.00 cycles/hash
>
>
> Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
> Verification value is 0x00000001 - Testing took 11.031250 seconds
>
> -------------------------------------------------------------------------------
>
>
> The code is 300 lines long and relatively simple:
> http://pastebin.com/zrqthX9c
> Invocation:
> http://pastebin.com/RvbdJwdE
>
> Bundled with the benchmark that generated the numbers above:
> http://www.multiupload.nl/2AXC4JYTL0
>
> I run PC-BSD 9.1 on a Phenom II @ 3.2 GHz.
>
> Is the problem in Clang or is it in my code? Is anybody willing to take
> a look?
>
> Regards,
> --
> Twoje radio