[cfe-dev] clang performance when building Linux

Chandler Carruth chandlerc at google.com
Mon Apr 18 23:06:55 PDT 2011


On Mon, Apr 18, 2011 at 9:44 PM, Douglas Gregor <dgregor at apple.com> wrote:

> On Apr 18, 2011, at 6:34 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
> On Sat, Apr 16, 2011 at 1:48 PM, Török Edwin <edwintorok at gmail.com> wrote:
>
>> On 2011-04-16 23:32, Chandler Carruth wrote:
>> > On Sat, Apr 16, 2011 at 5:01 AM, Benjamin Kramer
>> > <benny.kra at googlemail.com> wrote:
>> >
>> >     >     3.65%     clang  clang                              [.]
>> >     llvm::StringMapImpl::LookupBucketFor(llvm::StringRef)
>> >
>> >     I'm a bit surprised that StringMap is the most expensive entry here,
>> >     maybe microoptimizing
>> >     the hash function (which is a byte-wise djb hash at the moment) can
>> >     help a bit. If someone is
>> >     really bored it would also be useful to test if other string hash
>> >     functions like murmurhash or google's
>> >     new city hash give better performance.
>> >
>> >
>> > Interesting. I'm familiar with murmurhash and watched the development of
>> > city hash and am quite familiar with it. I'll take a look at what it
>> > would take to use cityhash here. Anything special done to produce these
>> > numbers? Just a build of the kernel?
>> >
>> > If you could paste how you collected the perf data that would be useful
>> > as well... I've not used the 'perf' tool extensively before.
>>
>> Here is what I used:
>> $ make allmodconfig
>> $ perf record make CC=clang -j6
>> (this creates a file perf.data; let it run for at least 2 to 5 minutes,
>> then interrupt it, or wait for it to finish)
>> $ perf report
>> (ncurses-like interface to browse perf.data)
>>
>
> Cool, thanks!
>
> I was never able to get the lookup to take as much of my CPU time as you
> did, but the benchmarks were very noisy. When I used my own stress test
> benchmarks (massive C++ file and the single-source GCC file) I would see
> roughly 1.5% of the CPU cycles in this function.
>
> I got CityHash into the codebase and taught StringMap to use it. This saved
> roughly 50% of the time in the function, taking it under the 1% line. I
> haven't looked in detail to see what is taking the time now.
>
> On another benchmark where this function was a bit hotter (roughly 2.4%,
> similar to the numbers I got by profiling the kernel build) I saw as much
> as a 1% overall speedup. Nothing stellar, but not terrible either.
>
> If folks are interested, I'll look at getting City Hash checked in, and
> investigate using it in a few other places as well where collisions and/or
> hashing cost us some.
>
>
> Definitely interested!
>

So, even using a "torture test" for this part of Clang (-Eonly on gcc.c, a
0.75 MLOC file), I can only make it about 0.5% to 1.5% faster overall. We
stop getting collisions, the CPU time spent in LookupBucketFor drops from
7% to 4%, and the time spent in memcmp drops as well, but the time just goes
elsewhere, at least for the test cases I have and the CPU I'm measuring on.

If anyone has a good test case to reproduce the performance impact and
measure significant benefit from this, let me know and I'll send you my
patch.
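
For anyone who wants to poke at the collision side of this without my patch,
here is a minimal sketch of a byte-wise djb-style hash plus a helper that
counts how many keys land in an already-occupied bucket of a power-of-two
table. This is an illustration only: it assumes the classic djb2 constants
(seed 5381, multiplier 33), which may not match what StringMap actually uses,
and the names djbHash and countBucketCollisions are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Classic byte-wise djb2 hash (seed 5381, multiplier 33).
// Assumption: illustrative constants, not necessarily LLVM's exact code.
inline std::uint32_t djbHash(std::string_view s) {
  std::uint32_t h = 5381;
  for (unsigned char c : s)
    h = h * 33u + c;  // one multiply-add per input byte
  return h;
}

// Hypothetical helper: count keys whose bucket index is already taken by an
// earlier key. numBuckets must be a power of two so the hash can be masked.
inline std::size_t countBucketCollisions(const std::vector<std::string>& keys,
                                         std::size_t numBuckets) {
  assert(numBuckets != 0 && (numBuckets & (numBuckets - 1)) == 0);
  std::vector<bool> used(numBuckets, false);
  std::size_t collisions = 0;
  for (const std::string& k : keys) {
    std::size_t bucket = djbHash(k) & (numBuckets - 1);
    if (used[bucket])
      ++collisions;
    else
      used[bucket] = true;
  }
  return collisions;
}
```

Feeding it the identifiers from a preprocessed file and swapping djbHash for
another hash function gives a rough collision comparison. It says nothing
about the per-byte cost of the hash itself, which still has to be measured
with a profiler.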