[cfe-dev] clang performance when building Linux
Török Edwin
edwintorok at gmail.com
Tue Apr 19 00:10:47 PDT 2011
On 2011-04-19 09:06, Chandler Carruth wrote:
> On Mon, Apr 18, 2011 at 9:44 PM, Douglas Gregor <dgregor at apple.com> wrote:
>
>
> On Apr 18, 2011, at 6:34 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
>> On Sat, Apr 16, 2011 at 1:48 PM, Török Edwin
>> <edwintorok at gmail.com> wrote:
>>
>> On 2011-04-16 23:32, Chandler Carruth wrote:
>> > On Sat, Apr 16, 2011 at 5:01 AM, Benjamin Kramer
>> > <benny.kra at googlemail.com> wrote:
>> >
>> > > 3.65%  clang  clang  [.] llvm::StringMapImpl::LookupBucketFor(llvm::StringRef)
>> >
>> > I'm a bit surprised that StringMap is the most expensive entry
>> > here, maybe microoptimizing the hash function (which is a
>> > byte-wise djb hash at the moment) can help a bit. If someone is
>> > really bored it would also be useful to test if other string
>> > hash functions like murmurhash or google's new city hash give
>> > better performance.
>> >
>> >
>> > Interesting. I'm familiar with murmurhash and watched the
>> > development of city hash and am quite familiar with it. I'll
>> > take a look at what it would take to use cityhash here. Anything
>> > special done to produce these numbers? Just a build of the
>> > kernel?
>> >
>> > If you could paste how you collected the perf data that would be
>> > useful as well... I've not used the 'perf' tool extensively
>> > before.
>>
>> Here is what I used:
>> $ make allmodconfig
>> $ perf record make CC=clang -j6
>> (this creates a file perf.data, let it run for at least 2 or 5 minutes,
>> then interrupt it, or wait for it to finish)
>> $ perf report
>> (ncurses-like interface to browse perf.data)
>>
>>
>> Cool, thanks!
>>
>> I was never able to get the lookup to take as much of my CPU time
>> as you did, but the benchmarks were very noisy. When I used my own
>> stress test benchmarks (massive C++ file and the single-source GCC
>> file) I would see roughly 1.5% of the CPU cycles in this function.
>>
>> I got CityHash into the codebase and taught StringMap to use it.
>> This saved roughly 50% of the time in the function, taking it
>> under the 1% line. I haven't looked in detail to see what is
>> taking the time now.
>>
>> On another benchmark where this function was a bit hotter (2.4%
>> roughly, similar numbers to those I got by profiling the kernel
>> build) I saw as much as a 1% overall speedup. Nothing stellar, but
>> not terrible either.
>>
>> If folks are interested, I'll look at getting City Hash checked
>> in, and investigate using it in a few other places as well where
>> collisions and/or hashing cost us some.
>
> Definitely interested!
>
>
> So, even using a "torture test" for this part of Clang (-Eonly on gcc.c,
> a 0.75 MLOC file) I can only make it about 0.5% to 1.5% faster overall.
> We stop getting collisions, and the CPU time spent in LookupBucketFor
> drops from 7% to 4%, with memcmp time dropping as well, but the time
> just goes elsewhere, at least for the test cases I have and the CPU I'm
> measuring on.
So no overall improvement in build time?
Would using PTH (pretokenized headers) help here? If so, could clang cache
the parsed headers and recreate the PTH only when they change?
>
> If anyone has a good test case to reproduce the performance impact and
> measure significant benefit from this, let me know and I'll send you my
> patch.
I'd be happy to try your patch when I have time.
Please send it.
Best regards,
--Edwin
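
P.S. For anyone following along who has not seen it, here is a minimal sketch of the kind of byte-wise djb (Bernstein) hash mentioned in the quoted thread, plus the usual power-of-two bucket selection. The names djbHash and bucketFor are hypothetical, and the seed 5381 and multiplier 33 are the classic textbook constants, assumed here for illustration. This is not copied from LLVM's StringMap, which may differ in its details.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Byte-wise Bernstein ("djb") hash: one multiply-add per input byte.
// Seed 5381 and multiplier 33 are the textbook constants (an assumption
// for this sketch; LLVM's own hash may use different ones).
inline std::uint32_t djbHash(const std::string &key) {
  std::uint32_t h = 5381;
  for (unsigned char c : key)
    h = h * 33u + c;  // unsigned arithmetic wraps modulo 2^32
  return h;
}

// Map a hash value to a bucket index in a table whose size is a power
// of two, so the modulo reduces to a bit mask.
inline std::uint32_t bucketFor(std::uint32_t hash, std::uint32_t numBuckets) {
  assert(numBuckets != 0 && (numBuckets & (numBuckets - 1)) == 0);
  return hash & (numBuckets - 1);
}
```

Because the loop touches one byte per iteration, the cost grows with identifier length, which is presumably why a word-at-a-time hash such as CityHash or murmurhash can reduce the time spent in the lookup.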