[cfe-dev] Diagnostic IDs, parsing speed and how to generate lookup tables

John McCall rjmccall at apple.com
Thu Dec 13 01:56:01 PST 2012


On Dec 13, 2012, at 1:46 AM, Manuel Klimek <klimek at google.com> wrote:
> On Wed, Dec 12, 2012 at 11:45 PM, John McCall <rjmccall at apple.com> wrote:
> On Dec 12, 2012, at 6:00 AM, Manuel Klimek <klimek at google.com> wrote:
>> While doing some parsing speed investigation I noticed that a lot of diagnostics stuff is on the very hot path for parsing time. One thing that fell out of that is Benjamin's recent patch that speeds up getting the diagnostic info. But there's more to be gained: with a simple cache for the diagnostic classes, I was able to get another 1.5% parsing speedup (benchmarked over all Google code).
>> 
>> Unfortunately the patch in its current state is not thread safe; the obvious solution to that problem would be to generate the diagnostic class table statically instead of writing to a cache at runtime.
>> 
>> This turns out to be surprisingly hard though - the diagnostic id start values are defined in an enum, while the actual diagnostics come from the .td files.
>> 
>> A first idea was that we might define the start values inside the .td files, and create the enum from that, instead of the other way around.
>> 
>> That led to the question posed by Dmitri on IRC: why are there start ranges in the first place? We could instead tablegen all diagnostics at once and let tablegen take care of generating the appropriate enum values for the start of certain ranges.
>> 
>> So if anybody has better ideas for how to solve the lookup problem, or knows why the diagnostic ids have fixed start values, I'd be very interested to learn more about it :)
> 
> The purpose of the fixed starting values is just compile-time efficiency: we'd like to be able to add/remove diagnostics (generally in one part of clang) without requiring a full recompile.  As long as your scheme still allows this in most cases — maybe tblgen gets re-run for all diagnostics whenever any of them change, but tblgen rounds up to the next multiple of 50 between ranges so that adding a diagnostic in one range doesn't usually force a full recompile — I think we're fine.
> 
> Of the Diagnostic*Kinds.td files, the only one that gets updated fairly regularly (every 1-2 days) is DiagnosticSemaKinds.td. Can't we just put it last, and save the hassle of putting holes into the generated lookup tables? (The next most frequently changed file changes ~4-5 times per month; in that time frame a full rebuild is in order anyway.)

It's not really about the compile times for people periodically pulling the tree; changes on trunk regularly cause full or nearly-full rebuilds anyway.  It's that someone working on a patch to the driver/frontend/parser/lexer/whatever should be able to add and tweak their diagnostics without forcing a full rebuild every time.

Also, under the current system, the hassle of these holes to typical clang developers is basically zero, and even that will pretty much disappear completely if we get tblgen to manage them automatically.
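
To make the "rounding up between ranges" idea concrete, here is a minimal sketch of how a generator might assign start IDs automatically. It is not tblgen's actual code: the names Category, roundUp and assignStarts are hypothetical, the category sizes used with it are made up, and the granularity of 50 is just the figure suggested earlier in the thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one diagnostic category: its name and the
// number of diagnostics its .td file currently defines.
struct Category {
  std::string Name;
  unsigned Count;
};

// Round Value up to the next multiple of Granularity.
constexpr unsigned roundUp(unsigned Value, unsigned Granularity) {
  return (Value + Granularity - 1) / Granularity * Granularity;
}

// Assign a start ID to each category, in order. Each range ends on a
// multiple of Granularity, so the slack at the end of a range absorbs
// small additions without moving the start of any later range.
std::vector<unsigned> assignStarts(const std::vector<Category> &Cats,
                                   unsigned Granularity = 50) {
  assert(Granularity > 0 && "granularity must be positive");
  std::vector<unsigned> Starts;
  unsigned Next = 0;
  for (const Category &C : Cats) {
    Starts.push_back(Next);
    Next = roundUp(Next + C.Count, Granularity);
  }
  return Starts;
}
```

With made-up sizes of 120, 230 and 51, the starts come out as 0, 150 and 400. Growing the first category to 121 leaves every start unchanged, so code that only depends on the later ranges would not need to be rebuilt. A range that happens to be exactly full has no slack, which is why this only avoids the full recompile "usually" rather than always.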

John.
