[early patch] Speed up decl chaining

Sat Oct 19 05:22:09 PDT 2013

This code path showed up as a hot spot in a build of PostgreSQL.

Synthetic benchmarks like these do have their value. They expose bad
asymptotic behaviour that does show up in user code, but is harder to
measure.

For example, when this benchmark first came into being, the linkage
computation was non-linear and dominated the profile. Fixing it helped
existing code and moved the hot spot to decl linking. With this patch,
it looks like the hot path is back to linkage computation, and we are
still a lot slower than gcc on this one, so fixing decl chaining will
make this a good linkage benchmark again.

Unbounded super-linear algorithms are, in general, a minefield that is
not very user-friendly.

On 19 October 2013 02:25, Sean Silva <silvas at purdue.edu> wrote:
>
>
>
> On Tue, Oct 8, 2013 at 11:09 PM, Rafael Espíndola
> <rafael.espindola at gmail.com> wrote:
>>
>> I found this old incomplete patch while cleaning my git repo. I just
>> want to see if it is crazy or not before trying to finish it.
>
>
> What originally motivated this? Did you measure something that made you
> think that this had the potential to be faster?
>
>>
>>
>> Currently decl chaining is O(n). We use a circular singly linked list
>> in which each decl points to the previous one and has a bool saying
>> whether it is the first element (in which case it actually points to
>> the last).
>>
>> Adding a new decl is O(n) because we have to find the first element
>> by walking the prev links, so that its pointer can be updated to the
>> new last decl. One way to make this O(1) that is sure to work is a
>> doubly linked list, but that would be very wasteful of memory.
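A minimal C++ sketch of the current scheme as described above, for illustration only: `OldDecl`, `Link`, `IsFirst` and the hop count returned by `setPreviousDecl` are hypothetical names, not clang's actual `Redeclarable` code. It assumes new decls are always appended after the most recent one, and shows where the O(n) comes from: the first decl's link must be redirected to the new decl, and reaching the first decl means walking the prev links.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the current scheme: each decl has a single link that
// points to the previous redeclaration, except the first decl, whose link
// points to the most recent one (IsFirst marks it).
struct OldDecl {
  OldDecl *Link = this;  // a lone decl is both first and most recent
  bool IsFirst = true;

  // O(1): every decl except the first links straight to its predecessor.
  OldDecl *getPreviousDecl() { return IsFirst ? nullptr : Link; }

  // O(n): follow prev links until we reach the first decl.
  OldDecl *getFirstDecl(std::size_t &Hops) {
    OldDecl *D = this;
    while (!D->IsFirst) {
      D = D->Link;
      ++Hops;
    }
    return D;
  }

  // Append this decl after Prev (assumed to be the most recent decl).
  // Returns how many prev links were walked to find the first decl.
  std::size_t setPreviousDecl(OldDecl *Prev) {
    std::size_t Hops = 0;
    OldDecl *First = Prev->getFirstDecl(Hops);
    Link = Prev;
    IsFirst = false;
    First->Link = this;  // the first decl must now point at the newest one
    return Hops;
  }
};
```

In this model getPreviousDecl stays O(1); only the append pays for the walk.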
>>
>> What this patch does is reverse the list so that a decl points to the
>> next decl (or to the first if it is the last). With this, chaining
>> becomes O(1). The flip side is that getPreviousDecl is now O(n).
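A matching sketch of the reversed list, again using hypothetical names (`NewDecl`, `Link`, `IsLast`) rather than the patch's actual code, and assuming new decls are always appended after the most recent one. Appending touches only two decls, while getPreviousDecl has to walk around the circle until it finds the decl that links to `this`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the reversed scheme: each decl links to the next
// (newer) redeclaration, and the most recent decl links back to the first.
// IsLast marks the decl whose link wraps around.
struct NewDecl {
  NewDecl *Link = this;  // a lone decl is both first and most recent
  bool IsLast = true;

  // O(1) append after Prev, which must be the most recent decl: the new decl
  // takes over Prev's wrap-around link to the first decl, and Prev now
  // points forward to the new decl.
  void setPreviousDecl(NewDecl *Prev) {
    assert(Prev->IsLast && "can only append after the most recent decl");
    Link = Prev->Link;
    IsLast = true;
    Prev->Link = this;
    Prev->IsLast = false;
  }

  // O(n): walk forward around the circle until we reach the decl that links
  // to this one. If that decl is the most recent, this is the first decl and
  // has no predecessor.
  NewDecl *getPreviousDecl(std::size_t &Hops) {
    NewDecl *D = this;
    while (D->Link != this) {
      D = D->Link;
      ++Hops;
    }
    return D->IsLast ? nullptr : D;
  }
};
```

The trade only pays off if most callers can be rewritten to walk forward instead of calling getPreviousDecl.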
>>
>> In this patch I just got check-clang to work and replaced enough uses
>> of getPreviousDecl to get a speedup in
>>
>>     #define M extern int a;
>>     #define M2 M M
>>     #define M4 M2 M2
>>     #define M8 M4 M4
>>     #define M16 M8 M8
>>     #define M32 M16 M16
>>     #define M64 M32 M32
>>     #define M128 M64 M64
>>     #define M256 M128 M128
>>     #define M512 M256 M256
>>     #define M1024 M512 M512
>>     #define M2048 M1024 M1024
>>     #define M4096 M2048 M2048
>>     #define M8192 M4096 M4096
>>     #define M16384 M8192 M8192
>>     M16384
>>
>> On my machine, this patch takes clang -cc1 on the preprocessed version
>> of that from 0m4.748s to 0m1.525s.
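For a rough sense of scale, the following sketch counts link walks in a simplified model (not a measurement of clang): with the old prev-pointing list, each new `extern int a;` redeclaration walks one link fewer than the number of decls already in the chain, so the total grows quadratically, whereas the reversed list walks none. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model, not clang's real cost: appending a decl to a chain that
// already holds Existing decls walks Existing - 1 prev links to reach the
// first decl (none when the chain has a single decl).
std::uint64_t oldSchemeHops(std::uint64_t NumDecls) {
  std::uint64_t Total = 0;
  for (std::uint64_t Existing = 1; Existing < NumDecls; ++Existing)
    Total += Existing - 1;
  return Total;
}

// Closed form of the same sum: (n - 1)(n - 2) / 2 for n >= 2.
std::uint64_t oldSchemeHopsClosedForm(std::uint64_t NumDecls) {
  return NumDecls < 2 ? 0 : (NumDecls - 1) * (NumDecls - 2) / 2;
}
```

M16384 expands to 16384 redeclarations, and `oldSchemeHops(16384)` is 134193153, i.e. roughly 1.3e8 link walks in this model, against 16383 constant-time appends with the reversed list.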
>
>
> What is this microbenchmark even measuring? Is there any reason to believe
> that this is representative enough of anything to guide a decision?
>
> I feel like what's missing here are measurements of the actual behavior of
> this code path. For example, how long are we spending walking these
> redeclaration chains in real code? On average how long are the redeclaration
> chains when compiling real code? Almost always 1? Usually 2? Generally
> between 3 and 5? >100? Each of the cases I just listed puts the situation in
> a completely different light. Are any particular sites that call these APIs
> (or particular AST classes) inducing far more link traversals than other
> sites when compiling typical code? (i.e., instrument the "get next link"
> routine to tally up by call site). Maybe the usage patterns of some AST
> nodes benefit more from forward traversal, and others from backward?
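The instrumentation suggested above could look roughly like the following sketch. Everything here is hypothetical (`Node`, `GET_NEXT_LINK`, `LinkStats`, `chainLength` are not clang APIs): each hop through the macro is tallied under the `file:line` of the call, and each full walk adds its chain length to a histogram, which covers both the per-call-site question and the chain-length question.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a redeclarable AST node with a prev link.
struct Node {
  Node *Prev = nullptr;
};

// Hypothetical global tallies (not a clang facility).
struct LinkStats {
  std::map<std::string, std::size_t> HopsBySite;            // "file:line" -> hops
  std::map<std::size_t, std::size_t> ChainLengthHistogram;  // length -> count
};

inline LinkStats &linkStats() {
  static LinkStats Stats;
  return Stats;
}

// The "get next link" routine, instrumented to record who called it.
inline Node *getNextLinkImpl(Node *N, const char *File, int Line) {
  ++linkStats().HopsBySite[std::string(File) + ":" + std::to_string(Line)];
  return N->Prev;
}

// Call sites use the macro so that __FILE__/__LINE__ name the caller.
#define GET_NEXT_LINK(N) getNextLinkImpl((N), __FILE__, __LINE__)

// Example call site: walk a whole chain and record its length.
inline std::size_t chainLength(Node *N) {
  std::size_t Length = 1;
  while ((N = GET_NEXT_LINK(N)))
    ++Length;
  ++linkStats().ChainLengthHistogram[Length];
  return Length;
}
```

Because `__LINE__` expands where the macro is written, a helper like `chainLength` shows up as a single site; in a real experiment the macro would be placed at each API call site.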
>
>
> Side note (completely impractical): if you have spare bits in the bottom of
> the pointer, then you could store bits of the address of the first decl (or
> whichever one is O(n) links away) in each link, so that in the worst case
> you only have to walk a constant number of links before you collect all the
> bits of the first pointer :)
>
> -- Sean Silva
>
>>
>>
>> There are still a lot of uses of getPreviousDecl to go, but can anyone
>> see a testcase where this strategy would not work?
>>
>> Cheers,
>> Rafael
>>
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>