[early patch] Speed up decl chaining

Sat Oct 19 13:04:46 PDT 2013

On 19 Oct 2013 05:28, "Rafael Espíndola" <rafael.espindola at gmail.com> wrote:
>
> The code hot path showed up in a build of postgresql.
>
> Synthetic benchmarks like these do have their value. They expose bad
> asymptotic behaviour that does show up in user code, but is harder to
> measure.
>
> For example, when this benchmark first came into being, the linkage
> computation was non-linear and dominated. Fixing it helped existing
> code and moved the hot spot to decl linking. It looks like the hot
> path is back to linkage computation, and we are still a lot slower
> than gcc on this one, so fixing decl chaining will make this a good
> linkage benchmark again.
>
> Unbounded superlinear algorithms in general are a minefield that
> is not very user friendly.

I don't see how this fixes the superlinearity; it seems like it just moves
it around. Doesn't it make getPreviousDecl linear? Some of the places
you've changed from getPreviousDecl to getFirstDecl are also now linear.

The change to setObjectOfFriendDecl looks incorrect: we really wanted to
look at the IDNS of the most recent decl, not of the first one.

> On 19 October 2013 02:25, Sean Silva <silvas at purdue.edu> wrote:
> >
> >
> >
> > On Tue, Oct 8, 2013 at 11:09 PM, Rafael Espíndola
> > <rafael.espindola at gmail.com> wrote:
> >>
> >> I found this old incomplete patch while cleaning my git repo. I just
> >> want to see if it is crazy or not before trying to finish it.
> >
> >
> > What originally motivated this? Did you measure something that made you
> > think that this had the potential to be faster?
> >
> >>
> >>
> >> Currently decl chaining is O(n). We use a circular singly linked list
> >> that points to the previous element and has a bool to say if we are
> >> the first element (and actually point to the last).
> >>
> >> Adding a new decl is O(n) because we have to find the first element by
> >> walking the prev links. One way to make this O(1) that is sure to work
> >> is a doubly linked list, but that would be very wasteful in memory.
> >>
> >> What this patch does is reverse the list so that a decl points to the
> >> next decl (or to the first if it is the last). With this chaining
> >> becomes O(1). The flip side is that getPreviousDecl is now O(n).
> >>
> >> In this patch I just got check-clang to work and replaced enough uses
> >> of getPreviousDecl to get a speedup in
> >>
> >>     #define M extern int a;
> >>     #define M2 M M
> >>     #define M4 M2 M2
> >>     #define M8 M4 M4
> >>     #define M16 M8 M8
> >>     #define M32 M16 M16
> >>     #define M64 M32 M32
> >>     #define M128 M64 M64
> >>     #define M256 M128 M128
> >>     #define M512 M256 M256
> >>     #define M1024 M512 M512
> >>     #define M2048 M1024 M1024
> >>     #define M4096 M2048 M2048
> >>     #define M8192 M4096 M4096
> >>     #define M16384 M8192 M8192
> >>     M16384
> >>
> >> On my machine this patch takes clang -cc1 on the preprocessed version
> >> of that from 0m4.748s to 0m1.525s.
> >
> >
> > What is this microbenchmark even measuring? Is there any reason to believe
> > that this is representative enough of anything to guide a decision?
> >
> > I feel like what's missing here are measurements of the actual behavior of
> > this code path. For example, how long are we spending walking these
> > redeclaration chains in real code? On average how long are the
> > redeclaration chains when compiling real code? Almost always 1? Usually 2?
> > Generally between 3 and 5? >100? Each of the cases I just listed puts the
> > situation in a completely different light. Are any particular sites that
> > call these APIs (or particular AST classes) inducing far more link
> > traversals than other sites when compiling typical code? (i.e., instrument
> > the "get next link" routine to tally up by call site). Maybe the usage
> > patterns of some AST nodes benefit more from forward traversal, and others
> > from backward?
> >
> >
> > Side note (completely impractical): if you have spare bits in the bottom of
> > the pointer, then you could store bits of the address of the first decl (or
> > whichever one is O(n) links away) in each link, so that in the worst case
> > you only have to walk a constant number of links before you collect all the
> > bits of the first pointer :)
> >
> > -- Sean Silva
> >
> >>
> >>
> >> There are still a lot of uses of getPreviousDecl to go, but can anyone
> >> see a testcase where this strategy would not work?
> >>
> >> Cheers,
> >> Rafael
> >>
> >> _______________________________________________
> >> cfe-commits mailing list
> >> cfe-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> >>
> >
>