[PATCH] D36352: [LCG] Switch one of the update methods for the LazyCallGraph to support limited batch updates.

Chandler Carruth via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Aug 5 15:19:00 PDT 2017


chandlerc added a comment.

In https://reviews.llvm.org/D36352#833118, @silvas wrote:

> > When compiling the IR for this file with 'opt' and 'https://reviews.llvm.org/owners/package/3/', this patch reduces the total time by 8-9%.
>
> Just as a point of reference, do you have handy how much of the total compile time we are spending in the PM's CGSCC / LCG stuff?


Yep, that's how I got here. Before all of my changes to LCG, we spent just under 40% of the total optimizer time (and nearly that much of the total compile time) in this one method.

Removing the parent sets made this method obnoxiously faster (over 2x) and shaved just over 20% off the total optimizer time.

This patch makes this method another 2x faster, and shaves 8% off the total optimizer time.

After both of these, this method remains really the only hot thing, and it is now between 6% and 8% depending on the noise in the profile. I'm still looking for more things to make faster here. I think I have another 2-4 changes that will make it faster.

> If we can speed up the overall time by 8-9% by improving it then it suggests that the total time is much larger, which I find somewhat surprising (presumably, we should be spending the vast majority of our time inside of the optimizations themselves).

Me too. Turns out this method is really, really hot.

> Is this module you're working on just a really pathological case?

It is entirely possible this module is pathological -- I've not seen this when profiling other compiles.

Unfortunately, it is a pathological case of *generated code* that, for me, happens to represent something like 10% of all the files we compile, including many of the slowest ones. So even if this module is really, really weird, I still need it to compile very fast. =]

> Or is this maybe a situation where the denormalized ref edge representation is causing a bunch of extra work?

I'm sure this is part of the problem. But fixing this is (much) harder than making this routine fast. And it doesn't *appear* to be the primary problem. I instrumented this routine expecting to see removals of 100,000+ edges all the time and to think "oh, so it's denormalized...". That isn't what happened. In this (potentially pathological) case, there are only about 7k ref edges ever removed in the entire opt run! And most (some 3.5k) are removed due to a *single* ref edge becoming dead after the inliner runs. The fact that the optimizer is deleting one edge at a time is pretty strong evidence that the denormalization isn't as prominent in this problem as I would have expected. I'm pretty sure, based on these numbers, that the real reason this ends up slow in this module is the size of the RefSCC that we're removing edges from.

Still, I suspect *this* patch mostly makes the denormalized representation scale better (by batching the removals). But it is much easier to do this than to switch representations, and this patch actually makes the code simpler, so it seemed like a fine intermediate step.


https://reviews.llvm.org/D36352