[cfe-dev] [llvm-dev] llvm and clang are getting slower

Tue Mar 8 10:22:51 PST 2016

On Tue, Mar 8, 2016 at 9:55 AM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> ----- Original Message -----
> > From: "Mehdi Amini via cfe-dev" <cfe-dev at lists.llvm.org>
> > To: "Rafael Espíndola" <rafael.espindola at gmail.com>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "cfe-dev" <
> cfe-dev at lists.llvm.org>
> > Sent: Tuesday, March 8, 2016 11:40:47 AM
> > Subject: Re: [cfe-dev] [llvm-dev] llvm and clang are getting slower
> >
> > Hi Rafael,
> >
> > CC: cfe-dev
> >
> > Thanks for sharing. We also noticed this internally, and I know that
> > Bruno and Chris are working on some infrastructure and tooling to
> > help track compile-time regressions closely.
> >
> > We had this conversation internally about the tradeoff between
> > compile-time and runtime performance, and I planned to bring up the
> > topic on the list in the coming months; this looks like a good
> > occasion to plant the seed. Apparently in the past (years/decade
> > ago?) the project was very conservative about adding any optimizations
> > that would impact compile time; however, there is no explicit policy
> > (that I know of) to address this tradeoff.
> > The closest I could find would be what Chandler wrote in:
> > http://reviews.llvm.org/D12826 ; for instance for O2 he stated that
> > "if an optimization increases compile time by 5% or increases code
> > size by 5% for a particular benchmark, that benchmark should also be
> > one which sees a 5% runtime improvement".
> >
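
To make the quoted rule of thumb concrete, here is a minimal sketch of it as a
per-benchmark check. The function name and the generalization from "5% for 5%"
to "the runtime gain must at least match the larger of the two costs" are
assumptions made for illustration; D12826 only states the 5% example.

```python
def acceptable_at_o2(compile_time_increase: float,
                     code_size_increase: float,
                     runtime_improvement: float) -> bool:
    """Hypothetical check for one benchmark; all values are fractions (0.05 == 5%).

    Assumption: the runtime gain has to be at least as large as the larger of
    the compile-time and code-size costs. This is one reading of the rule,
    not an official policy.
    """
    cost = max(compile_time_increase, code_size_increase, 0.0)
    return runtime_improvement >= cost
```

With this reading, a change that costs 5% compile time and buys 1% runtime on
a benchmark is rejected, while one that buys 5% runtime passes.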
> > My hope is that with better tooling for tracking compile time in the
> > future, we'll reach a state where we'll be able to consider
> > "breaking" the compile-time regression test as important as breaking
> > any test: i.e. the offending commit should be reverted unless it has
> > been shown to significantly (hand wavy...) improve the runtime
> > performance.
> >
> > <troll>
> > With the current trend, the Polly developers don't have to worry
> > about improving their compile time, we'll catch up with them ;)
> > </troll>
>
> My two largest pet peeves in this area are:
>

I think you hit on something that I would expand on:

We don't hold the line very well on adding little things to passes and
analyses over time.
We add 1000 little walkers and pattern matchers to try to get better code,
and then often add knobs to try to control their overall compile time.
At some point, these all add up. You end up with the same flat profile if
you do this everywhere, but your compiler gets slower.
At some point, someone has to stop and say "well, wait a minute, are there
better algorithms or architecture we should be using to do this", and
either do it, or not let it get worse :) I'd suggest, in most cases, we
know better ways to do almost all of these things.

Don't get me wrong, I don't believe there is any theoretically pure way to
do everything that we can just implement and never have to tweak.  But it's
a continuum, and at some point you have to stop and re-evaluate whether the
current approach is really the right one if you have to add a billion
little things to it to get what you want.
We often don't do that.
We go *very* far down the path of a billion tweaks and adding knobs, and
what we have now, compile time wise, is what you get when you do that :)
I suspect this is because we don't really want to try to force work on
people who are just trying to get crap done.  We're all good contributors
trying to do the right thing, and saying no often seems obstructionist, etc.
The problem is at some point you end up with the tragedy of the commons.

(also, not everything in the compiler has to catch every case to get good
code)

>  1. We often use functions from ValueTracking (to get known bits, the
> number of sign bits, etc.) as though they're low cost. They're not really
> low cost. The problem is that they *should* be. These functions do
> bottom-up walks, and could cache their results. Instead, they do a limited
> walk and recompute everything each time. This is expensive, and a
> significant amount of our InstCombine time goes to ValueTracking, and that
> shouldn't be the case. The more we add to InstCombine (and related passes),
> and the more we run InstCombine, the worse this gets. On the other hand,
> fixing this will help both compile time and code quality.
>

(LVI is another great example. Fun fact: If you ask for value info for
everything, it's no longer lazy ....)
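
As a rough illustration of the caching point, here is a toy sketch in Python.
It is not LLVM's API: `Node`, `known_bits_limited`, `known_bits_cached`, and
the depth cutoff of 6 are all made up for the example. It compares a
depth-limited walk that recomputes everything with a memoized bottom-up walk
over the same expression DAG.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MASK = 0xFF      # toy 8-bit values
MAX_DEPTH = 6    # assumed cutoff for the limited walk


@dataclass(eq=False)  # identity-based hashing/equality, like IR values
class Node:
    op: str                              # "const", "and", or "or"
    value: int = 0                       # used only when op == "const"
    operands: Tuple["Node", ...] = ()


def _combine(op, lhs, rhs):
    """Merge (known_zero, known_one) masks of two operands."""
    (z0, o0), (z1, o1) = lhs, rhs
    if op == "and":
        return (z0 | z1, o0 & o1)
    return (z0 & z1, o0 | o1)            # "or"


def known_bits_limited(node, depth, counter):
    """Recompute on every query; give up (nothing known) at the cutoff."""
    counter["visits"] += 1
    if node.op == "const":
        return (~node.value & MASK, node.value & MASK)
    if depth == 0:
        return (0, 0)
    lhs = known_bits_limited(node.operands[0], depth - 1, counter)
    rhs = known_bits_limited(node.operands[1], depth - 1, counter)
    return _combine(node.op, lhs, rhs)


def known_bits_cached(node, cache: Dict[int, Tuple[int, int]], counter):
    """Memoized walk: each node is computed once, with no depth cutoff."""
    key = id(node)
    if key in cache:
        return cache[key]
    counter["visits"] += 1
    if node.op == "const":
        result = (~node.value & MASK, node.value & MASK)
    else:
        lhs = known_bits_cached(node.operands[0], cache, counter)
        rhs = known_bits_cached(node.operands[1], cache, counter)
        result = _combine(node.op, lhs, rhs)
    cache[key] = result
    return result
```

On a chain of 20 `and` nodes that each use the previous node twice, the
limited walk visits 127 nodes for a single query and still knows nothing at
the top, while the cached walk visits 21 nodes and recovers every bit. The
sketch ignores the hard part, which is invalidating the cache when the IR
changes.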

>
>   Furthermore, BasicAA has the same problem.
>
>  2. We have "cleanup" passes in the pipeline, such as those that run after
> loop unrolling and/or vectorization, that run regardless of whether the
> preceding pass actually did anything. We've been adding more of these, and
> they catch important use cases, but we need a better infrastructure for
> this (either with the new pass manager or otherwise).
>
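
One way to picture the infrastructure being asked for in item 2, as a sketch
only: each transform reports whether it changed anything, and the paired
cleanup runs only when it did. The pass names and the list-of-ints "IR" are
hypothetical, and this is not how either LLVM pass manager is implemented.

```python
from typing import Callable, List, Tuple

# A pass mutates the IR in place and returns True if it changed anything.
Pass = Callable[[List[int]], bool]


def zero_negatives(ir: List[int]) -> bool:
    """Hypothetical transform: replace negative entries with 0."""
    changed = False
    for i, v in enumerate(ir):
        if v < 0:
            ir[i] = 0
            changed = True
    return changed


def drop_zeros(ir: List[int]) -> bool:
    """Hypothetical cleanup: remove the zeros the transform left behind."""
    before = len(ir)
    ir[:] = [v for v in ir if v != 0]
    return len(ir) != before


def run_pipeline(ir: List[int], stages: List[Tuple[Pass, Pass]]) -> List[str]:
    """Run each transform; run its cleanup only if the transform changed the IR.

    Returns the names of the passes that actually executed.
    """
    executed = []
    for transform, cleanup in stages:
        executed.append(transform.__name__)
        if transform(ir):
            cleanup(ir)
            executed.append(cleanup.__name__)
    return executed
```

For an input with no negative entries, only `zero_negatives` runs and the
cleanup cost is never paid. The tradeoff is that a cleanup skipped this way
also misses anything it would have caught for unrelated reasons.
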
> Also, I'm very hopeful that as our new MemorySSA and GVN improvements
> materialize, we'll see large compile-time improvements from that work. We
> spend a huge amount of time in GVN computing memory-dependency information
> (this dwarfs the time spent by GVN doing actual value numbering work by an
> order of magnitude or more).
>

I'm working on it ;)