[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 31 14:46:33 PDT 2016


On 31 March 2016 at 21:41, Mehdi Amini via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> TLDR: I totally support considering compile time regression as bug.

Me too.

I also agree that reverting a fresh commit and reapplying it later is
*much* easier than trying to revert it long after the fact.

But I'd like to avoid dubious metrics.


> The closest I could find would be what Chandler wrote in:
> http://reviews.llvm.org/D12826 ; for instance for O2 he stated that "if an
> optimization increases compile time by 5% or increases code size by 5% for a
> particular benchmark, that benchmark should also be one which sees a 5%
> runtime improvement".

I think this is a bit limited and can lead to witch hunts, especially
with regard to performance measurements.

Chandler's title is perfect, though... "Large" can be vague, but
"super-linear" is not. We used to have the convention that any large
super-linear (quadratic or worse) compile-time increase had to be
restricted to -O3 or, for the really bad cases, hidden behind
additional flags. I think we should keep that mindset.


> My hope is that with better tooling for tracking compile time in the future,
> we'll reach a state where we'll be able to consider "breaking" the
> compile-time regression test as important as breaking any test: i.e. the
> offending commit should be reverted unless it has been shown to
> significantly (hand wavy...) improve the runtime performance.

In order to have any kind of threshold, we'd have to monitor, with
some accuracy, the performance of both the compiler and the compiled
code on the main platforms. We do that to a certain extent with the
test-suite bots, but it's very far from ideal.

So I'd recommend we steer away from any kind of percentage or ratio
and keep at least the quadratic-or-worse changes behind special flags
(n log n is OK for most cases).
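
As an illustration of the kind of thing I mean by keeping the
quadratic cases in check, here's a hedged sketch (again mine, with
hypothetical names) of the usual in-tree escape hatch: a hidden,
tunable limit that caps the inner loop of a pairwise scan, so the
worst case stays bounded instead of blowing up quadratically on
pathological inputs.

  #include "llvm/Support/CommandLine.h"
  #include <cstddef>
  #include <vector>

  // Caps how many candidates we compare against each item; turns an
  // O(n^2) scan into roughly O(n * FooScanLimit).
  static llvm::cl::opt<unsigned> FooScanLimit(
      "foo-scan-limit", llvm::cl::init(64), llvm::cl::Hidden,
      llvm::cl::desc("Max candidates examined per item before giving up"));

  template <typename T, typename Callback>
  void forEachNearbyPair(const std::vector<T> &Items, Callback CB) {
    for (std::size_t I = 0, E = Items.size(); I != E; ++I) {
      unsigned Budget = FooScanLimit;
      for (std::size_t J = I + 1; J != E && Budget != 0; ++J, --Budget)
        CB(Items[I], Items[J]);
    }
  }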


> Since you raise the discussion now, I take the opportunity to push on the
> "more aggressive" side: I think the policy should be a balance between the
> improvement the commit brings compared to the compile time slow down.

This is a fallacy.

Compile time often regresses across all targets, while execution
improvements tend to be focused on specific targets and can even have
negative effects on targets that were not benchmarked. Overall,
compile-time regressions are diluted by the runtime improvements, but
not on a commit-by-commit basis. That's what I meant by witch hunt.

I think we should keep an eye on those changes, ask for numbers in
code review, and maybe even do some benchmarking of our own before
accepting them. Also, we should not commit code that we know hurts
performance that badly, even if we believe people will replace it in
the future. It always takes too long. I did that myself last year,
and I learnt my lesson.

Metrics are often more dangerous than helpful, as they tend to be used
as a substitute for thinking.

My tuppence.

--renato

