[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.

Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org>
Thu Mar 31 15:34:32 PDT 2016


Hi Renato,

> On Mar 31, 2016, at 2:46 PM, Renato Golin <renato.golin at linaro.org> wrote:
> 
> On 31 March 2016 at 21:41, Mehdi Amini via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> TLDR: I totally support considering compile time regressions as bugs.
> 
> Me too.
> 
> I also agree that reverting fresh and reapplying is *much* easier than
> trying to revert late.
> 
> But I'd like to avoid dubious metrics.

I'm not sure how "this commit regresses compile time by 2%" is a dubious metric.
The metric is not dubious IMO, it is what it is: a measurement.
You just have to build a good process around it to exploit this measurement in a useful way for the project.
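
To make "build a good process around it" concrete, here is a minimal sketch of what a per-commit check could look like. Everything in it (the build command, the source corpus, the run count, the 2% threshold) is an illustrative assumption, not an existing tool:

    # Hypothetical per-commit compile-time check; the compiler invocation,
    # corpus, and 2% threshold are illustrative assumptions.
    import subprocess
    import time

    THRESHOLD = 0.02  # flag regressions larger than 2%

    def time_build(compiler, sources):
        # Wall-clock seconds to compile a fixed corpus at -O2.
        start = time.monotonic()
        subprocess.run([compiler, "-O2", "-c"] + sources, check=True)
        return time.monotonic() - start

    def check_commit(baseline_clang, candidate_clang, sources, runs=3):
        # Take the minimum of a few runs to damp machine noise;
        # a real bot would want proper statistics on top of this.
        base = min(time_build(baseline_clang, sources) for _ in range(runs))
        cand = min(time_build(candidate_clang, sources) for _ in range(runs))
        delta = (cand - base) / base
        if delta > THRESHOLD:
            print("compile time regressed by {:.1%}".format(delta))
        return delta

The measurement itself stays dumb and reproducible; all the judgment (is the slowdown justified by a runtime win?) lives in the process around it, which is exactly the point.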

>> The closest I could find would be what Chandler wrote in:
>> http://reviews.llvm.org/D12826 ; for instance for O2 he stated that "if an
>> optimization increases compile time by 5% or increases code size by 5% for a
>> particular benchmark, that benchmark should also be one which sees a 5%
>> runtime improvement".
> 
> I think this is a bit limited and can lead to witch hunts, especially
> wrt performance measurements.
> 
> Chandler's title is perfect though... Large can be vague, but
> "super-linear" is not. We used to have the concept that any large
> super-linear (quadratic+) compile time introductions had to be in O3
> or, for really bad cases, behind additional flags. I think we should
> keep that mindset.
> 
> 
>> My hope is that with better tooling for tracking compile time in the future,
>> we'll reach a state where we'll be able to consider "breaking" the
>> compile-time regression test as important as breaking any test: i.e. the
>> offending commit should be reverted unless it has been shown to
>> significantly (hand wavy...) improve the runtime performance.
> 
> In order to have any kind of threshold, we'd have to monitor with some
> accuracy the performance of both compiler and compiled code for the
> main platforms. We do that to a certain extent with the test-suite bots,
> but that's very far from ideal.

I agree. Did you read the part where I mentioned that we're working on the tooling, and that I was waiting for it to be done before starting this thread?

> 
> So, I'd recommend we steer away from any kind of percentage or ratio
> and keep at least the quadratic changes and beyond on special flags
> (n log n is OK for most cases).


How do you suggest we address the long tail of 1-3% slowdowns that led to the current situation (cf. the two links I posted in my previous email)?
Because there *is* a problem here, and I'd really like someone to come up with a solution for it.
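
To put back-of-the-envelope numbers on that tail (assumed figures, not measurements): individually "negligible" regressions compound multiplicatively, so a couple of release cycles of them adds up fast:

    >>> 1.02 ** 10    # ten commits, each "only 2%" slower
    1.2189944...      # ~22% slower overall
    >>> 1.015 ** 30   # thirty commits at 1.5% each
    1.5630...         # ~56% slower overall

No single commit in that series looks like it deserves a revert, which is why per-commit tracking, rather than a threshold applied to each commit in isolation, matters.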


>> Since you raise the discussion now, I take the opportunity to push on the
>> "more aggressive" side: I think the policy should be a balance between the
>> improvement the commit brings compared to the compile time slow down.
> 
> This is a fallacy.

Not sure why, or what you mean? The fact that an optimization improves only some targets does not invalidate the point.

> 
> Compile time often regresses across all targets, while execution
> improvements are focused on specific targets and can have negative
> effects on those that were not benchmarked on.

Yeah, as usual in LLVM: if you care about something on your platform, set up a bot and track trunk closely; otherwise you're less of a priority.

> Overall, though,
> compile time regressions dilute over the improvements, but not on a
> commit-per-commit basis. That's what I meant by witch hunt.

There is no "witch hunt", at least that's not my objective.
I think everyone is pretty enthusiastic with every new perf improvement (I do), but just like without bot in general (and policy) we would break them all the time unintentionally.
I talking about chasing and tracking every single commit were a developer would regress compile time *without even being aware*.
I'd personally love to have a bot or someone emailing me with compile time regression I would introduce.
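
For illustration, the notification step on top of the measurement sketch earlier in this email could be as small as this; the addresses and the local SMTP relay are hypothetical placeholders:

    # Hypothetical notification step: email the commit author when the
    # measured delta crosses the threshold. Addresses are placeholders.
    import smtplib
    from email.message import EmailMessage

    def notify(author_email, commit, delta):
        msg = EmailMessage()
        msg["Subject"] = "Compile time regression ({:.1%}) in {}".format(delta, commit)
        msg["From"] = "compile-time-bot@example.org"  # placeholder
        msg["To"] = author_email
        msg.set_content(
            "Commit {} regressed compile time by {:.1%} on the tracked "
            "corpus. Please check whether a runtime win justifies it, "
            "or consider reverting.".format(commit, delta))
        with smtplib.SMTP("localhost") as server:
            server.send_message(msg)

Nothing punitive, just awareness: I suspect most of the 1-3% regressions would never land if their authors saw a number like this before pushing.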

> 
> I think we should keep an eye on those changes, ask for numbers in
> code review and even maybe do some benchmarking on our own before
> accepting it. Also, we should not commit code that we know hurts
> performance that badly, even if we believe people will replace them in
> the future. It always takes too long. I myself have done that last
> year, and I learnt my lesson.

Agree. 

> Metrics are often more dangerous than helpful, as they tend to be used
> as a substitute for thinking.

I can't relate this sentence to anything concrete at stake here.
I think this list is full of people who are very good at thinking and won't substitute metrics for it :)

Best,

-- 
Mehdi


