[llvm-dev] RFC: Large, especially super-linear, compile time regressions are bugs.

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 31 17:09:31 PDT 2016


> On Mar 31, 2016, at 4:40 PM, Renato Golin <renato.golin at linaro.org> wrote:
> 
> On 31 March 2016 at 23:34, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> I'm not sure how "this commit regresses the compile time by 2%" is a dubious metric.
>> The metric is not dubious IMO, it is what it is: a measurement.
> 
> Ignoring for a moment the slippery slope we recently had on compile
> time performance, 2% is an acceptable regression for a change that
> improves execution time by around 2% on most targets; more so than if
> only one target were affected.

Sure, I don't think I suggested anything else; if I did, it's because I didn't express myself correctly :)
I'm excited about runtime performance, and I'm willing to spend compile-time budget to achieve it.
I'd even say that tracking compile time on other commits will help preserve more compile-time budget for exactly the kind of commit you mention above.

> 
> Different people see performance with different eyes, and companies
> have different expectations about it, too, so those percentages can
> have different impact on different people for the same change.
> 
> I guess my point is that no threshold

I'm not suggesting a threshold that says "a commit can't regress more than x%" and that would be set in stone.

What I have in mind is more: if a commit regresses the build above a threshold (1% on average, for instance), then we should be able to have a discussion about that commit, to evaluate for instance whether the optimization belongs at O2 or should go to O3 instead.
Also, if the commit is a refactoring or introduces a new feature, the regression might not even be intended by the author!
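
To make the idea concrete, here is a minimal sketch (in Python) of what such a check could look like. The file names, the JSON layout, and the 1% threshold are only illustrative assumptions on my part, not part of any existing LLVM/LNT tooling:

import json
import statistics

THRESHOLD = 0.01  # a 1% average regression opens a discussion, it does not auto-revert

def load_times(path):
    # Hypothetical format: {"benchmark name": compile_time_in_seconds}
    with open(path) as f:
        return json.load(f)

def average_regression(baseline, patched):
    # Geometric mean of the patched/baseline compile-time ratios, minus 1.
    ratios = [patched[name] / baseline[name]
              for name in baseline if name in patched]
    return statistics.geometric_mean(ratios) - 1.0

baseline = load_times("baseline_times.json")
patched = load_times("patched_times.json")
regression = average_regression(baseline, patched)
if regression > THRESHOLD:
    print("compile time regressed by {:.2%}: worth a discussion".format(regression))
else:
    print("compile time change: {:+.2%}".format(regression))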


> will please everybody, and people are more likely to "abuse" the
> metric if the results are far from what they see as acceptable, even
> if everyone else is ok with it.

The metric is "the commit regressed 1%". The natural thing that follows is what happens usually in the community: we look at the data (what is the performance improvement), and decide on a case by case if it is fine as is or not.
I feel like you're talking about the "metric" like an automatic threshold that triggers an automatic revert and block things, this is not the goal and that is not what I mean when I use of the word metric (but hey, I'm not a native speaker!).
As I said before, I'm mostly chasing *untracked* and *unintentional* compile time regression. 
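
One way to surface these untracked regressions is simply to record the compile time of a fixed input for every build of the compiler and keep the history around. A rough sketch of the idea, again in Python; the clang path, the input file, and the CSV name are made-up placeholders, not an existing bot or tool:

import csv
import subprocess
import sys
import time

def time_compile(clang, source, runs=5):
    # Best-of-N wall-clock time for compiling `source` at -O2, to reduce noise.
    best = float("inf")
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run([clang, "-O2", "-c", source, "-o", "/dev/null"], check=True)
        best = min(best, time.monotonic() - start)
    return best

# Usage (hypothetical): python track_compile_time.py /path/to/clang <commit-hash>
clang_path, commit = sys.argv[1], sys.argv[2]
elapsed = time_compile(clang_path, "big_translation_unit.cpp")
with open("compile_times.csv", "a", newline="") as f:
    csv.writer(f).writerow([commit, "{:.3f}".format(elapsed)])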


> My point about substituting metrics for thinking is not aimed at the
> lazy programmers (of which there are very few here), but at how far
> the encoded threshold falls from your own. Bias is a *very* hard thing
> to remove, even for extremely smart and experienced people.
> 
> So, while "which hunt" is a very strong term for the mild bias we'll
> all have personally, we have seen recently how some discussions end up
> in rage when a group of people strongly disagree with the rest,
> self-reinforcing their bias to levels that they would never reach
> alone. In those cases, the term stops being strong, and may be
> fitting... Makes sense?
> 
> 
>> I agree. Did you read the part where I mentioned that we're working on the tooling and that I was waiting for it to be done before starting this thread?
> 
> I did, and I should have mentioned it in my reply. I think you guys
> (and ARM) are doing an amazing job at quality measurement. I wasn't
> trying to diminish your efforts, but IMHO, the relationship between
> effort and bias removal is not linear, i.e. you'll have to improve
> quality exponentially to remove bias linearly. So, the point at which
> we're prepared to stop might not remove all the problems, and metrics
> could still play a negative role.

I'm not sure I fully understand everything you mean here.


> 
> I think I'm just asking for us to be aware of the fact, not to stop
> any attempt to introduce metrics. If they remain relevant to the final
> objective, and we're allowed to break them with enough arguments, it
> should work fine.
> 
> 
>> How do you suggest we address the long trail of 1-3% slowdowns that led to the current situation (cf. the two links I posted in my previous email)?
>> Because there *is* a problem here, and I'd really like someone to come up with a solution for it.
> 
> Indeed, we're now slower than GCC, and that's a place that looked
> impossible two years ago. But I doubt reverting a few patches will
> help. For this problem, we'll need a task force to hunt for all the
> dragons, and surgically alter them, since at this time, all relevant
> patches are too far in the past.

Obviously, my immediate concern is "what tools and processes can make sure it does not get worse", and starting with "community awareness" is not a bad first step. Improving on and recovering from the current state is valuable, but orthogonal to what I'm trying to achieve.
Another thing is the complaints from multiple people who are trying to JIT using LLVM: we know LLVM is not designed in a way that helps with latency and memory consumption, but getting worse is not nice.

> For the future, emailing on compile time regressions (as well as run
> time) is a good thing to have and I vouch for it. But I don't want
> that to become a tool that will increase stress in the community.

Sure, and I'm glad you're stepping up to make sure it does not happen. So please continue to speak up in the future as we try to roll things out.
I hope we're on the same page now, past the initial misunderstanding we had with each other?

What I'd really like is to have a consensus on the goal to pursue (knowing that I'm not alone in caring about compile time is a great start!), so that the tooling can be set up to serve this goal in the best way possible (decreasing stress instead of increasing it).

Best,

-- 
Mehdi


> 
> 
>> Not sure why, or what you mean? The fact that an optimization improves only some targets does not invalidate the point.
> 
> Sorry, I seem to have misinterpreted your point.
> 
> The fallacy is about the measurement of "benefit" versus the
> regression "effect". The former is very hard to measure, while the
> latter is very precise. Comparisons with radically different standard
> deviations can easily fall into "undefined behaviour" land, and be
> the seed of rage threads.
> 
> 
>> I'm talking about chasing and tracking every single commit where a developer regresses compile time *without even being aware of it*.
> 
> That's a goal worth pursuing, regardless of the patch's benefit, I
> agree wholeheartedly. And for that, I'm very grateful of the work you
> guys are doing.
> 
> cheers,
> --renato


