submit [PATCH] SimplifyCFG for code review

Ye, Mei Mei.Ye at amd.com
Fri Jun 21 11:52:36 PDT 2013


Hi Renato

Thanks for sharing this story and your experience.  I certainly don't want to repeat this pain.
(As a side note, turning vectorization on by default is not desirable for AMD GPU.)

I am still thinking about the code-sharing part.  Adding a new pass could be expensive, since it might require re-computing dependency information, re-visiting the whole function, and re-iterating to exhaust all opportunities.  One pass for several related optimizations may be justified, but one pass for a single optimization is hardly convincing.
I wonder whether each optimization can be singled out as an add-on, and each target's initialization can configure which set of add-ons to turn on.  In other words, the code is shared but the invocation is not.

-Mei

From: Renato Golin [mailto:renato.golin at linaro.org]
Sent: Friday, June 21, 2013 11:09 AM
To: Ye, Mei
Cc: Evan Cheng; llvm-commits at cs.uiuc.edu
Subject: Re: submit [PATCH] SimplifyCFG for code review

On 21 June 2013 18:40, Ye, Mei <Mei.Ye at amd.com> wrote:
What I take from this discussion is that optimization work should be pushed down to code gen whenever possible, which is fine with me. But it would be good to know that this is a common guideline in this community.

OK, let's ignore your patch completely, then, and start a new discussion: how good is good enough when it comes to benchmarks?

So far, I haven't seen any canonical way of benchmarking things, but I can give you some examples.

Big Changes

Since December, the vectorizer has been reworked from scratch to produce some amazing results. In the beginning, there were flags (-vectorize, -bb-vectorize, -slp-vectorize, etc.) and they were not enabled by default at any optimization level. People started using it, we turned it on on our bots, ran tests and benchmarks, and most people reported positive results with very few regressions.

Regressions were fixed, and Nadav announced that he wanted to turn it on by default at -O3, just like GCC. People (including me) reported their findings and did some more benchmarking, and we (the community) concluded (via the mailing list) that it was, indeed, a good idea.

A few months later, after a lot more hard work from the team, Nadav came back and said: "Folks, we ran some benchmarks here, and I haven't found any significant regression of the vectorizer on -O2 and -Os, should we enable it by default?". That sparked even more controversy; folks (including me) ran more benchmarks on their own platforms, programs, and problem sets, and the report was similar: the vectorizer rocks, even at -Os! Go for it!

And today (LLVM 3.4svn, not 3.3), the vectorizer is enabled by default at -Os, -O2, and -O3.

So, it involved the larger community, running very different benchmarks on very different hardware/software/problem-set configurations, and only when everybody was happy did we make it the default.

Your patch would jump over all that validation, and there is no way you could run the benchmarks that we run, because you don't have access to the specialized hardware, proprietary software, or private data that people in the community have and care about. Even if you could buy all the hardware on the planet, your evaluation (and mine too, for that matter) wouldn't, by itself, be "good enough".

Small Changes

There are other ways in which the community does this; for instance, deciding what to tweak in the vectorizer. Now that the vectorizer is done and enabled, engineers will run small benchmarks and tweak parameters, and sometimes that can hurt performance on private code. But since the vectorizer is already on by default, and those engineers are testing what they can, this then becomes a performance regression: a bug is filed and the change is either fixed or reverted. The author and the affected parties will work together to fix it on the private code and upstream a better fix.

Why can't we do that with your patch, you may ask. Well, new features are more likely to affect a larger number of people than a small change to an already-existing default feature, and there's no way you will be able to deal with all the regression reports at the same time (imagine AMD fixing bugs on MIPS, ARM, SystemZ, and Intel all at once because of a commit to a GPU optimization). In that case, what will happen is that your patch will be reverted and you'll have to start from scratch, or create an optional flag (which you should have done from the beginning).

When a compiler works on a single architecture and belongs to a single company, this kind of instability is acceptable if the gain is big, and it seems to be in your case. But I have learnt from experience that not many people care about a 50% improvement on your own core as much as you do; for them, the instability generated is at least a nuisance, and if it keeps repeating itself, people leave. We don't want that.

In a nutshell, there isn't a hammer in the world that will help you get your patch in. You'll need small, sharp tools, and to insert your change with surgical precision. I suggest you follow up with Evan; he knows what he's talking about.

cheers,
--renato