[LLVMdev] Enabling the vectorizer for -Os

David Tweed david.tweed at arm.com
Wed Jun 5 05:32:07 PDT 2013


On 5 June 2013 11:59, David Tweed <david.tweed at arm.com> wrote:

(I've very rarely had O3 optimization, rather than some program-specific
subset of the options, achieve any non-noise-level speed-up over O2 with
gcc/g++.)

[snip]

> "We find that, while -O2 has a significant impact relative to -O1, the
performance impact of -O3 over -O2 optimizations is indistinguishable from
random noise."

 

That's something I remember well, but there's an obvious question lurking in
there: is this because the transformations that apply at O3, while they
count as "aggressive", never actually produce faster code, or are they
transformations which are capable of speeding things up when used in the
right places _but we don't do well at deciding where that is_? I don't have
any hard evidence, but I'm inclined to think it's more likely the second
(and having occasionally looked at gcc assembly, it can be seen to have done
things like loop unrolling in the places least likely to be profitable). So,
simplifying a lot, the difference between O2 and O3 (at least on gcc) might
well be the difference between "guaranteed wins only" and "add some
transforms whose optimization effects we don't predict well". Judging from
some mailing lists I've read, other people share that view of what the
optimization flags mean in practice, rather than seeing them in terms of
aggressiveness or stability. Maybe LLVM/clang shouldn't adopt this
"interpretation"; I'm just pointing out what some people might expect from
previous experience.

Under that view, if the LLVM vectorizer were well enough understood, I would
think it would be good to include it at O2. However, I suspect that the
effects of effectively keeping two versions of each loop around are
unpredictable enough that it's a better decision to make O3 the level at
which it is enabled across the board.

 

> My view of O3 is that it *only* regards how aggressive you want to
optimize your code. Some special cases are proven to run faster on O3,
mostly benchmarks improvements that feed compiler engineers, and on those
grounds, O3 can be noticeable if you're allowed to be more aggressive than
usual. This is why I mentioned FP-safety, undefined behaviour,
vectorization, etc.

 

Again, I can see this as a logical position; I've just never actually
encountered differences in FP-safety or undefined behaviour between O2 and
O3. Likewise, I haven't really seen any instability or undefined behaviour
from the vectorizer. (Sorry if I'm sounding a bit pedantic; I've been doing
a lot of performance testing/exploration recently, so I've been knee-deep in
the difference between "I'm sure it must be the case that..." expectations
and what experimentation reveals is actually happening.)

 

> I don't expect O3 results to be faster than O2 results on average, but on
specific cases where you know that the potential disaster is acceptable,
should be fine to assume O3. Most people, though, use O3 (or O9!) in the
expectancy that this will be always better. It not being worse than O2
doesn't help, either. ;)

 

Again, my experience is that I haven't seen any "semantic" disasters from
O3, just that mostly it doesn't help much: sometimes it speeds execution up
relative to O2, sometimes it slows execution down relative to O2, and it
certainly increases compile time. It sounds like you've had a wilder ride
than me and seen more cases where O3 has actually changed observable
behaviour.

 

> I don't think it's *wrong* to put auto-vec on O2, I just think it's not
its place to be, that's all. The potential to change results is there.

 

This is what I'd like to know about: what specific potential to change
results have you seen in the vectorizer?

 

Cheers,

Dave