[LLVMdev] Enabling the vectorizer for -Os

David Tweed david.tweed at arm.com
Wed Jun 5 03:59:30 PDT 2013


On 5 June 2013 04:26, Nadav Rotem <nrotem at apple.com> wrote:

I would like to start a discussion about enabling the loop vectorizer by
default for -Os. The loop vectorizer can accelerate many workloads and
enabling it for -Os and -O2 has obvious performance benefits.

Hi Nadav,

| As it stands, O2 is very similar to O3 with a few, more aggressive,
| optimizations running, including the vectorizers. I think this is a good
| rationale, at O3, I expect the compiler to throw all it's got at the
| problem. O2 is somewhat more conservative, and people normally use it when
| they want more stability of the code and results (regarding FP, undefined
| behaviour, etc). I also use it for finding bugs on the compiler that are
| introduced by O3, and making them more similar won't help that either. I'm
| yet to see a good reason to enable the vectorizer by default into O2.

Just to note that I think a lot of people used to the switches from gcc may
be coming in with different "historical expectations". At least recently
(over at least the past 5 years), O2 has in practice been "optimizations
that are straightforward enough that they do achieve speed-ups", while O3
tends to be "more aggressive optimizations which could potentially cause
speed-ups, but which don't understand the context/trade-offs well enough, so
they often don't result in a speed-up". (I've very rarely had O3
optimization, rather than some program-specific subset of the options,
achieve any non-noise-level speed-up over O2 with gcc/g++.) I know it's been
said that llvm/clang should aim for "validated" O2/O3 settings that actually
do result in better performance, but then I imagine so did gcc... From what
I've seen so far, I haven't observed any instability of code or results from
using the vectorizer. (Mind you, I deliberately try to write code to avoid
letting chips with "80-bit intermediate floating point values" use them,
precisely because it can make things more vulnerable to minor compilation
changes.)

Under that view, if the LLVM vectorizer were well enough understood, I would
think it would be good to include it at O2. However, I suspect that the
effects of having effectively two versions of each loop around are
conflicting enough that it's a better decision to make O3 the level at
which it is blanket enabled.

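To make the "two versions of each loop" point concrete, here is a rough
hand-written sketch in C of the shape a vectorized loop tends to take: a main
loop that processes several elements per iteration, plus a scalar remainder
loop for the leftover elements. This is only an illustration, not actual
vectorizer output. The function names and the width VF = 4 are assumptions
made up for the example, and the inner "lane" loop merely stands in for a
single vector instruction.

```c
#include <stddef.h>

/* The loop as written in the source: one scalar copy of the loop body. */
void add_scalar(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Hypothetical vectorization factor, chosen only for illustration. */
enum { VF = 4 };

/* Sketch of the vectorized shape: the same work now exists twice,
   once in the wide main loop and once in the scalar remainder loop. */
void add_two_versions(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    size_t main_end = n - (n % VF); /* largest multiple of VF not above n */

    /* Main loop: VF elements per iteration. The lane loop stands in for
       what would be one vector add on real hardware. */
    for (; i < main_end; i += VF) {
        for (size_t lane = 0; lane < VF; ++lane)
            dst[i + lane] = a[i + lane] + b[i + lane];
    }

    /* Remainder loop: the original scalar body, kept for the tail. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Both functions compute the same result; the second simply carries two copies
of the loop body, which is where the code-size side of the trade-off comes
from.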
 

Cheers,

Dave