[LLVMdev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
chandlerc at gmail.com
Mon Jan 14 01:09:01 PST 2013
This idea has been floating around in my head for a while, and after
several discussions with others it continues to hold up, so I thought I
would mail it out. Sorry for cross-posting to both lists, but this is an
issue that would significantly impact both LLVM and Clang.
Essentially, LLVM provides canned optimization "levels" for frontends to
re-use. This is nothing new. However, we don't have good names for them, we
don't expose them at the IR level, and we have a hard time figuring out
which optimizations belong in which levels. I'd like to try addressing that
by coming up with names and a description of the basic intended goal of each
level. I would like, if folks are happy with these ideas, to add these
types of descriptions alongside these attributes to the langref. Ideas on
other (better?) places to document this would be welcome. Certainly,
Clang's documentation would need to be updated to reflect this.
Hopefully we can minimally debate this until the bikeshed is a tolerable
shade. Note that I'm absolutely biased based on the behavior of Clang and
GCC with these optimization levels, and the relevant history there.
However, I'm adding to and deviating from the purely historical differences to
try to better model the latest developments in LLVM's optimizer... So here goes:
1) The easiest: 'Minimize Size' or '-Oz'
- Attribute: minsize (we already have it, nothing to do here)
- Goal: minimize the size of the resulting binary, at (nearly) any cost.
2) Optimize for size or '-Os'
- Attribute: optsize (we already have it, nothing to do here)
- Goal: Optimize the execution of the binary without unreasonably
increasing the binary size.
This one is a bit fuzzy, but usually people don't have a hard time figuring
out where the line is. The primary difference between minsize and optsize
is that with minsize a pass is free to *hurt* performance to shrink the
size. The definition of 'unreasonable' is of course subjective, but here is
at least one strong indicator: any code size growth which is inherently
*speculative* (that is, there isn't a known, demonstrable performance
benefit, but rather it is "often" or "maybe" a benefit) is unlikely to be a
good fit in optsize. The canonical example IMO is a vectorizer -- while it
is reasonable to vectorize a loop, if the vector version might not be
executed, and thus the scalar loop remains as well, then it is a poor fit
for optsize.
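To make the vectorizer concern concrete, here is a minimal C sketch. The
function names are hypothetical and not from any LLVM test. In the first
function the compiler cannot prove that dst and src do not overlap, so a
vectorizer would typically have to emit a runtime overlap check, a vector
loop, and the original scalar loop as a fallback. Whether a given compiler
actually does this depends on the target and compiler version.

```c
#include <stddef.h>

/* dst and src may overlap as far as the compiler can tell. Vectorizing this
 * loop usually means emitting a runtime overlap check plus a vector loop,
 * while keeping the scalar loop for the overlapping case: speculative
 * code size growth, since the vector path might never run. */
void scale_maybe_alias(float *dst, const float *src, size_t n, float k) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* restrict promises that dst and src do not overlap, so no overlap check is
 * needed. A scalar remainder loop may still be emitted for values of n
 * that are not a multiple of the vector width. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    size_t n, float k) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```

Both functions compute the same result on non-overlapping buffers; the
difference is only in how much code a vectorizer needs to generate to be
safe.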
3) Optimize quickly or '-O1'
- Attribute: quickopt (this would be a new attribute)
- Goal: Perform basic optimizations to improve both performance and
simplicity of the code, but perform them *quickly*.
This level is all about compile time, but in a holistic sense. It tries to
perform basic optimizations to get reasonably efficient code, and get it
quickly.
4) Good, well-balanced optimizations, or '-O2'
- Attribute: opt (new attribute)
- Goal: produce a well optimized binary trading off compile time, space,
and runtime efficiency.
This should be an excellent default for general purpose programs. The idea
is to do as much optimization as we can, in as reasonable a time frame,
and with as reasonable a code size impact as possible. This level should
always produce binaries at least as fast as optsize, but they might be both
bigger and faster. This level should always produce binaries at least as
fast as quickopt, but they might be slower to compile.
5) Optimize to the max or '-O3'
- Attribute: maxopt (new attribute)
- Goal: produce the fastest binary possible.
This level has historically been almost exclusively about trading off more
binary size for speed than '-O2', but I would propose we change it to be
more about trading off either binary size or compilation time to achieve a
better performing binary. This level should always produce binaries at
least as fast as opt, but they might be faster at the cost of them being
larger and taking more time to compile. This would in some cases be a
change for LLVM and is definitely a deviation from GCC where O3 will in
many cases produce *slower* binaries due to code size increases that are
not accompanied by corresponding performance increases.
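The five levels above can be summarized as a small lookup table. This is
only an illustrative C sketch: the struct and function names are
hypothetical, and quickopt, opt, and maxopt are the attribute names proposed
in this mail, not existing LLVM attributes.

```c
#include <stddef.h>
#include <string.h>

/* One row per level in the proposal: the command line flag and the
 * function attribute that would carry the same meaning in the IR. */
struct opt_level {
    const char *flag;
    const char *attr;
};

static const struct opt_level k_levels[] = {
    {"-Oz", "minsize"},   /* existing attribute */
    {"-Os", "optsize"},   /* existing attribute */
    {"-O1", "quickopt"},  /* proposed */
    {"-O2", "opt"},       /* proposed */
    {"-O3", "maxopt"},    /* proposed */
};

/* Returns the attribute name for a flag, or NULL if the flag is not one
 * of the five levels described here. */
const char *attr_for_flag(const char *flag) {
    for (size_t i = 0; i < sizeof k_levels / sizeof k_levels[0]; ++i) {
        if (strcmp(k_levels[i].flag, flag) == 0)
            return k_levels[i].attr;
    }
    return NULL;
}
```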
To go with these LLVM attributes I'd like to add support for function
attributes in Clang, both compatible with GCC and with the names above for
clarity. The goal is to allow a specific function to override the
optimization level set on the command line.
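As a rough sketch of what per-function overrides look like with attributes
that exist today: Clang accepts __attribute__((minsize)), and GCC accepts
__attribute__((optimize("..."))), which GCC's documentation recommends for
debugging rather than production use. The macro names FN_MINSIZE and
FN_MAXOPT below are hypothetical. No Clang spelling for the proposed
quickopt, opt, or maxopt attributes is shown because none is defined here;
under Clang, FN_MAXOPT expands to nothing, since as far as I know Clang does
not implement GCC's optimize attribute.

```c
#include <stddef.h>

#ifndef __has_attribute
#  define __has_attribute(x) 0  /* compilers without the feature test */
#endif

/* FN_MINSIZE and FN_MAXOPT are hypothetical names for illustration. */
#if __has_attribute(minsize)
#  define FN_MINSIZE __attribute__((minsize))        /* Clang */
#elif defined(__GNUC__) && !defined(__clang__)
#  define FN_MINSIZE __attribute__((optimize("Os"))) /* closest GCC match */
#else
#  define FN_MINSIZE
#endif

#if defined(__GNUC__) && !defined(__clang__)
#  define FN_MAXOPT __attribute__((optimize("O3")))  /* GCC only */
#else
#  define FN_MAXOPT                                  /* no override */
#endif

/* Rarely executed helper: prefer small code over speed. */
FN_MINSIZE int clamp_percent(int v) {
    return v < 0 ? 0 : (v > 100 ? 100 : v);
}

/* Hot kernel: ask for the most aggressive level where supported. */
FN_MAXOPT long dot(const int *a, const int *b, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (long)a[i] * b[i];
    return sum;
}
```

The attributes only change how each function is optimized; both functions
behave the same at any level, so the file can still be built with whatever
'-O' flag the rest of the project uses.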
A final note: I would like to remove all other variations on the '-O' flag.
That includes the really strange '-O4' behavior. Whether the compilation is
LTO should be an orthogonal decision to the particular level of
optimization, and we have -flto to achieve this.