[cfe-dev] [PROPOSAL] per-function optimization level control
Andrea_DiBiagio at sn.scee.net
Wed Apr 24 06:00:28 PDT 2013
Hello,
We've had a high-priority feature request from a number of our customers
to provide per-function optimization in our Clang/LLVM compiler.
I would be interested in working with the community to implement this.
The idea is to allow the optimization level to be overridden
for specific functions.
The rest of this proposal is organized as follows:
- Section 1. describes this new feature and explains why and when
per-function optimization options are useful;
- Sections 2. and 3. describe how the optimizer could be adapted/changed
to allow the definition of per-function optimizations;
- Section 4. tries to outline a possible workflow for implementing this
new feature.
I would welcome any feedback or suggestions.
Thanks!
Andrea Di Biagio
SN Systems Ltd.
http://www.snsys.com
1. Description
==============
The idea is to add pragmas to control the optimization level of individual
functions. GCC has implemented a similar approach: since GCC 4.4,
function-specific option pragmas allow users to set the optimization level
on a per-function basis.
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html
describes the pragmas as
#pragma GCC optimize ("string")
#pragma GCC push_options
#pragma GCC pop_options
#pragma GCC reset_options
Instead of imitating GCC's syntax, I think it would be better to use a
syntax consistent with the existing "#pragma clang diagnostic" directives:
#pragma clang optimize push
#pragma clang optimize "string"
#pragma clang optimize pop
Each directive would have its own stack, which in my opinion keeps
everything more modular and simpler to implement.
#pragma clang optimize push
#pragma clang optimize pop
An "optimize push" temporarily saves the current set of optimization
options, while an "optimize pop" restores the previously saved set of
optimization options.
#pragma clang optimize "string"
This pragma overrides the optimization level for functions defined later
in the source code. The argument "string" is a string that begins with 'O'
and is assumed to name an optimization level (examples: "O0" for
optimization level 0; "O1" for optimization level 1).
In the future we may also extend the set of accepted strings to allow
other codegen options to be overridden for specific functions.
Example:
////
#pragma clang optimize push
#pragma clang optimize "O0"
void f1() { ... }
#pragma clang optimize push
#pragma clang optimize "O2"
void f2() { ... }
void f3() { ... }
#pragma clang optimize pop
void f4() { ... }
#pragma clang optimize pop
////
Optimization level for f1 and f4 is -O0.
Optimization level for f2 and f3 is -O2.
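The push/pop behaviour in the example above can be modelled with a small
stack. This is only a sketch of the intended semantics, not Clang code: the
class OptimizePragmaState and the function replayExample are hypothetical
names, and the command-line level "O3" is an assumption made for the sake
of the illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the proposed pragma semantics (not Clang code).
class OptimizePragmaState {
public:
  explicit OptimizePragmaState(std::string globalLevel)
      : current_(std::move(globalLevel)) {}

  // #pragma clang optimize push: save the current level.
  void push() { saved_.push_back(current_); }

  // #pragma clang optimize "string": override the current level.
  void set(const std::string &level) { current_ = level; }

  // #pragma clang optimize pop: restore the most recently saved level.
  void pop() {
    assert(!saved_.empty() && "pop without a matching push");
    current_ = saved_.back();
    saved_.pop_back();
  }

  // Level that a function defined at this point would receive.
  const std::string &current() const { return current_; }

private:
  std::string current_;
  std::vector<std::string> saved_;
};

// Replays the example: returns the levels seen by f1, f2, f3, f4 and by
// code following the final pop, assuming -O3 on the command line.
std::vector<std::string> replayExample() {
  OptimizePragmaState state("O3");
  std::vector<std::string> levels;
  state.push();
  state.set("O0");
  levels.push_back(state.current()); // f1
  state.push();
  state.set("O2");
  levels.push_back(state.current()); // f2
  levels.push_back(state.current()); // f3
  state.pop();
  levels.push_back(state.current()); // f4
  state.pop();
  levels.push_back(state.current()); // code after the last pop
  return levels;
}
```

In this model f4 sees "O0" again because the inner pop restores the level
saved by the inner push, and the final pop restores the command-line level.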
1.1 Why it is useful to define per-function optimization levels
---------------------------------------------------------------
The main motivation of our customers is to be able to selectively disable
optimizations when debugging one function in a compilation unit, in the
case where compiling the whole unit at -O0 would make the program run too
slowly.
Being able to set the optimization level on a per-function basis can also
help in those cases where we know that there is a problem in an
optimization but, for some reason, either
a) we don't know which optimization is performing the wrong
transformation, or
b) we know the problematic Pass, but there is no easy way to work around
the problem and fixing it would take too much time, or
c) there is an unknown error in the code being compiled that only causes
problems when optimized (example: the code breaks strict aliasing).
If we know that the bug only affects a few functions in the code, we could
disable optimizations for those functions only. This would allow us to
provide quick workarounds to customers encountering optimization bugs.
2. CHANGES REQUIRED IN clang
============================
Clang must be able to parse the new "pragma clang optimize".
The idea is that optimization levels would be encoded as IR attributes on
functions.
A discussion on how to encode the optimization levels in LLVM was
originally started by Chandler here:
lists.cs.uiuc.edu/pipermail/llvmdev/2013-January/058112.html
3. CHANGES REQUIRED IN LLVM
===========================
The global optimization level strongly affects how Passes are added
to PassManagers.
Example:
When the global optimization level is -O0,
method PassManagerBuilder::populateModulePassManager
[in lib/Transforms/IPO/PassManagerBuilder.cpp] populates the per-module
pass manager with the following passes:
- AlwaysInliner (if inlining is not disabled)
- extra Passes which may have been registered as extensions
"to be enabled at optimization level 0".
With an optimization level greater than zero, however,
several analysis and transform passes are potentially added to
the "per-module" pass manager.
The major problem with this approach is that both the optimizer
and the backend work under the assumption that the set of codegen options
is the same for all modules and functions.
This also means that the sequence of passes to run is fixed at each
optimization level and cannot be dynamically changed or adapted. If a
FunctionPass is scheduled to run then it will always be run on all
functions in the code (i.e. there is no way to control which passes to run
on a per-function basis).
One solution to allow the definition of optimization levels on a
per-function basis is to implement a "common" pipeline of passes for all
optimization levels.
Rather than statically composing the sequence of passes to run, we
could instead teach pass managers how to dynamically select which passes
to run based on the knowledge of pass constraints.
A pass constraint could be used to specify at which optimization levels it
is safe to run the pass. Constraints on passes could be made available for
example through the global PassRegistry, in which case the pass managers
would then be able to query the registry to obtain the constraints.
In conclusion, we could teach PassManagers how to retrieve constraints on
passes and how to decide which passes to run, taking into account both:
- the information stored in Pass Constraints and
- the optimization level associated with individual functions (if
available).
3.1 How pass constraints can be used to select passes to run
------------------------------------------------------------
A pass with no constraints can always be run at any optimization level.
A Pass P is run by a PassManager if and only if its constraints match the
"effective" optimization level (defined below).
By default the effective optimization level for all passes is equal
to the global optimization level (i.e. the optimization level given on the
command line).
The effective optimization level for a Pass running on a function F
(or a basic block BB) is the optimization level that F (or the function
containing BB) specifies as an override. If F does not specify any
optimization level then the effective optimization level is equal to the
global optimization level.
It is the responsibility of the pass manager to check the effective
optimization level for all passes with a registered set of constraints.
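The definition of the effective optimization level can be sketched as
follows. FunctionInfo and BasicBlockInfo are hypothetical stand-ins for the
IR classes, used only to make the rule concrete; how the override is really
stored (for example as an IR attribute) is left open here.

```cpp
#include <cassert>

// Hypothetical stand-ins for IR entities; these are not LLVM classes.
struct FunctionInfo {
  bool hasOverride;  // true if the function carries its own level
  unsigned optLevel; // meaningful only when hasOverride is true
};

struct BasicBlockInfo {
  const FunctionInfo *parent; // function containing the block
};

// A function uses its own override if present, else the global level.
unsigned effectiveOptLevel(const FunctionInfo &F, unsigned globalLevel) {
  return F.hasOverride ? F.optLevel : globalLevel;
}

// A basic block inherits the effective level of its containing function.
unsigned effectiveOptLevel(const BasicBlockInfo &BB, unsigned globalLevel) {
  assert(BB.parent && "a basic block must belong to a function");
  return effectiveOptLevel(*BB.parent, globalLevel);
}
```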
Example:
--------
The following sequence of passes is given: A,B,C,D,E.
Pass constraints are:
1. A is only run at OptLevel == 0
2. B is only run at OptLevel > 0
3. D is only run at OptLevel > 1
Given the following scenario where:
- the global optimization level is set equal to 2 and
- there are three IR functions, namely Fun1, Fun2 and Fun3, where:
* Fun1 does not override the default optimization level;
* Fun2 overrides the optimization level to -O0;
* Fun3 overrides the optimization level to -O1.
The table below describes the relationship between functions and
passes that are expected to be run on them.
Boxes with an 'X' in them represent the pass being allowed to run on the
function.
       A   B   C   D   E
     +---+---+---+---+---+
Fun1 |   | X | X | X | X |
     +---+---+---+---+---+
Fun2 | X |   | X |   | X |
     +---+---+---+---+---+
Fun3 |   | X | X |   | X |
     +---+---+---+---+---+
In the case of Fun1, the effective optimization level is equal
to the global optimization level (i.e. 2). Therefore
the PassManager will skip pass A and run passes B,C,D,E on it.
In the case of Fun2, the effective optimization level is
set equal to 0 since Fun2 overrides it.
The Pass Manager will therefore run Passes A,C,E on it.
In the case of Fun3, the PassManager will run B,C,E.
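The example can be reproduced with a few lines of code. This is a sketch
under simplifying assumptions: PassDesc, passesRunAt and checkAgainstTable
are hypothetical names, constraints are modelled as plain predicates on the
effective optimization level, and a pass without a predicate is treated as
unconstrained.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical pass descriptor: a one-letter name plus an optional
// constraint on the effective optimization level.
struct PassDesc {
  char name;
  std::function<bool(unsigned)> allowedAt; // empty means unconstrained
};

// Returns the names of the passes that run at the given effective level.
std::string passesRunAt(unsigned effectiveLevel) {
  static const std::vector<PassDesc> pipeline = {
      {'A', [](unsigned L) { return L == 0; }}, // only at OptLevel == 0
      {'B', [](unsigned L) { return L > 0; }},  // only at OptLevel > 0
      {'C', nullptr},                           // no constraint
      {'D', [](unsigned L) { return L > 1; }},  // only at OptLevel > 1
      {'E', nullptr},                           // no constraint
  };
  std::string run;
  for (const PassDesc &P : pipeline)
    if (!P.allowedAt || P.allowedAt(effectiveLevel))
      run += P.name;
  return run;
}

// Checks the three rows of the example (global level 2).
void checkAgainstTable() {
  assert(passesRunAt(2) == "BCDE"); // Fun1: no override
  assert(passesRunAt(0) == "ACE");  // Fun2: overrides to -O0
  assert(passesRunAt(1) == "BCE");  // Fun3: overrides to -O1
}
```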
3.2 How to deal with size levels
--------------------------------
By default, clang sets the optimization level to 2 when either option
"-Os" or "-Oz" is specified. See for example how function
`getOptimizationLevel' is implemented in clang (in file
lib/Frontend/CompilerInvocation.cpp).
This is also true for the 'opt' tool, but not for bugpoint, which
only accepts options -O1, -O2, -O3 to control the optimization level.
In addition to "-Os" and "-Oz" clang also accepts option "-O".
By default "-O" has the effect of setting the optimization level to 2.
Internally, clang differentiates between optimization level and "size
level".
Option "-Os" has the effect of setting the SizeLevel to 1, while option
"-Oz" has the effect of setting the SizeLevel to 2.
Pass Constraints should allow the definition of constraints on both
the optimization level and the size level.
The effective optimization level described in 3.1 used by the pass
managers must take into account both the optimization and the size level.
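The mapping described in this section can be summarised in code. This is
only an illustration of the text above, not the actual implementation of
getOptimizationLevel; the names Levels and levelsForFlag are hypothetical.

```cpp
#include <cassert>
#include <string>

// Pair of levels that clang derives from a single -O option.
struct Levels {
  unsigned opt;  // optimization level
  unsigned size; // size level
};

// Maps an -O flag to (optimization level, size level) as described in
// this section. Flags not discussed in the text are not handled.
Levels levelsForFlag(const std::string &flag) {
  if (flag == "-O0") return {0, 0};
  if (flag == "-O1") return {1, 0};
  if (flag == "-O" || flag == "-O2") return {2, 0};
  if (flag == "-O3") return {3, 0};
  if (flag == "-Os") return {2, 1};
  if (flag == "-Oz") return {2, 2};
  assert(false && "flag not covered by this sketch");
  return {0, 0};
}
```

Both "-Os" and "-Oz" yield optimization level 2, so a constraint that only
looks at the optimization level cannot tell them apart from "-O2"; this is
why constraints need to cover the size level as well.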
3.3 How Pass Constraints could be implemented
---------------------------------------------
Constraints on the optimization level could be implemented as pairs of
values of the form (minOptLevel,maxOptLevel), where:
- minOptLevel is the minimum allowed optimization level;
- maxOptLevel is the maximum allowed optimization level.
Similarly, constraints on the size level could be implemented as pairs of
values of the form (minSizeLevel,maxSizeLevel).
Examples:
A Pass with optimization constraints (0,0) is a Pass that can only be run
at -O0, while a Pass with optimization constraints (1,MAXOPTLEVEL) is a
Pass that can only be run at optimization level >=1.
More than one set of constraints can be registered for each pass.
For example, a Pass with optimization constraints (2,2) and size
constraints (1,1) is a Pass that can only be run at -Os (since "-Os" sets
the optimization level to 2 and the size level to 1).
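A minimal sketch of such constraints is shown below. The type and function
names are hypothetical, and the value chosen for MAXOPTLEVEL is an
assumption. When several sets of constraints are registered for a pass, the
sketch assumes that the pass may run if any one of them matches; the text
above does not settle that point.

```cpp
#include <cassert>
#include <vector>

const unsigned MAXOPTLEVEL = 3;  // assumed upper bound for this sketch
const unsigned MAXSIZELEVEL = 2; // size level set by -Oz (section 3.2)

// Inclusive range of allowed levels, i.e. a (min,max) pair.
struct LevelRange {
  unsigned min;
  unsigned max;
  bool contains(unsigned level) const {
    assert(min <= max && "malformed range");
    return min <= level && level <= max;
  }
};

// One set of constraints: both ranges must match.
struct PassConstraint {
  LevelRange opt;
  LevelRange size;
  bool matches(unsigned optLevel, unsigned sizeLevel) const {
    return opt.contains(optLevel) && size.contains(sizeLevel);
  }
};

// A pass with no registered constraints may always run. Otherwise it may
// run if at least one registered set matches (assumption of this sketch).
bool mayRun(const std::vector<PassConstraint> &constraints,
            unsigned optLevel, unsigned sizeLevel) {
  if (constraints.empty())
    return true;
  for (const PassConstraint &C : constraints)
    if (C.matches(optLevel, sizeLevel))
      return true;
  return false;
}
```

With this representation, the "-Os only" pass from the example registers
the single set {(2,2), (1,1)}, and a pass restricted to -O0 registers
{(0,0), (0,MAXSIZELEVEL)}.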
3.4 About the inlining strategy
-------------------------------
Currently there are two strategies available in LLVM
for function inlining:
1) Inline Always (by default only used at -O0 and -O1);
2) Inline Simple (OptLevel >= 2).
The Inline Always strategy can be used in place of Inline Simple
if specifically requested by the user.
The constructor of SimpleInliner (see
"lib/Transforms/IPO/InlineSimple.cpp")
requires a Threshold value as an argument.
In general, the threshold would be set by the front-end (it could
be either clang or bugpoint or opt etc.) according to both the OptLevel
and the SizeLevel.
In order to support per-function optimizations, we should modify the
existing SimpleInliner to allow adapting the Threshold dynamically based
on changes in the effective optimization level.
As a future development, we might allow setting the inlining threshold
using the optimize pragma.
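A rough sketch of a threshold that follows the effective optimization
level is given below. The function names are hypothetical and the numeric
thresholds are illustrative placeholders only; this proposal does not fix
any values. The sketch also assumes that the levels passed in are the
effective levels of the function being processed by the inliner.

```cpp
#include <cassert>

// Picks an inlining threshold from the effective levels.
// The numbers are placeholders chosen for illustration.
unsigned inlineThresholdFor(unsigned optLevel, unsigned sizeLevel) {
  assert(optLevel >= 2 && "below -O2 only Inline Always is used");
  if (sizeLevel == 2) return 25;  // -Oz: most conservative
  if (sizeLevel == 1) return 75;  // -Os
  if (optLevel > 2) return 275;   // -O3: most aggressive
  return 225;                     // -O2
}

// Decides whether a call with the given cost should be inlined.
bool shouldInline(unsigned cost, bool alwaysInline, unsigned optLevel,
                  unsigned sizeLevel) {
  if (alwaysInline)
    return true;  // honoured by both strategies
  if (optLevel < 2)
    return false; // Inline Always strategy: nothing else is inlined
  return cost < inlineThresholdFor(optLevel, sizeLevel);
}
```

The point of the sketch is that the threshold is computed per query rather
than fixed once when the inliner is constructed.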
3.5 Backend changes
-------------------
Code generator passes would benefit from the same changes described in
Section 3. A MachineFunctionPass is also a FunctionPass, which means that
it should always be possible to specify optimization constraints for it.
Class TargetPassConfig (see "include/CodeGen/Passes.h") provides several
methods for populating the pass manager with common CodeGen passes.
It is the responsibility of each target to override the default behavior
for some of the methods exposed by the TargetPassConfig interface.
Unfortunately, changing how code generator passes are added to pass
managers potentially requires changes to target-specific parts of the
backend.
Examples:
file "Target/X86/X86TargetMachine.cpp";
file "Target/Sparc/SparcTargetMachine.cpp";
file "Target/PowerPC/PPCTargetMachine.cpp" etc.
In general, changes are required in every place in the backend where
decisions are made based on the optimization level.
More specifically, changes are required in the following components:
1. Instruction Selector:
-- Use the effective optimization level to decide whether FastISel
should be enabled or disabled;
2. Register Allocator:
-- Select the register allocation strategy based on the effective
optimization level;
3. CodeGen Passes whose behavior is affected by the global optimization
Level:
-- TwoAddressInstructionPass
(lib/CodeGen/TwoAddressInstructionPass.cpp)
-- PostRASchedulerList
(lib/CodeGen/PostRASchedulerList.cpp)
4. Proposed Implementation Workflow
===================================
The proposed work is:
1. Add support for modeling constraints on Passes:
- The idea is to support constraints on optimization levels.
In the future we could think of adding support for constraints on other
codegen options using the same framework;
2. Add support for registering constraints on passes into the
PassRegistry;
3. Teach Pass Managers how to identify passes which are safe to be run;
4. Adapt the existing SimpleInliner Algorithm (or add a new algorithm);
5. Teach both the optimizer and backend how to register constraints on
passes;
6. Define (or use the existing) IR attributes to decorate functions with
optimization levels;
7. Teach Clang how to parse the new "#pragma clang optimize" and also how
to emit IR attributes for controlling the optimization level on
functions.