[LLVMdev] [PROPOSAL] per-function optimization level control

Wed Apr 24 06:00:28 PDT 2013

Hello,

We've had a high-priority feature request from a number of our customers
to provide per-function optimization in our Clang/LLVM compiler.
I would be interested in working with the community to implement this.
The idea is to allow the optimization level to be overridden
for specific functions.

The rest of this proposal is organized as follows:
 - Section 1. describes this new feature and explains why and when
   per-function optimization options are useful;
 - Sections 2. and 3. describe how the optimizer could be adapted/changed 
   to allow the definition of per-function optimizations;
 - Section 4. tries to outline a possible workflow for implementing this
   new feature.

Any feedback or suggestions are welcome.

Thanks!
Andrea Di Biagio
SN Systems Ltd.
http://www.snsys.com

1. Description
==============
The idea is to add pragmas to control the optimization level on functions.

GCC has implemented a similar approach: since GCC 4.4, function-specific
option pragmas allow users to set the optimization level on a
per-function basis.

http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html
describes the pragmas as
  #pragma GCC optimize ("string")
  #pragma GCC push_options
  #pragma GCC pop_options
  #pragma GCC reset_options

Instead of imitating GCC's syntax, I think it would be better to use a
syntax consistent with the existing "#pragma clang diagnostic" directives:

  #pragma clang optimize push
  #pragma clang optimize "string"
  #pragma clang optimize pop

Each directive would have its own stack, which in my opinion keeps
everything more modular and simpler to implement.

#pragma clang optimize push
#pragma clang optimize pop
  An "optimize push" temporarily saves the current set of optimization
  options, while an "optimize pop" restores the previous set of
  optimization options.

#pragma clang optimize "string"
  This pragma overrides the optimization level for functions defined
  later in the source code. The "string" argument begins with 'O' and is
  interpreted as an optimization level (examples: "O0" for optimization
  level 0; "O1" for optimization level 1).
  In the future we may also extend the set of accepted strings to allow
  other codegen options to be overridden for specific functions.

Example:

////
#pragma clang optimize push
#pragma clang optimize "O0"
void f1() { ... }
#pragma clang optimize push
#pragma clang optimize "O2"
void f2() { ... }
void f3() { ... }
#pragma clang optimize pop
void f4() { ... }
#pragma clang optimize pop
////

Optimization level for f1 and f4 is -O0.
Optimization level for f2 and f3 is -O2.
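
To make the intended push/pop semantics concrete, here is a minimal C++
sketch of the state a front end might keep while it processes these
pragmas. The class OptimizePragmaState and its methods are hypothetical
names invented for this illustration; they are not existing Clang code,
and the sketch stores the level as a plain string.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the state kept for "#pragma clang optimize".
// None of these names exist in Clang; this only illustrates the semantics.
class OptimizePragmaState {
public:
  explicit OptimizePragmaState(std::string CommandLineLevel)
      : Current(std::move(CommandLineLevel)) {}

  // #pragma clang optimize push: save the current level.
  void push() { Saved.push_back(Current); }

  // #pragma clang optimize "string": override the level for the
  // functions defined after this point.
  void set(const std::string &Level) { Current = Level; }

  // #pragma clang optimize pop: restore the most recently saved level.
  // Returns false on an unmatched pop, which a front end would diagnose.
  bool pop() {
    if (Saved.empty())
      return false;
    Current = Saved.back();
    Saved.pop_back();
    return true;
  }

  // Level that applies to a function defined at this point.
  const std::string &current() const { return Current; }

private:
  std::string Current;
  std::vector<std::string> Saved;
};
```

Replaying the example above on this model gives "O0" at f1, "O2" at f2
and f3, and "O0" again at f4 after the first pop; the final pop restores
the command-line level.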

1.1 Why it is useful to define per-function optimization levels
===============================================================
The main motivation of our customers is to be able to selectively disable
optimizations when debugging one function in a compilation unit, in the
case where compiling the whole unit at -O0 would make the program run too
slowly.

Being able to set the optimization level on a per-function basis can also
help in those cases where we know that there is a problem in an
optimization but, for some reason, either
 a) we don't know which optimization is performing the wrong
    transformation, or
 b) we know the problematic Pass, but there is no easy way to work
    around the problem and fixing it would take too much time, or
 c) there is an unknown error in the code being compiled that only causes
    problems when optimized (example: the code breaks strict aliasing).

If we know that the bug only affects a few functions in the code, we could
disable optimizations for those functions only. This would allow us
to provide quick workarounds to customers encountering optimization bugs.

2. CHANGES REQUIRED IN clang
============================
Clang must be able to parse the new "pragma clang optimize".
The idea is that optimization levels would be encoded as IR attributes on
functions.

A discussion on how to encode the optimization levels in LLVM was
originally started by Chandler here:
lists.cs.uiuc.edu/pipermail/llvmdev/2013-January/058112.html

3. CHANGES REQUIRED IN LLVM
===========================
The global optimization level strongly affects how Passes are added 
to PassManagers.

Example:
When the global optimization level is -O0,
method PassManagerBuilder::populateModulePassManager
[in lib/Transforms/IPO/PassManagerBuilder.cpp] populates the per-module
pass manager with the following passes:
 - AlwaysInliner (if inlining is not disabled)
 - extra Passes which may have been registered as extensions 
   "to be enabled at optimization level 0".

With an optimization level greater than zero, however, several analysis
and transform passes are potentially added to the "per-module" pass
manager.

The major problem with this approach is that both the optimizer
and the backend work under the assumption that the set of codegen options
is the same for all modules and functions.
This also means that the sequence of passes to run is fixed at each
optimization level and cannot be dynamically changed or adapted. If a
FunctionPass is scheduled to run, it will always be run on all functions
in the code (i.e. there is no way to control which passes to run on a
per-function basis).

One solution to allow the definition of optimization levels on a
per-function basis is to implement a "common" pipeline of passes for all
optimization levels.

Rather than statically composing the sequence of passes to run, we
could instead teach pass managers how to dynamically select which passes
to run based on the knowledge of pass constraints.

A pass constraint could be used to specify at which optimization levels
it is safe to run the pass. Constraints on passes could be made available,
for example, through the global PassRegistry, in which case the pass
managers would then be able to query the registry to obtain the
constraints.

In conclusion, we could teach PassManagers how to retrieve constraints on
passes and how to select which passes to run, taking into account both:
 - the information stored in Pass Constraints and
 - the optimization level associated with individual functions (if
   available).

3.1 How pass constraints can be used to select passes to run
------------------------------------------------------------
A pass with no constraints can always be run at any optimization level.

A Pass P is run by a PassManager if and only if its constraints match the 
"effective" optimization level (see below the definition of effective
optimization level).

By default the effective optimization level for all passes is equal
to the global optimization level (i.e. the command line based 
optimization level).

The effective optimization level for a Pass running on a function F
(or a basic block BB) is the optimization level that F (or the function
containing BB) specifies as an override. If F does not specify any
optimization level, then the effective optimization level is equal to the
global optimization level.

It is the responsibility of the pass manager to check the effective
optimization level for all passes with a registered set of constraints.
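
The definition above can be sketched in a few lines of C++. FunctionInfo
is a hypothetical stand-in for an IR function (in the real design the
override would come from a function attribute), and effectiveOptLevel is
an illustrative name, not an existing LLVM API.

```cpp
// Hypothetical stand-in for an IR function. In the real design the
// override would be read from a function attribute.
struct FunctionInfo {
  bool HasOverride;  // true if the function overrides the level
  unsigned OptLevel; // meaningful only when HasOverride is true
};

// Effective level: the function's override if it has one, otherwise the
// global (command-line) optimization level.
unsigned effectiveOptLevel(const FunctionInfo &F, unsigned GlobalOptLevel) {
  return F.HasOverride ? F.OptLevel : GlobalOptLevel;
}
```

A pass running on a basic block would apply the same rule to the function
that contains the block.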

Example:
--------

The following sequence of passes is given: A,B,C,D,E.
Pass constraints are:
  1. A is only run at OptLevel == 0
  2. B is only run at OptLevel > 0
  3. D is only run at OptLevel > 1

Consider the following scenario, where:
 - the global optimization level is set equal to 2 and
 - there are three IR functions, namely Fun1, Fun2 and Fun3, where:
   * Fun1 does not override the default optimization level;
   * Fun2 overrides the optimization level to -O0;
   * Fun3 overrides the optimization level to -O1.

The table below describes the relationship between functions and 
passes that are expected to be run on them.
Boxes with an 'X' in them represent the pass being allowed to run on the
function.

        \  A   B   C   D   E
         +---+---+---+---+---+
 Fun1    |   | X | X | X | X |
         +---+---+---+---+---+
 Fun2    | X |   | X |   | X |
         +---+---+---+---+---+
 Fun3    |   | X | X |   | X |
         +---+---+---+---+---+

In the case of Fun1, the effective optimization level is equal
to the global optimization level (i.e. 2). Therefore
the PassManager will skip pass A and run passes B,C,D,E on it.

In the case of Fun2, the effective optimization level is
set equal to 0 since Fun2 overrides it.
The Pass Manager will therefore run Passes A,C,E on it.

In the case of Fun3, the PassManager will run B,C,E.
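
The selection in this example can be reproduced with a short C++ sketch.
PassDesc, the predicate functions and passesToRun are hypothetical names
used only for illustration; a real pass manager would obtain the
constraints from wherever they end up being registered.

```cpp
#include <string>

// Hypothetical pass description: a one-letter name plus a predicate that
// says whether the pass may run at a given effective optimization level.
struct PassDesc {
  char Name;
  bool (*AllowedAt)(unsigned OptLevel);
};

static bool always(unsigned) { return true; }          // no constraint
static bool onlyAtO0(unsigned L) { return L == 0; }    // constraint 1
static bool aboveO0(unsigned L) { return L > 0; }      // constraint 2
static bool aboveO1(unsigned L) { return L > 1; }      // constraint 3

// Returns the names of the passes from the sequence A,B,C,D,E that would
// run on a function whose effective optimization level is EffectiveLevel.
std::string passesToRun(unsigned EffectiveLevel) {
  static const PassDesc Pipeline[] = {{'A', onlyAtO0},
                                      {'B', aboveO0},
                                      {'C', always},
                                      {'D', aboveO1},
                                      {'E', always}};
  std::string Result;
  for (const PassDesc &P : Pipeline)
    if (P.AllowedAt(EffectiveLevel))
      Result += P.Name;
  return Result;
}
```

Calling passesToRun with the effective levels 2, 0 and 1 (Fun1, Fun2 and
Fun3) yields the three rows of the table.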

3.2 How to deal with size levels
--------------------------------
By default, clang sets the optimization level to 2 when either option
"-Os" or "-Oz" is specified. See for example how function
`getOptimizationLevel' is implemented in clang (in file
lib/Frontend/CompilerInvocation.cpp).
This is also true for the 'opt' tool but not for bugpoint, which
only accepts options -O1, -O2, -O3 to control the optimization level.

In addition to "-Os" and "-Oz" clang also accepts option "-O".
By default "-O" has the effect of setting the optimization level to 2.

Internally, clang differentiates between optimization level and "size
level". Option "-Os" has the effect of setting the SizeLevel to 1, while
option "-Oz" has the effect of setting the SizeLevel to 2.

Pass Constraints should allow the definition of constraints on both 
the optimization level and the size level.

The effective optimization level described in 3.1 and used by the pass
managers must take into account both the optimization level and the size
level.
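
As an illustration of the mapping described in this section, the sketch
below turns an option spelling into an (OptLevel, SizeLevel) pair. The
names Levels and levelsForOption are hypothetical, this is not clang's
getOptimizationLevel, and only the spellings discussed in this proposal
are handled.

```cpp
#include <string>

// Optimization level and size level, kept as two separate values.
struct Levels {
  unsigned OptLevel;
  unsigned SizeLevel;
};

// Hypothetical helper, not clang's actual code. Handles only -O, -Os, -Oz
// and -O0 .. -O3; returns false for any other string.
bool levelsForOption(const std::string &Option, Levels &Out) {
  if (Option == "-Os") { Out = Levels{2, 1}; return true; }
  if (Option == "-Oz") { Out = Levels{2, 2}; return true; }
  if (Option == "-O")  { Out = Levels{2, 0}; return true; }
  if (Option.size() == 3 && Option[0] == '-' && Option[1] == 'O' &&
      Option[2] >= '0' && Option[2] <= '3') {
    Out = Levels{static_cast<unsigned>(Option[2] - '0'), 0};
    return true;
  }
  return false;
}
```

Keeping the two values separate is what lets a constraint distinguish
"-Os" from "-O2", even though both map to optimization level 2.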

3.3 How Pass Constraints could be implemented
---------------------------------------------
Constraints on the optimization level could be implemented as pairs of
values of the form (minOptLevel,maxOptLevel), where:
 - minOptLevel is the minimum allowed optimization level;
 - maxOptLevel is the maximum allowed optimization level.

Similarly, constraints on the size level could be implemented as pairs of
values of the form (minSizeLevel,maxSizeLevel).

Examples:
A Pass with optimization constraints (0,0) is a Pass that can only be run
at -O0, while a Pass with optimization constraints (1,MAXOPTLEVEL) is a
Pass that can only be run at optimization level >=1.

More than one set of constraints can be registered for each pass.
For example, a Pass with optimization constraints (2,2) and size
constraints (1,1) is a Pass that can only be run at -Os (since "-Os" sets
the optimization level to 2 and the size level to 1).
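
One possible shape for such constraints is sketched below in C++. The
types LevelRange and PassConstraint are hypothetical, the ranges are
assumed to be inclusive, and the value 3 for MaxOptLevel is an assumption
of this sketch. A pass is assumed to be allowed only when both its
optimization-level range and its size-level range match.

```cpp
// Hypothetical inclusive range of levels, i.e. a (min,max) pair.
struct LevelRange {
  unsigned Min;
  unsigned Max;
  bool contains(unsigned Level) const { return Min <= Level && Level <= Max; }
};

const unsigned MaxOptLevel = 3;  // assumed highest optimization level
const unsigned MaxSizeLevel = 2; // size level set by -Oz

// Hypothetical constraint: one range for the optimization level and one
// for the size level. Both must match for the pass to be allowed.
struct PassConstraint {
  LevelRange Opt;
  LevelRange Size;
  bool allows(unsigned OptLevel, unsigned SizeLevel) const {
    return Opt.contains(OptLevel) && Size.contains(SizeLevel);
  }
};
```

With this representation, the "-Os only" example is written as
PassConstraint{{2, 2}, {1, 1}}, and a pass restricted to -O0 as
PassConstraint{{0, 0}, {0, MaxSizeLevel}}.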

3.4 About the inlining strategy
-------------------------------
Currently there are two strategies available in LLVM
for function inlining:
  1) Inline Always (by default only used at -O0 and -O1);
  2) Inline Simple (OptLevel >= 2).

The Inline Always strategy can be used in place of the Inline Simple
if specifically requested by the user.

The constructor of SimpleInliner (see
"lib/Transforms/IPO/InlineSimple.cpp") takes a Threshold value as an
argument.

In general, the threshold would be set by the front-end (it could
be clang, bugpoint, opt, etc.) according to both the OptLevel and
the SizeLevel.

In order to support per-function optimizations, we should modify the
existing SimpleInliner to allow adapting the Threshold dynamically based
on changes in the effective optimization level.
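
A minimal sketch of what "adapting the Threshold dynamically" could look
like: instead of receiving a single threshold at construction time, the
inliner would compute it from the effective levels of the function being
processed. The function name inlineThresholdFor is hypothetical, and the
numeric values are placeholders chosen for illustration; they should not
be read as LLVM's actual thresholds.

```cpp
// Hypothetical helper: pick an inlining threshold from the effective
// optimization and size levels. The numbers are illustrative placeholders.
unsigned inlineThresholdFor(unsigned OptLevel, unsigned SizeLevel) {
  if (SizeLevel == 2) // -Oz: most conservative
    return 25;
  if (SizeLevel == 1) // -Os
    return 75;
  if (OptLevel > 2)   // most aggressive level
    return 275;
  return 225;         // default for other optimized levels
}
```

The size level is checked first so that a size-optimized function gets a
smaller threshold regardless of its optimization level.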

As a future development, we might allow setting the inlining threshold
using the optimize pragma.

3.5 Backend changes
-------------------
Code generator passes would benefit from the same changes described in
Section 3. A MachineFunctionPass is also a FunctionPass, which means that
it should always be possible to specify optimization constraints for it.

Class TargetPassConfig (see "include/llvm/CodeGen/Passes.h") provides
several methods for populating the pass manager with common CodeGen
passes. It is the responsibility of each target to override the default
behavior for some of the methods exposed by the TargetPassConfig
interface.

Unfortunately, changing how code generator passes are added to pass
managers potentially requires changes to target-specific parts of the
backend.

Examples:
  file "Target/X86/X86TargetMachine.cpp";
  file "Target/Sparc/SparcTargetMachine.cpp";
  file "Target/PowerPC/PPCTargetMachine.cpp" etc.

In general, changes are required in every place in the backend where
decisions are made based on the optimization level.
More specifically, changes are required in the following components:
  1. Instruction Selector:
    -- Use the effective optimization level to decide whether FastISel
       should be enabled or disabled;
  2. Register Allocator:
    -- Select the register allocation strategy based on the effective
       optimization level;
  3. CodeGen Passes whose behavior is affected by the global optimization
     level:
      -- TwoAddressInstructionPass
         (lib/CodeGen/TwoAddressInstructionPass.cpp)
      -- PostRASchedulerList
         (lib/CodeGen/PostRASchedulerList.cpp)
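
For items 1 and 2, the per-function decision could be sketched as follows.
The names RegAllocKind, CodeGenChoices and codeGenChoicesFor are
hypothetical, and the rule "level 0 selects FastISel and the fast
allocator, any other level selects the optimizing choices" is an
assumption of this sketch rather than a statement about what every target
does.

```cpp
// Hypothetical set of register allocation strategies.
enum class RegAllocKind { Fast, Greedy };

// Hypothetical bundle of per-function code generation decisions.
struct CodeGenChoices {
  bool UseFastISel;
  RegAllocKind RegAlloc;
};

// Assumed rule: an effective level of 0 selects the fast, debug-friendly
// choices; any higher level selects the optimizing ones.
CodeGenChoices codeGenChoicesFor(unsigned EffectiveOptLevel) {
  CodeGenChoices C;
  C.UseFastISel = (EffectiveOptLevel == 0);
  C.RegAlloc = (EffectiveOptLevel == 0) ? RegAllocKind::Fast
                                        : RegAllocKind::Greedy;
  return C;
}
```

The point is only that the decision is made per function from its
effective level, instead of once from the global level.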

4. Proposed Implementation Workflow
===================================
The proposed work is:
 1. Add support for modeling constraints on Passes:
  - The idea is to support constraints on optimization levels.
    In future we could think of adding support for constraints on other
    codegen options using the same framework;
 2. Add support for registering constraints on passes into the
    PassRegistry;
 3. Teach Pass Managers how to identify passes which are safe to be run;
 4. Adapt the existing SimpleInliner Algorithm (or add a new algorithm);
 5. Teach both the optimizer and backend how to register constraints on
    passes;
 6. Define (or use the existing) IR attributes to decorate functions with
    optimization levels;
 7. Teach Clang how to parse the new "#pragma clang optimize" and also how
    to emit IR attributes for controlling the optimization level on
    functions.
