[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline

Mon Apr 1 18:47:22 PDT 2019

Hi All,

Looks like people are generally supportive of the changes to O1 even
if there's some discussion around O1/Og equivalence that I'll take to
a separate set of replies. Mostly so that I can get O1 changes
separate from "new Og pipeline" changes even if I think that's the
direction we should ultimately go :)

In general though I wanted to give a heads up as to the direction I
was planning on taking things and seeing if anyone necessarily
disagreed with the general idea. I'll plan on working up a set of pass
pipeline patches (after some cleanup) and send out some reviews -
please let me know (privately) if you'd like to be explicitly cc'd on
any review otherwise I'll probably pick a set of people out of the cc
line here.

Thanks a lot for the responses and looking forward to getting some
changes going in this direction!

-eric

On Thu, Mar 28, 2019 at 7:09 PM Eric Christopher <echristo at gmail.com> wrote:
>
> Hi All,
>
> I’ve been thinking about both O1 and Og optimization levels and have a
> proposal for an improved O1 that I think overlaps in functionality
> with our desires for Og. The design goal is to rewrite the O1
> optimization and code generation pipeline to include the set of
> optimizations that minimizes build and test time while retaining our
> ability to debug.
>
> This isn’t to minimize efforts around optimized debugging or negate O0
> builds, but rather to provide a compromise mode that encompasses some
> of the benefits of both. In effect to create a “build mode for
> everyday development”.
>
> This proposal is a first approximation guess on direction. I’ll be
> exploring different options and combinations, but I think this is a
> good place to start for discussion. Unless there are serious
> objections to the general direction I’d like to get started so we can
> explore and look at the code as it comes through review.
>
>
> Optimization and Code Generation Pipeline
>
> The optimization passes chosen fall into a few main categories,
> redundancy elimination and basic optimization/abstraction elimination.
> The idea is that these are going to be the optimizations that a
> programmer would expect to happen without affecting debugging. This
> means not eliminating redundant calls or non-redundant loads as those
> could fail in different ways and locations while executing.  These
> optimizations will also reduce the overall amount of code going to the
> code generator helping both linker input size and code generation
> speed.
>
> Dead code elimination
>
>  - Dead code elimination (ADCE, BDCE)
>  - Dead store elimination
>  - Parts of CFG Simplification
>  - Removing branches and dead code paths and not including commoning
> and speculation
>
> Basic Scalar Optimizations
>
>  - Constant propagation including SCCP and IPCP
>  - Constant merging
>  - Instruction Combining
>  - Inlining: always_inline and normal inlining passes
>  - Memory to register promotion
>  - CSE of “unobservable” operations
>  - Reassociation of expressions
>  - Global optimizations - try to fold globals to constants
>
> Loop Optimizations
>
> Loop optimizations have some problems around debuggability and
> observability, but a suggested set of passes would include
> optimizations that remove abstractions and not ones that necessarily
> optimize for performance.
>
>  - Induction Variable Simplification
>  - LICM but not promotion
>  - Trivial Unswitching
>  - Loop rotation
>  - Full loop unrolling
>  - Loop deletion
>
> Pass Structure
>
> Overall pass ordering will look similar to the existing pass layout in
> llvm with passes added or subtracted for O1 rather than a new pass
> ordering. The motivation here is to make the overall proposal easier
> to understand initially upstream while also maintaining existing pass
> pipeline synergies between passes.
>
> Instruction selection
>
> We will use the fast instruction selector (where it exists) for three reasons:
>  - Significantly faster code generation than llvm’s dag based
> instruction selection
>  - Better debugability than selection dag - fewer instructions moved around
>  - Fast instruction selection has been optimized somewhat and
> shouldn’t be an outrageous penalty on most architectures
>
> Register allocation
>
> The fast register allocator should be used for compilation speed.
>
> Thoughts?
>
> Thanks!
>
> -eric