[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline

Eric Christopher via llvm-dev llvm-dev at lists.llvm.org
Mon Apr 1 19:56:36 PDT 2019

On Fri, Mar 29, 2019 at 12:41 AM Mehdi AMINI <joker.eph at gmail.com> wrote:
> Hi Eric,
> On Thu, Mar 28, 2019 at 7:09 PM Eric Christopher via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Hi All,
>> I’ve been thinking about both O1 and Og optimization levels and have a
>> proposal for an improved O1 that I think overlaps in functionality
>> with our desires for Og. The design goal is to rewrite the O1
>> optimization and code generation pipeline to include the set of
>> optimizations that minimizes build and test time while retaining our
>> ability to debug.
> That would be nice: how do you distinguish O1 and Og with this view? (which from your list would / wouldn't be included in Og?)

As a first pass I'd just make them identical, but there's a lot of
work around statistics collection (I'll refer you to Greg and Paul's
emails as well) before saying "this is 'done'".

>> This isn’t to minimize efforts around optimized debugging or negate O0
>> builds, but rather to provide a compromise mode that encompasses some
>> of the benefits of both. In effect to create a “build mode for
>> everyday development”.
>> This proposal is a first-approximation guess at a direction. I’ll be
>> exploring different options and combinations, but I think this is a
>> good place to start for discussion. Unless there are serious
>> objections to the general direction I’d like to get started so we can
>> explore and look at the code as it comes through review.
>> Optimization and Code Generation Pipeline
>> The optimization passes chosen fall into two main categories:
>> redundancy elimination and basic optimization/abstraction elimination.
>> The idea is that these are going to be the optimizations that a
>> programmer would expect to happen without affecting debugging. This
>> means not eliminating redundant calls or non-redundant loads as those
>> could fail in different ways and locations while executing.  These
>> optimizations will also reduce the overall amount of code going to the
>> code generator, helping both linker input size and code generation
>> speed.
>> Dead code elimination
>>  - Dead code elimination (ADCE, BDCE)
>>  - Dead store elimination
>>  - Parts of CFG simplification
>>  - Removing branches and dead code paths, but not commoning
>> and speculation
>> Basic Scalar Optimizations
>>  - Constant propagation including SCCP and IPCP
>>  - Constant merging
>>  - Instruction Combining
>>  - Inlining: always_inline and normal inlining passes
>>  - Memory to register promotion
>>  - CSE of “unobservable” operations
>>  - Reassociation of expressions
>>  - Global optimizations - try to fold globals to constants
>> Loop Optimizations
>> Loop optimizations have some problems around debuggability and
>> observability, but a suggested set of passes would include
>> optimizations that remove abstractions rather than ones that
>> primarily optimize for performance.
>>  - Induction Variable Simplification
>>  - LICM but not promotion
>>  - Trivial Unswitching
>>  - Loop rotation
>>  - Full loop unrolling
>>  - Loop deletion
> That is already a pretty good list. I would find it interesting to know the opposite list as well: the passes that we should not include, for speed and debuggability? Vectorizer? Unrolling? Jump Threading?

Jump threading for sure :)

Vectorization (and the passes to get us there), as well as unrolling,
are likely to hurt debugging by changing induction variables and
overall stepping behavior. It's -sometimes- possible to recover, but it's
pretty difficult. Things like GVN and aggressive CSE/commoning can
really hurt in that you don't know how you got into a section of code
and from where.

> Also, couldn't constant propagation and reassociation, which are in your list, hurt debuggability?

The former can affect which variables exist, and there's a little bit
of conversation in Paul's email around how you measure or deal with
it. Reassociation can affect how things move around a bit, but IMO
it's not too terrible. Measuring will help of course :)



> Thanks!
> --
> Mehdi
>> Pass Structure
>> Overall pass ordering will look similar to the existing pass layout in
>> LLVM, with passes added or subtracted for O1 rather than a new pass
>> ordering. The motivation here is to make the overall proposal easier
>> to understand initially upstream while also maintaining existing pass
>> pipeline synergies between passes.
>> Instruction selection
>> We will use the fast instruction selector (where it exists) for three reasons:
>>  - Significantly faster code generation than LLVM’s SelectionDAG-based
>> instruction selection
>>  - Better debuggability than SelectionDAG - fewer instructions moved around
>>  - Fast instruction selection has been optimized somewhat and
>> shouldn’t be an outrageous penalty on most architectures
>> Register allocation
>> The fast register allocator should be used for compilation speed.
>> Thoughts?
>> Thanks!
>> -eric
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
