[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline

Greg Bedwell via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 29 05:25:12 PDT 2019


Thanks for posting this.  I'm absolutely of the opinion that the current
-O1 is almost a "worst of all worlds" optimization level: the performance
of the generated code isn't good enough to be particularly useful (for our
users at least), but the debug experience is already getting close to being
as bad as -O2/-O3.  So I'm personally very happy with your direction of
redefining -O1, especially as that could then open up the way to future
enhancements, like using PGO data to let us compile everything at -O1 for
the build time performance win, except for the critical hot functions that
get the full -O2/-O3 pipeline for the run time performance win.

How will this optimization level interact with LTO (specifically ThinLTO)?
Would -O1 -flto=thin run through a different, faster LTO pipeline, or are
we expecting that everyday development build configurations won't
include LTO?

I'm a little bit more on the fence about what this would mean for -Og, as
I'd really like to try to come to some sort of community consensus on
exactly what -Og should mean and what its aims should be.  If you happen to
be at EuroLLVM this year then that would be absolutely perfect timing, as
I've already submitted a round table topic to try to start just that
process [ http://llvm.org/devmtg/2019-04/#rounds ].  My team's main focus
right now is on trying to fix as many -O2 debug experience issues as
possible, with the hope that we could then use an -Og mode to mop up
what's left, and we've been surveying our users for a few years now about
what they'd find useful in such an optimization level.

The general consensus is that performance must not be significantly worse
than -O2.  We've heard a few numbers thrown around like 5-10% runtime
slowdown compared to -O2 being the absolute maximum acceptable level of
intrusion for them to consider using such a mode.  I'm not really sure how
realistic that is and I'm inclined to think that we could probably stretch
that limit a little bit here and there if the debugging experience really
was that much better, but I think it gives a good indication of at least
what our users are looking for.  Essentially -O2, but with as few changes
as possible in order to make the debugging experience better.  I know that
this is somewhat woolly, so it may be that your proposed pipeline is the
closest we can get to such an aim, but once we've decided what -Og should
mean, I'd like to justify any changes with some real data.  I'm willing
for my team to contribute as much data as we can.  We've been using dexter
[
http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience.pdf
]
to target our -O2 debugging improvement work, and hopefully it will also
be useful for providing another data point on how disabling specific
passes affects the debugging experience.

In my mind, -Og probably would incorporate a few things:
* Tweak certain pass behaviors in order to be more favorable towards
debugging [ https://reviews.llvm.org/D59431#1437716 ]
* Enable features favorable to debugging [
http://llvm.org/devmtg/2017-10/#lightning8 ]
* Disable whole passes that are known to fundamentally harm the debugging
experience if there is no other alternative approach (this proposal?)
* Still give a decent debug experience when used in conjunction with LTO.

Thanks again for writing up your proposal.  I'm really happy to see
movement in this area!

-Greg



On Fri, 29 Mar 2019 at 02:09, Eric Christopher via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi All,
>
> I’ve been thinking about both O1 and Og optimization levels and have a
> proposal for an improved O1 that I think overlaps in functionality
> with our desires for Og. The design goal is to rewrite the O1
> optimization and code generation pipeline to include the set of
> optimizations that minimizes build and test time while retaining our
> ability to debug.
>
> This isn’t to minimize efforts around optimized debugging or negate O0
> builds, but rather to provide a compromise mode that encompasses some
> of the benefits of both. In effect to create a “build mode for
> everyday development”.
>
> This proposal is a first approximation guess on direction. I’ll be
> exploring different options and combinations, but I think this is a
> good place to start for discussion. Unless there are serious
> objections to the general direction I’d like to get started so we can
> explore and look at the code as it comes through review.
>
>
> Optimization and Code Generation Pipeline
>
> The optimization passes chosen fall into a few main categories:
> redundancy elimination and basic optimization/abstraction elimination.
> The idea is that these are going to be the optimizations that a
> programmer would expect to happen without affecting debugging. This
> means not eliminating redundant calls or non-redundant loads as those
> could fail in different ways and locations while executing.  These
> optimizations will also reduce the overall amount of code going to the
> code generator helping both linker input size and code generation
> speed.
>
> Dead code elimination
>
>  - Dead code elimination (ADCE, BDCE)
>  - Dead store elimination
>  - Parts of CFG Simplification: removing branches and dead code
> paths, but not commoning or speculation
>
> Basic Scalar Optimizations
>
>  - Constant propagation including SCCP and IPCP
>  - Constant merging
>  - Instruction Combining
>  - Inlining: always_inline and normal inlining passes
>  - Memory to register promotion
>  - CSE of “unobservable” operations
>  - Reassociation of expressions
>  - Global optimizations - try to fold globals to constants
>
> Loop Optimizations
>
> Loop optimizations have some problems around debuggability and
> observability, but a suggested set of passes would include
> optimizations that remove abstractions rather than ones that purely
> optimize for performance.
>
>  - Induction Variable Simplification
>  - LICM but not promotion
>  - Trivial Unswitching
>  - Loop rotation
>  - Full loop unrolling
>  - Loop deletion
>
> Pass Structure
>
> Overall pass ordering will look similar to the existing pass layout in
> LLVM, with passes added or subtracted for O1 rather than a new pass
> ordering. The motivation here is to make the overall proposal easier
> for upstream to understand initially, while also maintaining the
> existing synergies between passes in the pipeline.
>
> Instruction selection
>
> We will use the fast instruction selector (where it exists) for three
> reasons:
>  - Significantly faster code generation than LLVM's DAG-based
> instruction selection
>  - Better debuggability than SelectionDAG - fewer instructions are
> moved around
>  - Fast instruction selection has been optimized somewhat and
> shouldn’t be an outrageous penalty on most architectures
>
> Register allocation
>
> The fast register allocator should be used for compilation speed.
>
> Thoughts?
>
> Thanks!
>
> -eric
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>