[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 29 11:12:22 PDT 2019


It's nice to have metrics - so thanks for mentioning that. Even if it
doesn't end up being suitable, it's certainly worth looking at.

Did you do anything similar for the values of variables? I could imagine
"printing the value of a variable" (not necessarily being able to modify
it) at all those locations should render the same value (not undefined).

& to me, that's actually where I would've guessed -Og (which might be a
better discussion for a separate thread, to be honest - as much as it was
brought up in the subject of this thread) would diverge from -O1: doing
things like "leaking the value of any variable at the end of its scope" to
avoid dead store/unused value elimination ("oh, we saw the last use of this
variable halfway through the function, so we reused its register for
something else later on"). That's a case where the behavior can't really
(that I can think of) be justified unconditionally at -O1, because it
pessimizes the code in a way that /only/ gives improvements to a debugger -
though I'm happy to be wrong/hear other opinions on that.

So my model is more "-Og would be an even more pessimized -O1" (or
potentially -Og isn't really an optimization level, but an orthogonal
setting to optimization that does things like actively pessimize certain
features to make them more debuggable somewhat independently of what
optimizations are used - sort of like the sanitizers) but perhaps that's
inconsistent with what other folks have in mind.

- Dave

On Fri, Mar 29, 2019 at 6:41 AM via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

> Awesome start.
>
>
>
> Back when I did a similar project at HP/NonStop, the class of
> optimizations we turned off for our O1 (Og equivalent) tended to be those
> that reordered code or otherwise messed with the CFG.  In fact one of our
> metrics was:
>
> - The set of breakpoint locations available at Og should be the
> same as those available at O0.
>
> This is pretty easy to measure. It can mean either turning off
> optimizations or doing a better job with the line table; either way you get
> the preferred user experience. Not saying *Clang* has to use the "must be
> the same" criterion, but being able to measure this will be extremely
> helpful.  Comparing the metric with/without a given pass will give us a
> good idea of how much that pass damages the single-stepping experience, and
> gives us hard data to decide whether certain passes should stay or go.
>
>
>
> I don't remember whether HP/NonStop turned off constant/value propagation,
> but I *think* we did, because that can have a really bad effect on
> availability of variables.  Now, if we're more industrious about generating
> DIExpressions to recover values that get optimized away, that's probably
> good enough, as usually you want to be looking at things and not so much
> modifying things during a debugging session.
>
>
>
> As for Sony's users in particular, working in a real-time environment does
> constrain how much performance we can give away for other benefits like
> good debugging.  I think we'll have to see how that falls out.
>
>
>
> --paulr
>
>
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Greg
> Bedwell via llvm-dev
> *Sent:* Friday, March 29, 2019 8:25 AM
> *To:* Eric Christopher
> *Cc:* llvm-dev; Ahmed Bougacha; Petr Hosek
> *Subject:* Re: [llvm-dev] Proposal for O1/Og Optimization and Code
> Generation Pipeline
>
>
>
> Thanks for posting this.  I'm absolutely of the opinion that current -O1
> is almost a "worst of all worlds" optimization level, where the performance
> of the generated code isn't good enough to be particularly useful (for our
> users at least) but the debug experience is already getting close to being
> as bad as -O2/3, so I'm personally very happy with your direction of
> redefining -O1 (especially as that could then open up the way to future
> enhancements like using PGO data to let us compile everything at -O1 for
> the build time performance win, except for the critical hot functions that
> get the full -O2/3 pipeline for the run time performance win).
>
>
>
> How will this optimization level interact with LTO (specifically
> ThinLTO)?  Would -O1 -flto=thin run through a different, faster LTO
> pipeline, or are we expecting that an everyday development build
> configuration won't include LTO?
>
>
>
> I'm a little bit more on the fence with what this would mean for -Og, as
> I'd really like to try and come to some sort of community consensus on
> exactly what -Og should mean and what its aims should be.  If you happen to
> be at EuroLLVM this year then that would be absolutely perfect timing as
> I'd already submitted a round table topic to try and start just that
> process [ http://llvm.org/devmtg/2019-04/#rounds ].  My team's main focus
> right now is in trying to fix as many -O2 debug experience issues as
> possible, with the hope that we could consider using an -Og mode to mop up
> what's left, but we've been surveying our users for a few years now about
> what they'd find useful in such an optimization level.
>
>
>
> The general consensus is that performance must not be significantly worse
> than -O2.  We've heard a few numbers thrown around like 5-10% runtime
> slowdown compared to -O2 being the absolute maximum acceptable level of
> intrusion for them to consider using such a mode.  I'm not really sure how
> realistic that is and I'm inclined to think that we could probably stretch
> that limit a little bit here and there if the debugging experience really
> was that much better, but I think it gives a good indication of at least
> what our users are looking for.  Essentially -O2 but with as few changes as
> we can get away with making to make the debugging experience better.  I
> know that this is somewhat woolly, so it might be that your proposed
> pipeline is the closest we can get that matches such an aim, but once we've
> decided what -Og should mean, I'd like to try and justify any changes with
> some real data.  I'm willing for my team to contribute as much data as we
> can.  We've also been using dexter [
> http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience.pdf ]
> to target our -O2 debugging improvement work, but hopefully it will be
> useful to provide another datapoint for the effects on the debugging
> experience of disabling specific passes.
>
>
>
> In my mind, -Og probably would incorporate a few things:
>
> * Tweak certain pass behaviors in order to be more favorable towards
> debugging [ https://reviews.llvm.org/D59431#1437716 ]
>
> * Enable features favorable to debugging [
> http://llvm.org/devmtg/2017-10/#lightning8 ]
>
> * Disable whole passes that are known to fundamentally harm the debugging
> experience if there is no other alternative approach (this proposal?)
>
> * Still give a decent debug experience when used in conjunction with LTO.
>
>
>
> Thanks again for writing up your proposal.  I'm really happy to see
> movement in this area!
>
>
>
> -Greg
>
>
>
>
>
>
>
> On Fri, 29 Mar 2019 at 02:09, Eric Christopher via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi All,
>
> I’ve been thinking about both O1 and Og optimization levels and have a
> proposal for an improved O1 that I think overlaps in functionality
> with our desires for Og. The design goal is to rewrite the O1
> optimization and code generation pipeline to include the set of
> optimizations that minimizes build and test time while retaining our
> ability to debug.
>
> This isn’t to minimize efforts around optimized debugging or negate O0
> builds, but rather to provide a compromise mode that encompasses some
> of the benefits of both. In effect to create a “build mode for
> everyday development”.
>
> This proposal is a first-approximation guess at direction. I’ll be
> exploring different options and combinations, but I think this is a
> good place to start for discussion. Unless there are serious
> objections to the general direction I’d like to get started so we can
> explore and look at the code as it comes through review.
>
>
> Optimization and Code Generation Pipeline
>
> The optimization passes chosen fall into two main categories:
> redundancy elimination and basic optimization/abstraction elimination.
> The idea is that these are going to be the optimizations that a
> programmer would expect to happen without affecting debugging. This
> means not eliminating redundant calls or non-redundant loads as those
> could fail in different ways and locations while executing.  These
> optimizations will also reduce the overall amount of code going to the
> code generator helping both linker input size and code generation
> speed.
>
> Dead code elimination
>
>  - Dead code elimination (ADCE, BDCE)
>  - Dead store elimination
>  - Parts of CFG simplification: removing branches and dead code
> paths, but not commoning or speculation
>
> Basic Scalar Optimizations
>
>  - Constant propagation including SCCP and IPCP
>  - Constant merging
>  - Instruction Combining
>  - Inlining: always_inline and normal inlining passes
>  - Memory to register promotion
>  - CSE of “unobservable” operations
>  - Reassociation of expressions
>  - Global optimizations - try to fold globals to constants
>
> Loop Optimizations
>
> Loop optimizations have some problems around debuggability and
> observability, but a suggested set of passes would include
> optimizations that remove abstractions and not ones that necessarily
> optimize for performance.
>
>  - Induction Variable Simplification
>  - LICM but not promotion
>  - Trivial Unswitching
>  - Loop rotation
>  - Full loop unrolling
>  - Loop deletion
>
> Pass Structure
>
> Overall pass ordering will look similar to the existing pass layout in
> llvm with passes added or subtracted for O1 rather than a new pass
> ordering. The motivation here is to make the overall proposal easier
> to understand initially upstream while also maintaining existing pass
> pipeline synergies between passes.
>
> Instruction selection
>
> We will use the fast instruction selector (where it exists) for three
> reasons:
>  - Significantly faster code generation than llvm’s dag based
> instruction selection
>  - Better debuggability than SelectionDAG - fewer instructions moved around
>  - Fast instruction selection has been optimized somewhat and
> shouldn’t be an outrageous penalty on most architectures
>
> Register allocation
>
> The fast register allocator should be used for compilation speed.
>
> Thoughts?
>
> Thanks!
>
> -eric
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>