<div dir="ltr"><div>Hi Eric,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 28, 2019 at 7:09 PM Eric Christopher via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Hi All,<br>

<br>

I’ve been thinking about both O1 and Og optimization levels and have a<br>

proposal for an improved O1 that I think overlaps in functionality<br>

with our desires for Og. The design goal is to rewrite the O1<br>

optimization and code generation pipeline to include the set of<br>

optimizations that minimizes build and test time while retaining our<br>

ability to debug.<br></blockquote><div><br></div><div>That would be nice. How do you distinguish O1 and Og under this view? (Which passes from your list would or wouldn't be included in Og?)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

This isn’t to minimize efforts around optimized debugging or negate O0<br>

builds, but rather to provide a compromise mode that encompasses some<br>

of the benefits of both. In effect to create a “build mode for<br>

everyday development”.<br>

<br>

This proposal is a first approximation guess on direction. I’ll be<br>

exploring different options and combinations, but I think this is a<br>

good place to start for discussion. Unless there are serious<br>

objections to the general direction I’d like to get started so we can<br>

explore and look at the code as it comes through review.<br>

<br>

<br>

Optimization and Code Generation Pipeline<br>

<br>

The optimization passes chosen fall into a few main categories,<br>

redundancy elimination and basic optimization/abstraction elimination.<br>

The idea is that these are going to be the optimizations that a<br>

programmer would expect to happen without affecting debugging. This<br>

means not eliminating redundant calls or non-redundant loads as those<br>

could fail in different ways and locations while executing.  These<br>

optimizations will also reduce the overall amount of code going to the<br>

code generator helping both linker input size and code generation<br>

speed.<br>

<br>

Dead code elimination<br>

<br>

 - Dead code elimination (ADCE, BDCE)<br>

 - Dead store elimination<br>

 - Parts of CFG Simplification<br>

 - Removing branches and dead code paths and not including commoning<br>

and speculation<br>
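<br>
As a rough illustration of the categories above (a hypothetical snippet, not taken from the proposal; the function name and values are made up), the function below contains a dead store and a never-taken branch of the kind these passes are meant to remove:<br>

```cpp
// Hypothetical example; names and constants are illustrative only.
int scale(int x) {
    int tmp = x * 3;             // dead store: overwritten before any read
    tmp = x * 2;
    const bool verbose = false;
    if (verbose) {               // never taken, so the body is dead code
        tmp += 1000;
    }
    return tmp;                  // observable behaviour: returns x * 2
}
```

Removing the first store and the branch leaves the observable result unchanged, which is why these transformations are expected to be low-risk for debugging.<br>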

<br>

Basic Scalar Optimizations<br>

<br>

 - Constant propagation including SCCP and IPCP<br>

 - Constant merging<br>

 - Instruction Combining<br>

 - Inlining: always_inline and normal inlining passes<br>

 - Memory to register promotion<br>

 - CSE of “unobservable” operations<br>

 - Reassociation of expressions<br>

 - Global optimizations - try to fold globals to constants<br>
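<br>
To make the debuggability trade-off of these scalar passes concrete, here is a small hypothetical function (the names are assumptions, not from the proposal). Constant propagation may fold away the uses of `margin`, and reassociation may combine the two additions; whether a debugger can still show the intermediate values then depends on the debug info the compiler preserves:<br>

```cpp
// Hypothetical example; names and constants are illustrative only.
int area_plus_margin(int w) {
    const int margin = 4;                // uses of `margin` can be folded to 4
    int padded = (w + margin) + margin;  // may be reassociated into w + 8
    int result = padded * 2;             // `padded` may no longer exist as a separate value
    return result;
}
```

The returned value is the same either way; what may change is which named variables are still inspectable at a breakpoint.<br>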

<br>

Loop Optimizations<br>

<br>

Loop optimizations have some problems around debuggability and<br>

observability, but a suggested set of passes would include<br>

optimizations that remove abstractions and not ones that necessarily<br>

optimize for performance.<br>

<br>

 - Induction Variable Simplification<br>

 - LICM but not promotion<br>

 - Trivial Unswitching<br>

 - Loop rotation<br>

 - Full loop unrolling<br>

 - Loop deletion<br></blockquote><div><br></div><div>That is already a pretty good list. I would also find it interesting to know the opposite list: which passes should we not include, for the sake of speed and debuggability? Vectorizer? Unrolling? Jump Threading?</div><div>Also, couldn't constant propagation and reassociation, which are both in your list, hurt debuggability?</div><div><br></div><div>Thanks!</div><div><br></div><div>-- </div><div>Mehdi</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

Pass Structure<br>

<br>

Overall pass ordering will look similar to the existing pass layout in<br>

llvm with passes added or subtracted for O1 rather than a new pass<br>

ordering. The motivation here is to make the overall proposal easier<br>

to understand initially upstream while also maintaining existing pass<br>

pipeline synergies between passes.<br>

<br>

Instruction selection<br>

<br>

We will use the fast instruction selector (where it exists) for three reasons:<br>

 - Significantly faster code generation than LLVM’s DAG-based<br>

instruction selection<br>

 - Better debuggability than SelectionDAG, since fewer instructions are moved around<br>

 - Fast instruction selection has been optimized somewhat and<br>

shouldn’t be an outrageous penalty on most architectures<br>

<br>

Register allocation<br>

<br>

The fast register allocator should be used for compilation speed.<br>

<br>

Thoughts?<br>

<br>

Thanks!<br>

<br>

-eric<br>

</blockquote></div></div>