<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"James Molloy" &lt;james@jamesmolloy.co.uk&gt;<br><b>To: </b>"Justin Bogner" &lt;mail@justinbogner.com&gt;, "James Molloy via llvm-dev" &lt;llvm-dev@lists.llvm.org&gt;<br><b>Cc: </b>"Hal Finkel" &lt;hfinkel@anl.gov&gt;, "Chandler Carruth" &lt;chandlerc@google.com&gt;, "Matthias Braun" &lt;matze@braunis.de&gt;, "Pete Cooper" &lt;peter_cooper@apple.com&gt;<br><b>Sent: </b>Tuesday, July 19, 2016 9:16:02 AM<br><b>Subject: </b>Re: [llvm-dev] RFC: Enabling Module passes post-ISel<br><br><div dir="ltr">Hi all,<div><br></div><div>I like all the ideas so far. Here are my thoughts:</div><div><br></div><div id="DWT10922">I think that, fundamentally, users of LLVM should be able to opt in to more aggressive or intensive computation at compile time if they wish. Users' needs differ, and while a 33% increase in clang LTO is absolutely out of the question for some people, for those developing microcontroller or HPC applications that may well be irrelevant.</div></div></blockquote>I agree. A 33% increase is absorbable in many environments.<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div> Either the volume of code they expect is significantly smaller, or they're happy to spend extra compile time to save expensive server time. That does not mean we shouldn't strive for a solution that is acceptable to all users. 
On the other hand, making something opt-in makes it non-default, and that increases the testing surface.</div><div><br></div><div>Tangentially, I think that LLVM currently doesn't have the right tuning knobs to allow the user to select their desired tradeoff. We have one optimization flag, -O{s,z,0,1,2,3}, which encodes both the optimization *goal* (a point on the Pareto curve between size and speed) and the amount of compile-time effort to spend achieving that goal. Anyway, that's beside the point.</div><div><br></div><div>I like Justin's idea of removing IR from the backend to free up memory. I think it's a very long-term project, though, one that requires significant (re)design: alias analysis access in the backend would be completely broken, because BasicAA, among others, depends on seeing the IR at query time. We'd need to work out a way of providing alias analysis with no IR present. I don't think that is feasible in the near future.</div><div><br></div><div id="DWT10923">So my suggestion is that we go with Matthias' idea: do the small amount of refactoring needed to allow MachineModulePasses on an opt-in basis. The knobs to enable that opt-in might need some more bikeshedding.</div></div></blockquote>This makes sense to me. 
I expect that targets will be able to opt in depending on the optimization level.<br><br> -Hal<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div></div><div><br></div><div>Cheers,</div><div><br></div><div>James</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, 19 Jul 2016 at 08:21 Justin Bogner &lt;<a href="mailto:mail@justinbogner.com" target="_blank">mail@justinbogner.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">James Molloy via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt; writes:<br>
> In LLVM it is currently not possible to write a Module-level pass (a pass that<br>
> modifies or analyzes multiple MachineFunctions) after DAG formation. This<br>
> inhibits some optimizations[1] and is something I'd like to see changed.<br>
><br>
> The problem is that in the backend, we emit a function at a time, from DAG<br>
> formation to object emission. So no two MachineFunctions ever exist at any one<br>
> time. Changing this necessarily means increasing memory usage.<br>
><br>
> I've prototyped this change and have measured peak memory usage in the<br>
> worst-case scenario: LTO'ing llc and clang. Without further ado:<br>
><br>
>   llvm-lto llc:   before: 1.44GB maximum resident set size<br>
>                   after:  1.68GB (+17%)<br>
><br>
>   llvm-lto clang: before: 2.48GB maximum resident set size<br>
>                   after:  3.42GB (+33%)<br>
><br>
> The increases are very large. This is worst-case (non-LTO builds would see the<br>
> peak usage of the backend masked by the peak of the midend) but still<br>
> pretty big. Thoughts? Is this a complete no-go? Is this something that we *just need*<br>
> to do? Is crippling the backend architecture to keep memory down justified? Is<br>
> this something we could enable under an option?<br>
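<br>
<div>As a quick check of the figures above, the relative increase in peak RSS is (after - before) / before. The short Python sketch recomputes it from the quoted GB values; it only restates the arithmetic and assumes those values are exact as written.</div>
<pre>
```python
# Peak RSS figures (GB) as quoted in the measurements above: (before, after).
measurements = {
    "llc": (1.44, 1.68),
    "clang": (2.48, 3.42),
}


def percent_increase(before, after):
    """Relative growth of peak RSS, in percent."""
    return (after - before) / before * 100.0


for name, (before, after) in measurements.items():
    print(f"{name}: +{percent_increase(before, after):.1f}%")
```
</pre>
<div>For llc this gives about +16.7%, matching the quoted +17%; for clang it gives about +37.9%, somewhat higher than the quoted +33%.</div>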
<br>
Personally, I think this price is too high. I think that if we want to<br>
enable machine module passes (which we probably do), we need to turn<br>
MachineFunction into more of a first-class object that isn't just a<br>
wrapper around IR.<br>
<br>
This can and should be designed to work something like Pete's solution,<br>
where we get rid of the IR and just have machine-level stuff in memory.<br>
This way, we may still increase the memory usage here, but it should be<br>
far less dramatic.<br>
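<br>
<div>To make the trade-off concrete, here is a toy Python model of backend peak memory under three strategies discussed in this thread: today's function-at-a-time pipeline (all IR resident, one MachineFunction alive at a time), a module-level machine pass that keeps every MachineFunction alongside the IR, and the direction suggested here, where each function's IR is freed once its MachineFunction has been built. The per-function sizes and the function names are hypothetical, not measurements, and the model ignores module-level data, analysis results, and the alias-analysis dependence on IR that James points out; it is only a sketch of why freeing IR could shrink the increase.</div>
<pre>
```python
# Toy model of backend peak memory. Sizes are hypothetical units, not
# measurements: each function has an IR footprint and a MachineFunction (MF)
# footprint once it has been lowered.
functions = [
    # (ir_size, mf_size)
    (10, 15),
    (40, 60),
    (25, 30),
    (5, 8),
]


def peak_function_at_a_time(funcs):
    # Today: all IR stays resident; each MF is freed after emission, so only
    # the largest single MF adds to the peak.
    total_ir = sum(ir for ir, _ in funcs)
    return total_ir + max(mf for _, mf in funcs)


def peak_keep_ir_and_all_mfs(funcs):
    # Module-level machine pass with IR kept: everything is alive at once.
    return sum(ir + mf for ir, mf in funcs)


def peak_free_ir_after_isel(funcs):
    # Hypothetical variant: free each function's IR right after its MF is
    # built; all MFs stay alive for the module-level pass. The peak is
    # sampled while both the IR and the MF of the function being lowered
    # are alive.
    remaining_ir = sum(ir for ir, _ in funcs)
    live_mf = 0
    peak = remaining_ir
    for ir, mf in funcs:
        live_mf += mf
        peak = max(peak, remaining_ir + live_mf)
        remaining_ir -= ir
    return peak


for strategy in (peak_function_at_a_time,
                 peak_keep_ir_and_all_mfs,
                 peak_free_ir_after_isel):
    print(strategy.__name__, strategy(functions))
```
</pre>
<div>With these made-up sizes the three strategies peak at 140, 193 and 145 units respectively: keeping IR plus all MachineFunctions costs about 38% more than today, while freeing IR after lowering costs about 4% more. The real ratios depend entirely on how large MachineFunctions are relative to the IR they replace.</div>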
<br>
You'll note that doing this also has tangential benefits: it should be<br>
helpful for simplifying MIR and generally improving the testability of the<br>
backends.<br>
</blockquote></div>
</blockquote><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></body></html>