[llvm-dev] RFC: Enabling Module passes post-ISel

Tue Jul 19 00:21:30 PDT 2016

James Molloy via llvm-dev <llvm-dev at lists.llvm.org> writes:
> In LLVM it is currently not possible to write a Module-level pass (a pass that
> modifies or analyzes multiple MachineFunctions) after DAG formation. This
> inhibits some optimizations[1] and is something I'd like to see changed.
>
> The problem is that in the backend, we emit a function at a time, from DAG
> formation to object emission. So no two MachineFunctions ever exist at any one
> time. Changing this necessarily means increasing memory usage.
>
> I've prototyped this change and have measured peak memory usage in the worst
> case scenario - LTO'ing llc and clang. Without further ado:
>
>   llvm-lto llc:   before: 1.44GB maximum resident set size
>                   after:  1.68GB (+17%)
>
>   llvm-lto clang: before: 2.48GB maximum resident set size
>                   after:  3.42GB (+33%)
>
> The increases are very large. This is worst-case (non-LTO builds would see the
> peak usage of the backend masked by the peak of the midend) but still - pretty
> big. Thoughts? Is this completely no-go? Is this something that we *just need*
> to do? Is crippling the backend architecture to keep memory down justified? Is
> this something we could enable under an option?

Personally, I think this price is too high. If we want to enable machine
module passes (which we probably do), we need to turn MachineFunction into
more of a first-class object that isn't just a wrapper around IR.

This can and should be designed to work something like Pete's solution,
where we get rid of the IR and keep only the machine-level representation
in memory. That way we may still increase memory usage, but the increase
should be far less dramatic.
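
To make the trade-off concrete, here's a rough toy model of peak memory under
the three schemes being discussed. Everything in it is an assumption for
illustration: the FunctionCost type, the function names, and the sizes are
made up (arbitrary units, not measurements), and it pretends a
MachineFunction can outlive its IR body, which is exactly the part that
doesn't hold today.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FunctionCost:
    """Hypothetical per-function footprint, in arbitrary units."""
    ir: int       # size of the function's IR body
    machine: int  # size of the matching MachineFunction


def peak_function_at_a_time(funcs: Sequence[FunctionCost]) -> int:
    # Current scheme: all IR stays resident, but only one
    # MachineFunction is alive at any moment.
    total_ir = sum(f.ir for f in funcs)
    return total_ir + max((f.machine for f in funcs), default=0)


def peak_all_machine_functions(funcs: Sequence[FunctionCost]) -> int:
    # Naive module-pass scheme: all IR stays resident and every
    # MachineFunction is kept until the module passes have run.
    return sum(f.ir + f.machine for f in funcs)


def peak_drop_ir_after_isel(funcs: Sequence[FunctionCost]) -> int:
    # Assumed scheme: once a MachineFunction is built, its IR body is
    # freed, so IR shrinks while machine-level data accumulates.
    remaining_ir = sum(f.ir for f in funcs)
    live_machine = 0
    peak = remaining_ir
    for f in funcs:
        live_machine += f.machine  # build the MachineFunction first
        peak = max(peak, remaining_ir + live_machine)
        remaining_ir -= f.ir       # then release the IR body
    return peak


if __name__ == "__main__":
    # Made-up sizes; machine-level data is assumed larger than the IR.
    module = [FunctionCost(10, 15), FunctionCost(8, 12), FunctionCost(12, 18)]
    print(peak_function_at_a_time(module))     # 48
    print(peak_all_machine_functions(module))  # 75
    print(peak_drop_ir_after_isel(module))     # 57
```

With these made-up sizes the drop-the-IR variant lands between the other two,
which is the shape I'd expect: still an increase over what we do today, but
much smaller than keeping both representations around.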

You'll note that doing this also has tangential benefits: it should help
simplify MIR and generally improve the testability of the backends.