[llvm-dev] [RFC] Writing loop transformations on the right representation is more productive

Sun Jan 12 22:07:16 PST 2020

On Jan 3, 2020, at 3:26 AM, Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> In the 2018 LLVM DevMtg [1], I presented some shortcomings of how LLVM
> optimizes loops. In summary, the biggest issues are (a) the complexity
> of writing a new loop optimization pass (including needing to deal
> with a variety of low-level issues, a significant amount of required
> boilerplate, the difficulty of analysis preservation, etc.), (b)
> independent optimization heuristics and a fixed pass ordering and (c)
> code explosion due to versioning. Currently, different people are
> working on improving different loop optimization passes such as
> LoopInterchange, LoopFuse and LoopVectorize. Even if LLVM had
> 'perfect' individual loop passes, they would still have the
> aforementioned problems due to still being independent and the system
> as a whole will be suboptimal.

Hi Michael,

Thank you for bringing this up.  This is an area of interest, and I certainly share you view of what a pain this all is right now.  I can tell you’ve put a lot of thought into this and time into your RFC!

> The central idea is to use a modifiable loop tree -- similar to
> LoopInfo -- as the primary representation. LLVM-IR is converted to a
> loop tree, then optimized and finally LLVM-IR is generated again for
> subtrees that are considered profitable. This is not a new concept, it
> has already been used in compilers such as IBM XL Fortran (called ASTI
> [4]) and Silicon Graphics/Open64 (called LNO [10]), and in research
> such as the Value State Dependence Graph [3].

Ignoring the details of its representation, this is also conceptually how Polly works: code is lifted into its representation, transformed, then lowered back down.

> 4. Represents loops and predicates instead of CFGs

Yes, totally!

Overall, I think that this discussion would be easier to process if we broke it into a few pieces.  There seems to be consensus that LLVM IR (as is) is not the right representation for aggressive loop transformations.  If we don’t have consensus on this, then I’d make sure to start there.

Once that is established, there is a question of “what is the right representation to use”?  This question has two subcomponents: what data structure should we use, and what is the IR within it.

If you propose introducing a brand new data structure, please expect me to push back on that pretty hard.  This is a perfect application of MLIR, and using it provides a lot of value (including amazing testing tools, round-tripping, location tracking, etc) which would otherwise would have to be reinvented, and doesn’t not dictate the IR structure otherwise.  MLIR absolutely supports nested loop structures etc, as is seen in the affine dialect.

The MLIR community also is highly invested in HPC-style transformations on this, and a lot of thought has gone into it.  You can learn more about this in the slides and videos from the MLIR open design meetings <https://docs.google.com/document/d/1y_9f1AbfgcoVdJh4_aM6-BaSHvrHl8zuA5G4jv_94K8/edit#heading=h.cite1kolful9>.

One you achieve consensus on data structure, there is the question of what IR to use within it.  I would recommend starting with some combination of “existing LLVM IR operations + high level control flow representation”, e.g. parallel and affine loops.  The key here is that you need to always be able to lower in a simple and predictable way to LLVM IR (this is one of the things that classic polyhedral systems did sub optimally, making it difficult to reason about the cost model of various transformations), and this is a natural incremental starting point anyway.  Over time, more high level concepts can be gradually introduced.  FYI, MLIR already has a reasonable LLVM dialect <https://mlir.llvm.org/docs/Dialects/LLVM/> and can generate LLVM IR from it, so we’d just need an “LLVM IR -> MLIR LLVM dialect” conversion, which should be straightforward to build.

Once you have the data structure and the dialect within it decided, you have the set of transformations.  Again, you’ve given a lot of thought to this, and that all sounds great to me, but the priorities can be driven by whoever wants to contribute and what concrete problems they have to solve.

Once the infra for “raising to this representation and lowering back down” is figured out, we can open the box of having clang and other front ends directly generate it.

> Q: Relation to MLIR?
> A: MLIR is more similar to LLVM-IR than a loop hierarchy.

This is not true, MLIR is great for dialects that want to model loop hierarchies naturally, this is a major focus of the affine dialect <https://mlir.llvm.org/docs/Dialects/Affine/> (e.g. see affine.for on that page).  MLIR is not limited to affine loops, that is just a choice made by the affine dialect - the loop dialect <https://mlir.llvm.org/docs/Dialects/LoopOps/> has more general constructs that are less developed.

> For
> instance, it also does not feature cheap copies.

I’m not sure what this means.

> An advantage is that
> loops and multi-dimensional arrays can be represented in the language
> without the need of being rediscovered, but have to be inserted by a
> front-end.

This is correct, but I don’t see how this helps if your focus is raising code that has already been lowered to LLVM IR, e.g. by Clang or some other frontend that generates LLVM IR today.

> That is, if Clang was generating MLIR, loops and arrays
> still have to be rediscovered.

This isn’t true, it would be perfectly sensible to lower C control flow structures directly too LLVM.  The primary concern are things like unstructured switches (think duff’s device) and unstructured gotos, but these occur rarely: they need to be handled correctly, but can do so with a traditionally lowered CFG and other “best effort” attempts to raise them.

Other frontends like Swift and Flang could also generate this directly if they chose to, getting the benefits of progressive lowering.

> However, a loop hierarchy optimizer
> could be applied to MLIR just as well as to LLVM-IR.

Right!  In addition to structured control flow, MLIR has great support for CFG representations like LLVM of course. :-)

-Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200112/c16521ae/attachment.html>