[llvm-dev] [RFC] Writing loop transformations on the right representation is more productive

Sun Jan 26 19:04:27 PST 2020

> On Jan 22, 2020, at 12:58 AM, Michael Kruse <llvmdev at meinersbur.de> wrote:
> On Wed, Jan 15, 2020 at 20:27, Chris Lattner <clattner at nondot.org> wrote:
>> Once you achieve consensus on the data structure, there is the question of what IR to use within it.  I would recommend starting with some combination of “existing LLVM IR operations + high level control flow representation”, e.g. parallel and affine loops.  The key here is that you need to always be able to lower in a simple and predictable way to LLVM IR (this is one of the things that classic polyhedral systems did suboptimally, making it difficult to reason about the cost model of various transformations), and this is a natural incremental starting point anyway.  Over time, more high level concepts can be gradually introduced.  FYI, MLIR already has a reasonable LLVM dialect <https://mlir.llvm.org/docs/Dialects/LLVM/> and can generate LLVM IR from it, so we’d just need an “LLVM IR -> MLIR LLVM dialect” conversion, which should be straightforward to build.
>> 
>> Adding an LLVM-IR -> MLIR -> LLVM-IR round-trip would, at the beginning, just introduce compile-time overhead and what Renato described as brittleness. I fear this hurts adoption.
> 
> Isn’t this true of *any* higher level IR?  Unless I’m missing something big, this seems inherent to your proposal.
> 
> No. A loop hierarchy may be created on demand and can be skipped if, e.g., the function does not contain a loop.

I don’t see how this is specific to a loop IR.  Any “LLVMIR -> X -> LLVMIR” system has the behavior you describe, whether X is polly, mlir, or some other loop IR; any decision about skipping the round trip could be applied to any of them.

The advantage of MLIR in this discussion is that it has the opportunity to subsume LLVMIR at some point in the future, eliminating the round trip at that point.

> For IRs that are translation-unit based, the entire module will have to do a round-trip whether changed or not.

I think that this must be the misunderstanding.  There is no requirement to do a “Full LLVM IR module to MLIR module” conversion; you can convert one function, one loop nest, one basic block or whatever else you’d want to do.
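As a minimal sketch of that per-function granularity (Python, purely illustrative), the driver below lifts, transforms and lowers only the functions that contain a loop and leaves every other function untouched. The `Function` class, the `has_cycle` check and the `lift`/`transform`/`lower` callbacks are hypothetical stand-ins for an "LLVM IR -> X -> LLVM IR" pipeline, not LLVM or MLIR APIs; the same skip decision would apply whatever X is.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Function:
    """Hypothetical toy stand-in for an IR function: block names mapped to successor lists."""
    name: str
    cfg: Dict[str, List[str]] = field(default_factory=dict)
    entry: str = "entry"


def has_cycle(fn: Function) -> bool:
    """True if a cycle is reachable from the entry block (DFS with an on-stack set)."""
    visited, on_stack = set(), set()

    def dfs(block: str) -> bool:
        visited.add(block)
        on_stack.add(block)
        for succ in fn.cfg.get(block, []):
            if succ in on_stack:
                return True  # edge back to a block on the DFS stack closes a cycle
            if succ not in visited and dfs(succ):
                return True
        on_stack.discard(block)
        return False

    return dfs(fn.entry)


def round_trip_selected(module: Dict[str, Function],
                        lift: Callable[[Function], object],
                        transform: Callable[[object], object],
                        lower: Callable[[object], Function],
                        should_lift: Callable[[Function], bool] = has_cycle) -> List[str]:
    """Lift, transform and lower only the functions that pass `should_lift`."""
    converted = []
    for name, fn in list(module.items()):
        if not should_lift(fn):
            continue  # loop-free functions never leave the original IR
        module[name] = lower(transform(lift(fn)))
        converted.append(name)
    return converted


# Usage: identity callbacks stand in for a real "IR -> X -> IR" conversion.
module = {
    "straight": Function("straight", {"entry": ["exit"], "exit": []}),
    "looping": Function("looping", {"entry": ["header"], "header": ["body", "exit"],
                                    "body": ["header"], "exit": []}),
}
converted = round_trip_selected(module, lift=lambda f: f,
                                transform=lambda x: x, lower=lambda x: x)
print(converted)  # ['looping']
```

The granularity here is the function, but the same driver shape works for a loop nest or a basic block if the predicate and callbacks operate on that unit instead.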

>> This is definitely a subjective question. I think that MLIR is closer to LLVM-IR in how it is processed. Both have a sequence of passes running over a single source of truth. Both allow walking the entire structure from every instruction/operation/block. Analyses are at function or module level. Both have CFGs (I think that for certain kinds of transformations it is an advantage that control flow is handled implicitly).
> 
> Right, but a frequent way that MLIR is used is without its CFG: most machine learning kernels use nests of loops and ifs, not CFGs.  CFGs are exposed when those are lowered out.  See some simple examples like:
> https://github.com/llvm/llvm-project/blob/master/mlir/test/Transforms/affine-data-copy.mlir
> 
> 
> I agree that a loop nest can be represented in MLIR. What is missing, IMHO, is the ability to have multiple versions of the same code. For instance, raising emitted C++ to such a representation to make it more optimizable may only be possible under certain preconditions, and the raising by itself may make the code slower. If the raised representation cannot be optimized, we will want to use the original one.

This is pretty straightforward (in principle, I haven’t actually built a system to prove this), because you can just have an operation with regions for each version, e.g.:

for {
  op1
  versioned {
    stuff
  } {
    other stuff
  }
  op2
}

Then transform the code and select which version you want later.

MLIR doesn’t magically make the algorithms happen for you, but it does handle the representational issues, making it super flexible and easy to support things like this.
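The versioned-region idea can be sketched with a toy data structure (Python, illustrative only): a hypothetical `Versioned` node holds one region per alternative, transformations may rewrite each region independently, and a later selection step inlines the region a cost model prefers. `Op`, `Versioned`, `select_versions` and the `cost=len` model are assumptions for illustration, not MLIR operations or APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Op:
    name: str


@dataclass
class Versioned:
    """Hypothetical op whose regions are alternative versions of the same code."""
    versions: List[List["Node"]]


Node = Union[Op, Versioned]


def select_versions(body: List[Node], cost: Callable[[List[Node]], int]) -> List[Node]:
    """Replace each Versioned op by the ops of its cheapest region, inlined in place."""
    result: List[Node] = []
    for node in body:
        if isinstance(node, Versioned):
            # Resolve nested versioned ops first, so `cost` only sees concrete ops.
            resolved = [select_versions(region, cost) for region in node.versions]
            result.extend(min(resolved, key=cost))
        else:
            result.append(node)
    return result


# Mirrors the loop body above: op1, a two-way versioned op, op2.
loop_body = [Op("op1"),
             Versioned([[Op("stuff")], [Op("other_stuff_1"), Op("other_stuff_2")]]),
             Op("op2")]
chosen = select_versions(loop_body, cost=len)  # toy cost model: fewer ops wins
print([op.name for op in chosen])  # ['op1', 'stuff', 'op2']
```

A real system would presumably use a target-aware cost model and might keep a raised region only when its preconditions are known to hold; picking the shortest region here just keeps the example small.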

> 
>> On the other hand, natural loop detection on CFGs is quite mature (with the remaining issue of irreducible loops, which may appear but can also be eliminated again). As a plus, optimization depends less on how the source code is written.
> 
> Yep, totally.  The question is whether you lose semantic information from lowering to a CFG and reconstructing back up.  This can affect you when you have higher level language semantics (e.g. Fortran parallel loops, OpenMP or other concurrency constructs, etc.).  This is where MLIR excels of course.
> 
> 
> Indeed, it is easier not to lower these constructs, but it is not impossible (as shown in https://reviews.llvm.org/D69930). I think the relevant difference is that these constructs come with additional guarantees (e.g. single-entry-single-exit regions) and optimization hurdles (e.g. thread synchronization, where programmers do not expect the compiler to do a lot of things) compared to C++ loop constructs.

There is a dangerous implication in the way you phrased this, and while it may not have been intentional, it is something I think is important to point out.  The whole point of good IR design is to make certain things *easy*.  IR design rarely makes things “possible”, and so saying something is “not impossible” is dangerous ground to stand on.

To reduce the point to absurdity, you could implement loop transformations on machine code: you can reconstruct a CFG, discover natural loops, raise the machine code to something like LLVM IR, etc.  The problem with this is that it is extremely cumbersome, fragile, and would lose the ability to use high level information that cannot be persisted in machine code.  The same thing is true about LLVM IR, just less absurdly so.
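For concreteness, the "discover natural loops" step can be sketched in Python using the textbook dominator-based definition: an edge latch -> header is a back edge when the header dominates the latch, and the loop body is the header plus every block that reaches the latch without passing through the header. This is a minimal illustration on a toy CFG (a dict mapping block names to successor lists), not LLVM's `LoopInfo` implementation; the function names are hypothetical. The irreducible example shows the case where a cycle exists but no natural loop is found.

```python
from typing import Dict, List, Set, Tuple

CFG = Dict[str, List[str]]  # block name -> successor block names


def predecessors(cfg: CFG) -> Dict[str, Set[str]]:
    preds: Dict[str, Set[str]] = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds.setdefault(s, set()).add(n)
    return preds


def dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: dom(n) = {n} | intersection of dom(p) over preds p of n."""
    preds = predecessors(cfg)
    nodes = set(preds)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def natural_loops(cfg: CFG, entry: str) -> List[Tuple[str, Set[str]]]:
    """Return (header, body) for each back edge latch -> header."""
    dom, preds = dominators(cfg, entry), predecessors(cfg)
    loops = []
    for latch, succs in cfg.items():
        for header in succs:
            if header not in dom[latch]:
                continue  # header does not dominate latch: not a back edge
            # Walk predecessors from the latch, stopping at the header.
            body, work = {header, latch}, [latch]
            while work:
                m = work.pop()
                if m == header:
                    continue
                for p in preds[m]:
                    if p not in body:
                        body.add(p)
                        work.append(p)
            loops.append((header, body))
    return loops


reducible = {"entry": ["header"], "header": ["body", "exit"],
             "body": ["header"], "exit": []}
irreducible = {"entry": ["a", "b"], "a": ["b"], "b": ["a"]}
print(natural_loops(reducible, "entry"))    # one loop headed by 'header'
print(natural_loops(irreducible, "entry"))  # [] since neither a nor b dominates the other
```

Even this small sketch has to special-case the irreducible cycle, which is the kind of reconstruction work a representation that keeps the loop structure explicit avoids.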

-Chris
