[PATCH] D77678: [mlir][Linalg] Add loop.parallel lowering for all Linalg Ops.

Wed Apr 8 03:44:30 PDT 2020

ftynse added a comment.
Herald added a subscriber: frgossen.

In D77678#1968555 <https://reviews.llvm.org/D77678#1968555>, @bondhugula wrote:

> . @mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists on affine.for on loop.for in strictly less powerful ways and without a good reason. There may be a *few* things that you may just want to do on loop.for rather than on affine.for, but you could do that anyway even after having passed through the affine dialect.

I did think about this, and we even had a document back in the time when had access to those ;) The discussion you want to have here is mostly independent of this patch, and pertains to the motivation for having the loop dialect in the first place. We had that discussion when the dialect was introduced.

Loop dialect was split out from Linalg, where the loop-related ops had been introduced to remove some of the affine constraints that were irrelevant and/or constraining for Linalg's use case. One of the constraints is the need for what I call "affine provenance", i.e. the set of rules spread out in the code that define which SSA values are allowed to be used as dimensions or as symbols in affine constructs. Supporting non-constant steps can be seen as a consequence of lifting those constraints. Linalg had (and still has) a forward-looking design that accounted for things like non-dense buffers and custom types. Plumbing all that through the affine machinery is hard (trust me, I tried).

While one can, in many cases, wiggle their way out of the representation problem, like you suggest with parametric steps, the question of whether one should remains pertinent. It's a complexity trade-off question. We can introduce extra operations and affine maps to model non-constant steps, call this an "affine idiom for parametric steps" and try to discover it when we reason about steps. We can introduce another idiom for another case that doesn't fit affine (let's take indirect accesses). And so on. This introduces extra complexity to the IR and to the code that manipulates it. What's the counterpart? Linalg-based flow does not intend to run affine transformations, so we cannot claim we pay the complexity price for having better optimization. We can spare some lowering code by... writing some other lowering code with more complex abstractions.

On a minor note, have you actually tried running the example you proposed in the linked forum post? :) There are many places where semi-affine maps are poorly supported, or not supported at all. Conversion to LLVM is one of them.

That being said, I was the one who has been arguing that Linalg lowering should go through affine when it can (and so was the design document). The problem is when it cannot, or when doing so would just increase the complexity of the system without visible benefits. Let's assume there are cases where it cannot (today, examples would be: using linalg ops with values that don't qualify as affine symbols, reductions that explicitly want to go through SSA values; tomorrow, we'll have sparse buffers). Then we need the possibility to emit at least some non-affine loops, which this patch contributes. Now, if there is one specific loop in a Linalg op that does not fit into affine, do we actually want a mix of affine and non-affine loops, or do we prefer a single non-affine loop nest that, e.g., preserves the idea of permutability that would be no longer discoverable by affine analysis. I can see value in having both options.

The actual duplication here is between Linalg->`loop.for` and Linalg->`loop.parallel` lowering, which I pointed out in one if the previous patches. Given that we have the lowering from `loop.parallel` to `loop.for`, we should remove the Linalg->`loop.for` and replace it with this. My recollection is that it was the plan, but it requires the lowering to `loop.parallel` to also support reductions, which this patch does not do.

Transformations on different kinds of loops are another question, unrelated to this patch. Again, I see value in removing affine restrictions or, conversely, having stricter restrictions such as hyper-rectangular, and separating the legality and profitability analysis (likely based on those restrictions) from the IR manipulation logic.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77678/new/

https://reviews.llvm.org/D77678