[PATCH] D77678: [mlir][Linalg] Add loop.parallel lowering for all Linalg Ops.

Mahesh Ravishankar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 8 10:19:04 PDT 2020


mravishankar added a comment.

In D77678#1968752 <https://reviews.llvm.org/D77678#1968752>, @bondhugula wrote:

> In D77678#1968714 <https://reviews.llvm.org/D77678#1968714>, @mravishankar wrote:
>
> > Another point that is off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK, the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops.
>
>
> I've pointed out a couple of times that this isn't accurate - you can represent non-constant tile sizes using either affine.parallel or affine.for (https://llvm.discourse.group/t/beginner-q-help-with-loops-affine-linalg/707/4).


Thanks for the pointer. As was done in that post, I just looked at the op definition and reached the conclusion about parametric tiling. I haven't worked with the affine dialect enough to know about such things. It's definitely something I want to look into in due course.
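For anyone else following along, my (possibly imperfect) understanding of the representation described in that post is that the non-constant tile size becomes an affine symbol, with the multiplications living in semi-affine bound maps. A hand-written sketch (all names invented, not output of any pass), tiling a loop of trip count %N by a dynamic tile size %ts:

```mlir
// Sketch: the inter-tile loop has constant step 1 and iterates over
// tile indices; the intra-tile bounds are (semi-)affine maps on the
// tile index %t and the symbols %ts and %N.
#ntiles = affine_map<()[s0, s1] -> (s0 ceildiv s1)>
#lb     = affine_map<(d0)[s0] -> (d0 * s0)>
#ub     = affine_map<(d0)[s0, s1] -> (d0 * s0 + s0, s1)>

func @tiled(%buf: memref<?xf32>, %N: index, %ts: index) {
  affine.for %t = 0 to #ntiles()[%N, %ts] {
    affine.for %i = #lb(%t)[%ts] to min #ub(%t)[%ts, %N] {
      // body indexed by the original induction variable %i
    }
  }
  return
}
```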

>> So if the codegeneration process is tiling linalg ops and then lowering the tiled ops to loops, you can end up in a situation where the outer loops are in Loop dialect but the inner loops are in affine dialect. I am not sure there is an issue with that cause eventually you can lower the affine loops to loop dialect, but its just something that I havent reasoned fully about for myself.
> 
> Second, there is no issue with using a mix of affine and loop dialect ops - '-lower-to-affine' should be able to handle it by design. From a mix of affine.for and loop.for, it'll take you to just loop.for's. Please file a bug report if it doesn't!

Agreed (and said so earlier). It should be OK to mix loop.parallel/loop.for with affine.for/affine.parallel. But based on your post, is it possible to generate affine.for/affine.parallel while tiling linalg ops as well? That way the same benefit of going to affine.for/affine.parallel would be available at the inter-tile loops as well.
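To make the mixing concrete, the kind of IR I have in mind (a sketch with invented names, not output of any existing pass) would be:

```mlir
// Hypothetical mix after tiling: the inter-tile loop in the loop
// dialect, the intra-tile loop in affine. -lower-affine should
// rewrite the inner affine.for into a loop.for, leaving only loop
// dialect ops for further lowering.
loop.parallel (%t) = (%c0) to (%ntiles) step (%c1) {
  affine.for %i = 0 to 128 {
    // tile body
  }
}
```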

In D77678#1969051 <https://reviews.llvm.org/D77678#1969051>, @ftynse wrote:

> In D77678#1968555 <https://reviews.llvm.org/D77678#1968555>, @bondhugula wrote:
>
> > @mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists on affine.for on loop.for in strictly less powerful ways and without a good reason. There may be a *few* things that you may just want to do on loop.for rather than on affine.for, but you could do that anyway even after having passed through the affine dialect.
>
>
> I did think about this, and we even had a document back when we had access to those ;) The discussion you want to have here is mostly independent of this patch, and pertains to the motivation for having the loop dialect in the first place. We had that discussion when the dialect was introduced.
>
> Loop dialect was split out from Linalg, where the loop-related ops had been introduced to remove some of the affine constraints that were irrelevant and/or constraining for Linalg's use case. One of the constraints is the need for what I call "affine provenance", i.e. the set of rules spread out in the code that define which SSA values are allowed to be used as dimensions or as symbols in affine constructs. Supporting non-constant steps can be seen as a consequence of lifting those constraints. Linalg had (and still has) a forward-looking design that accounted for things like non-dense buffers and custom types. Plumbing all that through the affine machinery is hard (trust me, I tried).
>
> While one can, in many cases, wiggle their way out of the representation problem, like you suggest with parametric steps, the question of whether one should remains pertinent. It's a complexity trade-off question. We can introduce extra operations and affine maps to model non-constant steps, call this an "affine idiom for parametric steps" and try to discover it when we reason about steps. We can introduce another idiom for another case that doesn't fit affine (let's take indirect accesses). And so on. This introduces extra complexity to the IR and to the code that manipulates it. What's the counterpart? Linalg-based flow does not intend to run affine transformations, so we cannot claim we pay the complexity price for having better optimization. We can spare some lowering code by... writing some other lowering code with more complex abstractions.


Thanks @ftynse for the really useful background. I was certainly unaware of the discussion here. It would be really good if we could surface this back up on the discussion forum. But as you mentioned, I hope this patch will be seen as independent of that discussion. I am not trying to weigh the scales one way or the other, but rather just filling in missing pieces where I can and when I need them.

> The actual duplication here is between Linalg->`loop.for` and Linalg->`loop.parallel` lowering, which I pointed out in one of the previous patches. Given that we have the lowering from `loop.parallel` to `loop.for`, we should remove the Linalg->`loop.for` lowering and replace it with this. My recollection is that it was the plan, but it requires the lowering to `loop.parallel` to also support reductions, which this patch does not do.

Agreed that the lowering from linalg to loop.for should become redundant eventually, but right now the lowering to loop.parallel does not support reductions (apologies for misrepresenting earlier that I am "finishing" the linalg to loop.parallel lowering; a couple of cases are still missing). As it stands, even with this patch we "can" remove the linalg -> loop.for lowering: for the unhandled cases that is the fallback used anyway, so there is no change in functionality from merging the lowerings to loop.for and loop.parallel.
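To illustrate the reduction gap with a hand-written sketch (not actual pass output): for something like matmul, the parallel iterators map naturally onto loop.parallel, but the reduction iterator has to stay a sequential loop.for accumulating through memory, since loop.parallel reductions are not handled yet:

```mlir
// Parallel iterators i, j become a loop.parallel; the reduction
// iterator k stays a loop.for that accumulates into C in memory.
loop.parallel (%i, %j) = (%c0, %c0) to (%M, %N) step (%c1, %c1) {
  loop.for %k = %c0 to %K step %c1 {
    %a = load %A[%i, %k] : memref<?x?xf32>
    %b = load %B[%k, %j] : memref<?x?xf32>
    %c = load %C[%i, %j] : memref<?x?xf32>
    %p = mulf %a, %b : f32
    %s = addf %c, %p : f32
    store %s, %C[%i, %j] : memref<?x?xf32>
  }
}
```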



================
Comment at: mlir/test/Dialect/Linalg/loops.mlir:128
+//       CHECKLOOP: %[[C:.*]] = std.view %{{.*}}[][] : memref<?xi8> to memref<f32>
+//       CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
+//   CHECKLOOP-DAG:   %[[a:.*]] = load %[[A]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
----------------
bondhugula wrote:
> Is there a need to match all of the trailing 'step %{{.*}}'? You always print the step, right?
Probably not. I didn't change what was already there, just the check-prefix. I would rather keep it as is.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77678/new/

https://reviews.llvm.org/D77678




