[PATCH] D77678: [mlir][Linalg] Add loop.parallel lowering for all Linalg Ops.

Mahesh Ravishankar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 7 23:57:47 PDT 2020


mravishankar added a subscriber: herhut.
mravishankar added a comment.

@bondhugula : Thanks for the comments. I certainly cannot think of a case where the linalg -> affine -> loop path would fail to cover anything that Linalg currently supports. All I was trying to achieve with this change is to finish up the work, started a while ago, of lowering Linalg to loop.parallel. From my perspective, loop.parallel and loop.for are in the same group. I certainly didn't intend to make a design or infrastructure-level decision here.

I think it would be good to reach some consensus here on whether Linalg should always lower to the affine dialect first and then to the loop dialect, and whether lowering Linalg to loops directly should be discouraged. As you mentioned, going from affine.for -> loop.for is always guaranteed, and so is going from affine.parallel -> loop.parallel, so we could always lower from Linalg to affine and then to loops for both the parallel and non-parallel versions. My only concern is that if there is a lowering from Linalg to loop.for, there should also be a way to target loop.parallel, since without it you are dropping semantic information about the computation, namely which loops are parallel. That information is really important when lowering from the loop dialect to the GPU dialect (which, AFAIK, was one of the main motivations for having loop.parallel in the first place).
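To make the "dropping semantic information" point concrete, here is a rough sketch of the two lowerings for a pointwise op (IR written from memory, so the exact syntax may not match the current tree; %n, %A, %B are placeholder values):

```mlir
// Lowering to loop.for loses the fact that iterations are independent;
// a later pass would have to re-prove parallelism:
loop.for %i = %c0 to %n step %c1 {
  %0 = load %A[%i] : memref<?xf32>
  %1 = addf %0, %cst : f32
  store %1, %B[%i] : memref<?xf32>
}

// loop.parallel keeps the parallelism explicit, which is exactly what
// the loop-to-GPU conversion wants to consume:
loop.parallel (%i) = (%c0) to (%n) step (%c1) {
  %0 = load %A[%i] : memref<?xf32>
  %1 = addf %0, %cst : f32
  store %1, %B[%i] : memref<?xf32>
}
```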

Adding @herhut, who is probably also interested in this and may have comments. I am assuming @nicolasvasilache will also provide his thoughts. FWIW, I am fine going through the affine dialect instead of directly to the loop dialect. I hadn't given thought to using the affine dialect for my use cases, but I definitely don't see an issue with it.

Another point, off the top of my head, in case the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling Linalg operations. AFAIK, the tile size can be dynamic and therefore cannot be expressed using affine.parallel loops. So if the code-generation process tiles Linalg ops and then lowers the tiled ops to loops, you can end up with the outer loops in the loop dialect but the inner loops in the affine dialect. I am not sure that is actually a problem, because eventually you can lower the affine loops to the loop dialect, but it is something I haven't fully reasoned through for myself.
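A sketch of the mixed nesting I mean (again written from memory, with a hypothetical dynamic tile size %ts; the exact op syntax may differ):

```mlir
// Outer loop produced by tiling with a dynamic tile size %ts. A dynamic
// step is fine for loop.parallel but not expressible as affine.parallel:
loop.parallel (%i) = (%c0) to (%n) step (%ts) {
  // Inner loop from lowering the tiled op through the affine dialect;
  // its bounds are static, so affine.for is fine here:
  affine.for %ii = 0 to 4 {
    // ... body operating on the tile ...
  }
}
```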

Side note: this change looks bigger than it is because it also makes some changes to EDSC that are probably not needed. I will try to remove those.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77678/new/

https://reviews.llvm.org/D77678
