[PATCH] D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch.

Alex Zinenko via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Mon Feb 3 13:52:11 PST 2020


ftynse added a comment.

I suppose you want high-level feedback on this, so I'm not nitpicking the code.

I think this goes in the right direction, but you are right that it should not be the final design. Using pattern rewrites looks like the right thing to do. I would consider having a pattern and/or a utility function that a future driver can call to get a parallel loop mapped to the GPU. We can have test passes where the application is controlled by attributes, but we can also have more automated approaches that would, e.g., greedily map the outermost parallel loop to the GPU. My thinking is that we should eventually have a tree-like representation of the loop structure on which we can decide what to map to blocks or threads and where to do promotions. That being said, I am not 100% sure we actually want to materialize those decisions as attributes on the operations, or at least as attributes the operations themselves know about.
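To make this concrete, here is a rough sketch of what such a pattern or utility could do. The syntax is approximate, and the string-based mapping attribute is purely hypothetical, just to illustrate the idea, not something this patch defines:

  // Input: a parallel loop annotated with (hypothetical) mapping decisions.
  loop.parallel (%i, %j) = (%c0, %c0) to (%n, %m) step (%c1, %c1) {
    store %cst, %A[%i, %j] : memref<?x?xf32>
  } { mapping = ["block_x", "thread_x"] }

  // Possible output: the same body inside a gpu.launch, with the loop
  // induction variables replaced by the block/thread ids of the launch.
  gpu.launch blocks(%bx, %by, %bz) in (%g0 = %n, %g1 = %c1, %g2 = %c1)
             threads(%tx, %ty, %tz) in (%b0 = %m, %b1 = %c1, %b2 = %c1) {
    store %cst, %A[%bx, %tx] : memref<?x?xf32>
    gpu.terminator
  }

The driver would then only be responsible for choosing the mapping; the rewrite itself stays mechanical.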

One thing to keep in mind: nested parallel loops can be tricky. In particular, you may need synchronizations inside the outermost parallel loop, and you have to make sure all threads execute them in a "convergent" fashion. For a first take, I would map a single parallel loop construct, or perfectly nested constructs, and extend from there.
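To illustrate the problem (schematic IR, bodies elided): if the outer loop is mapped to blocks and the inner ones to threads, a dependence between two inner loops forces a barrier into the body of the outer loop:

  loop.parallel (%i) = (%c0) to (%n) step (%c1) {    // mapped to blocks
    loop.parallel (%j) = (%c0) to (%m) step (%c1) {  // mapped to threads
      // phase 1: each thread writes %tmp[%j]
    }
    // After mapping, a gpu.barrier belongs here, and every thread of the
    // block must reach it; no thread may branch around it ("convergent").
    loop.parallel (%j) = (%c0) to (%m) step (%c1) {  // mapped to threads
      // phase 2: each thread reads neighboring elements of %tmp
    }
  }

Getting that right for arbitrary nests is hard, which is another reason to start from the perfectly nested case.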



================
Comment at: mlir/include/mlir/Dialect/LoopOps/LoopOps.td:16
 
+include "mlir/Dialect/AffineOps/AffineOpsBase.td"
 include "mlir/IR/OpBase.td"
----------------
It's a bit backwards to have the loop dialect depend on the affine dialect. Is this include really necessary just for AffineMapAttr?
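For illustration, I assume the include is only needed so the mapping attribute can carry affine maps for the index/bound computations, along these lines (the attribute layout here is my guess, not necessarily what the patch implements):

  loop.parallel (%i) = (%c0) to (%n) step (%c1) {
    // body unchanged
  } { mapping = [{processor = 0 : i64,
                  map = affine_map<(d0) -> (d0)>,
                  bound = affine_map<(d0) -> (d0)>}] }

If that is the only use, it may be worth checking whether AffineMapAttr can be referenced without pulling in the affine dialect's TableGen file.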


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73893/new/

https://reviews.llvm.org/D73893