[PATCH] D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch.

Alex Zinenko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 12 12:15:42 PST 2020


ftynse accepted this revision.
ftynse added a comment.
This revision is now accepted and ready to land.

Looks okay for a first take at it.

In general, I wouldn't bother with sequential loops here; instead, I would implement a separate rewrite that we could run before this one, which splits the parallel loop into two nested parallel ops, and then use a parallel-to-sequential transformation on the inner op.



================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:576
+/// To note the end of the current scope in case a loop or conditional was
+/// inserted, a sentinel (the `gpu.launch` operation) is inserted into the
+/// worklist. This signals the processor of the worklist to pop the rewriter
----------------
Nit: why not `nullptr` as a sentinel?
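The suggestion can be sketched with a simplified worklist model (names and types here are illustrative stand-ins, not the actual MLIR code): a `nullptr` entry in a worklist of op pointers marks the end of the current scope, so the consumer knows when to pop the saved rewriter state.

```cpp
#include <cassert>
#include <vector>

// Stand-in for an MLIR operation; only used to illustrate the sentinel idea.
struct Op { int id; };

// Drains the worklist, treating nullptr entries as end-of-scope sentinels.
// Returns how many scopes were closed while draining.
int drainWorklist(std::vector<Op *> &worklist) {
  int scopesClosed = 0;
  while (!worklist.empty()) {
    Op *op = worklist.back();
    worklist.pop_back();
    if (op == nullptr) {
      // Sentinel: restore the previous insertion point / leave the scope.
      ++scopesClosed;
      continue;
    }
    // ... process `op` normally ...
  }
  return scopesClosed;
}
```

Using `nullptr` avoids overloading a real operation (like the `gpu.launch` op) with a second meaning as a marker.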


================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:611
+    MappingAnnotation annotation =
+        extractMappingAnnotation(std::get<0>(config));
+    Value newIndex;
----------------
Readability nit: can we assign `std::get<0>`, etc. to variables with more descriptive names? The body of this loop is 3 screens long (consider refactoring in a follow-up :)) and it's hard to keep track of what was zip'ed.
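One way to do this, sketched on a plain `std::tuple` rather than the actual zipped range (the names `mappingAttr`, `lowerBound`, `upperBound` are hypothetical), is a C++17 structured binding at the top of the loop body:

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Stand-in for one element of the zipped range in the loop body.
using Config = std::tuple<std::string, int, int>;

std::string describe(const Config &config) {
  // Structured bindings name each zipped element once, instead of
  // repeating std::get<0>(config), std::get<1>(config), ... throughout.
  auto [mappingAttr, lowerBound, upperBound] = config;
  return mappingAttr + ": [" + std::to_string(lowerBound) + ", " +
         std::to_string(upperBound) + ")";
}
```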


================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:627
+      if (annotation.boundMap) {
+        auto save = rewriter.saveInsertionPoint();
+        rewriter.setInsertionPoint(launchOp);
----------------
Nit: you can use `OpBuilder::InsertionGuard` with a C++ scope
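The RAII pattern behind this suggestion can be sketched with a toy builder (the `Builder`/`InsertionGuard` types below are simplified stand-ins, not the MLIR classes): the guard saves the insertion point on construction and restores it when the enclosing scope ends, replacing the manual save/restore pair.

```cpp
#include <cassert>

// Toy builder whose "insertion point" is just an integer position.
struct Builder {
  int insertionPoint = 0;
};

// RAII guard: saves the insertion point, restores it on destruction.
class InsertionGuard {
  Builder &builder;
  int saved;
public:
  explicit InsertionGuard(Builder &b) : builder(b), saved(b.insertionPoint) {}
  ~InsertionGuard() { builder.insertionPoint = saved; }
};

// Temporarily moves the insertion point inside a scope; the guard
// restores the original point when the scope closes.
int moveAndRestore(Builder &b, int target) {
  int seen;
  {
    InsertionGuard guard(b);
    b.insertionPoint = target; // e.g. setInsertionPoint(launchOp)
    seen = b.insertionPoint;
    // ... create ops at the temporary insertion point ...
  } // guard's destructor restores the saved insertion point here
  return seen;
}
```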


================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:701
+/// `mapping`, which has three entries:
+///  - processor: the hardware id to map to. 0-2 are block dimensions, 3-5 are
+///               thread dimensions and 6 is sequential.
----------------
For a future revision, I'd consider having a transformation function that is controlled by actual arguments rather than by an attached attribute, and have the pattern read the attribute and call that function. This way, we can call the transformation programmatically without modifying the IR.
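The suggested layering might look like the following sketch (all names here are hypothetical, not the patch's API): the core transform is driven by plain arguments, and the rewrite pattern is a thin adapter that decodes the attribute and delegates.

```cpp
#include <cassert>

// Hypothetical explicit configuration, mirroring the three mapping entries
// described in the comment (processor 0-2: blocks, 3-5: threads, 6: sequential).
struct MappingConfig {
  int processor;
};

// Core transformation: controlled by actual arguments, so it can be called
// programmatically without attaching an attribute to the IR first.
bool applyMapping(const MappingConfig &config) {
  // Accept only the documented processor range.
  return config.processor >= 0 && config.processor <= 6;
}

// Pattern adapter: reads the (here simulated) attribute and forwards it.
bool rewriteFromAttribute(int processorAttr) {
  MappingConfig config{processorAttr};
  return applyMapping(config);
}
```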


================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:748
+    // TODO: This is only correct if there either is no further loop.parallel
+    //       nested or this code is side-effect free. Otherwise we might need
+    //       predication.
----------------
Can we just walk the body recursively, check for the no-side-effects trait, and return matchFailure if there are side effects?
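On a simplified op model (not the MLIR walk API; `SimpleOp` and its fields are stand-ins), the check could be sketched as:

```cpp
#include <cassert>
#include <vector>

// Toy op: a side-effect flag plus nested ops, modeling a region body.
struct SimpleOp {
  bool hasSideEffects = false;
  std::vector<SimpleOp> nested;
};

// Recursively walks the op and everything nested in it; returns false
// (i.e. the pattern would report matchFailure) on any side-effecting op.
bool isSideEffectFree(const SimpleOp &op) {
  if (op.hasSideEffects)
    return false;
  for (const SimpleOp &child : op.nested)
    if (!isSideEffectFree(child))
      return false;
  return true;
}
```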


================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:750
+    //       predication.
+    if (auto nestedParallel = dyn_cast<ParallelOp>(op)) {
+      // A nested loop.parallel needs insertion of code to compute indices.
----------------
The comment above says "it is only correct if there either is no further loop.parallel", which I cannot match with this code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73893/new/

https://reviews.llvm.org/D73893




