[PATCH] D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch.

Sat Feb 8 10:55:41 PST 2020

mravishankar requested changes to this revision.
mravishankar added a comment.
This revision now requires changes to proceed.

(Sorry for jumping on this late)

This is now adding three different lowering of loops to GPU

1. From perfectly nested loops to GPU
2. From imperfectly nested loops to GPU
3. Parallel loops to GPU

I confess I contributed to one of them (2). We should probably have a plan to consolidate these. This is too much duplication of functionality. We should probably deprecate (2) in favor of this change, and adapt all uses accordingly.

================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:503
+
+std::tuple<unsigned, AffineMap, AffineMap>
+extractMapAndOperand(Attribute attribute) {
----------------
rriddle wrote:
> ftynse wrote:
> > I would suggest creating a struct with named fields here for better readability. And document the function plz
> Generally only classes should be within anonmyous namespaces. static functions should be in global scope and marked 'static'.
Add some comments please.

================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:504
+std::tuple<unsigned, AffineMap, AffineMap>
+extractMapAndOperand(Attribute attribute) {
+  DictionaryAttr dict = attribute.dyn_cast<DictionaryAttr>();
----------------
If this is really a dictionary_attribute, this lookup might be simplified using a StructAttr

================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:516
+
+LogicalResult processParallelLoop(ParallelOp parallelOp, gpu::LaunchOp launchOp,
+                                  BlockAndValueMapping &cloning_map,
----------------
Please add some comments about this method, preconditions, etc.

================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:522
+  // processor ids: 0-2 block [x/y/z], 3-5 -> thread [x/y/z], 6-> sequential
+  ArrayAttr mapping = parallelOp.getAttrOfType<ArrayAttr>("mapping");
+  // TODO(herhut): Support reductions.
----------------
I am just coming upto speed on implementation of loop.parallel, but in the ODS I dont see any attribute for "mapping". When is this added?

================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:524
+  // TODO(herhut): Support reductions.
+
+  if (!mapping || parallelOp.getNumResults() != 0)
----------------
nit: Please remove this line. Didnt make the immediate leap to reductions and parallelop.getNumResults() != 0

================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:671
+    // Now walk over the body and clone it.
+    // TODO: This is only correct if there either is no further loop.parallel
+    //       nested or this code is side-effect free. Otherwise we might need
----------------
This comment seems outdated now.

================
Comment at: mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp:703
+  OwningRewritePatternList patterns;
+  patterns.insert<ParallelToGpuLaunchLowering>(&getContext());
+  ConversionTarget target(getContext());
----------------
It would be useful to also expose a method to populate this pattern (and also other patterns that might be needed in the future) in a method like `populateParallelLoopToGPUPatterns` . Then a pass can just import these patterns without having to run a separate pass to do this lowering.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73893/new/

https://reviews.llvm.org/D73893