[PATCH] D74963: [MLIR][GPU] Implement a simple greedy loop mapper.

Alex Zinenko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 24 06:22:55 PST 2020


ftynse accepted this revision.
ftynse added a comment.
This revision is now accepted and ready to land.

Mostly nits from my side. Feel free to address and land.



================
Comment at: mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h:1
+//===- MemoryPromotion.h - Utilities for moving data across GPU -*- C++ -*-===//
+//
----------------
Copy-pasta in the header


================
Comment at: mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h:26
+/// mapped to sequential loops.
+void greedilyMapParallelLoopsToGPU(FuncOp op);
+
----------------
Do you care about the op being a FuncOp? I'd just go with a body region here.
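A rough sketch of the signature I have in mind (illustrative only; the function name is kept from the patch, taking a region is my suggestion):

```
/// Maps the parallel loops found in the given region to GPU hardware ids.
void greedilyMapParallelLoopsToGPU(Region &region);
```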


================
Comment at: mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h:31
+#endif // MLIR_DIALECT_GPU_PARALLELLOOPMAPPER_H
\ No newline at end of file

----------------
Nit: missing newline at the end of file


================
Comment at: mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp:1
+//===- MemoryPromotion.cpp - Utilities for moving data across GPU memories ===//
+//
----------------
Also copy-pasta.


================
Comment at: mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp:34
+/// level unless Sequential was already reached.
+MappingLevel &operator++(MappingLevel &mappingLevel) {
+  if (mappingLevel < Sequential) {
----------------
Nit: the convention is to declare file-local types in an anonymous namespace, but to mark file-local functions `static` outside of such namespaces.
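Roughly like this (sketch only; the enum values are illustrative, not copied from the patch):

```
namespace {
/// File-local types go into an anonymous namespace.
enum MappingLevel { MapGrid = 0, MapBlock = 1, Sequential = 2 };
} // namespace

/// File-local functions are marked static instead.
static MappingLevel &operator++(MappingLevel &mappingLevel) {
  if (mappingLevel < Sequential)
    mappingLevel = static_cast<MappingLevel>(mappingLevel + 1);
  return mappingLevel;
}
```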


================
Comment at: mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp:44
+/// sequential after.
+int64_t getHardwareIdForMapping(MappingLevel level, int dimension) {
+  if (dimension >= kNumHardwareIds || level == Sequential)
----------------
This looks like it would be better placed next to the code that consumes the hardware ids, that is, the ploop->gpu.launch transformation.


================
Comment at: mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp:66-67
+    entries.emplace_back(
+        Identifier::get("processor", ctx),
+        b.getI64IntegerAttr(getHardwareIdForMapping(mappingLevel, i)));
+    entries.emplace_back(Identifier::get("map", ctx),
----------------
`b.getNamedAttr("processor", b.getI64IntegerAttr(...))`
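I.e., something along these lines (sketch, assuming the same surrounding variables as in the patch):

```
entries.emplace_back(b.getNamedAttr(
    "processor",
    b.getI64IntegerAttr(getHardwareIdForMapping(mappingLevel, i))));
```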


================
Comment at: mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp:74
+  }
+  parallelOp.setAttr("mapping", ArrayAttr::get(attrs, ctx));
+  ++mappingLevel;
----------------
Nit: let's factor out reused strings into named constants
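E.g. something like this (names are only a suggestion):

```
/// Attribute names used by the mapping annotation, kept in one place so the
/// producer and consumers of the annotation agree on the spelling.
static constexpr const char *kMappingAttrName = "mapping";
static constexpr const char *kProcessorEntryName = "processor";
static constexpr const char *kMapEntryName = "map";

// ... later ...
parallelOp.setAttr(kMappingAttrName, ArrayAttr::get(attrs, ctx));
```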


================
Comment at: mlir/test/Dialect/GPU/mapping.mlir:56
+// CHECK-SAME:             {bound = affine_map<(d0) -> (d0)>, map = affine_map<(d0) -> (d0)>, processor = 5 : i64}, 
+// CHECK-SAME:             {bound = affine_map<(d0) -> (d0)>, map = affine_map<(d0) -> (d0)>, processor = 6 : i64}]}
+// CHECK:      {mapping = [{bound = affine_map<(d0) -> (d0)>, map = affine_map<(d0) -> (d0)>, processor = 0 : i64}, 
----------------
This test made me wonder about the semantic choice of mapping an inner loop in a loop nest to "sequential" execution as opposed to "don't care about order". Saying it is sequential implies that the first iteration must complete before the second one starts, and so on. That is neither the semantics of the original loop nest nor that of the generated code, which does not include a synchronization after the nested parallel loop.


================
Comment at: mlir/test/lib/Transforms/TestGpuParallelLoopMapping.cpp:1
+//===- TestGPUMemoryPromotionPass.cpp - Test pass for GPU promotion -------===//
+//
----------------
Copy-pasta


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74963/new/

https://reviews.llvm.org/D74963




