[PATCH] D76165: [mlir][GPU] Use StructAttr to drive lowering from loop.parallel to gpu.launch

Mahesh Ravishankar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 16 09:14:38 PDT 2020


mravishankar added inline comments.


================
Comment at: mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h:52
+///   the ploopOp.
+/// - the mapping does not map multiple loops to the same processor.
+LogicalResult setMappingAttr(Operation *op,
----------------
herhut wrote:
> This is not illegal.
I can remove that check, but I was going by what is done here: https://github.com/llvm/llvm-project/blob/5c261c9c452959985de19540c168b224af24e2d3/mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp#L673


================
Comment at: mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp:51
+        static_cast<gpu::Processor>(dimAttr.processor().getInt());
+    if (processor != gpu::Processor::SEQUENTIAL &&
+        specifiedMappings.count(processor))
----------------
herhut wrote:
> Why is this invalid? You could map to the same processor but use `ceilDiv` and `modulo` in the `map` attribute to decompose the bound again.
I was going by what is done here: https://github.com/llvm/llvm-project/blob/5c261c9c452959985de19540c168b224af24e2d3/mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp#L673. Sorry if I misread it.
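To make the suggestion concrete, the sketch below shows roughly what such a decomposition could look like (hypothetical IR, not taken from this patch's tests; the constant 4, the bounds, and the exact dict spelling of the mapping attribute are my assumptions, and I use floordiv/mod as the decomposition):

  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c4 = constant 4 : index
  %c16 = constant 16 : index
  // Both dimensions map to processor 0 (block x); the single hardware id
  // is split across the two induction variables by the map expressions.
  // (Bound handling is glossed over here; redefining the bound for the
  // same processor is exactly what the current check rejects.)
  loop.parallel (%i, %j) = (%c0, %c0) to (%c16, %c4) step (%c1, %c1) {
    // loop body
  } {mapping = [
      {processor = 0 : i64, map = affine_map<(d0) -> (d0 floordiv 4)>,
       bound = affine_map<(d0) -> (d0)>},
      {processor = 0 : i64, map = affine_map<(d0) -> (d0 mod 4)>,
       bound = affine_map<(d0) -> (d0)>}]}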


================
Comment at: mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp:83
 /// sequential after.
-static int64_t getHardwareIdForMapping(MappingLevel level, int dimension) {
+/// TODO(ravishankarm/herhut) : Make this use x for the inner-most loop that is
+/// distributed to map to x, the next innermost to y and the next innermost to
----------------
herhut wrote:
> I do not understand this comment.
Maybe this comment is misplaced (I should move it below). At line 136 below, the 0th induction variable is mapped to processor x, the 1st induction variable to processor y, etc. Typically the 0th induction variable is the "outer" parallel loop, the 1st is the next inner one, and so on. There is no strong reason for this ordering, but the innermost parallel loop is typically also the one that accesses data at stride 1 (in elementwise operations, for example), so a default mapping can just try to handle this common case. There are obviously cases where this doesn't work, and maybe a more general mechanism to control which loop dimension maps to which processor dimension would be useful. Is that what the mapping and bound are expected to do?
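To illustrate the current default on a hypothetical 2-d example (bounds made up; the dict spelling of the mapping attribute follows this patch's encoding): %i, the outer loop, gets processor 0 (block x) and %j, the inner loop, gets processor 1 (block y). The TODO would flip this so the innermost, stride-1 loop lands on x.

  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c32 = constant 32 : index
  loop.parallel (%i, %j) = (%c0, %c0) to (%c32, %c32) step (%c1, %c1) {
    // loop body
  } {mapping = [
      {processor = 0 : i64, map = affine_map<(d0) -> (d0)>,
       bound = affine_map<(d0) -> (d0)>},   // %i -> block x
      {processor = 1 : i64, map = affine_map<(d0) -> (d0)>,
       bound = affine_map<(d0) -> (d0)>}]}  // %j -> block y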


================
Comment at: mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir:312
   %four = constant 4 : index
-  // expected-error @+2 {{cannot redefine the bound for processor 1}}
+  // expected-error @+2 {{cannot redefine the bound for processor BLOCKY}}
   // expected-error @+1 {{failed to legalize operation 'loop.parallel'}}
----------------
herhut wrote:
> Now the error message no longer corresponds to what is encoded in the attribute. If I search for `BLOCKY` in the input I would not find it.
I think this is an issue with how a custom StructAttr is parsed while roundtripping. It is implemented as an IntegerAttr, so printing and parsing use the enum's integer value. If you adapt the loop.parallel parser/printer to custom-parse the mapping attribute, you could make it accept keywords like "BLOCKX", "BLOCKY", etc. and print them that way as well; then the error message here would match the input.
For now I reverted the error message to print the processor as it was printed before. I can adapt the parser/printer to roundtrip the processor value as something more meaningful. (I can add that to this change itself, but it would be a breaking change and not an NFC.) Your call.
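For reference, here is roughly how the attribute roundtrips today versus a keyword form a custom parser/printer could support (the string spelling below is hypothetical, just to show the idea):

  // Current roundtrip: the processor field prints as a plain integer, so a
  // diagnostic that mentions BLOCKY has no literal match in the input.
  {mapping = [{processor = 1 : i64, map = affine_map<(d0) -> (d0)>,
               bound = affine_map<(d0) -> (d0)>}]}
  // Hypothetical keyword form after adapting the parser/printer:
  {mapping = [{processor = "BLOCKY", map = affine_map<(d0) -> (d0)>,
               bound = affine_map<(d0) -> (d0)>}]}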


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76165/new/

https://reviews.llvm.org/D76165




