[Mlir-commits] [mlir] [mlir][gpu] reverse parallel loop to gpu dimension mapping order. (PR #79592)

Mon Jan 29 06:03:05 PST 2024

================
@@ -78,23 +77,23 @@ static Processor getHardwareIdForMapping(MappingLevel level, int dimension) {
   case MapGrid:
     switch (dimension) {
     case 0:
-      return Processor::BlockX;
+      return Processor::BlockZ;
     case 1:
       return Processor::BlockY;
     case 2:
-      return Processor::BlockZ;
+      return Processor::BlockX;
----------------
jungpark-mlir wrote:

That's a good point.

I was only considering small GPUs, where having `x` as the innermost dimension could help memory locality, but the situation might be totally different on larger GPUs, where neighbouring blocks in `x` could be spread across different shader cores.
It seems different mappings would benefit different architectures.

Do you think it makes sense to add options? Say, `thread-x-to-z=true` and `block-x-to-z=true` as the defaults, representing the original mapping.
I hope the mapping stays reasonably simple that way, but alternatively I can create a separate pass that reverses the mapping after `gpu-map-parallel-loops`.
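
To make the option idea concrete, here is a rough, standalone sketch of how the grid-level mapping could branch on such a flag. It is not the code in this PR: the `Processor` enum is a local stand-in, and both the helper name `getBlockIdForDimension` and the `blockXToZ` parameter (mirroring the proposed `block-x-to-z` option) are hypothetical.

```cpp
// Local stand-in for the pass's Processor enum; only the block IDs and a
// sequential fallback are modeled here.
enum class Processor { BlockX, BlockY, BlockZ, Sequential };

// Hypothetical helper: maps a grid-level loop dimension to a block ID.
// `blockXToZ` mirrors the proposed `block-x-to-z` option (assumed name).
//   true  -> original order: 0 -> x, 1 -> y, 2 -> z
//   false -> reversed order: 0 -> z, 1 -> y, 2 -> x
Processor getBlockIdForDimension(int dimension, bool blockXToZ = true) {
  switch (dimension) {
  case 0:
    return blockXToZ ? Processor::BlockX : Processor::BlockZ;
  case 1:
    return Processor::BlockY;
  case 2:
    return blockXToZ ? Processor::BlockZ : Processor::BlockX;
  default:
    // Dimensions beyond the third are not mapped to hardware IDs.
    return Processor::Sequential;
  }
}
```

The thread-level mapping could take an analogous flag for `thread-x-to-z`, so the two levels could be reversed independently.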

https://github.com/llvm/llvm-project/pull/79592