[Mlir-commits] [mlir] [mlir][gpu] reverse parallel loop to gpu dimension mapping order. (PR #79592)

Mon Jan 29 07:53:14 PST 2024

================
@@ -78,23 +77,23 @@ static Processor getHardwareIdForMapping(MappingLevel level, int dimension) {
   case MapGrid:
     switch (dimension) {
     case 0:
-      return Processor::BlockX;
+      return Processor::BlockZ;
     case 1:
       return Processor::BlockY;
     case 2:
-      return Processor::BlockZ;
+      return Processor::BlockX;
----------------
grypp wrote:

Just my 2 cents: CUDA has a larger limit on the number of blocks in `X` than in `Y` and `Z`. The loop-to-block mapping could be chosen based on the loops' trip counts whenever those are compile-time constants.
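
As a rough illustration of that suggestion (not code from this PR): on current CUDA devices the grid can hold up to 2^31-1 blocks in `X` but only 65535 in `Y` and `Z`, so one option is to give `X` to the loop with the largest constant trip count. The sketch below is standalone C++ and assumes the trip counts are already known; `BlockDim` and `assignBlockDims` are hypothetical names, not part of MLIR.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the block-level values of the Processor enum.
enum class BlockDim { X, Y, Z };

// For each loop i, returns the block dimension that loop i would be mapped to.
// The loop with the largest trip count gets X, the next largest Y, then Z.
// Ties keep the original loop order (stable sort).
std::vector<BlockDim> assignBlockDims(const std::vector<int64_t> &tripCounts) {
  assert(tripCounts.size() <= 3 && "a grid has at most three dimensions");

  // order[rank] = index of the loop with the rank-th largest trip count.
  std::vector<std::size_t> order(tripCounts.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return tripCounts[a] > tripCounts[b];
                   });

  constexpr std::array<BlockDim, 3> dims = {BlockDim::X, BlockDim::Y,
                                            BlockDim::Z};
  std::vector<BlockDim> result(tripCounts.size());
  for (std::size_t rank = 0; rank < order.size(); ++rank)
    result[order[rank]] = dims[rank];
  return result;
}
```

For example, trip counts `{8, 100000, 64}` would map the second loop to `X`, the third to `Y`, and the first to `Z`. When the trip counts are not compile-time constants, a fixed order such as the one in the patch would still be needed as a fallback.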

https://github.com/llvm/llvm-project/pull/79592