[Mlir-commits] [mlir] [mlir][gpu] reverse parallel loop to gpu dimension mapping order. (PR #79592)
Guray Ozen
llvmlistbot at llvm.org
Mon Jan 29 07:53:14 PST 2024
================
@@ -78,23 +77,23 @@ static Processor getHardwareIdForMapping(MappingLevel level, int dimension) {
case MapGrid:
switch (dimension) {
case 0:
- return Processor::BlockX;
+ return Processor::BlockZ;
case 1:
return Processor::BlockY;
case 2:
- return Processor::BlockZ;
+ return Processor::BlockX;
----------------
grypp wrote:
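Just my 2 cents: CUDA has a larger limit for block `X` than for `Y` and `Z` (on compute capability 3.0 and later, the grid can hold up to 2^31-1 blocks along x but only 65535 along y and z). The loop-to-block mapping could be chosen based on the loops' trip counts whenever those are compile-time constants.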
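
A minimal sketch of that suggestion, not the code in this PR: the `Processor` enum below is a local stand-in for the one in the patch, and `mapLoopsByTripCount` is a hypothetical helper name. It assumes all three trip counts are known constants and simply gives the loop with the largest trip count to `BlockX`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Local stand-in for the Processor enum touched by the patch; only the
// grid-level values are modeled here.
enum class Processor { BlockX, BlockY, BlockZ };

// Hypothetical helper: given the compile-time-constant trip counts of three
// nested parallel loops, return the processor assigned to each loop, indexed
// by loop position. The loop with the largest trip count gets BlockX.
inline std::array<Processor, 3>
mapLoopsByTripCount(const std::array<int64_t, 3> &tripCounts) {
  for (int64_t tc : tripCounts)
    assert(tc >= 0 && "constant trip counts are expected to be non-negative");

  // Loop positions sorted by descending trip count. The stable sort keeps
  // the original loop order among loops with equal trip counts.
  std::array<int, 3> order = {0, 1, 2};
  std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
    return tripCounts[a] > tripCounts[b];
  });

  // X first because it has the larger limit; Y and Z are interchangeable
  // as far as the limit is concerned.
  const std::array<Processor, 3> byCapacity = {
      Processor::BlockX, Processor::BlockY, Processor::BlockZ};

  std::array<Processor, 3> result{};
  for (int rank = 0; rank < 3; ++rank)
    result[order[rank]] = byCapacity[rank];
  return result;
}
```

With trip counts `{4, 100000, 16}`, the second loop would land on `BlockX`, the third on `BlockY`, and the first on `BlockZ`. When the trip counts are not constant, a fixed order like the one in the diff would still be needed as a fallback.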
https://github.com/llvm/llvm-project/pull/79592