[Mlir-commits] [mlir] [mlir][gpu] reverse parallel loop to gpu dimension mapping order. (PR #79592)

Mon Jan 29 07:47:53 PST 2024

grypp wrote:

Great observation: mapping thread `x` to the inner loop instead of the outer loop is a sensible adjustment. Thanks for catching that!
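A small Python sketch of the ordering change, to make it concrete. This is not the pass's actual code; `map_dims` and `GPU_DIMS` are hypothetical names. It assumes loop dimensions are listed outermost-first, that only three hardware dimensions (`x`, `y`, `z`) are available, and that any extra dimensions stay sequential. With `reverse=True` the innermost dimension lands on `x`. In a row-major layout that is usually the dimension whose consecutive iterations touch adjacent addresses.

```python
from typing import List

# Hypothetical illustration of the mapping order discussed in this PR,
# not MLIR's actual implementation.
GPU_DIMS = ["x", "y", "z"]

def map_dims(num_dims: int, reverse: bool) -> List[str]:
    """Return the GPU dimension assigned to each loop dimension.

    Loop dimensions are listed outermost-first (index 0 is the outermost).
    Dimensions beyond the three hardware dims are left sequential.
    """
    order = list(range(num_dims))
    if reverse:
        order.reverse()  # innermost loop dimension is assigned first, i.e. to x
    mapping = ["sequential"] * num_dims
    for gpu_dim, loop_dim in zip(GPU_DIMS, order):
        mapping[loop_dim] = gpu_dim
    return mapping

print(map_dims(2, reverse=False))  # ['x', 'y']
print(map_dims(2, reverse=True))   # ['y', 'x']
```

Note that with more than three dimensions the reversed order leaves the outermost dimensions sequential rather than the innermost ones.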

The mapping could be optimized further to make it more useful. For example, the compiler could analyse the memory accesses inside the loops, identify the loop with the most coalesced memory accesses, and map that loop to thread `x`. Trip counts also matter: if a loop's trip count is small (< warpSize), it might be better not to parallelize it across threads, since that would leave lanes underutilized. Moreover, the current implementation maps a loop either to a thread or to a block; mapping a single loop to both is not possible.
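A rough Python sketch of the coalescing and trip-count heuristic described above. Everything here is hypothetical: `LoopInfo`, `pick_thread_x`, and the `coalesced_accesses` count are illustrative stand-ins, and the memory-access analysis that would produce that count is assumed to exist elsewhere. `WARP_SIZE = 32` assumes NVIDIA-style warps.

```python
from dataclasses import dataclass
from typing import List, Optional

WARP_SIZE = 32  # assumption: 32-lane warps

@dataclass
class LoopInfo:
    name: str
    trip_count: Optional[int]  # None if unknown at compile time
    coalesced_accesses: int    # hypothetical: accesses contiguous in this loop's IV

def pick_thread_x(loops: List[LoopInfo]) -> Optional[LoopInfo]:
    """Choose the loop to map to thread x, or None to leave threads unused."""
    candidates = [
        loop for loop in loops
        # Skip loops known to be shorter than a warp: they would leave lanes idle.
        if loop.trip_count is None or loop.trip_count >= WARP_SIZE
    ]
    if not candidates:
        return None
    # Prefer the loop whose iterations touch consecutive addresses most often.
    return max(candidates, key=lambda loop: loop.coalesced_accesses)

loops = [LoopInfo("i", 1024, 0), LoopInfo("j", 1024, 3), LoopInfo("k", 4, 5)]
print(pick_thread_x(loops).name)  # j
```

Here `k` has the most coalesced accesses but only 4 iterations, so it is filtered out and `j` is chosen instead.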

https://github.com/llvm/llvm-project/pull/79592