matthias-springer wrote: Can you post an example that shows the IR after each pattern application? It's hard for me to judge whether the ideal fix would be in MLIR or in Triton. https://github.com/llvm/llvm-project/pull/173436