Artem-B wrote: The problem was in the Triton code, which had copy-pasted the use of `syncscope("agent")` from the AMDGPU code. Replacing it with `syncscope("device")` resolved the issue. https://github.com/llvm/llvm-project/pull/140812
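
For illustration only, the substitution described above can be sketched as a text rewrite over LLVM IR. This is not how Triton or the linked PR made the change: the IR fragment and the `retarget_syncscope` helper are hypothetical, and a real fix belongs in the code generator that emits the scope name.

```python
import re

# Hypothetical IR fragment for illustration; not taken from Triton or the linked PR.
IR_BEFORE = '''
define i32 @bump(ptr %p) {
  %old = atomicrmw add ptr %p, i32 1 syncscope("agent") monotonic
  ret i32 %old
}
'''


def retarget_syncscope(ir: str, old: str = "agent", new: str = "device") -> str:
    """Replace syncscope("<old>") with syncscope("<new>") in textual LLVM IR."""
    pattern = re.compile(r'syncscope\("' + re.escape(old) + r'"\)')
    return pattern.sub(f'syncscope("{new}")', ir)


IR_AFTER = retarget_syncscope(IR_BEFORE)
print(IR_AFTER)
```

Only the exact scope name passed as `old` is rewritten; any other `syncscope(...)` annotation is left untouched.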