grypp wrote: The PR looks good, but we might want to deprecate this operation (`nvvm.barrier0`) and use the `nvvm.barrier` op instead. It is more capable and aligns better with the PTX instruction. https://github.com/llvm/llvm-project/pull/119965