spaceotter wrote: @antiagainst I added the test. Synchronizing the barriers does have a performance cost even if the threads are already synchronized, perhaps more on some architectures than others. https://github.com/llvm/llvm-project/pull/71575