giuseros wrote:

> Please just use atomicrmw fadd. I will shortly be pushing to remove the intrinsic

Hi @arsenm, the problem is that `atomicrmw fadd` does not support vectors, so this gets translated into a CAS loop, which is very slow.

https://github.com/llvm/llvm-project/pull/94486
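
For context, here is a rough C++ analogue of what a CAS-loop expansion of a two-element float add looks like. This is only a sketch to show why it is slower than a single hardware atomic: `Float2`, `pack`, `unpack`, and `atomic_fadd_v2_cas` are hypothetical names made up for illustration, not LLVM APIs, and the code the backend actually emits will differ.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical 2 x f32 "vector" that fits in one 64-bit atomic cell.
struct Float2 { float x, y; };
static_assert(sizeof(Float2) == sizeof(std::uint64_t), "Float2 must pack into 64 bits");

// Reinterpret the two floats as raw bits and back (illustrative helpers).
inline std::uint64_t pack(Float2 v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return bits;
}

inline Float2 unpack(std::uint64_t bits) {
    Float2 v;
    std::memcpy(&v, &bits, sizeof(v));
    return v;
}

// Emulated atomic vector fadd: load, add, and retry the compare-exchange
// until no other thread has modified the cell in between.
// Returns the value that was stored before the add.
inline Float2 atomic_fadd_v2_cas(std::atomic<std::uint64_t>& cell, Float2 addend) {
    std::uint64_t expected = cell.load(std::memory_order_relaxed);
    for (;;) {
        Float2 old_val = unpack(expected);
        Float2 new_val{old_val.x + addend.x, old_val.y + addend.y};
        // On failure, `expected` is refreshed with the current contents
        // and the whole add is recomputed.
        if (cell.compare_exchange_weak(expected, pack(new_val),
                                       std::memory_order_relaxed)) {
            return old_val;
        }
    }
}
```

Under contention every failed compare-exchange repeats the load and the add, whereas a native atomic add instruction completes in a single memory operation.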