[Mlir-commits] [mlir] [ROCDL] Add the global.atomic.fadd intrinsic in ROCDL (PR #94486)

Thu Jun 6 09:16:52 PDT 2024

arsenm wrote:

> > > Please just use atomicrmw fadd. I will shortly be pushing to remove the intrinsic
> > 
> > 
> > Hi @arsenm, the problem is that `atomicrmw fadd` does not support vectors. So, in the case of `fp16`, this gets translated into a CAS loop, which is very slow
> 
> Or maybe it does?

atomicrmw FP operations have supported vectors since 4cb110a84f587d3c65b85d79ab6fc8aa5489fb86. I still need to implement the AMDGPU codegen changes to start using the vector instructions, though (and eventually the new metadata from #85052 will be needed).
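
For illustration, a minimal sketch of the vector form in LLVM IR. The function name, the address space, the syncscope and the ordering are assumptions chosen for the example, not taken from the PR, and whether this selects a packed hardware instruction rather than a CAS loop depends on the pending codegen work:

```llvm
; Hypothetical example: atomic fadd of a <2 x half> value to global memory.
; addrspace(1), syncscope("agent") and monotonic are illustrative choices.
define <2 x half> @global_fadd_v2f16(ptr addrspace(1) %ptr, <2 x half> %val) {
  ; atomicrmw returns the value that was in memory before the add
  %old = atomicrmw fadd ptr addrspace(1) %ptr, <2 x half> %val syncscope("agent") monotonic, align 4
  ret <2 x half> %old
}
```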

https://github.com/llvm/llvm-project/pull/94486