[PATCH] D19203: AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic

Mon Apr 18 08:45:00 PDT 2016

nhaehnle added a comment.

Yes, we need consistency between all shader invocations, which can span all the CUs and SEs on the chip. GLSL graphics shaders have no real notion of workgroups. Basically, the instruction needs to make sure that all past memory writes by the shader (strictly speaking, only the 'coherent' and 'volatile' ones) are visible to all other shader invocations. I'm not sure what OpenCL needs.

With this patch, the idea is to implement this by setting glc=1 on the coherent/volatile writes and using a wait. I believe (but have not tried) that an alternative would be to always use glc=0 and, at the memory barrier, wait and then explicitly request an L1 cache flush.

Tom, do you want the numeric counts as input, or just bits that indicate whether to wait for vm/exp/lgkm?
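
To make the two options concrete, here is a rough sketch of how each could map onto the s_waitcnt immediate. It is not code from the patch: all names are hypothetical, and the field layout (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8) is my understanding of the SI/VI encoding.

```cpp
#include <cassert>
#include <cstdint>

// Assumed SI/VI layout of the 16-bit s_waitcnt immediate:
//   vmcnt bits [3:0], expcnt bits [6:4], lgkmcnt bits [11:8].
// A field at its maximum value means "do not wait on this counter".
// All names here are hypothetical and not taken from the patch.
enum : unsigned { VmcntMax = 0xF, ExpcntMax = 0x7, LgkmcntMax = 0xF };

// Option 1: the intrinsic takes the numeric counts directly.
inline uint16_t encodeWaitcnt(unsigned Vm, unsigned Exp, unsigned Lgkm) {
  assert(Vm <= VmcntMax && Exp <= ExpcntMax && Lgkm <= LgkmcntMax);
  return static_cast<uint16_t>(Vm | (Exp << 4) | (Lgkm << 8));
}

// Option 2: the intrinsic takes one bit per counter. A set bit means
// "wait until this counter reaches zero"; a clear bit leaves the field
// at its maximum, so that counter is ignored.
enum WaitBits : unsigned { WaitVm = 1, WaitExp = 2, WaitLgkm = 4 };

inline uint16_t encodeWaitcntFromBits(unsigned Bits) {
  return encodeWaitcnt((Bits & WaitVm) ? 0 : VmcntMax,
                       (Bits & WaitExp) ? 0 : ExpcntMax,
                       (Bits & WaitLgkm) ? 0 : LgkmcntMax);
}
```

Under these assumptions, a memory barrier that only cares about outstanding vector memory operations would pass WaitVm, which encodes to 0x0F70, i.e. vmcnt(0). The bits variant is simpler to use from the frontend, while the numeric variant can also express partial waits such as vmcnt(1).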

http://reviews.llvm.org/D19203