[PATCH] D39060: AMDGPU: Lower buffer store and atomic intrinsics manually

Tue Oct 24 17:40:39 PDT 2017

nhaehnle added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4566-4568
+      MachineMemOperand::MOStore |
+      MachineMemOperand::MODereferenceable |
+      MachineMemOperand::MOVolatile,
----------------
nhaehnle wrote:
> t-tye wrote:
> > nhaehnle wrote:
> > > arsenm wrote:
> > > > nhaehnle wrote:
> > > > > Unfortunately, there's not a lot of documentation on MOVolatile, but I suspect this should not be set at least when GLC == SLC == 0. And I image that that would fix the issue with D39012 as well... (which means the order of patches should be reversed).
> > > > You should not be setting MOVolatile out of nowhere. Adding that defeats what you are trying to accomplish. I also think we aren't setting volatile directly to GLC and the memory legalizer pass is supposed to set GLC.
> > > You're right, MOVolatile should be unnecessary even with GLC. I was thinking of GLSL writes to coherent buffer objects, but those still need memoryBarrier()s for guaranteed ordering. So I agree, buffer stores should never be MOVolatile.
> > MMO now have atomic memory ordering and memory scope that convey how atomics are required to be coherent. The memory legalizer pass uses this information to set glc bit, generate appropriate watcnt, and cache invalidate instructions.
> > 
> > These are separate from the volatile property which has a different purpose.
> > 
> > So if the goal is to request atomic coherence (release/acquire memory model semantics) shouldn't the MMO memory ordering/scope be set correctly?
> This is mostly about stores, not atomics. (GLSL) buffer stores don't imply any ordering by themselves. As far as GLSL is concerned, //some// buffer stores (those to "coherent" buffers) can be combined with memoryBarrier builtins, in which case there are some guarantees about ordering wrt other shader invocations, but the stores themselves provide no such guarantee.
> 
> Maybe the "coherent" flag can be modeled with with those memory scopes - where are they documented?
> 
> In general, I definitely agree that we should use the MMO machinery correctly :)
That said, it of course makes sense to talk about how to set the MMO for buffer_atomic intrinsics as well. "relaxed" (or "monotonic", in LLVM speak) ordering might actually be sufficient for those for GLSL semantics (again, because GLSL kind of wants you to add explicit memoryBarrier() builtin function calls), but I haven't fully thought this through.

https://reviews.llvm.org/D39060