[PATCH] D39060: AMDGPU: Lower buffer store and atomic intrinsics manually

Tue Oct 31 15:49:07 PDT 2017

t-tye added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4566-4568
+      MachineMemOperand::MOStore |
+      MachineMemOperand::MODereferenceable |
+      MachineMemOperand::MOVolatile,
----------------
mareko wrote:
> t-tye wrote:
> > nhaehnle wrote:
> > > nhaehnle wrote:
> > > > t-tye wrote:
> > > > > nhaehnle wrote:
> > > > > > arsenm wrote:
> > > > > > > nhaehnle wrote:
> > > > > > > > Unfortunately, there's not a lot of documentation on MOVolatile, but I suspect this should not be set at least when GLC == SLC == 0. And I image that that would fix the issue with D39012 as well... (which means the order of patches should be reversed).
> > > > > > > You should not be setting MOVolatile out of nowhere. Adding that defeats what you are trying to accomplish. I also think we aren't setting volatile directly to GLC and the memory legalizer pass is supposed to set GLC.
> > > > > > You're right, MOVolatile should be unnecessary even with GLC. I was thinking of GLSL writes to coherent buffer objects, but those still need memoryBarrier()s for guaranteed ordering. So I agree, buffer stores should never be MOVolatile.
> > > > > MMO now have atomic memory ordering and memory scope that convey how atomics are required to be coherent. The memory legalizer pass uses this information to set glc bit, generate appropriate watcnt, and cache invalidate instructions.
> > > > > 
> > > > > These are separate from the volatile property which has a different purpose.
> > > > > 
> > > > > So if the goal is to request atomic coherence (release/acquire memory model semantics) shouldn't the MMO memory ordering/scope be set correctly?
> > > > This is mostly about stores, not atomics. (GLSL) buffer stores don't imply any ordering by themselves. As far as GLSL is concerned, //some// buffer stores (those to "coherent" buffers) can be combined with memoryBarrier builtins, in which case there are some guarantees about ordering wrt other shader invocations, but the stores themselves provide no such guarantee.
> > > > 
> > > > Maybe the "coherent" flag can be modeled with with those memory scopes - where are they documented?
> > > > 
> > > > In general, I definitely agree that we should use the MMO machinery correctly :)
> > > That said, it of course makes sense to talk about how to set the MMO for buffer_atomic intrinsics as well. "relaxed" (or "monotonic", in LLVM speak) ordering might actually be sufficient for those for GLSL semantics (again, because GLSL kind of wants you to add explicit memoryBarrier() builtin function calls), but I haven't fully thought this through.
> > For the AMDGPU target the memory model currently implemented is documented in https://llvm.org/docs/AMDGPUUsage.html#memory-model . It does include the LLVM IR fence. Feel free to ping me if you want to discuss what settings you should use to achieve the GLSL memory model semantics as we worked through doing the OpenCL/HSA memory model mapping.
> > 
> We could do what amdgcn_atomic_inc/dec intrinsics do: have "i1 volatile" as an intrinsic parameter. In the meantime, I'll just remove MOVolatile.
Note that using volatile is not the same think as using the atomic memory_order. So if intrinsics are relying on volatile to indicate an atomic operation without setting the memory_order correctly, then that sounds like a bug that would be good to fix:-)

https://reviews.llvm.org/D39060