[PATCH] D120544: [AMDGPU] Omit unnecessary waitcnt before barriers

Wed May 4 04:21:54 PDT 2022

nhaehnle added a comment.

Thank you for the summary.

In D120544#3488714 <https://reviews.llvm.org/D120544#3488714>, @mareko wrote:

> - only HS output shared memory opcodes

This one is tricky.

> - all memory opcodes
> - only global memory opcodes (not buffer or image)
> - only buffer opcodes
> - only image opcodes
> - only buffer opcodes using shader storage buffer object variables (not other buffers if not aliased)
> - only buffer/image opcodes using image variables (image variables can also access buffers, but shouldn't include any other buffers if not aliased)

I suppose the only difference this makes here is potentially in the counter value we wait for? I.e., we may wait for vmcnt(N) with N != 0?
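
To make the vmcnt(N) point concrete, here is a minimal sketch. It assumes a single counter whose memory operations complete in issue order, which is a simplification of real hardware. The function name `required_vmcnt` and the kind labels are hypothetical, not anything in the patch.

```python
from typing import Optional, Sequence, Set


def required_vmcnt(outstanding: Sequence[str], fence_kinds: Set[str]) -> Optional[int]:
    """Return the largest N such that waiting for vmcnt(N) covers the fence.

    `outstanding` lists the kinds of in-flight memory operations, oldest first.
    Assumption: operations retire in issue order, so once at most N operations
    remain outstanding, everything older than the N youngest has completed.
    Returns None when no in-flight operation is relevant to the fence.
    """
    # Walk from the youngest operation backwards; the number of operations
    # younger than the youngest relevant one is the counter value to wait for.
    for younger_count, kind in enumerate(reversed(outstanding)):
        if kind in fence_kinds:
            return younger_count
    return None
```

With `["buffer", "image", "global"]` in flight and a fence that only covers buffer opcodes, the result is 2, i.e. vmcnt(2) rather than vmcnt(0).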

> - fences never wait for opcodes writing into write-only buffers
> - fences never wait for opcodes reading from read-only buffers and images

IMO this one should be modeled like `private/nonprivate` in Vulkan. We had some discussions on this internally but not enough pressure to actually make it happen.

We should make all of this happen. I'm thinking in the direction of a `memoryscopes` operand, somewhat analogous to `syncscope`, except that while syncscope captures the set of threads we're potentially communicating with, memoryscope would capture the set of memory through which we're potentially communicating.
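
A rough sketch of how such an operand could behave, modeled as a bit set. Everything here is hypothetical: the `MemoryScope` names and the `fence_must_wait` helper only illustrate the idea and are not an existing LLVM feature.

```python
from enum import Flag, auto


class MemoryScope(Flag):
    """Hypothetical kinds of memory a fence or access may communicate through."""
    NONE = 0
    GLOBAL = auto()
    BUFFER = auto()
    IMAGE = auto()
    LDS = auto()


def fence_must_wait(fence_scopes: MemoryScope, op_scopes: MemoryScope) -> bool:
    """A fence only has to order against an access if their memory scopes overlap."""
    return bool(fence_scopes & op_scopes)
```

Under this model, a fence tagged with only `MemoryScope.BUFFER` would not need to wait for an outstanding image-only access, while a fence tagged with `BUFFER | IMAGE` would.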

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120544/new/

https://reviews.llvm.org/D120544