[PATCH] D120544: [AMDGPU] Omit unnecessary waitcnt before barriers
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 4 04:21:54 PDT 2022
nhaehnle added a comment.
Thank you for the summary.
In D120544#3488714 <https://reviews.llvm.org/D120544#3488714>, @mareko wrote:
> - only HS output shared memory opcodes
This one is tricky.
> - all memory opcodes
> - only global memory opcodes (not buffer or image)
> - only buffer opcodes
> - only image opcodes
> - only buffer opcodes using shader storage buffer object variables (not other buffers if not aliased)
> - only buffer/image opcodes using image variables (image variables can also access buffers, but shouldn't include any other buffers if not aliased)
I suppose the only difference this makes here is potentially in the counter value we wait for? I.e., we may wait for vmcnt(N) with N != 0?
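
To illustrate the non-zero counter case, here is a minimal sketch. It assumes, purely for illustration, that all outstanding VMEM operations are tracked by a single counter and complete in issue order. `MemKind` and `vmcntForFence` are hypothetical names, not anything in the AMDGPU backend.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical classification of outstanding VMEM operations.
enum class MemKind { Global, Buffer, Image };

// Given the outstanding operations in issue order (oldest first), return the
// N such that waiting until at most N operations remain outstanding
// guarantees that every operation of kind `fenced` has completed.
// ASSUMPTION: one counter, in-order completion. Real hardware is more subtle.
// Returns std::nullopt if no operation of that kind is outstanding, in which
// case no wait is needed at all.
std::optional<std::size_t> vmcntForFence(const std::vector<MemKind> &outstanding,
                                         MemKind fenced) {
  // Scan from the youngest operation towards the oldest.
  for (std::size_t i = outstanding.size(); i-- > 0;) {
    if (outstanding[i] == fenced) {
      // Everything issued after the youngest relevant operation may remain
      // in flight.
      return outstanding.size() - 1 - i;
    }
  }
  return std::nullopt;
}
```

With a buffer store followed by an image load and a global load, a buffer-only fence under these assumptions would only need vmcnt(2) rather than vmcnt(0).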
> - fences never wait for opcodes writing into write-only buffers
> - fences never wait for opcodes reading from read-only buffers and images
IMO this one should be modeled like `private/nonprivate` in Vulkan. We had some discussions on this internally but not enough pressure to actually make it happen.
We should make all of this happen. I'm thinking in the direction of a `memoryscopes` operand, somewhat analogous to `syncscope`, except that while `syncscope` captures the set of threads we're potentially communicating with, `memoryscopes` would capture the set of memory through which we're potentially communicating.
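
As a rough sketch of the idea only, and not a proposed encoding: the operand could be thought of as a set of memory kinds, where a fence needs to order an access only if the two sets overlap. All names below (`MemoryScope`, `fenceCoversAccess`, the individual bits) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical bit set of memory kinds a fence or access may refer to.
enum MemoryScope : std::uint32_t {
  MS_None = 0,
  MS_Global = 1u << 0,
  MS_Buffer = 1u << 1,
  MS_Image = 1u << 2,
  MS_LDS = 1u << 3,
  MS_All = MS_Global | MS_Buffer | MS_Image | MS_LDS,
};

// A fence has to wait for an access only if their memory sets intersect.
constexpr bool fenceCoversAccess(std::uint32_t fenceScopes,
                                 std::uint32_t accessScopes) {
  return (fenceScopes & accessScopes) != 0;
}

// An image-only fence does not need to wait for a buffer access.
static_assert(!fenceCoversAccess(MS_Image, MS_Buffer));
// A fence over all memory covers every kind of access.
static_assert(fenceCoversAccess(MS_All, MS_LDS));
```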
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120544/new/
https://reviews.llvm.org/D120544