[PATCH] D140537: SIInsertWait: Skip tied source of d16 buffer instruction
Ruiling, Song via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 22 05:49:20 PST 2022
ruiling added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1181
+
+ // D16 buffer instructions does not actually read the tied source
+ // operand, so we can skip the source operand.
----------------
arsenm wrote:
> ruiling wrote:
> > foad wrote:
> > > Should we do the same thing for _d16 DS, FLAT, SCRATCH and GLOBAL instructions?
> > For _d16 DS/FLAT, skipping the tied source does not change code generation, because we always need an s_waitcnt between two successive _d16 ds/flat loads, as they may return out of order. So I would rather not handle them here. I am actually setting the D16Buf bit for global/scratch loads in the parent change. Should I rename `D16Buf` to something else? I will update the comment.
> Since when can't they return out of order?
>
> Could you demonstrate a change by combining a ds load with a global load? The two halves don't need to access the same address space
> Since when can't they return out of order?
Sorry, I don't understand what you are asking. Could you explain?
>
> Could you demonstrate a change by combining a ds load with a global load? The two halves don't need to access the same address space
A ds load followed by a global_load to the same VGPR will always have a WAW dependency. The change here is to remove a false read-after-write dependency.
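To illustrate the distinction between the two dependency kinds, here is a toy C++ sketch. It is not the SIInsertWaitcnts code: `Operand`, `Deps`, `classify`, and the `SkipTiedSrc` flag are hypothetical names invented for this example, and the model only classifies dependencies; it does not decide whether an s_waitcnt is actually required.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model only. These types are hypothetical and are not the data
// structures used in SIInsertWaitcnts.cpp.
struct Operand {
  int Reg;        // VGPR number
  bool IsDef;     // true if the instruction writes this register
  bool IsTiedSrc; // true if this is a use tied to a def that is never read
};

struct Deps {
  bool RAW = false; // read-after-write on a register with a pending load
  bool WAW = false; // write-after-write on a register with a pending load
};

// Classify an instruction's dependencies against the set of registers
// that still have an outstanding load writing to them. When SkipTiedSrc
// is set, tied sources that are not really read are ignored.
Deps classify(const std::vector<Operand> &Ops,
              const std::set<int> &PendingWrites, bool SkipTiedSrc) {
  Deps D;
  for (const Operand &Op : Ops) {
    if (PendingWrites.count(Op.Reg) == 0)
      continue; // no outstanding write to this register
    if (Op.IsDef)
      D.WAW = true;
    else if (!(SkipTiedSrc && Op.IsTiedSrc))
      D.RAW = true;
  }
  return D;
}

// Usage: a d16 load that defines v0 and carries v0 as a tied source,
// issued while an earlier load to v0 is still outstanding.
void example() {
  std::set<int> Pending = {0};
  std::vector<Operand> D16Load = {{0, true, false}, {0, false, true}};

  Deps Before = classify(D16Load, Pending, /*SkipTiedSrc=*/false);
  assert(Before.RAW && Before.WAW); // tied source reported as a RAW

  Deps After = classify(D16Load, Pending, /*SkipTiedSrc=*/true);
  assert(!After.RAW && After.WAW); // only the WAW remains
}
```

In this model, skipping the tied source drops the false RAW while the WAW on the destination VGPR is still reported, which is the case described for a ds load followed by a global_load.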
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140537/new/
https://reviews.llvm.org/D140537