[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.

Tim Renouf via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 11 05:04:29 PST 2019

tpr accepted this revision.
tpr added a comment.

Ah, right, I see about the need for the exclusive and inclusive scan results.

I checked https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ and didn't see any reasoning there for why you need the shift by 3. In the diagram there, I reckon you could change instruction 2 to take the result of instruction 1 as both operands and omit instruction 3 (the shift by 3). That gives the same result and saves one instruction. However, it might actually take one wait state longer, assuming you can't schedule other stuff into the middle of it, which you probably can't.
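
To check the equivalence claimed above, here is a minimal lane-level sketch in Python. It is a simplified model, not real DPP semantics. It assumes row_shr:n makes lane i read lane i-n within its 16-lane row, and that source lanes outside the row read as zero, which is the effect bound_ctrl:0 has in the article's listing. It also assumes that, as in that listing, the DPP modifier applies to the first source operand. The helper names (row_shr, add, original_sequence, suggested_sequence) are hypothetical. The slot counts further assume that a DPP read of a VGPR written by the immediately preceding VALU instruction needs two wait states. The article pads those wait states with v_nop.

```python
ROW = 16  # assumed: row_shr only moves data within a 16-lane row


def row_shr(src, n):
    """Simplified row_shr:n model: lane i reads lane i-n of the same row;
    lanes whose source falls outside the row read 0."""
    out = []
    for i in range(len(src)):
        j = i - n
        same_row = j >= 0 and j // ROW == i // ROW
        out.append(src[j] if same_row else 0)
    return out


def add(a, b):
    """Lane-wise add, standing in for v_add with DPP on the first operand."""
    return [x + y for x, y in zip(a, b)]


def original_sequence(v0):
    # Instruction 1: v1 = v0[row_shr:1] + v0
    v1 = add(row_shr(v0, 1), v0)
    # Instruction 2: v1 = v0[row_shr:2] + v1 (DPP reads v0, not the fresh v1)
    v1 = add(row_shr(v0, 2), v1)
    # Instruction 3: v1 = v0[row_shr:3] + v1, the "shift by 3"
    v1 = add(row_shr(v0, 3), v1)
    return v1


def suggested_sequence(v0):
    # Instruction 1 as before
    v1 = add(row_shr(v0, 1), v0)
    # Instruction 2 uses instruction 1's result as both operands:
    # v1 = v1[row_shr:2] + v1, so instruction 3 is no longer needed
    v1 = add(row_shr(v1, 2), v1)
    return v1


# Issue slots under the assumed two-wait-state DPP hazard:
ORIGINAL_SLOTS = 3       # three DPP adds; none DPP-reads a just-written VGPR
SUGGESTED_SLOTS = 2 + 2  # two DPP adds plus two wait states before the second

if __name__ == "__main__":
    v0 = list(range(1, 17))
    print(original_sequence(v0)[:6])   # [1, 3, 6, 10, 14, 18]
    print(suggested_sequence(v0)[:6])  # [1, 3, 6, 10, 14, 18]
```

Both sequences leave each lane holding the sum of itself and up to three lanes below it in the same row. Under the stated assumptions, the two-instruction version costs four issue slots against three, which matches the "one wait state longer" estimate above.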

So I'm happy now.
