[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.

Tim Renouf via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 11 03:43:23 PST 2019


tpr added a comment.

But I still don't understand it:

1. Why do you want an exclusive scan? Surely what you're trying to do is just "sum" up all lanes into lane 63, which is an inclusive scan.
2. Can't you do an exclusive scan with power-of-2 shifts, just like an inclusive scan, but with a wf_sr1 on the front? (Although I think that gives the wrong answer because of (1).)
3. Isn't the only thing wrong with the code before this fix that you forgot to put the bank masks on steps 2, 3 and 4? (Although you're correct to remove the unnecessary intermediate wwm intrinsic calls.)
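
To make (1) and (2) concrete, here is a minimal Python sketch of the two scans. It is an idealized model only: it treats the wavefront as a flat 64-element list and assumes a shift can cross the whole wavefront, so it does not model the real DPP row shifts, broadcasts, or bank/row masks. The helper names (shift_right, inclusive_scan, exclusive_scan) are made up for illustration and are not taken from the patch.

```python
WAVE_SIZE = 64  # assumed wavefront size


def shift_right(values, amount, identity=0):
    """Idealized whole-wavefront shift: lane i reads lane i - amount.

    Lanes with no source lane receive the identity value (0 for add).
    """
    return [values[i - amount] if i >= amount else identity
            for i in range(len(values))]


def inclusive_scan(values):
    """Inclusive add-scan using shifts of 1, 2, 4, ... lanes."""
    result = list(values)
    shift = 1
    while shift < len(result):
        shifted = shift_right(result, shift)
        result = [a + b for a, b in zip(result, shifted)]
        shift *= 2
    return result


def exclusive_scan(values):
    """Exclusive add-scan: shift right by one lane, then scan inclusively."""
    return inclusive_scan(shift_right(values, 1))


lanes = list(range(1, WAVE_SIZE + 1))  # lane i holds the value i + 1
inc = inclusive_scan(lanes)
exc = exclusive_scan(lanes)

# Last lane of the inclusive scan holds the sum of all lanes.
print(inc[-1] == sum(lanes))
# Last lane of the exclusive scan is missing the last lane's own value.
print(exc[-1] == sum(lanes) - lanes[-1])
```

In this model the last lane of the inclusive scan holds the full sum, while the last lane of the exclusive scan is short by lane 63's own contribution, which is the concern behind (1).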


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57737/new/

https://reviews.llvm.org/D57737




