[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.

Tim Renouf via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 11 03:15:30 PST 2019


tpr added a comment.

I don't understand this fix. Surely a reduction is done with just power-of-two shifts. Why do we need the shift by 3 as well? What is the extra wf_sr1 DPP at the start for?
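For reference, here is a minimal sketch of the textbook (Hillis-Steele) scan that the power-of-two argument assumes. It models a 64-lane wavefront on the CPU. The names `Wave`, `shiftRight`, `inclusiveScan` and `reduce` are hypothetical and do not come from the patch. The model also ignores DPP's hardware constraints: on real hardware, DPP row shifts only move data within a 16-lane row, and crossing rows needs broadcasts and row/bank masks. It therefore does not reproduce the sequence in AMDGPUAtomicOptimizer.cpp. It only illustrates that, with unrestricted whole-wavefront shifts, shifts of 1, 2, 4, 8, 16 and 32 lanes are enough.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU model of a 64-lane wavefront: one array element per lane.
constexpr std::size_t kWaveSize = 64;
using Wave = std::array<uint32_t, kWaveSize>;

// Model of an unrestricted whole-wavefront "shift right by n": lane i reads
// the value held by lane i - n, and lanes with no source (i < n) read the
// identity value 0. DPP row/bank restrictions are deliberately ignored.
Wave shiftRight(const Wave &v, std::size_t n) {
  Wave out{};
  for (std::size_t i = n; i < kWaveSize; ++i)
    out[i] = v[i - n];
  return out;
}

// Hillis-Steele inclusive prefix sum in log2(64) = 6 steps, using shifts of
// 1, 2, 4, 8, 16 and 32 lanes. After the step with shift s, lane i holds the
// sum of the original values in lanes max(0, i - 2s + 1) .. i.
Wave inclusiveScan(Wave v) {
  for (std::size_t shift = 1; shift < kWaveSize; shift *= 2) {
    Wave shifted = shiftRight(v, shift);
    for (std::size_t i = 0; i < kWaveSize; ++i)
      v[i] += shifted[i];
  }
  return v;
}

// The full reduction (sum over all lanes) is the value left in the last lane.
uint32_t reduce(const Wave &v) { return inclusiveScan(v)[kWaveSize - 1]; }
```

With every lane holding 1, lane i ends up holding i + 1 after the scan, and the reduction is 64.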



================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:258
+                        {B.getInt32(1), B.getInt32(0), B.getInt32(33)});
+  setConvergent(Ballot);
 
----------------
Do you need the setConvergent call? The intrinsic is already marked convergent in the .td file. This might also apply to the setConvergent calls lower down, but I haven't checked.


https://reviews.llvm.org/D57737