[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.

Mon Feb 11 03:15:30 PST 2019

tpr added a comment.

I don't understand this fix. Surely a reduction is done with just power-of-two shifts. Why do we need the shift by 3 as well? What is the extra wf_sr1 DPP at the start for?
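The power-of-two premise can be made concrete with a minimal sketch of a generic Hillis-Steele inclusive add-scan over a wavefront. It assumes a 64-lane wavefront modeled as a plain array and integer addition. The names `Wave`, `shiftRight` and `inclusiveScan` are hypothetical. The sketch does not model DPP row/bank masks, `row_bcast`, or `wf_sr1`, and is not the sequence used in AMDGPUAtomicOptimizer.cpp; it only shows that shifts of 1, 2, 4, 8, 16 and 32 suffice in the unrestricted case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a 64-lane wavefront as a plain array.
// No DPP row/bank masking is modeled here.
constexpr std::size_t kWaveSize = 64;
using Wave = std::array<uint32_t, kWaveSize>;

// Move every lane's value right by `n` lanes; lanes with no source lane
// receive `identity` (0 for an add-scan).
Wave shiftRight(const Wave &in, std::size_t n, uint32_t identity) {
  assert(n <= kWaveSize);
  Wave out{};
  for (std::size_t lane = 0; lane < kWaveSize; ++lane)
    out[lane] = lane >= n ? in[lane - n] : identity;
  return out;
}

// Inclusive add-scan using only power-of-two shifts (1, 2, 4, 8, 16, 32):
// after the step with shift s, lane i holds the sum of up to 2*s inputs
// ending at lane i, so log2(64) = 6 steps cover the whole wavefront.
Wave inclusiveScan(Wave v) {
  for (std::size_t shift = 1; shift < kWaveSize; shift *= 2) {
    Wave shifted = shiftRight(v, shift, 0); // read old values before updating
    for (std::size_t lane = 0; lane < kWaveSize; ++lane)
      v[lane] += shifted[lane];
  }
  return v;
}
```

For example, if lane i holds i + 1, the scan leaves 1 in lane 0 and 2080 (the sum 1..64) in lane 63. On real hardware the DPP row and bank restrictions constrain which of these shifts can be expressed directly, which the sketch deliberately ignores.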

================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:258
+                        {B.getInt32(1), B.getInt32(0), B.getInt32(33)});
+  setConvergent(Ballot);

----------------
Do you need the setConvergent? The intrinsic is already marked convergent in the .td file. This might also apply to the setConvergent calls further down, but I haven't checked.


https://reviews.llvm.org/D57737