[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.
Tim Renouf via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 11 03:15:30 PST 2019
tpr added a comment.
I don't understand this fix. Surely a reduction needs only power-of-two shifts, so why is the shift by 3 needed as well? And what is the extra wf_sr1 DPP at the start for?
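To make the question concrete, here is a minimal sketch of the scheme I have in mind, written as a plain CPU model and not as the code under review. The names `Wave`, `shiftRight`, and `inclusiveScan` are hypothetical, and the model assumes an idealized 64-lane wavefront in which a shift can move data by any lane count. It deliberately ignores any hardware limits on how far a single DPP operation can move data, which may be exactly what the patch is working around.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU model of a 64-lane wavefront; not LLVM or hardware code.
constexpr std::size_t kWaveSize = 64;
using Wave = std::array<std::uint32_t, kWaveSize>;

// Idealized "shift right by N lanes": lane I reads lane I - N, and lanes
// with no source lane read the identity value (0 for addition).
Wave shiftRight(const Wave &In, std::size_t N) {
  Wave Out{}; // value-initialized, so every lane starts at 0
  for (std::size_t I = N; I < kWaveSize; ++I)
    Out[I] = In[I - N];
  return Out;
}

// Inclusive prefix sum using only power-of-two shifts: 1, 2, 4, 8, 16, 32.
// After the last step, lane 63 holds the sum of all lanes (the reduction).
Wave inclusiveScan(Wave V) {
  for (std::size_t Shift = 1; Shift < kWaveSize; Shift *= 2) {
    Wave Shifted = shiftRight(V, Shift);
    for (std::size_t I = 0; I < kWaveSize; ++I)
      V[I] += Shifted[I];
  }
  return V;
}
```

In this model six steps are enough for 64 lanes, and no shift by 3 is ever needed. If the real sequence needs one, the reason is presumably a constraint of the DPP operations that this sketch does not capture.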
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:258
+ {B.getInt32(1), B.getInt32(0), B.getInt32(33)});
+ setConvergent(Ballot);
----------------
Do you need the setConvergent call? The intrinsic is already marked convergent in the .td file. This might also apply to the setConvergent calls lower down, but I haven't checked.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57737/new/
https://reviews.llvm.org/D57737