[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.
Pravin Jagtap via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 4 02:32:30 PDT 2023
pravinjagtap added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:463
+
+ for (unsigned LaneIdx = 0; LaneIdx < WaveFrontSize; LaneIdx++) {
+ // Iterate over all the lanes of a wavefront to compute the partial sum. If
----------------
ruiling wrote:
> Why do we choose to unroll the loop over wave-front-size? I think this makes the sp3 assembly hard to read. Shouldn't a loop over active lanes just work?
Hello @ruiling,
One consideration in selecting this approach was its simplicity and the low implementation effort. We know that the most optimized implementation of the scan is DPP with WWM. In the future, this iterative approach will become redundant once the concerns about WWM robustness are addressed. If you and everyone else think that a loop over active lanes is the right thing to do, I will start implementing it.
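To make the two options concrete, here is a rough host-side C++ model of the scan. It is only a sketch and not the code in the patch: the real pass emits LLVM IR, and `LaneValues`, `ScanResult`, `lowestSetBit`, `scanAllLanes`, and `scanActiveLanes` are hypothetical names made up for illustration. It also assumes a 64-lane wavefront, a 64-bit mask of active lanes, and an exclusive prefix sum as the scan.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-side model only: one value per lane plus a bit mask of active lanes.
// All names below are hypothetical and do not come from the patch.
constexpr unsigned WaveFrontSize = 64;
using LaneValues = std::array<uint32_t, WaveFrontSize>;

struct ScanResult {
  LaneValues Prefix{}; // exclusive prefix sum, written for active lanes only
  uint32_t Total = 0;  // sum over all active lanes
};

// Index of the lowest set bit in a non-zero mask.
unsigned lowestSetBit(uint64_t Mask) {
  assert(Mask != 0 && "mask must contain at least one active lane");
  unsigned Idx = 0;
  while (!((Mask >> Idx) & 1))
    ++Idx;
  return Idx;
}

// Variant 1: step through every lane index and skip inactive lanes.
// Fully unrolled, this becomes WaveFrontSize copies of the loop body.
ScanResult scanAllLanes(const LaneValues &Values, uint64_t ActiveMask) {
  ScanResult R;
  for (unsigned LaneIdx = 0; LaneIdx < WaveFrontSize; ++LaneIdx) {
    if (!((ActiveMask >> LaneIdx) & 1))
      continue;
    R.Prefix[LaneIdx] = R.Total; // sum of the active lanes below this one
    R.Total += Values[LaneIdx];
  }
  return R;
}

// Variant 2: visit only the active lanes, lowest first.
// The trip count equals the number of set bits in ActiveMask.
ScanResult scanActiveLanes(const LaneValues &Values, uint64_t ActiveMask) {
  ScanResult R;
  uint64_t Remaining = ActiveMask;
  while (Remaining != 0) {
    unsigned LaneIdx = lowestSetBit(Remaining);
    R.Prefix[LaneIdx] = R.Total;
    R.Total += Values[LaneIdx];
    Remaining &= Remaining - 1; // clear the lane that was just processed
  }
  return R;
}
```

In this model both variants produce the same prefix sums and total for any mask. The difference is that the first always executes `WaveFrontSize` steps, while the second executes one step per active lane and stays a real loop instead of being unrolled.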
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408