[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.

Pravin Jagtap via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 4 02:32:30 PDT 2023


pravinjagtap added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:463
+
+  for (unsigned LaneIdx = 0; LaneIdx < WaveFrontSize; LaneIdx++) {
+    // Iterate over all the lanes of a wavefront to compute the partial sum. If
----------------
ruiling wrote:
> Why do we choose to unroll the loop over wave-front-size? I think this makes the sp3 assembly hard to read. Shouldn't a loop over active lanes just work?
Hello @ruiling,

One reason for choosing this approach is its simplicity and the low implementation effort it requires. We know that the most optimized implementation of the scan is DPP with WWM. Once the concerns about WWM robustness are addressed, this iterative approach will become redundant. If you and the other reviewers think that a loop over active lanes is the right thing to do, I will start implementing it.
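
To make the two options concrete, here is a rough host-side C++ model of the trade-off, not the code in the patch. It assumes the scan is an exclusive prefix sum over the active lanes and that the EXEC mask fits in a 64-bit integer; `ScanResult`, `scanAllLanes`, `scanActiveLanes` and `lowestSetBit` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a wavefront scan; not LLVM or AMDGPU code.
struct ScanResult {
  std::vector<uint32_t> Prefix; // exclusive prefix sum per lane, 0 if inactive
  uint32_t Total;               // sum over all active lanes
};

// Index of the lowest set bit. Precondition: Mask != 0.
static unsigned lowestSetBit(uint64_t Mask) {
  unsigned Idx = 0;
  while (!(Mask & 1)) {
    Mask >>= 1;
    ++Idx;
  }
  return Idx;
}

// Option A: visit every lane index up to the wavefront size and skip the
// inactive ones. Unrolled, this gives WaveFrontSize copies of the body.
ScanResult scanAllLanes(const std::vector<uint32_t> &Values, uint64_t Exec,
                        unsigned WaveFrontSize) {
  ScanResult R{std::vector<uint32_t>(WaveFrontSize, 0), 0};
  for (unsigned LaneIdx = 0; LaneIdx < WaveFrontSize; ++LaneIdx) {
    if (!((Exec >> LaneIdx) & 1))
      continue; // inactive lane contributes nothing
    R.Prefix[LaneIdx] = R.Total;
    R.Total += Values[LaneIdx];
  }
  return R;
}

// Option B: loop only over active lanes by repeatedly taking the lowest set
// bit of the remaining mask. The trip count equals the number of active lanes.
ScanResult scanActiveLanes(const std::vector<uint32_t> &Values, uint64_t Exec,
                           unsigned WaveFrontSize) {
  ScanResult R{std::vector<uint32_t>(WaveFrontSize, 0), 0};
  uint64_t Remaining = Exec;
  if (WaveFrontSize < 64)
    Remaining &= (uint64_t(1) << WaveFrontSize) - 1; // drop bits past the wave
  while (Remaining != 0) {
    unsigned LaneIdx = lowestSetBit(Remaining);
    R.Prefix[LaneIdx] = R.Total;
    R.Total += Values[LaneIdx];
    Remaining &= Remaining - 1; // clear the lane just processed
  }
  return R;
}
```

In this model both functions produce the same per-lane prefixes and the same total. They differ only in how many iterations run and in how much code a full unroll would generate.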


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147408/new/

https://reviews.llvm.org/D147408


