[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.

Tue Apr 4 02:32:30 PDT 2023

pravinjagtap added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:463
+
+  for (unsigned LaneIdx = 0; LaneIdx < WaveFrontSize; LaneIdx++) {
+    // Iterate over all the lanes of a wavefront to compute the partial sum. If
----------------
ruiling wrote:
> Why do we choose to unroll the loop over wave-front-size? I think this makes the sp3 assembly hard to read. Shouldn't a loop over active lanes just work?
Hello @ruiling,

One reason for choosing this approach is its simplicity and the low implementation effort it requires. We know that the most optimized implementation of the scan is DPP with WWM, so this iterative approach will become redundant once the concerns about WWM robustness are addressed. If you and the other reviewers think that a loop over the active lanes is the right thing to do, I will start implementing it.
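
To make the two options concrete, here is a minimal host-side C++ sketch of the arithmetic only. It is not the code in the patch, which emits LLVM IR; it assumes a 64-lane wavefront, models the exec mask as a plain `uint64_t`, and the names `ScanResult`, `scanAllLanes`, `scanActiveLanes`, and `lowestSetBit` are hypothetical. `scanAllLanes` visits every lane index, as the unrolled loop does, while `scanActiveLanes` only visits lanes whose bit is set in the mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption for this sketch: a 64-lane wavefront (wave64).
constexpr unsigned WaveFrontSize = 64;

struct ScanResult {
  // Exclusive prefix sum per lane; entries for inactive lanes stay 0.
  std::array<uint32_t, WaveFrontSize> Exclusive{};
  // Sum over all active lanes.
  uint32_t Total = 0;
};

// Index of the lowest set bit. The caller must pass a non-zero mask.
inline unsigned lowestSetBit(uint64_t Mask) {
  assert(Mask != 0 && "mask must have at least one active lane");
  unsigned Idx = 0;
  while (((Mask >> Idx) & 1) == 0)
    ++Idx;
  return Idx;
}

// Option 1: iterate over every lane index and skip the inactive ones.
// The trip count is always WaveFrontSize.
inline ScanResult scanAllLanes(const std::array<uint32_t, WaveFrontSize> &Values,
                               uint64_t ExecMask) {
  ScanResult R;
  for (unsigned LaneIdx = 0; LaneIdx < WaveFrontSize; LaneIdx++) {
    if (((ExecMask >> LaneIdx) & 1) == 0)
      continue; // inactive lane contributes nothing
    R.Exclusive[LaneIdx] = R.Total;
    R.Total += Values[LaneIdx];
  }
  return R;
}

// Option 2: iterate over the active lanes only.
// The trip count equals the number of set bits in ExecMask.
inline ScanResult scanActiveLanes(const std::array<uint32_t, WaveFrontSize> &Values,
                                  uint64_t ExecMask) {
  ScanResult R;
  uint64_t Remaining = ExecMask;
  while (Remaining != 0) {
    unsigned LaneIdx = lowestSetBit(Remaining);
    R.Exclusive[LaneIdx] = R.Total;
    R.Total += Values[LaneIdx];
    Remaining &= Remaining - 1; // clear the lane just processed
  }
  return R;
}
```

Both functions compute the same result for any mask. The difference is that the second one does work proportional to the number of active lanes, and in generated code it would be a single loop body rather than one copy per lane.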

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147408/new/

https://reviews.llvm.org/D147408