[PATCH] D152649: [AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy.

Mon Jun 19 04:01:40 PDT 2023

foad added a comment.

> The pass seems to take an atomic operation that lowers to a single instruction and replace it with a loop over active lanes, each of which calls that same instruction.

No - it takes an atomic operation that is executed by (we assume) many lanes, and replaces it with an atomic that is executed by only a single lane, because it sits inside some kind of "if (laneid==0)" check.

To make this work you might have to adjust the inputs or outputs of the atomic op, so that it behaves "as if" it had been executed many times by many lanes. E.g. for an atomic add you have to do a plus-reduction of the inputs to the many-lane atomic adds, to get the value to pass into the single-lane atomic add. That's where the loop comes in: it is one way of calculating the plus-reduction. But since the loop is only doing ALU work, it is still supposed to be better than running a whole bunch of serialised atomic memory operations.
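
To illustrate the idea, here is a rough sketch in Python that simulates one wavefront as a list of lanes. It is not the pass's actual code: the function names, the list-of-lanes model and the lane ordering are all assumptions made for the example. It compares the many-lane atomic add with the single-lane version, where a loop over the active lanes computes the plus-reduction (and a running prefix, so each lane can still reconstruct its own return value).

```python
from typing import List, Optional, Tuple

Result = Tuple[int, List[Optional[int]], int]


def per_lane_atomic_add(memory: int, values: List[int], active: List[bool]) -> Result:
    """Hypothetical model of the original code: each active lane does its own atomic add."""
    old: List[Optional[int]] = [None] * len(values)
    atomics = 0
    for lane, (value, is_active) in enumerate(zip(values, active)):
        if is_active:
            old[lane] = memory   # value the atomic returns to this lane
            memory += value      # one serialised atomic memory operation
            atomics += 1
    return memory, old, atomics


def single_lane_atomic_add(memory: int, values: List[int], active: List[bool]) -> Result:
    """Hypothetical model of the rewritten code: reduce in ALU, then one atomic add."""
    total = 0
    prefix: List[Optional[int]] = [None] * len(values)
    # Loop over the active lanes: ALU work only, no memory traffic.
    for lane, (value, is_active) in enumerate(zip(values, active)):
        if is_active:
            prefix[lane] = total  # exclusive prefix sum for this lane
            total += value        # plus-reduction of the inputs

    old: List[Optional[int]] = [None] * len(values)
    atomics = 0
    if any(active):
        base = memory            # value returned by the single atomic
        memory += total          # the only atomic memory operation
        atomics = 1
        # Fix up the outputs "as if" every lane had run its own atomic.
        old = [None if p is None else base + p for p in prefix]
    return memory, old, atomics


values = [3, 1, 4, 1, 5]
active = [True, False, True, True, True]
naive = per_lane_atomic_add(10, values, active)
optimised = single_lane_atomic_add(10, values, active)
```

In this model both versions leave the same value in memory and hand the same results back to each lane, but the second one issues one atomic instead of four. The sketch assumes the per-lane atomics would have been applied in lane order; real hardware makes no such promise, which is why only "as if" behaviour is required.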

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152649/new/

https://reviews.llvm.org/D152649