[PATCH] D157265: [AMDGPU] Reorder atomic optimizer to avoid CAS loop.

Mon Aug 7 07:50:25 PDT 2023

arsenm added a comment.

In D157265#4565286 <https://reviews.llvm.org/D157265#4565286>, @pravinjagtap wrote:

> In D157265#4565049 <https://reviews.llvm.org/D157265#4565049>, @foad wrote:
>
>>> Expand-Atomic pass emits the CAS loop for FP operations
>>> which limits the optimizations offered by atomic optimizer.
>>>
>>> Moving atomic optimizer before expand-atomics allows
>>> better codegen.
>>
>> So the intention is that you still get a CAS loop, but the whole loop is executed by a single lane, instead of switching into and out of single-lane mode each time around the loop?
>
> At the moment, yes, that's the current behavior. As an extension, I will create a new patch in which atomic-expand calls `simplifyCFG` on the relevant blocks if it emits a CAS loop. [suggested by @arsenm]

AArch64 deals with this by inserting an extra simplifyCFG pass run, which seems excessive given we're only making local changes.
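
For context on the expansion being discussed, a CAS loop for a floating-point atomic add has roughly the shape below. This is a minimal C++ sketch using `std::atomic`, not the IR that the atomic-expand pass actually emits; the helper name `atomic_fadd_cas` is hypothetical and exists only for illustration.

```cpp
#include <atomic>

// Hypothetical helper (illustration only): atomically adds `value` to
// `target` with a compare-and-swap loop and returns the value that was
// observed immediately before the successful update.
float atomic_fadd_cas(std::atomic<float> &target, float value) {
  float expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the currently stored value
  // into `expected`, so each retry recomputes the sum from fresh data.
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
    // Retry until no other thread modified `target` in between.
  }
  return expected;
}
```

The retry loop adds a back-edge and extra basic blocks around what was a single atomic instruction, which is the control flow the comments above are concerned with.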

================
Comment at: llvm/test/CodeGen/AMDGPU/r600.global_atomics.ll:2
+; RUN: llc -march=r600 -mcpu=cypress -amdgpu-atomic-optimizer-strategy=None -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
+; RUN: llc -march=r600 -mcpu=cayman -amdgpu-atomic-optimizer-strategy=None -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

----------------
It seems you're working around the pass not handling r600. Can you just not add the pass in that case?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157265/new/

https://reviews.llvm.org/D157265