[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.

Tue Apr 4 23:52:55 PDT 2023

nhaehnle added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:609
         Op == AtomicRMWInst::Sub ? AtomicRMWInst::Add : Op;
-    if (!NeedResult && ST->hasPermLaneX16()) {
-      // On GFX10 the permlanex16 instruction helps us build a reduction without
-      // too many readlanes and writelanes, which are generally bad for
-      // performance.
-      NewV = buildReduction(B, ScanOp, NewV, Identity);
+    if (IsGraphicsShader) {
+      // First we need to set all inactive invocations to the identity value, so
----------------
arsenm wrote:
> cdevadas wrote:
> > I'm not sure if this should get enabled for all graphics CCs. @foad can you confirm?
> I think part of the point of doing this is to stop special-casing graphics usage. Semantically, the shaderiness shouldn't matter. A strategy switch would be a separate control, if we wanted such a thing.
Let's be clear:

* Using the loop is bound to be slower in almost all cases, often significantly so.
* The fast path is currently always used in graphics.
* We cannot cause such significant performance regressions for graphics.

I agree that if we do have two different paths here, it doesn't make sense to split them into "graphics" vs. "compute"; we should instead select between them with a dedicated switch. The important part is that this switch defaults to the existing fast path for graphics.
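
A minimal CPU-side sketch of the distinction above, assuming a 64-lane wavefront and an integer add. It models "the loop" as a serial walk over the active lanes, and the fast path as the identity-fill reduction named in the diff comment: inactive invocations are set to the identity value, then a fixed log2(64) = 6-step tree reduction runs over the whole wave. `ScanStrategy`, `reduceIterative`, `reduceParallel`, `reduceAdd` and `kDefaultStrategy` are hypothetical names, not LLVM APIs. The step counts are only a rough proxy, since real per-instruction costs differ.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-lane wavefront; exec marks active lanes.
constexpr int kWaveSize = 64;
using Wave = std::array<uint32_t, kWaveSize>;
using ExecMask = uint64_t;

// Hypothetical strategy switch, chosen independently of the calling convention.
enum class ScanStrategy { Iterative, Parallel };

// "Loop" path: visit active lanes one at a time, so cost scales with the
// number of active lanes.
uint32_t reduceIterative(const Wave &v, ExecMask exec, int &steps) {
  uint32_t acc = 0; // identity value for Add
  steps = 0;
  while (exec != 0) {
    int lane = 0;
    while (((exec >> lane) & 1u) == 0) ++lane; // find lowest active lane
    acc += v[lane];
    exec &= exec - 1; // clear the lowest set bit
    ++steps;
  }
  return acc;
}

// Fast path: set inactive lanes to the identity value, then do a fixed
// tree reduction over the whole wave (6 steps for 64 lanes).
uint32_t reduceParallel(const Wave &v, ExecMask exec, int &steps) {
  Wave tmp{};
  for (int lane = 0; lane < kWaveSize; ++lane)
    tmp[lane] = ((exec >> lane) & 1u) ? v[lane] : 0u; // inactive -> identity
  steps = 0;
  for (int offset = kWaveSize / 2; offset > 0; offset /= 2) {
    for (int lane = 0; lane < offset; ++lane)
      tmp[lane] += tmp[lane + offset];
    ++steps;
  }
  return tmp[0];
}

// Dispatch on the switch rather than on "graphics" vs. "compute".
uint32_t reduceAdd(const Wave &v, ExecMask exec, ScanStrategy s, int &steps) {
  return s == ScanStrategy::Iterative ? reduceIterative(v, exec, steps)
                                      : reduceParallel(v, exec, steps);
}

// Hypothetical default: keep the existing fast path.
constexpr ScanStrategy kDefaultStrategy = ScanStrategy::Parallel;
```

With every lane active and lane i holding i + 1, both strategies return 2080, but the iterative walk takes 64 serial steps against 6 for the tree reduction. Because the default lives in one switch, changing it for compute would not silently move graphics off the fast path.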

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D147408