[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.

Tue Apr 11 06:09:01 PDT 2023

pravinjagtap updated this revision to Diff 512415.
pravinjagtap edited the summary of this revision.
pravinjagtap added a comment.

Implemented @ruiling's suggestions. In this approach, we iterate over only the active lanes of a wavefront, using `llvm.cttz` to precompute an exclusive scan.
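A rough host-side model of the active-lane iteration described above may help reviewers. It is a sketch under stated assumptions, not the IR the pass emits. It assumes a 64-lane wavefront, an integer add as the atomic operation, and a 64-bit exec mask. `iterativeExclusiveScan` and `ScanResult` are hypothetical names, and `__builtin_ctzll` (a GCC/Clang builtin) stands in for `llvm.cttz`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of one 64-lane wavefront; host-side illustration only.
constexpr int kWaveSize = 64;

struct ScanResult {
  std::array<uint32_t, kWaveSize> exclusive{};  // per-lane exclusive prefix (stays 0 for inactive lanes)
  uint32_t total = 0;                           // reduction over all active lanes
};

// Visit only the active lanes, lowest index first, by repeatedly taking the
// index of the lowest set bit of the remaining mask (the role llvm.cttz plays).
// The loop runs popcount(execMask) times, one taken branch per active lane.
ScanResult iterativeExclusiveScan(const std::array<uint32_t, kWaveSize>& values,
                                  uint64_t execMask) {
  ScanResult r;
  uint64_t remaining = execMask;
  while (remaining != 0) {
    int lane = __builtin_ctzll(remaining);  // count trailing zeros (remaining != 0 here)
    r.exclusive[lane] = r.total;            // sum of the active lanes before this one
    r.total += values[lane];
    remaining &= remaining - 1;             // clear the lowest set bit
  }
  return r;
}
```

In this model, a single lane would then issue one atomic add of `total`. Each active lane would add its `exclusive` value to the old value returned by that atomic to recover the result it would have seen on its own.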

I have attempted an unrolled version of this loop to avoid the conditional cost of the `taken branch`, but unfortunately compile time increases significantly, because we need to create 64 basic blocks for each atomic operation (one for each active lane).

TODO:

- I have not yet finalized the dedicated switch between `graphics` and `compute`, and I am not sure how this should be addressed. If we default to DPP, compute users will need to set the flag explicitly to select this iterative approach.
- Add a device function test.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147408/new/

https://reviews.llvm.org/D147408

Files:
  llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll
  llvm/test/CodeGen/AMDGPU/buffer-intrinsics-mmo-offsets.ll
  llvm/test/CodeGen/AMDGPU/dag-divergence-atomic.ll
  llvm/test/CodeGen/AMDGPU/gds-allocation.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/AMDGPU/noclobber-barrier.ll
  llvm/test/CodeGen/AMDGPU/should-not-hoist-set-inactive.ll