[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.
Pravin Jagtap via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 11 06:09:01 PDT 2023
pravinjagtap updated this revision to Diff 512415.
pravinjagtap edited the summary of this revision.
pravinjagtap added a comment.
Implemented @ruiling's suggestions. In this approach, we iterate over only the active lanes of a wavefront using `llvm.cttz` to precompute an exclusive scan.
I also tried an unrolled version of this loop to avoid the cost of the taken branch, but unfortunately the compile-time cost increases exponentially, because we need to create 64 basic blocks for each atomic operation (one for each active lane).
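For illustration, the iteration over active lanes can be modelled on the host roughly as follows. This is only a sketch under stated assumptions, not the IR the pass emits: `ScanResult` and `iterativeExclusiveScan` are hypothetical names, the operation is assumed to be an integer add, the wavefront is assumed to have 64 lanes, and the GCC/Clang builtin `__builtin_ctzll` stands in for `llvm.cttz`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side model of the per-wavefront loop (assumes wave64, add).
struct ScanResult {
  std::array<uint32_t, 64> exclusive; // per-lane exclusive prefix; inactive lanes stay 0
  uint32_t total;                     // reduction over all active lanes
};

inline ScanResult iterativeExclusiveScan(uint64_t activeMask,
                                         const std::array<uint32_t, 64> &values) {
  ScanResult r{};   // value-initialized: all prefixes and the total start at 0
  uint32_t acc = 0; // running sum of the lanes visited so far
  while (activeMask != 0) {
    // Index of the lowest remaining active lane (stand-in for llvm.cttz).
    unsigned lane = static_cast<unsigned>(__builtin_ctzll(activeMask));
    r.exclusive[lane] = acc;      // prefix excludes this lane's own value
    acc += values[lane];
    activeMask &= activeMask - 1; // clear the lane that was just processed
  }
  r.total = acc;
  return r;
}
```

The loop runs once per active lane rather than once per lane in the wavefront, which is the point of visiting lanes through a count-trailing-zeros step.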
TODO:
- The dedicated switch between `graphics` and `compute` is not finalized. I am not sure how this should be addressed. If we default to DPP, then compute users need to explicitly set the flag to select this iterative approach.
- Device function test.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408
Files:
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll
llvm/test/CodeGen/AMDGPU/buffer-intrinsics-mmo-offsets.ll
llvm/test/CodeGen/AMDGPU/dag-divergence-atomic.ll
llvm/test/CodeGen/AMDGPU/gds-allocation.ll
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
llvm/test/CodeGen/AMDGPU/noclobber-barrier.ll
llvm/test/CodeGen/AMDGPU/should-not-hoist-set-inactive.ll