[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 4 23:52:55 PDT 2023
nhaehnle added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:609
Op == AtomicRMWInst::Sub ? AtomicRMWInst::Add : Op;
- if (!NeedResult && ST->hasPermLaneX16()) {
- // On GFX10 the permlanex16 instruction helps us build a reduction without
- // too many readlanes and writelanes, which are generally bad for
- // performance.
- NewV = buildReduction(B, ScanOp, NewV, Identity);
+ if (IsGraphicsShader) {
+ // First we need to set all inactive invocations to the identity value, so
----------------
arsenm wrote:
> cdevadas wrote:
> > I'm not sure if this should get enabled for all graphics CCs. @foad can you confirm?
> I think part of the point of doing this is to stop special casing graphics usage. Semantically the shaderiness shouldn't matter. A strategy switch would be a separate control if we wanted such a thing
Let's be clear:
* Using the loop is bound to be slower in almost all cases, often significantly so.
* The fast path is currently always used in graphics.
* We cannot cause such significant performance regressions for graphics.
I agree that if we do have two different paths here, it doesn't make sense to make them "graphics" vs. "compute", but to instead have a dedicated switch. The important part is that that switch defaults to the existing, fast path for graphics.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408
More information about the llvm-commits
mailing list