[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 7 03:54:57 PST 2019
nhaehnle added a comment.
Did you actually test this? The shift-by-3 should be unnecessary.
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:316-317
+ LaneOffset = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, NewV);
+ NewV = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty,
+ B.CreateBinOp(Op, NewV, SetInactive));
----------------
So I hadn't noticed this before, but I think the wwm intrinsic should be applied *after* the readlane below.
With wwm before readlane, there's a theoretical possibility that register allocation splits the live range of the value and inserts a V_MOV in between, which ends up executed with bit 63 disabled, leading to an incorrect result from the readlane.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57737/new/
https://reviews.llvm.org/D57737