[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.
Neil Henning via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 7 05:02:26 PST 2019
sheredom marked an inline comment as done.
sheredom added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:316-317
+ LaneOffset = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, NewV);
+ NewV = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty,
+ B.CreateBinOp(Op, NewV, SetInactive));
----------------
nhaehnle wrote:
> So I hadn't noticed this before, but I think the wwm intrinsic should be applied *after* the readlane below.
>
> With wwm before readlane, there's a theoretical possibility that register allocation splits the live range of the value and inserts a V_MOV in between, which ends up being executed with bit 63 of the exec mask disabled, leading to an incorrect result from the readlane.
Oh yeah - so I was assuming because readlane ignores the exec mask it should be fine, but I can see why that might be an issue. I'll change the code.
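
To make the hazard concrete, here is a rough toy model of it in plain C++. It is only a sketch and not LLVM or hardware code: the names VReg, vMov, readLane and readLane63AfterCopy are made up for illustration, and the only assumptions are the ones from the discussion above, namely that a V_MOV writes only the lanes enabled in the exec mask while readlane ignores the exec mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of a 64-lane vector register (hypothetical, for illustration only).
constexpr unsigned WaveSize = 64;
using VReg = std::array<uint32_t, WaveSize>;

// Models a V_MOV: only lanes whose bit is set in Exec are written.
void vMov(VReg &Dst, const VReg &Src, uint64_t Exec) {
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (Exec & (uint64_t{1} << Lane))
      Dst[Lane] = Src[Lane];
}

// Models a readlane: reads the requested lane regardless of the exec mask.
uint32_t readLane(const VReg &Src, unsigned Lane) { return Src[Lane]; }

// Returns what a readlane of lane 63 sees when a copy (such as one inserted
// by a live-range split) runs under Exec between the scan and the readlane.
uint32_t readLane63AfterCopy(uint64_t Exec) {
  VReg Scan;
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    Scan[Lane] = Lane + 1; // stand-in for a scan result; lane 63 holds 64

  VReg Copy;
  Copy.fill(0xDEADBEEF);  // stale contents of the copy's destination register
  vMov(Copy, Scan, Exec); // the copy only touches lanes enabled in Exec
  return readLane(Copy, 63);
}
```

With every lane enabled, readLane63AfterCopy(~uint64_t{0}) returns the expected 64. With bit 63 cleared, readLane63AfterCopy(~uint64_t{0} >> 1) returns the stale 0xDEADBEEF, which is the incorrect readlane result described above.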
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57737/new/
https://reviews.llvm.org/D57737