[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.

Thu Feb 7 03:54:57 PST 2019

nhaehnle added a comment.

Did you actually test this? The shift-by-3 should be unnecessary.

================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:316-317
+    LaneOffset = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, NewV);
+    NewV = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty,
+                             B.CreateBinOp(Op, NewV, SetInactive));

----------------
So I hadn't noticed this before, but I think the wwm intrinsic should be applied *after* the readlane below.

With wwm before readlane, there's a theoretical possibility that register allocation splits the live range of the value and inserts a V_MOV in between. That V_MOV could end up being executed with bit 63 disabled, leading to incorrect results from the readlane.
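
To make the concern concrete, here is a rough host-side sketch of the failure mode. It is a simplified model, not code from the patch: the names `VGPR`, `vMov`, `readLane` and `readLane63AfterCopy` are hypothetical, and it assumes a 64-lane wave in which a V_MOV writes only the lanes enabled in the exec mask while a readlane reads the requested lane regardless of that mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of one vector register in a 64-lane wave.
constexpr int kWaveSize = 64;
using VGPR = std::array<uint32_t, kWaveSize>;

// Simplified V_MOV: only lanes whose exec bit is set are written.
void vMov(VGPR &dst, const VGPR &src, uint64_t exec) {
  for (int lane = 0; lane < kWaveSize; ++lane)
    if ((exec >> lane) & 1)
      dst[lane] = src[lane];
}

// Simplified readlane: reads the given lane without consulting exec.
uint32_t readLane(const VGPR &src, int lane) { return src[lane]; }

// Returns what a readlane of lane 63 sees when the value has been
// copied by a V_MOV that ran under the given exec mask.
uint32_t readLane63AfterCopy(uint64_t exec) {
  VGPR scan{};
  scan[63] = 42;         // stand-in for the reduction result in lane 63
  VGPR copy;
  copy.fill(0xDEADBEEF); // stale contents of the copy's destination
  vMov(copy, scan, exec);
  return readLane(copy, 63);
}
```

With all exec bits set, `readLane63AfterCopy(~0ull)` yields the value written to lane 63. With bit 63 cleared, for example `readLane63AfterCopy(~0ull >> 1)`, the copy never writes lane 63 and the readlane returns whatever was in the destination register before.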

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57737/new/

https://reviews.llvm.org/D57737