[PATCH] D57737: [AMDGPU] Fix DPP sequence in atomic optimizer.
    Nicolai Hähnle via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Thu Feb  7 03:54:57 PST 2019
    
    
  
nhaehnle added a comment.
Did you actually test this? The shift-by-3 should be unnecessary.
================
Comment at: lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:316-317
+    LaneOffset = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, NewV);
+    NewV = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty,
+                             B.CreateBinOp(Op, NewV, SetInactive));
 
----------------
So I hadn't noticed this before, but I think the wwm intrinsic should be applied *after* the readlane below.
With wwm before readlane, there's a theoretical possibility that register allocation splits the live range of the value and inserts a V_MOV in between which ends up executed with bit 63 disabled, leading to incorrect results from the readlane.
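
To make the concern concrete, here is a toy model in plain C++. It is only a sketch, not LLVM or hardware code. It assumes a 64-lane wave, a V_MOV that writes only the lanes enabled in EXEC, and a readlane that reads the requested lane regardless of EXEC. The names VReg, vmov, readlane and readLastLaneAfterCopy are made up for illustration, and the zero-initialised copy stands in for whatever the destination register happened to hold before.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only: 64 lanes of one 32-bit vector register.
constexpr unsigned kWaveSize = 64;
using VReg = std::array<uint32_t, kWaveSize>;

// Models a V_MOV: only lanes whose EXEC bit is set are written.
void vmov(VReg &dst, const VReg &src, uint64_t exec) {
  for (unsigned lane = 0; lane < kWaveSize; ++lane)
    if ((exec >> lane) & 1)
      dst[lane] = src[lane];
}

// Models a readlane: reads the requested lane regardless of EXEC
// (an assumption of this model).
uint32_t readlane(const VReg &src, unsigned lane) {
  assert(lane < kWaveSize);
  return src[lane];
}

// Hypothetical helper: the value is produced with all lanes enabled (as
// under WWM), then copied by a V_MOV executed under `copyExec`, as a
// live-range split might do, before the last lane is read.
uint32_t readLastLaneAfterCopy(uint64_t copyExec) {
  VReg scan{};
  for (unsigned lane = 0; lane < kWaveSize; ++lane)
    scan[lane] = lane + 1; // stand-in for the per-lane scan result
  VReg copy{};             // destination of the copy, initially zero
  vmov(copy, scan, copyExec);
  return readlane(copy, kWaveSize - 1);
}
```

In this model, readLastLaneAfterCopy(~0ULL) sees the value that was written to lane 63, while readLastLaneAfterCopy(~0ULL >> 1), where the copy runs with bit 63 disabled, reads back the stale contents of the copy's destination instead. Applying wwm after the readlane should keep any such copy inside the region where all lanes are enabled.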
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57737/new/
https://reviews.llvm.org/D57737
    
    