[PATCH] D73815: AMDGPU: Fix divergence analysis of control flow intrinsics
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 3 15:24:38 PST 2020
arsenm added a comment.
I think the only obstacle to eliminating requiresUniformRegister is the treatment of phis with always-uniform inputs. In this example, the LCSSA phi in the return block was incorrectly concluded to be divergent, despite having only one always-uniform input:
Printing analysis 'Legacy Divergence Analysis' for function 'atomic_nand_i32_lds':
DIVERGENT: i32 addrspace(3)* %ptr
:
DIVERGENT: %1 = load i32, i32 addrspace(3)* %ptr, align 4
br label %atomicrmw.start
atomicrmw.start:
%phi.broken = phi i64 [ %4, %atomicrmw.start ], [ 0, %0 ]
DIVERGENT: %loaded = phi i32 [ %1, %0 ], [ %newloaded, %atomicrmw.start ]
DIVERGENT: %2 = and i32 %loaded, 4
DIVERGENT: %new = xor i32 %2, -1
DIVERGENT: %3 = cmpxchg i32 addrspace(3)* %ptr, i32 %loaded, i32 %new seq_cst seq_cst
DIVERGENT: %success = extractvalue { i32, i1 } %3, 1
DIVERGENT: %newloaded = extractvalue { i32, i1 } %3, 0
%4 = call i64 @llvm.amdgcn.if.break.i64(i1 %success, i64 %phi.broken)
DIVERGENT: %5 = call i1 @llvm.amdgcn.loop.i64(i64 %4)
DIVERGENT: br i1 %5, label %atomicrmw.end, label %atomicrmw.start
atomicrmw.end:
DIVERGENT: %newloaded.lcssa = phi i32 [ %newloaded, %atomicrmw.start ]
DIVERGENT: %.lcssa = phi i64 [ %4, %atomicrmw.start ]
DIVERGENT: call void @llvm.amdgcn.end.cf.i64(i64 %.lcssa)
DIVERGENT: ret i32 %newloaded.lcssa
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73815/new/
https://reviews.llvm.org/D73815