[PATCH] D73815: AMDGPU: Fix divergence analysis of control flow intrinsics
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 3 15:24:38 PST 2020
arsenm added a comment.
I think the only obstacle to eliminating requiresUniformRegister is the treatment of phis with always-uniform inputs. In this example, the LCSSA phi in the return block was incorrectly concluded to be divergent, despite having only one always-uniform input:
Printing analysis 'Legacy Divergence Analysis' for function 'atomic_nand_i32_lds':
DIVERGENT: i32 addrspace(3)* %ptr
:
DIVERGENT: %1 = load i32, i32 addrspace(3)* %ptr, align 4
br label %atomicrmw.start
atomicrmw.start:
%phi.broken = phi i64 [ %4, %atomicrmw.start ], [ 0, %0 ]
DIVERGENT: %loaded = phi i32 [ %1, %0 ], [ %newloaded, %atomicrmw.start ]
DIVERGENT: %2 = and i32 %loaded, 4
DIVERGENT: %new = xor i32 %2, -1
DIVERGENT: %3 = cmpxchg i32 addrspace(3)* %ptr, i32 %loaded, i32 %new seq_cst seq_cst
DIVERGENT: %success = extractvalue { i32, i1 } %3, 1
DIVERGENT: %newloaded = extractvalue { i32, i1 } %3, 0
%4 = call i64 @llvm.amdgcn.if.break.i64(i1 %success, i64 %phi.broken)
DIVERGENT: %5 = call i1 @llvm.amdgcn.loop.i64(i64 %4)
DIVERGENT: br i1 %5, label %atomicrmw.end, label %atomicrmw.start
atomicrmw.end:
DIVERGENT: %newloaded.lcssa = phi i32 [ %newloaded, %atomicrmw.start ]
DIVERGENT: %.lcssa = phi i64 [ %4, %atomicrmw.start ]
DIVERGENT: call void @llvm.amdgcn.end.cf.i64(i64 %.lcssa)
DIVERGENT: ret i32 %newloaded.lcssa
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73815/new/
https://reviews.llvm.org/D73815