[PATCH] D124158: [Clang][Attr] Skip adding noundef attribute to arguments when function has convergent attribute
krishna chaitanya sankisa via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Apr 27 09:31:17 PDT 2022
skc7 added a comment.
In D124158#3477281 <https://reviews.llvm.org/D124158#3477281>, @jdoerfert wrote:
>> For example, the SimplifyCFG pass removes a branch leading to a basic block that has an incoming value which will always trigger undefined behavior. This modifies the CFG and merges the basic blocks. That works for CPU execution. But on a GPU there are intrinsics like "__shfl_sync(unsigned mask, T var, int srcLane, int width=warpSize)", where the exchange of the variable occurs simultaneously for all active threads within the warp. So in a CUDA/HIP kernel, the variable var passed to __shfl_sync may not be initialised, and LLVM IR treats it as undef. Currently all arguments are tagged with the noundef attribute, so the SimplifyCFG optimization mentioned above gets applied and the kernel execution becomes ambiguous. The proposed change is therefore to skip adding the noundef attribute to arguments when a function has been tagged with the convergent attribute.
>
> Can we please have an example for this. I don't know what would be broken w/ noundef + convergent and I somewhat doubt noundef is the problem.
For the source kernel below, taken from hypre, the optimisation performed by the SimplifyCFG pass caused a problem with kernel execution on the GPU.
We found that enabling noundef analysis by default is what triggers this optimisation.
**source kernel:**
Note: variable t is initially uninitialised and is only initialised when lane is 0.
void kernel(...) {
    double t, measure_row;  // t is left uninitialised here
    int lane = hypre_cuda_get_lane_id<1>();
    ...
    if (lane == 0) { t = read_only_load(measure_diag + row); }  // only lane 0 assigns t
    measure_row = __shfl_sync(HYPRE_WARP_FULL_MASK, t, 0);      // all lanes read lane 0's t
    ...
}
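To make the intended semantics concrete, here is a minimal host-side sketch in plain C++ of what the kernel relies on: every lane passes its own t into the shuffle, but only lane 0's value is ever read. This is only a model, not the real intrinsic. The names kWarpSize, LaneValues, model_shfl_broadcast and run_model are hypothetical, and std::optional stands in for "this lane never initialised t".

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical host-side model of one warp; this is not the CUDA intrinsic.
constexpr std::size_t kWarpSize = 32;

// One slot per lane. An empty optional models a lane whose `t` was never assigned.
using LaneValues = std::array<std::optional<double>, kWarpSize>;

// Model of a broadcast shuffle: every lane receives the value held by src_lane.
// Values held by the other lanes are passed in but never read.
LaneValues model_shfl_broadcast(const LaneValues& var, std::size_t src_lane) {
    LaneValues out;
    out.fill(var[src_lane]);
    return out;
}

// Mirrors the kernel above: only lane 0 assigns t, then all lanes shuffle from lane 0.
LaneValues run_model(double loaded) {
    LaneValues t{};   // every lane starts without a value
    t[0] = loaded;    // if (lane == 0) { t = read_only_load(...); }
    return model_shfl_broadcast(t, 0);
}
```

In this model every lane ends up with lane 0's value even though 31 of the 32 inputs were never set. That is the behaviour the kernel author expects, whereas the per-thread IR sees an undef value flowing into a noundef argument.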
**Example LLVM IR for a similar scenario:**
define void @func(i32 noundef %arg17) {
bb1:
  %i1 = icmp eq i32 %arg17, 0
  br i1 %i1, label %bb2, label %bb3

bb2:                                              ; preds = %bb1
  %i2 = call noundef double @read_only_load()
  br label %bb3

bb3:                                              ; preds = %bb2, %bb1
  %i3 = phi double [ %i2, %bb2 ], [ undef, %bb1 ]
  %i4 = call noundef double @__shfl_sync(double noundef %i3)
  ret void
}

declare double @read_only_load()
declare double @__shfl_sync(double noundef) convergent
**IR Dump After SimplifyCFGPass on func:**
define void @func(i32 noundef %arg17) {
bb1:
  %i1 = icmp eq i32 %arg17, 0
  call void @llvm.assume(i1 %i1)
  %i2 = call noundef double @read_only_load()
  %i4 = call noundef double @__shfl_sync(double noundef %i2)
  ret void
}
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124158/new/
https://reviews.llvm.org/D124158