[PATCH] D40547: AMDGPU: Fix copying i1 value out of loop with non-uniform exit

Mon Jan 29 07:05:01 PST 2018

nhaehnle added a comment.

In https://reviews.llvm.org/D40547#990379, @alex-t wrote:

> In https://reviews.llvm.org/D40547#989137, @nhaehnle wrote:
>
> > In https://reviews.llvm.org/D40547#938894, @alex-t wrote:
> >
> > > If I understand everything correctly...
> > >  The problem you're trying to solve is well known.
> > >  You have a divergent loop exit and a value that is uniformly defined inside the loop but used outside the loop.
> >
> >
> > More or less. However, whether the value is uniform or not doesn't really make a difference: I can change the test case so that %cc is non-uniform, and the same issue occurs. So this isn't really about DivergenceAnalysis.
> >
> > > Could you please look here: https://reviews.llvm.org/D40556
> > > 
> > > Could you use the same approach?
> > > 
> > > You have 2 blocks: defBlock and useBlock and you want to know:
> > > 
> > > 1. Is useBlock control dependent on defBlock?
> > > 2. If 1 is true, is defBlock's termination branch uniform?
> > > 
> > > The set of control dependencies for defBlock is its post-dominance frontier set. The set of control dependencies for useBlock is its post-dominance frontier set. We need to check the branches that are NOT common to the 2 sets above.
> >
> > I don't think this works, but perhaps I'm misunderstanding you. In the test case which I've added, the defBlock is %for.body, and the useBlock is %for.end.
> >
> > %for.end post-dominates the entire loop, so its post-dominance frontier is empty.
> >
> > %for.body post-dominates %entry and %end.loop, so its PDF is only %mid.loop.
> >
> > None of that information seems to help?
>
>
> for.body:
>   %i = phi i32 [0, %entry], [%i.inc, %end.loop]
>   %cc = icmp ult i32 %i, 4                               <-- definition
>   br i1 %cc, label %mid.loop, label %for.end
>
> mid.loop:
>   %v = call float @llvm.amdgcn.buffer.load.f32(<4 x i32> %rsrc, i32 %tid, i32 %i, i1 false, i1 false)
>   %cc2 = fcmp oge float %v, 0.0
>   br i1 %cc2, label %end.loop, label %for.end            <-- divergent branch condition
>
> end.loop:
>   %i.inc = add i32 %i, 1
>   br label %for.body
>
> for.end:
>   br i1 %cc, label %if, label %end                       <-- use
>
> Since the use block's PDF is empty and the def block's PDF contains only one block, "mid.loop", we only need to check the divergence of the "mid.loop" termination branch.
>  Here it's immediately clear that "cc2" is divergent and the branch in "mid.loop" is divergent as well.
>  So, the use in "for.end" is divergent by its control dependency on the divergent branch in "mid.loop".

Right, but what if the uniformness were reversed? We could have something like:

  for.body:
    %i = phi i32 [0, %entry], [%i.inc, %end.loop]
    %cc = icmp ult i32 %i, 4                               <-- uniform definition
    %v = call float @llvm.amdgcn.buffer.load.f32(<4 x i32> %rsrc, i32 %tid, i32 %i, i1 false, i1 false)
    %cc2 = fcmp oge float %v, 0.0
    br i1 %cc2, label %mid.loop, label %for.end <-- divergent branch

  mid.loop:
    ...
    br i1 %cc3, label %end.loop, label %for.end   <-- uniform branch condition defined somewhere else

The same problem would still exist, but the proposed way of looking at PDFs would not detect the situation.
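
To make this concrete, here is a rough Python sketch (not LLVM code) that computes post-dominance frontiers for the CFG above and then applies the proposed check as I understand it, i.e. look only at the branches in blocks that are not common to PDF(defBlock) and PDF(useBlock). The function names are made up for illustration, and the edge from %if to %end is an assumption about the part of the test case that is not shown here.

```python
def post_dominators(succ, exit_node):
    """Iterative data-flow computation of post-dominator sets."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # n is post-dominated by itself and by whatever
            # post-dominates all of its successors.
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def post_dominance_frontier(succ, exit_node):
    """PDF(x) = blocks y such that x post-dominates a successor of y
    but does not strictly post-dominate y itself."""
    pdom = post_dominators(succ, exit_node)
    pdf = {n: set() for n in succ}
    for y in succ:
        for s in succ[y]:
            for x in pdom[s]:
                if x == y or x not in pdom[y]:
                    pdf[x].add(y)
    return pdf


def pdf_check_flags_divergence(pdf, def_block, use_block, divergent_branches):
    """Hypothetical rendering of the proposed check: inspect only the
    blocks that are in exactly one of the two PDF sets."""
    candidates = pdf[def_block] ^ pdf[use_block]
    return any(block in divergent_branches for block in candidates)


# CFG shared by both variants of the example.
# Assumption: %if falls through to %end, which is the single exit.
cfg = {
    "entry": ["for.body"],
    "for.body": ["mid.loop", "for.end"],
    "mid.loop": ["end.loop", "for.end"],
    "end.loop": ["for.body"],
    "for.end": ["if", "end"],
    "if": ["end"],
    "end": [],
}

pdf = post_dominance_frontier(cfg, "end")

# Original test case: the divergent branch is in mid.loop.
original = pdf_check_flags_divergence(pdf, "for.body", "for.end", {"mid.loop"})
# Reversed case: the divergent branch is in for.body itself.
reversed_case = pdf_check_flags_divergence(pdf, "for.body", "for.end", {"for.body"})

print("PDF(for.body) =", pdf["for.body"])
print("PDF(for.end)  =", pdf["for.end"])
print("original flagged:", original)
print("reversed flagged:", reversed_case)
```

In both variants PDF(for.body) is just mid.loop and PDF(for.end) is empty, so the check only ever looks at the branch in mid.loop. It flags the original test case, but in the reversed variant the divergent exit sits in for.body, which is in neither set, so nothing is flagged.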

https://reviews.llvm.org/D40547