[PATCH] D133840: AMDGPU: Add a pass to rewrite certain undef in PHI

Thu Sep 15 00:41:13 PDT 2022

ruiling added a comment.

In D133840#3791463 <https://reviews.llvm.org/D133840#3791463>, @sameerds wrote:

> In D133840#3791418 <https://reviews.llvm.org/D133840#3791418>, @ruiling wrote:
>
>> Move the pass after SIAnnotateControlFlow.
>> Simplify the backedge undef logic since joining divergent threads at loop header has been handled by SIAnnotateControlFlow.
>
> Is there an advantage in making this assumption that it always runs after SIAnnotateControlFlow? It makes the pass less useful for anything other than the current AMDGPU. If there is no harm in keeping the backedge logic, then the pass will remain useful for any future use where such loops do occur.

Please see my inline comment, we cannot work correctly for such kind of loops which may appear before SIAnnotateControlFlow for now. We should move the block-split logic inside SIAnnotateControlFlow into structurizer itself later, then the pass should be able to work with any pass after StructurizeCFG.

================
Comment at: llvm/test/CodeGen/AMDGPU/rewrite-undef-for-phi.ll:92
+  %ind = phi i32 [ 0, %entry ], [ 0, %if ], [ %inc, %loop ]
+  %c2 = phi float [ undef, %if ], [ %c, %entry ], [ undef, %loop ]
+  %inc = add i32 %ind, 1
----------------
ruiling wrote:
> In case we have different IR "%c2 = phi float [ undef, %if ], [ %c, %entry ], [ %some_other_value, %loop ]", %c2 will be divergent after StructurizeCFG. But such kind of "joining divergent threads in loop header" will be split by SIAnnotateControlFlow. So %c2 will be lowered to something like:
> ```
> split.cf
>   %c2_split = phi float [ undef, %if ], [ %c, %entry ]
>   call void @llvm.amdgcn.end.cf()
>   br label %loop
> 
> loop:
>   %c2 = phi float [%c2_split, %split.cf], [%some_other_value, %loop]
> ```
> %c2_split should also be rewritten by this pass. So I have to move the pass after SIAnnotateControlFlow to handle this situation.
> 
> 
> In case we have different IR "%c2 = phi float [ undef, %if ], [ %c, %entry ], [ %some_other_value, %loop ]", %c2 will be divergent after StructurizeCFG. But such kind of "joining divergent threads in loop header" will be split by SIAnnotateControlFlow. So %c2 will be lowered to something like:
> ```
> split.cf
>   %c2_split = phi float [ undef, %if ], [ %c, %entry ]
>   call void @llvm.amdgcn.end.cf()
>   br label %loop
> 
> loop:
>   %c2 = phi float [%c2_split, %split.cf], [%some_other_value, %loop]
> ```
> %c2_split should also be rewritten by this pass. So I have to move the pass after SIAnnotateControlFlow to handle this situation.
> 
> 

================
Comment at: llvm/test/CodeGen/AMDGPU/rewrite-undef-for-phi.ll:92
+  %ind = phi i32 [ 0, %entry ], [ 0, %if ], [ %inc, %loop ]
+  %c2 = phi float [ undef, %if ], [ %c, %entry ], [ undef, %loop ]
+  %inc = add i32 %ind, 1
----------------
ruiling wrote:
> ruiling wrote:
> > In case we have different IR "%c2 = phi float [ undef, %if ], [ %c, %entry ], [ %some_other_value, %loop ]", %c2 will be divergent after StructurizeCFG. But such kind of "joining divergent threads in loop header" will be split by SIAnnotateControlFlow. So %c2 will be lowered to something like:
> > ```
> > split.cf
> >   %c2_split = phi float [ undef, %if ], [ %c, %entry ]
> >   call void @llvm.amdgcn.end.cf()
> >   br label %loop
> > 
> > loop:
> >   %c2 = phi float [%c2_split, %split.cf], [%some_other_value, %loop]
> > ```
> > %c2_split should also be rewritten by this pass. So I have to move the pass after SIAnnotateControlFlow to handle this situation.
> > 
> > 
> > In case we have different IR "%c2 = phi float [ undef, %if ], [ %c, %entry ], [ %some_other_value, %loop ]", %c2 will be divergent after StructurizeCFG. But such kind of "joining divergent threads in loop header" will be split by SIAnnotateControlFlow. So %c2 will be lowered to something like:
> > ```
> > split.cf
> >   %c2_split = phi float [ undef, %if ], [ %c, %entry ]
> >   call void @llvm.amdgcn.end.cf()
> >   br label %loop
> > 
> > loop:
> >   %c2 = phi float [%c2_split, %split.cf], [%some_other_value, %loop]
> > ```
> > %c2_split should also be rewritten by this pass. So I have to move the pass after SIAnnotateControlFlow to handle this situation.
> > 
> > 
> 
> 
This is the major motivation to move the pass after SIAnnotateControlFlow, because we cannot work correctly for this case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133840/new/

https://reviews.llvm.org/D133840