[llvm] [X86, SimplifyCFG] Support hoisting load/store with conditional faulting (Part II) (PR #108812)

Sat Sep 28 20:44:55 PDT 2024

KanRobert wrote:

> > > > For cases like this why aren't we folding to:
> > > > ```
> > > > void src(int a, int *c, int *d) {
> > > >   if (a)
> > > >    *c = a;
> > > >   else
> > > >    *d = a;
> > > > }
> > > > 
> > > > void tgt(int a, int *c, int *d) {
> > > >   int *p = a ? c : d;
> > > >   *p = a;
> > > > }
> > > > ```
> > > > 
> > > > 
> > > >     
> > > >       
> > > >     
> > > > 
> > > >       
> > > >     
> > > > 
> > > >     
> > > >   
> > > > or the even more general:
> > > > ```
> > > > void src(int s, int a, int b, int *c, int *d) {
> > > >   if (s)
> > > >    *c = a;
> > > >   else
> > > >    *d = b;
> > > > }
> > > > 
> > > > void tgt(int s, int a, int b, int *c, int *d) {
> > > >   int *p = s ? c : d;
> > > >   int v = s ? a : b;
> > > >   *p = v;
> > > > }
> > > > ```
> > > 
> > > 
> > > That's a good question. It doesn't rely on hardware feature, so can be a general branch to select transformation. I'm also wondering why didn't we do it before, especially the first one. Is there any concern on possible sideeffect of memory operand?
> > 
> > 
> > From this perspective, should we implement this optimization w/o any HW feature and do further optimization w/ CFCMOV at backend?
> 
> Actually, we have already had this optimization done by SimplifyCFG, we just need to increase NumPHIInsts from 1 to 2:
> 
> ```
> $ git diff
> diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> index 69c4475a494c..488cc18f07e3 100644
> --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> @@ -2360,7 +2360,7 @@ static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
>          }
>        }
>        LLVM_DEBUG(dbgs() << "SINK: #phi insts: " << NumPHIInsts << "\n");
> -      return NumPHIInsts <= 1;
> +      return NumPHIInsts <= 2;
>      };
> 
>      // We've determined that we are going to sink last ScanIdx instructions,
> 
> $ cat tmp.c
> void foo(int a, int *c, int *d) {
>   if (a)
>    *c = a;
>   else
>    *d = a;
> }
> 
> void bar(int s, int a, int b, int *c, int *d) {
>   if (s)
>    *c = a;
>   else
>    *d = b;
> }
> 
> $ clang -S tmp.c -O2 -o - | grep -E ':|^\s[a-z]'
> foo:                                    # @foo
> # %bb.0:                                # %entry
>         testl   %edi, %edi
>         cmoveq  %rdx, %rsi
>         movl    %edi, (%rsi)
>         retq
> .Lfunc_end0:
> bar:                                    # @bar
> # %bb.0:                                # %entry
>         testl   %edi, %edi
>         cmoveq  %r8, %rcx
>         cmovel  %edx, %esi
>         movl    %esi, (%rcx)
>         retq
> ```

Good knowledge! Then it seems this optimization for this case w/ CFCMOV does not have any value...
We just need to add a tuning knob for `NumPHIInsts <= ProfitableToSinkInstructionThreshold`, from my perspective.

https://github.com/llvm/llvm-project/pull/108812