[llvm] [X86, SimplifyCFG] Support hoisting load/store with conditional faulting (Part II) (PR #108812)
Shengchen Kan via llvm-commits
llvm-commits at lists.llvm.org
Sat Sep 28 20:44:55 PDT 2024
KanRobert wrote:
> > > > For cases like this why aren't we folding to:
> > > > ```
> > > > void src(int a, int *c, int *d) {
> > > > if (a)
> > > > *c = a;
> > > > else
> > > > *d = a;
> > > > }
> > > >
> > > > void tgt(int a, int *c, int *d) {
> > > > int *p = a ? c : d;
> > > > *p = a;
> > > > }
> > > > ```
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > or the even more general:
> > > > ```
> > > > void src(int s, int a, int b, int *c, int *d) {
> > > > if (s)
> > > > *c = a;
> > > > else
> > > > *d = b;
> > > > }
> > > >
> > > > void tgt(int s, int a, int b, int *c, int *d) {
> > > > int *p = s ? c : d;
> > > > int v = s ? a : b;
> > > > *p = v;
> > > > }
> > > > ```
> > >
> > >
> > > That's a good question. It doesn't rely on hardware feature, so can be a general branch to select transformation. I'm also wondering why didn't we do it before, especially the first one. Is there any concern on possible sideeffect of memory operand?
> >
> >
> > From this perspective, should we implement this optimization w/o any HW feature and do further optimization w/ CFCMOV at backend?
>
> Actually, we have already had this optimization done by SimplifyCFG, we just need to increase NumPHIInsts from 1 to 2:
>
> ```
> $ git diff
> diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> index 69c4475a494c..488cc18f07e3 100644
> --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> @@ -2360,7 +2360,7 @@ static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
> }
> }
> LLVM_DEBUG(dbgs() << "SINK: #phi insts: " << NumPHIInsts << "\n");
> - return NumPHIInsts <= 1;
> + return NumPHIInsts <= 2;
> };
>
> // We've determined that we are going to sink last ScanIdx instructions,
>
> $ cat tmp.c
> void foo(int a, int *c, int *d) {
> if (a)
> *c = a;
> else
> *d = a;
> }
>
> void bar(int s, int a, int b, int *c, int *d) {
> if (s)
> *c = a;
> else
> *d = b;
> }
>
> $ clang -S tmp.c -O2 -o - | grep -E ':|^\s[a-z]'
> foo: # @foo
> # %bb.0: # %entry
> testl %edi, %edi
> cmoveq %rdx, %rsi
> movl %edi, (%rsi)
> retq
> .Lfunc_end0:
> bar: # @bar
> # %bb.0: # %entry
> testl %edi, %edi
> cmoveq %r8, %rcx
> cmovel %edx, %esi
> movl %esi, (%rcx)
> retq
> ```
Good knowledge! Then it seems this optimization for this case w/ CFCMOV does not have any value...
We just need to add a tuning knob for `NumPHIInsts <= ProfitableToSinkInstructionThreshold`, from my perspective.
https://github.com/llvm/llvm-project/pull/108812
More information about the llvm-commits
mailing list