[PATCH] D80744: DAGCombiner optimization for pow(x, 0.75) even in case massv function is asked

Wed Jun 3 08:15:04 PDT 2020

masoud.ataei added a comment.

In D80744#2070174 <https://reviews.llvm.org/D80744#2070174>, @steven.zhang wrote:

> In D80744#2068900 <https://reviews.llvm.org/D80744#2068900>, @masoud.ataei wrote:
>
> > In D80744#2067821 <https://reviews.llvm.org/D80744#2067821>, @steven.zhang wrote:
> >
> > > In D80744#2066590 <https://reviews.llvm.org/D80744#2066590>, @masoud.ataei wrote:
> > >
> > > > In D80744#2062144 <https://reviews.llvm.org/D80744#2062144>, @steven.zhang wrote:
> > > >
> > > > > What we are doing is as follows:
> > > > >  llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)
> > > > >
> > > > > It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:
> > > > >
> > > > >   __sind2_massv --> __sind2_P9 for a Power9 subtarget.
> > > > >
> > > > >
> > > > > Does it make sense ?
> > > >
> > > >
> > > > I agree in general it makes more sense to do this kind of conversions in an IR pass like PPCLowerMASSVEntries. This is what we are currently doing in LLVM but there is a problem here. If we change the llvm intrinsic to libcall earlier than a later optimization (like in DAGCombiner) the later optimization won't be triggered. In the case that I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass will currently change it to __powf4_P* libcall then later in the DAGCombiner we will not get the optimization `pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x))`. So that's a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.
> > > >
> > > > What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we will we will get above optimization when exponent is 0.75 and MASSV calls otherwise.
> > >
> > >
> > > If we expect the llvm.pow(x, 0.75) to be lowered as two sqrt and for others, they are libcall, can we just don't transform it as libcall for 0.75 in PPCLowerMASSVEntries? And it will be lowered as two sqrts in DAGCombine.
> >
> >
> > For `pow` to be `__powf4_P8`, there are two step:
> >
> > 1. in LoopVectorizePass,  `pow` becomes `__powf4_massv`, then
> > 2. in PPCLowerMASSVEntries `__powf4_massv` becomes `__powf4_P8`
> >
> >   So when we reach PPCLowerMASSVEntries the pow intrinsic is already a libcall. I thought it is really ugly to undo the LoopVectorizePass conversion in PPCLowerMASSVEntries pass when there is an special case.
>
>
> As what I see from your test is that, we are trying to lower the pow intrinsic(not pow libcall) to two sqrts or `__powf4_P8/9`. It is different from the case that, `pow -> __powf4_massv -> __powf4_P8` as they are all libcall path.
>
> If your intention is to have `__powf4_massv` transformed to two fsqrts if argument is 0.75, the test is missing in this patch and I am not sure if this patch could work. You might need to transform the  `__powf4_massv` to llvm.pow intrinsic at some place like PartiallyInlineLibCallsLegacyPass if the argument is 0.75
>
> Maybe, I miss some background here and I just want to get a clear understanding of the reason why we have to do it in DAG. Thank you for your patience.

Thank you for reviewing this patch.

I was also thinking that I needed to revert the conversion in LoopVectorizePass and PPCLowerMASSVEntries to llvm intrinsic llvm.pow before optimization to sqrt's happening. But it seems just with setting the operation action to custom `setOperationAction(ISD::FPOW, MVT::v4f32, Custom);` we will get the sqrt's optimization when pow(x,0.75) is used in the code.

I tested with a c code too

  $ cat vst.c 
  #include<math.h>
  void my_vspow_075 (float y[], float x[]) {
    #pragma disjoint (*y, *x)

    float *xp=x, *yp=y;
    int i;

    for (i=0; i<1024; i++) {
       *yp=powf(*xp, 0.75);
       xp++;
       yp++;
    }
  }

compile it with `clang -Ofast -fveclib=MASSV vst.c -mllvm -print-after-all`. I can see that after LoopVectorizePass, llvm.pow.f32 is converted to __powf4_massv and after PPCLowerMASSVEntries __powf4_massv is converted to __powf4_P8. But later we will get conversion to sqrt's. This is happening if you just set `setOperationAction(ISD::FPOW, MVT::v4f32, Custom);` no other changes. The rest of changes in PPCISelLowring.cpp is needed to handle the cases of powf(x,y) when y is not 0.75.

About whether we apply this changes in LoopVectorizePass, PPCLowerMASSVEntries or in PPCISelLowing, I was not the only person that decided to handle it in PPCISelLowing. I will be happy if they can speak up and help me to correctly answer your question. Their names are in the list of reviewers so I won't tag them here again.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80744/new/

https://reviews.llvm.org/D80744