[PATCH] D80744: DAGCombiner optimization for pow(x, 0.75) even in case massv function is asked

Tue Jun 2 21:21:09 PDT 2020

steven.zhang added a comment.

In D80744#2068900 <https://reviews.llvm.org/D80744#2068900>, @masoud.ataei wrote:

> In D80744#2067821 <https://reviews.llvm.org/D80744#2067821>, @steven.zhang wrote:
>
> > In D80744#2066590 <https://reviews.llvm.org/D80744#2066590>, @masoud.ataei wrote:
> >
> > > In D80744#2062144 <https://reviews.llvm.org/D80744#2062144>, @steven.zhang wrote:
> > >
> > > > What we are doing is as follows:
> > > >  llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)
> > > >
> > > > It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:
> > > >
> > > >   __sind2_massv --> __sind2_P9 for a Power9 subtarget.
> > > >
> > > >
> > > > Does it make sense ?
> > >
> > >
> > > I agree in general it makes more sense to do this kind of conversions in an IR pass like PPCLowerMASSVEntries. This is what we are currently doing in LLVM but there is a problem here. If we change the llvm intrinsic to libcall earlier than a later optimization (like in DAGCombiner) the later optimization won't be triggered. In the case that I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass will currently change it to __powf4_P* libcall then later in the DAGCombiner we will not get the optimization `pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x))`. So that's a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.
> > >
> > > What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we will we will get above optimization when exponent is 0.75 and MASSV calls otherwise.
> >
> >
> > If we expect the llvm.pow(x, 0.75) to be lowered as two sqrt and for others, they are libcall, can we just don't transform it as libcall for 0.75 in PPCLowerMASSVEntries? And it will be lowered as two sqrts in DAGCombine.
>
>
> For `pow` to be `__powf4_P8`, there are two step:
>
> 1. in LoopVectorizePass,  `pow` becomes `__powf4_massv`, then
> 2. in PPCLowerMASSVEntries `__powf4_massv` becomes `__powf4_P8`
>
>   So when we reach PPCLowerMASSVEntries the pow intrinsic is already a libcall. I thought it is really ugly to undo the LoopVectorizePass conversion in PPCLowerMASSVEntries pass when there is an special case.

As what I see from your test is that, we are trying to lower the pow intrinsic(not pow libcall) to two sqrts or `__powf4_P8/9`. It is different from the case that, `pow -> __powf4_massv -> __powf4_P8` as they are all libcall path.

This is the case I can imagine from your test:
pow -> llvm.pow (llvm opt with Ofast)

So, now, we have the llvm.pow.v4f32 intrinsic now as what your tests show. And we want to lower it to libcall `__powf4_P8/9` if it is not 0.75. So, this looks like to me that, **it is an IR transformation.** If it is llvm.pow.v4f32 intrinsic and the argument is NOT 0.75, transform it to `__powf4_P8/9` if massv vector library is enabled or transform it to `__powf4_massv` at some pass that before PPCLowerMASSVEntries and then, it will be transformed to `__powf4_P8/9` after PPCLowerMASSVEntries.

Maybe, I miss some background here and thank you for your patient.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80744/new/

https://reviews.llvm.org/D80744