[PATCH] D37616: [X86] PR34149 Suboptimal codegen for fast minnum and maxnum.

Sun Sep 10 10:32:36 PDT 2017

jbhateja added a comment.

In https://reviews.llvm.org/D37616#865981, @spatel wrote:

> In https://reviews.llvm.org/D37616#865798, @jbhateja wrote:
>
> > In https://reviews.llvm.org/D37616#865418, @spatel wrote:
> >
> > > I might be missing some context here. If we have fast/nnan on these calls, then can't we simplify this in IR to fmp+select and not have to deal with this in the backend? The intrinsics only exist to make sure that NaN behavior in IR meets the higher level standards, so if we have nnan, then we don't need the intrinsic?
> >
> >
> > Intrinsic function defer code geneation/expansion to backend this give backend control over geneating efficient code as per specific target.
>
>
> It's incorrect that intrinsics are passed unaltered to the backend for expansion/optimization. See the optimizations for both generic and target-specific intrinsics in InstCombiner::visitCallInst().
>
> Again, I may be missing some context - who created this IR? Creating a 'call fast llvm.maxnum()' just doesn't make sense to me, so if we can fix that in IR, we should do that. The intrinsic inhibits the large number of potential optimizations for fcmp+select that we have in IR. No target should benefit from having extra NaN semantics requirements provided by the intrinsic that are then overridden by FMF.
>
> Please split the FlagsAcquirer diff into a separate patch.

Your point of not genrating call fast @llvm.minnum in the first place and instead fcmp+setcc should have been generated is valid. But, its still a valid IR syntactically and semantically which could be thrown at backend.

Please consider following case which is being compiled for arm( -mcpu=cortex-r52 -march=arm  test.ll -mattr=fp-armv8)

define <4 x double> @CASEA(<4 x double> %x, <4 x double> %y) {

  %z = call fast <4 x double> @llvm.minnum.v4f64(<4 x double> %x, <4 x double> %y) readnone
  ret <4 x double> %z

}

define <4 x double> @CASEB(<4 x double> %x, <4 x double> %y) {

  %c = fcmp ule <4 x double> %x, %y
  %z = select <4 x i1> %c, <4 x double> %x, <4 x double> %y
  ret <4 x double> %z

}

Instruction selector does not generates vminnm for CASEB where as same is generated for CASEA. 
As of now SelectionDAGBuilder generates fminnum (or fminnnan) SDNode for llvm.minnum intrinsic. 
Thus different targets lowers fminnum (SDNode) differently.

So handling for fast math with intrinsic here should be fine ?

Otherwise, I can put Acquirer in a seperate patch.

https://reviews.llvm.org/D37616