[PATCH] D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management.
Jatin Bhateja via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 11 09:18:02 PDT 2017
jbhateja added a comment.
In https://reviews.llvm.org/D37686#866521, @spatel wrote:
> There is a functional difference from this improvement (several FP intrinsics should now correctly propagate flags), but I'm not sure if it's visible without fixing something else in DAGCombiner to recognize the flags.
>
> How does this work if an instruction maps to multiple nodes? For example, the FMA intrinsic can map to 2 nodes?
This propagation happens during SelectionDAGBuilder::visit. I scanned through the various instructions, and there is a 1:1 mapping between an instruction and the initial SDNode created for it, e.g.
llvm.fma (intrinsic instruction) -> SDNode(ISD::FMA)
llvm.minnum (intrinsic instruction) -> SDNode(ISD::FMINNUM/FMINNAN)
etc. These initial SDNodes are later lowered/expanded during the following DAG phases.
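To make that concrete, here is a minimal sketch of what copying the fast-math flags from the IR instruction onto the flags attached to that initial SDNode could look like. The helper name is hypothetical, and the getter/setter names follow the FPMathOperator/SDNodeFlags API as I know it, not the exact code in this patch:

  #include "llvm/CodeGen/SelectionDAGNodes.h" // SDNodeFlags
  #include "llvm/IR/Operator.h"               // FPMathOperator, Instruction

  using namespace llvm;

  // Hypothetical helper (not taken from this patch): gather the
  // instruction's fast-math flags so they can be attached to the
  // initial SDNode built for it in SelectionDAGBuilder::visit.
  static SDNodeFlags flagsFromInstruction(const Instruction &I) {
    SDNodeFlags Flags;
    if (const auto *FPOp = dyn_cast<FPMathOperator>(&I)) {
      Flags.setNoNaNs(FPOp->hasNoNaNs());
      Flags.setNoInfs(FPOp->hasNoInfs());
      Flags.setNoSignedZeros(FPOp->hasNoSignedZeros());
      Flags.setAllowReciprocal(FPOp->hasAllowReciprocal());
    }
    return Flags;
  }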
In https://reviews.llvm.org/D37686#866584, @spatel wrote:
> Here's a potential test case that would show a difference from having FMF on a sqrt intrinsic:
>
> define float @fast_recip_sqrt(float %x) {
> %y = call fast float @llvm.sqrt.f32(float %x)
> %z = fdiv fast float 1.0, %y
> ret float %z
> }
> declare float @llvm.sqrt.f32(float) nounwind readonly
>
>
> ...but as I said earlier, we need to fix the DAGCombiner code where this fold is implemented to recognize the flags on the individual nodes. Currently, it just checks the global state:
>
> if (Options.UnsafeFPMath) {
>
>
> On x86 currently, this will use the full-precision sqrtss+divss, but it should be using rsqrtss followed by mulss/addss to refine the estimate.
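For the DAGCombiner side, an illustrative change (not the actual fix) would be to consult the flags on the individual fdiv node in addition to the global option when deciding whether the 1/sqrt(x) fold may use the reciprocal estimate; the exact set of predicates required is for that fix to decide:

  // Illustration only, not the actual DAGCombiner change: check the
  // node-level flags as well as the global unsafe-FP option before
  // building the reciprocal-sqrt estimate.
  if (Options.UnsafeFPMath || N->getFlags().hasAllowReciprocal()) {
    // ... existing rsqrt-estimate path ...
  }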
Ok, we also have another usage of fast-math flags in review https://reviews.llvm.org/D37616. Can you please file a bugzilla for this to track it?
https://reviews.llvm.org/D37686