[PATCH] Allow FMAs in safe math mode in some cases when one operand of the fmul is either exactly 0.0 or exactly 1.0.

Tue Jul 9 14:45:47 PDT 2013

----- Original Message -----
> On Tue, Jul 9, 2013 at 2:11 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > Stephen,
> >
> > Should it be 0, 1 or undef? I doubt that the undef case matters for
> > scalars in practice, but it can have a big effect on vectors
> > (after widening).
> 
> Is there undef at the DAG level, actually? I wasn't sure; I haven't
> done much SelectionDAG hacking.

Yes, there is an undef, and it appears fairly often.

> 
> >
> > +    // FIXME: handle more cases
> >
> > Out of curiosity, what else did you have in mind? [I think that we
> > should either make it more specific, or remove it]
> 
> To be honest, I've only done cases that I cared about for my intended
> application, and ones that I was fairly sure I understood well enough
> to say that they are safe. If anyone wants to suggest more cases
> before it goes in than that's fine.
> 
> I can change the comment to not say "FIXME:" and just say that the
> list is non-exhaustive, though, if that's more appropriate.

Fair enough.

> 
> >
> > A few thoughts:
> >
> >  - For vectors, we can check if all elements are either 0 or 1 (or
> >  undef), right?
> >
> >  - We can look through EXTRACT_VECTOR_ELT, EXTRACT_SUBVECTOR,
> >  SCALAR_TO_VECTOR, VECTOR_SHUFFLE, SCALAR_TO_VECTOR, BUILD_VECTOR
> >  (and perhaps others)
> >
> >  - Likewise, when checking:
> >  +    if (V->getValueType() == MVT::i1)
> >    you probably want to check the scalar type (to catch v?i1)
> 
> I thought about vectors but it would require something either
> recursive or very complicated to do correctly; the former is
> non-ideal
> and the latter is bug-prone. The purpose of this function is similar
> to sameNoopInput in llvm/CodeGen/Analysis.cpp, and it was difficult
> to
> try to generalize that fully to vectors.

It depends on how exhaustive you'd like to be. But if you have a vector of all 1.0, then extracting an element from it is 1.0, and it is pretty easy to check for that. I'm not saying that you need to do that now in this initial patch.

> 
> I ended up making it work on vectors on constrained cases (since the
> vector case was tested for when the function was recursive), but it
> tools a lot of thinking to convince myself I hadn't made a mistake.
> 
> Do you think the vector case will come up enough to make this
> worthwhile? The problem is that the vector has to be transparently
> defined in the same basic block as the fmul and fadd for it to work,
> and most I think most vectors will be created using some kind of
> control flow that will necessarily span more than one basic block.

I'm thinking of something like this: autovectorize this:
for (...) {
  a[i] = 1.0 + b[i]*c[i];
}
and you'll get a uniform vector of 1.0. I think that this is not uncommon.

> 
> >
> > Also, after type legalization, i1 has probably been replaced by
> > something larger (like i32). Nevertheless, depending on what
> > getBooleanContents() returns, the SETCC may be restricted to
> > returning zero or one.
> 
> Yes, but I see through truncates and extends to I think that covers a
> lot of the legalization cases. I see your point of
> getBooleanContents() though, I will add that case.

Sounds good.

> 
> >
> >  - We can probably also look though FP_ROUND, FP_EXTEND, (FCEIL,
> >  FTRUNC, FRINT, FNEARBYINT, FFLOOR)
> 
> OK, that makes sense. I'll add those too. Thanks for the feedback!

Great, thanks!

 -Hal

> 
> >
> >  -Hal
> >
> > ----- Original Message -----
> >> Hi,
> >>
> >> The attached patch allows FMAs to be formed in DAGCombiner even
> >> without unsafe math mode when one operand is known to be either
> >> 0.0
> >> or
> >> 1.0 exactly (this is safe because no rounding occurs in the fmul
> >> step,
> >> and behavior on all limit case inputs is preserved, as far as I
> >> can
> >> tell.)
> >>
> >> This allows formation of FMAs in cases like the following:
> >>
> >>     extern bool f;
> >>
> >>     double foo(bool a, double b, double c, double d, double e) {
> >>      return (b * double(a) + c) + (d * (1.0 - double(a)) + e);
> >>     }
> >>
> >> The intent of this patch is to address a subset of the issues
> >> which
> >> required r181216 to be partially reverted (tracked as PR16164)
> >> although there are still many other cases affected by that patch
> >> which
> >> are not resolved.
> >>
> >> Unfortunately, due to SelectionDAG limitations, this
> >> transformation
> >> will only be done if the input can be determined to be zero or one
> >> with information only in the same basic block, so depending on
> >> optimization settings and phase ordering, will fail in cases like
> >> the
> >> following:
> >>
> >>     extern bool f;
> >>
> >>     double foo(bool a, double b, double c, double d, double e) {
> >>      if (a)
> >>        return b * double(a) + c;
> >>      else
> >>        return d * (1.0 - double(a)) + e;
> >>     }
> >>
> >> Please let me know if you have any feedback.
> >>
> >> Thanks,
> >> Stephen
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory