[cfe-commits] [llvm-commits] [PATCH] Add llvm.fmuladd intrinsic.

Tue Jun 5 13:08:47 PDT 2012

Hey Lang,

Sorry to jump in late, but was catching on up email and finally read
through this thread. This is the exchange that caught my interest:

On Fri, Jun 1, 2012 at 4:50 AM, Stephen Canon <scanon at apple.com> wrote:

> On May 31, 2012, at 10:40 PM, John McCall <rjmccall at apple.com> wrote:
>
> > On May 31, 2012, at 7:22 PM, Lang Hames wrote:
> >> Thanks for the suggestion Matthieu. I spoke to Doug and he recommended
> using attributes rather than a FunctionDecl bit to represent the
> fp_contract state.
> >
> > Hmm.  I had suggested a bit on FunctionDecl on the assumption that this
> would often be controlled globally, maybe by using a flag to control the
> default or by activating a #pragma before including all the headers.
>  Actually, I could even imagine a target (maybe a GPU target?) even
> opting-in to this behavior by default.  If we're going to use an Attr, we
> need to make sure it doesn't get added unless the current #pragma state is
> different from the global default;  we really don't want to be allocating
> an attribute for every function definition in the translation unit.
>
> We want FP_CONTRACT ON to be the default for all targets.  It's also worth
> noting that it's critical that we support setting the pragma to OFF, but in
> practice this will be exceedingly rare (almost certainly less than 1% of
> sources, and probably far less than that).
>

Based on this comment, I'm really not keen on the current representation,
but maybe I've mis-understood it, so I'll ask questions first:

The 'fmuladd' intrinsic is used to whitelist specific operations for fused
multiply+add handling, correct?

If so, and if Stephen's stance is correct (I certainly agree with it!) that
this should be allowed for the vast majority of code, that means that
almost every fmul and fadd in the current IR should be a candidate for
fusing?

If that's the case, it seems like forming a white-list intrinsic is a bit
backwards -- we shouldn't overly constrain the set of multiplies and adds
that are fused if we expect most code to allow fusing. Similarly, we
wouldn't want the default representation to be a call to an intrinsic
function, we would want it to be an instruction, no?

What happens if the FE in unable to see that a multiply and and add are
elligible for fusing? What I'm thinking of is:

float f(float *x, float *y, float *z) {
  float tmp = *x * *y;
  if (some_global_constant) {
    some_external_function();
  }
  return tmp + *z;
}

How does Clang form the 'fmuladd' here? It seems like it would have to
essentially do something very much like an optimization pass to do this.
This doesn't seem to belong in the FE.

Another example:

float g(float a, float b) {
  return a + b;
}

float f2(float *x, float *y, float *z) {
  return g(*x * *y, *z);
}

Here, the FE would have to inline to see this opportunity.

Now, these are certainly contrived, but I hope they make clear my concern:
the pairing of a multiply and an add seems like it needs to be the business
of the optimizer, not the FE. If the only mechanism we have to distinguish
between "safe" and "unsafe" expressions is a pre-fused pair, I worry the
optimizer will be unable to form these constructs.

Since Stephen indicates we should expect 1% of the code to need to *avoid*
fused multiply+add, it would seem more natural to represent this with a
blacklist of "unpairable" instructions. I can imagine a bunch of different
ways to represent this:

1) fp_contract.start/.end intrinsics that mark the beginning and end of a
series of instructions that must not be fused.
2) a function attribute on functions whose instructions must not be fused
3) a bit on the 'fmul' and 'fadd' instructions marking them is ineligible
for fusing
4) probably a bunch of other options I've not thought of...

I think #1 could work, but it has lots of problems surrounding inlining.
You could always make conservatively correct decisions though, so its
probably tractable. I suspect it has the smallest impact on the existing
optimization passes.

I *really* like #2 from a simplicity perspective, but I suspect it is too
coarse grained for the use cases Stephen and others have in mind. (please
correct me if i'm wrong... I hate pragmas, if we could support this only in
terms of function attributes, I think that would be a huge win...)

I think #3 is the most functionally complete way to implement this. It
would work like 'nsw' or other flags, but be a bit lower impact on the IR
because most optimizations could ignore them. I suggest flagging the
ineligible cases to keep the textual IR clean -- we would clearly have to
auto-upgrade old bitcode to say that everything is ineligible, but that
seems OK. This is heavyweight (flag on instructions) and requires auditing
optimization passes, but I think it more precisely captures the intent of
the pragma: *these* specific multiplies and adds must not be fused.

I'd love to hear other ideas, especially ones that enable the formation of
fused operations by the middle end. =] I think that's going to be essential
long-term.

-Chandler

PS: I didn't suggest metadata for a reason -- this is a semantic contract,
and I really don't think metadata applies cleanly to that. We could (in
theory) add metadata to 99% of the instructions indicating that fusing
isn't appropriate, but that seems really gross. Also, fundamentally, this
isn't about "fast math"; we're not losing precision, as Stephen points out,
it is only very subtle and strange cases that need to avoid the particular
*increase* in precision caused by fma...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120605/13a4b1cc/attachment.html>