<div class="gmail_quote">On Sat, Apr 14, 2012 at 11:44 PM, Duncan Sands <span dir="ltr">&lt;<a href="mailto:baldrick@free.fr">baldrick@free.fr</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><br></div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I think you have a step in the right direction, walking away from ULPs, which<br>


are pretty useless for the purpose of describing allowed fp optimizations IMHO.<br>

But using just "fast" keyword (or whatever else will be added in the future) is<br>

not enough without strict definition of this keyword in terms of IR<br>

transformations. For example, particular transformation may be interested if<br>

reassociation is allowed or not ((a+b)+c => a+(b+c)), if fp contraction is<br>

allowed or not (a*b+c => fma(a,b,c)), if addition of zero may be canceled<br>

(x+0 => x), etc. If this definition is not given on infrastructure level, this<br>

may lead to disaster, when each transformation interprets "fast" in its own way.<br>

</blockquote>

<br></div>

This is actually the main reason for using metadata rather than a flag like the<br>

"nsw" flag on integer operations: it is easily extendible with more info to say<br>

whether reassociation is OK and so forth.<br>

<br>

The kinds of transforms I think can reasonably be done with the current<br>

information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant) if<br>

constant and 1 / constant are normal (and not denormal) numbers.<br></blockquote><div><br></div><div>The particular definition matters less than the fact that a definition exists :) That is, I think we need a defined set of transformations (most likely an enum, as Renato pointed out) and an interface that accepts an "fp-model" ("fast", "strict", or whatever keywords we end up with) together with a particular transformation, and returns true or false depending on whether that fp-model permits the transformation. A transformation would then ask, for example, whether reassociation is allowed.</div>
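<div><br></div><div>A minimal sketch of what such a querying interface might look like. All names here (FPModel, FPTransform, isTransformAllowed) are hypothetical and do not exist in LLVM, and the policy chosen for each model is an arbitrary illustration, not a proposed definition.</div>

```cpp
// Hypothetical fp-models; the keywords come from the discussion above.
enum class FPModel { Strict, Precise, Fast };

// Hypothetical set of transformations a pass may ask about.
enum class FPTransform {
  Reassociate,   // (a+b)+c => a+(b+c)
  Contract,      // a*b+c => fma(a,b,c)
  AddZeroElim,   // x+0 => x
  ReciprocalMul  // x/c => x*(1/c)
};

// Single place where the meaning of each fp-model is defined.
// The table below is an assumption made for illustration only.
inline bool isTransformAllowed(FPModel Model, FPTransform T) {
  switch (Model) {
  case FPModel::Strict:
    return false;                       // nothing value-changing
  case FPModel::Precise:
    return T == FPTransform::Contract;  // arbitrary example policy
  case FPModel::Fast:
    return true;                        // everything in the enum
  }
  return false;
}
```

<div>A pass would then call, say, isTransformAllowed(Model, FPTransform::Reassociate) instead of interpreting "fast" on its own.</div>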

<div><br></div><div>Another point, important from a practical point of view, is that the fp-model is almost always the same for all instructions in a function (or even a module), so tagging every instruction with fp-model metadata is quite a substantial waste of resources. It therefore makes sense to me to have a default fp-model defined for the function or module, which can be overridden by instruction metadata.</div>
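<div><br></div><div>The lookup could be as simple as the following sketch. ToyFunction, ToyInstruction and effectiveModel are hypothetical stand-ins, not LLVM classes; the only point is that an instruction carries an fp-model only when it differs from the function default.</div>

```cpp
// Hypothetical fp-models, as discussed above.
enum class FPModel { Strict, Precise, Fast };

// Stand-in for a function carrying a default fp-model.
struct ToyFunction {
  FPModel DefaultModel;
};

// Stand-in for an instruction that may override the default.
struct ToyInstruction {
  const ToyFunction *Parent;
  bool HasOverride;   // true only if per-instruction metadata is present
  FPModel Override;   // meaningful only when HasOverride is true
};

// Per-instruction metadata wins; otherwise fall back to the function default.
inline FPModel effectiveModel(const ToyInstruction &I) {
  return I.HasOverride ? I.Override : I.Parent->DefaultModel;
}
```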

<div><br></div><div>I also understand that clang generally inherits GCC's switches, and the fp precision switches are no exception, but I'd like to point out that there is a far more orderly way of defining the fp precision model (IMHO, of course :-) ), adopted by MS and the Intel compiler (-fp-model [strict|precise|fast]). It would be nice to have it adopted in clang.</div>

<div><br></div><div>Adding MS-style fp-model switches is a different topic (and, I guess, quite a debatable one), but I mention it to show why it is important to abstract the compiler's internal fp-model from the external switches and to expose a querying interface to transformations. Transformations shouldn't care about the particular model; they only need to know whether a particular type of transformation is allowed.</div>
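<div><br></div><div>To illustrate the separation, the front end could translate whatever switch spelling it accepts into the internal model in one place. The sketch below assumes a hypothetical helper, parseFPModel, and uses only the three keywords mentioned above; it is not an existing clang or LLVM function.</div>

```cpp
#include <string>

// Hypothetical internal fp-models.
enum class FPModel { Strict, Precise, Fast };

// Hypothetical front-end helper: maps a switch value to the internal model.
// Returns true and sets Out on success; leaves Out untouched on failure.
inline bool parseFPModel(const std::string &Value, FPModel &Out) {
  if (Value == "strict")  { Out = FPModel::Strict;  return true; }
  if (Value == "precise") { Out = FPModel::Precise; return true; }
  if (Value == "fast")    { Out = FPModel::Fast;    return true; }
  return false;  // unknown spelling: the caller decides how to diagnose it
}
```

<div>Everything past this point sees only the internal model, so supporting another switch style would not touch any transformation.</div>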

<div><br></div><div>Dmitry.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Ciao, Duncan.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

<br>

Dmitry.<br>

<br>

On Sat, Apr 14, 2012 at 10:28 PM, Duncan Sands <<a href="mailto:baldrick@free.fr" target="_blank">baldrick@free.fr</a><br></div><div><div class="h5">

<mailto:<a href="mailto:baldrick@free.fr" target="_blank">baldrick@free.fr</a>>> wrote:<br>

<br>

    The attached patch is a first attempt at representing "-ffast-math" at the IR<br>

    level, in fact on individual floating point instructions (fadd, fsub etc).  It<br>

    is done using metadata.  We already have a "fpmath" metadata type which can be<br>

    used to signal that reduced precision is OK for a floating point operation, eg<br>

<br>

        %z = fmul float %x, %y, !fpmath !0<br>

      ...<br>

      !0 = metadata !{double 2.5}<br>

<br>

    indicates that the multiplication can be done in any way that doesn't introduce<br>

    more than 2.5 ULPs of error.<br>

<br>

    The first observation is that !fpmath can be extended with additional operands<br>

    in the future: operands that say things like whether it is OK to assume that<br>

    there are no NaNs and so forth.<br>

<br>

    This patch doesn't add additional operands though.  It just allows the existing<br>

    accuracy operand to be the special keyword "fast" instead of a number:<br>

<br>

        %z = fmul float %x, %y, !fpmath !0<br>

      ...<br>

      !0 = metadata !{!metadata "fast"}<br>

<br>

    This indicates that accuracy loss is acceptable (just how much is unspecified)<br>

    for the sake of speed.  Thanks to Chandler for pushing me to do it this way!<br>

<br>

    It also creates a simple way of getting and setting this information: the<br>

    FPMathOperator class: you can cast appropriate instructions to this class<br>

    and then use the querying/mutating methods to get/set the accuracy, whether<br>

    2.5 or "fast".  The attached clang patch uses this to set the OpenCL 2.5 ULPs<br>

    accuracy rather than doing it by hand for example.<br>

<br>

    In addition it changes IRBuilder so that you can provide an accuracy when<br>

    creating floating point operations.  I don't like this so much.  It would<br>

    be more efficient to just create the metadata once and then splat it onto<br>

    each instruction.  Also, if fpmath gets a bunch more options/operands in<br>

    the future then this interface will become more and more awkward.  Opinions<br>

    welcome!<br>

<br>

    I didn't actually implement any optimizations that use this yet.<br>

<br>

    I took a look at the impact on aermod.f90, a reasonably floating point heavy<br>

    Fortran benchmark (4% of the human readable IR consists of floating point<br>

    operations).  At -O3 (the worst), the size of the bitcode increases by 0.8%.<br>

    No idea if that's acceptable - hopefully it is!<br>

<br>

    Enjoy!<br>

<br>

    Duncan.<br>

<br>

</div></div>

<br>

<br>

</blockquote>

<br>

</blockquote></div><br>