[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level

Wed Nov 14 19:37:38 PST 2012

On Wed, Nov 14, 2012 at 7:19 PM, Michael Ilseman <milseman at apple.com> wrote:
> I see now that it's only binary operators that have OptimizationFlags
> reserved for them in the bitcode. Adding fast-math flags for only binary ops
> is straight-forward, but adding them for other ops might require a more
> involved bitcode change.
...
> Given all that, I think it makes sense to add support for fast-math flags
> only to binary ops in this iteration, and think about adding it to other
> operations in the future. Thoughts?

I'm really not trying to rehash a discussion in too much depth, but I
have to wonder with all of this -- why not use metadata as the
*encoding* mechanism for these flags?

Just to be clear, I have no strong feelings about any of this, but it
feels like metadata at the bitcode level provides a nice extensible
encoding scheme. At the same time, flag-style accessor methods on the
C++ instruction APIs are really convenient. But I don't see why we
can't have the best of both worlds.
Essentially, put the flags in "metadata", but still provide nice
first-class APIs etc so that they're significantly easier to use.

:: shrug :: just an idea that might simplify modeling this stuff.

>
>
> On Nov 12, 2012, at 5:42 PM, Chris Lattner <clattner at apple.com> wrote:
>
>
> On Nov 12, 2012, at 10:39 AM, Joe Abbey <jabbey at arxan.com> wrote:
>
> Michael,
>
> Since you won't be using metadata to store this information and are
> augmenting the IR, I'd recommend incrementing the bitcode version number.
> The current version is stored in a local variable in BitcodeWriter.cpp:1814*
>
> I would suspect then you'll also need to provide additional logic for
> reading:
>
>       switch (module_version) {
>         default: return Error("Unknown bitstream version!");
>         case 2:
>           EncodesFastMathIR = true;
>           // fall through: version 2 also uses relative IDs
>         case 1:
>           UseRelativeIDs = true;
>           break;
>         case 0:
>           UseRelativeIDs = false;
>           break;
>       }
>
>
> Couldn't this be handled by adding an extra operand to the binary operators?
>
> -Chris
>
>
> Joe
>
> (*TODO: Put this somewhere else).
>
> On Nov 9, 2012, at 5:34 PM, Michael Ilseman <milseman at apple.com> wrote:
>
> Revision 2
>
> Revision 2 changes:
> * Add in separate Reciprocal flag
> * Clarified wording of flags, specified undefined values, not behavior
> * Removed some confusing language
> * Mentioned optimizations/analyses adding in flags due to inferred knowledge
>
> Revision 1 changes:
> * Removed Fusion flag from all sections
> * Clarified and changed descriptions of remaining flags:
>   * Make 'N' and 'I' flags be explicitly concerning values of operands, and
>     producing undef values if a NaN/Inf is provided.
>   * 'S' is now only about distinguishing between +/-0.
>   * LangRef changes updated to reflect flags changes
>   * Updated Question section given the now simpler set of flags
>   * Optimizations changed to reflect 'N' and 'I' describing operands and not
>     results
> * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
> * Mention that this could be alternatively solved with metadata, and open
>   the debate
>
>
> Introduction
> ---
>
> LLVM IR currently does not have any support for specifying fine-grained
> control over relaxing floating point requirements for the optimizer. The
> below is a proposal to extend floating point IR instructions to support a
> number of flags that a creator of IR can use to allow for greater
> optimizations when desired. Such changes are sometimes referred to as
> fast-math, but this proposal is about finer-grained specifications at a
> per-instruction level.
>
>
> What this doesn't address
> ---
>
> Default behavior is retained, and this proposal is only addressing relaxing
> restrictions. LLVM currently by default:
> - ignores signaling NaNs
> - assumes default rounding mode
> - assumes FENV_ACCESS is off
>
> Discussion on changing the default behavior of LLVM or allowing for more
> restrictive behavior is outside the scope of this proposal. This proposal
> does not address behavior of denormals, which is more of a backend concern.
>
> Specifying exact precision control or requirements is outside the scope of
> this proposal, and can probably be handled with the existing metadata
> implementation.
>
> This proposal covers changes to and optimizations over LLVM IR, and changes
> to codegen are outside the scope of this proposal. The flags described in
> the next section exist only at the IR level, and will not be propagated
> into codegen or the SelectionDAG.
>
>
> Flags
> ---
>
> LLVM IR instructions will have the following flags that can be set by the
> creator of the IR.
>
> no NaNs (N)
> - Allow optimizations that assume the arguments and result are not NaN. Such
>   optimizations are required to retain defined behavior over NaNs, but the
>   value of the result is undefined.
>
> no Infs (I)
> - Allow optimizations that assume the arguments and result are not
>   +/-Inf. Such optimizations are required to retain defined behavior over
>   +/-Inf, but the value of the result is undefined.
>
> no signed zeros (S)
> - Allow optimizations to treat the sign of a zero argument or result as
>   insignificant.
>
> allow reciprocal (R)
> - Allow optimizations to use the reciprocal of an argument instead of
>   dividing.
>
> unsafe algebra (A)
> - The optimizer is allowed to perform algebraically equivalent
>   transformations that may dramatically change results in floating point
>   (e.g. reassociation).
>
> Throughout I'll refer to these options in their short-hand, e.g. 'A'.
> Internally, these flags are to reside in SubclassData.
>
> Setting the 'A' flag implies the setting of all the others ('N', 'I', 'S',
> 'R').
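
A minimal sketch of how such a flag set might be modeled, assuming one bit
per flag. The bit positions and the names FastMathFlag and FastMathFlags are
hypothetical; the proposal only says the flags reside in SubclassData.

```cpp
#include <cassert>

// Hypothetical bit assignments, one bit per flag from the list above.
enum FastMathFlag : unsigned {
  NoNaNs          = 1u << 0, // 'N'
  NoInfs          = 1u << 1, // 'I'
  NoSignedZeros   = 1u << 2, // 'S'
  AllowReciprocal = 1u << 3, // 'R'
  UnsafeAlgebra   = 1u << 4, // 'A'
};

struct FastMathFlags {
  unsigned Bits = 0;

  // Setting 'A' implies all of the other flags, as the proposal states.
  void set(FastMathFlag F) {
    Bits |= F;
    if (F == UnsafeAlgebra)
      Bits |= NoNaNs | NoInfs | NoSignedZeros | AllowReciprocal;
  }

  bool has(FastMathFlag F) const { return (Bits & F) != 0; }
};
```

With this layout, setting only 'N' leaves the other bits clear, while setting
'A' makes has() return true for every flag.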
>
>
> Changes to LangRef
> ---
>
> Change the definitions of floating point arithmetic operations, below is how
> fadd will change:
>
> 'fadd' Instruction
> Syntax:
>
> <result> = fadd {flag}* <ty> <op1>, <op2>   ; yields {ty}:result
> ...
> Semantics:
> ...
> flag can be one of the following optimizer hints to enable otherwise unsafe
> floating point optimizations:
> N: no NaNs - Allow optimizations that assume the arguments and result are
>   not NaN. Such optimizations are required to retain defined behavior over
>   NaNs, but the value of the result is undefined.
> I: no infs - Allow optimizations that assume the arguments and result are
>   not +/-Inf. Such optimizations are required to retain defined behavior
>   over +/-Inf, but the value of the result is undefined.
> S: no signed zeros - Allow optimizations to treat the sign of a zero
>   argument or result as insignificant.
> A: unsafe algebra - The optimizer is allowed to perform algebraically
>   equivalent transformations that may dramatically change results in
>   floating point (e.g. reassociation).
>
> fdiv will also mention that 'R' allows the fdiv to be replaced by a
> multiply-by-reciprocal.
>
>
> Changes to optimizations
> ---
>
> Optimizations should be allowed to perform unsafe optimizations provided the
> instructions involved have the corresponding restrictions relaxed. When
> combining instructions, optimizations should do what makes sense to not
> remove restrictions that previously existed (commonly, a bitwise-AND of the
> flags).
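
The bitwise-AND rule can be sketched as follows, assuming the flags are
packed one per bit. The constant names and intersectFlags are hypothetical
and only illustrate the idea.

```cpp
#include <cassert>

// Hypothetical one-bit-per-flag encoding.
constexpr unsigned kNoNaNs          = 1u << 0; // 'N'
constexpr unsigned kNoInfs          = 1u << 1; // 'I'
constexpr unsigned kNoSignedZeros   = 1u << 2; // 'S'
constexpr unsigned kAllowReciprocal = 1u << 3; // 'R'
constexpr unsigned kUnsafeAlgebra   = 1u << 4; // 'A'

// Flags that may be kept on an instruction formed by combining two others:
// a relaxation survives only if both inputs already carried it.
constexpr unsigned intersectFlags(unsigned LHS, unsigned RHS) {
  return LHS & RHS;
}
```

Combining an instruction marked 'N' and 'I' with one marked 'N' and 'S'
leaves only 'N', and combining anything with an unflagged instruction leaves
no flags at all.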
>
> Below are some example optimizations that could be allowed with the given
> relaxations.
>
> N - no NaNs
> x == x ==> true
>
> S - no signed zeros
> x - 0 ==> x
> 0 - (x - y) ==> y - x
>
> NIS - no signed zeros AND no NaNs AND no Infs
> x * 0 ==> 0
>
> NI - no infs AND no NaNs
> x - x ==> 0
>
> R - reciprocal
>  x / y ==> x * (1/y)
>
> A - unsafe-algebra
> Reassociation
>   (x + y) + z ==> x + (y + z)
>   (x + C1) + C2 ==> x + (C1 + C2)
> Redistribution
>   (x * C) + x ==> x * (C+1)
>   (x * C) + (x + x) ==> x * (C + 2)
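
To see why some of these rewrites need the flags, the left-hand sides can be
evaluated under strict IEEE-754 rules. This is only an illustrative sketch;
the helper names are hypothetical, and it assumes the code is compiled
without any fast-math options.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Each helper computes the unoptimized left-hand side of a rewrite.
inline bool selfCompare(double X) { return X == X; }   // 'N':   x == x ==> true
inline double selfSubtract(double X) { return X - X; } // 'NI':  x - x  ==> 0
inline double timesZero(double X) { return X * 0.0; }  // 'NIS': x * 0  ==> 0
```

A NaN input makes selfCompare return false, NaN or Inf inputs make
selfSubtract and timesZero return NaN, and a negative finite input makes
timesZero return -0.0 rather than +0.0. Those are the cases each flag tells
the optimizer it may ignore.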
>
> I propose to expand -instsimplify and -instcombine to perform these kinds of
> optimizations. -reassociate will be expanded to reassociate floating point
> operations when allowed. Similar to existing behavior regarding integer
> wrapping, -early-cse will not CSE FP operations with mismatched flags, while
> -gvn will (conservatively). This allows later optimizations to optimize the
> expressions independently between runs of -early-cse and -gvn.
>
> Optimizations and analyses that are able to infer certain properties of
> instructions are allowed to set relevant flags. For example, if some
> analysis has determined that the arguments and result of an instruction are
> not NaNs or Infs, then it may set the 'N' and 'I' flags, allowing every
> other optimization and analysis to benefit from this inferred knowledge.
>
> Changes to frontends
> ---
>
> Frontends are free to generate code with flags set as they desire. Frontends
> should continue to call llc with their desired options, as the flags apply
> only at the IR level and not at codegen or the SelectionDAGs.
>
> The intention behind the flags is to allow the IR creator to say something
> along the lines of:
> "If this operation is given a NaN, or the result is a NaN, then I don't care
> what answer I get back. However, I expect my program to otherwise behave
> properly."
>
> Below is a suggested change to clang's command-line options.
>
> -ffast-math
> Currently described as:
> Enable the *frontend*'s 'fast-math' mode. This has no effect on
> optimizations, but provides a preprocessor macro __FAST_MATH__ the same as
> GCC's -ffast-math flag
>
> I propose to change the description and behavior to:
>
> Enable 'fast-math' mode. This allows for optimizations that may produce
> incorrect and unsafe results, and thus should only be used with care. This
> also provides a preprocessor macro __FAST_MATH__ the same as GCC's
> -ffast-math flag
>
> I propose that this turn on all flags for all floating point instructions.
> If this flag doesn't already cause clang to run llc with
> -enable-unsafe-fp-math, then I propose that it does so as well.
>
> (Optional)
> I propose adding the below flags:
>
> -ffinite-math-only
> Allow optimizations to assume that floating point arguments and results are
> not NaNs or +/-Inf. This may produce incorrect results, and so should be
> used with care.
>
> This would set the 'I' and 'N' bits on all generated floating point
> instructions.
>
> -fno-signed-zeros
> Allow optimizations to ignore the signedness of zero. This may produce
> incorrect results, and so should be used with care.
>
> This would set the 'S' bit on all FP instructions.
>
> -freciprocal-math
> Allow optimizations to use the reciprocal of an argument instead of using
> division. This may produce less precise results, and so should be used with
> care.
>
> This would set the 'R' bit on all relevant FP instructions
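
The option-to-flag mapping proposed above could look roughly like this. It is
a sketch only: flagsForOption and the bit constants are hypothetical names,
not clang's actual driver code.

```cpp
#include <cassert>
#include <string>

// Hypothetical one-bit-per-flag encoding.
constexpr unsigned kNoNaNs          = 1u << 0; // 'N'
constexpr unsigned kNoInfs          = 1u << 1; // 'I'
constexpr unsigned kNoSignedZeros   = 1u << 2; // 'S'
constexpr unsigned kAllowReciprocal = 1u << 3; // 'R'
constexpr unsigned kUnsafeAlgebra   = 1u << 4; // 'A'

// Returns the flag bits a frontend would place on the FP instructions it
// emits for a given command-line option; unknown options add no flags.
inline unsigned flagsForOption(const std::string &Opt) {
  if (Opt == "-ffast-math") // turns on all flags
    return kNoNaNs | kNoInfs | kNoSignedZeros | kAllowReciprocal |
           kUnsafeAlgebra;
  if (Opt == "-ffinite-math-only")
    return kNoNaNs | kNoInfs;
  if (Opt == "-fno-signed-zeros")
    return kNoSignedZeros;
  if (Opt == "-freciprocal-math") // only meaningful on divisions
    return kAllowReciprocal;
  return 0;
}
```

A frontend given several of these options would OR the results together
before attaching them to each instruction.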
>
> Changes to llvm cli tools
> ---
> opt and llc already have the command line options
> -enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
> -enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
> -enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
> However, opt makes no use of them as they are currently only considered to
> be TargetOptions. llc will remain unchanged, as these options apply to DAG
> optimizations while this proposal deals with IR optimizations.
>
> (Optional)
> Have an opt pass that adds the desired flags to floating point instructions.
>
>
> Miscellaneous explanations in the form of Q&A
> ---
>
> Why not just have "fast-math" rather than individual flags?
>
> Having the individual flags gives the granularity to choose the levels of
> optimizations. For example, unsafe-algebra can lead to dramatically
> different results in corner cases, and may not be desired when a user just
> wants to ensure that x*0 folds to 0.
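
As a concrete corner case for unsafe-algebra, reassociating a sum can change
the answer entirely. The sketch below uses hypothetical helper names and
assumes IEEE-754 doubles compiled without fast-math options.

```cpp
#include <cassert>

// The two groupings that the reassociation (x + y) + z ==> x + (y + z)
// treats as interchangeable.
inline double leftAssoc(double X, double Y, double Z) { return (X + Y) + Z; }
inline double rightAssoc(double X, double Y, double Z) { return X + (Y + Z); }
```

With X = 1e20, Y = -1e20 and Z = 1.0, leftAssoc yields 1.0 because the large
terms cancel first, while rightAssoc yields 0.0 because 1.0 is lost when it
is added to -1e20.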
>
>
> Why have these flags attached to the instruction itself, rather than be a
> compiler mode?
>
> Being attached to the instruction itself allows much greater flexibility
> both for other optimizations and for the concerns of the source and target.
> For example, a frontend may desire that x - x be folded to 0. This would
> require no-NaNs for the subtract. However, the frontend may want to keep
> NaNs for its comparisons.
>
> Additionally, these properties can be set internally in the optimizer when
> the property has been proven. For example, if x has been found to be
> positive, then operations involving x and a constant can be marked to
> ignore signed zero.
>
> Finally, having these flags allows for greater safety and optimization when
> code with different flags is mixed. For example, a function author may set
> the unsafe-algebra flag knowing that such transformations will not
> meaningfully alter its result. If that function gets inlined into a caller,
> however, we don't want to always assume that the function's expressions can
> be reassociated with the caller's expressions. These properties allow us to
> preserve the optimizations of the inlined function without affecting the
> caller.
>
>
> Why not use metadata rather than flags?
>
> There is existing metadata to denote precisions, and this proposal is
> orthogonal to those efforts. While these properties could still be expressed
> as metadata, the proposed flags are analogous to nsw/nuw and are inherent
> properties of the IR instructions themselves that all transformations should
> respect.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>