[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
Michael Ilseman
milseman at apple.com
Tue Oct 30 14:25:43 PDT 2012
Here's a new version of the RFC, incorporating and addressing the feedback from Krzysztof, Eli, Duncan, and Dan.
Revision 1 changes:
* Removed Fusion flag from all sections
* Clarified and changed descriptions of remaining flags:
  * 'N' and 'I' now explicitly concern the values of operands, and produce
    undef values if a NaN/Inf is provided.
  * 'S' is now only about distinguishing between +/-0.
* LangRef changes updated to reflect flags changes
* Updated Question section given the now simpler set of flags
* Optimizations changed to reflect 'N' and 'I' describing operands and not
results
* Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
* Mention that this could be solved with metadata, and open the debate
Introduction
---
LLVM IR currently does not have any support for specifying fine-grained control
over relaxing floating point requirements for the optimizer. The below is a
proposal to extend floating point IR instructions to support a number of flags
that a creator of IR can use to allow for greater optimizations when
desired. Such changes are sometimes referred to as fast-math, but this proposal
is about finer-grained specifications at a per-instruction level.
What this doesn't address
---
Default behavior is retained, and this proposal is only addressing relaxing
restrictions. LLVM currently by default:
- ignores signaling NaNs
- assumes default rounding mode
- assumes FENV_ACCESS is off
Discussion on changing the default behavior of LLVM or allowing for more
restrictive behavior is outside the scope of this proposal. This proposal does
not address behavior of denormals, which is more of a backend concern.
Specifying exact precision control or requirements is outside the scope of this
proposal, and can probably be handled with the existing metadata implementation.
This proposal covers changes to and optimizations over LLVM IR, and changes to
codegen are outside the scope of this proposal. The flags described in the next
section exist only at the IR level, and will not be propagated into codegen or
the SelectionDAG.
Flags
---
no NaNs (N)
- The optimizer is allowed to optimize under the assumption that the operands'
values are not NaN. If one of the operands is NaN, the value of the result
is undefined.
no Infs (I)
- The optimizer is allowed to optimize under the assumption that the operands'
values are not +/-Inf. If one of the operands is +/-Inf, the value of the
result is undefined.
no signed zeros (S)
- The optimizer is allowed to not distinguish between -0 and +0 for the
purposes of optimizations.
unsafe algebra (A)
- The optimizer is allowed to perform algebraically equivalent transformations
that may dramatically change results in floating point. (e.g.
reassociation)
Throughout I'll refer to these options in their short-hand, e.g. 'A'.
Internally, these flags are to reside in SubclassData.
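As a rough illustration of how four independent flags can be packed into a few bits, here is a minimal Python sketch. The class name, the bit values, and the helper has_flags are assumptions made for this example only; the RFC says no more than that the flags reside in SubclassData.

```python
from enum import IntFlag

class FastMathFlags(IntFlag):
    """Hypothetical bit layout; the RFC does not specify one."""
    N = 1  # no NaNs
    I = 2  # no Infs
    S = 4  # no signed zeros
    A = 8  # unsafe algebra

def has_flags(inst_flags, needed):
    """True if every flag in `needed` is set on the instruction."""
    return (inst_flags & needed) == needed

flags = FastMathFlags.N | FastMathFlags.I
print(has_flags(flags, FastMathFlags.N))                    # True
print(has_flags(flags, FastMathFlags.N | FastMathFlags.S))  # False
```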
======
Question:
Not all combinations make sense (e.g. 'A' pretty much implies all other flags).
Basically, I have the below lattice of sensible relations:
A > S > N
A > I > N
Meaning that 'A' implies all the others, 'S' implies 'N', etc.
It might be desirable to simplify this into just being a fast-math level.
======
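If the lattice from the question were adopted, the set of flags implied by a given set could be computed as a simple closure. The sketch below assumes exactly the relations stated there (A > S > N and A > I > N); the table IMPLIES and the function close are hypothetical names.

```python
# Direct implications read off the lattice: A > S > N and A > I > N.
IMPLIES = {"A": {"S", "I"}, "S": {"N"}, "I": {"N"}, "N": set()}

def close(flags):
    """Return `flags` plus every flag they imply under the lattice."""
    result = set()
    work = list(flags)
    while work:
        flag = work.pop()
        if flag not in result:
            result.add(flag)
            work.extend(IMPLIES[flag])
    return result

print(sorted(close({"S"})))  # ['N', 'S']
print(sorted(close({"A"})))  # ['A', 'I', 'N', 'S']
```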
Changes to LangRef
---
Change the definitions of the floating point arithmetic operations. Below is
how fadd will change:
'fadd' Instruction
Syntax:
<result> = fadd {flag}* <ty> <op1>, <op2> ; yields {ty}:result
...
Semantics:
...
flag can be one of the following optimizer hints to enable otherwise unsafe
floating point optimizations:
N: no NaNs - The optimizer is allowed to optimize under the assumption that
the operands' values are not NaN. If one of the operands is NaN, the value
of the result is undefined.
I: no infs - The optimizer is allowed to optimize under the assumption that
the operands' values are not +/-Inf. If one of the operands is +/-Inf, the
value of the result is undefined.
S: no signed zeros - The optimizer is allowed to not distinguish between -0
and +0 for the purposes of optimizations.
A: unsafe algebra - The optimizer is allowed to perform algebraically
equivalent transformations that may dramatically change results in floating
point. (e.g. reassociation)
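To make the proposed syntax concrete, the following Python sketch parses one possible textual form of a flagged fadd. The concrete spelling (single-letter flags separated by spaces, as in "fadd N I float %a, %b") is an assumption based on the grammar above, and parse_fadd is a hypothetical helper, not part of any LLVM tool.

```python
import re

# Assumed spelling: zero or more single-letter flags between "fadd" and the type.
FADD = re.compile(r"^(%\w+) = fadd((?: [NISA])*) (\w+) (\S+), (\S+)$")

def parse_fadd(line):
    """Split an fadd line into (result, flags, type, op1, op2), or return None."""
    m = FADD.match(line)
    if m is None:
        return None
    result, flags, ty, op1, op2 = m.groups()
    return result, flags.split(), ty, op1, op2

print(parse_fadd("%sum = fadd N I float %a, %b"))
# ('%sum', ['N', 'I'], 'float', '%a', '%b')
print(parse_fadd("%sum = fadd float %a, %b"))
# ('%sum', [], 'float', '%a', '%b')
```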
Changes to optimizations
---
Optimizations should be allowed to perform unsafe transformations provided that
all of the instructions involved have the corresponding restrictions relaxed.
When combining instructions, optimizations should take care not to drop
restrictions that previously existed (commonly by taking a bitwise-AND of the
flags).
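The bitwise-AND rule can be sketched in a few lines of Python. The bit values and the name merge_flags are hypothetical; the point is only that a combined instruction keeps a relaxation when every instruction it was built from already had it.

```python
N, I, S, A = 1, 2, 4, 8  # hypothetical bit values for the four flags

def merge_flags(first, *rest):
    """Flags for an instruction that replaces all of the given ones."""
    merged = first
    for flags in rest:
        merged &= flags  # keep only relaxations shared by every input
    return merged

print(merge_flags(N | I | S, N | S) == (N | S))  # True
print(merge_flags(A | N, 0))                     # 0
```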
Below are some example optimizations that could be allowed with the given
relaxations.
N - no NaNs
  x == x ==> true
S - no signed zeros
  x - 0 ==> x
  0 - (x - y) ==> y - x
NIS - no signed zeros AND no NaNs AND no Infs
  x * 0 ==> 0
NI - no Infs AND no NaNs
  x - x ==> 0
A - unsafe algebra
  Reassociation
    (x + y) + z ==> x + (y + z)
    (x + C1) + C2 ==> x + (C1 + C2)
  Redistribution
    (x * C) + x ==> x * (C+1)
    (x * C) + (x + x) ==> x * (C + 2)
  Reciprocal
    x / C ==> x * (1/C)
These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.
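To see why several of these rewrites need the flags, the Python sketch below evaluates both sides under ordinary IEEE-754 double arithmetic (which Python floats use). It only demonstrates the arithmetic; it says nothing about how LLVM would implement the folds.

```python
import math

nan, inf = float("nan"), float("inf")

# N: "x == x ==> true" does not hold for NaN.
print(nan == nan)                      # False

# NI: "x - x ==> 0" does not hold for Inf (or NaN) operands.
print(math.isnan(inf - inf))           # True

# NIS: "x * 0 ==> 0" does not hold for Inf, and for negative x the
# product is -0.0 rather than +0.0.
print(math.isnan(inf * 0.0))           # True
print(math.copysign(1.0, -1.0 * 0.0))  # -1.0, so the product is -0.0

# A: reassociation changes the result when magnitudes differ widely.
x, y, z = 1e20, -1e20, 1.0
print((x + y) + z)                     # 1.0
print(x + (y + z))                     # 0.0
```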
I propose to expand -instsimplify and -instcombine to perform these kinds of
optimizations. -reassociate will be expanded to reassociate floating point
operations when allowed. Similar to existing behavior regarding integer
wrapping, -early-cse will not CSE FP operations with mismatched flags, while
-gvn will (conservatively). This allows later optimizations to optimize the
expressions independently between runs of -early-cse and -gvn.
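One way to picture the difference between the two passes is through the key each would use to decide that two instructions are the same. This is a simplified Python model, not the passes' actual code: the function names are hypothetical, and reading "conservatively" as "keep only the shared flags" is an assumption.

```python
def early_cse_key(opcode, operands, flags):
    """Flags are part of the key, so mismatched flags never unify."""
    return (opcode, operands, flags)

def gvn_key(opcode, operands, flags):
    """Flags are ignored when deciding equivalence."""
    return (opcode, operands)

def gvn_merge(flags_a, flags_b):
    """The surviving instruction keeps only the relaxations both had."""
    return flags_a & flags_b

a = ("fadd", ("%x", "%y"), frozenset("NI"))
b = ("fadd", ("%x", "%y"), frozenset("N"))
print(early_cse_key(*a) == early_cse_key(*b))  # False
print(gvn_key(*a) == gvn_key(*b))              # True
print(sorted(gvn_merge(a[2], b[2])))           # ['N']
```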
Changes to frontends
---
Frontends are free to generate code with flags set as they desire. Frontends
should continue to call llc with their desired options, as the flags apply only
at the IR level and not at codegen or the SelectionDAGs.
Below is a suggested change to clang's command-line options.
-ffast-math
Currently described as:
Enable the *frontend*'s 'fast-math' mode. This has no effect on optimizations,
but provides a preprocessor macro __FAST_MATH__ the same as GCC's -ffast-math
flag
I propose to change the description and behavior to:
Enable 'fast-math' mode. This allows for optimizations that may produce
incorrect and unsafe results, and thus should only be used with care. This
also provides a preprocessor macro __FAST_MATH__ the same as GCC's -ffast-math
flag
I propose that this turn on all flags for all floating point instructions. If
this flag doesn't already cause clang to run llc with -enable-unsafe-fp-math,
then I propose that it does so as well.
(Optional)
I propose adding the below flags:
-ffinite-math-only
Allow optimizations to assume that floating point arguments and results are
not NaNs or +/-Inf. This may produce incorrect results, and so should be used
with care.
This would set the 'I' and 'N' bits on all generated floating point instructions.
-fno-signed-zeros
Allow optimizations to ignore the signedness of zero. This may produce
incorrect results, and so should be used with care.
This would set the 'S' bit on all FP instructions.
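The mapping from the command-line options above to per-instruction flags can be summarized as a small table. The Python sketch below is only a model of that mapping; OPTION_FLAGS and flags_for are hypothetical names and do not correspond to anything in clang.

```python
# Flags each option would set on every generated FP instruction,
# as described in the proposal.
OPTION_FLAGS = {
    "-ffast-math": {"N", "I", "S", "A"},
    "-ffinite-math-only": {"N", "I"},
    "-fno-signed-zeros": {"S"},
}

def flags_for(options):
    """Union of the flags requested by the given command-line options."""
    flags = set()
    for option in options:
        flags |= OPTION_FLAGS.get(option, set())
    return flags

print(sorted(flags_for(["-ffinite-math-only", "-fno-signed-zeros"])))
# ['I', 'N', 'S']
```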
Changes to llvm cli tools
---
opt and llc already have the following command line options:
-enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
-enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
-enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
However, opt makes no use of them as they are currently only considered to be
TargetOptions. llc will remain unchanged, as these options apply to DAG
optimizations while this proposal deals with IR optimizations.
(Optional)
Have an opt pass that adds the desired flags to floating point instructions.
Miscellaneous explanations in the form of Q&A
---
Why not just have "fast-math" rather than individual flags?
Having the individual flags gives the granularity to choose the levels of
optimizations. For example, unsafe-algebra can lead to dramatically different
results in corner cases, and may not be desired when a user just wants to ensure
that x*0 folds to 0.
Why have these flags attached to the instruction itself, rather than be a
compiler mode?
Being attached to the instruction itself allows much greater flexibility both
for other optimizations and for the concerns of the source and target. For
example, a frontend may desire that x - x be folded to 0. This would require
no-NaNs for the subtract. However, the frontend may want to keep NaNs for its
comparisons.
Additionally, these properties can be set internally in the optimizer when the
property has been proven. For example, if x has been found to be positive, then
operations involving x and a constant can be marked to ignore signed zero.
Finally, having these flags allows for greater safety and optimization when code
with different flags is mixed. For example, a function author may set the
unsafe-algebra flag knowing that such transformations will not meaningfully
alter its result. If that function gets inlined into a caller, however, we don't
want to always assume that the function's expressions can be reassociated with
the caller's expressions. These properties allow us to preserve the
optimizations of the inlined function without affecting the caller.
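The inlining case can be modeled with a small check: a reassociation is permitted only if every instruction it touches carries 'A'. This Python sketch uses hypothetical names and plain sets of flag letters in place of real instructions.

```python
def can_reassociate(*inst_flags):
    """True only if every instruction involved has the 'A' flag."""
    return all("A" in flags for flags in inst_flags)

callee_add1 = {"A"}  # from the inlined function, marked unsafe-algebra
callee_add2 = {"A"}
caller_add = set()   # caller code with no relaxations

# Expressions that came from the callee can still be reassociated together.
print(can_reassociate(callee_add1, callee_add2))  # True
# Mixing in a caller instruction blocks the transformation.
print(can_reassociate(callee_add1, caller_add))   # False
```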
Why not use metadata rather than flags?
There is existing metadata to denote precisions, and this proposal is orthogonal
to those efforts. While these properties could still be expressed as metadata,
the proposed flags are analogous to nsw/nuw and are inherent properties of the
IR instructions themselves that all transformations should respect. There is
still some debate on what form, metadata vs flags, should be used.