[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM

Fri Feb 5 16:25:07 PST 2016

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <chandlerc at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, February 5, 2016 4:36:54 PM
> Subject: Re: [llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
> 
> On Fri, Feb 5, 2016 at 2:10 PM Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> 
> 
> Hi Chandler,
> 
> This scheme has significant advantages over what was being pursued,
> but one question (or two)...
> 
> Under the proposed system, how would you represent the necessary
> dependency edges between the fp intrinsics and function calls? How
> is the state 'returned' to the caller? [I was thinking that our new
> operand bundles could help for the inputs, but the outputs? Plus
> what about the live-in state?]
> 
> This is important because any external subroutine call could
> (potentially) change the rounding mode or any other part of the
> floating-point environment.
> 
> 
> 
> So, one thing that was missing in my original email and that talking
> with Steve Canon offline clarified was that we need a way to
> directly query the current modes for systems where those can be set
> externally.
> 
> 
> My suggestion was to have an intrinsic that "loads" this state. This
> could then be used to load whatever the current state is, and pass
> that to the floating point intrinsics proposed in order to pick up
> whatever the "current" state happens to be on systems where this is
> truly a background stateful thing, while still allowing us to model
> operation-specific state for other systems. Naturally, there should
> be a complementary "store" of the state as well.
> 
> 
> Then, for code which really needs this degree of faithful FP
> environment handling, you would expect the #pragma to be present
> enabling that mode. While that pragma is in place, all floating
> point operations would be lowered using these intrinsics, and
> external function calls could be guarded by storing and reloading
> this state at the IR level. This would make the IR substantially
> more verbose when the pragma is enabled, but that seems like an
> acceptable tradeoff given that we expect this code to be rare (see
> my preconditions section). And naturally, on any system that
> actually manages FP environment in a state "register" or whatever,
> we'd want to do some work to try to optimize away state changes.
> Much like we have attributes that can be inferred about access to
> memory, we could infer attributes on functions about whether they
> change the FP environment state, and if not, propagate the state
> across the function call boundaries.
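To make the store/reload guard concrete, here is a minimal C++ sketch that uses the standard <cfenv> calls as stand-ins for the proposed "load" and "store" intrinsics. The helper name call_with_fp_state is hypothetical, and only the rounding mode is tracked; a real lowering would presumably need to cover the rest of the environment too.

```cpp
#include <cassert>
#include <cfenv>

// Requests strict FP environment semantics; some compilers ignore this pragma.
#pragma STDC FENV_ACCESS ON

// Hypothetical helper: guards a call that may change the rounding mode.
// `mode` plays the role of the SSA value carrying the current rounding mode.
template <typename Callee>
int call_with_fp_state(int mode, Callee&& callee) {
  // "Store": make the hardware state match the tracked value before the call.
  int rc = std::fesetround(mode);
  assert(rc == 0 && "rounding mode not supported on this target");
  (void)rc;

  callee();  // may change the floating point environment behind our back

  // "Reload": pick up whatever state the callee left behind.
  return std::fegetround();
}
```

Later operations would take the returned value as their rounding-mode operand. If the callee were known not to touch the environment (the inferred attribute mentioned above), both the store and the reload could be dropped.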
> 
> 
> But even though this would be some amount of work to optimize, the
> nice thing (IMO) is that it would be localized. We would have
> specific code that dealt with optimizing the FP environment
> concerns, while the rest of LLVM could remain oblivious and rely on
> simple common constructs to provide conservatively correct behavior.
> 
> What do you think?

SGTM.

 -Hal

> -Chandler
> 
> 
> 
> 
> Thanks again,
> Hal
> 
> ----- Original Message -----
> > From: "Chandler Carruth" < chandlerc at gmail.com >
> > To: "Mehdi Amini" < mehdi.amini at apple.com >, "llvm-dev" <
> > llvm-dev at lists.llvm.org >
> > Cc: "Steve (Numerics) Canon" < scanon at apple.com >, "Sergey
> > Dmitrouk" < sdmitrouk at accesssoftek.com >, "David Majnemer"
> > < david.majnemer at gmail.com >, "Hal Finkel" < hfinkel at anl.gov >
> > Sent: Thursday, February 4, 2016 8:05:38 PM
> > Subject: Re: [RFC] FP Environment and Rounding mode handling in
> > LLVM
> > 
> > 
> > First, thanks Mehdi for putting something on llvm-dev and getting
> > wider awareness of this.
> > 
> > 
> > I am actually really interested in finding a way for LLVM to support
> > the interesting functionality we are missing from fenv-like
> > interfaces. Things like rounding modes, exceptions, etc. However, I
> > think the current design is going to be a really high burden for the
> > entire optimizer and I think there is a simpler model that we might
> > pursue instead.
> > 
> > 
> > I'll start off with some underlying principles that I'm operating
> > from:
> > a) Most code in the world will be very happy with the default
> > floating point environment, doesn't need to carefully model floating
> > point exceptions, etc. Essentially, I think that LLVM's behavior
> > today is probably right for most code. Now, the code which needs
> > support for the other features of floating point isn't bad or
> > unimportant! But it is relatively speaking rare, and so I think it
> > is reasonable to optimize the *representation* model for the common
> > case provided we don't lose support for functionality.
> > 
> > 
> > b) When outside the default floating point environment's rules, there
> > are few if any optimizations that we realistically expect from LLVM.
> > Certainly, any changes to the LLVM optimizer which impact code
> > outside the default need to be done *much* more carefully to avoid
> > introducing subtle bugs.
> > 
> > 
> > OK, based on that, consider the following model:
> > We provide intrinsics that mirror the instructions 'fadd', 'fsub',
> > 'fmul', 'fdiv', and 'frem' (so 5 total). From here on out, I'll
> > exclusively use 'fadd' as my examples. The intrinsics would look
> > like:
> > 
> > declare {f32, i1} @llvm.fadd.with.environment.f32(f32 %lhs, f32 %rhs,
> >     i8 %rounding_mode, i8 %exception_behavior)
> > 
> > 
> > Then we define specific values to be used for the IEEE rounding
> > modes. And we define values to control exception behavior. I'm not
> > an expert on floating point exceptions in particular (my platforms
> > don't use them) but I'm imagining three states "ignore", "return",
> > and "trap". I've used a single 'i1', but I'm assuming it would need
> > to be several i1s or an iN in order to model the set of FP
> > exceptions. I'm using i1 here just to simplify the explanation, I
> > think it generalizes and I'll let the experts suggest the exact
> > formulation.
> > 
> > 
> > If the default rounding mode is provided to these intrinsics and the
> > "ignore" exception behavior is provided, they behave exactly as the
> > existing instructions do, and instcombine should canonicalize to the
> > existing instructions.
> > 
> > 
> > The semantics of non-default rounding modes are to perform the
> > operation with that rounding mode.
> > 
> > 
> > If "return" is provided for the exception behavior, then the i1
> > component of the result is true if an FP exception occurred and false
> > otherwise. If "ignore" is provided then any FP exceptions are
> > ignored and the i1 is always false. If "trap" is provided then the
> > i1 is always false, but the call to the intrinsic might trap. We
> > could either define a trap as precisely the same as a call to
> > @llvm.trap(), or we could introduce an @llvm.fp.trap() and define it
> > as a call to that.
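The three exception behaviors can be mimicked in user code. The sketch below assumes C++ with the standard <cfenv> interface; the function fadd_with_environment, the ExceptionBehavior enum, and the use of std::abort() for "trap" are all hypothetical stand-ins, not the proposed intrinsic or its encoding.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdlib>
#include <utility>

// Requests strict FP environment semantics; some compilers ignore this pragma.
#pragma STDC FENV_ACCESS ON

// Hypothetical encoding of the %exception_behavior argument.
enum class ExceptionBehavior { Ignore, Return, Trap };

// Hypothetical stand-in for @llvm.fadd.with.environment.f32.
// rounding_mode is one of the <cfenv> FE_* rounding macros.
std::pair<float, bool> fadd_with_environment(float lhs, float rhs,
                                             int rounding_mode,
                                             ExceptionBehavior behavior) {
  // volatile keeps the compiler from folding or moving the addition.
  volatile float a = lhs, b = rhs;

  std::fenv_t saved;
  std::feholdexcept(&saved);  // save the environment and clear all flags
  int rc = std::fesetround(rounding_mode);
  assert(rc == 0 && "rounding mode not supported on this target");
  (void)rc;

  volatile float sum = a + b;
  const bool raised = std::fetestexcept(FE_ALL_EXCEPT) != 0;

  std::fesetenv(&saved);  // restore the caller's environment

  if (behavior == ExceptionBehavior::Trap && raised) std::abort();
  return {sum, behavior == ExceptionBehavior::Return && raised};
}
```

For example, adding 1.0f and 1e-10f with FE_UPWARD and "return" yields a sum just above 1.0f with the flag set, while the same call with "ignore" always reports false.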
> > 
> > 
> > The frontend would then be responsible for lowering floating point
> > arithmetic using these intrinsics. This may be somewhat challenging
> > for the frontend because in some languages this behavior is
> > controlled dynamically. In those situations, we can either allow
> > these intrinsics to accept non-constant arguments for %rounding_mode
> > and %exception_behavior so that frontends can emit code that just
> > dynamically computes them, or we could follow the same model that
> > atomics use, and if the frontend cannot trivially compute a
> > constant, it can emit a switch over the possible states with a
> > specific intrinsic call in each case. I don't have strong opinions
> > about which would be best, I think either could be made to work.
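The "switch over the possible states" option might look roughly like the following C++ sketch, where a template argument plays the role of a constant intrinsic operand. The names fadd_const_mode and fadd_dynamic_mode are hypothetical, and only the four standard <cfenv> rounding modes are handled.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdlib>

// Requests strict FP environment semantics; some compilers ignore this pragma.
#pragma STDC FENV_ACCESS ON

// Hypothetical stand-in for an intrinsic whose rounding mode must be a
// compile-time constant (here, a template argument).
template <int RoundingMode>
float fadd_const_mode(float lhs, float rhs) {
  volatile float a = lhs, b = rhs;  // keep the addition from being folded
  const int saved = std::fegetround();
  int rc = std::fesetround(RoundingMode);
  assert(rc == 0 && "rounding mode not supported on this target");
  (void)rc;
  volatile float sum = a + b;
  std::fesetround(saved);  // put the caller's rounding mode back
  return sum;
}

// What a frontend could emit when the mode is only known at run time:
// one constant-mode call per possible state.
float fadd_dynamic_mode(float lhs, float rhs, int mode) {
  switch (mode) {
    case FE_TONEAREST:  return fadd_const_mode<FE_TONEAREST>(lhs, rhs);
    case FE_UPWARD:     return fadd_const_mode<FE_UPWARD>(lhs, rhs);
    case FE_DOWNWARD:   return fadd_const_mode<FE_DOWNWARD>(lhs, rhs);
    case FE_TOWARDZERO: return fadd_const_mode<FE_TOWARDZERO>(lhs, rhs);
    default:            std::abort();  // not one of the modes handled here
  }
}
```

The cost is one call site per state, which stays small for rounding modes but would multiply if the exception behavior were also dispatched this way.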
> > 
> > 
> > If we go with constant arguments being required, we could use
> > "metadata arguments" which aren't actually metadata but just encoded
> > arguments for intrinsics.
> > 
> > 
> > When emitting constants and trying to respect floating point
> > environment settings, frontends will have to emit runtime calls
> > instead of actual constants. But this seems actually good because
> > that is what we'll need anyways -- we aren't able to emulate all the
> > environment options with full generality if I understand things
> > correctly (and let me know if I've misunderstood).
> > 
> > 
> > 
> > 
> > The two really big reasons why I like this model much more than
> > extending flags are:
> > 
> > 
> > 1) This avoids implicit state. The implicit state of the floating
> > point environment makes things like code motion extremely hard to
> > reason about. I think we will just get it wrong too often to make
> > this a good approach. By modeling all of this as actual SSA values I
> > think there is a much better chance we'll get this stuff right. For
> > example, fetestexcept can be implemented by or-ing all the i1s for
> > floating point exceptions and testing the result. Then the backend
> > can spill the state when necessary and reload it when needed even if
> > other floating point math is introduced. I admit that first class
> > aggregate returns aren't a beautiful way to encapsulate this, but
> > they are an *effective* way that we know how to work with in the
> > LLVM IR. If we ever come up with a better multi-def model, we can
> > always switch these and all the other intrinsics which need this to
> > that model.
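As a rough illustration of flags carried as ordinary values, the C++ sketch below ORs a per-operation flag across a loop. FPResult, fmul_return, and scale_and_test are hypothetical names, and the toy fmul_return reports only the inexact condition, assuming IEEE-754 float and double, round-to-nearest, and finite inputs whose product stays within float range.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the {f32, i1} aggregate result.
struct FPResult {
  float value;
  bool exception;
};

// Toy stand-in for an fmul intrinsic in "return" mode (inexact only).
FPResult fmul_return(float lhs, float rhs) {
  // Two 24-bit significands multiply into at most 48 bits, which fits in
  // double's 53-bit significand, so this product is exact.
  const double exact = static_cast<double>(lhs) * static_cast<double>(rhs);
  const float rounded = static_cast<float>(exact);
  // The flag is set when rounding to float changed the value.
  return {rounded, static_cast<double>(rounded) != exact};
}

// Scales xs into out and returns the OR of every per-operation flag,
// which is the role fetestexcept plays with implicit state.
bool scale_and_test(const float* xs, std::size_t n, float factor, float* out) {
  assert(xs != nullptr && out != nullptr);
  bool any_exception = false;
  for (std::size_t i = 0; i < n; ++i) {
    const FPResult r = fmul_return(xs[i], factor);
    out[i] = r.value;
    any_exception |= r.exception;
  }
  return any_exception;
}
```

Because each flag is an explicit value, reordering the multiplications or inserting unrelated floating point math cannot change what scale_and_test returns.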
> > 
> > 
> > 2) Every pass will conservatively correctly model the operations.
> > This is most significant when modeling trapping on exceptions. We
> > need every pass to realize that control flow might not proceed past
> > such operations. We already have this logic for calls, and it seems
> > a really nice fit for allowing most of the optimizer to be unaware
> > of these constructs while respecting them and preserving behavior in
> > the face of them.
> > 
> > 
> > 
> > 
> > I suspect that there are things this model doesn't handle that I've
> > not thought of (as this is outside the area of FP that I'm deeply
> > familiar with), but I really think this model would be easier to
> > reason about and would be much less invasive within the IR and
> > optimizer. I wonder if folks think this could work and would be up
> > for moving their efforts in this direction?
> > 
> > 
> > -Chandler
> > 
> > 
> > On Wed, Feb 3, 2016 at 3:04 PM Mehdi Amini < mehdi.amini at apple.com >
> > wrote:
> > 
> > 
> > Hi everyone,
> > 
> > Sergey (CC'ed) worked on a series of patches to add support for
> > floating-point environment and floating-point rounding modes in LLVM.
> > This started *in 2014* and the patches, after multiple rounds of
> > review in the last months (involving amongst others Steve Canon, Hal
> > Finkel, David Majnemer, and myself), are getting very close (IMO) to
> > being in a state where we can land them.
> > 
> > This is the thread that started this development: "[LLVMdev] More
> > careful treatment of floating point exceptions"
> > http://marc.info/?l=llvm-dev&m=141113983302113&w=2
> > And this is the thread where most of the discussion on the design
> > occurred: "[PATCH] Flag to enable IEEE-754 friendly FP
> > optimizations"
> > http://marc.info/?l=llvm-commits&m=141235814915999&w=2
> > 
> > Since Chandler raised some concerns on IRC today, I figured I
> > should send a heads-up on this topic to allow anyone to comment on
> > the current plan.
> > 
> > We plan on adding two new FP env flags to the existing FMF (fast-math
> > flags). Without these flags set, the optimizer has to assume that
> > the FP env can be observed, or the rounding mode can be changed. For
> > clang, these flags would be set unless a command-line option
> > requires preserving the FP env.
> > 
> > Here is the list of patches:
> > 
> > [FPEnv Core 01/14] Add flags and command-line switches for FPEnv:
> > http://reviews.llvm.org/D14066
> > [FPEnv Core 02/14] Add FPEnv access flags to fast-math flags:
> > http://reviews.llvm.org/D14067
> > [FPEnv Core 03/14] Make SelectionDAG aware of FPEnv flags:
> > http://reviews.llvm.org/D14068
> > [FPEnv Core 04/14] Skip constant folding to preserve FPEnv:
> > http://reviews.llvm.org/D14069
> > [FPEnv Core 05/14] Teach IR builder and folders about new flags:
> > http://reviews.llvm.org/D14070
> > [FPEnv Core 06/14] Do not fold constants on reading in IR
> > asm/bitcode: http://reviews.llvm.org/D14071
> > [FPEnv Core 07/14] Prevent undesired folding by InstSimplify:
> > http://reviews.llvm.org/D14072
> > [FPEnv Core 08/14] Do not simplify expressions with FPEnv access:
> > http://reviews.llvm.org/D14073
> > [FPEnv Core 09/14] Make Strict flag available for more clients:
> > http://reviews.llvm.org/D14074
> > [FPEnv Core 10/14] Use Strict in IRBuilder:
> > http://reviews.llvm.org/D14075
> > [FPEnv Core 11/14] Don't convert fpops to constexprs in SCCP:
> > http://reviews.llvm.org/D14076
> > [FPEnv Core 13/14] Don't hoist FP-ops with side-effects in LICM:
> > http://reviews.llvm.org/D14078
> > [FPEnv Core 14/14] Introduce F*_W_CHAIN instrs to prevent reordering:
> > http://reviews.llvm.org/D14079
> > 
> > 
> > —
> > Mehdi
> > 
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory