[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM

Tue Feb 9 19:57:57 PST 2016

+1 to this.  Having it structured this way would make things much easier 
if we someday decided to promote these intrinsics to instructions or 
merge them (via non-optional modifiers like "volatile") with the 
existing floating point instructions.

Philip

On 02/05/2016 06:03 PM, Chandler Carruth via llvm-dev wrote:
> Agreed.
>
> On Fri, Feb 5, 2016 at 5:54 PM Pete Cooper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     FWIW, +1 from me.
>
>     Just one request on the implementation though.  However we model
>     these intrinsics and their properties (metadata, constants, etc),
>     can we please abstract away those details the same way we have
>     MemCpyInst which just wraps an IntrinsicInst?
>
>     I think this would be very beneficial if we ever need to add more
>     state, or change something about the underlying implementation,
>     and not have to search all the code for ‘bool traps =
>     cast<ConstantInt>(I->getOperand(1))->getZExtValue()’ or whatever
>     it happens to be.
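
A minimal sketch of the wrapper idea Pete describes, written as standalone C++ rather than against LLVM's headers. Everything here is an assumption made for illustration: `MockIntrinsicCall`, `FPEnvCallView`, the intrinsic ID, the operand order, and the exception-behavior encoding are hypothetical names and values, not LLVM API. The point is only that a single class knows the operand layout, so callers never index raw operands.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for a generic intrinsic call: an ID plus raw operands.
// (Hypothetical mock; this is not LLVM's IntrinsicInst.)
struct MockIntrinsicCall {
    int id;
    std::vector<std::int64_t> operands;
};

constexpr int kFAddWithEnvironment = 1;    // hypothetical intrinsic ID
constexpr std::int64_t kBehaviorTrap = 2;  // hypothetical encoding of "trap"

// Thin typed view over the raw call. This is the only place that knows
// which operand index holds which property.
class FPEnvCallView {
public:
    explicit FPEnvCallView(const MockIntrinsicCall &call) : call_(call) {}

    // Returns true if the raw call can be viewed through this wrapper.
    static bool classof(const MockIntrinsicCall &call) {
        return call.id == kFAddWithEnvironment && call.operands.size() == 4;
    }

    std::int64_t lhs() const { return call_.operands[0]; }
    std::int64_t rhs() const { return call_.operands[1]; }
    std::uint8_t roundingMode() const {
        return static_cast<std::uint8_t>(call_.operands[2]);
    }
    bool mayTrap() const { return call_.operands[3] == kBehaviorTrap; }

private:
    const MockIntrinsicCall &call_;
};
```

If the representation later moved from constant operands to metadata, only the accessor bodies would change; code written against `mayTrap()` would be unaffected.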
>
>     Pete
>     > On Feb 5, 2016, at 4:36 PM, Stephen Canon via llvm-dev
>     <llvm-dev at lists.llvm.org> wrote:
>     >
>     > Seems like everyone’s on board, but I want to mention that I
>     also think this is very much the right approach.  In particular,
>     it allows us to support both existing CPU designs with dynamic
>     rounding modes as well as GPU designs and soft-float libraries
>     with statically specified rounding.
>     >
>     > Support for “I want the flags, but I really don’t care about
>     when they happen specifically” is somewhat interesting; I assume
>     this would take the form of “returning” the flag state and OR-ing
>     it into an integer that represents the cumulative flags (much like
>     common cpu hardware does, but this would also let us support
>     soft-float implementations).  This wouldn’t impose ordering
>     restrictions, but would prevent speculation.
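
A rough sketch of the "return the flags and OR them together" idea, in the soft-float spirit Steve mentions. The names `FPFlag`, `MulResult`, and `mulWithFlags` are hypothetical. It uses multiplication rather than addition because the product of two floats is exactly representable in a double, which makes inexactness easy to detect without touching the hardware flags. Only two flags are modeled, and the overflow test is a simplification that ignores the narrow band just above FLT_MAX that round-to-nearest would still round down to FLT_MAX.

```cpp
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical flag bits; a real design would cover all IEEE exceptions.
enum FPFlag : std::uint8_t { kInexact = 1u << 0, kOverflow = 1u << 1 };

struct MulResult {
    float value;
    std::uint8_t flags;  // flags raised by this one operation only
};

// Multiplies two floats and reports the flags as a value, with no
// hidden global state.
MulResult mulWithFlags(float a, float b) {
    // Exact: 24-bit x 24-bit significands fit in a double's 53 bits.
    const double exact = static_cast<double>(a) * static_cast<double>(b);
    if (std::isnan(exact) || std::isinf(exact))
        return {static_cast<float>(exact), 0};  // not modeled further
    if (std::fabs(exact) > FLT_MAX) {
        const float inf = std::numeric_limits<float>::infinity();
        return {exact < 0 ? -inf : inf,
                static_cast<std::uint8_t>(kOverflow | kInexact)};
    }
    const float rounded = static_cast<float>(exact);
    const std::uint8_t flags =
        static_cast<double>(rounded) != exact ? kInexact : 0;
    return {rounded, flags};
}
```

A caller keeps one cumulative integer and ORs each operation's flags into it. Because OR is commutative, the operations can be reordered freely; they just cannot be executed speculatively if their flags would then be counted.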
>     >
>     > – Steve
>     >
>     >> On Feb 5, 2016, at 4:25 PM, Hal Finkel via llvm-dev
>     <llvm-dev at lists.llvm.org> wrote:
>     >>
>     >> ----- Original Message -----
>     >>> From: "Chandler Carruth" <chandlerc at gmail.com>
>     >>> To: "Hal Finkel" <hfinkel at anl.gov>,
>     >>> "Chandler Carruth" <chandlerc at gmail.com>
>     >>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
>     >>> Sent: Friday, February 5, 2016 4:36:54 PM
>     >>> Subject: Re: [llvm-dev] [RFC] FP Environment and Rounding mode
>     >>> handling in LLVM
>     >>>
>     >>> On Fri, Feb 5, 2016 at 2:10 PM Hal Finkel via llvm-dev
>     >>> <llvm-dev at lists.llvm.org> wrote:
>     >>>
>     >>>
>     >>> Hi Chandler,
>     >>>
>     >>> This scheme has significant advantages over what was being pursued,
>     >>> but one question (or two)...
>     >>>
>     >>> Under the proposed system, how would you represent the necessary
>     >>> dependency edges between the fp intrinsics and function calls? How
>     >>> is the state 'returned' to the caller? [I was thinking that our new
>     >>> operand bundles could help for the inputs, but the outputs? Plus
>     >>> what about the live-in state?]
>     >>>
>     >>> This is important because any external subroutine call could
>     >>> (potentially) change the rounding mode or any other part of the
>     >>> floating-point environment.
>     >>>
>     >>>
>     >>>
>     >>> So, one thing that was missing in my original email and that talking
>     >>> with Steve Canon offline clarified was that we need a way to
>     >>> directly query the current modes for systems where those can be set
>     >>> externally.
>     >>>
>     >>>
>     >>> My suggestion was to have an intrinsic that "loads" this state. This
>     >>> could then be used to load whatever the current state is, and pass
>     >>> that to the floating point intrinsics proposed in order to pick up
>     >>> whatever the "current" state happens to be on systems where this is
>     >>> truly a background stateful thing, while still allowing us to model
>     >>> operation-specific state for other systems. Naturally, there should
>     >>> be a complementary "store" of the state as well.
>     >>>
>     >>>
>     >>> Then, for code which really needs this degree of faithful FP
>     >>> environment handling, you would expect the #pragma to be present
>     >>> enabling that mode. While that pragma is in place, all floating
>     >>> point operations would be lowered using these intrinsics, and
>     >>> external function calls could be guarded by storing and reloading
>     >>> this state at the IR level. This would make the IR substantially
>     >>> more verbose when the pragma is enabled, but that seems like an
>     >>> acceptable tradeoff given that we expect this code to be rare (see
>     >>> my preconditions section). And naturally, on any system that
>     >>> actually manages FP environment in a state "register" or whatever,
>     >>> we'd want to do some work to try to optimize away state changes.
>     >>> Much like we have attributes that can be inferred about access to
>     >>> memory, we could infer attributes on functions about whether they
>     >>> change the FP environment state, and if not, propagate across the
>     >>> function call boundaries.
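
One way to picture the store-and-reload guard around an external call, sketched with the standard `<cfenv>` functions instead of IR intrinsics. The helper `callWithRoundingMode` and the two callees are hypothetical; `std::fesetround` and `std::fegetround` stand in for the proposed "store" and "load" intrinsics, and only the rounding mode is covered, not the exception flags.

```cpp
#include <cfenv>

// The caller tracks the rounding mode as an explicit value. Before an
// opaque call it publishes that value to the real environment ("store");
// afterwards it re-reads the environment ("load"), because the callee
// may have changed it.
int callWithRoundingMode(int mode, void (*callee)()) {
    std::fesetround(mode);     // "store" the tracked state
    callee();                  // opaque external call
    return std::fegetround();  // "load" whatever state the callee left
}

// Two stand-in callees: one leaves the environment alone, one does not.
void leavesModeAlone() {}
void switchesToTowardZero() { std::fesetround(FE_TOWARDZERO); }
```

If a callee were known never to touch the environment (the inferred attribute mentioned above), both the store and the reload around it could be dropped and the tracked value forwarded unchanged.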
>     >>>
>     >>>
>     >>> But even though this would be some amount of work to optimize, the
>     >>> nice thing (IMO) is that it would be localized. We would have
>     >>> specific code that dealt with optimizing the FP environment
>     >>> concerns, while the rest of LLVM could remain oblivious and rely on
>     >>> simple common constructs to provide conservatively correct behavior.
>     >>>
>     >>> What do you think?
>     >>
>     >> SGTM.
>     >>
>     >> -Hal
>     >>
>     >>> -Chandler
>     >>>
>     >>>
>     >>>
>     >>>
>     >>> Thanks again,
>     >>> Hal
>     >>>
>     >>> ----- Original Message -----
>     >>>> From: "Chandler Carruth" <chandlerc at gmail.com>
>     >>>> To: "Mehdi Amini" <mehdi.amini at apple.com>, "llvm-dev"
>     >>>> <llvm-dev at lists.llvm.org>
>     >>>> Cc: "Steve (Numerics) Canon" <scanon at apple.com>, "Sergey
>     >>>> Dmitrouk" <sdmitrouk at accesssoftek.com>, "David Majnemer"
>     >>>> <david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
>     >>>> Sent: Thursday, February 4, 2016 8:05:38 PM
>     >>>> Subject: Re: [RFC] FP Environment and Rounding mode handling in
>     >>>> LLVM
>     >>>>
>     >>>>
>     >>>> First, thanks Mehdi for putting something on llvm-dev and getting
>     >>>> wider awareness of this.
>     >>>>
>     >>>>
>     >>>> I am actually really interested in finding a way for LLVM to
>     >>>> support the interesting functionality we are missing from fenv-like
>     >>>> interfaces. Things like rounding modes, exceptions, etc. However, I
>     >>>> think the current design is going to be a really high burden for
>     >>>> the entire optimizer and I think there is a simpler model that we
>     >>>> might pursue instead.
>     >>>>
>     >>>>
>     >>>> I'll start off with some underlying principles that I'm operating
>     >>>> from:
>     >>>> a) Most code in the world will be very happy with the default
>     >>>> floating point environment, doesn't need to carefully model
>     >>>> floating point exceptions, etc. Essentially, I think that LLVM's
>     >>>> behavior today is probably right for most code. Now, the code
>     >>>> which needs support for the other features of floating point
>     >>>> isn't bad or unimportant! But it is, relatively speaking, rare,
>     >>>> and so I think it is reasonable to optimize the *representation*
>     >>>> model for the common case provided we don't lose support for
>     >>>> functionality.
>     >>>>
>     >>>>
>     >>>> b) When outside the default floating point environment's rules,
>     >>>> there are few if any optimizations that we realistically expect
>     >>>> from LLVM. Certainly, any changes to the LLVM optimizer which
>     >>>> impact code outside the default need to be made *much* more
>     >>>> carefully to avoid introducing subtle bugs.
>     >>>>
>     >>>>
>     >>>> OK, based on that, consider the following model:
>     >>>> We provide intrinsics that mirror the instructions 'fadd', 'fsub',
>     >>>> 'fmul', 'fdiv', and 'frem' (so 5 total). From here on out, I'll
>     >>>> exclusively use 'fadd' as my examples. The intrinsics would look
>     >>>> like:
>     >>>>
>     >>>> declare {f32, i1} @llvm.fadd.with.environment.f32(f32 %lhs, f32 %rhs,
>     >>>>     i8 %rounding_mode, i8 %exception_behavior)
>     >>>>
>     >>>>
>     >>>> Then we define specific values to be used for the IEEE rounding
>     >>>> modes. And we define values to control exception behavior. I'm not
>     >>>> an expert on floating point exceptions in particular (my platforms
>     >>>> don't use them) but I'm imagining three states: "ignore", "return",
>     >>>> and "trap". I've used a single 'i1', but I'm assuming it would need
>     >>>> to be several i1s or an iN in order to model the set of FP
>     >>>> exceptions. I'm using i1 here just to simplify the explanation; I
>     >>>> think it generalizes and I'll let the experts suggest the exact
>     >>>> formulation.
>     >>>>
>     >>>>
>     >>>> If the default rounding mode is provided to these intrinsics and
>     >>>> the "ignore" exception behavior is provided, they behave exactly as
>     >>>> the existing instructions do, and instcombine should canonicalize
>     >>>> to the existing instructions.
>     >>>>
>     >>>>
>     >>>> The semantics of non-default rounding modes are to perform the
>     >>>> operation with that rounding mode.
>     >>>>
>     >>>>
>     >>>> If "return" is provided for the exception behavior, then the i1
>     >>>> component of the result is true if an FP exception occurred and
>     >>>> false otherwise. If "ignore" is provided then any FP exceptions are
>     >>>> ignored and the i1 is always false. If "trap" is provided then the
>     >>>> i1 is always false, but the call to the intrinsic might trap. We
>     >>>> could either define a trap as precisely the same as a call to
>     >>>> @llvm.trap(), or we could introduce an @llvm.fp.trap() and define
>     >>>> it as a call to that.
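
The three exception behaviors can be modeled in a few lines of standalone C++. This is a sketch under several assumptions, not the proposed intrinsic: `ExceptionBehavior`, `FDivResult`, and `fdivWithEnvironment` are hypothetical names, the rounding-mode operand is left out, division stands in for 'fadd' because divide-by-zero is the easiest exception to detect portably, no other exception is modeled, and a trap is represented by throwing `std::runtime_error` rather than by anything like @llvm.trap().

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>

static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 float semantics");

enum class ExceptionBehavior { Ignore, Return, Trap };

struct FDivResult {
    float value;
    bool exception;  // plays the role of the i1 in the {f32, i1} result
};

FDivResult fdivWithEnvironment(float lhs, float rhs, ExceptionBehavior eb) {
    // Only divide-by-zero is detected: finite nonzero divided by zero.
    const bool raised = rhs == 0.0f && lhs != 0.0f && std::isfinite(lhs);
    float value;
    if (raised) {
        const float inf = std::numeric_limits<float>::infinity();
        value = (std::signbit(lhs) != std::signbit(rhs)) ? -inf : inf;
    } else {
        value = lhs / rhs;
    }
    switch (eb) {
    case ExceptionBehavior::Ignore:
        return {value, false};   // flag is always false
    case ExceptionBehavior::Return:
        return {value, raised};  // flag reports the exception
    case ExceptionBehavior::Trap:
        if (raised)
            throw std::runtime_error("fp trap");  // stand-in for a trap
        return {value, false};   // flag is always false
    }
    return {value, false};  // unreachable for valid enum values
}
```

The "trap" case is what forces callers to treat the operation like a call that might not return, which is the property relied on later in the message.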
>     >>>>
>     >>>>
>     >>>> The frontend would then be responsible for lowering floating point
>     >>>> arithmetic using these intrinsics. This may be somewhat challenging
>     >>>> because in the frontend behavior is controlled dynamically in some
>     >>>> languages. In those situations, we can either allow these
>     >>>> intrinsics to accept non-constant arguments for %rounding_mode and
>     >>>> %exception_behavior so that frontends can emit code that just
>     >>>> dynamically computes them, or we could follow the same model that
>     >>>> atomics use, and if the frontend cannot trivially compute a
>     >>>> constant, it can emit a switch over the possible states with a
>     >>>> specific intrinsic call in each case. I don't have strong opinions
>     >>>> about which would be best, I think either could be made to work.
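
The second option, a switch over the possible states with a constant in each arm, can be sketched in standalone C++. All names here (`RoundingMode`, `roundToIntegral`, `roundDynamic`) are hypothetical, and rounding a double to an integral value stands in for 'fadd' so that each mode has a visibly different result. The template parameter plays the role of the constant %rounding_mode operand.

```cpp
#include <cmath>
#include <cstdint>

enum class RoundingMode : std::uint8_t {
    NearestEven, TowardZero, Upward, Downward
};

// One "intrinsic" per constant mode: the mode is a compile-time constant.
template <RoundingMode RM> double roundToIntegral(double x);

template <> double roundToIntegral<RoundingMode::NearestEven>(double x) {
    // std::remainder uses the nearest integer quotient, ties to even.
    return x - std::remainder(x, 1.0);
}
template <> double roundToIntegral<RoundingMode::TowardZero>(double x) {
    return std::trunc(x);
}
template <> double roundToIntegral<RoundingMode::Upward>(double x) {
    return std::ceil(x);
}
template <> double roundToIntegral<RoundingMode::Downward>(double x) {
    return std::floor(x);
}

// What a frontend would emit when the mode is only known at run time:
// a switch whose arms each call the constant-mode form.
double roundDynamic(double x, RoundingMode rm) {
    switch (rm) {
    case RoundingMode::NearestEven:
        return roundToIntegral<RoundingMode::NearestEven>(x);
    case RoundingMode::TowardZero:
        return roundToIntegral<RoundingMode::TowardZero>(x);
    case RoundingMode::Upward:
        return roundToIntegral<RoundingMode::Upward>(x);
    case RoundingMode::Downward:
        return roundToIntegral<RoundingMode::Downward>(x);
    }
    return x;  // unreachable for valid enum values
}
```

The cost is code size in the dynamic case; the benefit is that every call the optimizer sees carries a constant mode.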
>     >>>>
>     >>>>
>     >>>> If we go with constant arguments being required, we could use
>     >>>> "metadata arguments" which aren't actually metadata but just
>     >>>> encoded arguments for intrinsics.
>     >>>>
>     >>>>
>     >>>> When emitting constants and trying to respect floating point
>     >>>> environment settings, frontends will have to emit runtime calls
>     >>>> instead of actual constants. But this seems actually good because
>     >>>> that is what we'll need anyways -- we aren't able to with full
>     >>>> generality emulate all the environment options if I understand
>     >>>> things correctly (and let me know if I've misunderstood).
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>> The two really big reasons why I like this model much more than
>     >>>> extending flags are:
>     >>>>
>     >>>>
>     >>>> 1) This avoids implicit state. The implicit state of the floating
>     >>>> point environment makes things like code motion extremely hard to
>     >>>> reason about. I think we will just get it wrong too often to make
>     >>>> this a good approach. By modeling all of this as actual SSA values
>     >>>> I think there is a much better chance we'll get this stuff right.
>     >>>> For example, fetestexcept could be implemented by or-ing all the
>     >>>> i1s for floating point exceptions and testing the result. Then the
>     >>>> backend can spill the state when necessary and reload it when
>     >>>> needed even if other floating point math is introduced. I admit
>     >>>> that first class aggregate returns aren't a beautiful way to
>     >>>> encapsulate this, but they are an *effective* way that we know how
>     >>>> to work with in the LLVM IR. If we ever come up with a better
>     >>>> multi-def model, we can always switch these and all the other
>     >>>> intrinsics which need this to that model.
>     >>>>
>     >>>>
>     >>>> 2) Every pass will conservatively correctly model the operations.
>     >>>> This is most significant when modeling trapping on exceptions. We
>     >>>> need every pass to realize that control flow might not proceed past
>     >>>> such operations. We already have this logic for calls, and it seems
>     >>>> a really nice fit for allowing most of the optimizer to be unaware
>     >>>> of these constructs while respecting them and preserving behavior
>     >>>> in the face of them.
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>> I suspect that there are things this model doesn't handle that I've
>     >>>> not thought of (as this is outside the area of FP that I'm deeply
>     >>>> familiar with), but I really think this model would be easier to
>     >>>> reason about and would be much less invasive within the IR and
>     >>>> optimizer. I wonder if folks think this could work and would be up
>     >>>> for moving their efforts in this direction?
>     >>>>
>     >>>>
>     >>>> -Chandler
>     >>>>
>     >>>>
>     >>>> On Wed, Feb 3, 2016 at 3:04 PM Mehdi Amini <mehdi.amini at apple.com>
>     >>>> wrote:
>     >>>>
>     >>>>
>     >>>> Hi everyone,
>     >>>>
>     >>>> Sergey (CC’ed) worked on a series of patches to add support for
>     >>>> floating-point environment and floating-point rounding modes in
>     >>>> LLVM.
>     >>>> This started *in 2014* and the patches, after multiple rounds of
>     >>>> review in the last months (involving, among others, Steve Canon,
>     >>>> Hal Finkel, David Majnemer, and myself), are getting very close
>     >>>> (IMO) to being in a state where we can land them.
>     >>>>
>     >>>> This is the thread that started this development: "[LLVMdev] More
>     >>>> careful treatment of floating point exceptions"
>     >>>> http://marc.info/?l=llvm-dev&m=141113983302113&w=2
>     >>>> And this is the thread where most of the discussion on the design
>     >>>> occurred: "[PATCH] Flag to enable IEEE-754 friendly FP
>     >>>> optimizations"
>     >>>> http://marc.info/?l=llvm-commits&m=141235814915999&w=2
>     >>>>
>     >>>> Since Chandler raised some concerns on IRC today, I figured I
>     >>>> should send a heads-up on this topic to allow anyone to comment on
>     >>>> the current plan.
>     >>>>
>     >>>> We plan on adding two new FP env flags to the existing FMF
>     >>>> (fast-math flags). Without these flags set, the optimizer has to
>     >>>> assume that the FP env can be observed, or the rounding mode can be
>     >>>> changed. For clang, these flags would be set unless a command-line
>     >>>> option requires preserving the FP env.
>     >>>>
>     >>>> Here is the list of patches:
>     >>>>
>     >>>> [FPEnv Core 01/14] Add flags and command-line switches for FPEnv:
>     >>>> http://reviews.llvm.org/D14066
>     >>>> [FPEnv Core 02/14] Add FPEnv access flags to fast-math flags:
>     >>>> http://reviews.llvm.org/D14067
>     >>>> [FPEnv Core 03/14] Make SelectionDAG aware of FPEnv flags:
>     >>>> http://reviews.llvm.org/D14068
>     >>>> [FPEnv Core 04/14] Skip constant folding to preserve FPEnv:
>     >>>> http://reviews.llvm.org/D14069
>     >>>> [FPEnv Core 05/14] Teach IR builder and folders about new flags:
>     >>>> http://reviews.llvm.org/D14070
>     >>>> [FPEnv Core 06/14] Do not fold constants on reading in IR
>     >>>> asm/bitcode: http://reviews.llvm.org/D14071
>     >>>> [FPEnv Core 07/14] Prevent undesired folding by InstSimplify:
>     >>>> http://reviews.llvm.org/D14072
>     >>>> [FPEnv Core 08/14] Do not simplify expressions with FPEnv access:
>     >>>> http://reviews.llvm.org/D14073
>     >>>> [FPEnv Core 09/14] Make Strict flag available for more clients:
>     >>>> http://reviews.llvm.org/D14074
>     >>>> [FPEnv Core 10/14] Use Strict in IRBuilder:
>     >>>> http://reviews.llvm.org/D14075
>     >>>> [FPEnv Core 11/14] Don't convert fpops to constexprs in SCCP:
>     >>>> http://reviews.llvm.org/D14076
>     >>>> [FPEnv Core 13/14] Don't hoist FP-ops with side-effects in LICM:
>     >>>> http://reviews.llvm.org/D14078
>     >>>> [FPEnv Core 14/14] Introduce F*_W_CHAIN instrs to prevent
>     >>>> reordering: http://reviews.llvm.org/D14079
>     >>>>
>     >>>>
>     >>>> —
>     >>>> Mehdi
>     >>>>
>     >>>>
>     >>>
>     >>> --
>     >>> Hal Finkel
>     >>> Assistant Computational Scientist
>     >>> Leadership Computing Facility
>     >>> Argonne National Laboratory
>     >>> _______________________________________________
>     >>> LLVM Developers mailing list
>     >>> llvm-dev at lists.llvm.org
>     >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>     >>>
