[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM

Fri Feb 5 18:03:24 PST 2016

Agreed.

On Fri, Feb 5, 2016 at 5:54 PM Pete Cooper via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> FWIW, +1 from me.
>
> Just one request on the implementation though.  However we model these
> intrinsics and their properties (metadata, constants, etc), can we please
> abstract away those details the same way we have MemCpyInst which just
> wraps an IntrinsicInst?
>
> I think this would be very beneficial if we ever need to add more state,
> or change something about the underlying implementation, and not have to
> search all the code for ‘bool traps =
> cast<ConstantInt>(I->getOperand(1))->getZExtValue()’ or whatever it happens
> to be.
>
> Pete
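
A rough C++ sketch of the kind of wrapper Pete asks for, using toy stand-ins rather than the real LLVM classes. `Call`, `FPEnvIntrinsic`, the operand positions, and the encoding of "trap" are all hypothetical; only the idea of hiding raw operand indices behind named accessors (as MemCpyInst does over IntrinsicInst) comes from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a call to an intrinsic: a name plus integer operands.
// This is NOT llvm::IntrinsicInst; it only mimics the shape of the problem.
struct Call {
  std::string callee;
  std::vector<std::uint64_t> operands;
};

// Hypothetical wrapper: callers ask for properties by name and never touch
// raw operand indices, so the encoding can later change in one place.
class FPEnvIntrinsic {
public:
  explicit FPEnvIntrinsic(const Call &c) : call_(c) {}

  // True if the call looks like the proposed fadd intrinsic.
  static bool classof(const Call &c) {
    return c.callee.rfind("llvm.fadd.with.environment", 0) == 0 &&
           c.operands.size() == 4;
  }

  std::uint64_t roundingMode() const { return call_.operands[kRoundingIdx]; }
  std::uint64_t exceptionBehavior() const { return call_.operands[kExceptIdx]; }
  bool traps() const { return exceptionBehavior() == kTrap; }

private:
  // Assumed operand layout: (lhs, rhs, rounding_mode, exception_behavior).
  static constexpr std::size_t kRoundingIdx = 2;
  static constexpr std::size_t kExceptIdx = 3;
  static constexpr std::uint64_t kTrap = 2; // assumed encoding of "trap"
  const Call &call_;
};
```

With something like this, a pass would write `FPEnvIntrinsic(call).traps()` instead of decoding operand 3 by hand at every use site.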
> > On Feb 5, 2016, at 4:36 PM, Stephen Canon via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> >
> > Seems like everyone’s on board, but I want to mention that I also think
> > this is very much the right approach.  In particular, it allows us to
> > support both existing CPU designs with dynamic rounding modes as well as
> > GPU designs and soft-float libraries with statically specified rounding.
> >
> > Support for “I want the flags, but I really don’t care about when they
> > happen specifically” is somewhat interesting; I assume this would take the
> > form of “returning” the flag state and OR-ing it into an integer that
> > represents the cumulative flags (much like common CPU hardware does, but
> > this would also let us support soft-float implementations).  This wouldn’t
> > impose ordering restrictions, but would prevent speculation.
> >
> > – Steve
> >
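
One way to picture the "return the flags and OR them together" idea is the C++ sketch below. It is only an illustration under loud assumptions: the flag bit values and the names `faddWithFlags` and `sumWithFlags` are made up, and only overflow and invalid are detected (a real design would cover every IEEE flag, including inexact and signaling-NaN inputs).

```cpp
#include <cmath>

// Hypothetical flag bits; a real design would define its own encoding.
enum : unsigned { kFlagInvalid = 1u << 0, kFlagOverflow = 1u << 1 };

struct FAddResult {
  float value;
  unsigned flags; // flags raised by this one operation only
};

// Adds two floats and reports a simplified subset of the IEEE flags as a
// returned value rather than as hidden global state.
FAddResult faddWithFlags(float a, float b) {
  float r = a + b;
  unsigned flags = 0;
  if (std::isnan(r) && !std::isnan(a) && !std::isnan(b))
    flags |= kFlagInvalid;  // e.g. inf + (-inf)
  if (std::isinf(r) && std::isfinite(a) && std::isfinite(b))
    flags |= kFlagOverflow; // finite inputs, infinite result
  return {r, flags};
}

// Sums a sequence, OR-ing the per-operation flags into one cumulative word.
// OR is commutative and associative, so accumulating the flags does not
// force any particular order on the flag updates themselves.
FAddResult sumWithFlags(const float *xs, int n) {
  float acc = 0.0f;
  unsigned cumulative = 0;
  for (int i = 0; i < n; ++i) {
    FAddResult step = faddWithFlags(acc, xs[i]);
    acc = step.value;
    cumulative |= step.flags;
  }
  return {acc, cumulative};
}
```

Because each operation's flags feed the cumulative word, an operation cannot be executed speculatively on a path where it would not otherwise run without changing the observable result, which matches the "prevents speculation" remark above.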
> >> On Feb 5, 2016, at 4:25 PM, Hal Finkel via llvm-dev <
> >> llvm-dev at lists.llvm.org> wrote:
> >>
> >> ----- Original Message -----
> >>> From: "Chandler Carruth" <chandlerc at gmail.com>
> >>> To: "Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <
> >>> chandlerc at gmail.com>
> >>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> >>> Sent: Friday, February 5, 2016 4:36:54 PM
> >>> Subject: Re: [llvm-dev] [RFC] FP Environment and Rounding mode
> >>> handling in LLVM
> >>>
> >>> On Fri, Feb 5, 2016 at 2:10 PM Hal Finkel via llvm-dev <
> >>> llvm-dev at lists.llvm.org > wrote:
> >>>
> >>>
> >>> Hi Chandler,
> >>>
> >>> This scheme has significant advantages over what was being pursued,
> >>> but one question (or two)...
> >>>
> >>> Under the proposed system, how would you represent the necessary
> >>> dependency edges between the fp intrinsics and function calls? How
> >>> is the state 'returned' to the caller? [I was thinking that our new
> >>> operand bundles could help for the inputs, but the outputs? Plus
> >>> what about the live-in state?]
> >>>
> >>> This is important because any external subroutine call could
> >>> (potentially) change the rounding mode or any other part of the
> >>> floating-point environment.
> >>>
> >>>
> >>>
> >>> So, one thing that was missing in my original email and that talking
> >>> with Steve Canon offline clarified was that we need a way to
> >>> directly query the current modes for systems where those can be set
> >>> externally.
> >>>
> >>>
> >>> My suggestion was to have an intrinsic that "loads" this state. This
> >>> could then be used to load whatever the current state is, and pass
> >>> that to the floating point intrinsics proposed in order to pick up
> >>> whatever the "current" state happens to be on systems where this is
> >>> truly a background stateful thing, while still allowing us to model
> >>> operation-specific state for other systems. Naturally, there should
> >>> be a complementary "store" of the state as well.
> >>>
> >>>
> >>> Then, for code which really needs this degree of faithful FP
> >>> environment handling, you would expect the #pragma to be present
> >>> enabling that mode. While that pragma is in place, all floating
> >>> point operations would be lowered using these intrinsics, and
> >>> external function calls could be guarded by storing and reloading
> >>> this state at the IR level. This would make the IR substantially
> >>> more verbose when the pragma is enabled, but that seems like an
> >>> acceptable tradeoff given that we expect this code to be rare (see
> >>> my preconditions section). And naturally, on any system that
> >>> actually manages FP environment in a state "register" or whatever,
> >>> we'd want to do some work to try to optimize away state changes.
> >>> Much like we have attributes that can be inferred about access to
> >>> memory, we could infer attributes on functions about whether they
> >>> change the FP environment state, and if not, propagate across the
> >>> function call boundaries.
> >>>
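
The store-before / reload-after guard around an external call can be mimicked in plain C++ with `<cfenv>`. This is only an analogy for the proposed IR-level "load" and "store" intrinsics, not their definition: `FPState`, `loadFPState`, `storeFPState`, `callGuarded`, and `externalCallee` are hypothetical names, only the rounding mode is tracked, and `FE_TOWARDZERO` is assumed to be available on the target.

```cpp
#include <cfenv>

// The FP state tracked as an ordinary value (only the rounding mode here).
struct FPState {
  int roundingMode;
};

// Analogue of the "load" intrinsic: read the ambient environment.
FPState loadFPState() { return {std::fegetround()}; }

// Analogue of the "store" intrinsic: write tracked state to the environment.
void storeFPState(FPState s) { std::fesetround(s.roundingMode); }

// Stand-in for an arbitrary external function that changes the rounding
// mode behind the caller's back.
void externalCallee() { std::fesetround(FE_TOWARDZERO); }

// Guards a call: publish the tracked state before the call, then reload it
// afterwards so later operations use whatever the callee left behind.
template <typename Fn> FPState callGuarded(FPState s, Fn &&fn) {
  storeFPState(s);
  fn();
  return loadFPState();
}
```

If a function were known not to touch the FP environment (the inferred attribute mentioned above), the store and reload around its call could be dropped and the tracked value passed straight through.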
> >>>
> >>> But even though this would be some amount of work to optimize, the
> >>> nice thing (IMO) is that it would be localized. We would have
> >>> specific code that dealt with optimizing the FP environment
> >>> concerns, while the rest of LLVM could remain oblivious and rely on
> >>> simple common constructs to provide conservatively correct behavior.
> >>>
> >>> What do you think?
> >>
> >> SGTM.
> >>
> >> -Hal
> >>
> >>> -Chandler
> >>>
> >>>
> >>>
> >>>
> >>> Thanks again,
> >>> Hal
> >>>
> >>> ----- Original Message -----
> >>>> From: "Chandler Carruth" < chandlerc at gmail.com >
> >>>> To: "Mehdi Amini" < mehdi.amini at apple.com >, "llvm-dev" <
> >>>> llvm-dev at lists.llvm.org >
> >>>> Cc: "Steve (Numerics) Canon" < scanon at apple.com >, "Sergey
> >>>> Dmitrouk" < sdmitrouk at accesssoftek.com >, "David Majnemer"
> >>>> < david.majnemer at gmail.com >, "Hal Finkel" < hfinkel at anl.gov >
> >>>> Sent: Thursday, February 4, 2016 8:05:38 PM
> >>>> Subject: Re: [RFC] FP Environment and Rounding mode handling in
> >>>> LLVM
> >>>>
> >>>>
> >>>> First, thanks Mehdi for putting something on llvm-dev and getting
> >>>> wider awareness of this.
> >>>>
> >>>>
> >>>> I am actually really interested in finding a way for LLVM to
> >>>> support
> >>>> the interesting functionality we are missing from fenv-like
> >>>> interfaces. Things like rounding modes, exceptions, etc. However, I
> >>>> think the current design is going to be a really high burden for
> >>>> the
> >>>> entire optimizer and I think there is a simpler model that we might
> >>>> pursue instead.
> >>>>
> >>>>
> >>>> I'll start off with some underlying principles that I'm operating
> >>>> from:
> >>>> a) Most code in the world will be very happy with the default
> >>>> floating point environment, doesn't need to carefully model
> >>>> floating
> >>>> point exceptions, etc. Essentially, I think that LLVM's behavior
> >>>> today is probably right for most code. Now, the code which needs
> >>>> support for the other features of floating point isn't bad or
> >>>> unimportant! But it is relatively speaking rare, and so I think it
> >>>> is reasonable to optimize the *representation* model for the common
> >>>> case provided we don't lose support for functionality.
> >>>>
> >>>>
> >>>> b) When outside the default floating point environment's rules, there
> >>>> are few if any optimizations that we realistically expect from LLVM.
> >>>> Certainly, any changes to the LLVM optimizer which impact code
> >>>> outside the default need to be done *much* more carefully to avoid
> >>>> introducing subtle bugs.
> >>>>
> >>>>
> >>>> OK, based on that, consider the following model:
> >>>> We provide intrinsics that mirror the instructions 'fadd', 'fsub',
> >>>> 'fmul', 'fdiv', and 'frem' (so 5 total). From here on out, I'll
> >>>> exclusively use 'fadd' as my examples. The intrinsics would look
> >>>> like:
> >>>>
> >>>> declare {f32, i1} @llvm.fadd.with.environment.f32(f32 %lhs, f32 %rhs,
> >>>>     i8 %rounding_mode, i8 %exception_behavior)
> >>>>
> >>>>
> >>>> Then we define specific values to be used for the IEEE rounding
> >>>> modes. And we define values to control exception behavior. I'm not
> >>>> an expert on floating point exceptions in particular (my platforms
> >>>> don't use them) but I'm imagining three states "ignore", "return",
> >>>> and "trap". I've used a single 'i1', but I'm assuming it would need
> >>>> to be several i1s or an iN in order to model the set of FP
> >>>> exceptions. I'm using i1 here just to simplify the explanation, I
> >>>> think it generalizes and I'll let the experts suggest the exact
> >>>> formulation.
> >>>>
> >>>>
> >>>> If the default rounding mode is provided to these intrinsics and
> >>>> the
> >>>> "ignore" exception behavior is provided, they behave exactly as the
> >>>> existing instructions do, and instcombine should canonicalize to
> >>>> the
> >>>> existing instructions.
> >>>>
> >>>>
> >>>> The semantics of non-default rounding modes are to perform the
> >>>> operation with that rounding mode.
> >>>>
> >>>>
> >>>> If "return" is provided for the exception behavior, then the i1
> >>>> component of the result is true if an FP exception occurred and false
> >>>> otherwise. If "ignore" is provided then any FP exceptions are
> >>>> ignored and the i1 is always false. If "trap" is provided then the
> >>>> i1 is always false, but the call to the intrinsic might trap. We
> >>>> could either define a trap as precisely the same as a call to
> >>>> @llvm.trap(), or we could introduce an @llvm.fp.trap() and define it
> >>>> as a call to that.
> >>>>
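
The three exception behaviors described above can be modeled in a few lines of C++. This is a sketch under stated assumptions, not the proposed intrinsic: the rounding-mode operand is left out, only overflow and invalid count as "an FP exception", `faddWithBehavior` and `addRaises` are invented names, and a thrown `std::runtime_error` stands in for a trap.

```cpp
#include <cmath>
#include <stdexcept>
#include <utility>

enum class ExceptionBehavior { Ignore, Return, Trap };

// Simplified detection of an FP exception for an add: only overflow and
// invalid are modeled (an assumption to keep the sketch short).
static bool addRaises(float a, float b, float r) {
  bool invalid = std::isnan(r) && !std::isnan(a) && !std::isnan(b);
  bool overflow = std::isinf(r) && std::isfinite(a) && std::isfinite(b);
  return invalid || overflow;
}

// Models the {value, i1} result. The bool plays the role of the i1.
std::pair<float, bool> faddWithBehavior(float a, float b,
                                        ExceptionBehavior eb) {
  float r = a + b;
  bool raised = addRaises(a, b, r);
  switch (eb) {
  case ExceptionBehavior::Ignore:
    return {r, false}; // exceptions ignored, flag always false
  case ExceptionBehavior::Return:
    return {r, raised}; // flag reports whether an exception occurred
  case ExceptionBehavior::Trap:
    if (raised)
      throw std::runtime_error("FP trap"); // stand-in for a trap
    return {r, false}; // flag always false when no trap fires
  }
  return {r, false}; // not reached for valid enum values
}
```

Note that with `Ignore` the function is a plain add whose second result is constant, which is why canonicalizing that case back to the ordinary instruction is possible.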
> >>>>
> >>>> The frontend would then be responsible for lowering floating point
> >>>> arithmetic using these intrinsics. This may be somewhat challenging
> >>>> because in the frontend behavior is controlled dynamically in some
> >>>> languages. In those situations, we can either allow these
> >>>> intrinsics
> >>>> to accept non-constant arguments for %rounding_mode and
> >>>> %exception_behavior so that frontends can emit code that just
> >>>> dynamically computes them, or we could follow the same model that
> >>>> atomics use, and if the frontend cannot trivially compute a
> >>>> constant, it can emit a switch over the possible states with a
> >>>> specific intrinsic call in each case. I don't have strong opinions
> >>>> about which would be best, I think either could be made to work.
> >>>>
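
The second option (a switch over the possible states, with a constant in each arm) might look like the C++ sketch below. Everything named here is hypothetical: `RoundingMode`, `faddConstMode`, and `faddDynamicMode` are illustrative, and the per-mode body emulates a constant-mode operation with `std::fesetround`, which assumes the target defines all four `FE_*` macros and performs float addition in hardware according to the current mode.

```cpp
#include <cfenv>

enum class RoundingMode { ToNearest, Upward, Downward, TowardZero };

constexpr int toFenv(RoundingMode rm) {
  return rm == RoundingMode::Upward       ? FE_UPWARD
         : rm == RoundingMode::Downward   ? FE_DOWNWARD
         : rm == RoundingMode::TowardZero ? FE_TOWARDZERO
                                          : FE_TONEAREST;
}

// One instantiation per mode: inside each, the mode is a compile-time
// constant, mirroring an intrinsic call with a constant mode operand.
template <RoundingMode RM> float faddConstMode(float a, float b) {
  const int saved = std::fegetround();
  std::fesetround(toFenv(RM));
  // volatile keeps the compiler from folding the add or moving it across
  // the fesetround calls.
  volatile float x = a, y = b;
  volatile float r = x + y;
  std::fesetround(saved);
  return r;
}

// What a frontend could emit when the mode is only known at run time:
// a switch whose arms each use a constant mode.
float faddDynamicMode(float a, float b, RoundingMode rm) {
  switch (rm) {
  case RoundingMode::ToNearest:
    return faddConstMode<RoundingMode::ToNearest>(a, b);
  case RoundingMode::Upward:
    return faddConstMode<RoundingMode::Upward>(a, b);
  case RoundingMode::Downward:
    return faddConstMode<RoundingMode::Downward>(a, b);
  case RoundingMode::TowardZero:
    return faddConstMode<RoundingMode::TowardZero>(a, b);
  }
  return faddConstMode<RoundingMode::ToNearest>(a, b); // fallback
}
```

The trade-off is code size: every dynamically-moded operation expands into one arm per rounding mode, whereas allowing non-constant mode operands keeps the IR compact but pushes the dispatch into the backend.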
> >>>>
> >>>> If we go with constant arguments being required, we could use
> >>>> "metadata arguments" which aren't actually metadata but just
> >>>> encoded
> >>>> arguments for intrinsics.
> >>>>
> >>>>
> >>>> When emitting constants and trying to respect floating point
> >>>> environment settings, frontends will have to emit runtime calls
> >>>> instead of actual constants. But this actually seems good because
> >>>> that is what we'll need anyway -- we aren't able to emulate all the
> >>>> environment options with full generality, if I understand things
> >>>> correctly (and let me know if I've misunderstood).
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> The two really big reasons why I like this model much more than
> >>>> extending flags are:
> >>>>
> >>>>
> >>>> 1) This avoids implicit state. The implicit state of the floating
> >>>> point environment makes things like code motion extremely hard to
> >>>> reason about. I think we will just get it wrong too often to make
> >>>> this a good approach. By modeling all of this as actual SSA values
> >>>> I
> >>>> think there is a much better chance we'll get this stuff right. For
> >>>> example by or-ing all the i1s for floating point exceptions and
> >>>> testing the result to implement fetestexcept. Then the backend can
> >>>> spill the state when necessary and reload it when needed even if
> >>>> other floating point math is introduced. I admit that first class
> >>>> aggregate returns aren't a beautiful way to encapsulate this, but
> >>>> they are an *effective* way that we know how to work with in the
> >>>> LLVM IR. If we ever come up with a better multi-def model, we can
> >>>> always switch these and all the other intrinsics which need this to
> >>>> that model.
> >>>>
> >>>>
> >>>> 2) Every pass will conservatively correctly model the operations.
> >>>> This is most significant when modeling trapping on exceptions. We
> >>>> need every pass to realize that control flow might not proceed past
> >>>> such operations. We already have this logic for calls, and it seems
> >>>> a really nice fit for allowing most of the optimizer to be unaware
> >>>> of these constructs while respecting them and preserving behavior
> >>>> in
> >>>> the face of them.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> I suspect that there are things this model doesn't handle that I've
> >>>> not thought of (as this is outside the area of FP that I'm deeply
> >>>> familiar with), but I really think this model would be easier to
> >>>> reason about and would be much less invasive within the IR and
> >>>> optimizer. I wonder if folks think this could work and would be up
> >>>> for moving their efforts in this direction?
> >>>>
> >>>>
> >>>> -Chandler
> >>>>
> >>>>
> >>>> On Wed, Feb 3, 2016 at 3:04 PM Mehdi Amini < mehdi.amini at apple.com >
> >>>> wrote:
> >>>>
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> Sergey (CC’ed) worked on a series of patches to add support for
> >>>> floating-point environment and floating-point rounding modes in LLVM.
> >>>> This started *in 2014* and the patches, after multiple rounds of
> >>>> review in the last months (involving amongst others Steve Canon, Hal
> >>>> Finkel, David Majnemer, and myself), are getting very close (IMO) to
> >>>> being in a state where we can land them.
> >>>>
> >>>> This is the thread that started this development: “[LLVMdev] More
> >>>> careful treatment of floating point exceptions”
> >>>> http://marc.info/?l=llvm-dev&m=141113983302113&w=2
> >>>> And this is the thread where most of the discussion on the design
> >>>> occurred: “[PATCH] Flag to enable IEEE-754 friendly FP optimizations”
> >>>> http://marc.info/?l=llvm-commits&m=141235814915999&w=2
> >>>>
> >>>> Since Chandler raised some concerns on IRC today, I figured I
> >>>> should send a heads-up on this topic to allow anyone to comment on
> >>>> the current plan.
> >>>>
> >>>> We plan on adding two new FP env flags to the existing FMF (fast-math
> >>>> flags). Without these flags set, the optimizer has to assume that
> >>>> the FP env can be observed, or the rounding mode can be changed. For
> >>>> clang, these flags would be set unless a command-line option
> >>>> requires preserving the FP env.
> >>>>
> >>>> Here is the list of patches:
> >>>>
> >>>> [FPEnv Core 01/14] Add flags and command-line switches for FPEnv:
> >>>> http://reviews.llvm.org/D14066
> >>>> [FPEnv Core 02/14] Add FPEnv access flags to fast-math flags:
> >>>> http://reviews.llvm.org/D14067
> >>>> [FPEnv Core 03/14] Make SelectionDAG aware of FPEnv flags:
> >>>> http://reviews.llvm.org/D14068
> >>>> [FPEnv Core 04/14] Skip constant folding to preserve FPEnv:
> >>>> http://reviews.llvm.org/D14069
> >>>> [FPEnv Core 05/14] Teach IR builder and folders about new flags:
> >>>> http://reviews.llvm.org/D14070
> >>>> [FPEnv Core 06/14] Do not fold constants on reading in IR
> >>>> asm/bitcode: http://reviews.llvm.org/D14071
> >>>> [FPEnv Core 07/14] Prevent undesired folding by InstSimplify:
> >>>> http://reviews.llvm.org/D14072
> >>>> [FPEnv Core 08/14] Do not simplify expressions with FPEnv access:
> >>>> http://reviews.llvm.org/D14073
> >>>> [FPEnv Core 09/14] Make Strict flag available for more clients:
> >>>> http://reviews.llvm.org/D14074
> >>>> [FPEnv Core 10/14] Use Strict in IRBuilder:
> >>>> http://reviews.llvm.org/D14075
> >>>> [FPEnv Core 11/14] Don't convert fpops to constexprs in SCCP:
> >>>> http://reviews.llvm.org/D14076
> >>>> [FPEnv Core 13/14] Don't hoist FP-ops with side-effects in LICM:
> >>>> http://reviews.llvm.org/D14078
> >>>> [FPEnv Core 14/14] Introduce F*_W_CHAIN instrs to prevent
> >>>> reordering:
> >>>> http://reviews.llvm.org/D14079
> >>>>
> >>>>
> >>>> —
> >>>> Mehdi
> >>>>
> >>>>
> >>>
> >>> --
> >>> Hal Finkel
> >>> Assistant Computational Scientist
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>
> >>
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >
>
>