[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 9 19:57:57 PST 2016
+1 to this. Having it structured this way would make things much easier
if we someday decided to promote these intrinsics to instructions or
merge them (via non-optional modifiers like "volatile") with the
existing floating point instructions.
Philip
On 02/05/2016 06:03 PM, Chandler Carruth via llvm-dev wrote:
> Agreed.
>
> On Fri, Feb 5, 2016 at 5:54 PM Pete Cooper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> FWIW, +1 from me.
>
> Just one request on the implementation, though. However we model
> these intrinsics and their properties (metadata, constants, etc.),
> can we please abstract away those details the same way we have
> MemCpyInst, which just wraps an IntrinsicInst?
>
> I think this would be very beneficial if we ever need to add more
> state or change something about the underlying implementation,
> so that we don't have to search all the code for 'bool traps =
> cast<ConstantInt>(I->getOperand(1))->getZExtValue()' or whatever
> it happens to be.
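
A minimal C++ sketch of the wrapper idea above, under stated assumptions. It does not use the real LLVM headers: MockIntrinsicCall stands in for an IntrinsicInst, and the class name, enum encodings, and operand layout are all hypothetical. The point is only that callers ask semantic questions and never index operands themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for an intrinsic call with constant integer operands. In LLVM
// this role would be played by IntrinsicInst; the mock exists only so the
// sketch compiles on its own.
struct MockIntrinsicCall {
  std::vector<std::uint64_t> operands;
};

// Hypothetical encodings, not taken from any LLVM release.
enum class RoundingMode : std::uint8_t { NearestEven, TowardZero, Upward, Downward };
enum class ExceptionBehavior : std::uint8_t { Ignore, Return, Trap };

// Hypothetical wrapper in the spirit of MemCpyInst: the operand layout is
// known in exactly one place.
class FPEnvIntrinsicView {
public:
  explicit FPEnvIntrinsicView(const MockIntrinsicCall &call) : call_(call) {}

  RoundingMode roundingMode() const {
    return static_cast<RoundingMode>(call_.operands.at(kRoundingModeIdx));
  }
  ExceptionBehavior exceptionBehavior() const {
    return static_cast<ExceptionBehavior>(call_.operands.at(kExceptionIdx));
  }
  bool mayTrap() const { return exceptionBehavior() == ExceptionBehavior::Trap; }
  bool isDefaultEnvironment() const {
    return roundingMode() == RoundingMode::NearestEven &&
           exceptionBehavior() == ExceptionBehavior::Ignore;
  }

private:
  // Assumed layout: (lhs, rhs, rounding_mode, exception_behavior).
  static constexpr std::size_t kRoundingModeIdx = 2;
  static constexpr std::size_t kExceptionIdx = 3;
  const MockIntrinsicCall &call_;
};

// Example use: a pass asks "may this trap?" without touching operands.
inline void exampleWrapperUse() {
  const MockIntrinsicCall call{{0, 0, 1, 2}};
  const FPEnvIntrinsicView view(call);
  assert(view.roundingMode() == RoundingMode::TowardZero);
  assert(view.mayTrap());
}
```

If the representation later moved from constant operands to metadata, only the two accessors would need to change.
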
>
> Pete
> > On Feb 5, 2016, at 4:36 PM, Stephen Canon via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >
> > Seems like everyone’s on board, but I want to mention that I also
> > think this is very much the right approach. In particular, it allows
> > us to support both existing CPU designs with dynamic rounding modes
> > as well as GPU designs and soft-float libraries with statically
> > specified rounding.
> >
> > Support for “I want the flags, but I really don’t care about when
> > they happen specifically” is somewhat interesting; I assume this
> > would take the form of “returning” the flag state and OR-ing it into
> > an integer that represents the cumulative flags (much like common CPU
> > hardware does, but this would also let us support soft-float
> > implementations). This wouldn’t impose ordering restrictions, but
> > would prevent speculation.
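
The "return the flags and OR them together" idea can be sketched in C++ roughly as follows. This is an illustration under assumptions, not a proposed interface: the flag bits, the FlaggedResult type, and the function names are hypothetical, and only overflow and invalid are detected (by inspecting the result) rather than the full IEEE flag set.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical flag bits; a real design would pick its own encoding.
constexpr std::uint8_t kFlagInvalid = 1u << 0;
constexpr std::uint8_t kFlagOverflow = 1u << 1;

struct FlaggedResult {
  float value;
  std::uint8_t flags;
};

// Adds two floats and reports a simplified subset of the IEEE flags as a
// value, instead of updating hidden global state.
inline FlaggedResult faddReturningFlags(float lhs, float rhs) {
  const float sum = lhs + rhs;
  std::uint8_t flags = 0;
  if (std::isnan(sum) && !std::isnan(lhs) && !std::isnan(rhs))
    flags |= kFlagInvalid;  // e.g. (+inf) + (-inf)
  if (std::isinf(sum) && std::isfinite(lhs) && std::isfinite(rhs))
    flags |= kFlagOverflow;  // finite inputs, infinite result
  return {sum, flags};
}

// Sums an array and ORs each operation's flags into one cumulative integer.
inline std::uint8_t sumCollectingFlags(const float *xs, int n, float &total) {
  std::uint8_t cumulative = 0;
  total = 0.0f;
  for (int i = 0; i < n; ++i) {
    const FlaggedResult r = faddReturningFlags(total, xs[i]);
    total = r.value;
    cumulative |= r.flags;
  }
  return cumulative;
}

// Example use: the second large addend overflows the running total.
inline void exampleCumulativeFlags() {
  const float xs[] = {1.0f, 3.0e38f, 3.0e38f};
  float total = 0.0f;
  const std::uint8_t flags = sumCollectingFlags(xs, 3, total);
  assert(flags == kFlagOverflow);
  assert(std::isinf(total));
}
```

Because OR is commutative and associative, the cumulative value does not depend on the order in which the per-operation flags are combined, which matches the "no ordering restrictions" remark.
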
> >
> > – Steve
> >
> >> On Feb 5, 2016, at 4:25 PM, Hal Finkel via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> ----- Original Message -----
> >>> From: "Chandler Carruth" <chandlerc at gmail.com>
> >>> To: "Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <chandlerc at gmail.com>
> >>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> >>> Sent: Friday, February 5, 2016 4:36:54 PM
> >>> Subject: Re: [llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
> >>>
> >>> On Fri, Feb 5, 2016 at 2:10 PM Hal Finkel via llvm-dev
> >>> <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>>
> >>> Hi Chandler,
> >>>
> >>> This scheme has significant advantages over what was being pursued,
> >>> but one question (or two)...
> >>>
> >>> Under the proposed system, how would you represent the necessary
> >>> dependency edges between the fp intrinsics and function calls? How
> >>> is the state 'returned' to the caller? [I was thinking that our new
> >>> operand bundles could help for the inputs, but the outputs? Plus
> >>> what about the live-in state?]
> >>>
> >>> This is important because any external subroutine call could
> >>> (potentially) change the rounding mode or any other part of the
> >>> floating-point environment.
> >>>
> >>>
> >>>
> >>> So, one thing that was missing in my original email, and that talking
> >>> with Steve Canon offline clarified, was that we need a way to
> >>> directly query the current modes for systems where those can be set
> >>> externally.
> >>>
> >>>
> >>> My suggestion was to have an intrinsic that "loads" this state. This
> >>> could then be used to load whatever the current state is, and pass
> >>> that to the floating point intrinsics proposed in order to pick up
> >>> whatever the "current" state happens to be on systems where this is
> >>> truly a background stateful thing, while still allowing us to model
> >>> operation-specific state for other systems. Naturally, there should
> >>> be a complementary "store" of the state as well.
> >>>
> >>>
> >>> Then, for code which really needs this degree of faithful FP
> >>> environment handling, you would expect the #pragma to be present
> >>> enabling that mode. While that pragma is in place, all floating
> >>> point operations would be lowered using these intrinsics, and
> >>> external function calls could be guarded by storing and reloading
> >>> this state at the IR level. This would make the IR substantially
> >>> more verbose when the pragma is enabled, but that seems like an
> >>> acceptable tradeoff given that we expect this code to be rare (see
> >>> my preconditions section). And naturally, on any system that
> >>> actually manages FP environment in a state "register" or whatever,
> >>> we'd want to do some work to try to optimize away state changes.
> >>> Much like we have attributes that can be inferred about access to
> >>> memory, we could infer attributes on functions about whether they
> >>> change the FP environment state, and if not, propagate across the
> >>> function call boundaries.
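
As a rough C++ analogy for guarding an external call with a "store" before it and a "load" after it: the sketch below uses the standard <cfenv> functions std::fesetround and std::fegetround in place of the proposed intrinsics, and tracks the rounding mode as an explicit value the way the SSA model would. The helper names are hypothetical, and FE_TOWARDZERO is assumed to be defined on the target (it is on common platforms, but the standard makes it optional).

```cpp
#include <cassert>
#include <cfenv>

using ExternalFn = void (*)();

// Calls an opaque function while keeping the explicitly tracked rounding
// mode in sync with the real environment. Returns the mode that operations
// after the call must use.
inline int callGuardedByStoreAndLoad(int currentMode, ExternalFn callee) {
  std::fesetround(currentMode);  // "store": the callee observes our mode
  callee();                      // may change the environment behind our back
  return std::fegetround();      // "load": pick up whatever is current now
}

// Stand-ins for external subroutines.
inline void leavesModeAlone() {}
inline void switchesToTowardZero() { std::fesetround(FE_TOWARDZERO); }

// Example use: the tracked mode follows what the callees actually did.
inline void exampleGuardedCalls() {
  const int original = std::fegetround();
  int mode = FE_TONEAREST;
  mode = callGuardedByStoreAndLoad(mode, leavesModeAlone);
  assert(mode == FE_TONEAREST);
  mode = callGuardedByStoreAndLoad(mode, switchesToTowardZero);
  assert(mode == FE_TOWARDZERO);
  std::fesetround(original);  // leave the process environment as we found it
}
```

A function known not to touch the FP environment (the inferred attribute mentioned above) would let both the store and the load around its call be dropped.
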
> >>>
> >>>
> >>> But even though this would be some amount of work to optimize, the
> >>> nice thing (IMO) is that it would be localized. We would have
> >>> specific code that dealt with optimizing the FP environment
> >>> concerns, while the rest of LLVM could remain oblivious and rely on
> >>> simple common constructs to provide conservatively correct behavior.
> >>>
> >>> What do you think?
> >>
> >> SGTM.
> >>
> >> -Hal
> >>
> >>> -Chandler
> >>>
> >>>
> >>>
> >>>
> >>> Thanks again,
> >>> Hal
> >>>
> >>> ----- Original Message -----
> >>>> From: "Chandler Carruth" <chandlerc at gmail.com>
> >>>> To: "Mehdi Amini" <mehdi.amini at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org>
> >>>> Cc: "Steve (Numerics) Canon" <scanon at apple.com>, "Sergey Dmitrouk" <sdmitrouk at accesssoftek.com>,
> >>>> "David Majnemer" <david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
> >>>> Sent: Thursday, February 4, 2016 8:05:38 PM
> >>>> Subject: Re: [RFC] FP Environment and Rounding mode handling in LLVM
> >>>>
> >>>>
> >>>> First, thanks Mehdi for putting something on llvm-dev and getting
> >>>> wider awareness of this.
> >>>>
> >>>>
> >>>> I am actually really interested in finding a way for LLVM to support
> >>>> the interesting functionality we are missing from fenv-like
> >>>> interfaces: things like rounding modes, exceptions, etc. However, I
> >>>> think the current design is going to be a really high burden for the
> >>>> entire optimizer, and I think there is a simpler model that we might
> >>>> pursue instead.
> >>>>
> >>>>
> >>>> I'll start off with some underlying principles that I'm operating
> >>>> from:
> >>>> a) Most code in the world will be very happy with the default
> >>>> floating point environment and doesn't need to carefully model
> >>>> floating point exceptions, etc. Essentially, I think that LLVM's
> >>>> behavior today is probably right for most code. Now, the code which
> >>>> needs support for the other features of floating point isn't bad or
> >>>> unimportant! But it is, relatively speaking, rare, and so I think it
> >>>> is reasonable to optimize the *representation* model for the common
> >>>> case, provided we don't lose support for functionality.
> >>>>
> >>>>
> >>>> b) When outside the default floating point environment's rules, there
> >>>> are few if any optimizations that we realistically expect from LLVM.
> >>>> Certainly, any changes to the LLVM optimizer which impact code
> >>>> outside the default need to be done *much* more carefully to avoid
> >>>> introducing subtle bugs.
> >>>>
> >>>>
> >>>> OK, based on that, consider the following model:
> >>>> We provide intrinsics that mirror the instructions 'fadd', 'fsub',
> >>>> 'fmul', 'fdiv', and 'frem' (so 5 total). From here on out, I'll
> >>>> exclusively use 'fadd' as my example. The intrinsics would look
> >>>> like:
> >>>>
> >>>> declare {float, i1} @llvm.fadd.with.environment.f32(float %lhs,
> >>>>     float %rhs, i8 %rounding_mode, i8 %exception_behavior)
> >>>>
> >>>>
> >>>> Then we define specific values to be used for the IEEE rounding
> >>>> modes. And we define values to control exception behavior. I'm not
> >>>> an expert on floating point exceptions in particular (my platforms
> >>>> don't use them), but I'm imagining three states: "ignore", "return",
> >>>> and "trap". I've used a single 'i1' in the result, but I'm assuming
> >>>> it would need to be several i1s or an iN in order to model the set
> >>>> of FP exceptions. I'm using i1 here just to simplify the
> >>>> explanation; I think it generalizes, and I'll let the experts
> >>>> suggest the exact formulation.
> >>>>
> >>>>
> >>>> If the default rounding mode is provided to these intrinsics and the
> >>>> "ignore" exception behavior is provided, they behave exactly as the
> >>>> existing instructions do, and instcombine should canonicalize to the
> >>>> existing instructions.
> >>>>
> >>>>
> >>>> The semantics of non-default rounding modes are to perform the
> >>>> operation with that rounding mode.
> >>>>
> >>>>
> >>>> If "return" is provided for the exception behavior, then the i1
> >>>> component of the result is true if an FP exception occurred and false
> >>>> otherwise. If "ignore" is provided, then any FP exceptions are
> >>>> ignored and the i1 is always false. If "trap" is provided, then the
> >>>> i1 is always false, but the call to the intrinsic might trap. We
> >>>> could either define a trap as precisely the same as a call to
> >>>> @llvm.trap(), or we could introduce an @llvm.fp.trap() and define it
> >>>> as a call to that.
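
The three exception behaviors described above can be modeled in C++ along these lines. This is a sketch under assumptions, not the intrinsic's definition: the enum and function names are hypothetical, "an FP exception occurred" is approximated by detecting only overflow and invalid from the result, and a trap is modeled as a thrown C++ exception purely so that the example can be exercised.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>
#include <utility>

enum class ExceptionBehavior : std::uint8_t { Ignore, Return, Trap };

// Simplified stand-in for "an FP exception occurred": only overflow and
// invalid are detected, by inspecting the inputs and the result.
inline bool raisedException(float lhs, float rhs, float sum) {
  const bool invalid = std::isnan(sum) && !std::isnan(lhs) && !std::isnan(rhs);
  const bool overflow =
      std::isinf(sum) && std::isfinite(lhs) && std::isfinite(rhs);
  return invalid || overflow;
}

// Mirrors the {value, i1} result described in the text.
inline std::pair<float, bool> faddWithExceptionBehavior(
    float lhs, float rhs, ExceptionBehavior behavior) {
  const float sum = lhs + rhs;
  const bool raised = raisedException(lhs, rhs, sum);
  switch (behavior) {
  case ExceptionBehavior::Ignore:
    return {sum, false};  // exceptions are ignored; the flag is always false
  case ExceptionBehavior::Return:
    return {sum, raised};  // the flag reports whether an exception occurred
  case ExceptionBehavior::Trap:
    if (raised)
      throw std::runtime_error("floating point trap");  // models the trap
    return {sum, false};
  }
  return {sum, false};  // not reached for valid enumerators
}

// Example use: the same overflowing add under "return" and "trap".
inline void exampleExceptionBehaviors() {
  assert(faddWithExceptionBehavior(3.0e38f, 3.0e38f,
                                   ExceptionBehavior::Return).second);
  bool trapped = false;
  try {
    faddWithExceptionBehavior(3.0e38f, 3.0e38f, ExceptionBehavior::Trap);
  } catch (const std::runtime_error &) {
    trapped = true;
  }
  assert(trapped);
}
```
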
> >>>>
> >>>>
> >>>> The frontend would then be responsible for lowering floating point
> >>>> arithmetic using these intrinsics. This may be somewhat challenging
> >>>> because, in the frontend, behavior is controlled dynamically in some
> >>>> languages. In those situations, we can either allow these intrinsics
> >>>> to accept non-constant arguments for %rounding_mode and
> >>>> %exception_behavior so that frontends can emit code that just
> >>>> dynamically computes them, or we could follow the same model that
> >>>> atomics use, and if the frontend cannot trivially compute a
> >>>> constant, it can emit a switch over the possible states with a
> >>>> specific intrinsic call in each case. I don't have strong opinions
> >>>> about which would be best; I think either could be made to work.
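
The second option, a switch over the possible states with a constant-mode call in each case, might look like this in C++. It is a sketch under assumptions: the names are hypothetical, a template parameter plays the part of the required constant argument, and rounding to an integral value stands in for the real FP operation so that each mode's effect is visible.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class RoundingMode : std::uint8_t { NearestEven, TowardZero, Upward, Downward };

// Stand-in for an intrinsic whose rounding mode must be a compile-time
// constant: the mode is a template argument, not a run-time value.
template <RoundingMode Mode>
double roundToIntegralConstantMode(double x) {
  switch (Mode) {
  case RoundingMode::NearestEven:
    return x - std::remainder(x, 1.0);  // IEEE remainder: ties go to even
  case RoundingMode::TowardZero:
    return std::trunc(x);
  case RoundingMode::Upward:
    return std::ceil(x);
  case RoundingMode::Downward:
    return std::floor(x);
  }
  return x;  // not reached for valid enumerators
}

// What a frontend could emit when the mode is only known at run time: one
// case per state, each calling the constant-mode form.
inline double roundToIntegralDynamicMode(double x, RoundingMode mode) {
  switch (mode) {
  case RoundingMode::NearestEven:
    return roundToIntegralConstantMode<RoundingMode::NearestEven>(x);
  case RoundingMode::TowardZero:
    return roundToIntegralConstantMode<RoundingMode::TowardZero>(x);
  case RoundingMode::Upward:
    return roundToIntegralConstantMode<RoundingMode::Upward>(x);
  case RoundingMode::Downward:
    return roundToIntegralConstantMode<RoundingMode::Downward>(x);
  }
  return x;  // not reached for valid enumerators
}

// Example use: the same input under two run-time-selected modes.
inline void exampleDynamicDispatch() {
  assert(roundToIntegralDynamicMode(2.5, RoundingMode::NearestEven) == 2.0);
  assert(roundToIntegralDynamicMode(2.5, RoundingMode::Upward) == 3.0);
}
```
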
> >>>>
> >>>>
> >>>> If we go with constant arguments being required, we could use
> >>>> "metadata arguments", which aren't actually metadata but just encoded
> >>>> arguments for intrinsics.
> >>>>
> >>>>
> >>>> When emitting constants and trying to respect floating point
> >>>> environment settings, frontends will have to emit runtime calls
> >>>> instead of actual constants. But this actually seems good, because
> >>>> that is what we'll need anyway -- we aren't able to emulate all the
> >>>> environment options with full generality, if I understand things
> >>>> correctly (and let me know if I've misunderstood).
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> The two really big reasons why I like this model much more than
> >>>> extending flags are:
> >>>>
> >>>>
> >>>> 1) This avoids implicit state. The implicit state of the floating
> >>>> point environment makes things like code motion extremely hard to
> >>>> reason about. I think we will just get it wrong too often to make
> >>>> this a good approach. By modeling all of this as actual SSA values, I
> >>>> think there is a much better chance we'll get this stuff right, for
> >>>> example by or-ing all the i1s for floating point exceptions and
> >>>> testing the result to implement fetestexcept. Then the backend can
> >>>> spill the state when necessary and reload it when needed, even if
> >>>> other floating point math is introduced. I admit that first class
> >>>> aggregate returns aren't a beautiful way to encapsulate this, but
> >>>> they are an *effective* way that we know how to work with in the
> >>>> LLVM IR. If we ever come up with a better multi-def model, we can
> >>>> always switch these and all the other intrinsics which need this to
> >>>> that model.
> >>>>
> >>>>
> >>>> 2) Every pass will conservatively correctly model the operations.
> >>>> This is most significant when modeling trapping on exceptions. We
> >>>> need every pass to realize that control flow might not proceed past
> >>>> such operations. We already have this logic for calls, and it seems
> >>>> a really nice fit for allowing most of the optimizer to be unaware
> >>>> of these constructs while respecting them and preserving behavior in
> >>>> the face of them.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> I suspect that there are things this model doesn't handle that I've
> >>>> not thought of (as this is outside the area of FP that I'm deeply
> >>>> familiar with), but I really think this model would be easier to
> >>>> reason about and would be much less invasive within the IR and
> >>>> optimizer. I wonder if folks think this could work and would be up
> >>>> for moving their efforts in this direction?
> >>>>
> >>>>
> >>>> -Chandler
> >>>>
> >>>>
> >>>> On Wed, Feb 3, 2016 at 3:04 PM Mehdi Amini <mehdi.amini at apple.com>
> >>>> wrote:
> >>>>
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> Sergey (CC’ed) worked on a series of patches to add support for
> >>>> floating-point environment and floating-point rounding modes in LLVM.
> >>>> This started *in 2014*, and the patches, after multiple rounds of
> >>>> review in the last months (involving, among others, Steve Canon, Hal
> >>>> Finkel, David Majnemer, and myself), are getting very close (IMO) to
> >>>> a state where we can land them.
> >>>>
> >>>> This is the thread that started this development: "[LLVMdev] More
> >>>> careful treatment of floating point exceptions"
> >>>> http://marc.info/?l=llvm-dev&m=141113983302113&w=2
> >>>> And this is the thread where most of the discussion on the design
> >>>> occurred: "[PATCH] Flag to enable IEEE-754 friendly FP optimizations"
> >>>> http://marc.info/?l=llvm-commits&m=141235814915999&w=2
> >>>>
> >>>> Since Chandler raised some concerns on IRC today, I figured I
> >>>> should send a heads-up on this topic to allow anyone to comment on
> >>>> the current plan.
> >>>>
> >>>> We plan on adding two new FP env flags to the existing FMF (fast-math
> >>>> flags). Without these flags set, the optimizer has to assume that
> >>>> the FP env can be observed, or the rounding mode can be changed. For
> >>>> clang, these flags would be set unless a command line option
> >>>> requires preserving the FP env.
> >>>>
> >>>> Here is the list of patches:
> >>>>
> >>>> [FPEnv Core 01/14] Add flags and command-line switches for FPEnv:
> >>>> http://reviews.llvm.org/D14066
> >>>> [FPEnv Core 02/14] Add FPEnv access flags to fast-math flags:
> >>>> http://reviews.llvm.org/D14067
> >>>> [FPEnv Core 03/14] Make SelectionDAG aware of FPEnv flags:
> >>>> http://reviews.llvm.org/D14068
> >>>> [FPEnv Core 04/14] Skip constant folding to preserve FPEnv:
> >>>> http://reviews.llvm.org/D14069
> >>>> [FPEnv Core 05/14] Teach IR builder and folders about new flags:
> >>>> http://reviews.llvm.org/D14070
> >>>> [FPEnv Core 06/14] Do not fold constants on reading in IR
> >>>> asm/bitcode: http://reviews.llvm.org/D14071
> >>>> [FPEnv Core 07/14] Prevent undesired folding by InstSimplify:
> >>>> http://reviews.llvm.org/D14072
> >>>> [FPEnv Core 08/14] Do not simplify expressions with FPEnv access:
> >>>> http://reviews.llvm.org/D14073
> >>>> [FPEnv Core 09/14] Make Strict flag available for more clients:
> >>>> http://reviews.llvm.org/D14074
> >>>> [FPEnv Core 10/14] Use Strict in IRBuilder:
> >>>> http://reviews.llvm.org/D14075
> >>>> [FPEnv Core 11/14] Don't convert fpops to constexprs in SCCP:
> >>>> http://reviews.llvm.org/D14076
> >>>> [FPEnv Core 13/14] Don't hoist FP-ops with side-effects in LICM:
> >>>> http://reviews.llvm.org/D14078
> >>>> [FPEnv Core 14/14] Introduce F*_W_CHAIN instrs to prevent
> >>>> reordering:
> >>>> http://reviews.llvm.org/D14079
> >>>>
> >>>>
> >>>> —
> >>>> Mehdi
> >>>>
> >>>>
> >>>
> >>> --
> >>> Hal Finkel
> >>> Assistant Computational Scientist
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev