[llvm-dev] Update on strict FP status

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Wed May 23 09:19:55 PDT 2018


On 05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:
> Hi Ulrich,
>
> I am interested in knowing if the current proposals also take into
> account the FP_CONTRACT pragma

We should already do this (during IR generation, we turn the relevant
operations into @llvm.fmuladd.* intrinsic calls when FP_CONTRACT is set to on).
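
To illustrate why contraction is observable at all, here is a minimal C++
sketch comparing a separately rounded multiply-add with a fused one. The
function names are made up for this example and are not taken from clang or
LLVM; the volatile temporary is only there to keep the compiler from
contracting the first version on its own.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round, then add and round again.
// The volatile temporary forces the product to be rounded to double
// before the addition, so the two operations cannot be fused.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Hypothetical helper: a * b + c computed with a single rounding,
// which is the result a contracted (fused) multiply-add produces.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27 and c = -(1 + 2^-26), the exact product a*a is
1 + 2^-26 + 2^-54. The unfused version loses the last term when the product
is rounded and returns 0, while the fused version returns 2^-54.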

> and the ability to implement options that imply a specific value for
> the FLT_EVAL_METHOD macro.

What do you mean by this?

 -Hal

>
> Additionally, I am not aware of the IR being able to represent the
> potentially deferred loss of precision that the C language semantics
> provide; in particular, applying such semantics to the existing IR
> would hit an issue that the limits of such deferment would need an
> agreed representation.
>
> As for the mixing of strict and non-strict modes, I would be
> interested in where LLVM is in its handling of non-SSA
> (pseudo-memory?) dependencies. I have a vague impression that it is
> very coarse-grained in that respect, but I admit to not being
> particularly informed in that space. If there is a good model for such
> dependencies, then I think it could be used to handle the
> strict/non-strict mixing.
>
> -- Hubert Tong, IBM
>
> PS A nitpick on wording: The idea of being inside or outside of
> FENV_ACCESS regions is instead expressed in terms of the state of
> the FENV_ACCESS pragma within the C Standard.
>
> On Wed, May 23, 2018 at 10:48 AM, Ulrich Weigand via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     Hello,
>
>     at the recent EuroLLVM developer meeting in Bristol I held a BoF
>     session on the topic "Towards implementing #pragma STDC FENV_ACCESS".
>     I've also had a number of follow-on discussions both on-site in
>     Bristol and online since. This post is intended as a summary of
>     my current understanding of the requirements and implementation
>     details covering the overall topic.
>
>     I'm posting this here in the hope this can serve as a basis for
>     the various more detailed discussions that are still ongoing
>     (e.g. in various Phabricator proposals right now). Any comments
>     are welcome!
>
>
>     Semantics of #pragma STDC FENV_ACCESS
>     =====================================
>
>     To provide a baseline for the implementation discussion, first an
>     overview of the features required to handle the strict floating-point
>     mode defined by the C and IEEE standards:
>
>     1. Floating-point rounding modes
>     2. Default floating-point exception handling
>     3. Trapping floating-point exception handling
>
>     Each of these separate features imposes different constraints on the
>     optimizations that LLVM may perform involving FP expressions:
>
>     1. Floating-point rounding modes
>
>     Outside of FENV_ACCESS regions, all FP operations are supposed to be
>     performed in the "default" rounding mode.
>
>     But inside FENV_ACCESS regions, FP operations implicitly depend on
>     a "current" rounding mode setting, which may be changed by certain
>     C library calls (plus some platform-specific intrinsics). In addition,
>     those calls may be performed within subroutines (as long as those are
>     also within FENV_ACCESS), so *any* function call within a FENV_ACCESS
>     region must be considered as potentially changing the rounding mode.
>
>     In effect, this means the compiler may not move or combine FP
>     operations across function call sites.
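
As a hedged illustration of the rounding-mode dependency described above,
the same division gives different results depending on a mode that is set
by a library call. This is a sketch with a made-up helper name, not code
from any of the proposals. A conforming program would also need
#pragma STDC FENV_ACCESS ON; it is left as a comment here because not every
compiler accepts it, and volatile is used instead to keep the operation
between the two calls.

```cpp
#include <cfenv>

// #pragma STDC FENV_ACCESS ON   // required in conforming code; see note above

// Hypothetical helper: divide a by b under the given rounding mode,
// then restore whatever mode was active before.
double divide_with_rounding(double a, double b, int mode) {
    volatile double x = a, y = b;      // volatile: no constant folding
    const int saved = std::fegetround();
    std::fesetround(mode);             // a call that changes the rounding mode
    volatile double q = x / y;         // must stay between the two calls
    std::fesetround(saved);
    return q;
}
```

divide_with_rounding(1.0, 3.0, FE_DOWNWARD) is strictly smaller than
divide_with_rounding(1.0, 3.0, FE_UPWARD), so moving the division above the
first fesetround call would change the result.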
>
>     2. Default floating-point exception handling
>
>     Inside FENV_ACCESS regions, every floating-point operation that
>     causes an exception must be considered to set a "status flag"
>     associated with this exception type. Those flags can be queried
>     using C library calls (plus some platform-specific intrinsics),
>     and there are other such calls to explicitly set or clear those
>     flags as well. As with the rounding modes, those calls may be
>     performed in subroutines as well, so any function call within a
>     FENV_ACCESS region must be considered as potentially *using* and
>     changing the floating-point exception status flags.
>
>     The values of the status flags on entry to a FENV_ACCESS region are
>     to be considered undefined according to the C standard.
>
>     Compiler optimizations are supposed to preserve the values of
>     all exception status bits at any point where they can be
>     (potentially) inspected by the program, i.e. at all call sites
>     within FENV_ACCESS regions. This still allows a number of
>     optimizations, e.g. to reorder FP operations or combine two
>     identical operations within a region uninterrupted by calls.
>     But other optimizations should be avoided, e.g. optimizing
>     away an unused FP operation may result in an exception flag
>     now being unset that would otherwise have been set. The same
>     applies to floating-point constant folding.
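
The point about unused operations can be sketched as follows: the quotient
below is never used, yet the division still has to execute because the
status flags are inspected afterwards. The helper name is hypothetical, and
volatile again stands in for the FENV_ACCESS support that the compiler may
not have.

```cpp
#include <cfenv>

// #pragma STDC FENV_ACCESS ON   // required in conforming code

// Hypothetical helper: returns true if evaluating a / b raises any of the
// exception flags in `excepts`. Note that it clears all flags first.
bool division_raises(double a, double b, int excepts) {
    volatile double x = a, y = b;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double q = x / y;   // result unused, but the flags it sets are not
    (void)q;
    return std::fetestexcept(excepts) != 0;
}
```

For example, division_raises(1.0, 0.0, FE_DIVBYZERO) and
division_raises(1.0, 3.0, FE_INEXACT) are both true, while
division_raises(1.0, 2.0, FE_INEXACT) is false because 0.5 is exact. Deleting
the "dead" division, or folding it at compile time, would make all three
return false.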
>
>     3. Trapping floating-point exception handling
>
>     Within a FENV_ACCESS region, library calls may be used to switch
>     exception handling semantics to a "trapping" mode by setting
>     corresponding mask bits. Any subsequent FP instruction that
>     raises an exception with the associated mask bit set will cause
>     a trap. Usually, this will be a hardware trap that is translated
>     by the operating system into some form of software exception that
>     can be handled by the application; on Linux systems this takes the
>     form of a SIGFPE signal.
>
>     As above, those mask bits can be set and reset via (operating-
>     system specific) library calls and/or platform-specific intrinsics,
>     all of which may also be done within subroutine calls.
>
>     In effect, this requires the compiler to treat any floating-point
>     operation within a FENV_ACCESS region as potentially trapping,
>     which means the same restrictions apply as with e.g. memory accesses
>     (cannot be speculated etc.). However, according to the C standard,
>     the implementation is not required to preserve the *number* of
>     different traps, so identical operations may still be combined
>     (unless there is an intervening function call).
>
>     The C standard requires all user code to explicitly switch back
>     to non-trapping mode for all exceptions whenever leaving a
>     FENV_ACCESS region (both by "falling off the end" of the region
>     and by calling a subroutine defined outside of FENV_ACCESS).
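
Standard C has no portable call to enable traps, but it does provide
feholdexcept, which saves the environment and installs non-stop
(non-trapping) mode for all exceptions. A minimal sketch of wrapping a call
to code outside a FENV_ACCESS region this way, assuming hypothetical names
call_without_traps and reciprocal:

```cpp
#include <cfenv>

// #pragma STDC FENV_ACCESS ON   // required in conforming code

// Hypothetical wrapper: run fn(arg) with all exceptions in non-stop mode,
// then put the caller's floating-point environment back.
double call_without_traps(double (*fn)(double), double arg) {
    std::fenv_t saved;
    if (std::feholdexcept(&saved) != 0) {
        return fn(arg);          // could not save the environment; call as-is
    }
    volatile double result = fn(arg);
    std::fesetenv(&saved);       // restores the saved environment, flags included
    return result;
}

// Example callee standing in for code compiled without FENV_ACCESS.
double reciprocal(double x) {
    volatile double vx = x;
    return 1.0 / vx;
}
```

Because fesetenv restores the saved environment wholesale, any flags raised
inside the callee are discarded; using feupdateenv instead would re-raise
them in the caller's environment.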
>
>
>     Implementation requirements on parts of the compiler
>     ====================================================
>
>     A. clang front end
>
>     The front end needs to determine which instructions are part of
>     FENV_ACCESS regions and which are not. This takes into account
>     both the semantics of the #pragma as defined by the standard,
>     and the implementation-defined default rules that apply to code
>     outside of any #pragma. GCC currently has the following two
>     related command-line options:
>
>     -frounding-math: Do not assume default rounding mode
>     -ftrapping-math: Assume FP operations may trap
>
>     clang accepts but (basically) ignores those options. As a first
>     step, it might make sense to have the FENV_ACCESS default
>     behavior triggered by these options, even while the front end
>     does not yet support the actual #pragma.
>
>     The front end then needs to transmit the information about
>     FENV_ACCESS regions to later passes. However, I believe that
>     we do not actually have to implement "regions" as such at the
>     IR level. Instead, it would be sufficient to track the following
>     information:
>
>     - For each FP operation, whether it is within a FENV_ACCESS region.
>     - For each call site, whether it is within a FENV_ACCESS region.
>
>     The former requires new IR support; the approach currently under
>     investigation uses the experimental "constrained FP" intrinsics
>     instead of traditional floating-point operations for this. The
>     latter can be done simply by annotating those call sites with an
>     attribute.
>
>     In addition to that, the front-end itself needs to disable any
>     early optimizations that do not preserve strict FP semantics,
>     in particular it must not speculate FP operations if they may
>     trap. (Currently, the front end transforms "? :" on floating-
>     point types into a select IR statement; for trapping FP
>     operations, an explicit branch must be used instead.)
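
The difference between a select and an explicit branch can be sketched at
the source level. In the first function below both arms are evaluated
eagerly, which is what speculation amounts to; in the second the division is
guarded. All names are hypothetical, and volatile is used only to pin the
operations in place on compilers without FENV_ACCESS support.

```cpp
#include <cfenv>

// #pragma STDC FENV_ACCESS ON   // required in conforming code

// Select-style: 1/x is computed unconditionally, then one value is picked.
double reciprocal_or_zero_select(double x) {
    volatile double vx = x;
    volatile double recip = 1.0 / vx;   // executes even when x == 0
    return (x != 0.0) ? recip : 0.0;
}

// Branch-style: 1/x is computed only when the condition holds.
double reciprocal_or_zero_branch(double x) {
    volatile double vx = x;
    if (x != 0.0) {
        volatile double recip = 1.0 / vx;
        return recip;
    }
    return 0.0;
}

// Reports whether calling f(x) leaves the divide-by-zero flag set.
bool raises_divbyzero(double (*f)(double), double x) {
    std::feclearexcept(FE_ALL_EXCEPT);
    (void)f(x);
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Both functions return the same value for every input, but for x == 0 only
the select-style version sets the divide-by-zero flag (and would trap if
that exception were unmasked).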
>
>
>     B. LLVM IR and LLVM common optimizations
>
>     As mentioned in the previous section, we need some IR to annotate
>     FP instructions and call sites within FENV_ACCESS regions. All
>     common optimizations then need to respect the strict FP semantics
>     associated with those regions.
>
>     The current approach uses experimental intrinsics. This has the
>     advantage that most optimizations never trigger since they don't
>     even recognize those new intrinsics. Also, the intrinsics can
>     be marked as having side-effects and/or being non-speculatable.
>
>     The overall effect is that more optimizations are suppressed
>     than would be strictly necessary. But this may still be a good
>     first step, since the result is now safe but maybe not optimal
>     -- which can be improved upon over time by teaching the specific
>     semantics of those intrinsics to optimization passes.
>
>     However, some open questions remain. If at some point we want
>     to model the constrained FP semantics more precisely than just
>     as "unmodeled side effects", this may have to be reflected at
>     the IR level directly. For example, to model rounding mode
>     behavior, at some point we might require explicit tracking of
>     data dependencies on the rounding mode by representing the
>     rounding mode as SSA values defined by function calls and used
>     by FP intrinsics. Similarly, to track exception status flags,
>     they might be modeled as SSA values set by FP intrinsics and
>     used by function calls.
>
>     (There is a possibly related question of how to optimally model
>     the property of many math library routines that they may access
>     the "errno" variable but no other memory ... It might also be
>     possible to model e.g. exception status as a thread-local "memory"
>     location that is modified by FP operations, just like errno.)
>
>     Another currently unresolved issue is that at the moment nothing
>     prevents *standard* floating-point operations from being moved
>     *inside* FENV_ACCESS regions. This may also be invalid, since
>     those operations now may cause unexpected traps etc. (More
>     specifically, what is invalid is moving any standard FP operation
>     across a *call site* within a FENV_ACCESS region.) Note that
>     this is an issue even if we only support changing the default
>     (and no actual #pragma), since multiple object files using
>     different default settings may be linked together using LTO.
>
>     This last issue could in theory be solved by having all optimization
>     passes respect the requirement that floating-point operations may
>     not be moved across call sites marked with the strict FP attribute.
>     But that does not appear to be straightforward since it would
>     introduce a "new" type of dependency that would have to be added
>     throughout LLVM code. If this must be avoided, we'd have to
>     find a way to explicitly track dependencies at the IR level. In
>     the extreme, this could end up equivalent to just always using
>     the constrained intrinsics for everything ...
>
>
>     C. Code generation
>
>     In the back end, effects of strict FP mode have to be passed through
>     to lower-level representations including SelectionDAG and MI.
>
>     Currently, the "unmodeled side effect" logic of the constrained
>     intrinsics is modeled by putting them on the chain during
>     SelectionDAG.
>     (If we ever model semantics more precisely at the IR level, that
>     would need to be reflected on SelectionDAG accordingly.)
>
>     At the MI level, there is no representation at all. One option to
>     fix this would be to model target-specific registers that implement
>     the IEEE semantics. Most platforms have registers (or parts of
>     registers) that hold:
>     - the current rounding mode
>     - the exception status flags
>     - the exception masks (which enable traps)
>     Marking FP instructions as using and/or defining these registers
>     would enforce ordering requirements. It may be too strict in some
>     cases (e.g. two instructions setting exception status flags may
>     still be reordered). On the other hand, I believe that if
>     instructions may actually *trap*, we need the hasSideEffects flag
>     even if register dependencies are modeled.
>
>     If we do need hasSideEffects, there is a separate discussion on
>     whether this can be implemented without each back end having to
>     duplicate all FP instruction patterns (one with hasSideEffects
>     and one without), e.g. by having a new feature that allows
>     describing the side-effect status using an MI operand.
>
>
>     Next steps
>     ==========
>
>     I believe it is important to break up the full amount of work
>     into incremental steps that provide some useful benefits on their
>     own. At first, we should be able to get to a state where clang
>     can be used to build programs that use some (maybe not all) strict
>     FP features, where the generated code is always correct but may
>     not always be optimal. To get there, I think we need at a
>     minimum:
>
>     - Implement clang support for the default flags, e.g. GCC's
>     -frounding-math and -ftrapping-math, and always generate
>     the constrained intrinsics. clang should also mark all
>     call sites then (as mentioned above).
>
>     - For now, add the requirement that LTO is not supported if
>     this would cause mixing of strict and non-strict FP code.
>     In the alternative, have the LTO pass automatically transform
>     any floating-point operation into a constrained intrinsic
>     if *any* (other) module already uses the latter.
>
>     - At the IR level, complete the set of supported constrained
>     FP intrinsics (there are still some missing, see e.g.
>     https://reviews.llvm.org/D43515).
>     Also, it seems not all variants (e.g. for vector types) are
>     supported correctly through codegen (see e.g.
>     https://reviews.llvm.org/D46967).
>
>     - Allow targets to correctly reflect constrained intrinsics
>     semantics at the MI level and final machine code generation
>     (see e.g. https://reviews.llvm.org/D45576).
>
>     - Review all optimization and codegen passes to verify they
>     fully respect strict FP semantics.
>
>     Once this is done, we can improve on the solution by:
>
>     - Supporting mixing strict and non-strict FP operations
>     (would lift the LTO restriction). (Note: there still seems
>     to be some "invention required" here, see above.)
>
>     - Actually implementing the #pragma supporting different
>     regions within a compilation unit (prereq: support for
>     mixing strict and non-strict FP operations).
>
>     - Add more optimization of constrained FP intrinsics in
>     common optimizers and/or target back ends.
>
>     Does this look reasonable? Please let me know if there's
>     anything I overlooked, or you have any additional comments
>     or questions.
>
>
>
>     Mit freundlichen Gruessen / Best Regards
>
>     Ulrich Weigand
>
>     -- 
>     Dr. Ulrich Weigand | Phone: +49-7031/16-3727
>     STSM, GNU/Linux compilers and toolchain
>     IBM Deutschland Research & Development GmbH
>     Vorsitzende des Aufsichtsrats: Martina Koederitz |
>     Geschäftsführung: Dirk Wittkopp
>     Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
>     Stuttgart, HRB 243294
>
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
