[llvm-dev] Update on strict FP status

Wed May 23 09:06:26 PDT 2018

Hi Ulrich,

I am interested in knowing if the current proposals also take into account
the FP_CONTRACT pragma and the ability to implement options that imply a
specific value for the FLT_EVAL_METHOD macro.

Additionally, I am not aware of the IR being able to represent the
potentially deferred loss of precision that the C language semantics
provide; in particular, applying such semantics to the existing IR would
hit an issue that the limits of such deferment would need an agreed
representation.

As for the mixing of strict and non-strict modes, I would be interested in
where LLVM is in its handling of non-SSA (pseudo-memory?) dependencies. I
have a vague impression that it is very coarse-grained in that respect, but
I admit to not being particularly informed in that space. If there is a
good model for such dependencies, then I think it could be used to handle
the strict/non-strict mixing.

-- Hubert Tong, IBM

PS A nitpick on wording: Within the C Standard, the idea of being inside or
outside of FENV_ACCESS regions is instead expressed in terms of the state of
the FENV_ACCESS pragma.

On Wed, May 23, 2018 at 10:48 AM, Ulrich Weigand via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hello,
>
> at the recent EuroLLVM developer meeting in Bristol I held a BoF
> session on the topic "Towards implementing #pragma STDC FENV_ACCESS".
> I've also had a number of follow-on discussions both on-site in
> Bristol and online since. This post is intended as a summary of
> my current understanding of the set of requirements and implementation
> details covering the overall topic.
>
> I'm posting this here in the hope this can serve as a basis for
> the various more detailed discussions that are still ongoing
> (e.g. in various Phabricator proposals right now). Any comments
> are welcome!
>
>
> Semantics of #pragma STDC FENV_ACCESS
> =====================================
>
> To provide a baseline for the implementation discussion, first an
> overview of the features required to handle the strict floating-point
> mode defined by the C and IEEE standards:
>
> 1. Floating-point rounding modes
> 2. Default floating-point exception handling
> 3. Trapping floating-point exception handling
>
> Each of these separate features imposes different constraints on the
> optimizations that LLVM may perform involving FP expressions:
>
> 1. Floating-point rounding modes
>
> Outside of FENV_ACCESS regions, all FP operations are supposed to be
> performed in the "default" rounding mode.
>
> But inside FENV_ACCESS regions, FP operations implicitly depend on
> a "current" rounding mode setting, which may be changed by certain
> C library calls (plus some platform-specific intrinsics). In addition,
> those calls may be performed within subroutines (as long as those are
> also within FENV_ACCESS), so *any* function call within a FENV_ACCESS
> region must be considered as potentially changing the rounding mode.
>
> In effect, this means the compiler may not move or combine FP
> operations across function call sites.
>
> 2. Default floating-point exception handling
>
> Inside FENV_ACCESS regions, every floating-point operation that
> causes an exception must be considered to set a "status flag"
> associated with this exception type. Those flags can be queried
> using C library calls (plus some platform-specific intrinsics),
> and there are other such calls to explicitly set or clear those
> flags as well. As with the rounding modes, those calls may be
> performed in subroutines as well, so any function call within a
> FENV_ACCESS region must be considered as potentially *using* and
> changing the floating-point exception status flags.
>
> The values of the status flags on entry to a FENV_ACCESS region are
> to be considered undefined according to the C standard.
>
> Compiler optimizations are supposed to preserve the values of
> all exception status bits at any point where they can be
> (potentially) inspected by the program, i.e. at all call sites
> within FENV_ACCESS regions. This still allows a number of
> optimizations, e.g. to reorder FP operations or combine two
> identical operations within a region uninterrupted by calls.
> But other optimizations should be avoided, e.g. optimizing
> away an unused FP operation may result in an exception flag
> now being unset that would otherwise have been set. The same
> applies to floating-point constant folding.
>
> 3. Trapping floating-point exception handling
>
> Within a FENV_ACCESS region, library calls may be used to switch
> exception handling semantics to a "trapping" mode by setting
> corresponding mask bits. Any subsequent FP instruction that
> raises an exception with the associated mask bit set will cause
> a trap. Usually, this will be a hardware trap that is translated
> by the operating system into some form of software exception that
> can be handled by the application; on Linux systems this takes the
> form of a SIGFPE signal.
>
> As above, those mask bits can be set and reset via (operating-
> system specific) library calls and/or platform-specific intrinsics,
> all of which may also be done within subroutine calls.
>
> In effect, this requires the compiler to treat any floating-point
> operation within a FENV_ACCESS region as potentially trapping,
> which means the same restrictions apply as with e.g. memory accesses
> (cannot be speculated etc.). However, according to the C standard,
> the implementation is not required to preserve the *number* of
> different traps, so identical operations may still be combined
> (unless there is an intervening function call).
>
> The C standard requires all user code to explicitly switch back
> to non-trapping mode for all exceptions whenever leaving a
> FENV_ACCESS region (both by "falling off the end" of the region
> and by calling a subroutine defined outside of FENV_ACCESS).
>
>
> Implementation requirements on parts of the compiler
> ====================================================
>
> A. clang front end
>
> The front end needs to determine which instructions are part of
> FENV_ACCESS regions and which are not. This takes into account
> both the semantics of the #pragma as defined by the standard,
> and the implementation-defined default rules that apply to code
> outside of any #pragma. GCC currently has the following two
> related command-line options:
>
> -frounding-math: Do not assume default rounding mode
> -ftrapping-math: Assume FP operations may trap
>
> clang accepts but (basically) ignores those options. As a first
> step, it might make sense to have the FENV_ACCESS default
> behavior triggered by these options, even while the front end
> does not yet support the actual #pragma.
>
> The front end then needs to transmit the information about
> FENV_ACCESS regions to later passes. However, I believe that
> we do not actually have to implement "regions" as such at the
> IR level. Instead, it would be sufficient to track the following
> information:
>
> - For each FP operation, whether it is within a FENV_ACCESS region.
> - For each call site, whether it is within a FENV_ACCESS region.
>
> The former requires new IR support; the approach currently under
> investigation uses the experimental "constrained FP" intrinsics
> instead of traditional floating-point operations for this. The
> latter can be done simply by annotating those call sites with an
> attribute.
>
> In addition to that, the front-end itself needs to disable any
> early optimizations that do not preserve strict FP semantics,
> in particular it must not speculate FP operations if they may
> trap. (Currently, the front end transforms "? :" on floating-
> point types into a select IR statement; for trapping FP
> operations, an explicit branch must be used instead.)
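
For reference, the kind of source that the "? :" remark is about looks like this. It is only a sketch with a made-up function name; whether a given compiler speculates the division depends on its cost model.

```c
#pragma STDC FENV_ACCESS ON

/* Return 1/x, or 0 when x is zero (guarded_reciprocal is a hypothetical
   name). In strict mode the division has to stay in the arm that is
   only evaluated for x != 0: computing it unconditionally and selecting
   the result afterwards would raise, or trap on, division by zero
   whenever x == 0. */
double guarded_reciprocal(double x)
{
    return x != 0.0 ? 1.0 / x : 0.0;
}
```

The returned value is the same either way, so the difference between a branch and a select is only observable through the exception flags or a trap.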
>
>
> B. LLVM IR and LLVM common optimizations
>
> As mentioned in the previous section, we need some IR to annotate
> FP instructions and call sites within FENV_ACCESS regions. All
> common optimizations then need to respect the strict FP semantics
> associated with those regions.
>
> The current approach uses experimental intrinsics. This has the
> advantage that most optimizations never trigger since they don't
> even recognize those new intrinsics. Also, the intrinsics can
> be marked as having side-effects and/or being non-speculatable.
>
> The overall effect is that more optimizations are suppressed
> than would be strictly necessary. But this may still be a good
> first step, since the result is now safe but maybe not optimal
> -- which can be improved upon over time by teaching the specific
> semantics of those intrinsics to optimization passes.
>
> However, some open questions remain. If at some point we want
> to model the constrained FP semantics more precisely than just
> as "unmodeled side effects", this may have to be reflected at
> the IR level directly. For example, to model rounding mode
> behavior, at some point we might require explicit tracking of
> data dependencies on the rounding mode by representing the
> rounding mode as SSA values defined by function calls and used
> by FP intrinsics. Similarly, to track exception status flags,
> they might be modeled as SSA values set by FP intrinsics and
> used by function calls.
>
> (There is a possibly related question of how to optimally model
> the property of many math library routines that they may access
> the "errno" variable but no other memory ... It might also be
> possible to model e.g. exception status as a thread-local "memory"
> location that is modified by FP operations, just like errno.)
>
> Another currently unresolved issue is that at the moment nothing
> prevents *standard* floating-point operations from being moved
> *inside* FENV_ACCESS regions. This may also be invalid, since
> those operations now may cause unexpected traps etc. (More
> specifically, what is invalid is moving any standard FP operation
> across a *call site* within a FENV_ACCESS region.) Note that
> this is even an issue if we only support changing the default
> (and no actual #pragma) if multiple object files using different
> default settings are being linked together using LTO.
>
> This last issue could in theory be solved by having all optimization
> passes respect the requirement that floating-point operations may
> not be moved across call sites marked with the strict FP attribute.
> But that does not appear to be straightforward since it would
> introduce a "new" type of dependency that would have to be added
> throughout LLVM code. If this must be avoided, we'd have to
> find a way to explicitly track dependencies at the IR level. In
> the extreme, this could end up equivalent to just always using
> the constrained intrinsics for everything ...
>
>
> C. Code generation
>
> In the back end, effects of strict FP mode have to be passed through
> to lower-level representations including SelectionDAG and MI.
>
> Currently, the "unmodeled side effect" logic of the constrained
> intrinsics is modeled by putting them on the chain during SelectionDAG.
> (If we ever model semantics more precisely at the IR level, that
> would need to be reflected on SelectionDAG accordingly.)
>
> At the MI level, there is no representation at all. One option to
> fix this would be to model target-specific registers that implement
> the IEEE semantics. Most platforms have registers (or parts of
> registers) that hold:
> - the current rounding mode
> - the exception status flags
> - the exception masks (which enable traps)
> Marking FP instructions as using and/or defining these registers
> would enforce ordering requirements. It may be too strict in some
> cases (e.g. two instructions setting exception status flags may
> still be reordered). On the other hand, I believe if instructions
> may actually *trap*, we actually need the hasSideEffects flag even
> if register dependencies are modeled.
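
As one concrete instance of such a register, the x86 MXCSR holds all three pieces of state in separate bit fields. The decoder below is only an illustration with made-up names (mxcsr_fields, decode_mxcsr); it works on a plain integer so it does not need to run on x86.

```c
#include <stdint.h>

/* Field layout of the x86 MXCSR register (struct and function names
   are hypothetical, the bit positions are the architectural ones). */
typedef struct {
    unsigned flags;     /* bits 0-5: sticky exception status flags       */
    unsigned masks;     /* bits 7-12: a set bit masks (disables) a trap  */
    unsigned rounding;  /* bits 13-14: 0 nearest, 1 down, 2 up, 3 zero   */
} mxcsr_fields;

mxcsr_fields decode_mxcsr(uint32_t mxcsr)
{
    mxcsr_fields f;
    f.flags    = mxcsr & 0x3Fu;
    f.masks    = (mxcsr >> 7) & 0x3Fu;
    f.rounding = (mxcsr >> 13) & 0x3u;
    return f;
}
```

For the power-on default value 0x1F80 this yields no flags, all six exceptions masked, and round-to-nearest. Modeling the three fields as separate register units, rather than as one register, is what would let two flag-setting instructions be reordered while still ordering them against a rounding mode change.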
>
> If we do need hasSideEffects, there is a separate discussion on
> whether this can be implemented without each back end having to
> duplicate all FP instruction patterns (one with hasSideEffects
> and one without), e.g. by having a new feature that allows
> describing the side-effect status using an MI operand.
>
>
> Next steps
> ==========
>
> I believe it is important to break up the full amount of work
> into incremental steps that provide some useful benefits on their
> own. At first, we should be able to get to a state where clang
> can be used to build programs that use some (maybe not all) strict
> FP features, where the generated code is always correct but may
> not always be optimal. To get there, I think we need at a
> minimum:
>
> - Implement clang support for the default flags, e.g. GCC's
> -frounding-math and -ftrapping-math, and always generate
> the constrained intrinsics. clang should also mark all
> call sites then (as mentioned above).
>
> - For now, add the requirement that LTO is not supported if
> this would cause mixing of strict and non-strict FP code.
> In the alternative, have the LTO pass automatically transform
> any floating-point operation into a constrained intrinsic
> if *any* (other) module already uses the latter.
>
> - At the IR level, complete the set of supported constrained
> FP intrinsics (there are still some missing, see e.g.
> https://reviews.llvm.org/D43515).
> Also, it seems not all variants (e.g. for vector types) are
> supported correctly through codegen (see e.g.
> https://reviews.llvm.org/D46967).
>
> - Allow targets to correctly reflect constrained intrinsics
> semantics at the MI level and final machine code generation
> (see e.g. https://reviews.llvm.org/D45576).
>
> - Review all optimization and codegen passes to verify they
> fully respect strict FP semantics.
>
> Once this is done, we can improve on the solution by:
>
> - Supporting mixing strict and non-strict FP operations
> (would lift the LTO restriction). (Note: there still seems
> to be some "invention required" here, see above.)
>
> - Actually implementing the #pragma supporting different
> regions within a compilation unit (prereq: support for
> mixing strict and non-strict FP operations).
>
> - Add more optimization of constrained FP intrinsics in
> common optimizers and/or target back ends.
>
> Does this look reasonable? Please let me know if there's
> anything I overlooked, or you have any additional comments
> or questions.
>
>
>
> Mit freundlichen Gruessen / Best Regards
>
> Ulrich Weigand
>
> --
> Dr. Ulrich Weigand | Phone: +49-7031/16-3727
> STSM, GNU/Linux compilers and toolchain
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
> Wittkopp
> Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart,
> HRB 243294
>