[llvm-dev] Should rint and nearbyint be always constrained?

Wed Mar 4 22:09:58 PST 2020

Even the basic arithmetic instructions fadd, fsub, fmul, fdiv use the
rounding mode. And when we constant fold them we assume the default
rounding mode.

~Craig

On Wed, Mar 4, 2020 at 9:57 PM Serge Pavlov <sepavloff at gmail.com> wrote:

> Actually, it is hard to rely on the default FP environment in many cases.
> We know that a program starts with the default FP state installed, but
> beyond that we generally cannot assume it. For example, can we assume the
> default FP environment in this case?
>
> float qqq(float x) {
>   return nearbyint(x);
> }
>
>
> Depending on the answer, the compiler generates either the non-constrained
> intrinsic or the constrained one. The result of `nearbyint` depends on the
> current rounding mode, so this function accesses the FP environment: it
> implicitly reads the rounding mode. Must the user specify
> `#pragma STDC FENV_ACCESS on` here? Actually, no.
>
> C standard (n2454):
>
> 7.6.1p2
> The FENV_ACCESS pragma provides a means to inform the implementation when
> a program might
> access the floating-point environment to test floating-point status flags
> or run under non-default
> floating-point control modes.
>
> 7.6p1
> A floating-point status flag is a
> system variable whose value is set (but never cleared) when a
> floating-point exception is raised, which
> occurs as a side effect of exceptional floating-point arithmetic to
> provide auxiliary information.
>
>
> Not every access to the FP environment requires
> `#pragma STDC FENV_ACCESS on`, only one that reads the FP exception status
> or sets control modes. Neither occurs in the example above.
>
> So, even if `#pragma STDC FENV_ACCESS on` is absent, we should not assume
> the default FP environment for functions that read control modes,
> including nearbyint and rint. They cannot assume the default rounding mode
> and must be ordered relative to other instructions that may access the FP
> environment. The scope of the non-constrained intrinsics would be only
> initialization code, which seems to be a marginal case.
>
> Thanks,
> --Serge
>
>
> On Wed, Mar 4, 2020 at 12:59 AM Serge Pavlov <sepavloff at gmail.com> wrote:
>
>>> One concern is that replacing llvm.rint and llvm.nearbyint with
>>> llvm.roundeven makes it difficult to turn them back into a libcall if the
>>> backend doesn't have an instruction for it. You can't just call the
>>> roundeven library function since that wouldn't exist in older libm
>>> implementations. So ideally you would know which function was originally
>>> used in the user code and call that.
>>
>>
>> Yes, you are right. Such an optimization at the IR level probably does not
>> make sense.
>>
>> Thanks,
>> --Serge
>>
>>
>> On Tue, Mar 3, 2020 at 11:41 PM Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> Note that EVEX static rounding forces suppress-all-exceptions. You can't
>>> have static rounding with exceptions.
>>>
>>> We're also talking about making the vector predicated floating point
>>> intrinsics that Simon Moll is working on support both strict and non-strict
>>> using operand bundles. So you're right we could probably merge constrained
>>> and non-constrained versions of the existing intrinsics.
>>>
>>> One concern is that replacing llvm.rint and llvm.nearbyint with
>>> llvm.roundeven makes it difficult to turn them back into a libcall if the
>>> backend doesn't have an instruction for it. You can't just call the
>>> roundeven library function since that wouldn't exist in older libm
>>> implementations. So ideally you would know which function was originally
>>> used in the user code and call that.
>>>
>>> ~Craig
>>>
>>>
>>> On Tue, Mar 3, 2020 at 8:23 AM Serge Pavlov via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>>> The only issue I see is that since we also assume FP operations have no
>>>>> side effects by default there is no difference between llvm.rint and
>>>>> llvm.nearbyint. I wouldn’t have a problem with dropping llvm.rint
>>>>> completely.
>>>>
>>>>
>>>> The forthcoming C standard (
>>>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2454.pdf, 7.12.9.8)
>>>> defines a new function, `roundeven`, which implements the IEEE-754
>>>> operation `roundToIntegralTiesToEven`. When the corresponding intrinsic is
>>>> implemented (I am working on such a patch), llvm.rint and llvm.nearbyint
>>>> will be identical to llvm.roundeven in the default environment, and both
>>>> can be dropped. We'll end up with a funny situation: there are constrained
>>>> intrinsics (experimental!) but no corresponding 'usual' intrinsics. This
>>>> demonstrates that splitting an operation into constrained and
>>>> non-constrained variants does not work in the case of `rint` and
>>>> `nearbyint`.
>>>>
>>>>> As for the target-specific intrinsics, you are correct that we need a
>>>>> plan for that.
>>>>
>>>>
>>>> When making such a plan we should keep in mind that some targets encode
>>>> the rounding mode in instructions rather than in a hardware register. In
>>>> this case the "floating point environment" is an attribute of a particular
>>>> instruction. By the way, the X86 target also has this property: the EVEX
>>>> prefix allows static rounding or suppress-all-exceptions. Such properties
>>>> are naturally modeled with metadata operands, but splitting into
>>>> constrained and non-constrained variants makes little sense.
>>>>
>>>>> My suggestion would be that we should set the strictfp attribute on
>>>>> these intrinsics and provide the rounding mode and exception behavior
>>>>> arguments using an operand bundle.
>>>>
>>>>
>>>> This is an interesting variant. IIUC it means that the FP environment is
>>>> a property of a call rather than an instruction? That is, one call may
>>>> have a rounding mode argument and another call of the same intrinsic may
>>>> not? It would be the third way to express the FP environment, together
>>>> with the current per-intrinsic way and the rejected per-basic-block one. I
>>>> wonder if we can model “inaccessibleMemOnly” or something like that this
>>>> way. The main justification for splitting an intrinsic into constrained
>>>> and non-constrained variants is that one has side effects and the other
>>>> does not. If we could deliberately assign this property to a particular
>>>> call, we could eventually merge constrained and non-constrained
>>>> intrinsics.
>>>>
>>>>> It’s probably best to say in the documentation that the llvm.nearbyint
>>>>> and llvm.rint functions “assume the default rounding mode, roundToNearest”.
>>>>> This will allow the optimizer to transform them as if they were rounding to
>>>>> nearest without requiring backends to use an encoding that enforces
>>>>> roundToNearest as the rounding mode for these operations.
>>>>
>>>>
>>>> The optimizer could make the same optimization with constrained nearbyint
>>>> and rint, replacing them with llvm.roundeven, if it knows that the
>>>> environment is default.
>>>>
>>>>> Also, we should take care to document the non-constrained forms of
>>>>> these intrinsics in a way that makes clear that we are “assuming” and not
>>>>> requiring that the operation has no side effects.
>>>>
>>>>
>>>> What can the non-constrained forms of rint/nearbyint be used for? They
>>>> do the same job as llvm.roundeven does, so they are useless. These
>>>> intrinsics were introduced to represent the C library functions
>>>> rint/nearbyint, but the standard explicitly states that the result of
>>>> either depends on the current rounding mode. So these intrinsics should
>>>> not be split into constrained and non-constrained forms; only the form
>>>> that is ordered relative to other operations accessing the FP environment
>>>> should exist.
>>>>
>>>>> Here are some suggested wordings for the “Semantics” section of the
>>>>> langref for these functions:
>>>>
>>>>
>>>> Thank you!
>>>>
>>>>> I’d like to also say that these intrinsics can be lowered to the
>>>>> corresponding libm functions, but I’m not sure all libm implementations
>>>>> meet the requirements above.
>>>>
>>>>
>>>> I think we should reference the C standard rather than a particular
>>>> library. For example, the semantics of roundeven:
>>>>
>>>> This function implements the IEEE-754 operation
>>>> ``roundToIntegralTiesToEven``. It also behaves in the same way as the C
>>>> standard function ``roundeven``, except that it does not raise floating
>>>> point exceptions.
>>>>
>>>>
>>>> Thanks,
>>>> --Serge
>>>>
>>>>
>>>> On Tue, Mar 3, 2020 at 7:32 PM Hanna Kruppe <hanna.kruppe at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Andy,
>>>>>
>>>>> On Mon, 2 Mar 2020 at 23:59, Kaylor, Andrew via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Some clarification after getting feedback from Craig Topper….
>>>>>>
>>>>>>
>>>>>>
>>>>>> It’s probably best to say in the documentation that the
>>>>>> llvm.nearbyint and llvm.rint functions “assume the default rounding mode,
>>>>>> roundToNearest”. This will allow the optimizer to transform them as if they
>>>>>> were rounding to nearest without requiring backends to use an encoding that
>>>>>> enforces roundToNearest as the rounding mode for these operations. On
>>>>>> modern x86 targets we can encode it either way, but it seems more
>>>>>> consistent to continue using the current encoding which tells the processor
>>>>>> to use the current rounding mode. For other targets (including cases where
>>>>>> x86 is forced to use x87 instructions), it may be much easier to leave this
>>>>>> at the discretion of the backend.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Also, we should take care to document the non-constrained forms of
>>>>>> these intrinsics in a way that makes clear that we are “assuming” and not
>>>>>> requiring that the operation has no side effects.
>>>>>>
>>>>>
>>>>> Note that these aspects are shared by most other FP operations and
>>>>> already discussed in the LangRef section <
>>>>> https://llvm.org/docs/LangRef.html#floating-point-environment> which
>>>>> currently reads:
>>>>>
>>>>> > The default LLVM floating-point environment assumes that
>>>>> floating-point instructions do not have side effects. Results assume the
>>>>> round-to-nearest rounding mode. No floating-point exception state is
>>>>> maintained in this environment. Therefore, there is no attempt to create or
>>>>> preserve invalid operation (SNaN) or division-by-zero exceptions.
>>>>> >
>>>>> >  The benefit of this exception-free assumption is that
>>>>> floating-point operations may be speculated freely without any other
>>>>> fast-math relaxations to the floating-point model.
>>>>> >
>>>>> > Code that requires different behavior than this should use the
>>>>> Constrained Floating-Point Intrinsics.
>>>>>
>>>>> Your explanation of the implications for optimizers and backends seems
>>>>> like a useful addition to this section. As many intrinsics (not just
>>>>> nearbyint/rint) and instructions (fadd, fmul, etc.) behave this way, I
>>>>> think it would be more useful to consolidate all the information into this
>>>>> section and reference it from the relevant "Semantics" sections.
>>>>>
>>>>> While we're on it, let me point out the consequences of breaking these
>>>>> assumptions are still fuzzy even with your clarifications. In general, when
>>>>> a compiler "assumes" something that is not actually true, it's useful to
>>>>> specify what exactly happens when the assumption is actually false, e.g.
>>>>> the result is an undefined value (undef/poison), or a non-deterministic
>>>>> choice is made (e.g. branching on poison, at the moment), or Undefined
>>>>> Behavior happens. In this sense, I wonder what should happen when the
>>>>> assumptions about rounding mode and FP exception state are broken? If it's
>>>>> going to take broader discussion to agree on an answer, that's probably out
>>>>> of scope for this thread, but perhaps there's a clear answer that just
>>>>> wasn't written down so far?
>>>>>
>>>>>> For the constrained version of nearbyint, we will require that the
>>>>>> inexact exception is not raised (to be consistent with IEEE 754-2019’s
>>>>>> roundToIntegral operations) and for the constrained version of rint we will
>>>>>> require that the inexact exception is raised (to be consistent with IEEE
>>>>>> 754-2019’s roundToIntegralExact operation), but for the non-constrained
>>>>>> forms it should be clear that the backend is free to implement this in the
>>>>>> most efficient way possible, without regard to FP exception behavior.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Finally, I see now the problem with documenting these in terms of the
>>>>>> IEEE operations, given that IEEE 754-2019 doesn’t describe an operation
>>>>>> that uses the current rounding mode without knowing what that is. I see
>>>>>> this as a problem of documentation rather than one that presents any
>>>>>> difficulty for the implementation.
>>>>>>
>>>>>
>>>>> I'm not quite sure what you mean by "uses the current rounding without
>>>>> knowing what it is". Are you referring to the wobbly uncertainty caused by
>>>>> optimizations assuming one rounding mode but runtime code possibly using a
>>>>> different dynamic rounding mode? If so, explicitly defining what happens
>>>>> when dynamic and "assumed" rounding mode don't match (see above) also
>>>>> addresses this problem. Then the operations can be described like this:
>>>>>
>>>>> > If a rounding mode is assumed [RNE for non-constrained intrinsic or
>>>>> roundingMode argument != fpround.dynamic] and the current dynamic rounding
>>>>> mode differs from the assumed rounding mode, [pick one: behavior is
>>>>> undefined / result is poison / ...]. Otherwise, X operation is performed
>>>>> with the current dynamic rounding mode [which equals the statically assumed
>>>>> rounding mode if this clause applies].
>>>>>
>>>>> Best regards,
>>>>> Hanna
>>>>>
>>>>>
>>>>>> Here are some suggested wordings for the “Semantics” section of the
>>>>>> langref for these functions:
>>>>>>
>>>>>>
>>>>>>
>>>>>> llvm.nearbyint::semantics
>>>>>>
>>>>>>
>>>>>>
>>>>>> This function returns the same value as one of the IEEE 754-2019
>>>>>> roundToIntegral operations using the current rounding mode. The optimizer
>>>>>> may assume that actual rounding mode is roundToNearest (IEEE 754:
>>>>>> roundTiesToEven), but backends may encode this operation either using that
>>>>>> rounding mode explicitly or using the dynamic rounding mode from the
>>>>>> floating point environment. The optimizer may assume that the operation has
>>>>>> no side effects and raises no FP exceptions, but backends may encode this
>>>>>> operation using either instructions that raise exceptions or instructions
>>>>>> that do not. The FP exceptions are assumed to be ignored.
>>>>>>
>>>>>>
>>>>>>
>>>>>> llvm.rint (delete, or identical semantics to llvm.nearbyint)
>>>>>>
>>>>>>
>>>>>>
>>>>>> llvm.experimental.constrained.nearbyint::semantics
>>>>>>
>>>>>>
>>>>>>
>>>>>> This function returns the same value as one of the IEEE 754-2019
>>>>>> roundToIntegral operations. If the roundingMode argument is
>>>>>> fpround.dynamic, the behavior corresponds to whichever of the
>>>>>> roundToIntegral operations matches the dynamic rounding mode when the
>>>>>> operation is executed. The optimizer may not assume any rounding mode in
>>>>>> this case, and backends must encode the operation in a way that uses the
>>>>>> dynamic rounding mode. Otherwise, the rounding mode may be assumed to be
>>>>>> that described by the roundingMode argument and backends may either use
>>>>>> instructions that encode that rounding mode explicitly or use the current
>>>>>> rounding mode from the FP environment.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The optimizer may assume that this operation does not raise the
>>>>>> inexact exception when the return value differs from the input value, and
>>>>>> if the exceptionBehavior argument is not fpexcept.ignore, the backend must
>>>>>> encode this operation using instructions that guarantee that the inexact
>>>>>> exception is not raised. If the exceptionBehavior argument is
>>>>>> fpexcept.ignore, backends may encode this operation using either
>>>>>> instructions that raise exceptions or instructions that do not.
>>>>>>
>>>>>>
>>>>>>
>>>>>> llvm.experimental.constrained.rint::semantics
>>>>>>
>>>>>>
>>>>>>
>>>>>> This function returns the same value as the IEEE 754-2019
>>>>>> roundToIntegralExact operation. If the roundingMode argument is
>>>>>> fpround.dynamic, the behavior uses the dynamic rounding mode when the
>>>>>> operation is executed. The optimizer may not assume any rounding mode in
>>>>>> this case, and backends must encode the operation in a way that uses the
>>>>>> dynamic rounding mode. Otherwise, the rounding mode may be assumed to be
>>>>>> that described by the roundingMode argument and backends may either use
>>>>>> instructions that encode that rounding mode explicitly or use the current
>>>>>> rounding mode from the FP environment.
>>>>>>
>>>>>> If the exceptionBehavior argument is not fpexcept.ignore, the
>>>>>> optimizer must assume that this operation will raise the inexact exception
>>>>>> when the return value differs from the input value and the backend must
>>>>>> encode this operation using instructions that guarantee that the inexact
>>>>>> exception is raised in that case. If the exceptionBehavior argument is
>>>>>> fpexcept.ignore, backends may encode this operation using either
>>>>>> instructions that raise exceptions or instructions that do not.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I’d like to also say that these intrinsics can be lowered to the
>>>>>> corresponding libm functions, but I’m not sure all libm implementations
>>>>>> meet the requirements above.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Kaylor,
>>>>>> Andrew via llvm-dev
>>>>>> *Sent:* Monday, March 02, 2020 9:56 AM
>>>>>> *To:* Serge Pavlov <sepavloff at gmail.com>; Ulrich Weigand <
>>>>>> Ulrich.Weigand at de.ibm.com>
>>>>>> *Cc:* LLVM Developers <llvm-dev at lists.llvm.org>
>>>>>> *Subject:* Re: [llvm-dev] Should rint and nearbyint be always
>>>>>> constrained?
>>>>>>
>>>>>>
>>>>>>
>>>>>> I agree with Ulrich. The default behavior of LLVM IR is to assume
>>>>>> that roundToNearest is the current rounding mode everywhere. This
>>>>>> corresponds to the C standard, which says that the user may only modify the
>>>>>> floating point environment if fenv access is enabled. In the latest version
>>>>>> of the C standard, pragmas are added which can change the rounding mode for
>>>>>> a region, and if these are implemented in clang the constrained versions of
>>>>>> all FP operations should be used. However, outside of regions where fenv
>>>>>> access is enabled either by pragma or command line option, we are free to
>>>>>> assume that the current rounding mode is the default rounding mode.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So, llvm.rint and llvm.nearbyint (the non-constrained versions) can
>>>>>> be specifically documented as performing their operation according to
>>>>>> roundToNearest and clang can use them in the default case for the
>>>>>> corresponding libm functions, and llvm.experimental.constrained.rint and
>>>>>> llvm.experimental.constrained.nearbyint can be documented as using the
>>>>>> current rounding mode.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The only issue I see is that since we also assume FP operations have
>>>>>> no side effects by default there is no difference between llvm.rint and
>>>>>> llvm.nearbyint. I wouldn’t have a problem with dropping llvm.rint
>>>>>> completely.
>>>>>>
>>>>>>
>>>>>>
>>>>>> As for the target-specific intrinsics, you are correct that we need a
>>>>>> plan for that. I have given it some thought, but nothing is currently
>>>>>> implemented. My suggestion would be that we should set the strictfp
>>>>>> attribute on these intrinsics and provide the rounding mode and exception
>>>>>> behavior arguments using an operand bundle. We do still need some way to
>>>>>> handle the side effects. My suggestion here is to add some new attribute
>>>>>> that means “no side effects” in the absence of the strictfp attribute and
>>>>>> something similar to “inaccessibleMemOnly” in the presence of strictfp.
>>>>>>
>>>>>>
>>>>>>
>>>>>> We could make the new attribute less restrictive than
>>>>>> inaccessibleMemOnly in that it only really needs to act as a barrier
>>>>>> relative to other things that are accessing the fp environment. I believe
>>>>>> Ulrich suggested this to me at the last LLVM Developer Meeting.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Serge Pavlov <sepavloff at gmail.com>
>>>>>> *Sent:* Monday, March 02, 2020 8:10 AM
>>>>>> *To:* Ulrich Weigand <Ulrich.Weigand at de.ibm.com>
>>>>>> *Cc:* Kaylor, Andrew <andrew.kaylor at intel.com>; Cameron McInally <
>>>>>> cameron.mcinally at nyu.edu>; Kevin Neal <kevin.neal at sas.com>; LLVM
>>>>>> Developers <llvm-dev at lists.llvm.org>
>>>>>> *Subject:* Re: Should rint and nearbyint be always constrained?
>>>>>>
>>>>>>
>>>>>>
>>>>>> > I'm not sure why this is an issue.  Yes, these two intrinsics depend
>>>>>> > on the current rounding mode according to the C standard, and yes,
>>>>>> > LLVM in default mode assumes that the current rounding mode is the
>>>>>> > default rounding mode.  But the same holds true for many other
>>>>>> > intrinsics and even the arithmetic IR operations like add.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any other intrinsic, like `floor`, `round`, etc., has meaning in the
>>>>>> default rounding mode. But using `rint` or `nearbyint` in the default FP
>>>>>> environment is strange, since `roundeven` can be used instead. We could
>>>>>> use the more general intrinsics in all cases, as the special case of the
>>>>>> default environment is not of practical interest.
>>>>>>
>>>>>>
>>>>>>
>>>>>> There is another reason for special handling. The set of intrinsics
>>>>>> includes things like `x86_sse_cvtss2si`. It is unlikely that all of them
>>>>>> will eventually get a constrained counterpart. It looks more natural to
>>>>>> define such intrinsics as accessing the FP environment and to optimize
>>>>>> them when the latter is default. These two intrinsics could be a good
>>>>>> model for such cases. IIUC, splitting entities into constrained and
>>>>>> non-constrained is a temporary solution; ideally they will merge into one
>>>>>> entity. We could do it for some intrinsics now.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> --Serge
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 2, 2020 at 8:58 PM Ulrich Weigand <
>>>>>> Ulrich.Weigand at de.ibm.com> wrote:
>>>>>>
>>>>>> Serge Pavlov <sepavloff at gmail.com> wrote on 02.03.2020 14:38:48:
>>>>>>
>>>>>> > This approach has issues when applied to the intrinsics `rint` and
>>>>>> > `nearbyint`. The value returned by either of these intrinsics depends
>>>>>> > on the current rounding mode. If they are considered operations in the
>>>>>> > default environment, they would round only to nearest. That is far
>>>>>> > from the meaning of the standard C functions that these intrinsics
>>>>>> > represent.
>>>>>>
>>>>>> I'm not sure why this is an issue.  Yes, these two intrinsics depend
>>>>>> on the current rounding mode according to the C standard, and yes,
>>>>>> LLVM in default mode assumes that the current rounding mode is the
>>>>>> default rounding mode.  But the same holds true for many other
>>>>>> intrinsics and even the arithmetic IR operations like add.
>>>>>>
>>>>>> If you want to stop clang from making the default rounding mode
>>>>>> assumption, you need to use the -frounding-math option (or one
>>>>>> of its equivalents), which will cause clang to emit the corresponding
>>>>>> constrained intrinsics instead, for those two as well as all other
>>>>>> affected intrinsics.
>>>>>>
>>>>>> I don't see why it would make sense to add another special case
>>>>>> just for those two intrinsics ...
>>>>>>
>>>>>>
>>>>>> Bye,
>>>>>> Ulrich
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>