[cfe-dev] [llvm-dev] Floating point operations with specific rounding and exception properties

Tue Aug 20 15:12:25 PDT 2019

On Tue, Aug 20, 2019 at 1:02 PM Serge Pavlov via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> During the review of https://reviews.llvm.org/D65997
> an issue was revealed that relates to the decision of how the compiler should
> represent constrained floating point operations.
>
> If a floating point operation requires a rounding mode or exception behavior
> different from the default, it should be represented by a constrained
> intrinsic (
> http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics).
> An important point is that, under the current design decision, if
> any part of a function contains such an intrinsic, all floating point
> operations in the function must be represented by constrained intrinsics as
> well. This decision is meant to prevent undesired movement of fp operations.
> The discussion is in the thread
> http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html,
> the relevant example is:
>
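> #define _GNU_SOURCE   /* feenableexcept/fedisableexcept are glibc extensions */
> #include <fenv.h>
>
> double f(double a, double b, double c) {
>   double d;  /* declared here so it is still in scope at the return */
>   {
> #pragma STDC FENV_ACCESS ON
>     feenableexcept(FE_OVERFLOW);
>     d = a * b;
>     fedisableexcept(FE_OVERFLOW);
>   }
>   return c * d;
> }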
>
>
> The second fmul must not be hoisted up to before the fedisableexcept.
> Using constrained intrinsics is expected to help in this case as they are
> not handled by optimization passes.
>
> The concern is that using constrained intrinsics in a small region of a
> function forces their use everywhere in that function, and in any
> function it is inlined into. As constrained intrinsics inhibit
> optimizations, this can result in performance degradation.
>
> A couple of examples:
> 1. There is a performance-critical function that does most of its
> calculations in the default fp mode, but at some points it enables fp
> exceptions and performs an action that can trigger such an exception. Using
> constrained intrinsics would result in a performance loss, although the code
> that actually needs them is very compact.
> 2. Cores that are used for machine learning usually work with short data
> types (half, bfloat16 or even shorter). Rounding control in this case is much
> more important than for big cores; using proper rounding in different parts
> of an algorithm can improve precision. Constrained intrinsics are the only
> way to enforce a particular rounding mode. However, using them results in
> poor optimization, which is intolerable. In such cores the rounding mode may
> be encoded in the instructions, so code movement cannot break semantics.
>
> The representation of fp operations could be more flexible, so that a user
> would not pay for rounding/exception control with performance degradation.
> For that we need to be able to mix constrained intrinsics and regular fp
> operations in a function.
>
> The question is: how can we prevent fp operations from moving across the
> boundaries of a region where specific rounding and/or exception behavior
> applies? Any ideas?
>
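
To make point 2 of the quoted message concrete, here is a rough software
model of converting float to bfloat16 with the rounding mode passed per
operation, the way a core might encode it in each instruction. The names
rounding_mode, float_to_bf16 and bf16_to_float are made up for illustration
and do not describe any particular hardware or LLVM API. NaN inputs are not
handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-operation rounding mode selector. */
typedef enum { RM_TOWARD_ZERO, RM_NEAREST_EVEN } rounding_mode;

/* Convert a float to bfloat16 bits under an explicit rounding mode.
   Assumes a 32-bit IEEE 754 float; NaN inputs are not handled. */
static uint16_t float_to_bf16(float x, rounding_mode rm) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);       /* reinterpret without aliasing UB */
    if (rm == RM_NEAREST_EVEN) {
        uint32_t lsb = (bits >> 16) & 1u; /* lowest bit that will be kept */
        bits += 0x7FFFu + lsb;            /* round half to even */
    }
    return (uint16_t)(bits >> 16);        /* drop the low 16 mantissa bits */
}

/* Widen bfloat16 bits back to float (always exact). */
static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because the mode is an argument of each conversion rather than global state,
reordering two such conversions cannot change either result, which is the
property the quoted message attributes to cores with per-instruction rounding.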

Okay, I'll bite...

Preventing the hoisting of FP arithmetic was one of the driving factors in
creating the constrained intrinsics. If we could solve that problem, then
the constrained intrinsics would be *less* necessary (I say "less" since
there are other problems, but hoisting is one of the significant ones).

That said, our out-of-tree FPEnv mode attempts to do just that --
selectively throttle unsafe optimizations. Barring any YDKWYDK's, I intend
to blow the doors off of the constrained intrinsics, performance-wise. :P