[llvm-dev] Floating point operations with specific rounding and exception properties

Tue Aug 20 18:13:46 PDT 2019

Which optimization did you find unsafe?

Thanks,
--Serge

On Wed, Aug 21, 2019 at 05:12, Cameron McInally <cameron.mcinally at nyu.edu> wrote:

> On Tue, Aug 20, 2019 at 1:02 PM Serge Pavlov via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> During the review of https://reviews.llvm.org/D65997
>> an issue came up concerning how the compiler should represent constrained
>> floating point operations.
>>
>> If a floating point operation requires a rounding mode or exception
>> behavior different from the default, it should be represented by a
>> constrained intrinsic (
>> http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics).
>> An important point is that, under the current design, if any part of a
>> function contains such an intrinsic, all floating point operations in the
>> function must be represented by constrained intrinsics as well. This rule
>> is meant to prevent undesired movement of fp operations.
>> The discussion is in the thread
>> http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html
>> and the relevant example is:
>>
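>> #define _GNU_SOURCE  /* feenableexcept/fedisableexcept are GNU extensions */
>> #include <fenv.h>
>>
>> double f(double a, double b, double c) {
>>   double d;  /* declared here so it is still in scope at the return */
>>   {
>> #pragma STDC FENV_ACCESS ON
>>     feenableexcept(FE_OVERFLOW);
>>     d = a * b;
>>     fedisableexcept(FE_OVERFLOW);
>>   }
>>   return c * d;
>> }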
>>
>>
>> The second fmul must not be hoisted above the fedisableexcept call.
>> Using constrained intrinsics is expected to help in this case because
>> optimization passes do not touch them.
>>
>> The concern is that using constrained intrinsics in a small region of a
>> function forces their use everywhere in that function, and in any function
>> that inlines it. Because constrained intrinsics inhibit optimizations,
>> this can degrade performance.
>>
>> A couple of examples:
>> 1. A performance-critical function does most of its calculations in the
>> default fp mode, but at some points it enables fp exceptions and performs
>> an operation that can trigger such an exception. Using constrained
>> intrinsics throughout would cost performance, although the code that
>> actually needs them is very compact.
>> 2. Cores used for machine learning usually work with short data types
>> (half, bfloat16 or even shorter). Rounding control is much more important
>> there than on big cores; using the proper rounding mode in different parts
>> of an algorithm can improve precision. Constrained intrinsics are the only
>> way to enforce a particular rounding mode, but using them results in poor
>> optimization, which is intolerable. On such cores the rounding mode may be
>> encoded in the instructions themselves, so code motion cannot break the
>> semantics.
>>
>> The representation of fp operations could be more flexible, so that a user
>> does not pay for rounding/exception control with degraded performance.
>> For that we need to be able to mix constrained intrinsics and regular fp
>> operations in one function.
>>
>> The question is: how can we prevent fp operations from moving across the
>> boundaries of a region where specific rounding and/or exception behavior
>> applies? Any ideas?
>>
>
> Okay, I'll bite...
>
> Preventing the hoisting of FP arithmetic was one of the driving factors in
> creating the constrained intrinsics. If we could solve that problem, then
> the constrained intrinsics would be *less* necessary (I say "less" since
> there are other problems, but hoisting is one of the significant ones).
>
> That said, our out-of-tree FPEnv mode attempts to do just that --
> selectively throttle unsafe optimizations. Barring any YDKWYDK's, I intend
> to blow the doors off of the constrained intrinsics, performance-wise. :P
>
>