[llvm-dev] Floating point operations with specific rounding and exception properties

Tue Aug 20 10:00:47 PDT 2019

Hi all,

During the review of https://reviews.llvm.org/D65997 an issue came up that
relates to the decision on how the compiler should represent constrained
floating point operations.

If a floating point operation requires a rounding mode or exception
behavior different from the default, it should be represented by a
constrained intrinsic (
http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics).
An important point is that, according to the current design decision, if
any part of a function contains such an intrinsic, all floating point
operations in the function must be represented by constrained intrinsics as
well. This decision is meant to prevent undesired movement of fp operations.
The discussion is in the thread
http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html, the
relevant example is:

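#include <fenv.h>  /* feenableexcept/fedisableexcept are GNU extensions */

double f(double a, double b, double c) {
  double d;  /* declared outside the block so the final multiply can use it */
  {
#pragma STDC FENV_ACCESS ON
    feenableexcept(FE_OVERFLOW);
    d = a * b;
    fedisableexcept(FE_OVERFLOW);
  }
  return c * d;
}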

The second fmul must not be hoisted above the call to fedisableexcept.
Using constrained intrinsics is expected to help in this case, because
optimization passes do not transform them.

The concern is that using constrained intrinsics in a small region of a
function forces their use everywhere in that function, and in every
function it is inlined into. As constrained intrinsics inhibit
optimizations, this can result in performance degradation.

A couple of examples:
1. A performance-critical function does most of its calculations in the
default fp mode, but at some points it enables fp exceptions and performs
an action that can trigger such an exception. Using constrained intrinsics
would result in performance loss, although the code that actually needs
them is very compact.
2. Cores that are used for machine learning usually work with short data
types (half, bfloat16 or even shorter). Rounding control in this case is
much more important than for big cores; using proper rounding in different
parts of an algorithm can improve precision. Constrained intrinsics are the
only way to enforce a particular rounding mode. However, using them results
in poor optimization, which is intolerable. On such cores the rounding mode
may be encoded in instructions, so code movement cannot break semantics.

The representation of fp operations could be more flexible, so that a user
would not pay for rounding/exception control with performance degradation.
For that we need to be able to mix constrained intrinsics and regular fp
operations in a function.

The question is: how can we prevent fp operations from being moved across
the boundaries of a region where specific rounding and/or exception
behavior applies? Any ideas?

Thanks,
--Serge