[PATCH] D136176: Implement support for option 'fexcess-precision'.

Fri Nov 18 18:37:21 PST 2022

rjmccall added a comment.

At some point, we're going to have to figure out how this impacts `FLT_EVAL_METHOD`, but we can do that in a separate patch.

================
Comment at: clang/docs/UsersManual.rst:1732
+.. option:: -fexcess-precision:
+
+   By default, Clang uses excess precision to calculate ``_Float16``
----------------

   The C and C++ standards allow floating-point expressions to be computed
   as if intermediate results had more precision (and/or a wider range) than the
   type of the expression strictly allows.  This is called excess precision arithmetic.
   Excess precision arithmetic can improve the accuracy of results (although not
   always), and it can make computation significantly faster if the target lacks
   direct hardware support for arithmetic in a particular type.  However, it can
   also undermine strict floating-point reproducibility.

   Under the standards, assignments and explicit casts force the operand to be
   converted to its formal type, discarding any excess precision.  Because data
   can only flow between statements via an assignment, this means that the
   use of excess precision arithmetic is a reliable local property of a single
   statement, and results do not change based on optimization.  However, when
   excess precision arithmetic is in use, Clang does not guarantee strict
   reproducibility, and future compiler releases may recognize more opportunities
   to use excess precision arithmetic, e.g. with floating-point builtins.

   Clang does not use excess precision arithmetic for most types or on most targets.
   For example, even on pre-SSE X86 targets where ``float`` and ``double``
   computations must be performed in the 80-bit X87 format, Clang rounds
   all intermediate results correctly for their type.  Clang currently uses excess
   precision arithmetic by default only for the following types and targets:

   * ``_Float16`` on X86 targets without ``AVX512-FP16``

   The ``-fexcess-precision=<value>`` option can be used to control the use of excess
   precision arithmetic.  Valid values are:

   * ``standard`` - The default.  Allow the use of excess precision arithmetic under
     the constraints of the C and C++ standards. Has no effect except on the types
     and targets listed above.
   * ``fast`` - Accepted for GCC compatibility, but currently treated as an alias
     for ``standard``.
   * ``16`` - Forces ``_Float16`` operations to be emitted without using excess
     precision arithmetic.
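
To make the semantics in the suggested text concrete, here is a minimal sketch that emulates excess precision with `float` evaluated in `double` (rather than `_Float16` evaluated in `float`, which not every compiler supports). The function names are hypothetical, and the sketch assumes the target evaluates `float` arithmetic in `float` (i.e. `FLT_EVAL_METHOD` is 0, as on x86-64 with SSE).

```cpp
// Strict evaluation: each intermediate result is rounded to float.
float sumStrict(float a, float b, float c) {
  float t = a + b;  // the assignment forces rounding to float
  return t + c;
}

// Emulated excess precision: intermediates are kept in double and the
// result is rounded to float only once, at the final cast.
float sumExcess(float a, float b, float c) {
  double t = static_cast<double>(a) + static_cast<double>(b);
  return static_cast<float>(t + static_cast<double>(c));
}
```

With `a = 16777216.0f` (2^24) and `b = c = 1.0f`, the strict version loses both increments to rounding, while the excess-precision version keeps them because 2^24 + 2 is representable in `float`.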

================
Comment at: clang/include/clang/Basic/LangOptions.h:301

+  enum Float16ExcessPrecisionKind { FPP_Standard, FPP_Fast, FPP_None };
+
----------------
You can leave this named `ExcessPrecisionKind`; if we introduce excess precision for other types, they'll have the same set of options.

================
Comment at: clang/include/clang/Driver/Options.td:1581
+def ffloat16_excess_precision_EQ : Joined<["-"], "ffloat16-excess-precision=">,
+  Group<f_Group>, Flags<[CC1Option]>,
+  HelpText<"Allows control over excess precision on targets where native "
----------------
I think you have to be explicit about `NoDriverOption` here to make sure it's only for `-cc1`.

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2795
   bool StrictFPModel = false;
+  StringRef Float16ExcessPrecision = "standard";

----------------
It's minor, but let's default this to empty so that we don't put any work towards rendering this option unless the user actually passes something.
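
Roughly what I have in mind, as a self-contained sketch rather than the actual driver code: `renderExcessPrecision` is a hypothetical helper, and `std::string` and `std::vector` stand in for `StringRef` and `ArgStringList`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: forwards the cc1 option only when the user passed
// a value, so an empty default means nothing gets rendered.
void renderExcessPrecision(const std::string &Float16ExcessPrecision,
                           std::vector<std::string> &CmdArgs) {
  if (!Float16ExcessPrecision.empty())
    CmdArgs.push_back("-ffloat16-excess-precision=" + Float16ExcessPrecision);
}
```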

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2997
+             << A->getAsString(Args) << TC.getTriple().str();
+       if (Val.equals("standard") || Val.equals("fast") || Val.equals("none"))
+         Float16ExcessPrecision = Val;
----------------
GCC doesn't allow `none` here, right?  Let's keep that restriction for the driver option.  Our type-specific frontend option will still accept `none`.

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2995
+      StringRef Val = A->getValue();
+       if (TC.getTriple().getArch() == llvm::Triple::x86 && Val.equals("16"))
+        D.Diag(diag::err_drv_unsupported_opt_for_target)
----------------
zahiraam wrote:
> andrew.w.kaylor wrote:
> > Why is 16 only supported for x86? Is it only here for gcc compatibility?
> Yes, for GCC compatibility (although here we are using the value "none" to disable excess precision instead of "16"), and also because we are dealing with excess precision for `_Float16` types only, so we are sticking to X86.
`llvm::Triple::x86` is just i386, and I think you want to include x86_64, right?
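
A sketch of the check I'd expect, using a hypothetical `Arch` enum as a stand-in for `llvm::Triple::ArchType` so that it compiles on its own. If I remember correctly, `llvm::Triple` also has an `isX86()` helper that covers both cases.

```cpp
// Hypothetical stand-in for llvm::Triple::ArchType; only the values
// needed for the illustration are listed.
enum class Arch { x86, x86_64, aarch64 };

// True for both 32-bit (i386) and 64-bit x86 targets.
constexpr bool isX86Family(Arch A) {
  return A == Arch::x86 || A == Arch::x86_64;
}

static_assert(isX86Family(Arch::x86_64), "x86_64 must count as X86");
```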

https://reviews.llvm.org/D136176