[llvm] LangRef: state explicitly that floats generally behave according to IEEE-754 (PR #102140)

Wed Aug 21 01:24:14 PDT 2024

https://github.com/RalfJung updated https://github.com/llvm/llvm-project/pull/102140

From d643f2c167253fe7c35e6936a36a4fe921218cb1 Mon Sep 17 00:00:00 2001
From: Ralf Jung <post at ralfj.de>
Date: Tue, 6 Aug 2024 15:25:37 +0200
Subject: [PATCH 1/3] LangRef: state explicitly that floats generally behave
 according to IEEE-754

---
 llvm/docs/LangRef.rst | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b17e3c828ed3d5..caf7ee4fd351aa 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3572,6 +3572,29 @@ or ``syncscope("<target-scope>")`` *synchronizes with* and participates in the
 seq\_cst total orderings of other operations that are not marked
 ``syncscope("singlethread")`` or ``syncscope("<target-scope>")``.
 
+.. _floatsem:
+
+Floating-Point Semantics
+------------------------
+
+LLVM floating-point types fall into two categories:
+
+- half, float, double, and fp128, which correspond to the binary16, binary32,
+  binary64, and binary128 formats described in the IEEE-754 specification.
+- The remaining types, which do not directly correspond to a standard IEEE
+  format.
+
+For types that do correspond to an IEEE format, LLVM IR float operations behave
+like the corresponding operations in IEEE-754, with two exceptions: LLVM makes
+:ref:`specific assumptions about the state of the floating-point environment
+<floatenv>` and it implements :ref:`different rules for operations that return
+NaN values <floatnan>`.
+
+This means that optimizations and backends cannot change the precision of these
+operations (unless there are fast-math flags), and frontends can rely on these
+operations deterministically providing perfectly rounded results as described
+in the standard (except when a NaN is returned).
+
 .. _floatenv:
 
 Floating-Point Environment
@@ -3608,10 +3631,11 @@ are not "floating-point math operations": ``fneg``, ``llvm.fabs``, and
 ``llvm.copysign``. These operations act directly on the underlying bit
 representation and never change anything except possibly for the sign bit.
 
-For floating-point math operations, unless specified otherwise, the following
-rules apply when a NaN value is returned: the result has a non-deterministic
-sign; the quiet bit and payload are non-deterministically chosen from the
-following set of options:
+Floating-point math operations that return a NaN are an exception from the
+general principle that LLVM implements IEEE-754 semantics. Unless specified
+otherwise, the following rules apply when a NaN value is returned: the result
+has a non-deterministic sign; the quiet bit and payload are
+non-deterministically chosen from the following set of options:
 
 - The quiet bit is set and the payload is all-zero. ("Preferred NaN" case)
 - The quiet bit is set and the payload is copied from any input operand that is

From 648d3cef3e7df35e6507b2d0e9f96a5f503ebbf4 Mon Sep 17 00:00:00 2001
From: Ralf Jung <post at ralfj.de>
Date: Wed, 21 Aug 2024 09:50:56 +0200
Subject: [PATCH 2/3] Floating-Point Types: move comment about IEEE formats
 into the table

---
 llvm/docs/LangRef.rst | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index caf7ee4fd351aa..de5247917d3b45 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3967,7 +3967,7 @@ Floating-Point Types
      - Description
 
    * - ``half``
-     - 16-bit floating-point value
+     - 16-bit floating-point value (IEEE-754 binary16)
 
    * - ``bfloat``
      - 16-bit "brain" floating-point value (7-bit significand).  Provides the
@@ -3976,13 +3976,13 @@ Floating-Point Types
        extensions and Arm's ARMv8.6-A extensions, among others.
 
    * - ``float``
-     - 32-bit floating-point value
+     - 32-bit floating-point value (IEEE-754 binary32)
 
    * - ``double``
-     - 64-bit floating-point value
+     - 64-bit floating-point value (IEEE-754 binary64)
 
    * - ``fp128``
-     - 128-bit floating-point value (113-bit significand)
+     - 128-bit floating-point value (IEEE-754 binary128)
 
    * - ``x86_fp80``
      -  80-bit floating-point value (X87)
@@ -3990,10 +3990,6 @@ Floating-Point Types
    * - ``ppc_fp128``
      - 128-bit floating-point value (two 64-bits)
 
-The binary format of half, float, double, and fp128 correspond to the
-IEEE-754-2008 specifications for binary16, binary32, binary64, and binary128
-respectively.
-
 X86_amx Type
 """"""""""""
 

From d107aa0330c5438bca0e89a1d056d4880ffeed86 Mon Sep 17 00:00:00 2001
From: Ralf Jung <post at ralfj.de>
Date: Wed, 21 Aug 2024 10:16:00 +0200
Subject: [PATCH 3/3] spell out more clearly which part of IEEE-754 we are
 importing

---
 llvm/docs/LangRef.rst | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index de5247917d3b45..cfceec9f49addf 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3584,16 +3584,31 @@ LLVM floating-point types fall into two categories:
 - The remaining types, which do not directly correspond to a standard IEEE
   format.
 
-For types that do correspond to an IEEE format, LLVM IR float operations behave
-like the corresponding operations in IEEE-754, with two exceptions: LLVM makes
-:ref:`specific assumptions about the state of the floating-point environment
-<floatenv>` and it implements :ref:`different rules for operations that return
-NaN values <floatnan>`.
+For floating-point operations acting on types with a corresponding IEEE format,
+unless otherwise specified the value returned by that operation matches that of
+the corresponding IEEE-754 operation executed in the :ref:`default
+floating-point environment <floatenv>`, except that the behavior of NaN results
+is instead :ref:`as specified here <floatnan>`. (This statement concerns only
+the returned *value*; we make no statement about status flags or
+traps/exceptions.) In particular, a floating-point instruction returning a
+non-NaN value is guaranteed to always return the same bit-identical result on
+all machines and optimization levels.
+
+This means that optimizations and backends may not change the observed bitwise
+result of these operations in any way (unless NaNs are returned), and frontends
+can rely on these operations providing perfectly rounded results as described in
+the standard.
+
+Various flags and attributes can alter the behavior of these operations and thus
+make them not bit-identical across machines and optimization levels any more:
+most notably, the :ref:`fast-math flags <fastmath>` as well as the ``strictfp``
+and ``denormal-fp-math`` attributes. See their corresponding documentation for
+details.
 
-This means that optimizations and backends cannot change the precision of these
-operations (unless there are fast-math flags), and frontends can rely on these
-operations deterministically providing perfectly rounded results as described
-in the standard (except when a NaN is returned).
+If the compiled code is executed in a non-default floating-point environment
+(this includes non-standard behavior such as subnormal flushing), the result is
+typically undefined behavior unless attributes like ``strictfp`` and
+``denormal-fp-math`` or :ref:`constrained intrinsics <constrainedfp>` are used.
 
 .. _floatenv:
 
@@ -3633,9 +3648,9 @@ representation and never change anything except possibly for the sign bit.
 
 Floating-point math operations that return a NaN are an exception from the
 general principle that LLVM implements IEEE-754 semantics. Unless specified
-otherwise, the following rules apply when a NaN value is returned: the result
-has a non-deterministic sign; the quiet bit and payload are
-non-deterministically chosen from the following set of options:
+otherwise, the following rules apply whenever the IEEE-754 semantics say that a
+NaN value is returned: the result has a non-deterministic sign; the quiet bit
+and payload are non-deterministically chosen from the following set of options:
 
 - The quiet bit is set and the payload is all-zero. ("Preferred NaN" case)
 - The quiet bit is set and the payload is copied from any input operand that is