[clang] [llvm] [IR] Allow non-constrained math intrinsics in strictfp functions (PR #188297)
Serge Pavlov via cfe-commits
cfe-commits at lists.llvm.org
Tue Mar 24 10:52:35 PDT 2026
https://github.com/spavloff created https://github.com/llvm/llvm-project/pull/188297
The current implementation of floating-point support uses two different representations for each floating-point operation, such as `llvm.trunc` and `llvm.experimental.constrained.trunc`. The main difference between them is the presence of side effects that describe interaction with the floating-point environment. Which of the two forms should be used is determined by the enclosing function's 'strictfp' attribute. The compiler does not check whether a regular intrinsic, such as `llvm.trunc`, is used in a strictfp function, so maintaining consistency is the user's responsibility. It is easy to mistakenly use the regular, side-effect-free intrinsic in a strictfp function, and even LLVM tests contain examples of this.
If the variant of an intrinsic is determined solely by the 'strictfp' function attribute, the distinction between the two forms appears to be redundant, and the regular form could be used in all cases. This would require the compiler to deduce side effects from the function attributes. In this scenario, floating-point operations would have "optional" side effects.
Currently, it is not possible to completely avoid constrained functions. In addition to representing side effects, they can also carry compiler hints, namely the expected rounding mode and exception behavior. However, using regular intrinsics in a strictfp function could be allowed if the mechanism of optional side effects were implemented for them. Such a change would make the current implementation of strictfp support more robust and would represent a step toward more powerful floating-point support.
This change implements minimal support for optional side effects in floating-point operations, sufficient to allow the use of non-constrained math function intrinsics in strictfp code. It does not alter the compiler's behavior for code that is correct under the current floating-point model; it affects only the case of non-constrained intrinsics used in a strictfp function, which is currently disallowed.
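As a rough illustration of the "optional side effects" idea described above, the following self-contained C++ sketch models how the effects of a call could be deduced from the callee and from the `strictfp` attribute of the enclosing function. It is not the code from this patch: `IntrinsicID`, `FP_OPS`, `Effects`, `EnclosingFunction`, `isFPOperation`, `callEffects` and `checkExamples` are hypothetical names, the table holds only three entries, and the X-macro merely imitates the style of a `.def` file.

```cpp
#include <cassert>

// Hypothetical, heavily reduced model; none of these names are LLVM APIs.
enum class IntrinsicID { Trunc, Rint, Sqrt, Fabs, CopySign };

// Illustrative X-macro table of "floating-point operations".
// Second argument: whether the operation depends on the rounding mode.
#define FP_OPS(X)                                                              \
  X(Trunc, false)                                                              \
  X(Rint, true)                                                                \
  X(Sqrt, true)

// Effects a call may have on the floating-point environment in this model.
enum class Effects { None, FPEnvReadWrite };

// Stand-in for the function that contains the call.
struct EnclosingFunction {
  bool StrictFP;
};

// Returns true if ID is listed in the illustrative table above.
bool isFPOperation(IntrinsicID ID) {
  switch (ID) {
#define X(NAME, DEPENDS_ON_ROUNDING) case IntrinsicID::NAME:
    FP_OPS(X)
#undef X
    return true;
  default:
    return false;
  }
}

// A floating-point operation acquires side effects only when the enclosing
// function is strictfp; every other call is modeled as side-effect-free.
Effects callEffects(IntrinsicID Callee, const EnclosingFunction &Caller) {
  if (isFPOperation(Callee) && Caller.StrictFP)
    return Effects::FPEnvReadWrite;
  return Effects::None;
}

// Small self-check of the model.
void checkExamples() {
  EnclosingFunction Strict{true};
  EnclosingFunction Default{false};
  assert(callEffects(IntrinsicID::Trunc, Strict) == Effects::FPEnvReadWrite);
  assert(callEffects(IntrinsicID::Trunc, Default) == Effects::None);
  // fabs is not classified as a floating-point operation, so it stays
  // side-effect-free even in a strictfp function.
  assert(callEffects(IntrinsicID::Fabs, Strict) == Effects::None);
}
```

In this model the same callee yields different effects depending only on the caller, which is the property the description calls "optional". In the patch itself, the corresponding query appears to be `CallBase::getFloatingPointMemoryEffects`, declared in `InstrTypes.h`.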
From 06c51b1a110d653a2474780d0dd70d162b09781f Mon Sep 17 00:00:00 2001
From: Serge Pavlov <sepavloff at gmail.com>
Date: Sat, 21 Mar 2026 14:40:51 +0700
Subject: [PATCH] [IR] Allow non-constrained math intrinsics in strictfp
functions
The current implementation of floating-point support uses two different
representations for each floating-point operation, such as `llvm.trunc`
and `llvm.experimental.constrained.trunc`. The main difference between
them is the presence of side effects that describe interaction with the
floating-point environment. Which of the two functions should be used is
determined by the enclosing function's attribute 'strictfp'. The
compiler does not check whether a regular function, like `llvm.trunc`,
is used in a strictfp function, so maintaining consistency is the user's
responsibility. It is easy to mistakenly use the regular,
side-effect-free intrinsic in a strictfp function, and even LLVM tests
contain examples of this.
If the variant of intrinsic is determined solely by the 'strictfp'
function attribute, the distinction between the two forms appears to be
redundant, and the regular form could be used in all cases. This would
require the compiler to deduce side effects from the function
attributes. In this scenario, floating-point operations would have
"optional" side effects.
Currently, it is not possible to completely avoid constrained functions.
In addition to representing side effects, they can also carry compiler
hints, namely the expected rounding mode and exception behavior.
However, using regular intrinsics in a strictfp function could be
allowed if the mechanism of optional side effects were implemented for
them. Such a change would make the current implementation of strictfp
support more robust and would represent a step toward more powerful
floating-point support.
This change implements minimal support for optional side effects in
floating-point operations sufficient to allow use of non-constrained
math function intrinsics in strictfp code. It does not alter the
compiler's behavior for the code that is correct under the current
floating-point model; it affects only the case of non-constrained
intrinsics used in a strictfp function, which is currently disallowed.
---
.../CodeGen/strictfp-elementwise-builtins.cpp | 8 +-
llvm/docs/LangRef.rst | 180 +++++++++++++++---
llvm/docs/ReleaseNotes.md | 3 +
llvm/include/llvm/IR/FloatingPointOps.def | 72 +++++++
llvm/include/llvm/IR/Function.h | 3 +
llvm/include/llvm/IR/IRBuilder.h | 17 +-
llvm/include/llvm/IR/InstrTypes.h | 3 +
llvm/include/llvm/IR/Intrinsics.h | 8 +
llvm/include/llvm/Support/ModRef.h | 5 +
llvm/lib/Analysis/BasicAliasAnalysis.cpp | 19 ++
llvm/lib/Analysis/GlobalsModRef.cpp | 7 +-
.../SelectionDAG/SelectionDAGBuilder.cpp | 176 +++++++----------
.../SelectionDAG/SelectionDAGBuilder.h | 1 +
llvm/lib/IR/Function.cpp | 4 +
llvm/lib/IR/IRBuilder.cpp | 32 ++++
llvm/lib/IR/Instructions.cpp | 19 ++
llvm/lib/IR/Intrinsics.cpp | 11 ++
.../AMDGPU/amdgpu-simplify-libcall-pow.ll | 23 +--
.../AMDGPU/amdgpu-simplify-libcall-pown.ll | 4 +-
.../AMDGPU/amdgpu-simplify-libcall-rootn.ll | 5 +-
20 files changed, 426 insertions(+), 174 deletions(-)
create mode 100644 llvm/include/llvm/IR/FloatingPointOps.def
diff --git a/clang/test/CodeGen/strictfp-elementwise-builtins.cpp b/clang/test/CodeGen/strictfp-elementwise-builtins.cpp
index 6453d50f044aa..696f3f65236fc 100644
--- a/clang/test/CodeGen/strictfp-elementwise-builtins.cpp
+++ b/clang/test/CodeGen/strictfp-elementwise-builtins.cpp
@@ -48,9 +48,9 @@ float4 strict_elementwise_min(float4 a, float4 b) {
}
// CHECK-LABEL: define dso_local noundef <4 x float> @_Z26strict_elementwise_maximumDv4_fS_
-// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR2]] {
+// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[ELT_MAXIMUM:%.*]] = tail call <4 x float> @llvm.maximum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR4]]
+// CHECK-NEXT: [[ELT_MAXIMUM:%.*]] = tail call <4 x float> @llvm.maximum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR5:[0-9]+]]
// CHECK-NEXT: ret <4 x float> [[ELT_MAXIMUM]]
//
float4 strict_elementwise_maximum(float4 a, float4 b) {
@@ -58,9 +58,9 @@ float4 strict_elementwise_maximum(float4 a, float4 b) {
}
// CHECK-LABEL: define dso_local noundef <4 x float> @_Z26strict_elementwise_minimumDv4_fS_
-// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR2]] {
+// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[ELT_MINIMUM:%.*]] = tail call <4 x float> @llvm.minimum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR4]]
+// CHECK-NEXT: [[ELT_MINIMUM:%.*]] = tail call <4 x float> @llvm.minimum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR5]]
// CHECK-NEXT: ret <4 x float> [[ELT_MINIMUM]]
//
float4 strict_elementwise_minimum(float4 a, float4 b) {
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 13883883d3981..33968d70e024f 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -4045,18 +4045,56 @@ seq\_cst total orderings of other operations that are not marked
Floating-Point Environment
--------------------------
-The default LLVM floating-point environment assumes that traps are disabled and
-status flags are not observable. Therefore, floating-point math operations do
-not have side effects and may be speculated freely. Results assume the
-round-to-nearest rounding mode, and subnormals are assumed to be preserved.
-
-Running LLVM code in an environment where these assumptions are not met
-typically leads to undefined behavior. The ``strictfp`` and
-:ref:`denormal_fpenv <denormal_fpenv>` attributes as well as
-:ref:`Constrained Floating-Point Intrinsics <constrainedfp>` can be
-used to weaken LLVM's assumptions and ensure defined behavior in
-non-default floating-point environments; see their respective
-documentation for details.
+The execution of an operation on floating-point values is often a more complex
+process than simply evaluating a function of its input arguments. First, it can
+depend on various parameters like rounding mode, denormal behavior, trap masks
+and so on. These are referred to as "control modes" and are stored in
+floating-point control registers. In addition, the operation may set status bits
+in a status register. The floating-point environment is the collection of
+registers that hold control modes and status bits.
+
+Interaction with the floating-point environment, including reading control
+modes, writing status bits and trapping, is regarded as side effects. Depending
+on how the side effects are treated, compilation occurs in one of two modes.
+
+In the ``strict mode``, all side effects produced by the floating-point
+operations are taken into account. Modifications to the floating-point
+environment are allowed only in this mode.
+
+In the ``unconstrained mode``, control modes are not modified and status bits
+are not observed. This allows floating-point operations to be considered free
+of side effects, which facilitates code optimizations. An important case of this
+mode is the ``default mode``, in which the control modes have default values:
+rounding mode is "round to nearest, ties to even", traps are disabled, and
+subnormals are assumed to be preserved.
+
+The compilation mode is defined for an entire function and is specified by the
+``strictfp`` attribute. If this attribute is set, compilation occurs in strict
+mode. The value of the floating-point environment is specified by function
+attributes (such as :ref:`denormal_fpenv <denormal_fpenv>`) and can be modified
+either by intrinsic functions like ``llvm.set_rounding`` or external functions
+like ``fesetround``.
+
+.. _floatop:
+
+Floating-point operations
+-------------------------
+
+Whether an operation interacts with the floating-point environment depends on
+the operation itself and the attributes of its containing function.
+Operations that can exhibit such
+interaction, and which may be ignored in the unconstrained mode, are referred to
+as ``floating-point operations``. These are computational operations that
+produce floating-point or integer results, round all results according to the
+value of the floating-point environment, and might signal floating-point
+exceptions.
+
+Some operations on floating-point values are not classified as floating-point
+operations. For instance, ``llvm.copysign``, ``llvm.fabs`` or
+``llvm.is_fpclass`` do not depend on control modes and cannot raise exceptions.
+Operations like ``llvm.set_rounding`` or ``llvm.set_fpenv`` interact with
+the floating-point environment, but they are not computational and their
+interaction cannot be ignored.
.. _floatnan:
@@ -16666,6 +16704,9 @@ matches a conforming libm implementation.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.powi.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -16714,6 +16755,9 @@ function may non-deterministically treat signaling NaNs as quiet NaNs. For
example, `powi(QNaN, 0)` returns `1.0`, and `powi(SNaN, 0)` may
non-deterministically return `1.0` or a NaN.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _t_llvm_sin:
'``llvm.sin.*``' Intrinsic
@@ -16753,6 +16797,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _t_llvm_cos:
'``llvm.cos.*``' Intrinsic
@@ -16792,6 +16839,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.tan.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -16829,6 +16879,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.asin.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -16866,6 +16919,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.acos.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -16903,6 +16959,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.atan.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -16940,6 +16999,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.atan2.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -16978,6 +17040,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.sinh.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -17015,6 +17080,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.cosh.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -17052,6 +17120,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.tanh.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -17089,6 +17160,8 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
'``llvm.sincos.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -17283,6 +17356,9 @@ function may non-deterministically treat signaling NaNs as quiet NaNs. For
example, `pow(QNaN, 0.0)` returns `1.0`, and `pow(SNaN, 0.0)` may
non-deterministically return `1.0` or a NaN.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_exp:
'``llvm.exp.*``' Intrinsic
@@ -17323,6 +17399,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_exp2:
'``llvm.exp2.*``' Intrinsic
@@ -17363,6 +17442,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_exp10:
'``llvm.exp10.*``' Intrinsic
@@ -17445,6 +17527,9 @@ value is returned. If the result underflows a zero with the same sign
is returned. If the result overflows, the result is an infinity with
the same sign.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_frexp:
'``llvm.frexp.*``' Intrinsic
@@ -17541,6 +17626,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_log10:
'``llvm.log10.*``' Intrinsic
@@ -17581,6 +17669,8 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
.. _int_log2:
@@ -17622,6 +17712,9 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_fma:
'``llvm.fma.*``' Intrinsic
@@ -17661,6 +17754,9 @@ is assumed to not trap or set ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_fabs:
'``llvm.fabs.*``' Intrinsic
@@ -17803,6 +17899,9 @@ which follow :ref:`LLVM's usual signaling NaN behavior <floatnan>` instead.
The ``llvm.minnum`` intrinsic can be refined into ``llvm.minimumnum``, as the
latter exhibits a subset of behaviors of the former.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. warning::
If the intrinsic is used without nsz, not all backends currently respect the
@@ -17869,6 +17968,9 @@ which follow :ref:`LLVM's usual signaling NaN behavior <floatnan>` instead.
The ``llvm.maxnum`` intrinsic can be refined into ``llvm.maximumnum``, as the
latter exhibits a subset of behaviors of the former.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. warning::
If the intrinsic is used without nsz, not all backends currently respect the
@@ -17924,6 +18026,9 @@ If the ``nsz`` flag is specified, ``llvm.maximum`` with one +0.0 and one
``nsz`` semantics, if both operands have the same sign, the result must also
have the same sign.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _i_maximum:
'``llvm.maximum.*``' Intrinsic
@@ -17972,6 +18077,9 @@ If the ``nsz`` flag is specified, ``llvm.maximum`` with one +0.0 and one
``nsz`` semantics, if both operands have the same sign, the result must also
have the same sign.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _i_minimumnum:
'``llvm.minimumnum.*``' Intrinsic
@@ -18157,6 +18265,9 @@ Semantics:
This function returns the same values as the libm ``floor`` functions
would, and handles error conditions in the same way.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_ceil:
'``llvm.ceil.*``' Intrinsic
@@ -18194,6 +18305,8 @@ Semantics:
This function returns the same values as the libm ``ceil`` functions
would, and handles error conditions in the same way.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
.. _int_llvm_trunc:
@@ -18233,6 +18346,9 @@ Semantics:
This function returns the same values as the libm ``trunc`` functions
would, and handles error conditions in the same way.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_rint:
'``llvm.rint.*``' Intrinsic
@@ -18270,11 +18386,12 @@ Semantics:
""""""""""
This function returns the same values as the libm ``rint`` functions
-would, and handles error conditions in the same way. Since LLVM assumes the
-:ref:`default floating-point environment <floatenv>`, the rounding mode is
-assumed to be set to "nearest", so halfway cases are rounded to the even
-integer. Use :ref:`Constrained Floating-Point Intrinsics <constrainedfp>`
-to avoid that assumption.
+would, and handles error conditions in the same way.
+
+In the :ref:`default floating-point environment <floatenv>`, the rounding mode is
+assumed to be "round to nearest, ties to even".
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
.. _int_nearbyint:
@@ -18312,11 +18429,12 @@ Semantics:
""""""""""
This function returns the same values as the libm ``nearbyint``
-functions would, and handles error conditions in the same way. Since LLVM
-assumes the :ref:`default floating-point environment <floatenv>`, the rounding
-mode is assumed to be set to "nearest", so halfway cases are rounded to the even
-integer. Use :ref:`Constrained Floating-Point Intrinsics <constrainedfp>` to
-avoid that assumption.
+functions would, and handles error conditions in the same way.
+
+In the :ref:`default floating-point environment <floatenv>`, the rounding mode is
+assumed to be "round to nearest, ties to even".
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
.. _int_round:
@@ -18356,6 +18474,9 @@ Semantics:
This function returns the same values as the libm ``round``
functions would, and handles error conditions in the same way.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_roundeven:
'``llvm.roundeven.*``' Intrinsic
@@ -18395,6 +18516,8 @@ This function implements IEEE 754 operation ``roundToIntegralTiesToEven``. It
also behaves in the same way as C standard function ``roundeven``, including
that it disregards rounding mode and does not raise floating point exceptions.
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
'``llvm.lround.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -18441,6 +18564,9 @@ would, but without setting errno. If the rounded value is too large to
be stored in the result type, the return value is a non-deterministic
value (equivalent to `freeze poison`).
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
'``llvm.llround.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -18478,6 +18604,9 @@ functions would, but without setting errno. If the rounded value is
too large to be stored in the result type, the return value is a
non-deterministic value (equivalent to `freeze poison`).
+As a :ref:`floating-point operation <floatop>`, this function has side effects
+in a ``strictfp`` function.
+
.. _int_lrint:
'``llvm.lrint.*``' Intrinsic
@@ -28045,13 +28174,6 @@ Constrained FP intrinsics are used to support non-default rounding modes and
accurately preserve exception behavior without compromising LLVM's ability to
optimize FP code when the default behavior is used.
-If any FP operation in a function is constrained then they all must be
-constrained. This is required for correct LLVM IR. Optimizations that
-move code around can create miscompiles if mixing of constrained and normal
-operations is done. The correct way to mix constrained and less constrained
-operations is to use the rounding mode and exception handling metadata to
-mark constrained intrinsics as having LLVM's default behavior.
-
Each of these intrinsics corresponds to a normal floating-point operation. The
data arguments and the return value are the same as the corresponding FP
operation.
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 249ddc27fb3ea..6068eef48bc90 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -63,6 +63,9 @@ Changes to the LLVM IR
intrinsics. These are equivalent to `fptrunc` and `fpext` with half
with a bitcast.
+* Calls to floating-point intrinsics implicitly acquire side effects if
+ the containing function has the `strictfp` attribute.
+
* "denormal-fp-math" and "denormal-fp-math-f32" string attributes were
migrated to first-class denormal_fpenv attribute.
diff --git a/llvm/include/llvm/IR/FloatingPointOps.def b/llvm/include/llvm/IR/FloatingPointOps.def
new file mode 100644
index 0000000000000..e533689bd995e
--- /dev/null
+++ b/llvm/include/llvm/IR/FloatingPointOps.def
@@ -0,0 +1,72 @@
+//===- llvm/IR/FloatingPointOps.def - FP intrinsics -------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Defines the intrinsics that are classified as floating-point operations.
+//
+//===----------------------------------------------------------------------===//
+
+// Describes a floating-point operation.
+// Arguments of the entries are:
+// N - intrinsic function name,
+// R - true if the operation depends on the rounding mode,
+// D - DAG node corresponding to the intrinsic.
+#ifndef FUNCTION
+#define FUNCTION(N,R,D)
+#endif
+
+// Describes a floating-point operation that is lowered to DAG nodes STRICT_*
+// in strictfp functions.
+#ifndef LEGACY_DAG
+#define LEGACY_DAG(N,R,D) FUNCTION(N,R,D)
+#endif
+
+// Describes a floating-point operation that is lowered to DAG nodes STRICT_*
+// in strictfp functions, and requires special expansion in the DAG builder.
+#ifndef LEGACY_EXP
+#define LEGACY_EXP(N,R,D) FUNCTION(N,R,D)
+#endif
+
+LEGACY_DAG(nearbyint, 1, FNEARBYINT)
+LEGACY_DAG(trunc, 0, FTRUNC)
+LEGACY_DAG(ceil, 0, FCEIL)
+LEGACY_DAG(floor, 0, FFLOOR)
+LEGACY_DAG(round, 0, FROUND)
+LEGACY_DAG(roundeven, 0, FROUNDEVEN)
+LEGACY_DAG(rint, 1, FRINT)
+LEGACY_DAG(lround, 0, LROUND)
+LEGACY_DAG(llround, 0, LLROUND)
+LEGACY_DAG(lrint, 1, LRINT)
+LEGACY_DAG(llrint, 1, LLRINT)
+LEGACY_DAG(minnum, 0, FMINNUM)
+LEGACY_DAG(maxnum, 0, FMAXNUM)
+LEGACY_DAG(minimum, 0, FMINIMUM)
+LEGACY_DAG(maximum, 0, FMAXIMUM)
+LEGACY_DAG(sqrt, 1, FSQRT)
+LEGACY_EXP(exp, 1, FEXP)
+LEGACY_EXP(exp2, 1, FEXP2)
+LEGACY_EXP(log, 1, FLOG)
+LEGACY_EXP(log2, 1, FLOG2)
+LEGACY_EXP(log10, 1, FLOG10)
+LEGACY_EXP(pow, 1, FPOW)
+LEGACY_EXP(powi, 1, FPOWI)
+LEGACY_DAG(sin, 1, FSIN)
+LEGACY_DAG(cos, 1, FCOS)
+LEGACY_DAG(tan, 1, FTAN)
+LEGACY_DAG(sinh, 1, FSINH)
+LEGACY_DAG(cosh, 1, FCOSH)
+LEGACY_DAG(tanh, 1, FTANH)
+LEGACY_DAG(asin, 1, FASIN)
+LEGACY_DAG(acos, 1, FACOS)
+LEGACY_DAG(atan, 1, FATAN)
+LEGACY_DAG(atan2, 1, FATAN2)
+LEGACY_DAG(ldexp, 1, FLDEXP)
+LEGACY_DAG(fma, 1, FMA)
+
+#undef FUNCTION
+#undef LEGACY_DAG
+#undef LEGACY_EXP
diff --git a/llvm/include/llvm/IR/Function.h b/llvm/include/llvm/IR/Function.h
index f39fe509a49a4..5303aa60be219 100644
--- a/llvm/include/llvm/IR/Function.h
+++ b/llvm/include/llvm/IR/Function.h
@@ -260,6 +260,9 @@ class LLVM_ABI Function : public GlobalObject, public ilist_node<Function> {
/// getIntrinsicID() returns Intrinsic::not_intrinsic.
bool isConstrainedFPIntrinsic() const;
+ /// Returns true if the function is a floating-point operation.
+ bool isFPOperation() const;
+
/// Update internal caches that depend on the function name (such as the
/// intrinsic ID and libcall cache).
/// Note, this method does not need to be called directly, as it is called
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index 4ed3d73c4a057..eea50075f8a99 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -2510,24 +2510,13 @@ class IRBuilderBase {
CallInst *CreateCall(FunctionType *FTy, Value *Callee,
ArrayRef<Value *> Args = {}, const Twine &Name = "",
MDNode *FPMathTag = nullptr) {
- CallInst *CI = CallInst::Create(FTy, Callee, Args, DefaultOperandBundles);
- if (IsFPConstrained)
- setConstrainedFPCallAttr(CI);
- if (isa<FPMathOperator>(CI))
- setFPAttrs(CI, FPMathTag, FMF);
- return Insert(CI, Name);
+ return CreateCall(FTy, Callee, Args, DefaultOperandBundles, Name,
+ FPMathTag);
}
CallInst *CreateCall(FunctionType *FTy, Value *Callee, ArrayRef<Value *> Args,
ArrayRef<OperandBundleDef> OpBundles,
- const Twine &Name = "", MDNode *FPMathTag = nullptr) {
- CallInst *CI = CallInst::Create(FTy, Callee, Args, OpBundles);
- if (IsFPConstrained)
- setConstrainedFPCallAttr(CI);
- if (isa<FPMathOperator>(CI))
- setFPAttrs(CI, FPMathTag, FMF);
- return Insert(CI, Name);
- }
+ const Twine &Name = "", MDNode *FPMathTag = nullptr);
CallInst *CreateCall(FunctionCallee Callee, ArrayRef<Value *> Args = {},
const Twine &Name = "", MDNode *FPMathTag = nullptr) {
diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h
index 61dc5ebef1b1d..938b82ba891ce 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -1195,6 +1195,9 @@ class CallBase : public Instruction {
return nullptr;
}
+ /// Get memory effects specific to floating-point operations.
+ std::optional<MemoryEffects> getFloatingPointMemoryEffects() const;
+
static bool classof(const Instruction *I) {
return I->getOpcode() == Instruction::Call ||
I->getOpcode() == Instruction::Invoke ||
diff --git a/llvm/include/llvm/IR/Intrinsics.h b/llvm/include/llvm/IR/Intrinsics.h
index 5aecec9fd5925..cfd53604b9ebb 100644
--- a/llvm/include/llvm/IR/Intrinsics.h
+++ b/llvm/include/llvm/IR/Intrinsics.h
@@ -149,6 +149,14 @@ namespace Intrinsic {
/// Floating-Point Intrinsics" that take rounding mode metadata.
LLVM_ABI bool hasConstrainedFPRoundingModeOperand(ID QID);
+ /// Returns true if \p IID represents an intrinsic function that may access
+ /// the FP environment.
+ ///
+ /// Access to the FP environment means that in a strictfp function the
+ /// intrinsic has a read/write memory effect, which is used to maintain
+ /// proper instruction ordering.
+ LLVM_ABI bool isFPOperation(ID IID);
+
/// This is a type descriptor which explains the type requirements of an
/// intrinsic. This is returned by getIntrinsicInfoTableEntries.
struct IITDescriptor {
diff --git a/llvm/include/llvm/Support/ModRef.h b/llvm/include/llvm/Support/ModRef.h
index 83091c617f629..cba572bbf05ec 100644
--- a/llvm/include/llvm/Support/ModRef.h
+++ b/llvm/include/llvm/Support/ModRef.h
@@ -275,6 +275,11 @@ template <typename LocationEnum> class MemoryEffectsBase {
return ME.getWithoutLoc(Location::InaccessibleMem).doesNotAccessMemory();
}
+ /// Whether this function accesses inaccessible memory.
+ bool doesAccessInaccessibleMem() const {
+ return isModOrRefSet(getModRef(Location::InaccessibleMem));
+ }
+
/// Whether location is target memory location.
bool isTargetMemLoc(IRMemLocation Loc) const {
for (auto L : targetMemLocations())
diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index 80b14646c8889..b4dc865bcd0f1 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -837,6 +837,15 @@ MemoryEffects BasicAAResult::getMemoryEffects(const CallBase *Call,
if (const Function *F = dyn_cast<Function>(Call->getCalledOperand())) {
MemoryEffects FuncME = AAQI.AAR.getMemoryEffects(F);
+
+ // Floating-point operations have memory effects that describe interaction
+ // with the floating-point environment. These memory effects depend on
+ // the attributes of the containing function.
+ if (auto FPME = Call->getFloatingPointMemoryEffects())
+ FuncME =
+ FuncME.getWithModRef(IRMemLocation::InaccessibleMem,
+ FPME->getModRef(IRMemLocation::InaccessibleMem));
+
// Operand bundles on the call may also read or write memory, in addition
// to the behavior of the called function.
if (Call->hasReadingOperandBundles())
@@ -863,6 +872,16 @@ MemoryEffects BasicAAResult::getMemoryEffects(const Function *F) {
// inaccessible memory to model control dependence.
return MemoryEffects::readOnly() |
MemoryEffects::inaccessibleMemOnly(ModRefInfo::ModRef);
+#define FUNCTION(NAME, R, D) case Intrinsic::NAME:
+#include "llvm/IR/FloatingPointOps.def"
+ // Floating-point operations may or may not have side effects due to
+ // interaction with the floating-point environment. Which case applies
+ // depends on the corresponding call site bundles and on the attributes of
+ // the containing function. Here we conservatively assume that the side
+ // effects exist in a strictfp function.
+ if (F->getAttributes().hasFnAttr(llvm::Attribute::StrictFP))
+ return MemoryEffects::inaccessibleMemOnly(ModRefInfo::ModRef);
+ return MemoryEffects::none();
}
return F->getMemoryEffects();
diff --git a/llvm/lib/Analysis/GlobalsModRef.cpp b/llvm/lib/Analysis/GlobalsModRef.cpp
index 295e267848b23..0f639126c7e0a 100644
--- a/llvm/lib/Analysis/GlobalsModRef.cpp
+++ b/llvm/lib/Analysis/GlobalsModRef.cpp
@@ -537,7 +537,12 @@ void GlobalsAAResult::AnalyzeCallGraph(CallGraph &CG, Module &M) {
if (F->isDeclaration() || F->hasOptNone()) {
// Try to get mod/ref behaviour from function attributes.
- if (F->doesNotAccessMemory()) {
+ if (F->isFPOperation()) {
+ // Floating-point operations have mod/ref behaviour that depends on
+ // the call site properties and containing function attributes.
+ // Conservatively assume RW access to the floating-point environment.
+ FI.addModRefInfo(ModRefInfo::ModRef);
+ } else if (F->doesNotAccessMemory()) {
// Can't do better than that!
} else if (F->onlyReadsMemory()) {
FI.addModRefInfo(ModRefInfo::Ref);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 04b17b56b3d49..05c869462c839 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -6927,111 +6927,24 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
setValue(&I,
expandExp2(sdl, getValue(I.getArgOperand(0)), DAG, TLI, Flags));
return;
+ case Intrinsic::exp10:
+ setValue(&I, DAG.getNode(ISD::FEXP10, sdl,
+ getValue(I.getArgOperand(0)).getValueType(),
+ getValue(I.getArgOperand(0)), Flags));
+ return;
case Intrinsic::pow:
setValue(&I, expandPow(sdl, getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)), DAG, TLI, Flags));
return;
- case Intrinsic::sqrt:
case Intrinsic::fabs:
- case Intrinsic::sin:
- case Intrinsic::cos:
- case Intrinsic::tan:
- case Intrinsic::asin:
- case Intrinsic::acos:
- case Intrinsic::atan:
- case Intrinsic::sinh:
- case Intrinsic::cosh:
- case Intrinsic::tanh:
- case Intrinsic::exp10:
- case Intrinsic::floor:
- case Intrinsic::ceil:
- case Intrinsic::trunc:
- case Intrinsic::rint:
- case Intrinsic::nearbyint:
- case Intrinsic::round:
- case Intrinsic::roundeven:
- case Intrinsic::canonicalize: {
- unsigned Opcode;
- // clang-format off
- switch (Intrinsic) {
- default: llvm_unreachable("Impossible intrinsic"); // Can't reach here.
- case Intrinsic::sqrt: Opcode = ISD::FSQRT; break;
- case Intrinsic::fabs: Opcode = ISD::FABS; break;
- case Intrinsic::sin: Opcode = ISD::FSIN; break;
- case Intrinsic::cos: Opcode = ISD::FCOS; break;
- case Intrinsic::tan: Opcode = ISD::FTAN; break;
- case Intrinsic::asin: Opcode = ISD::FASIN; break;
- case Intrinsic::acos: Opcode = ISD::FACOS; break;
- case Intrinsic::atan: Opcode = ISD::FATAN; break;
- case Intrinsic::sinh: Opcode = ISD::FSINH; break;
- case Intrinsic::cosh: Opcode = ISD::FCOSH; break;
- case Intrinsic::tanh: Opcode = ISD::FTANH; break;
- case Intrinsic::exp10: Opcode = ISD::FEXP10; break;
- case Intrinsic::floor: Opcode = ISD::FFLOOR; break;
- case Intrinsic::ceil: Opcode = ISD::FCEIL; break;
- case Intrinsic::trunc: Opcode = ISD::FTRUNC; break;
- case Intrinsic::rint: Opcode = ISD::FRINT; break;
- case Intrinsic::nearbyint: Opcode = ISD::FNEARBYINT; break;
- case Intrinsic::round: Opcode = ISD::FROUND; break;
- case Intrinsic::roundeven: Opcode = ISD::FROUNDEVEN; break;
- case Intrinsic::canonicalize: Opcode = ISD::FCANONICALIZE; break;
- }
- // clang-format on
-
- setValue(&I, DAG.getNode(Opcode, sdl,
+ setValue(&I, DAG.getNode(ISD::FABS, sdl,
getValue(I.getArgOperand(0)).getValueType(),
getValue(I.getArgOperand(0)), Flags));
return;
- }
- case Intrinsic::atan2:
- setValue(&I, DAG.getNode(ISD::FATAN2, sdl,
- getValue(I.getArgOperand(0)).getValueType(),
- getValue(I.getArgOperand(0)),
- getValue(I.getArgOperand(1)), Flags));
- return;
- case Intrinsic::lround:
- case Intrinsic::llround:
- case Intrinsic::lrint:
- case Intrinsic::llrint: {
- unsigned Opcode;
- // clang-format off
- switch (Intrinsic) {
- default: llvm_unreachable("Impossible intrinsic"); // Can't reach here.
- case Intrinsic::lround: Opcode = ISD::LROUND; break;
- case Intrinsic::llround: Opcode = ISD::LLROUND; break;
- case Intrinsic::lrint: Opcode = ISD::LRINT; break;
- case Intrinsic::llrint: Opcode = ISD::LLRINT; break;
- }
- // clang-format on
-
- EVT RetVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
- setValue(&I, DAG.getNode(Opcode, sdl, RetVT,
- getValue(I.getArgOperand(0))));
- return;
- }
- case Intrinsic::minnum:
- setValue(&I, DAG.getNode(ISD::FMINNUM, sdl,
- getValue(I.getArgOperand(0)).getValueType(),
- getValue(I.getArgOperand(0)),
- getValue(I.getArgOperand(1)), Flags));
- return;
- case Intrinsic::maxnum:
- setValue(&I, DAG.getNode(ISD::FMAXNUM, sdl,
- getValue(I.getArgOperand(0)).getValueType(),
- getValue(I.getArgOperand(0)),
- getValue(I.getArgOperand(1)), Flags));
- return;
- case Intrinsic::minimum:
- setValue(&I, DAG.getNode(ISD::FMINIMUM, sdl,
+ case Intrinsic::canonicalize:
+ setValue(&I, DAG.getNode(ISD::FCANONICALIZE, sdl,
getValue(I.getArgOperand(0)).getValueType(),
- getValue(I.getArgOperand(0)),
- getValue(I.getArgOperand(1)), Flags));
- return;
- case Intrinsic::maximum:
- setValue(&I, DAG.getNode(ISD::FMAXIMUM, sdl,
- getValue(I.getArgOperand(0)).getValueType(),
- getValue(I.getArgOperand(0)),
- getValue(I.getArgOperand(1)), Flags));
+ getValue(I.getArgOperand(0)), Flags));
return;
case Intrinsic::minimumnum:
setValue(&I, DAG.getNode(ISD::FMINIMUMNUM, sdl,
@@ -7051,12 +6964,6 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)), Flags));
return;
- case Intrinsic::ldexp:
- setValue(&I, DAG.getNode(ISD::FLDEXP, sdl,
- getValue(I.getArgOperand(0)).getValueType(),
- getValue(I.getArgOperand(0)),
- getValue(I.getArgOperand(1)), Flags));
- return;
case Intrinsic::modf:
case Intrinsic::sincos:
case Intrinsic::sincospi:
@@ -7091,12 +6998,6 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
getValue(I.getArgOperand(0)), Flags));
return;
}
- case Intrinsic::fma:
- setValue(&I, DAG.getNode(
- ISD::FMA, sdl, getValue(I.getArgOperand(0)).getValueType(),
- getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)),
- getValue(I.getArgOperand(2)), Flags));
- return;
#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC) \
case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"
@@ -7106,6 +7007,12 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
#include "llvm/IR/VPIntrinsics.def"
visitVectorPredicationIntrinsic(cast<VPIntrinsic>(I));
return;
+#define LEGACY_EXP(NAME, R, DAGN)
+#define FUNCTION(NAME, R, DAGN) \
+ case Intrinsic::NAME: \
+ visitFPOperation(I, ISD::DAGN); \
+ return;
+#include "llvm/IR/FloatingPointOps.def"
case Intrinsic::fptrunc_round: {
// Get the last argument, the metadata and convert it to an integer in the
// call
@@ -9654,6 +9561,59 @@ bool SelectionDAGBuilder::visitBinaryFloatCall(const CallInst &I,
return true;
}
+bool SelectionDAGBuilder::visitFPOperation(const CallInst &I, unsigned Opcode) {
+ MemoryEffects ME = I.getMemoryEffects();
+
+ SmallVector<SDValue, 4> Operands;
+ bool HasChain = ME.doesAccessInaccessibleMem();
+ if (HasChain)
+ Operands.push_back(getRoot());
+ for (auto &Arg : I.args())
+ Operands.push_back(getValue(Arg));
+
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType(), true);
+ SDVTList NodeVT;
+ if (HasChain)
+ NodeVT = DAG.getVTList(VT, MVT::Other);
+ else
+ NodeVT = DAG.getVTList(VT);
+
+ SDNodeFlags Flags;
+ if (auto *FPOp = dyn_cast<FPMathOperator>(&I))
+ Flags.copyFMF(*FPOp);
+ fp::ExceptionBehavior EB;
+ if (DAG.getMachineFunction().getFunction().getAttributes().hasFnAttr(
+ llvm::Attribute::StrictFP)) {
+ EB = fp::ebStrict;
+ Flags.setNoFPExcept(true);
+ } else {
+ EB = fp::ebIgnore;
+ }
+
+ // Temporary solution: use STRICT_* nodes.
+ if (HasChain)
+ switch (Opcode) {
+ default:
+ break;
+#define LEGACY_DAG(NAME, RM, DAGN) \
+ case ISD::DAGN: \
+ Opcode = ISD::STRICT_##DAGN; \
+ break;
+#include "llvm/IR/FloatingPointOps.def"
+ }
+
+ SDLoc sdl = getCurSDLoc();
+ SDValue Result = DAG.getNode(Opcode, sdl, NodeVT, Operands, Flags);
+ if (HasChain)
+ pushFPOpOutChain(Result, EB);
+
+ SDValue FPResult = Result.getValue(0);
+ setValue(&I, FPResult);
+ return true;
+}
+
void SelectionDAGBuilder::visitCall(const CallInst &I) {
// Handle inline assembly differently.
if (I.isInlineAsm()) {
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index bab0509dd138f..e68f4c471fc88 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -638,6 +638,7 @@ class SelectionDAGBuilder {
bool visitStrstrCall(const CallInst &I);
bool visitUnaryFloatCall(const CallInst &I, unsigned Opcode);
bool visitBinaryFloatCall(const CallInst &I, unsigned Opcode);
+ bool visitFPOperation(const CallInst &I, unsigned Opcode);
void visitAtomicLoad(const LoadInst &I);
void visitAtomicStore(const StoreInst &I);
void visitLoadFromSwiftError(const LoadInst &I);
diff --git a/llvm/lib/IR/Function.cpp b/llvm/lib/IR/Function.cpp
index a6568bb50f0c8..bdddc070e37d1 100644
--- a/llvm/lib/IR/Function.cpp
+++ b/llvm/lib/IR/Function.cpp
@@ -555,6 +555,10 @@ bool Function::isConstrainedFPIntrinsic() const {
return Intrinsic::isConstrainedFPIntrinsic(getIntrinsicID());
}
+bool Function::isFPOperation() const {
+ return Intrinsic::isFPOperation(getIntrinsicID());
+}
+
void Function::clearArguments() {
for (Argument &A : makeArgArray(Arguments, NumArgs)) {
A.setName("");
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index 4c6f2326fe149..0fca1251cbf4d 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -186,6 +186,38 @@ IRBuilderBase::createCallHelper(Function *Callee, ArrayRef<Value *> Ops,
return CI;
}
+CallInst *IRBuilderBase::CreateCall(FunctionType *FTy, Value *Callee,
+ ArrayRef<Value *> Args,
+ ArrayRef<OperandBundleDef> OpBundles,
+ const Twine &Name, MDNode *FPMathTag) {
+ bool NeedUpdateMemoryEffects = false;
+ if (const auto *Func = dyn_cast<Function>(Callee))
+ if (Intrinsic::ID ID = Func->getIntrinsicID())
+ if (Intrinsic::isFPOperation(ID)) {
+ if (IsFPConstrained) {
+ // Because FP exception bits may be set, in modes other than the
+ // default the memory effects must include read/write access to
+ // the FP environment (FPE).
+ MemoryEffects FME = Func->getMemoryEffects();
+ NeedUpdateMemoryEffects = !FME.doesAccessInaccessibleMem();
+ }
+ }
+
+ // If the call accesses FPE, update memory effects accordingly.
+ CallInst *CI = CallInst::Create(FTy, Callee, Args, OpBundles);
+ if (NeedUpdateMemoryEffects) {
+ MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();
+ auto A = Attribute::getWithMemoryEffects(getContext(), ME);
+ CI->addFnAttr(A);
+ }
+
+ if (IsFPConstrained)
+ setConstrainedFPCallAttr(CI);
+ if (isa<FPMathOperator>(CI))
+ setFPAttrs(CI, FPMathTag, FMF);
+ return Insert(CI, Name);
+}
+
static Value *CreateVScaleMultiple(IRBuilderBase &B, Type *Ty, uint64_t Scale) {
Value *VScale = B.CreateVScale(Ty);
if (Scale == 1)
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 80776857fa3d9..07c0c49449edd 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -629,10 +629,28 @@ bool CallBase::hasClobberingOperandBundles() const {
getIntrinsicID() != Intrinsic::assume;
}
+std::optional<MemoryEffects> CallBase::getFloatingPointMemoryEffects() const {
+ if (Intrinsic::ID IntrID = getIntrinsicID())
+ if (const BasicBlock *BB = getParent())
+ if (const Function *F = BB->getParent())
+ if (Intrinsic::isFPOperation(IntrID)) {
+ if (F->hasFnAttribute(Attribute::StrictFP))
+ // Floating-point operations in a strictfp function always have side
+ // effects, at least because they can raise exceptions.
+ return MemoryEffects::inaccessibleMemOnly();
+ return MemoryEffects::none();
+ }
+ return std::nullopt;
+}
+
MemoryEffects CallBase::getMemoryEffects() const {
MemoryEffects ME = getAttributes().getMemoryEffects();
if (auto *Fn = dyn_cast<Function>(getCalledOperand())) {
MemoryEffects FnME = Fn->getMemoryEffects();
+ if (auto FPME = getFloatingPointMemoryEffects())
+ FnME =
+ FnME.getWithModRef(IRMemLocation::InaccessibleMem,
+ FPME->getModRef(IRMemLocation::InaccessibleMem));
if (hasOperandBundles()) {
// TODO: Add a method to get memory effects for operand bundles instead.
if (hasReadingOperandBundles())
@@ -648,6 +666,7 @@ MemoryEffects CallBase::getMemoryEffects() const {
}
return ME;
}
+
void CallBase::setMemoryEffects(MemoryEffects ME) {
addFnAttr(Attribute::getWithMemoryEffects(getContext(), ME));
}
diff --git a/llvm/lib/IR/Intrinsics.cpp b/llvm/lib/IR/Intrinsics.cpp
index f2c6921bbb7e0..04f3df8cab88a 100644
--- a/llvm/lib/IR/Intrinsics.cpp
+++ b/llvm/lib/IR/Intrinsics.cpp
@@ -837,6 +837,17 @@ bool Intrinsic::hasConstrainedFPRoundingModeOperand(Intrinsic::ID QID) {
}
}
+bool Intrinsic::isFPOperation(ID IID) {
+ switch (IID) {
+#define FUNCTION(NAME, ROUND_MODE, DAGN) case Intrinsic::NAME:
+#include "llvm/IR/FloatingPointOps.def"
+#undef FUNCTION
+ return true;
+ default:
+ return false;
+ }
+}
+
using DeferredIntrinsicMatchPair =
std::pair<Type *, ArrayRef<Intrinsic::IITDescriptor>>;
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
index b2d5bb2faeca7..77630846f49db 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
@@ -1208,19 +1208,19 @@ define float @test_pow_afn_f32_strictfp(float %x, float %y) #2 {
; NOPRELINK-NEXT: [[TMP3:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP2]], float 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
; NOPRELINK-NEXT: [[TMP4:%.*]] = select nnan nsz afn i1 [[TMP3]], float 1.000000e+00, float [[X]]
; NOPRELINK-NEXT: [[TMP5:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP4]]) #[[ATTR3]]
-; NOPRELINK-NEXT: [[TMP6:%.*]] = call nnan nsz afn float @llvm.log2.f32(float [[TMP5]]) #[[ATTR3]]
+; NOPRELINK-NEXT: [[TMP6:%.*]] = call nnan nsz afn float @llvm.log2.f32(float [[TMP5]]) #[[ATTR5:[0-9]+]]
; NOPRELINK-NEXT: [[TMP7:%.*]] = call nnan nsz afn float @llvm.experimental.constrained.fmul.f32(float [[TMP2]], float [[TMP6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT: [[TMP8:%.*]] = call nnan nsz afn float @llvm.exp2.f32(float [[TMP7]]) #[[ATTR3]]
-; NOPRELINK-NEXT: [[TMP9:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR3]]
+; NOPRELINK-NEXT: [[TMP8:%.*]] = call nnan nsz afn float @llvm.exp2.f32(float [[TMP7]]) #[[ATTR5]]
+; NOPRELINK-NEXT: [[TMP9:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR5]]
; NOPRELINK-NEXT: [[TMP10:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP9]], float [[TMP2]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
; NOPRELINK-NEXT: [[TMP11:%.*]] = call nnan nsz afn float @llvm.experimental.constrained.fmul.f32(float [[TMP2]], float 5.000000e-01, metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT: [[TMP12:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP11]]) #[[ATTR3]]
+; NOPRELINK-NEXT: [[TMP12:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP11]]) #[[ATTR5]]
; NOPRELINK-NEXT: [[TMP13:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP12]], float [[TMP11]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
; NOPRELINK-NEXT: [[TMP14:%.*]] = xor i1 [[TMP13]], true
; NOPRELINK-NEXT: [[TMP15:%.*]] = and i1 [[TMP10]], [[TMP14]]
; NOPRELINK-NEXT: [[TMP16:%.*]] = select nnan nsz afn i1 [[TMP15]], float [[TMP4]], float 1.000000e+00
; NOPRELINK-NEXT: [[TMP17:%.*]] = call nnan nsz afn float @llvm.copysign.f32(float [[TMP8]], float [[TMP16]]) #[[ATTR3]]
-; NOPRELINK-NEXT: [[TMP18:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR3]]
+; NOPRELINK-NEXT: [[TMP18:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR5]]
; NOPRELINK-NEXT: [[TMP19:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP18]], float [[TMP2]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
; NOPRELINK-NEXT: [[TMP20:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP4]], float 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
; NOPRELINK-NEXT: [[TMP21:%.*]] = xor i1 [[TMP19]], true
@@ -1256,15 +1256,10 @@ define float @test_pow_afn_f32_strictfp(float %x, float %y) #2 {
}
define float @test_pow_fast_f32_nobuiltin(float %x, float %y) {
-; PRELINK-LABEL: define float @test_pow_fast_f32_nobuiltin
-; PRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) {
-; PRELINK-NEXT: [[POW:%.*]] = tail call fast float @_Z3powff(float [[X]], float [[Y]]) #[[ATTR6:[0-9]+]]
-; PRELINK-NEXT: ret float [[POW]]
-;
-; NOPRELINK-LABEL: define float @test_pow_fast_f32_nobuiltin
-; NOPRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) {
-; NOPRELINK-NEXT: [[POW:%.*]] = tail call fast float @_Z3powff(float [[X]], float [[Y]]) #[[ATTR5:[0-9]+]]
-; NOPRELINK-NEXT: ret float [[POW]]
+; CHECK-LABEL: define float @test_pow_fast_f32_nobuiltin
+; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) {
+; CHECK-NEXT: [[POW:%.*]] = tail call fast float @_Z3powff(float [[X]], float [[Y]]) #[[ATTR6:[0-9]+]]
+; CHECK-NEXT: ret float [[POW]]
;
%pow = tail call fast float @_Z3powff(float %x, float %y) #3
ret float %pow
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll
index b4182ccbf77a4..935ebc6dc41bf 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll
@@ -883,10 +883,10 @@ define float @test_pown_fast_f32_strictfp(float %x, i32 %y) #1 {
; CHECK-SAME: (float [[X:%.*]], i32 [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[__FABS:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR0]]
-; CHECK-NEXT: [[__LOG2:%.*]] = call fast float @llvm.log2.f32(float [[__FABS]]) #[[ATTR0]]
+; CHECK-NEXT: [[__LOG2:%.*]] = call fast float @llvm.log2.f32(float [[__FABS]]) #[[ATTR5:[0-9]+]]
; CHECK-NEXT: [[POWNI2F:%.*]] = call fast float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
; CHECK-NEXT: [[__YLOGX:%.*]] = call fast float @llvm.experimental.constrained.fmul.f32(float [[POWNI2F]], float [[__LOG2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT: [[__EXP2:%.*]] = call fast nofpclass(nan ninf nzero nsub nnorm) float @llvm.exp2.f32(float [[__YLOGX]]) #[[ATTR0]]
+; CHECK-NEXT: [[__EXP2:%.*]] = call fast nofpclass(nan ninf nzero nsub nnorm) float @llvm.exp2.f32(float [[__YLOGX]]) #[[ATTR5]]
; CHECK-NEXT: [[__YEVEN:%.*]] = shl i32 [[Y]], 31
; CHECK-NEXT: [[TMP0:%.*]] = bitcast float [[X]] to i32
; CHECK-NEXT: [[__POW_SIGN:%.*]] = and i32 [[__YEVEN]], [[TMP0]]
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll
index 337ccb4a2d0e9..bb697e29ae303 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll
@@ -1282,9 +1282,9 @@ define float @test_rootn_fast_f32_strictfp(float %x, i32 %y) #1 {
; NOPRELINK-NEXT: [[TMP0:%.*]] = call fast float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
; NOPRELINK-NEXT: [[TMP1:%.*]] = call fast float @llvm.experimental.constrained.fdiv.f32(float 1.000000e+00, float [[TMP0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
; NOPRELINK-NEXT: [[TMP2:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR0]]
-; NOPRELINK-NEXT: [[TMP3:%.*]] = call fast float @llvm.log2.f32(float [[TMP2]]) #[[ATTR0]]
+; NOPRELINK-NEXT: [[TMP3:%.*]] = call fast float @llvm.log2.f32(float [[TMP2]]) #[[ATTR5:[0-9]+]]
; NOPRELINK-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.constrained.fmul.f32(float [[TMP1]], float [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; NOPRELINK-NEXT: [[TMP5:%.*]] = call fast float @llvm.exp2.f32(float [[TMP4]]) #[[ATTR0]]
+; NOPRELINK-NEXT: [[TMP5:%.*]] = call fast float @llvm.exp2.f32(float [[TMP4]]) #[[ATTR5]]
; NOPRELINK-NEXT: [[TMP6:%.*]] = and i32 [[Y]], 1
; NOPRELINK-NEXT: [[DOTNOT:%.*]] = icmp eq i32 [[TMP6]], 0
; NOPRELINK-NEXT: [[TMP7:%.*]] = select fast i1 [[DOTNOT]], float 1.000000e+00, float [[X]]
@@ -1965,6 +1965,7 @@ attributes #2 = { noinline }
; NOPRELINK: attributes #[[ATTR2:[0-9]+]] = { nocallback nofree nosync nounwind strictfp willreturn memory(inaccessiblemem: readwrite) }
; NOPRELINK: attributes #[[ATTR3]] = { noinline }
; NOPRELINK: attributes #[[ATTR4]] = { nobuiltin }
+; NOPRELINK: attributes #[[ATTR5]] = { strictfp memory(inaccessiblemem: readwrite) }
;.
; PRELINK: [[META0:![0-9]+]] = !{i32 1, !"amdgpu-libcall-have-fast-pow", i32 1}
; PRELINK: [[META1]] = !{float 2.000000e+00}
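
As a reading aid for the patch above, here is a minimal standalone sketch of the rule that `CallBase::getFloatingPointMemoryEffects` and `CallBase::getMemoryEffects` appear to implement: a call to an FP operation gets read/write access to inaccessible memory when the containing function is strictfp, and no such access otherwise. It is a toy model under stated assumptions, not LLVM code: `Access`, `Effects`, `fpEffects` and `callSiteEffects` are hypothetical names, and operand bundles and call-site attributes are left out.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::ModRefInfo.
enum class Access { None, Read, Write, ReadWrite };

// Hypothetical stand-in for llvm::MemoryEffects, reduced to two locations.
struct Effects {
  Access InaccessibleMem = Access::None;
  Access OtherMem = Access::None;

  bool accessesInaccessibleMem() const {
    return InaccessibleMem != Access::None;
  }
};

// Models getFloatingPointMemoryEffects(): no answer for calls that are not
// FP operations; otherwise the effect depends only on whether the
// containing function is strictfp.
std::optional<Effects> fpEffects(bool IsFPOperation, bool CallerIsStrictFP) {
  if (!IsFPOperation)
    return std::nullopt;
  Effects E;
  if (CallerIsStrictFP)
    E.InaccessibleMem = Access::ReadWrite; // may read/write the FP environment
  return E;
}

// Models the merge step: the FP-derived value replaces the callee's declared
// inaccessible-memory access and leaves the other locations untouched.
Effects callSiteEffects(Effects Declared, bool IsFPOperation,
                        bool CallerIsStrictFP) {
  if (auto FP = fpEffects(IsFPOperation, CallerIsStrictFP))
    Declared.InaccessibleMem = FP->InaccessibleMem;
  return Declared;
}

// Usage: the same intrinsic call has a side effect only in a strictfp caller.
inline void demo() {
  assert(callSiteEffects(Effects{}, /*IsFPOperation=*/true,
                         /*CallerIsStrictFP=*/true)
             .accessesInaccessibleMem());
  assert(!callSiteEffects(Effects{}, /*IsFPOperation=*/true,
                          /*CallerIsStrictFP=*/false)
              .accessesInaccessibleMem());
}
```

In the patch, `visitFPOperation` uses the same predicate (`doesAccessInaccessibleMem`) to decide whether the DAG node is built with a chain operand.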