[llvm] [docs] Add guide for Undefined Behavior (PR #119220)

Nuno Lopes via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 11 04:19:24 PST 2024


https://github.com/nunoplopes updated https://github.com/llvm/llvm-project/pull/119220

>From 496573e329b901f894e0862c66e6bd4551dcd938 Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Mon, 9 Dec 2024 15:11:12 +0000
Subject: [PATCH 01/12] [docs] Add guide for Undefined Behavior

---
 llvm/docs/Reference.rst         |   3 +
 llvm/docs/UndefinedBehavior.rst | 368 ++++++++++++++++++++++++++++++++
 2 files changed, 371 insertions(+)
 create mode 100644 llvm/docs/UndefinedBehavior.rst

diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index df61628b06c7db..e149b2b767c0db 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -120,6 +120,9 @@ LLVM IR
   Defines the LLVM intermediate representation and the assembly form of the
   different nodes.
 
+:doc:`Undefined Behavior (UB) <UndefinedBehavior>`
+  A guide on what UB/undef/poison are and when to use each one.
+
 :doc:`InAlloca`
   Description of the ``inalloca`` argument attribute.
 
diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
new file mode 100644
index 00000000000000..d0fcd1eec4f191
--- /dev/null
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -0,0 +1,368 @@
+======================================
+LLVM IR Undefined Behavior (UB) Manual
+======================================
+
+.. contents::
+   :local:
+   :depth: 2
+
+Abstract
+========
+This document describes the undefined behavior (UB) in LLVM's IR, including
+undef and poison values, as well as the ``freeze`` instruction.
+We also provide guidelines on when to use each form of UB.
+
+
+Introduction
+============
+Undefined behavior is used to specify the behavior of corner cases for which we
+don't wish to specify the concrete results.
+For example, we could specify the result of division by zero as zero, but
+since we are not really interested in the result, we say it is UB.
+
+There are two forms of UB in LLVM: immediate UB and deferred UB (undef and
+poison values).
+The lattice of values in LLVM is:
+immediate UB > poison > undef > freeze > concrete value.
+
+
+Immediate UB
+============
+Immediate UB is the most severe form of UB. It should be avoided whenever
+possible.
+Immediate UB should be used only for operations that trap in most CPUs supported
+by LLVM.
+Examples include division by zero, dereferencing a null pointer, etc.
+
+The reason that immediate UB should be avoided is that it makes optimizations
+such as hoisting a lot harder.
+Consider the following example:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      br i1 %c, label %then, label %else
+
+    then:
+      %div = udiv i32 3, %v
+      br label %ret
+
+    else:
+      br label %ret
+
+    ret:
+      %r = phi i32 [ %div, %then ], [ 0, %else ]
+      ret i32 %r
+    }
+
+We might be tempted to simplify this function by removing the branching and
+executing the division speculatively because ``%c`` is true most of the time.
+We would obtain the following IR:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      %div = udiv i32 3, %v
+      %r = select i1 %c, i32 %div, i32 0
+      ret i32 %r
+    }
+
+However, this transformation is not correct! Since division triggers UB
+when the divisor is zero, we can only execute speculatively if we are sure we
+don't hit that condition.
+For the function above, when called like ``f(false, 0)``, before the optimization
+it would return 0, and after the optimization it now triggers UB.
+
+This example highlights why we minimize the cases that trigger immediate UB
+as much as possible.
+As a rule of thumb, use immediate UB only for the cases that trap the CPU for
+most of the supported architectures.
+
+
+Deferred UB
+===========
+Deferred UB is a lighter form of UB. It enables instructions to be executed
+speculatively while marking some corner cases having erroneous values.
+Deferred UB should be used for cases where the semantics offered by common
+CPUs differs,but the CPU does not trap.
+
+As an example, consider the shift instructions. The x86 and ARM architectures
+offer different semantics when the shift amount is equal to or greater than
+the bitwidth.
+We could solve this tension in one of two ways: 1) pick one of the x86/ARM
+semantics for LLVM, which would make the code emitted for the other architecture
+slower; 2) define that case as yielding ``poison``.
+LLVM chose the latter option. Frontends for languages like C or C++
+(e.g., clang) can map shifts in the source program directly to a shift in
+LLVM IR, since the semantics of C and C++ define such shifts as UB.
+Frontends for languages that offer strong semantics must guard the shift
+themselves, for example by masking the shift amount:
+
+.. code-block:: llvm
+
+    define i32 @x86_shift(i32 %a, i32 %b) {
+      %mask = and i32 %b, 31
+      %shift = shl i32 %a, %mask
+      ret i32 %shift
+    }
+
+
+There are two deferred UB values in LLVM: ``undef`` and ``poison``, which we
+describe next.
+
+
+Undef Values
+------------
+.. warning::
+   Undef values are deprecated and should be used only when strictly necessary.
+   No new uses should be added unless justified.
+
+An undef value represents any value of a given type. Moreover, each use of
+an instruction that depends on undef can observe a different value.
+For example:
+
+.. code-block:: llvm
+
+    define i32 @fn() {
+      %add = add i32 undef, 0
+      %ret = add i32 %add, %add
+      ret i32 %ret
+    }
+
+Unsurprisingly, the first addition yields ``undef``.
+However, the result of the second addition is more subtle. We might be tempted
+to think that it yields an even number. But it might not be!
+Since each (transitive) use of ``undef`` can observe a different value,
+the second addition is equivalent to ``add i32 undef, undef``, which is
+equivalent to ``undef``.
+Hence, the function above is equivalent to:
+
+.. code-block:: llvm
+
+    define i32 @fn() {
+      ret i32 undef
+    }
+
+Each call to this function may observe a different value, namely any 32-bit
+number (even and odd).
+
+Because each use of undef can observe a different value, some optimizations
+are wrong if we are not sure a value is not undef.
+Consider a function that multiplies a number by 2:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %v) {
+      %mul2 = mul i32 %v, 2
+      ret i32 %mul2
+    }
+
+This function is guaranteed to return an even number, even if ``%v`` is
+undef.
+However, as we've seen above, the following function does not:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %v) {
+      %mul2 = add i32 %v, %v
+      ret i32 %mul2
+    }
+
+This optimization is wrong just because undef values exist, even if they are
+not used in this part of the program as LLVM has no way to tell if ``%v`` is
+undef or not.
+
+.. note::
+   Uses of undef values should be restricted to representing loads of
+   uninitialized memory. This is the only part of the IR semantics that cannot
+   be replaced with alternatives yet (work is ongoing).
+
+Looking at the value lattice, ``undef`` values can only be replaced with either
+a ``freeze`` instruction or a concrete value.
+A consequence is that giving undef as an operand to an instruction that triggers
+UB for some values of that operand makes the program UB. For example,
+``udiv %x, undef`` is UB since undef can be replaced with 0, yielding
+``udiv %x, 0``, which is clearly UB.
+
+
+Poison Values
+-------------
+Poison values are a stronger from of deferred UB than undef. They still
+allow instructions to be executed speculatively, but they taint the whole
+expression DAG (with some exceptions), akin to floating point NaN values.
+
+Example:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %a, i32 %b, i32 %c) {
+      %add = add nsw i32 %a, %b
+      %ret = add nsw i32 %add, %c
+      ret i32 %ret
+    }
+
+The ``nsw`` attribute in the additions indicates that the operation yields
+poison if there is a signed overflow.
+If the first addition overflows, ``%add`` is poison and thus ``%ret`` is also
+poison since it taints the whole expression DAG.
+
+Poison values can be replaced with any value of the type (undef, concrete
+values, or a ``freeze`` instruction).
+
+
+The Freeze Instruction
+======================
+Both undef and poison values sometimes propagate too much down an expression
+DAG: undef values because each transitive use can observe a different value,
+and poison values because they make the whole DAG poison.
+There are some cases where it is important to stop such propagation.
+This is where the ``freeze`` instruction comes in.
+
+Take the following example function:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %n, i1 %c) {
+    entry:
+      br label %loop
+
+    loop:
+      %i = phi i32 [ 0, %entry ], [ %i2, %loop.end ]
+      %cond = icmp ult i32 %i, %n
+      br i1 %cond, label %loop.cont, label %exit
+
+    loop.cont:
+      br i1 %c, label %then, label %else
+
+    then:
+      ...
+      br label %loop.end
+
+    else:
+      ...
+      br label %loop.end
+
+    loop.end:
+      %i2 = add i32 %i, 1
+      br label %loop
+
+    exit:
+      ...
+    }
+
+Imagine we want to perform loop unswitching on the loop above since the branch
+condition inside the loop is loop invariant.
+We would obtain the following IR:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %n, i1 %c) {
+    entry:
+      br i1 %c, label %then, label %else
+
+    then:
+      %i = phi i32 [ 0, %entry ], [ %i2, %then.cont ]
+      %cond = icmp ult i32 %i, %n
+      br i1 %cond, label %then.cont, label %exit
+
+    then.cont:
+      ...
+      %i2 = add i32 %i, 1
+      br label %then
+
+    else:
+      %i3 = phi i32 [ 0, %entry ], [ %i4, %else.cont ]
+      %cond2 = icmp ult i32 %i3, %n
+      br i1 %cond2, label %else.cont, label %exit
+
+    else.cont:
+      ...
+      %i4 = add i32 %i3, 1
+      br label %else
+
+    exit:
+      ...
+    }
+
+There is a subtle catch: when the function is called with ``%n`` being zero,
+the original function did not branch on ``%c``, while the optimized one does.
+Branching on a deferred UB value is immediate UB, hence the transformation is
+wrong in general because ``%c`` may be undef or poison.
+
+Cases like this need a way to tame deferred UB values. This is exactly what the
+``freeze`` instruction is for!
+When given a concrete value as argument, ``freeze`` is a no-op, returning the
+argument as-is. When given an undef or poison value, ``freeze`` returns a
+non-deterministic value of the type.
+This is not the same as undef: the value returned by ``freeze`` is the same
+for all users.
+
+Branching on a value returned by ``freeze`` is always safe since it either
+evaluates to true or false consistently.
+We can make the loop unswitching optimization above correct as follows:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %n, i1 %c) {
+    entry:
+      %c2 = freeze i1 %c
+      br i1 %c2, label %then, label %else
+
+
+Writing Tests
+=============
+
+Avoiding UB
+-----------
+When writing tests, it is important to ensure that they don't trigger UB
+unnecessarily. Some automated test reducers sometimes use undef or poison
+values as dummy values, but this is considered a bad practice if this leads
+to triggering UB.
+
+For example, imagine that we want to write a test and we don't care about the
+particular divisor value because our optimization kicks in regardless:
+
+.. code-block:: llvm
+
+    define i32 @fn(i8 %a) {
+      %div = udiv i8 %a, poison
+      ...
+    }
+
+The issue with this test is that it triggers immediate UB. This prevents
+verification tools like Alive from validating the correctness of the
+optimization. Hence, it is considered a bad practice to have tests with
+unnecessary immediate UB (unless that is exactly what the test is for).
+The test above should use a dummy function argument instead of using poison:
+
+.. code-block:: llvm
+
+    define i32 @fn(i8 %a, i8 %dummy) {
+      %div = udiv i8 %a, %dummy
+      ...
+    }
+
+Common sources of immediate UB in tests include branching on undef/poison
+conditions and dereferencing undef/poison/null pointers.
+
+.. note::
+   If you need a placeholder value to pass as an argument to an instruction
+   that may trigger UB, add a new argument to the function rather than using
+   undef or poison.
+
+
+Reducing bitwidth
+-----------------
+To speed up automated verification of tests (e.g., using Alive), it is
+recommended that tests use low bitwidth formats and small vector sizes.
+For example, if we write a test to check that a multiplication by two is
+replaced by a shift left, we can do so using 8-bit integers instead of the
+usual 32-bit integers:
+
+.. code-block:: llvm
+
+    define i8 @fn(i8 %val) {
+      ; CHECK: %mul2 = shl %val, 1
+      %mul2 = mul i8 %val, 2
+      ret i8 %mul2
+   }

>From 62fcd7b0875eb5d087c196b7c8a8884b1bdae211 Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Mon, 9 Dec 2024 15:26:37 +0000
Subject: [PATCH 02/12] fix build

---
 llvm/docs/Reference.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index e149b2b767c0db..2cae9186d7f9bd 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -52,6 +52,7 @@ LLVM and API reference documentation.
    TestingGuide
    TransformMetadata
    TypeMetadata
+   UndefinedBehavior
    XRay
    XRayExample
    XRayFDRFormat

>From 2f87e4a8b9be2e83f0761414898c9b1c47568cad Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Mon, 9 Dec 2024 15:36:19 +0000
Subject: [PATCH 03/12] Update llvm/docs/UndefinedBehavior.rst

Co-authored-by: Nikita Popov <github at npopov.com>
---
 llvm/docs/UndefinedBehavior.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index d0fcd1eec4f191..2c55a021681021 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -84,7 +84,7 @@ Deferred UB
 Deferred UB is a lighter form of UB. It enables instructions to be executed
 speculatively while marking some corner cases having erroneous values.
 Deferred UB should be used for cases where the semantics offered by common
-CPUs differs,but the CPU does not trap.
+CPUs differ, but the CPU does not trap.
 
 As an example, consider the shift instructions. The x86 and ARM architectures
 offer different semantics when the shift amount is equal to or greater than

>From aa2e6e749683cbb365b195befe12f4ee605c9ffc Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Mon, 9 Dec 2024 15:36:30 +0000
Subject: [PATCH 04/12] Update llvm/docs/UndefinedBehavior.rst

Co-authored-by: Nikita Popov <github at npopov.com>
---
 llvm/docs/UndefinedBehavior.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index 2c55a021681021..1bbbcc2668d210 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -82,7 +82,7 @@ most of the supported architectures.
 Deferred UB
 ===========
 Deferred UB is a lighter form of UB. It enables instructions to be executed
-speculatively while marking some corner cases having erroneous values.
+speculatively while marking some corner cases as having erroneous values.
 Deferred UB should be used for cases where the semantics offered by common
 CPUs differ, but the CPU does not trap.
 

>From be9516d8e50cb9363f1deec17cf4b62e81467cd8 Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Mon, 9 Dec 2024 15:41:11 +0000
Subject: [PATCH 05/12] address review comments

---
 llvm/docs/UndefinedBehavior.rst | 27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index 1bbbcc2668d210..b4551b09d82f98 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -16,7 +16,9 @@ We also provide guidelines on when to use each form of UB.
 Introduction
 ============
 Undefined behavior is used to specify the behavior of corner cases for which we
-don't wish to specify the concrete results.
+don't wish to specify the concrete results. UB is also used to provide
+additional constraints to the optimizers (e.g., assumptions that the frontend
+guarantees through the language type system or the runtime).
 For example, we could specify the result of division by zero as zero, but
 since we are not really interested in the result, we say it is UB.
 
@@ -309,11 +311,9 @@ We can make the loop unswitching optimization above correct as follows:
       br i1 %c2, label %then, label %else
 
 
-Writing Tests
-=============
+Writing Tests that Avoid UB
+===========================
 
-Avoiding UB
------------
 When writing tests, it is important to ensure that they don't trigger UB
 unnecessarily. Some automated test reducers sometimes use undef or poison
 values as dummy values, but this is considered a bad practice if this leads
@@ -349,20 +349,3 @@ conditions and dereferencing undef/poison/null pointers.
    If you need a placeholder value to pass as an argument to an instruction
    that may trigger UB, add a new argument to the function rather than using
    undef or poison.
-
-
-Reducing bitwidth
------------------
-To speed up automated verification of tests (e.g., using Alive), it is
-recommended that tests use low bitwidth formats and small vector sizes.
-For example, if we write a test to check that a multiplication by two is
-replaced by a shift left, we can do so using 8-bit integers instead of the
-usual 32-bit integers:
-
-.. code-block:: llvm
-
-    define i8 @fn(i8 %val) {
-      ; CHECK: %mul2 = shl %val, 1
-      %mul2 = mul i8 %val, 2
-      ret i8 %mul2
-   }

>From 8c3dfa2ffa42714e809e1ccc992b8678be55bae2 Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Tue, 10 Dec 2024 09:32:20 +0000
Subject: [PATCH 06/12] Update llvm/docs/UndefinedBehavior.rst

Co-authored-by: Antonio Frighetto <me at antoniofrighetto.com>
---
 llvm/docs/UndefinedBehavior.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index b4551b09d82f98..406cc0803a525a 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -189,7 +189,7 @@ becoming obvious that it is UB.
 
 Poison Values
 -------------
-Poison values are a stronger from of deferred UB than undef. They still
+Poison values are a stronger form of deferred UB than undef. They still
 allow instructions to be executed speculatively, but they taint the whole
 expression DAG (with some exceptions), akin to floating point NaN values.
 

>From 7a542cbdf3d11bb258094f897069d4388220fd9d Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Tue, 10 Dec 2024 09:33:42 +0000
Subject: [PATCH 07/12] Update llvm/docs/UndefinedBehavior.rst

Co-authored-by: Antonio Frighetto <me at antoniofrighetto.com>
---
 llvm/docs/UndefinedBehavior.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index 406cc0803a525a..c9776d0632a0f8 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -22,8 +22,7 @@ guarantees through the language type system or the runtime).
 For example, we could specify the result of division by zero as zero, but
 since we are not really interested in the result, we say it is UB.
 
-There are two forms of UB in LLVM: immediate UB and deferred UB (undef and
-poison values).
+There exist two forms of undefined behaviour in LLVM: immediate UB and deferred UB. The latter comes in two flavours: undef and poison values.
 The lattice of values in LLVM is:
 immediate UB > poison > undef > freeze > concrete value.
 

>From 1cb2027f022d1c083197ffca818d4c970895434e Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Tue, 10 Dec 2024 09:34:08 +0000
Subject: [PATCH 08/12] Update UndefinedBehavior.rst

---
 llvm/docs/UndefinedBehavior.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index c9776d0632a0f8..2a3146a05c13fd 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -22,7 +22,8 @@ guarantees through the language type system or the runtime).
 For example, we could specify the result of division by zero as zero, but
 since we are not really interested in the result, we say it is UB.
 
-There exist two forms of undefined behaviour in LLVM: immediate UB and deferred UB. The latter comes in two flavours: undef and poison values.
+There exist two forms of undefined behaviour in LLVM: immediate UB and deferred UB.
+The latter comes in two flavours: undef and poison values.
 The lattice of values in LLVM is:
 immediate UB > poison > undef > freeze > concrete value.
 

>From 0a65952e930ecba19c5b1a440a6f450bc5f114ee Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Tue, 10 Dec 2024 10:02:49 +0000
Subject: [PATCH 09/12] add summary + time travel

---
 llvm/docs/UndefinedBehavior.rst | 56 ++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index 2a3146a05c13fd..0c34bb09d22ba2 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -15,17 +15,20 @@ We also provide guidelines on when to use each form of UB.
 
 Introduction
 ============
-Undefined behavior is used to specify the behavior of corner cases for which we
-don't wish to specify the concrete results. UB is also used to provide
+Undefined behavior (UB) is used to specify the behavior of corner cases for
+which we don't wish to specify the concrete results. UB is also used to provide
 additional constraints to the optimizers (e.g., assumptions that the frontend
 guarantees through the language type system or the runtime).
 For example, we could specify the result of division by zero as zero, but
 since we are not really interested in the result, we say it is UB.
 
-There exist two forms of undefined behaviour in LLVM: immediate UB and deferred UB.
-The latter comes in two flavours: undef and poison values.
+There exist two forms of undefined behavior in LLVM: immediate UB and deferred
+UB. The latter comes in two flavors: undef and poison values.
+There is also a ``freeze`` instruction to tame the propagation of deferred UB.
 The lattice of values in LLVM is:
-immediate UB > poison > undef > freeze > concrete value.
+immediate UB > poison > undef > freeze(poison) > concrete value.
+
+We explain each of the concepts in detail below.
 
 
 Immediate UB
@@ -81,6 +84,31 @@ As a rule of thumb, use immediate UB only for the cases that trap the CPU for
 most of the supported architectures.
 
 
+Time Travel
+-----------
+Immediate UB in LLVM IR allows so-called time travel. What this means
+is that if a program triggers UB, then we are not required to preserve any of
+its observable behavior, including I/O.
+For example, the following function triggers UB after calling ``printf``:
+
+.. code-block:: llvm
+
+    define void @fn() {
+      call void @printf(...) willreturn
+      unreachable
+    }
+
+Since we know that ``printf`` will always return, and because LLVM's UB can
+time-travel, it is legal to remove the call to ``printf`` altogether and
+optimize the function to simply:
+
+.. code-block:: llvm
+
+    define void @fn() {
+      unreachable
+    }
+
+
 Deferred UB
 ===========
 Deferred UB is a lighter form of UB. It enables instructions to be executed
@@ -311,8 +339,8 @@ We can make the loop unswitching optimization above correct as follows:
       br i1 %c2, label %then, label %else
 
 
-Writing Tests that Avoid UB
-===========================
+Writing Tests Without Undefined Behavior
+========================================
 
 When writing tests, it is important to ensure that they don't trigger UB
 unnecessarily. Some automated test reducers sometimes use undef or poison
@@ -349,3 +377,17 @@ conditions and dereferencing undef/poison/null pointers.
    If you need a placeholder value to pass as an argument to an instruction
    that may trigger UB, add a new argument to the function rather than using
    undef or poison.
+
+
+Summary
+=======
+Undefined behavior (UB) in LLVM IR consists of two well-defined concepts:
+immediate and deferred UB (undef and poison values).
+Passing deferred UB values to certain operations leads to immediate UB.
+This can be avoided in some cases through the use of the ``freeze``
+instruction.
+
+The lattice of values in LLVM is:
+immediate UB > poison > undef > freeze(poison) > concrete value.
+It is only valid to transform values from the left to the right (e.g., a poison
+value can be replaced with a concrete value, but not the other way around).

>From 9e6c3fef50cc7954c4acc0ee90c9031aeab4feee Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Tue, 10 Dec 2024 15:21:29 +0000
Subject: [PATCH 10/12] merge note

---
 llvm/docs/UndefinedBehavior.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index 0c34bb09d22ba2..6075de6506afc0 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -145,7 +145,9 @@ Undef Values
 ------------
 .. warning::
    Undef values are deprecated and should be used only when strictly necessary.
-   No new uses should be added unless justified.
+   Uses of undef values should be restricted to representing loads of
+   uninitialized memory. This is the only part of the IR semantics that cannot
+   be replaced with alternatives yet (work is ongoing).
 
 An undef value represents any value of a given type. Moreover, each use of
 an instruction that depends on undef can observe a different value.
@@ -202,11 +204,6 @@ This optimization is wrong just because undef values exist, even if they are
 not used in this part of the program as LLVM has no way to tell if ``%v`` is
 undef or not.
 
-.. note::
-   Uses of undef values should be restricted to representing loads of
-   uninitialized memory. This is the only part of the IR semantics that cannot
-   be replaced with alternatives yet (work is ongoing).
-
 Looking at the value lattice, ``undef`` values can only be replaced with either
 a ``freeze`` instruction or a concrete value.
 A consequence is that giving undef as an operand to an instruction that triggers
@@ -391,3 +388,6 @@ The lattice of values in LLVM is:
 immediate UB > poison > undef > freeze(poison) > concrete value.
 It is only valid to transform values from the left to the right (e.g., a poison
 value can be replaced with a concrete value, but not the other way around).
+
+Undef is now deprecated and should be used only to represent loads of
+uninitialized memory.

>From aade2a11f5825040c00d42372d36f5a01beb3090 Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Wed, 11 Dec 2024 12:18:10 +0000
Subject: [PATCH 11/12] Update llvm/docs/UndefinedBehavior.rst

Co-authored-by: Antonio Frighetto <me at antoniofrighetto.com>
---
 llvm/docs/UndefinedBehavior.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index 6075de6506afc0..ebd4cf65eadb10 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -75,8 +75,8 @@ We would obtain the following IR:
 However, this transformation is not correct! Since division triggers UB
 when the divisor is zero, we can only execute speculatively if we are sure we
 don't hit that condition.
-For the function above, when called like ``f(false, 0)``, before the optimization
-it would return 0, and after the optimization it now triggers UB.
+The function above, when called as ``f(false, 0)``, would return 0 before the
+optimization, and triggers UB after being optimizing.
 
 This example highlights why we minimize the cases that trigger immediate UB
 as much as possible.

>From 78ec2f4a478c096a6a8f50245f36cffeb271b029 Mon Sep 17 00:00:00 2001
From: Nuno Lopes <nuno.lopes at tecnico.ulisboa.pt>
Date: Wed, 11 Dec 2024 12:19:08 +0000
Subject: [PATCH 12/12] Update UndefinedBehavior.rst

---
 llvm/docs/UndefinedBehavior.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
index ebd4cf65eadb10..f68bbbd505330a 100644
--- a/llvm/docs/UndefinedBehavior.rst
+++ b/llvm/docs/UndefinedBehavior.rst
@@ -76,7 +76,7 @@ However, this transformation is not correct! Since division triggers UB
 when the divisor is zero, we can only execute speculatively if we are sure we
 don't hit that condition.
 The function above, when called as ``f(false, 0)``, would return 0 before the
-optimization, and triggers UB after being optimizing.
+optimization, and triggers UB after being optimized.
 
 This example highlights why we minimize the cases that trigger immediate UB
 as much as possible.


