[llvm] [docs] Add guide for Undefined Behavior (PR #119220)

Antonio Frighetto via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 11 03:28:07 PST 2024


================
@@ -0,0 +1,393 @@
+======================================
+LLVM IR Undefined Behavior (UB) Manual
+======================================
+
+.. contents::
+   :local:
+   :depth: 2
+
+Abstract
+========
+This document describes the undefined behavior (UB) in LLVM's IR, including
+undef and poison values, as well as the ``freeze`` instruction.
+We also provide guidelines on when to use each form of UB.
+
+
+Introduction
+============
+Undefined behavior (UB) is used to specify the behavior of corner cases for
+which we don't wish to specify the concrete results. UB is also used to provide
+additional constraints to the optimizers (e.g., assumptions that the frontend
+guarantees through the language type system or the runtime).
+For example, we could specify the result of division by zero as zero, but
+since we are not really interested in the result, we say it is UB.
+
+There exist two forms of undefined behavior in LLVM: immediate UB and deferred
+UB. The latter comes in two flavors: undef and poison values.
+There is also a ``freeze`` instruction to tame the propagation of deferred UB.
+The lattice of values in LLVM is:
+immediate UB > poison > undef > freeze(poison) > concrete value.
+
+We explain each of the concepts in detail below.
+
+
+Immediate UB
+============
+Immediate UB is the most severe form of UB. It should be avoided whenever
+possible.
+Immediate UB should be used only for operations that trap on most CPUs supported
+by LLVM.
+Examples include division by zero and dereferencing a null pointer.
+
+The reason that immediate UB should be avoided is that it makes optimizations
+such as hoisting a lot harder.
+Consider the following example:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      br i1 %c, label %then, label %else
+
+    then:
+      %div = udiv i32 3, %v
+      br label %ret
+
+    else:
+      br label %ret
+
+    ret:
+      %r = phi i32 [ %div, %then ], [ 0, %else ]
+      ret i32 %r
+    }
+
+We might be tempted to simplify this function by removing the branching and
+executing the division speculatively because ``%c`` is true most of the time.
+We would obtain the following IR:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      %div = udiv i32 3, %v
+      %r = select i1 %c, i32 %div, i32 0
+      ret i32 %r
+    }
+
+However, this transformation is not correct! Since division triggers UB
+when the divisor is zero, we can only execute it speculatively if we are sure
+we don't hit that condition.
+For the function above, when called like ``f(false, 0)``, before the optimization
+it would return 0, and after the optimization it now triggers UB.
----------------
antoniofrighetto wrote:

Nit: readability (up to you).
```suggestion
The function above, when called as ``f(false, 0)``, would return 0 before the
optimization, and triggers UB after being optimized.
```
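
For context only, and not part of the patch: a minimal sketch of how the division in the example could be speculated without introducing UB, by substituting a harmless divisor on the path where the original code never divided. The name `@f_speculated` and the choice of `1` as the fallback divisor are illustrative assumptions, and whether such a rewrite is profitable is a separate question.

```llvm
define i32 @f_speculated(i1 %c, i32 %v) {
  ; Use the real divisor only when the original code would have divided;
  ; otherwise divide by 1, which can never trigger UB.
  %d = select i1 %c, i32 %v, i32 1
  %div = udiv i32 3, %d
  ; Same result selection as in the incorrect version.
  %r = select i1 %c, i32 %div, i32 0
  ret i32 %r
}
```

With this shape, `f_speculated(false, 0)` divides by 1 and returns 0, matching the original function, while a call with `%c` true behaves exactly as before, including UB when `%v` is zero.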

https://github.com/llvm/llvm-project/pull/119220

