[llvm] [docs] Add guide for Undefined Behavior (PR #119220)

Nikita Popov via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 9 07:31:09 PST 2024


================
@@ -0,0 +1,368 @@
+======================================
+LLVM IR Undefined Behavior (UB) Manual
+======================================
+
+.. contents::
+   :local:
+   :depth: 2
+
+Abstract
+========
+This document describes the undefined behavior (UB) in LLVM's IR, including
+undef and poison values, as well as the ``freeze`` instruction.
+We also provide guidelines on when to use each form of UB.
+
+
+Introduction
+============
+Undefined behavior is used to specify the behavior of corner cases for which we
+don't wish to specify the concrete results.
+For example, we could specify the result of division by zero as zero, but
+since we are not really interested in the result, we say it is UB.
+
+There are two forms of UB in LLVM: immediate UB and deferred UB (undef and
+poison values).
+The lattice of values in LLVM is:
+immediate UB > poison > undef > freeze > concrete value.
+
+
+Immediate UB
+============
+Immediate UB is the most severe form of UB. It should be avoided whenever
+possible.
+Immediate UB should be used only for operations that trap in most CPUs supported
+by LLVM.
+Examples include division by zero, dereferencing a null pointer, etc.
+
+The reason that immediate UB should be avoided is that it makes optimizations
+such as hoisting a lot harder.
+Consider the following example:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      br i1 %c, label %then, label %else
+
+    then:
+      %div = udiv i32 3, %v
+      br label %ret
+
+    else:
+      br label %ret
+
+    ret:
+      %r = phi i32 [ %div, %then ], [ 0, %else ]
+      ret i32 %r
+    }
+
+We might be tempted to simplify this function by removing the branching and
+executing the division speculatively because ``%c`` is true most of times.
+We would obtain the following IR:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      %div = udiv i32 3, %v
+      %r = select i1 %c, i32 %div, i32 0
+      ret i32 %r
+    }
+
+However, this transformation is not correct! Since division triggers UB
+when the divisor is zero, we can only execute speculatively if we are sure we
+don't hit that condition.
+For the function above, when called like ``f(false, 0)``, before the optimization
+it would return 0, and after the optimization it now triggers UB.
+
+This example highlights why we minimize the cases that trigger immediate UB
+as much as possible.
+As a rule of thumb, use immediate UB only for the cases that trap the CPU for
+most of the supported architectures.
+
+
+Deferred UB
+===========
+Deferred UB is a lighter form of UB. It enables instructions to be executed
+speculatively while marking some corner cases having erroneous values.
+Deferred UB should be used for cases where the semantics offered by common
+CPUs differs,but the CPU does not trap.
----------------
nikic wrote:

```suggestion
CPUs differ, but the CPU does not trap.
```

https://github.com/llvm/llvm-project/pull/119220


More information about the llvm-commits mailing list