[llvm] [docs] Add guide for Undefined Behavior (PR #119220)
Antonio Frighetto via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 11 03:28:07 PST 2024
================
@@ -0,0 +1,393 @@
+======================================
+LLVM IR Undefined Behavior (UB) Manual
+======================================
+
+.. contents::
+ :local:
+ :depth: 2
+
+Abstract
+========
+This document describes the undefined behavior (UB) in LLVM's IR, including
+undef and poison values, as well as the ``freeze`` instruction.
+We also provide guidelines on when to use each form of UB.
+
+
+Introduction
+============
+Undefined behavior (UB) is used to specify the behavior of corner cases for
+which we don't wish to specify the concrete results. UB is also used to provide
+additional constraints to the optimizers (e.g., assumptions that the frontend
+guarantees through the language type system or the runtime).
+For example, we could specify the result of division by zero as zero, but
+since we are not really interested in the result, we say it is UB.
+
+There exist two forms of undefined behavior in LLVM: immediate UB and deferred
+UB. The latter comes in two flavors: undef and poison values.
+There is also a ``freeze`` instruction to tame the propagation of deferred UB.
+The lattice of values in LLVM is:
+immediate UB > poison > undef > freeze(poison) > concrete value.
+
+We explain each of the concepts in detail below.
+
+
+Immediate UB
+============
+Immediate UB is the most severe form of UB. It should be avoided whenever
+possible.
+Immediate UB should be used only for operations that trap on most CPUs supported
+by LLVM.
+Examples include division by zero, dereferencing a null pointer, etc.
+
+The reason that immediate UB should be avoided is that it makes optimizations
+such as hoisting a lot harder.
+Consider the following example:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      br i1 %c, label %then, label %else
+
+    then:
+      %div = udiv i32 3, %v
+      br label %ret
+
+    else:
+      br label %ret
+
+    ret:
+      %r = phi i32 [ %div, %then ], [ 0, %else ]
+      ret i32 %r
+    }
+
+We might be tempted to simplify this function by removing the branching and
+executing the division speculatively because ``%c`` is true most of the time.
+We would obtain the following IR:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      %div = udiv i32 3, %v
+      %r = select i1 %c, i32 %div, i32 0
+      ret i32 %r
+    }
+
+However, this transformation is not correct! Since division triggers UB
+when the divisor is zero, we can only execute speculatively if we are sure we
+don't hit that condition.
+For the function above, when called like ``f(false, 0)``, before the optimization
+it would return 0, and after the optimization it now triggers UB.
----------------
antoniofrighetto wrote:
Nit: readability (up to you).
```suggestion
The function above, when called as ``f(false, 0)``, would return 0 before the
optimization, and triggers UB after being optimized.
```
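
To make the ``f(false, 0)`` case concrete, here is a minimal C++17 sketch, not part of the patch, that models the two versions of the function. All names (`Result`, `model_udiv`, `f_before`, `f_after`) are hypothetical, and representing immediate UB as `std::nullopt` is an assumption made purely for illustration; real UB has no observable "result" that a program could inspect.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model: std::nullopt stands in for "immediate UB".
using Result = std::optional<std::uint32_t>;

// Models `udiv i32 a, b`: a zero divisor is immediate UB.
Result model_udiv(std::uint32_t a, std::uint32_t b) {
    if (b == 0) return std::nullopt;
    return a / b;
}

// Original function: the division executes only on the %then path.
Result f_before(bool c, std::uint32_t v) {
    if (c) return model_udiv(3, v);
    return 0u;
}

// After the incorrect hoist: the division always executes, so its UB
// affects the whole call even when the select would have picked 0.
Result f_after(bool c, std::uint32_t v) {
    Result div = model_udiv(3, v);
    if (!div) return std::nullopt;
    return c ? *div : 0u;
}
```

Under this model the two versions agree whenever the divisor is non-zero, and also when `c` is true and the divisor is zero (both are UB). They differ only for `f(false, 0)`, where the original returns 0 and the hoisted version is UB, which is the case the paragraph describes.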
https://github.com/llvm/llvm-project/pull/119220