[llvm] [RFC][Transforms] Prefer unreachable insn over optimizaiton with undef (PR #131020)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 12 13:21:28 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-analysis
Author: None (yonghong-song)
<details>
<summary>Changes</summary>
In some cases, the code has uninitialized variable and the uninitialized variable actually has an impact on the code. For example,
```
$ cat t.c
__attribute__((always_inline)) int bar(int a, int *b) {
if ((*b) == 0) return a;
else return 2 * a;
}
void tar(int);
int foo(int a) {
int b;
return bar(a, &b);
}
```
In the above variable 'b' is uninitialized. With the following compilation flag:
clang -O2 -S -emit-llvm -Wall -Werror t.c -mllvm -print-after-all
The EarlyCSEPass changes the input IR
```
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->1 {
entry:
%cmp.i = icmp eq i32 undef, 0
%mul.i = shl nsw i32 %a, 1
%retval.0.i = select i1 %cmp.i, i32 %a, i32 %mul.i
ret i32 %retval.0.i
}
```
to output IR
```
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->1 {
entry:
%mul.i = shl nsw i32 %a, 1
ret i32 %a
}
```
In the above, the llvm generates code without any 'undef' values. On the other hand, with the following compilation flag:
clang -O2 -S -emit-llvm -Wall -Werror t.c -fsanitize=undefined -mllvm -print-after-all
The SCCPPass changes the input IR
```
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->3 !func_sanitize !12 {
entry:
br i1 undef, label %bar.exit, label %if.else.i
if.else.i: ; preds = %entry
%0 = shl i32 %a, 1
%1 = add i32 %a, 1073741824
%2 = icmp sgt i32 %1, -1
br i1 %2, label %bar.exit, label %handler.mul_overflow.i, !prof !7, !nosanitize !6
handler.mul_overflow.i: ; preds = %if.else.i
%3 = zext i32 %a to i64, !nosanitize !6
tail call void @<!-- -->__ubsan_handle_mul_overflow(ptr nonnull @<!-- -->2, i64 2, i64 %3) #<!-- -->5, !nosanitize !6
br label %bar.exit, !nosanitize !6
bar.exit: ; preds = %entry, %if.else.i, %handler.mul_overflow.i
%retval.0.i = phi i32 [ %a, %entry ], [ %0, %handler.mul_overflow.i ], [ %0, %if.else.i ]
ret i32 %retval.0.i
}
```
to output IR
```
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->3 !func_sanitize !12 {
entry:
unreachable
}
```
Note that in the above example, the unitialized variable is used cross function boundary which makes frontend harder to emit error messages. And bpf prog needs to pass verifier which makes runtime sanitize not suitable. On the other hand, bpf prog typically is not that big and bpf prog also tends have quite some inlining for performance reason or to avoid kernel verification failure. So detecting and reporting impactful uninit var can help improve bpf developer productivity greatly.
There are more discussion in [1] and [2]. To maximize chances to report proper unit var warnings (beyond -Wall -Werror), I would like to discuss two things related to the above example:
1. Avoid generating legal code from 'undef' code. This is needed so the 'undef' code can be carried through entire compilation. And in many cases, 'undef' is eventually transformed to 'unreachable' insn. Generating legal code (without 'undef') will prevent later catching 'undef/unreachable' cases.
2. As in discussions in [2], looks like clang-format does not like BPF Backend to check undef values. So if possible, it would be great to convert 'undef' related code to 'unreachable', e.g. in the above SCCPPass.
This RFC intends to have some upstream discussion on how to achieve the above two goals.
With this patch, for the following compilation flag:
clang -O2 -S -emit-llvm -Wall -Werror t.c -mllvm -print-after-all
The EarlyCSEPass changes the input IR
```
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->1 {
entry:
%cmp.i = icmp eq i32 undef, 0
%mul.i = shl nsw i32 %a, 1
%retval.0.i = select i1 %cmp.i, i32 %a, i32 %mul.i
ret i32 %retval.0.i
}
```
to output IR
```
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->1 {
entry:
%mul.i = shl nsw i32 %a, 1
unreachable
}
```
And '%mul.i = shl nsw i32 %a, 1' is removed in SimplifyCFGPass.
```
; *** IR Dump After CorrelatedValuePropagationPass on foo ***
; Function Attrs: nounwind uwtable
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->1 {
entry:
%mul.i = shl nsw i32 %a, 1
unreachable
}
; *** IR Dump After SimplifyCFGPass on foo ***
; Function Attrs: nounwind uwtable
define dso_local i32 @<!-- -->foo(i32 noundef %a) local_unnamed_addr #<!-- -->1 {
entry:
unreachable
}
```
[1] https://discourse.llvm.org/t/detect-undefined-behavior-due-to-uninitialized-variables-in-bpf-programs/84116/4?u=yonghong-song
[2] https://github.com/llvm/llvm-project/pull/126858
---
Full diff: https://github.com/llvm/llvm-project/pull/131020.diff
2 Files Affected:
- (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+2-2)
- (modified) llvm/lib/Transforms/Scalar/EarlyCSE.cpp (+11)
``````````diff
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 1c33c6bebdd1b..b09ac42945a4a 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -4829,9 +4829,9 @@ static Value *simplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal,
if (isa<PoisonValue>(CondC))
return PoisonValue::get(TrueVal->getType());
- // select undef, X, Y -> X or Y
+ // select undef, X, Y -> undef
if (Q.isUndefValue(CondC))
- return isa<Constant>(FalseVal) ? FalseVal : TrueVal;
+ return UndefValue::get(TrueVal->getType());
// select true, X, Y --> X
// select false, X, Y --> Y
diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
index 3a0ae6b01a114..0bf8a89894e60 100644
--- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
+++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
@@ -1780,6 +1780,17 @@ bool EarlyCSE::processNode(DomTreeNode *Node) {
LastStore = nullptr;
}
}
+
+ // If the ret insn returns an undefined (but not a poinson) value,
+ // change it to unreachable.
+ if (Inst.getOpcode() == Instruction::Ret) {
+ auto *RI = cast<ReturnInst>(&Inst);
+ auto *RetVal = RI->getReturnValue();
+ if (isa<UndefValue>(RetVal) && !isa<PoisonValue>(RetVal)) {
+ changeToUnreachable(&Inst, false, NULL, &*MSSAUpdater);
+ Changed = true;
+ }
+ }
}
return Changed;
``````````
</details>
https://github.com/llvm/llvm-project/pull/131020
More information about the llvm-commits
mailing list