<div dir="ltr">FWIW - Clang is fairly allergic to emitting diagnostics that depend on optimization, because they tend to present usability problems. They can appear or disappear due to seemingly unrelated changes in the code (changes that trigger or hinder the optimizations that cause the diagnostic path to be hit).<br><br>Usually the idea is to implement this sort of bug-finding technique in Clang's static analyzer. So perhaps there would be a way to feed UBSan's facts/checks into the static analyzer in a more consistent way (I'm sure some of the same checks are implemented there already - but generalizing/unifying UBSan's checks to feed into the static analyzer could be handy).<br><br>- Dave<br><br><div class="gmail_quote"><div dir="ltr">On Wed, Mar 22, 2017 at 6:52 PM Vedant Kumar via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br class="gmail_msg">
<br class="gmail_msg">
I've performed some experiments with reporting UBSan diagnostics at<br class="gmail_msg">
compile-time and think that this is a useful thing to do. I'd like to discuss<br class="gmail_msg">
the motivation, the approach I took, and some results.<br class="gmail_msg">
<br class="gmail_msg">
=== Motivation ===<br class="gmail_msg">
<br class="gmail_msg">
We're interested in fixing UB in our projects and use UBSan to do this.<br class="gmail_msg">
However, we have lots of software that is easy to build but hard to run, or<br class="gmail_msg">
hard to test with adequate code coverage (e.g. firmware). This limits the number<br class="gmail_msg">
of bugs we can catch with UBSan.<br class="gmail_msg">
<br class="gmail_msg">
It would be nice if we could report UB at compile-time without false positives.<br class="gmail_msg">
We wouldn't be able to report everything a runtime tool could, but we would be<br class="gmail_msg">
able to report a large number of real bugs very quickly, just by rebuilding all<br class="gmail_msg">
our software with a flag enabled.<br class="gmail_msg">
<br class="gmail_msg">
=== Approach ===<br class="gmail_msg">
<br class="gmail_msg">
I wrote a simple analysis which detects UB statically by piggybacking off UBSan.<br class="gmail_msg">
It's actually able to issue decent diagnostics. It only issues a diagnostic if<br class="gmail_msg">
it finds a call to a UBSan diagnostic handler which post-dominates the function<br class="gmail_msg">
entry block.<br class="gmail_msg">
<br class="gmail_msg">
The idea is: if a function unconditionally exhibits UB when called, it's worth<br class="gmail_msg">
reporting the UB at compile-time.<br class="gmail_msg">
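<br class="gmail_msg">
To make the post-dominance condition concrete, here is a minimal sketch in C on a<br class="gmail_msg">
toy control-flow graph. It is not the code from the patch and does not use LLVM's<br class="gmail_msg">
post-dominator tree. The names struct cfg, postdominates_entry,<br class="gmail_msg">
returns_nonnull_cfg and optimized_main_cfg are hypothetical. It relies on one fact:<br class="gmail_msg">
a block post-dominates the entry exactly when no path from the entry to a<br class="gmail_msg">
returning block can avoid it.<br class="gmail_msg">
<pre class="gmail_msg">
```c
/* Toy CFG with at most MAX_BLOCKS blocks and MAX_SUCCS successors each.
 * All names here are hypothetical and exist only for illustration. */
#define MAX_BLOCKS 3
#define MAX_SUCCS 2

struct cfg {
    int succ[MAX_BLOCKS][MAX_SUCCS]; /* successor block ids, -1 = unused slot */
    int is_exit[MAX_BLOCKS];         /* 1 if the block ends in a return */
};

/* Depth-first search: can an exit be reached from `b` without entering `banned`? */
static int reaches_exit_avoiding(const struct cfg *g, int b, int banned, int *seen)
{
    int k;
    if (b == banned || seen[b])
        return 0;
    seen[b] = 1;
    if (g->is_exit[b])
        return 1;
    for (k = 0; k < MAX_SUCCS; ++k) {
        int s = g->succ[b][k];
        if (s >= 0 && reaches_exit_avoiding(g, s, banned, seen))
            return 1;
    }
    return 0;
}

/* `block` post-dominates `entry` iff every entry-to-exit path goes through it. */
int postdominates_entry(const struct cfg *g, int entry, int block)
{
    int seen[MAX_BLOCKS] = {0};
    return !reaches_exit_avoiding(g, entry, block, seen);
}

/* Unoptimized returns_nonnull(): 0 = entry, 1 = handler block, 2 = cont. */
struct cfg returns_nonnull_cfg(void)
{
    struct cfg g = {
        { {2, 1}, {2, -1}, {-1, -1} }, /* entry branches to cont or handler */
        { 0, 0, 1 }                    /* only cont returns */
    };
    return g;
}

/* main() after inlining: block 0 calls the handler and returns. */
struct cfg optimized_main_cfg(void)
{
    struct cfg g = {
        { {-1, -1}, {-1, -1}, {-1, -1} },
        { 1, 0, 0 }
    };
    return g;
}
```
</pre>
In the unoptimized graph the handler block can be bypassed through cont, so it does<br class="gmail_msg">
not post-dominate the entry and nothing would be reported. In the single-block graph<br class="gmail_msg">
the handler call sits in the entry block itself, so the condition holds.<br class="gmail_msg">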
<br class="gmail_msg">
Here is a full example. This C program has UB because it returns a null pointer<br class="gmail_msg">
when it shouldn't:<br class="gmail_msg">
<br class="gmail_msg">
```<br class="gmail_msg">
__attribute__((returns_nonnull)) int *returns_nonnull(int *p) {<br class="gmail_msg">
return p; // Bug: null pointer returned here.<br class="gmail_msg">
}<br class="gmail_msg">
<br class="gmail_msg">
int main() {<br class="gmail_msg">
returns_nonnull((int *)0LL);<br class="gmail_msg">
return 0;<br class="gmail_msg">
}<br class="gmail_msg">
```<br class="gmail_msg">
<br class="gmail_msg">
With UBSan enabled, here's the IR we get:<br class="gmail_msg">
<br class="gmail_msg">
```<br class="gmail_msg">
define nonnull i32* @returns_nonnull(i32* %p) #0 {<br class="gmail_msg">
entry:<br class="gmail_msg">
...<br class="gmail_msg">
%1 = icmp ne i32* %p, null, !nosanitize !2<br class="gmail_msg">
br i1 %1, label %cont, label %handler.nonnull_return<br class="gmail_msg">
<br class="gmail_msg">
handler.nonnull_return:<br class="gmail_msg">
call void @__ubsan_handle_nonnull_return(...), !nosanitize !2<br class="gmail_msg">
br label %cont, !nosanitize !2<br class="gmail_msg">
<br class="gmail_msg">
cont:<br class="gmail_msg">
ret i32* %p<br class="gmail_msg">
}<br class="gmail_msg">
<br class="gmail_msg">
define i32 @main() #0 {<br class="gmail_msg">
entry:<br class="gmail_msg">
...<br class="gmail_msg">
%call = call nonnull i32* @returns_nonnull(i32* null)<br class="gmail_msg">
ret i32 0<br class="gmail_msg">
}<br class="gmail_msg">
```<br class="gmail_msg">
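<br class="gmail_msg">
For readers less used to IR, the check in @returns_nonnull corresponds roughly to<br class="gmail_msg">
the hand-written C below. This is only a sketch: report_nonnull_return is a<br class="gmail_msg">
hypothetical stand-in that counts reports, not the real<br class="gmail_msg">
__ubsan_handle_nonnull_return, whose arguments are elided above.<br class="gmail_msg">
<pre class="gmail_msg">
```c
/* Hypothetical stand-in for the UBSan handler: it only counts reports. */
static int reports_seen = 0;

static void report_nonnull_return(void)
{
    ++reports_seen;
}

/* Hand-written approximation of the instrumented returns_nonnull(). */
int *returns_nonnull_checked(int *p)
{
    if (p == 0)                  /* the icmp/br pair: null goes to the handler */
        report_nonnull_return(); /* handler.nonnull_return */
    return p;                    /* cont: execution continues after the report */
}
```
</pre>
Note that the handler block falls through to cont, so the function still returns<br class="gmail_msg">
the null pointer after reporting it.<br class="gmail_msg">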
<br class="gmail_msg">
At -O2, LLVM inlines @returns_nonnull and throws away the null check:<br class="gmail_msg">
<br class="gmail_msg">
```<br class="gmail_msg">
define i32 @main() local_unnamed_addr #0 {<br class="gmail_msg">
entry:<br class="gmail_msg">
tail call void @__ubsan_handle_nonnull_return(...), !nosanitize !2<br class="gmail_msg">
ret i32 0<br class="gmail_msg">
}<br class="gmail_msg">
```<br class="gmail_msg">
<br class="gmail_msg">
The call to UBSan's diagnostic handler post-dominates the function entry block,<br class="gmail_msg">
so we report it right away:<br class="gmail_msg">
<br class="gmail_msg">
$ clang -fsanitize=undefined -O2 -Xclang -enable-llvm-linter buggy.c<br class="gmail_msg">
Undefined behavior: invalid null return value (buggy.c:3:1)<br class="gmail_msg">
<br class="gmail_msg">
=== Results ===<br class="gmail_msg">
<br class="gmail_msg">
I packaged up my analysis into LLVM's Lint pass and added a clang option to<br class="gmail_msg">
enable linting. The initial patch is up for review:<br class="gmail_msg">
<br class="gmail_msg">
<a href="https://reviews.llvm.org/D30949" rel="noreferrer" class="gmail_msg" target="_blank">https://reviews.llvm.org/D30949</a> - Add an option to enable LLVM IR linting<br class="gmail_msg">
<br class="gmail_msg">
I built a few internal projects with UBSan, optimizations, and linting enabled.<br class="gmail_msg">
This exposed real bugs. The only problem was that I got reports about UB in<br class="gmail_msg">
dead code. Maybe this can be addressed by setting up sanitizer blacklists?<br class="gmail_msg">
<br class="gmail_msg">
=== Alternatives? ===<br class="gmail_msg">
<br class="gmail_msg">
We could try implementing something like the STACK UB checker:<br class="gmail_msg">
<br class="gmail_msg">
<a href="https://people.csail.mit.edu/nickolai/papers/wang-stack-tocs.pdf" rel="noreferrer" class="gmail_msg" target="_blank">https://people.csail.mit.edu/nickolai/papers/wang-stack-tocs.pdf</a><br class="gmail_msg">
<br class="gmail_msg">
I haven't compared my approach vs. STACK in terms of bug-finding efficacy. The<br class="gmail_msg">
latter does seem harder to implement.<br class="gmail_msg">
<br class="gmail_msg">
I'm interested in hearing what others think.<br class="gmail_msg">
<br class="gmail_msg">
thanks,<br class="gmail_msg">
vedant<br class="gmail_msg">
</blockquote></div></div>