[cfe-dev] Reporting UBSan issues at compile-time

Sat Mar 25 14:35:16 PDT 2017

Ok, so the discussion in the thread so far as been pretty strong against 
optimizer influenced warnings.  To me, this seems a bit off point.  I 
completely and utterly agree that I don't want my -Werror builds to be 
influenced by the black box which is the optimizer, but that doesn't 
mean there isn't a utility to such diagnostics.

It seems like what we need here is a different interface to the 
warnings.  As a strawman, what if we had the compiler produce an output 
file which contains a list of functions which are known to trip an ubsan 
assert.  By combining all of these files, we can produce a binary wide 
list of "as built" warnings.  This could be integrated into tools like 
the static analyzer in a couple of obvious ways: a) report them as "must 
be dead" code, b) highlight calls to such functions as errors, or c) use 
them to emphasize warnings produced through another chain of logic (i.e. 
"hey this actually happens!").

Such "as built" warnings seem particular useful for large projects which 
have to support multiple compiler versions at once.  By combining a 
profile collected on one build with an "as built" warning file from 
another, you could immediate identify (some of the) the critical bugs 
which had to be fixed before moving to the new compiler.  Worth noting 
is that this works for miscompiles as well as for application bugs.  :)

Another use would be to intersect "as built" warnings list from two or 
more built flavors.  Any warning which wasn't in all variants is one 
which probably needs urgent attention.  ("Hey, why did my O3 build 
produce more "as built" warnings than my debug build?!")

Taking things in a different direction, the original idea could also be 
generalized a bit.  Instead of focusing on functions which are post 
dominated by failure points, let's focus on statements which are post 
dominated by such.  Any such statement produces a region which must be 
dead and can feed into all of the same uses as those above.

Philip

On 03/22/2017 06:52 PM, Vedant Kumar via cfe-dev wrote:
> Hi,
>
> I've performed some experiments with reporting UBSan diagnostics at
> compile-time and think that this is a useful thing to do. I'd like to discuss
> the motivation, the approach I took, and some results.
>
> === Motivation ===
>
> We're interested in fixing UB in our projects and use UBSan to do this.
> However, we have lots of software that is easy to build but hard to run, or
> hard to test with adequate code coverage (e.g firmware). This limits the amount
> of bugs we can catch with UBSan.
>
> It would be nice if we could report UB at compile-time without false positives.
> We wouldn't be able to report everything a runtime tool could, but we would be
> able to report a large number of real bugs very quickly, just by rebuilding all
> our software with a flag enabled.
>
> === Approach ===
>
> I wrote a simple analysis which detects UB statically by piggybacking off UBSan.
> It's actually able to issue decent diagnostics. It only issues a diagnostic if
> it finds a call to a UBSan diagnostic handler which post-dominates the function
> entry block.
>
> The idea is: if a function unconditionally exhibits UB when called, it's worth
> reporting the UB at compile-time.
>
> Here is a full example. This C program has UB because it returns a null pointer
> when it shouldn't:
>
>    ```
>    __attribute__((returns_nonnull)) int *returns_nonnull(int *p) {
>      return p; // Bug: null pointer returned here.
>    }
>    
>    int main() {
>      returns_nonnull((int *)0LL);
>      return 0;
>    }
>    ```
>
> With UBSan enabled, here's the IR we get:
>
>    ```
>    define nonnull i32* @returns_nonnull(i32* %p) #0 {
>    entry:
>      ...
>      %1 = icmp ne i32* %p, null, !nosanitize !2
>      br i1 %1, label %cont, label %handler.nonnull_return
>    
>    handler.nonnull_return:
>      call void @__ubsan_handle_nonnull_return(...), !nosanitize !2
>      br label %cont, !nosanitize !2
>    
>    cont:
>      ret i32* %p
>    }
>
>    define i32 @main() #0 {
>    entry:
>      ...
>      %call = call nonnull i32* @returns_nonnull(i32* null)
>      ret i32 0
>    }
>    ```
>
> At -O2, LLVM inlines @returns_nonnull and throws away the null check:
>
>    ```
>    define i32 @main() local_unnamed_addr #0 {
>    entry:
>      tail call void @__ubsan_handle_nonnull_return(...), !nosanitize !2
>      ret i32 0
>    }
>    ```
>
> The call to UBSan's diagnostic handler post-dominates the function entry block,
> so we report it right away:
>
>    $ clang -fsanitize=undefined -O2 -Xclang -enable-llvm-linter buggy.c
>    Undefined behavior: invalid null return value (buggy.c:3:1)
>
> === Results ===
>
> I packaged up my analysis into LLVM's Lint pass and added a clang option to
> enable linting. The initial patch is up for review:
>
>    https://reviews.llvm.org/D30949 - Add an option to enable LLVM IR linting
>
> I built a few internal projects with UBSan, optimizations, and linting enabled.
> This exposed real bugs. The only problem was that I got reports about UB in
> dead code. Maybe this can be addressed by setting up sanitizer blacklists?
>
> === Alternatives? ===
>
> We could try implementing something like the STACK UB checker:
>
>    https://people.csail.mit.edu/nickolai/papers/wang-stack-tocs.pdf
>
> I haven't compared my approach vs. STACK in terms of bug-finding efficacy. The
> latter does seem harder to implement.
>
> I'm interested in hearing what others think.
>
> thanks,
> vedant
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev