[cfe-dev] Reporting UBSan issues at compile-time

Thu Mar 23 08:18:57 PDT 2017

FWIW - Clang is fairly allergic to emitting diagnostics based on
optimization because they tend to present usability problems. They can
appear/disappear due to seemingly unrelated changes in the code (that
trigger or hinder optimizations that cause the diagnostic path to be hit).

Usually the idea is to implement these sorts of bug-finding techniques in
Clang's static analyzer. So perhaps there would be a way to feed UBSan's
facts/checks into the static analyzer in a more consistent way (I'm sure
some of the same checks are implemented there already, but
generalizing/unifying UBSan's checks to feed into the static analyzer could
be handy).

- Dave

On Wed, Mar 22, 2017 at 6:52 PM Vedant Kumar via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi,
>
> I've performed some experiments with reporting UBSan diagnostics at
> compile-time and think that this is a useful thing to do. I'd like to
> discuss the motivation, the approach I took, and some results.
>
> === Motivation ===
>
> We're interested in fixing UB in our projects and use UBSan to do this.
> However, we have lots of software that is easy to build but hard to run, or
> hard to test with adequate code coverage (e.g. firmware). This limits the
> number of bugs we can catch with UBSan.
>
> It would be nice if we could report UB at compile-time without false
> positives. We wouldn't be able to report everything a runtime tool could,
> but we would be able to report a large number of real bugs very quickly,
> just by rebuilding all our software with a flag enabled.
>
> === Approach ===
>
> I wrote a simple analysis which detects UB statically by piggybacking off
> UBSan. It's actually able to issue decent diagnostics. It only issues a
> diagnostic if it finds a call to a UBSan diagnostic handler which
> post-dominates the function entry block.
>
> The idea is: if a function unconditionally exhibits UB when called, it's
> worth reporting the UB at compile-time.
>
> Here is a full example. This C program has UB because it returns a null
> pointer when it shouldn't:
>
>   ```
>   __attribute__((returns_nonnull)) int *returns_nonnull(int *p) {
>     return p; // Bug: null pointer returned here.
>   }
>
>   int main() {
>     returns_nonnull((int *)0LL);
>     return 0;
>   }
>   ```
>
> With UBSan enabled, here's the IR we get:
>
>   ```
>   define nonnull i32* @returns_nonnull(i32* %p) #0 {
>   entry:
>     ...
>     %1 = icmp ne i32* %p, null, !nosanitize !2
>     br i1 %1, label %cont, label %handler.nonnull_return
>
>   handler.nonnull_return:
>     call void @__ubsan_handle_nonnull_return(...), !nosanitize !2
>     br label %cont, !nosanitize !2
>
>   cont:
>     ret i32* %p
>   }
>
>   define i32 @main() #0 {
>   entry:
>     ...
>     %call = call nonnull i32* @returns_nonnull(i32* null)
>     ret i32 0
>   }
>   ```
>
> At -O2, LLVM inlines @returns_nonnull into @main and folds away the null
> check, since the argument is known to be null:
>
>   ```
>   define i32 @main() local_unnamed_addr #0 {
>   entry:
>     tail call void @__ubsan_handle_nonnull_return(...), !nosanitize !2
>     ret i32 0
>   }
>   ```
>
> The call to UBSan's diagnostic handler post-dominates the function entry
> block, so we report it right away:
>
>   $ clang -fsanitize=undefined -O2 -Xclang -enable-llvm-linter buggy.c
>   Undefined behavior: invalid null return value (buggy.c:3:1)
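
A minimal sketch of the post-dominance test described above, written against a toy CFG rather than LLVM's data structures. It is not the code from the patch: the `Cfg` type, the block numbering, and all function names here are hypothetical, and a real pass would presumably use LLVM's own post-dominator tree analysis. The check relies on the fact that a block post-dominates the entry block exactly when no path from entry to a return can avoid it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical toy CFG: block 0 is the entry block, and a block with
   zero successors is treated as a return block. */
typedef struct {
    int num_blocks;
    int num_succ[MAX_BLOCKS];
    int succ[MAX_BLOCKS][2];
} Cfg;

/* True if some return block is reachable from `from` without ever
   entering block `avoid`. Plain depth-first search. */
bool reaches_return_avoiding(const Cfg *g, int from, int avoid, bool *seen) {
    if (from == avoid || seen[from])
        return false;
    seen[from] = true;
    if (g->num_succ[from] == 0)
        return true;
    for (int i = 0; i < g->num_succ[from]; i++) {
        if (reaches_return_avoiding(g, g->succ[from][i], avoid, seen))
            return true;
    }
    return false;
}

/* Block b post-dominates the entry block iff every path from entry to
   a return passes through b, i.e. no return is reachable avoiding b. */
bool post_dominates_entry(const Cfg *g, int b) {
    bool seen[MAX_BLOCKS] = {false};
    assert(b >= 0 && b < g->num_blocks);
    return !reaches_return_avoiding(g, 0, b, seen);
}

/* Models @returns_nonnull before inlining:
   0 = entry, 1 = handler.nonnull_return, 2 = cont. */
Cfg returns_nonnull_cfg(void) {
    Cfg g = {3, {2, 1, 0}, {{2, 1}, {2, 0}, {0, 0}}};
    return g;
}

/* Models @main at -O2: a single block that calls the handler and
   then returns. */
Cfg inlined_main_cfg(void) {
    Cfg g = {1, {0}, {{0, 0}}};
    return g;
}
```

On the first CFG, `post_dominates_entry(&g, 1)` is false because `entry` can branch straight to `cont`, so nothing would be reported for the handler block. On the second CFG the handler call sits in block 0, which trivially post-dominates itself, so the report fires. The answer flips only because inlining and constant folding reshaped the CFG.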
>
> === Results ===
>
> I packaged up my analysis into LLVM's Lint pass and added a clang option to
> enable linting. The initial patch is up for review:
>
>   https://reviews.llvm.org/D30949 - Add an option to enable LLVM IR linting
>
> I built a few internal projects with UBSan, optimizations, and linting
> enabled. This exposed real bugs. The only problem was that I got reports
> about UB in dead code. Maybe this can be addressed by setting up sanitizer
> blacklists?
>
> === Alternatives? ===
>
> We could try implementing something like the STACK UB checker:
>
>   https://people.csail.mit.edu/nickolai/papers/wang-stack-tocs.pdf
>
> I haven't compared my approach vs. STACK in terms of bug-finding efficacy.
> The latter does seem harder to implement.
>
> I'm interested in hearing what others think.
>
> thanks,
> vedant
>