[cfe-dev] Reporting UBSan issues at compile-time

Thu Mar 23 13:42:58 PDT 2017

> On Mar 23, 2017, at 11:48 AM, Reid Kleckner via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> On Thu, Mar 23, 2017 at 10:55 AM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
> On Thu, Mar 23, 2017 at 10:45 AM Reid Kleckner <rnk at google.com <mailto:rnk at google.com>> wrote:
> I don't think it will be feasible to generalize UBSan's knowledge to the static analyzer.
> 
> Why not? The rough idea I meant would be to express the constraints UBSan is checking into the static analyzer - I realize the current layering (UBSan being in Clang's IRGen) doesn't make that trivial/obvious, but it seems to me that the constraints could be shared in some way - with some work.
> 
> Maybe I am not imaginative enough, but I cannot envision a clean way to express the conditions that trigger UB that is useful for both static analysis and dynamic instrumentation. The best I can come up with is AST-level instrumentation: synthesizing AST nodes that can be translated to IR or used for analysis. That doesn't seem reasonable, so I think getting ubsan into the static analyzer would end up duplicating the knowledge of what conditions trigger UB.
> 
> The static analyzer CFG is also at best an approximation of the real CFG, especially for C++.
> 

First, the CFG is not only used by the analyzer, but it is also used by Sema (ex: unreachable code warnings and uninitialized variables).
Second, while there are corners of C++ that are not supported, it has high fidelity otherwise.

> Sure enough - and I believe some of the people working/caring about it would like to fix that. I think Manuel & Chandler have expressed the notion that the best way to do that would be to move to a world where the CFG is used for CodeGen, so it's a single/consistent source of truth.
> 
> Yes, we could do all that work, but LLVM's CFG today is already precise for C++. If we allow ourselves to emit diagnostics from the middle-end, we can save all that work.
> 
> Going down the high-effort path of extending the CFG and abstracting or duplicating UBSan’s checks as static analyses on that CFG would definitely provide a better diagnostic experience, but it's worth re-examining conventional wisdom and exploring alternatives first.

The idea of analysis based on top of LLVM IR is not new and have been discussed before. My personal belief is that having access to the AST (or just code as was written by the user) is very important. It ensures we can provide precise diagnostics. It also allows us to see when users want to suppress an issue report by changing the way the source code is uttered. For example, allows to tell the developer that they can suppress with a cast. We can also differentiate between “NULL” and “0”, which allows us to determine if the programmer intended to use a pointer constant or a zero numeric value.

Ensuring that clang CFG completely supports C++ is a bit challenging but not insurmountable task. In addition, fixing-up the unsupported corners in the CFG would benefit not only the clang static analyzer, but all the other “users” such as clang warnings, clang-tidy, possibly even refactoring in the future.

Here is a thread where this has been discussed before:
http://clang-developers.42468.n3.nabble.com/LLVM-Dev-meeting-Slides-amp-Minutes-from-the-Static-Analyzer-BoF-td4048418.html

> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170323/57d7f5d8/attachment.html>