[cfe-dev] Many .c files as input to scan-build

Fri Jul 12 16:13:52 PDT 2019

The problem you're describing is known as "cross translation unit 
analysis". The Static Analyzer is part of Clang, and the primary purpose 
of Clang is to compile one translation unit at a time, so the Static 
Analyzer inherits the same limitation.
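
To make the limitation concrete, here is a minimal sketch of a leak that
spans two translation units. The file names, struct table, table_new(),
table_destroy(), lookup_once() and the live_tables counter are all made
up for illustration; both "files" are shown in one listing only for
brevity.

```c
#include <stdlib.h>

/* ---- table.c (hypothetical): the only file that sees malloc/free ---- */
struct table { int *slots; };

int live_tables = 0;   /* bookkeeping for this sketch only */

struct table *table_new(void)
{
    struct table *t = malloc(sizeof *t);
    if (t) {
        t->slots = NULL;
        live_tables++;
    }
    return t;
}

void table_destroy(struct table *t)
{
    if (t) {
        free(t->slots);
        free(t);
        live_tables--;
    }
}

/* ---- main.c (hypothetical): would see only the two prototypes ---- */
int lookup_once(void)
{
    struct table *t = table_new();
    if (!t)
        return -1;
    /* ... use the table ... */
    return 0;   /* missing table_destroy(t): the table leaks */
}
```

If the two halves lived in separate files, the analyzer would see only
the prototypes of table_new() and table_destroy() while analyzing
main.c, so it would have no way to tell that table_new() returns heap
memory that must be released.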

Doing "unity builds" is one way around this problem. It wouldn't scale 
to huge projects, and concatenating all the files is not that trivial, 
depending on the build system (the project may use freshly compiled 
executables to autogenerate source code for subsequent passes or compile 
the same source code for different architectures).
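
For a small project the unity file itself is easy to produce. The sketch
below is only an illustration: emit_unity() is a made-up helper, and it
assumes that the list of source files is already known and that the
files can be included in any order.

```c
#include <stdio.h>
#include <string.h>

/* Writes one #include line per source file into buf (capacity cap).
 * Returns the number of bytes written, not counting the terminating
 * NUL, or -1 if buf is too small. */
int emit_unity(char *buf, size_t cap, const char *const *files, size_t n)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, cap - used,
                         "#include \"%s\"\n", files[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;   /* output error or not enough room */
        used += (size_t)w;
    }
    return (int)used;
}
```

The resulting text can be saved as a single .c file and analyzed as one
translation unit. File-local names (static functions, macros) that are
harmless in separate files may collide once everything is merged, which
is one more reason this is not trivial in practice.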

Note that even if you do a unity build, the time it takes for the Static 
Analyzer to perform analysis of a certain quality would grow 
non-linearly (in fact, "exponentially" would be way more accurate). Even 
though all of the source code is available, making proper use of this 
information to achieve analysis quality similar to that of a smaller 
codebase would be impossible. You will be paying with loss of coverage: 
the analyzer will give up sooner and only find shallower bugs.
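
A back-of-the-envelope model of why this happens, assuming (as a
deliberate simplification, not a description of the analyzer's actual
algorithm) that the code contains a number of independent two-way
branches and that the analyzer can afford to explore a fixed number of
paths. path_count() and coverage_percent() are made-up names.

```c
#include <stdint.h>

/* Paths through `branches` independent two-way branches: 2^branches
 * (saturates at UINT64_MAX for 64 or more branches). */
uint64_t path_count(unsigned branches)
{
    return branches >= 64 ? UINT64_MAX : (uint64_t)1 << branches;
}

/* Percentage of those paths (0..100, rounded down) that a fixed
 * exploration budget can cover. */
unsigned coverage_percent(uint64_t budget, unsigned branches)
{
    uint64_t paths = path_count(branches);

    if (budget >= paths)
        return 100;
    return (unsigned)((double)budget * 100.0 / (double)paths);
}
```

In this model a budget that fully covers 10 branches covers half of the
paths at 11 branches and practically none at 20, so gluing more code
into one translation unit mostly dilutes a budget that stays the same.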

There's an effort to perform cross-translation-unit analysis through 
ASTImporter - the same facility that supports executing arbitrary 
expressions in LLDB. This allows importing only small chunks of the 
program as needed without constructing a whole-program AST, but 
generally i feel it's not really that much better than unity builds. See 
CTU threads on this mailing list. They report success when it comes to 
overall usefulness of the Static Analyzer, so i guess it's worth it to 
think in that direction, but it's most likely less worth it than using 
more expensive and sophisticated but more scalable techniques such as 
summary-based analysis.
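
To give a rough idea of what "summary-based" means, here is a toy
sketch. It is not how Clang works and not a real tool: the summary table
is written by hand, and enum effect, struct summary, lookup_effect() and
unreleased_after() are made-up names. The point is only that each callee
is described once by a small summary, which is then applied at every
call site instead of re-analyzing the callee's body.

```c
#include <stddef.h>
#include <string.h>

enum effect { EFFECT_NONE, EFFECT_ALLOCATES, EFFECT_RELEASES };

struct summary {
    const char *function;
    enum effect effect;
};

/* Hand-written summaries; a real tool would compute these itself. */
static const struct summary summaries[] = {
    { "g_hash_table_new",     EFFECT_ALLOCATES },
    { "g_hash_table_destroy", EFFECT_RELEASES  },
};

static enum effect lookup_effect(const char *function)
{
    for (size_t i = 0; i < sizeof summaries / sizeof summaries[0]; i++)
        if (strcmp(summaries[i].function, function) == 0)
            return summaries[i].effect;
    return EFFECT_NONE;   /* unknown callee: assume no effect */
}

/* Returns how many allocations remain unreleased after the given
 * sequence of calls, using only the summaries. */
int unreleased_after(const char *const *calls, size_t n)
{
    int live = 0;

    for (size_t i = 0; i < n; i++) {
        enum effect e = lookup_effect(calls[i]);
        if (e == EFFECT_ALLOCATES)
            live++;
        else if (e == EFFECT_RELEASES && live > 0)
            live--;
    }
    return live;
}
```

A call sequence that creates a table and never destroys it leaves one
unreleased allocation in this model, regardless of which source file the
callee's body lives in.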

On 7/11/19 9:56 AM, Дилян Палаузов via cfe-dev wrote:
> Hello,
>
> the clang static analyzer does a good job when working on individual source files.  But with a single .c/.cpp file
> as input it cannot catch all codepaths of a program that has many source files.
>
> In particular, GLib's g_hash_table_new() allocates memory and g_hash_table_destroy() frees the memory, but scan-build
> does not know this and does not check for it.
>
> I mean, scan-build provides different results for the same program, depending on how source code is split into different
> files.
>
> One way to solve this is to create a huge .h file containing recursively all function definitions needed by a .c/.cpp
> file, including sources from libraries and feeding this to scan-build.
>
> It would however be easier if scan-build were extended to accept many .c and .cpp files as input, glue them internally
> into one, and then handle that big file as input.
>
> This would help find problems that are split between source files.
>
> Regards
>    Дилян
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev