[cfe-dev] Many .c files as input to scan-build
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Fri Jul 12 16:13:52 PDT 2019
The problem you're describing is known as "cross translation unit
analysis". The Static Analyzer is part of Clang, and the primary purpose
of Clang is to compile one translation unit at a time, so the Static
Analyzer inherits the same limitation.
Doing "unity builds" is one way around this problem. This wouldn't scale
to huge projects and it's not that trivial to concatenate all the files,
depending on the build system (the project may use freshly compiled
executables to autogenerate source code for subsequent passes or compile
the same source code for different architectures).
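To make the unity-build idea concrete, here is a minimal sketch (in Python, purely for illustration) of generating a single C file that #includes every source file. The helper name make_unity_source and the file names are made up for this example. A real project would also have to cope with clashing static names and macros across files, which is part of why this is not trivial.

```python
def make_unity_source(sources):
    """Return the text of a C file that #includes each given source file."""
    lines = ["/* Generated unity translation unit (illustrative only). */"]
    for src in sources:
        # Paths are emitted as given; they are resolved relative to the
        # directory of the generated file when it is compiled.
        lines.append(f'#include "{src}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; substitute the project's real sources.
    print(make_unity_source(["src/table.c", "src/main.c"]), end="")
```

Saved as, say, unity.c, the output could then be analyzed with something like `scan-build clang -c unity.c`.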
Note that even if you do a unity build, the time it takes for the Static
Analyzer to perform analysis of a certain quality would grow
non-linearly (in fact, "exponentially" would be way more accurate). Even
though all of the source code is available, making proper use of this
information to achieve analysis quality similar to that of a smaller
codebase would be impossible. You will be paying with loss of coverage:
the analyzer will give up sooner and only find shallower bugs.
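A rough way to see where the "exponentially" comes from: the analyzer explores execution paths, and every independent branch can double their number. The sketch below (Python, illustrative only; enumerate_paths is a made-up helper, and real code has loops and correlated conditions that this toy model ignores) simply enumerates the outcome combinations for a sequence of independent branches.

```python
from itertools import product


def enumerate_paths(num_branches):
    """List every true/false outcome combination for independent branches."""
    return list(product((False, True), repeat=num_branches))


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        # Each additional independent branch doubles the number of paths.
        print(n, len(enumerate_paths(n)))
```

With a fixed budget per analysis, more code in view means a smaller fraction of these paths gets explored.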
There's an effort to perform cross-translation-unit (CTU) analysis through
ASTImporter - the same facility that supports executing arbitrary
expressions in LLDB. This allows importing only small chunks of the
program as needed without constructing a whole-program AST, but
generally I feel it's not really that much better than unity builds. See
the CTU threads on this mailing list. Their authors report success when it
comes to overall usefulness of the Static Analyzer, so I guess it's worth
it to think in that direction, but it's most likely less worth it than
using more expensive and sophisticated but more scalable techniques such
as summary-based analysis.
On 7/11/19 9:56 AM, Дилян Палаузов via cfe-dev wrote:
> Hello,
>
> the Clang Static Analyzer does a good job on individual source files. But with a single .c/.cpp file
> as input, it cannot catch all code paths of a program that has many source files.
>
> In particular, GLib's g_hash_table_new() allocates memory and g_hash_table_destroy() frees it, but scan-build
> does not know this and does not check for it.
>
> I mean, scan-build provides different results for the same program, depending on how source code is split into different
> files.
>
> One way to solve this is to create a huge .h file that recursively contains all function definitions needed by a
> .c/.cpp file, including sources from libraries, and to feed this to scan-build.
>
> It would, however, be easier if scan-build were extended to accept many .c and .cpp files as input, glue them
> together internally, and then handle that big file as input.
>
> This would help find problems that are split across source files.
>
> Regards
> Дилян