[cfe-dev] Many .c files as input to scan-build

Thu Jul 18 03:13:10 PDT 2019

Hello Artem,

Thanks for your answer.

Regarding combining several source files into one that is then analyzed by scan-build,
https://clang-analyzer.llvm.org/scan-build.html in fact suggests:

> It is also possible to use scan-build to analyze specific files:
> $ scan-build gcc -c t1.c t2.c
> This example causes the files t1.c and t2.c to be analyzed.

My reading is that with this call scan-build generates a single report, produced by merging t1.c and t2.c and then
analyzing the result, since the gcc call generates a single file.

Is the size of the input to the analyzer currently inversely proportional to the quality of the results?

I ask this because you wrote that for unity builds the analyzer would give up sooner and find only more
shallow bugs.
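
To make concrete what I mean by a problem that is split between source files, here is a minimal sketch in C.
The names table_new, table_destroy, table_live_count, use_and_release and use_and_forget are hypothetical
stand-ins (for example for g_hash_table_new and g_hash_table_destroy); they are not GLib's API. Everything is
shown in one file so that it compiles on its own, and the comments mark where I assume the split into two
files would be. The live_tables counter exists only for this demonstration.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- Part 1: would live in table.c (hypothetical library source) ---- */

struct table {
    int *slots;
};

/* Demo-only bookkeeping: number of tables allocated and not yet destroyed. */
static int live_tables = 0;

/* Allocates a table; the caller owns the result and must destroy it. */
struct table *table_new(void)
{
    struct table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->slots = calloc(16, sizeof *t->slots);
    if (t->slots == NULL) {
        free(t);
        return NULL;
    }
    live_tables++;
    return t;
}

/* Releases everything table_new() allocated. */
void table_destroy(struct table *t)
{
    if (t == NULL)
        return;
    assert(live_tables > 0);
    free(t->slots);
    free(t);
    live_tables--;
}

int table_live_count(void)
{
    return live_tables;
}

/* ---- Part 2: would live in user.c ----
 * Assumption: if this part were compiled on its own, only the prototypes
 * of table_new() and table_destroy() would be visible here, not the
 * malloc()/free() calls inside them. */

/* Correct use: the table is destroyed before returning. */
int use_and_release(void)
{
    struct table *t = table_new();
    if (t == NULL)
        return -1;
    t->slots[0] = 42;
    int value = t->slots[0];
    table_destroy(t);
    return value;
}

/* Buggy use: the table is never destroyed, so its memory leaks. */
int use_and_forget(void)
{
    struct table *t = table_new();
    if (t == NULL)
        return -1;
    t->slots[0] = 7;
    return t->slots[0];
}
```

With both parts in one file, the allocation in table_new() and the missing table_destroy() in use_and_forget()
are visible together. With the two parts in separate files, each file analyzed on its own shows only half of
the picture, which is the situation I was describing.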

Regards
  Дилян

On Fri, 2019-07-12 at 16:13 -0700, Artem Dergachev wrote:
> The problem you're describing is known as "cross translation unit 
> analysis". The Static Analyzer is part of Clang, and the primary purpose 
> of Clang is to compile one translation unit at a time, so the Static 
> Analyzer inherits the same limitation.
> 
> Doing "unity builds" is one way around this problem. This wouldn't scale 
> to huge projects and it's not that trivial to concatenate all the files, 
> depending on the build system (the project may use freshly compiled 
> executables to autogenerate source code for subsequent passes or compile 
> the same source code for different architectures).
> 
> Note that even if you do a unity build, the time it takes for the Static 
> Analyzer to perform analysis of a certain quality would grow 
> non-linearly (in fact, "exponentially" would be way more accurate). Even 
> though all of the source code is available, making proper use of this 
> information to achieve analysis quality similar to that of a smaller 
> codebase would be impossible. You will be paying with loss of coverage: 
> the analyzer will give up sooner and only find more shallow bugs.
> 
> There's an effort to perform cross-translation-unit analysis through 
> ASTImporter - the same facility that supports executing arbitrary 
> expressions in LLDB. This allows importing only small chunks of the 
> program as needed without constructing a whole-program AST, but 
> generally I feel it's not really that much better than unity builds. See 
> CTU threads on this mailing list. They report success when it comes to 
> overall usefulness of the Static Analyzer, so I guess it's worth it to 
> think in that direction, but it's most likely less worth it than using 
> more expensive and sophisticated but more scalable techniques such as 
> summary-based analysis.
> 
> 
> On 7/11/19 9:56 AM, Дилян Палаузов via cfe-dev wrote:
> > Hello,
> > 
> > The Clang Static Analyzer does a good job on individual source files. But with a single .c/.cpp file
> > as input, it cannot catch all code paths of a program that has many source files.
> > 
> > In particular, GLib's g_hash_table_new() allocates memory and g_hash_table_destroy() frees it, but scan-build
> > does not know this and does not check for it.
> > 
> > In other words, scan-build gives different results for the same program, depending on how the source code is split
> > into files.
> > 
> > One way to solve this is to create a huge .h file that recursively contains all function definitions needed by a
> > .c/.cpp file, including sources from libraries, and to feed this to scan-build.
> > 
> > It would, however, be easier if scan-build were extended to accept many .c and .cpp files as input, glue them
> > together internally, and then handle that one big file as its input.
> > 
> > This would help find problems that are split between source files.
> > 
> > Regards
> >    Дилян