[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

Fri Apr 21 08:24:17 PDT 2017

xazax.hun marked 6 inline comments as done and an inline comment as not done.
xazax.hun added a comment.

In https://reviews.llvm.org/D30691#731617, @zaks.anna wrote:

> I agree that scan-build or scan-build-py integration is an important issue to resolve here. What I envision is that users will just call scan-build and pass -whole-project as an option to it. Everything else will happen automagically:)

We had a skype meeting with Laszlo. He had no objections adding this to scan-build-py.

> Another concern is dumping the ASTs to the disk. There are really 2 concerns here. First one is the high disk usage, which is a blocker for having higher adoption. Second is that I am not sure how complete and reliable AST serialization and deserialization are. Are those components used for something else that is running in production or are we just utilizing -ast-dump, which is used for debugging? I do not quite understand why AST serialization is needed at all. Can we instead recompile the translation units on demand into a separate ASTContext and then ASTImport?

According to our measurements there are some optimization possibilities in the serialized AST format. Even though it might not get us an order of magnitude improvement, it can be still very significant. The main reason, why the AST dumps are so large: duplicated AST parts from the headers. Once modules are supported and utilized, the size could be reduced once again significantly.
The serialization/deserialization of AST nodes should be very reliable. The same mechanisms are used for precompiled headers and modules.

AST serialization is there to avoid the time overhead of reparsing translation units. This can be a big win for translation units that have metaprograms. Technically, I can not see any obstackles in reparsing a translation unit on demand, but we did not measure how much slower would that be. 
Having a serialized format could also make it possible to only load fragmets of the AST into the memory instead of the whole translation unit. (This feature is currently not utilized by this patch.)

================
Comment at: include/clang/AST/Decl.h:53
 class VarTemplateDecl;
+class CompilerInstance;

----------------
dcoughlin wrote:
> Is this needed? It seems like a layering violation.
Good catch, this is redundant. I will remove this in the next patch. 

================
Comment at: include/clang/AST/Mangle.h:59
+  // the static analyzer.
+  bool ShouldForceMangleProto;

----------------
dcoughlin wrote:
> I'm pretty worried about using C++ mangling for C functions. It doesn't ever seem appropriate to do so and it sounds like it is papering over problems with the design.
> 
> Some questions:
> - How do you handle when two translation units have different C functions with the same type signatures? Is there a collision? This can arise because of two-level namespacing or when building multiple targets with the same CTU directory.
> - How do you handle when a C function has the same signature as a C++ function. Is there a collision when you mangle the C function?
I agree that using C++ mangling for C+ is not the nicest solution, and I had to circumvent a problem in the name mangler for C prototypes.

In case a mangled name is found in multiple source files, it will not be imported. This is the way how collisions handled regardless of being C or C++ functions. 
The target arch is appended to the mangled name to support the cross compilation scenario. Currently we do not add the full triple, but this could be done.

An alternative solution would be to use USRs instead of mangled names but we did not explore this option in depth yet. 

================
Comment at: lib/AST/ASTContext.cpp:1481
+  assert(!FD->hasBody() && "FD has a definition in current translation unit!");
+  if (!FD->getType()->getAs<FunctionProtoType>())
+    return nullptr; // Cannot even mangle that.
----------------
a.sidorin wrote:
> This needs a FIXME because currently many functions cannot be analyzed because of it.
What do you mean here?

https://reviews.llvm.org/D30691