[PATCH] D75665: [analyzer] On-demand parsing capability for CTU

Endre Fülöp via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 27 04:18:11 PDT 2020


gamesh411 marked 2 inline comments as done.
gamesh411 added a comment.

The remaining documentation and test changes are also underway.



================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:227
+    /// Identifier.
+    virtual LoadResultTy load(StringRef Identifier) = 0;
+    virtual ~ASTLoader() = default;
----------------
martong wrote:
> xazax.hun wrote:
> > martong wrote:
> > > xazax.hun wrote:
> > > > I am not sure if this is good design.
> > > > Here, if the meaning of the `Identifier` depends on the subclass, the caller of this method always needs to be aware of the dynamic type of the object. What is the point of a common base class if we always need to know the dynamic type?
> > > > 
> > > > Looking at the code this does not look bad. But it might be a code smell.
> > > The way how we process the `extDefMapping` file is identical in both cases. That's an index, keyed with the `USR`s of functions and then we get back a value. And the way how we use that value is different. In the PCH case that holds the path for the `.ast` file, in the ODM case that is the name of the source file which we must find in the compile db. So, I think the process of getting the AST for a USR requires the polymorphic behavior from the loaders.
> > > 
> > > We discussed other alternatives with Endre. We were thinking that maybe the `extDefMapping` file should be identical in both cases. But then we would need to add the `.ast` postfixes for the entries in the PCH case. And we cannot just do that, because we may not know if what is the correct postfix. The user may have generated `.pch` files instead. Also, we don't want to compel any Clang user to use CodeChecker (CC will always create `.ast` files). CTU should be running fine by manually executing the independent steps.
> > Let me rephrase my concerns a bit. Do we really need a polymorphic `ASTLoader` to be present for the whole analysis? Wouldn't it make more sense to always do the same thing, i.e. if we are given a pch file load it, if we are given a source file, parse it? This way we would not be restricted to on-demand or two pass ctu analysis, but we could do any combination of the two.
> > 
> Well yeah, we could do that, it is a good idea, thanks! We will consider this in the future. I like in this idea that the command line options to Clang would be simplified. But then we must be transparent and show/log the user which method we are using.
I will implement this separation by suffixes in the next change.


================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:285
   public:
-    ASTUnitStorage(const CompilerInstance &CI);
+    ASTUnitStorage(CompilerInstance &CI);
     /// Loads an ASTUnit for a function.
----------------
xazax.hun wrote:
> Why is this no longer const?
Most usages of the CI require a non-const ref (this is due to lacking API support for getting analyzer options), and I found it more convenient to just take a non-const ref, instead of const_casting it away later. Not sure myself, but I think I will roll with this now (and I should definitely check the API of CompilerInstance).


================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:234
+  public:
+    explicit ASTFileLoader(CompilerInstance &CI, StringRef CTUDir);
+
----------------
whisperity wrote:
> Is the `explicit` needed here?
This constructor started out with having a single param only. I agree, that `explicit` is no longer necessary.


================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:260
+    /// that produce the AST used for analysis.
+    StringRef OnDemandParsingDatabase;
+
----------------
martong wrote:
> Should we rename this member to ....PathTo....InvocationList... ?
Renamed to `InvocationListFilePath`.


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:122-123
       return "Load threshold reached";
+    case index_error_code::ambiguous_compilation_database:
+      return "Compilation database contains multiple references to the same "
+             "source file.";
----------------
whisperity wrote:
> still using terms "compilation database", is this intended?
renamed


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:362
 
-  return ASTUnit::LoadFromASTFile(
-      std::string(ASTFilePath), CI.getPCHContainerOperations()->getRawReader(),
-      ASTUnit::LoadEverything, Diags, CI.getFileSystemOpts());
+  auto LoadFromFile = [this](StringRef Path) {
+    IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts = new DiagnosticOptions();
----------------
balazske wrote:
> Is here a lambda necessary? I think it must not be over-used. In this case we can construct the file path as the first step and then use it to load the file, without using lambda call.
Refactored to use path concatenation.


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:581
+  SmallVector<const char *, 32> CommandLineArgs(InvocationCommand.size());
+  std::transform(InvocationCommand.begin(), InvocationCommand.end(),
+                 CommandLineArgs.begin(),
----------------
martong wrote:
> Could we avoid this transfer if InvocationList was storing `const char *` values instead of std::strings? Why can't we store `char*`s in InvocationList?
We certainly could, I think there won't be null-characters in paths so storing char* s would not lose generality. However, I think the memory management done by std::string is worth it, as it is clear where the character-data lives in memory.


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:583
+                 CommandLineArgs.begin(),
+                 [](auto &&CmdPart) { return CmdPart.c_str(); });
+
----------------
whisperity wrote:
> balazske wrote:
> > Is here a special reason to use `auto &&` type (and not `auto &` or `std::string &`)? Probably this code is faulty: The `CmdPart` is moved out and then destructed, before the content of it (where the `c_str` points to) is used?
> @balazske This is a generic lambda where `auto&&` is the universal reference. It is equivalent to writing `template <typename T> f(T&& param)`. It obeys the usual point of instantiation and reference cascading rules, if a `const std::string&` is given as `T`, `const std::string& &&` will become a single `&`.
Results of my research: In theory, the STL could do something fancy here if it wanted, but it is not the case in reality. If you want move-operations with algorithms (the ones not explicitly mentioning move in their names) you have to be explicit about it: [[ https://www.fluentcpp.com/2017/04/25/move-iterators/ | move-iterators-fluent-cpp ]]. 


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:609
+  llvm::SourceMgr SM;
+  llvm::yaml::Stream InvocationFiles(*ContentBuffer, SM);
+
----------------
whisperity wrote:
> Are the headers from which these types come from included?
Included the llvm/Support/YAMLParser.h


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:120
+    case index_error_code::ambiguous_compile_commands_database:
+      return "Compile commands database contains multiple references to the "
+             "same source file.";
----------------
whisperity wrote:
> martong wrote:
> > xazax.hun wrote:
> > > Unfortunately, this is a very common case for a large number of projects that supports multiple targets (e.g. 32 and 64 bits). Is there a plan to mitigate this problem?
> > I don't think we could do that. We need to merge ASTs of those TUs that are linkable. Otherwise the merge will be unsuccessful because of certain mismatches in Decls (structural eq will fail).
> > Even in CodeChecker we are planning to issue a hard error in these ambiguous cases, even with the PCH method too. (The error would be suppressible by the user of course, but if that is suppressed then we take no responsibility.)
> More clever logging of link commands and creating separate CDBs for each target or each binary could help this cause, but that needs infrastructural capabilities from both the build system and the analysis driver (i.e. CodeChecker).
I have solved the problem of supporting multiple architectures by offloading the responsibility to the user (to the tool driving the analysis). The invocation list file is now supposed to contain only invocations with the same arch/target and this should also be the same as the arch/target of the main TU analyzed. If there is still ambiguity after this separation, this error should nevertheless be emitted ( IMHO ).


================
Comment at: clang/test/Analysis/Inputs/ctu-other.c:34-38
+// TODO: Support the GNU extension asm keyword as well.
+// Example using the GNU extension: asm("mov $42, %0" : "=r"(res));
 int inlineAsm() {
   int res;
+  __asm__("mov $42, %0"
----------------
whisperity wrote:
> Possibly unrelated change?
Funny enough, this is very relevant. Dumping the AST does not errors out on this construction, however, the parser in on-demand mode does. This __asm__ syntax is however supported by both. To document this, I have added a TODO as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75665/new/

https://reviews.llvm.org/D75665





More information about the cfe-commits mailing list