[PATCH] D51314: Parse compile commands lazily in InterpolatingCompilationDatabase
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Aug 28 01:36:22 PDT 2018
sammccall added a comment.
Thanks for finding this problem, this fix *mostly* looks good (though I think we can probably drop memoization).
I'm a bit uncomfortable about the places where we need the type, because this is the only thing forcing us to parse before we've picked a command to transfer, and the number of commands we need to parse is data-dependent and hard to reason about.
Let me think about this a little - I suspect slightly more invasive changes (change the concept of type, tweak the heuristics, or do a "lightweight parse" to get the type) might make this cleaner and performance more predictable.
================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:440
Result.emplace_back(std::move(Command));
- if (Result.back().Type == types::TY_INVALID)
- Result.pop_back();
----------------
ilya-biryukov wrote:
> We can't look at 'Type' at this point anymore, because it needs parsing of TranserableCommands. Not sure what's the best way to replace it. @sammccall, any ideas?
>
So filtering out this has a couple of effects
- it's a performance optimization (don't bother indexing filenames for useless files). We don't need this
- it prevents a TY_INVALID command being chosen for transfer. I'm not sure whether this would occur often enough to be a problem in practice.
================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:133
+ assert(TraitsComputed && "calling transferTo on moved-from object");
+ const CommandTraits &T = getTraits();
+ CompileCommand Result = T.Cmd;
----------------
I think you're overthinking things with the memoization here (of course I say this as the person who underthought it!)
AIUI, the problem is that *eagerly* parsing all the compile commands takes 3x as long as reading them, which hurts startup time with big `compile_commands.json`.
But I think we can afford to just parse them when `transferTo` is called, without memoization. (Remember we only hit this code path when querying a file *not in the CDB*, so it should never get called in a tight loop).
The benefit of slightly reducing the latency of`getCompileCommand` for unknown files when we happen to pick the same template file for the second time... it's unclear to me, and the code would be easier to follow without it.
================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:159
+ llvm::StringRef filename() const {
+ assert(TraitsComputed && "calling filename on moved-from object");
+ return OriginalCmd.Filename;
----------------
Is this so important to dynamically check? Most types don't.
================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:171
+ // compute, so we only compute on first access.
+ struct CommandTraits {
+ // Flags that should not apply to all files are stripped from CommandLine.
----------------
Traits is a bit vague, and a bit template-nightmare-y!
maybe ParsedCommand?
================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:383
S.Preferred = PreferredLanguage == types::TY_INVALID ||
- PreferredLanguage == Commands[S.Index].Type;
+ PreferredLanguage == Commands[S.Index].type();
S.Points = Candidate.second;
----------------
I think this is going to force parsing of all candidates that get any points at all, with a flat directory structure this could be quite a lot :-(
================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:383
S.Preferred = PreferredLanguage == types::TY_INVALID ||
- PreferredLanguage == Commands[S.Index].Type;
+ PreferredLanguage == Commands[S.Index].type();
S.Points = Candidate.second;
----------------
sammccall wrote:
> I think this is going to force parsing of all candidates that get any points at all, with a flat directory structure this could be quite a lot :-(
ah, now I see that the memoization also allows us to pretend that this is an eagerly computed value, without considering exactly when it's computed.
I'm not sure I like this if we do consider it performance sensitive - it obfuscates exactly which set of commands we parse, it would be nice if we were upfront about this.
Repository:
rC Clang
https://reviews.llvm.org/D51314
More information about the cfe-commits
mailing list