[cfe-dev] RFC: prototype of clang-scan-deps, faster dependency scanning tool for explicit modules and clangd

Alex L via cfe-dev cfe-dev at lists.llvm.org
Fri Dec 7 17:10:30 PST 2018

Sorry for the late reply; I only recently got back to working on this.
I posted a patch for the source minimization
(https://reviews.llvm.org/D55463) as part of starting our upstreaming work.

+ Michael (who will be working on the explicit modules support).

On Tue, 6 Nov 2018 at 10:22, David Blaikie <dblaikie at gmail.com> wrote:

> Thanks for sending this out!
> Yeah, I'm super interested in how (future standard) C++ modules will
> interact with build systems, as it's unlikely to be feasible to use an
> implicit compilation model (in part because of the code generation/linkage
> requirements - you could put everything from a C++ Modules definition in
> comdats, etc as is done for headers today (rather than in separate object
> files), but not really how it's meant to work).
> All models boil down to something like this - the build system having some
> explicit knowledge (through library dependencies within a project) and
> having to do some discovery (to find external dependencies (the standard
> library (if/once modularized and used as such), other external libraries
> written using modules) and to reduce internal dependencies (not all code in
> one library depends on all the libraries that library depends on - so by
> discovering the specific modular imports used in a given module, that
> module may be able to be built sooner (when only some of its libraries
> dependencies have been built, because it only needs that subset)) before
> executing any compilation steps (& then, ideally, passing around the
> compilation inputs/outputs rather than relying on the compiler to discover
> them itself in a cache directory or the like).
> You mentioned a few performance metrics
> Up to 10x speedup in non-modular dependency scanning - what do you mean by
> non-modular dependency scanning? (what's the non-modular part - in contrast
> to?)

By non-modular dependency scanning I mean getting the dependency list of a
regular compilation, that is, figuring out all of the headers included by a
compilation that doesn't use -fmodules.
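
As a rough illustration of what a "dependency list" means here, the toy
sketch below walks quoted #include lines over a hypothetical in-memory file
map. It is not how clang-scan-deps works: the tool runs the full Clang
preprocessor, so it honors macros, conditionals and search paths, all of
which this sketch ignores. The names scan_deps and files are made up for
the example.

```python
import re

# Matches only the quoted form (#include "header.h"); angle-bracket
# includes, macros and conditionals are deliberately ignored in this toy.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def scan_deps(main_file, files):
    """Return the headers reachable from main_file, in first-seen order.

    `files` is a hypothetical in-memory map of file name -> contents.
    """
    seen = []
    stack = [main_file]
    while stack:
        name = stack.pop()
        for header in INCLUDE_RE.findall(files.get(name, "")):
            if header not in seen:
                seen.append(header)
                stack.append(header)
    return seen

files = {
    "main.cpp": '#include "a.h"\n#include "b.h"\nint main() {}\n',
    "a.h": '#include "c.h"\n',
    "b.h": '#include "c.h"\n',
    "c.h": "",
}
deps = scan_deps("main.cpp", files)
```

Each header is reported once even when it is reached through several
include chains, which is the shape of result a build system needs.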

> 4x when run on the first 1000 files in Clang's compilation database,
> compared to clang -Eonly - so this is running the whole tool, including
> generating the trimmed preprocessed files, and then reading those to
> discover the header module dependencies, compared to running -Eonly, then
> scanning those files? & the output is currently in what form? .d-like files?

Yes, the 4x number compares two times. The first is the time the tool takes
to preprocess all the files with source minimization, build all of the
implicit modules, and then create the list of dependencies for all
compilations in a compilation database. The second is the time it takes to
run parallel clang -Eonly invocations with the regular implicit modules path
for dependency discovery.

The output is either printed out by the tool or saved to .d files.
Ultimately an explicit module builder will consume it in a different way.
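
For readers unfamiliar with .d files, the sketch below renders a
Makefile-style dependency rule from a target and a list of dependencies.
The helper names and the sample paths are invented for illustration, and
the exact layout the tool emits may differ.

```python
def escape(path):
    # Make treats spaces as separators, so escape them in file names.
    return path.replace(" ", "\\ ")

def format_depfile(target, deps):
    """Render `target: dep1 \\` followed by one continuation line per dep."""
    body = " \\\n  ".join(escape(d) for d in deps)
    return escape(target) + ": " + body + "\n"

rule = format_depfile("main.o", ["main.cpp", "a.h", "my dir/b.h"])
print(rule, end="")
```

A build system can include such a file to learn which headers should
trigger a rebuild of the target.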

> You mention relying on the compilation database for discovering the files
> to run over - is this the long term goal/design, or a current stepping
> stone? I was about to say that seems circular (thinking that the
> compiler/compilation phase generates the compilation database) but then
> realized/remembered that it's the build system that generates that, not the
> compiler, so you can have/use/run over the compilation database before
> compilation has begun. Sounds good. So the build system would have to have
> a phase that runs after generating the compilation database that runs this
> tool, then adds the module compilations produced by this tool to the list
> of commands it will execute (& probably also adds them back into the
> compilation database, too, really).

We rely on the CDB solely for the dependency discovery to simplify testing
and integration with existing project builds. It's definitely not the final
goal of what an integration with a build system should look like, but it
can be a way for the build system to feed in the compilations to the tool
if it desires to do so.
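
To make the input side concrete, here is a minimal sketch of reading
entries from a JSON compilation database (the compile_commands.json
format, where each entry has "directory", "file", and either "command" or
"arguments"). The sample entries and the load_commands helper are
hypothetical, and this is not the tool's own loading code.

```python
import json
import shlex

# Two made-up entries, one using "command" and one using "arguments".
CDB_JSON = """
[
  {"directory": "/build",
   "command": "clang++ -Iinclude -c ../src/a.cpp -o a.o",
   "file": "../src/a.cpp"},
  {"directory": "/build",
   "arguments": ["clang++", "-c", "../src/b.cpp"],
   "file": "../src/b.cpp"}
]
"""

def load_commands(text):
    """Yield (directory, file, argv) for every entry in the database."""
    for entry in json.loads(text):
        # Prefer the pre-split "arguments" list; otherwise split "command".
        argv = entry.get("arguments") or shlex.split(entry["command"])
        yield entry["directory"], entry["file"], argv

commands = list(load_commands(CDB_JSON))
```

Each tuple describes one compilation that a scanner could be asked to
process.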

> So, as you mentioned (maybe in the phab review), the format of the output
> of this tool is still unknown, but the input is currently a (currently the
> classic json, I assume - but if the tool uses the compilation database
> access APIs, other sources implemented in that API could be used)
> compilation database - cool cool.
> Thanks again!
> - Dave
> On Tue, Oct 16, 2018 at 6:53 PM Alex L <arphaman at gmail.com> wrote:
>> Hi,
>> Bruno (CCed), Duncan (CCed) and I have been exploring if we can migrate
>> some of our clients to explicit modules. As part of this work Duncan and I
>> developed a new prototype dependency scanning service tool
>> (clang-scan-deps) that computes the set of file dependencies for a
>> particular compiler invocation using some optimizations that are outlined
>> below. This tool makes the non-modular dependency scanning up to 10 times
>> faster for particular workloads (e.g. llc target, 1542 C++ files) on one of
>> our machines, when compared to parallel invocations of clang with -Eonly.
>> We are still in the early stages of proper modules support, but our initial
>> crude prototype can get up to 4x when run on the first 1000 files from
>> clang’s compilation database for a build of LLVM with modules turned on.
>> We still run the full Clang preprocessor. Here’s what we do to reduce its
>> workload:
>>    - Minimize sources by stripping away unused tokens. We keep only the
>>    interesting PP directives (#define, #if, #include, etc.), i.e. those that
>>    might impact the set of dependencies.
>>    - Assume the filesystem is immutable for one run of the service, and
>>    cache the files and their minimized contents in memory in a global cache.
>>    - Skip over excluded preprocessor ranges by bumping up the buffer
>>    pointer in the lexer instead of lexing the skipped tokens.
>> We intend to upstream this service in the upcoming months. We also would
>> like to integrate this service into Clangd as part of our migration to
>> Clangd to help us determine a good compilation command for a header file
>> from a set of known compilation invocations.
>> I posted a very rough WIP patch on Phabricator (
>> https://reviews.llvm.org/D53354). It’s based on LLVM checkout r343343.
>> Please take a look if you’re interested.
>> Duncan, Bruno and I will be at the LLVM dev meeting. We are interested in
>> discussing this prototype and collecting feedback from anyone who might be
>> interested in this work.
>> Thanks,
>> Alex
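
The minimization and caching steps in the quoted proposal above can be
sketched roughly as follows. This is a line-based toy, not the code in the
patch: it does not handle comments, line continuations or string literals,
the set of kept directives is only an illustrative subset, and the names
minimize, minimized_contents and read_file are assumptions made for the
example.

```python
import re

# Directives that can affect which files get included (illustrative subset).
KEEP = {"define", "undef", "include", "import", "if", "ifdef", "ifndef",
        "elif", "else", "endif", "pragma"}
DIRECTIVE_RE = re.compile(r"\s*#\s*([A-Za-z_]+)")

# File name -> minimized contents. Valid only while the filesystem is
# assumed immutable, i.e. for one run of the service.
_cache = {}

def minimize(source):
    """Keep only lines that start with a directive in KEEP."""
    kept = []
    for line in source.splitlines():
        m = DIRECTIVE_RE.match(line)
        if m and m.group(1) in KEEP:
            kept.append(line.strip())
    return "\n".join(kept)

def minimized_contents(name, read_file):
    """Minimize each file at most once; `read_file` is a hypothetical callable."""
    if name not in _cache:
        _cache[name] = minimize(read_file(name))
    return _cache[name]

src = '#include "a.h"\n#ifdef X\n#include "b.h"\n#endif\nint f() { return 1; }\n'
out = minimize(src)
```

The preprocessor then only has to lex the few surviving directive lines,
and repeated includes of the same header hit the in-memory cache.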