[cfe-dev] RFC: prototype of clang-scan-deps, faster dependency scanning tool for explicit modules and clangd

Tue Nov 6 10:22:40 PST 2018

Thanks for sending this out!

Yeah, I'm super interested in how (future standard) C++ modules will
interact with build systems, as it's unlikely to be feasible to use an
implicit compilation model, in part because of the code generation/linkage
requirements. (You could put everything from a C++ Modules definition in
comdats, etc., as is done for headers today, rather than in separate object
files, but that's not really how it's meant to work.)

All models boil down to something like this: the build system has some
explicit knowledge (through library dependencies within a project) and has
to do some discovery before executing any compilation steps. Discovery is
needed both to find external dependencies (the standard library, if/once
it's modularized and used as such, and other external libraries written
using modules) and to reduce internal dependencies. Not all code in a
library depends on every library that library depends on, so by discovering
the specific modular imports used in a given module, that module may be
built sooner: as soon as the subset of its library's dependencies that it
actually needs has been built. After that, the build system would ideally
pass the compilation inputs/outputs around explicitly, rather than relying
on the compiler to discover them itself in a cache directory or the like.
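
To make the "build it sooner" part concrete, here's a rough sketch, assuming discovery has already produced, for each module, the set of modules it imports. build_waves and the module names are hypothetical; it's plain Kahn-style topological ordering, grouped into waves that could each be built in parallel.

```python
# Hypothetical sketch: schedule module builds from discovered imports.
def build_waves(imports: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into waves: each module in a wave imports only modules
    from earlier waves, so a wave's modules can be built in parallel."""
    # Number of not-yet-built imports per module, plus reverse edges.
    remaining = {m: len(deps) for m, deps in imports.items()}
    dependents: dict[str, list[str]] = {m: [] for m in imports}
    for m, deps in imports.items():
        for d in deps:
            if d not in imports:
                raise ValueError(f"{m} imports unknown module {d}")
            dependents[d].append(m)

    waves: list[list[str]] = []
    ready = sorted(m for m, n in remaining.items() if n == 0)
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for m in ready:
            # Building m satisfies one import of each module that imports it.
            for user in dependents[m]:
                remaining[user] -= 1
                if remaining[user] == 0:
                    next_ready.append(user)
        ready = sorted(next_ready)

    if scheduled != len(imports):
        raise ValueError("import cycle: some modules can never be built")
    return waves


# Hypothetical modules: "lexer" imports only "std", so it can start alongside
# "support" instead of waiting for everything its library depends on.
imports = {
    "std": set(),
    "support": {"std"},
    "ir": {"support"},
    "codegen": {"ir"},
    "lexer": {"std"},
}
print(build_waves(imports))
# [['std'], ['lexer', 'support'], ['ir'], ['codegen']]
```

With only library-level knowledge, "lexer" might have had to wait for "support" too; the per-module imports are what let it move into the second wave.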

You mentioned a few performance metrics:

> Up to 10x speedup in non-modular dependency scanning

What do you mean by non-modular dependency scanning? (What's the
non-modular part, and what is it in contrast to?)

> 4x when run on the first 1000 files in Clang's compilation database,
> compared to clang -Eonly

So this is running the whole tool, including generating the trimmed
preprocessed files and then reading those to discover the header module
dependencies, compared to running -Eonly and then scanning those files?
And what form is the output currently in? .d-like files?
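
For comparison, the .d files clang writes with -MD/-MMD are Make rules: a target, a colon, then the files it depends on, with backslash-newline continuations. If the output does end up .d-like, consuming it could look roughly like this minimal sketch. parse_depfile is a hypothetical helper, and it deliberately ignores escaped spaces and Windows drive letters, which a real parser has to handle.

```python
def parse_depfile(text: str) -> dict[str, list[str]]:
    """Parse a simple Make-style .d file into {target: [prerequisites]}.

    Assumes paths contain no escaped spaces and no Windows drive letters,
    so the first ':' on each logical line separates targets from inputs."""
    # A backslash at the end of a line continues the rule on the next line.
    joined = text.replace("\\\r\n", " ").replace("\\\n", " ")
    rules: dict[str, list[str]] = {}
    for line in joined.splitlines():
        if not line.strip():
            continue
        targets, sep, prereqs = line.partition(":")
        if not sep:
            raise ValueError(f"not a Make rule: {line!r}")
        deps = prereqs.split()
        for target in targets.split():
            rules.setdefault(target, []).extend(deps)
    return rules


depfile = "foo.o: foo.cpp \\\n  include/foo.h \\\n  include/bar.h\n"
print(parse_depfile(depfile))
# {'foo.o': ['foo.cpp', 'include/foo.h', 'include/bar.h']}
```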

You mention relying on the compilation database for discovering the files
to run over: is this the long-term goal/design, or a current stepping
stone? I was about to say that seems circular (thinking that the
compiler/compilation phase generates the compilation database), but then
realized/remembered that it's the build system that generates it, not the
compiler, so you can have/use/run over the compilation database before
compilation has begun. Sounds good. So the build system would need a phase,
after generating the compilation database, that runs this tool and then
adds the module compilations produced by the tool to the list of commands
it will execute (and probably also adds them back into the compilation
database, too, really).

So, as you mentioned (maybe in the Phabricator review), the format of this
tool's output is still to be decided, but the input is currently a
compilation database (the classic JSON one, I assume, though if the tool
uses the compilation database access APIs, other sources implemented behind
that API could be used too). Cool cool.
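
A rough sketch of that extra build-system phase, assuming the classic JSON compilation database (an array of entries with "directory", "file", and either a "command" string or an "arguments" list). The scan callable stands in for the tool, and its return shape, plan_build, and the "build-module" command are all hypothetical.

```python
import json
import shlex
from typing import Callable

# (working directory, argv) for one command the build will run.
Command = tuple[str, list[str]]


def load_compile_commands(text: str) -> list[tuple[str, str, list[str]]]:
    """Read a JSON compilation database into (directory, file, argv) tuples."""
    entries = []
    for entry in json.loads(text):
        if "arguments" in entry:
            argv = list(entry["arguments"])
        else:
            # "command" is a single shell-escaped string; split it POSIX-style.
            argv = shlex.split(entry["command"])
        entries.append((entry["directory"], entry["file"], argv))
    return entries


def plan_build(db_text: str,
               scan: Callable[[str, list[str]], list[Command]]) -> list[Command]:
    """Run the (hypothetical) scanner over every entry, then schedule the
    module compilations it reports ahead of the original TU compilations."""
    entries = load_compile_commands(db_text)
    module_cmds: list[Command] = []
    seen = set()
    for directory, _file, argv in entries:
        for cmd_dir, cmd_argv in scan(directory, argv):
            # Many TUs import the same module; build each module only once.
            key = (cmd_dir, tuple(cmd_argv))
            if key not in seen:
                seen.add(key)
                module_cmds.append((cmd_dir, cmd_argv))
    tu_cmds = [(directory, argv) for directory, _file, argv in entries]
    return module_cmds + tu_cmds


db = json.dumps([
    {"directory": "/build", "file": "a.cpp", "command": "clang++ -c a.cpp -o a.o"},
    {"directory": "/build", "file": "b.cpp", "arguments": ["clang++", "-c", "b.cpp"]},
])


def fake_scan(directory: str, argv: list[str]) -> list[Command]:
    # Stand-in for the scanning tool: pretend every TU imports one module.
    return [(directory, ["build-module", "Support"])]


for cmd in plan_build(db, fake_scan):
    print(cmd)
```

This only puts module builds before TU builds; ordering among the module builds themselves would still come from their own discovered imports.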

Thanks again!

- Dave

On Tue, Oct 16, 2018 at 6:53 PM Alex L <arphaman at gmail.com> wrote:

> Hi,
>
>
> Bruno (CCed), Duncan (CCed) and I have been exploring whether we can
> migrate some of our clients to explicit modules. As part of this work
> Duncan and I developed a new prototype dependency scanning service tool
> (clang-scan-deps) that computes the set of file dependencies for a
> particular compiler invocation using some optimizations that are outlined
> below. This tool makes non-modular dependency scanning up to 10 times
> faster for some workloads (e.g. the llc target, 1542 C++ files) on one of
> our machines, compared to parallel invocations of clang with -Eonly.
> We are still in the early stages of proper modules support, but our initial
> crude prototype achieves up to a 4x speedup when run on the first 1000
> files from clang’s compilation database for a build of LLVM with modules
> turned on.
>
>
> We still run the full Clang preprocessor. Here’s what we do to reduce its
> workload:
>
>    - Minimize sources by stripping away unused tokens. We keep only the
>    interesting PP directives (#define, #if, #include, etc.), i.e. those that
>    might impact the set of dependencies.
>    - Assume the filesystem is immutable for one run of the service, and
>    cache the files and their minimized contents in memory in a global cache.
>    - Skip over excluded preprocessor ranges by bumping up the buffer
>    pointer in the lexer instead of lexing the skipped tokens.
>
>
> We intend to upstream this service in the upcoming months. We would also
> like to integrate this service into Clangd, as part of our migration to
> Clangd, to help us determine a good compilation command for a header file
> from a set of known compilation invocations.
>
>
> I posted a very rough WIP patch on Phabricator (
> https://reviews.llvm.org/D53354). It’s based on LLVM checkout r343343.
> Please take a look if you’re interested.
>
> Duncan, Bruno and I will be at the LLVM dev meeting. We are interested in
> discussing this prototype and collecting feedback from anyone who might be
> interested in this work.
>
>
> Thanks,
>
> Alex
>
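
For reference, the three workload reductions listed in the quoted message can be sketched roughly as follows. This is a hypothetical Python illustration, not the code in D53354. KEPT is a guess at which directives matter, the line-based minimize ignores comments and string literals that a real minimizing lexer must handle, and skip_map is one way (a precomputed jump table) to get the effect of skipping an excluded branch without lexing it.

```python
import re
import threading
from pathlib import Path

# Directive names assumed to be able to affect the dependency set. This is an
# illustrative guess, not the exact list the prototype keeps.
KEPT = {"define", "undef", "include", "include_next", "import",
        "if", "ifdef", "ifndef", "elif", "else", "endif"}
DIRECTIVE = re.compile(r"^\s*#\s*(\w+)")


def minimize(source: str) -> list[str]:
    """Keep only the directives in KEPT and drop every other token.

    Simplification: this works line by line and does not understand comments,
    string literals or raw strings, which a real minimizing lexer must."""
    # Fold backslash-newline continuations so multi-line #defines stay whole.
    source = source.replace("\\\n", "")
    kept = []
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match and match.group(1) in KEPT:
            kept.append(line.strip())
    return kept


def skip_map(directives: list[str]) -> dict[int, int]:
    """Map each #if/#ifdef/#ifndef/#elif/#else to the index of the next
    directive at the same nesting level, so an excluded branch can be
    jumped over in one step instead of being re-scanned."""
    jumps: dict[int, int] = {}
    open_branches: list[int] = []  # innermost open branch is last
    for i, directive in enumerate(directives):
        name = DIRECTIVE.match(directive).group(1)
        if name in ("if", "ifdef", "ifndef"):
            open_branches.append(i)
        elif name in ("elif", "else", "endif"):
            if not open_branches:
                raise ValueError(f"#{name} without matching #if at {i}")
            jumps[open_branches.pop()] = i
            if name != "endif":
                open_branches.append(i)  # the new branch is now the open one
    if open_branches:
        raise ValueError("unterminated conditional")
    return jumps


class MinimizedFileCache:
    """Reads and minimizes each file at most once per run, assuming (as the
    RFC does) that the filesystem does not change during the run."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # shared by all scanning threads
        self._entries: dict[Path, tuple[list[str], dict[int, int]]] = {}

    def get(self, path: str) -> tuple[list[str], dict[int, int]]:
        key = Path(path).resolve()
        with self._lock:
            if key not in self._entries:
                directives = minimize(key.read_text())
                self._entries[key] = (directives, skip_map(directives))
            return self._entries[key]


source = '#include "a.h"\nint x = 1;\n#ifdef FOO\n#include "b.h"\n#else\nvoid f();\n#endif\n'
directives = minimize(source)
print(directives)
print(skip_map(directives))
# ['#include "a.h"', '#ifdef FOO', '#include "b.h"', '#else', '#endif']
# {1: 3, 3: 4}
```

The single coarse lock keeps the sketch simple; a real shared service would likely want finer-grained locking so that threads minimizing different files don't serialize on each other.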