[all-commits] [llvm/llvm-project] 028717: [clang][deps] Include canonical invocation in Cont...

Thu Jul 28 12:24:31 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 02871701400253a49de502a5fef770f92772f6bc
      https://github.com/llvm/llvm-project/commit/02871701400253a49de502a5fef770f92772f6bc
  Author: Ben Langmuir <blangmuir at apple.com>
  Date:   2022-07-28 (Thu, 28 Jul 2022)

  Changed paths:
    M clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
    M clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
    A clang/test/ClangScanDeps/modules-context-hash-ignore-macros.c
    A clang/test/ClangScanDeps/modules-context-hash-module-map-path.c
    A clang/test/ClangScanDeps/modules-context-hash-outputs.c
    A clang/test/ClangScanDeps/modules-context-hash-warnings.c

  Log Message:
  -----------
  [clang][deps] Include canonical invocation in ContextHash

The "strict context hash" is insufficient to identify module
dependencies during scanning: different module build commands can be
produced for a single module, and the scanner chooses between them
non-deterministically. This commit switches to hashing the canonicalized
`CompilerInvocation` of the module. Hashing the invocation converts
these correctness issues into performance issues, and we can then
incrementally improve our ability to canonicalize command-lines.
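
As a rough illustration of the idea (not the actual ModuleDepCollector
code), the context hash can be thought of as a digest over the
canonicalized argument list. In this Python sketch, `canonicalize` and
`context_hash` are hypothetical names, and the rule of dropping `-v` and
`-fcolor-diagnostics` is only an assumption chosen to show the shape of
the approach:

```python
import hashlib

def canonicalize(args, irrelevant=("-v", "-fcolor-diagnostics")):
    """Drop flags that this sketch ASSUMES do not affect the built module."""
    return [a for a in args if a not in irrelevant]

def context_hash(args):
    """Digest of the canonicalized arguments, with order preserved."""
    h = hashlib.sha256()
    for arg in canonicalize(args):
        h.update(arg.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()[:16]

# Invocations that differ only in a canonicalized-away flag share a hash.
a = context_hash(["-fmodules", "-Wall", "-v"])
b = context_hash(["-fmodules", "-Wall"])
# Invocations with different -W options get different hashes, i.e. an
# extra module build rather than an arbitrary choice between commands.
c = context_hash(["-fmodules", "-Wextra"])
```

Anything the canonicalization step fails to normalize shows up as an
additional module variant, which costs build time but stays correct.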

This change can cause a regression in the number of modules needed. Of
the 4 projects I tested, 3 had no regression, but 1, which was
clang+llvm itself, had a 66% regression in number of modules (4%
regression in total invocations). This is almost entirely due to
differences between -W options across targets. Of the additional
modules, 25% are system modules, which we could avoid if we
canonicalized -W options when -Wsystem-headers is not present;
unfortunately this is non-trivial because some warnings are enabled in
system headers by default. The rest of the additional modules are mostly
real differences in potential warnings, reflecting incorrect behaviour
in the current scanner.

There were also a couple of differences due to `-DFOO` combined with
`-fmodules-ignore-macro=FOO`, which I fixed here.
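
The ignored-macro case can be sketched as a filtering step applied
before hashing. This is a Python illustration only: `drop_ignored_macros`
is a hypothetical helper, it uses clang's `-fmodules-ignore-macro=`
spelling, and keeping the ignore flag itself in the result is an
assumption rather than a description of what the commit does:

```python
def drop_ignored_macros(args):
    """Remove -D definitions of macros named by -fmodules-ignore-macro=."""
    ignore_prefix = "-fmodules-ignore-macro="
    ignored = {a[len(ignore_prefix):] for a in args
               if a.startswith(ignore_prefix)}
    kept = []
    for a in args:
        if a.startswith("-D"):
            # "-DNAME" or "-DNAME=value": compare on the macro name only.
            name = a[2:].split("=", 1)[0]
            if name in ignored:
                continue
        kept.append(a)
    return kept

with_foo = drop_ignored_macros(
    ["-DFOO", "-fmodules-ignore-macro=FOO", "-DBAR=1"])
without_foo = drop_ignored_macros(
    ["-fmodules-ignore-macro=FOO", "-DBAR=1"])
```

With this filter, the two command lines above reduce to the same
argument list and therefore hash identically.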

Since the output paths for the module depend on its context hash, we
hash the invocation before filling in outputs, and rely on the build
system to always return the same output paths for a given module.
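
The ordering constraint can be sketched as follows (Python, illustrative
only). `lookup_output` stands in for a build-system callback and the
`/build/modules/...` path layout is invented for the example; neither is
taken from the scanner's API:

```python
import hashlib

def hash_args(args):
    """Digest of an argument list (stand-in for the context hash)."""
    return hashlib.sha256("\0".join(args).encode("utf-8")).hexdigest()[:16]

def lookup_output(module_name, ctx_hash):
    """Hypothetical build-system callback; must be deterministic."""
    return f"/build/modules/{module_name}-{ctx_hash}.pcm"

def finalize(module_name, args):
    # 1. Hash the invocation while it has no output paths yet.
    ctx_hash = hash_args(args)
    # 2. Only then ask the build system where the module should go.
    pcm = lookup_output(module_name, ctx_hash)
    # 3. Fill in the output; the hash is not recomputed afterwards.
    return ctx_hash, args + ["-o", pcm]

h1, cmd1 = finalize("Foo", ["-fmodules", "-Wall"])
h2, cmd2 = finalize("Foo", ["-fmodules", "-Wall"])
```

Hashing after step 3 would be circular, since the output path already
embeds the hash; and if `lookup_output` were not deterministic, the same
module+hash could again end up with different command-lines.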

Note: since the scanner itself uses an implicit modules build, there can
still be non-determinism, but it will now present as different
module+hashes rather than different command-lines for the same
module+hash.

Differential Revision: https://reviews.llvm.org/D129884