[all-commits] [llvm/llvm-project] 84df7a: [clang][modules] Do not resolve `HeaderFileInfo` e...

Jan Svoboda via All-commits all-commits at lists.llvm.org
Thu Apr 11 14:46:10 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 84df7a09f8da5809b85fd097015e5ac6cc8a3f88
      https://github.com/llvm/llvm-project/commit/84df7a09f8da5809b85fd097015e5ac6cc8a3f88
  Author: Jan Svoboda <jan_svoboda at apple.com>
  Date:   2024-04-11 (Thu, 11 Apr 2024)

  Changed paths:
    M clang-tools-extra/clangd/index/SymbolCollector.cpp
    M clang/include/clang/Lex/HeaderSearch.h
    M clang/lib/Lex/HeaderSearch.cpp
    M clang/lib/Serialization/ASTWriter.cpp

  Log Message:
  -----------
  [clang][modules] Do not resolve `HeaderFileInfo` externally in `ASTWriter` (#87848)

Clang uses the `HeaderFileInfo` struct to track bits of information on
header files, which gets used throughout the compiler. We also use this
to compute the set of affecting module maps in `ASTWriter` and in the
end serialize the information into the `HEADER_SEARCH_TABLE` record of a
PCM file, allowing clients to learn about headers from the module. In
doing so, Clang asks for existing `HeaderFileInfo` for all known
`FileEntries`. Note that this asks the loaded PCM files for the
information they have on each header file in question. This seems
unnecessary: we only want to serialize information on header files that
either belong to the current module or that got included textually.
Loaded PCM files can't provide us with any useful information.

For explicit modules with lazy loading (using `-fmodule-map-file=<path>`
with `-fmodule-file=<name>=<path>`) the compiler knows about header
files listed in the module map files on the command-line. This can be a
large number.

Asking for existing `HeaderFileInfo` can trigger deserialization of
`HEADER_SEARCH_TABLE` from loaded PCM files. Keys of the on-disk hash
table consist of the header file size and modification time. However,
with explicit modules Clang zeroes out the modification time. Moreover,
if you import lots of modules, some of their header files end up having
identical sizes. This means lots of hash collisions that can only be
resolved by running the serialized filename through `FileManager` and
comparing equality of the `FileEntry`. This ends up being super
expensive, essentially re-stating lots of the transitively loaded SDK
header files.

This patch cleans up the API for getting `HeaderFileInfo` and makes sure
`ASTWriter` uses the version that doesn't ask loaded PCM files for more
information. This removes the excessive stat traffic coming from
`ASTWriter` hopefully without changing observable behavior.



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list