[PATCH] D104465: [clang][deps] Prevent PCH validation failures by padding minimized files

Jan Svoboda via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 18 07:45:20 PDT 2021


jansvoboda11 added a comment.

Thanks for sharing your ideas! I've left my initial thoughts below, but I want to revisit this and think about it some more.

In D104465#2825343 <https://reviews.llvm.org/D104465#2825343>, @dexonsmith wrote:

> I'm not sure this is the right approach.
>
> - Don't PCHs sometimes validate based on hashes of inputs instead of size? At least, that has been discussed, and this patch would block making the change.

The hash-based validation only kicks in when modification times don't match. The size of input files is always* validated <https://github.com/llvm/llvm-project/blob/cbfb124/clang/lib/Serialization/ASTReader.cpp#L2383-L2452>.

> - Can PCHs embed inputs, like modules can? How would that work (or not) minimized sources generally?

I think PCHs can embed inputs the same way modules can. When reading an AST file, the embedded input files get registered in the current compilation along with their buffers via `SourceManager::createFileID`.
But it doesn't seem to be propagated to the `FileManager` level, which the input file size check uses. I might need to experiment with this a bit to fully understand how this works in practice.

> - What happens if the PCH has a delayed diagnostic, triggered when parsing the next thing? Will the fix-it point in the wrong place? (Are there other things that depend on reading the original sources when reading a PCH?)

Can you point me to the code that handles delayed diagnostics? I can't find anything related in `ASTWriter`.

> A few other ideas:
>
> 1. Disable PCH validation when minimizing. (Is that less robust than your current workaround? If so, can you explain why?)

I think PCH validation (and detection of out-of-date input files) is still useful for the dependency scanner as it might help debugging build failures (e.g. if the input files changed between explicit PCH build and TU dependency scan).
Compared to disabling PCH validation, padding of minimized files is fairly local change that isn't susceptible to concurrency bugs and integrates with the rest of the implicit build infrastructure fairly well IMO.

> 2. Use the original PCH header in the scanning `-cc1`s (translate `-include-pch` to `-include`) and switch back in the generated `-cc1`s (back to `-include-pch`).

The two separate dependency scans (one for PCH, one for TU) could end up using different context hashes and we don't have an easy way to detect this.
If they both use a common module A, the explicitly-built PCH would contains version 1 of A, while the TU would attempt to build with version 2. This is currently not supported in explicit builds AFAIK.
The current implementation works around this by always choosing the module present in the explicitly-built module. How could we make this situation work here?

> 3. Embed instructions in the PCH for how to build it, and make a "minimized" version of the PCH.

That could work. This makes me think if we could emit a minimized PCH during the PCH scan and just reuse it in the TU scan (similarly how we're currently reusing the explicitly-built PCH, but with the file size issues gone). Though passing state from one dependency scan to another through the filesystem might be unreliable.

In D104465#2825786 <https://reviews.llvm.org/D104465#2825786>, @dexonsmith wrote:

> Two more options to consider:
>
> 4. If a compilation uses a PCH, use the original files for any input file transitively referenced by the PCH. Can be done by using a filesystem overlay that skips over the minimization layer for those files, or could tell the minimization layer not to minimize those files. You can figure out the files by adding a free function that preloads the PCH and extracts out all the input files.

I tend to prefer this option, since it integrates fairly well with the current way we build PCHs in XCBuild.
I have a WIP patch that implements this: D104536 <https://reviews.llvm.org/D104536>. I plan to run some benchmarks to see the performance impact compared to minification + padding.

> 5. Add support for building/using a PCH, and only support PCHes that are built this way. E.g., add a `-include-pch-auto` option or something. Scanner would use `-include`, spit out a command for building a PCH using explicit modules, and the generated `-cc1`s would use `-include-pch` to refer to it. This way all the modules are generated "in house".

This would be the ideal solution IMO, but needs build system support.

----

BTW: I noticed the file size stored in AST also plays role when deciding whether to import or include header. `HeaderSearch::findModuleForHeader` calls `HeaderSearch::getExistingFileInfo` which pulls information (such as the file size) from `ExternalSource` (a.k.a. `ASTReader`).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104465/new/

https://reviews.llvm.org/D104465



More information about the cfe-commits mailing list