[clang] [Serialization] Load Specializations Lazily (PR #76774)

Mon Jan 15 23:51:46 PST 2024

vgvassilev wrote:

> @dwblaikie do you have any memory of what caused regressions on the last try in the first place? Do we happen to do more work for compiles with many files because we accidentally push some computations further down the dependency tree and have to duplicate it instead of reading results from some common module?

The first version of that patch worked well. It reduced the memory footprint of the compiler jobs by roughly 30% for the larger workflows, however, we used to store the template specialization argument hashes into an array. That became a problem for Google as it has thousands of modules the compile-times slowed down because the search became quadratic. Some more details were given by Richard here: https://reviews.llvm.org/D41416#968968.

This patch stores the template specialization arguments into a hash table and the search becomes constant (or close to a constant with the on-disk hash table we use). However, we still wanted to hear back from you guys before moving forward.

In terms of debugability, I think it does not make things worse than anything else in clang modules. For example, we use the same approach for handling DeclContext's identifier tables across modules. Every module has a mapping between a DeclContext and a set of identifiers. The on-disk hash table collects the identifiers for the same entities and calls condense into a single hashtable if a threshold value is exceeded. We model the template specializations in a similar way. In case there is a template argument collision we deserialize both specializations and let clang decide which one needs to be used. In turn, that does not affect correctness but does require some more work which is fine compared to the status quo.

In terms of correctness, our internal forks have https://reviews.llvm.org/D41416 for years it worked well for us on a code bases ~50MLOC. I expect this patch to improve the build times of the sparsely used modules at Google dramatically.

https://github.com/llvm/llvm-project/pull/76774